CN113205526B - Distribution line accurate semantic segmentation method based on multi-source information fusion - Google Patents
- Publication number: CN113205526B (application CN202110355431.XA / CN202110355431A)
- Authority: CN (China)
- Prior art keywords: semantic segmentation, mask, rcnn, distribution line, network
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/30 — Determination of transform parameters for the alignment of images, i.e. image registration
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
- G06T7/11 — Region-based segmentation
- G06V10/28 — Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
- G06T2207/10004 — Still image; Photographic image
- G06T2207/10024 — Color image
- G06T2207/10028 — Range image; Depth image; 3D point clouds
- G06T2207/20076 — Probabilistic image processing
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
The invention relates to the technical field of computer vision and image processing, and discloses a method for accurate semantic segmentation of distribution lines based on multi-source information fusion. The method acquires the 3D point cloud of a lidar and the RGB image of a high-precision vision camera and fuses the two; improves the Mask-RCNN network to construct an improved Mask-RCNN semantic segmentation model; improves the loss function; acquires distribution line pictures on site to build a data set, divided into a training set and a test set; preprocesses the data set and trains and tests the improved Mask-RCNN semantic segmentation model with the training and test sets; and feeds the fused data into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation. Compared with the prior art, the method achieves accurate, high-speed semantic segmentation of distribution lines based on the improved Mask-RCNN semantic segmentation model.
Description
Technical Field
The invention relates to the technical field of computer vision and image processing, in particular to a distribution line accurate semantic segmentation method based on multi-source information fusion.
Background
With the rapid development of the economy in China, electricity has become indispensable to social production and daily life, which places higher requirements on power supply departments: not only must the supply of power be sufficient, but its reliability must also be high. The technique of performing live-line work on distribution lines was developed to allow maintenance, inspection, testing, and related operations on power supply equipment and lines without interrupting the power supply.
However, operators face high risks when performing live-line work on distribution lines, so an accurate safety early-warning system is essential. Accurate semantic segmentation of the distribution line is one of the core technologies of live-line work safety early warning, and segmentation precision directly determines the reliability of the warning. Distribution lines are erected in complex environments with densely arranged facilities, so distribution line information collected by a single sensor is easily disturbed by surrounding environmental factors, yielding inaccurate data and reducing early-warning reliability. Moreover, most existing distribution line semantic segmentation suffers from low precision and low warning reliability.
Image semantic segmentation refers to partitioning an image into regions of different semantic categories at the pixel level, and is one of the core technologies of image processing. With the advent of the artificial intelligence era, image semantic segmentation has become a research hotspot in advanced fields such as autonomous driving and indoor navigation.
In the field of image semantic segmentation, machine learning methods represented by deep learning continue to achieve better results and are gradually replacing traditional segmentation methods. Compared with traditional methods, deep-learning-based segmentation can learn and extract image features autonomously through a deep network, enabling end-to-end classification learning and effectively improving both the speed and the precision of semantic segmentation.
In 2015, the Fully Convolutional Network (FCN) was proposed, applying deep learning to semantic segmentation for the first time: all fully connected layers used for image classification in a convolutional neural network are converted into convolutional layers, and deconvolution layers and skip connections are introduced, ensuring the stability and robustness of the network. With the advent of the FCN, deep learning formally entered the field of image semantic segmentation.
U-Net, one of the most widely used models in medical image segmentation, is known for its characteristic U-shaped symmetric structure, whose two sides perform downsampling and upsampling respectively. Downsampling captures the contextual information of the image, while upsampling enables accurate localization of segmentation boundaries, so the model achieves strong segmentation performance even when trained on little data. In the same year, the SegNet semantic segmentation model appeared; it adopts an encoder-decoder structure and performs upsampling using the indices of max pooling, saving memory in the network model.
The DeepLab series of semantic segmentation models from the Google team has also kept advancing the field. DeepLabv1 combines a deep convolutional neural network (DCNN) with a fully connected conditional random field (CRF), effectively alleviating the inaccurate localization of deep convolutional networks. DeepLabv2 innovates on DeepLabv1 by integrating an atrous spatial pyramid pooling (ASPP) module into the model structure, which effectively improves segmentation capability. The improved DeepLabv3 appeared in the same year; its core idea is to refine the ASPP structure and introduce batch normalization, raising the segmentation precision of the network. The latest DeepLabv3+ adds an encoder-decoder and an Xception backbone on top of DeepLabv3, improving both the speed and the precision of semantic segmentation.
In addition, the PSPNet semantic segmentation model proposed by Zhao et al. introduces a pyramid pooling module, improving the network's ability to capture the global context of the image. The Mask-RCNN semantic segmentation model proposed by He et al. extends Faster-RCNN by adding a network branch for the segmentation task, replacing RoIPooling in Faster-RCNN with RoIAlign, and combining a residual network with a Feature Pyramid Network (FPN) for feature extraction, so the network achieves high-quality segmentation of the image while detecting targets.
Extensive experiments show that deep-learning-based image semantic segmentation algorithms perform well on segmentation tasks. However, the live-line work environment is complex and demands high segmentation precision, which conventional semantic segmentation models cannot meet.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a distribution line accurate semantic segmentation method based on multi-source information fusion.
The technical scheme is as follows: the invention provides a method for accurate semantic segmentation of distribution lines based on multi-source information fusion. A lidar and a high-precision vision camera are installed on one side of the distribution line and electrically connected to a distribution line accurate semantic segmentation system; after acquiring the lidar and camera information, the system realizes semantic segmentation through the following steps:
setp 1: acquiring a 3D point cloud picture of a laser radar and an RGB image of a high-precision vision camera, and carrying out registration fusion on the two images;
setp 2: improving a Mask-RCNN network, modifying a downsampling structure of ResNet, disassembling a large-kernel convolution for the ResNet network, replacing the large-kernel convolution with a plurality of layers of small convolutions, providing a new network structure, and constructing an improved Mask-RCNN semantic segmentation model;
setp 3: improving a Mask-RCNN semantic segmentation model loss function, and adding an L2 norm loss function at the tail of the original Mask-RCNN loss function to increase the constraint of the distribution line shape;
setp 4: acquiring related distribution line pictures on a live-line work site to prepare a data set, and dividing the data set into a test set and a training set;
setp 5: preprocessing the data set, and training and testing the improved Mask-RCNN semantic segmentation model by utilizing a test set and a training set;
setp 6: and inputting the data fused with the Setp1 as network input into an improved Mask-RCNN semantic segmentation model for semantic segmentation.
Further, the improved Mask-RCNN semantic segmentation model modifies the candidate-region selection of Mask-RCNN as follows:
First, candidate regions are extracted from the picture obtained in Step 1 using a Hough line (arc) detection algorithm; regions containing no lines (arcs) are discarded outright, reducing the original 2000 candidate regions to 100.
Then, the picture is normalized directly to the format required by the convolutional network and fed in whole; the fifth ordinary pooling layer is replaced with an RoI pooling layer; five layers of convolution produce the feature map; the coordinate information obtained earlier is converted into the corresponding feature-map coordinates through a mapping relation; the corresponding candidate region is cropped; and the fixed-length feature vector extracted by the RoI layer is sent to the fully connected layer.
Further, the modified network structure uses ResNet50 as the backbone network, and ResNet employs cross-layer (skip) connections.
Further, the middle n × n convolution block of the new network structure is replaced by n pairs of 1 × n and n × 1 convolution blocks, with the pairs connected in parallel.
Further, the modified loss function in Step 3 is defined as:

L = L_cls + L_box + α·L_mask + β·L_re    (1)

where L_cls, L_box, and L_mask are respectively the classification loss, the detection-box loss, and the mask loss of the Mask-RCNN semantic segmentation model loss function; L_re is the registration loss of the 3D point cloud data; and α and β are the weight coefficients of the mask loss and the registration loss. The classification loss is defined as:

L_cls(p_i, p_i*) = -log[ p_i·p_i* + (1 - p_i)(1 - p_i*) ]    (4)

where y^(i) and y'^(i) denote a true value and a predicted value; p_i is the predicted classification probability of an anchor; p_i* = 1 when the anchor is a positive sample and p_i* = 0 when it is negative; t_i is the predicted offset of the anchor and t_i* is the offset of the anchor relative to the ground truth; and R is the smooth-L1 function used in the detection-box loss.
further, the specific steps of fusing the 3D point cloud image of the laser radar and the RGB image of the high-precision vision camera are as follows: firstly, defining a uniform coordinate system, establishing registration relation between feature points of a 3D point cloud picture and RGB images, and enabling a point p on a space coordinate system on a radar point cloud picture to be in contact with the feature points i The (x, y, z) is mapped into a plane coordinate system in a two-dimensional space, and is input into a subsequent semantic segmentation model as a network input.
Further, when the improved Mask-RCNN semantic segmentation model is trained and tested with the training and test sets, the data set is processed as follows:
1) Picture scaling: during training and testing of the improved Mask-RCNN semantic segmentation model, the pictures in the data set are scaled to 960 × 540;
2) Data enhancement: mean subtraction is applied to the pictures in the data set, and horizontal flipping is used during training.
Beneficial effects:
1. The invention acquires data through multi-source information fusion; using information of multiple dimensions enables accurate identification and extraction of the distribution line, effectively improving the accuracy and completeness of distribution line extraction and thereby ensuring the reliability of the safety early-warning system.
2. The invention builds on a classical improvement to the ResNet network: decomposing large-kernel convolutions, i.e. replacing a large-kernel convolution with multiple layers of small convolutions, which deepens the network.
3. The network structure proposed by the invention changes the middle n × n convolution block into n pairs of 1 × n and n × 1 convolution blocks connected in parallel, which accelerates network computation and reduces the probability of overfitting.
4. The invention appends an L2 loss term at the end of the loss function to strengthen the constraint on the distribution line shape.
Drawings
FIG. 1 is a schematic diagram of the fusion of the lidar 3D point cloud and RGB image data;
FIG. 2 is a structure diagram of the ResNet101 backbone network;
FIG. 3 is a diagram of the improved ResNet architecture;
FIG. 4 is a diagram illustrating distribution line segmentation results according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The invention discloses a method for accurate semantic segmentation of distribution lines based on multi-source information fusion. A lidar and a high-precision vision camera are installed on one side of the distribution line and electrically connected to a distribution line accurate semantic segmentation system; after acquiring the lidar and camera information, the system realizes semantic segmentation through the following steps:
setp 1: and acquiring a 3D point cloud picture of the laser radar and an RGB image of the high-precision vision camera, and registering and fusing the two images.
Setp 2: the Mask-RCNN network is improved, a downsampling structure of ResNet is modified, large-kernel convolution is disassembled from the ResNet network, the large-kernel convolution is replaced by multiple layers of small convolution, a new network structure is provided, and an improved Mask-RCNN semantic segmentation model is constructed to increase the detection speed of distribution lines.
In the feature extraction of the original Mask-RCNN, the coordinate information of 2000 candidate regions (region proposals) is first obtained from the input picture using a selective search algorithm. In the invention, because distribution lines have very distinct straight-line or arc geometry, a Hough line (arc) detection algorithm extracts candidate regions from the picture; regions containing no lines (arcs) are discarded outright, reducing the original 2000 candidate regions to 100. This operation greatly accelerates network training and detection.
Then the picture is normalized directly to the format required by the convolutional network and fed in whole; the fifth ordinary pooling layer is replaced with an RoI pooling layer; five layers of convolution produce the feature map; the coordinate information obtained at the start is converted into the corresponding feature-map coordinates through a mapping relation; the corresponding candidate region is cropped; and the fixed-length feature vector extracted by the RoI layer is sent to the fully connected layer.
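The pruning step above can be sketched as follows. This is an illustrative heuristic, not the patent's exact procedure: it assumes the line segments come from some Hough line/arc detector (e.g. OpenCV's `HoughLinesP`), and it ranks candidate boxes by how many detected segments they contain, discarding boxes with no line evidence.

```python
def filter_candidate_regions(proposals, segments, keep=100):
    """Keep at most `keep` candidate boxes that contain line evidence.

    proposals: iterable of (x1, y1, x2, y2) boxes (region proposals).
    segments:  iterable of (lx1, ly1, lx2, ly2) line segments, e.g. the
               output of a Hough line/arc detector (assumed input).
    A box is scored by the number of segment midpoints it contains;
    boxes with no segments are dropped outright, mirroring the
    reduction from ~2000 to ~100 proposals described above.
    """
    scored = []
    for (x1, y1, x2, y2) in proposals:
        n = sum(1 for (lx1, ly1, lx2, ly2) in segments
                if x1 <= (lx1 + lx2) / 2 <= x2
                and y1 <= (ly1 + ly2) / 2 <= y2)
        if n > 0:
            scored.append((n, (x1, y1, x2, y2)))
    # strongest line evidence first; ties broken deterministically
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [box for _, box in scored[:keep]]
```

In practice the midpoint test would be replaced by whatever overlap criterion suits the detector output; the point is that proposal filtering is a cheap geometric pre-check, not learned.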
Setp 3: improving a Mask-RCNN semantic segmentation model loss function, and adding an L2 norm loss function at the tail of the original Mask-RCNN loss function to increase the constraint of the distribution line shape;
setp 4: and acquiring related distribution line pictures on a live-line work site to prepare a data set, and dividing the data set into a test set and a training set.
Setp 5: and preprocessing the data set, and training and testing the improved Mask-RCNN semantic segmentation model by utilizing a test set and a training set.
Setp 6: and (4) inputting the data fused with the Setp1 as network input into an improved Mask-RCNN semantic segmentation model for semantic segmentation.
For multi-source information fusion input:
Because distribution lines are erected in complex environments with densely arranged facilities, the distribution line information collected by a single sensor is easily affected by surrounding environmental factors, making the acquired data inaccurate and reducing the reliability of safety early warning. Acquiring data through multi-source information fusion allows accurate identification and extraction of the distribution line using information of multiple dimensions, effectively improving the accuracy and completeness of extraction and ensuring the reliability of the safety early-warning system.
Because the live-line work environment is complex, the invention combines a lidar with a high-precision vision camera as the multi-source input: the lidar 3D point cloud accurately captures the position of a target, while the RGB vision camera captures the surrounding visual information. Fusing the two yields more accurate information about the live-line work environment, improves the sensors' resistance to interference, and ensures that the distribution line is identified and extracted completely and accurately.
Because the radar point cloud is 3D data, the fusion of the radar 3D point cloud and the RGB image must yield 4-channel RGB-D data to satisfy the input requirements of the Mask-RCNN semantic segmentation model. The fusion algorithm proceeds as follows: first, a unified coordinate system is defined and a registration relation is established between points of the 3D point cloud and the RGB image. Each point p_i = (x, y, z) in the spatial coordinate system of the radar point cloud is mapped into a planar coordinate system in two-dimensional space by

u = (1/2) · [1 - arctan(y, x)/π] · w
v = [1 - (arcsin(z/r) + f_d)/f] · h

where (u, v) are the mapped image coordinates, and h and w are the height and width of the desired range-image representation. f = f_u + f_d is the vertical field of view of the lidar, with f_u the elevation angle above the horizontal and f_d the depression angle below it. r = ||p_i||_2 is the range of the point in spherical coordinates. In this way, points of the 3D point cloud are mapped onto coordinates on the RGB image, realizing the data fusion that is fed into the subsequent semantic segmentation model as network input.
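A minimal sketch of this spherical range-image projection, under the standard conventions the formulas above follow (the field-of-view angles and function names are illustrative assumptions, not values from the patent):

```python
import numpy as np

def project_to_image(points, h, w, f_up_deg=15.0, f_down_deg=25.0):
    """Map lidar points p_i = (x, y, z) to (u, v) pixel coordinates of
    an h x w range image via the spherical projection sketched above.
    f_up_deg / f_down_deg are assumed vertical field-of-view limits.
    """
    f_up = np.radians(f_up_deg)      # f_u: elevation above the horizon
    f_down = np.radians(f_down_deg)  # f_d: depression below the horizon
    f = f_up + f_down                # total vertical field of view
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)   # range r = ||p_i||_2
    yaw = np.arctan2(y, x)               # azimuth angle
    pitch = np.arcsin(z / r)             # elevation angle
    u = 0.5 * (1.0 - yaw / np.pi) * w    # horizontal pixel coordinate
    v = (1.0 - (pitch + f_down) / f) * h # vertical pixel coordinate
    return np.stack([u, v], axis=1), r
```

Each 3D point then lands on a pixel of the RGB image grid, so range (depth) can be stacked as a fourth channel to form the RGB-D input.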
Improved Mask-RCNN semantic segmentation model
First, improving the network structure
Mask-RCNN is a very flexible framework that can perform various image processing tasks such as object detection and semantic segmentation. To ensure accurate segmentation of distribution lines, the invention improves the Mask-RCNN network, modifying the downsampling structure in ResNet according to the characteristics of distribution lines.
The invention uses ResNet50 as the backbone network. ResNet uses cross-layer (skip) connections, which make training easier. The network structure of ResNet50 is shown in FIG. 2.
Following a classical improvement to the ResNet network, large-kernel convolutions are decomposed: a large-kernel convolution is replaced by multiple layers of small convolutions, as shown in FIG. 3, which deepens the network. This idea comes from the Inception v2 network.
Based on the above improvement method, the invention proposes a new network structure: the middle n × n convolution block is changed into n pairs of 1 × n and n × 1 convolution blocks, with the pairs connected in parallel. This accelerates network computation and reduces the probability of overfitting. Referring to FIG. 3, the embodiment takes a 5 × 5 convolution block as an example, changing it into 5 pairs of 1 × 5 and 5 × 1 convolution blocks connected in parallel.
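The saving behind this factorization can be checked with simple weight arithmetic. The sketch below (helper names are ours) compares one n × n convolution against a single 1 × n followed by n × 1 pair, the Inception-style building block this design extends; it does not model the patent's n parallel pairs, whose per-branch channel split is not specified.

```python
def conv_params(kh, kw, c_in, c_out, bias=True):
    """Weight count of a single 2-D convolution layer of kernel kh x kw."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)

def factorized_pair_params(n, c_in, c_out, bias=True):
    """One 1 x n convolution followed by one n x 1 convolution
    (Inception-v2-style factorization of an n x n kernel)."""
    return (conv_params(1, n, c_in, c_out, bias)
            + conv_params(n, 1, c_out, c_out, bias))
```

For n = 5 and 64 input/output channels, the 5 × 5 block needs 5·5·64·64 = 102,400 weights, while the 1 × 5 / 5 × 1 pair needs 2·5·64·64 = 40,960, i.e. the factorized form is both deeper and cheaper.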
Second, improving the loss function of the model
Because the shape of the distribution line is fixed, the invention optimizes the loss function of the Mask-RCNN semantic segmentation model, appending an L2 loss term at the end to strengthen the shape constraint. The improved loss function is defined as:

L = L_cls + L_box + α·L_mask + β·L_re    (1)

where L_cls, L_box, and L_mask are respectively the classification loss, the detection-box loss, and the mask loss of the Mask-RCNN semantic segmentation model loss function; L_re is the registration loss of the 3D point cloud data; and α and β are the weight coefficients of the mask loss and the registration loss. The classification loss is defined as:

L_cls(p_i, p_i*) = -log[ p_i·p_i* + (1 - p_i)(1 - p_i*) ]    (4)

where y^(i) and y'^(i) denote a true value and a predicted value; p_i is the predicted classification probability of an anchor; p_i* = 1 when the anchor is a positive sample and p_i* = 0 when it is negative; t_i is the predicted offset of the anchor and t_i* is the offset of the anchor relative to the ground truth; and R is the smooth-L1 function used in the detection-box loss.
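A minimal numeric sketch of these terms, assuming scalar per-anchor values (function names are ours; the mask and registration losses are passed in as precomputed scalars, since their closed forms are not given here):

```python
import numpy as np

def cls_loss(p, p_star):
    """Classification loss of Eq. (4):
    L_cls = -log[p*p_star + (1-p)(1-p_star)], i.e. -log(p) for a
    positive anchor (p_star = 1) and -log(1-p) for a negative one."""
    return -np.log(p * p_star + (1 - p) * (1 - p_star))

def smooth_l1(x):
    """Smooth-L1 (Huber) function R used in the box-regression term:
    0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

def total_loss(l_cls, l_box, l_mask, l_re, alpha=1.0, beta=1.0):
    """Weighted sum of Eq. (1); alpha and beta weight the mask and
    registration terms (default values here are placeholders)."""
    return l_cls + l_box + alpha * l_mask + beta * l_re
```

In training, each term would be averaged over anchors/RoIs before being combined; the sketch only fixes the algebraic form of Eq. (1) and Eq. (4).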
experiments and analyses
The experimental environment adopted by the invention is shown in table 1, and the parameters in the model training process are shown in table 2:
TABLE 1 Experimental Environment
TABLE 2 training parameters
The data set for Step 4 was processed as follows:
the invention uses a laser radar combined with a high-precision vision camera to collect pictures of distribution lines on live-line work sites, producing a data set of 1800 pictures. The data set is first preprocessed and the image size is set to 1920 × 1080. The data are then manually annotated with a labeling tool, generating label pictures and a yaml file that stores the label names. The invention selects 1700 pictures for training and 100 pictures for testing.
In addition, the following operations are performed on the data set during the model training process.
Scaling the pictures: during training and testing of the model, to increase the training speed, the pictures in the data set are scaled to 960 × 540.
Data enhancement: to make the input pictures meet the requirements of the network architecture, data enhancement such as mean removal and horizontal flipping is also applied during training.
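The two operations above can be sketched as follows. This is a minimal NumPy stand-in: a nearest-neighbour resize replaces the library resize used in practice, and the per-channel mean values are illustrative assumptions, not values from the patent.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize; a stand-in for the library resize used in practice."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def preprocess(img, channel_mean):
    """Scale a 1920 x 1080 frame to 960 x 540, remove the per-channel mean,
    and return the image together with its horizontally flipped copy."""
    small = resize_nearest(img, 540, 960).astype(np.float32)
    small -= channel_mean          # mean removal
    flipped = small[:, ::-1]       # horizontal flip for augmentation
    return small, flipped

frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
assumed_mean = np.array([104.0, 117.0, 124.0], dtype=np.float32)  # illustrative values
out, out_flip = preprocess(frame, assumed_mean)
print(out.shape)  # (540, 960, 3)
```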
The method performs semantic segmentation of the 10 kV distribution line based on the improved Mask-RCNN model. The visual segmentation results are shown in fig. 4, where the first column is the original picture, the second column is the label picture, and the third column is the segmentation result.
As shown in fig. 4, the method provided by the present invention can realize accurate segmentation of the distribution line in the complex background of live-wire work.
Meanwhile, the invention selects several classical semantic segmentation models for comparison on the data set created by the invention. The SegNet semantic segmentation model corresponds to the document: Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015: 1. The U-Net semantic segmentation model corresponds to the document: Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation [J]. 2015. The Deeplabv3+ semantic segmentation model corresponds to the document: Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C] // Proceedings of the European Conference on Computer Vision (ECCV), 2018: 801-818. The Mask-RCNN semantic segmentation model corresponds to the document: He K, Gkioxari G, Dollár P, et al. Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 22-29, 2017 (pp. 2980-2988). Model performance is evaluated using the mean intersection over union (MIoU). The comparison results are shown in table 3, which shows that the method provided by the invention performs better than the other methods.
Table 3 compares the results with other models
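The mean intersection over union used for the comparison can be sketched as follows. This is a generic MIoU definition (intersection over union per class, averaged over the classes present); the patent does not state its exact averaging convention.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes; classes absent from
    both the prediction and the ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])
# class 0: intersection 1, union 2 -> 0.5; class 1: intersection 2, union 3 -> 2/3
print(mean_iou(pred, gt, 2))  # ~0.5833
```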
The above embodiments are merely illustrative of the technical concepts and features of the present invention, and the purpose of the embodiments is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered in the protection scope of the present invention.
Claims (7)
1. A distribution line accurate semantic segmentation method based on multi-source information fusion, characterized in that a laser radar and a high-precision vision camera are installed on one side of the distribution line, and both are electrically connected with a distribution line accurate semantic segmentation system; after acquiring the laser radar and high-precision vision camera information, the distribution line accurate semantic segmentation system realizes semantic segmentation through the following steps:
Step 1: acquiring a 3D point cloud picture from the laser radar and an RGB (red, green and blue) image from the high-precision vision camera, and registering and fusing the two images;
Step 2: improving the Mask-RCNN network: modifying the downsampling structure of ResNet, decomposing the large-kernel convolutions of the ResNet network and replacing them with multiple layers of small convolutions, providing a new network structure, and constructing an improved Mask-RCNN semantic segmentation model;
Step 3: improving the Mask-RCNN semantic segmentation model loss function by adding an L2 norm loss function at the end of the original Mask-RCNN loss function to increase the constraint on the distribution line shape;
Step 4: collecting relevant distribution line pictures on the live-line work site to prepare a data set, and dividing the data set into a test set and a training set;
Step 5: preprocessing the data set, and training and testing the improved Mask-RCNN semantic segmentation model using the training set and the test set;
Step 6: inputting the data fused in Step 1 into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation.
2. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, wherein the improved Mask-RCNN semantic segmentation model modifies the candidate regions of Mask-RCNN, and the candidate region selection method comprises:
firstly, extracting candidate regions from the picture obtained in Step 1 using a Hough line detection algorithm, directly discarding regions in which no straight line is detected, and reducing the original 2000 candidate regions to 100 candidate regions;
then, normalizing the picture directly to the format required by the convolutional network and feeding the whole picture into the convolutional network, in which the fifth pooling layer is replaced by a RoI pooling layer; the picture passes through 5 convolution layers to obtain a feature map, the obtained coordinate information is converted into the corresponding coordinates on the feature map through a mapping relation, the corresponding candidate regions are cropped, fixed-length feature vectors are extracted through the RoI layer, and the feature vectors are sent to the fully connected layer.
3. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, wherein ResNet50 is used as the backbone network in the modified new network structure, and the ResNet uses cross-layer connections.
4. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 3, characterized in that each middle n × n convolution block of the new network structure is changed into n pairs of 1 × n and n × 1 convolution blocks, and each pair of convolution blocks is connected in parallel.
5. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, wherein the modified loss function in Step 3 is defined as:
L = L_cls + L_box + αL_mask + βL_re (1)

wherein L_cls, L_box and L_mask are respectively the classification loss, the detection-box loss and the mask loss in the Mask-RCNN semantic segmentation model loss function, L_re is the registration loss of the 3D point cloud data, and α and β respectively represent the weight coefficients of the mask loss and the registration loss; L_mask, L_re, L_cls and L_box are respectively defined as:

L_mask = -(1/N) Σ_i [y^(i) log y'^(i) + (1 - y^(i)) log(1 - y'^(i))] (2)

L_re = (1/N) Σ_i (y^(i) - y'^(i))² (3)

L_cls(p_i, p_i*) = -log[p_i p_i* + (1 - p_i)(1 - p_i*)] (4)

L_box(t_i, t_i*) = Σ_i p_i* R(t_i - t_i*) (5)

wherein y^(i) and y'^(i) are respectively the true value and the predicted value; p_i is the predicted classification probability of an anchor point; p_i* = 1 when the anchor point is a positive sample, and p_i* = 0 when the anchor point is a negative sample; t_i is the predicted offset of the anchor point, and t_i* represents the offset of the anchor point relative to the true value; R is the smoothL1 function:

smoothL1(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise.
6. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, wherein the specific steps of fusing the 3D point cloud picture of the laser radar and the RGB image of the high-precision vision camera are: firstly, defining a unified coordinate system and establishing a registration relation between the feature points of the 3D point cloud picture and the RGB image; then mapping a point p_i(x, y, z) in the spatial coordinate system of the radar point cloud picture into a plane coordinate system in two-dimensional space, and inputting the result into the subsequent semantic segmentation model as network input.
7. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, wherein when the training set and the test set are used to train and test the improved Mask-RCNN semantic segmentation model, the following processing is performed on the data set:
1) scaling the pictures: during training and testing of the improved Mask-RCNN semantic segmentation model, the pictures in the data set are scaled to 960 × 540;
2) data enhancement: the pictures in the data set are de-meaned and horizontally flipped during training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110355431.XA CN113205526B (en) | 2021-04-01 | 2021-04-01 | Distribution line accurate semantic segmentation method based on multi-source information fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113205526A CN113205526A (en) | 2021-08-03 |
CN113205526B true CN113205526B (en) | 2022-07-26 |
Family
ID=77026115
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110355431.XA Active CN113205526B (en) | 2021-04-01 | 2021-04-01 | Distribution line accurate semantic segmentation method based on multi-source information fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113205526B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114297237A (en) * | 2021-12-14 | 2022-04-08 | 重庆邮电大学 | Three-dimensional point cloud data retrieval method and device based on category fusion and computer equipment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210443A (en) * | 2020-01-03 | 2020-05-29 | 吉林大学 | Deformable convolution mixing task cascading semantic segmentation method based on embedding balance |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210443A (en) * | 2020-01-03 | 2020-05-29 | 吉林大学 | Deformable convolution mixing task cascading semantic segmentation method based on embedding balance |
Also Published As
Publication number | Publication date |
---|---|
CN113205526A (en) | 2021-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111985376A (en) | Remote sensing image ship contour extraction method based on deep learning | |
CN110246181B (en) | Anchor point-based attitude estimation model training method, attitude estimation method and system | |
Alidoost et al. | A CNN-based approach for automatic building detection and recognition of roof types using a single aerial image | |
CN113240691A (en) | Medical image segmentation method based on U-shaped network | |
CN111950453A (en) | Optional-shape text recognition method based on selective attention mechanism | |
Yang et al. | An ensemble Wasserstein generative adversarial network method for road extraction from high resolution remote sensing images in rural areas | |
CN109635726B (en) | Landslide identification method based on combination of symmetric deep network and multi-scale pooling | |
CN113838064B (en) | Cloud removal method based on branch GAN using multi-temporal remote sensing data | |
CN108230330B (en) | Method for quickly segmenting highway pavement and positioning camera | |
CN111462140B (en) | Real-time image instance segmentation method based on block stitching | |
CN111461006B (en) | Optical remote sensing image tower position detection method based on deep migration learning | |
CN110490915B (en) | Point cloud registration method based on convolution-limited Boltzmann machine | |
US11763471B1 (en) | Method for large scene elastic semantic representation and self-supervised light field reconstruction | |
CN113052106A (en) | Airplane take-off and landing runway identification method based on PSPNet network | |
CN116883650A (en) | Image-level weak supervision semantic segmentation method based on attention and local stitching | |
CN115272306A (en) | Solar cell panel grid line enhancement method utilizing gradient operation | |
CN113205526B (en) | Distribution line accurate semantic segmentation method based on multi-source information fusion | |
CN117274627A (en) | Multi-temporal snow remote sensing image matching method and system based on image conversion | |
CN113706562A (en) | Image segmentation method, device and system and cell segmentation method | |
CN114882494A (en) | Multi-mode attention-driven three-dimensional point cloud feature extraction method | |
CN116612357B (en) | Method, system and storage medium for constructing unsupervised RGBD multi-mode data set | |
Chen et al. | BARS: a benchmark for airport runway segmentation | |
CN116740528A (en) | Shadow feature-based side-scan sonar image target detection method and system | |
CN116385477A (en) | Tower image registration method based on image segmentation | |
CN112069997B (en) | Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||