CN111882563A - Semantic segmentation method based on directional convolutional network

Semantic segmentation method based on directional convolutional network

Info

Publication number
CN111882563A
CN111882563A
Authority
CN
China
Prior art keywords
network
convolution
directional
semantic segmentation
full
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010669134.8A
Other languages
Chinese (zh)
Other versions
CN111882563B (en)
Inventor
武伯熹
蔡登
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010669134.8A priority Critical patent/CN111882563B/en
Publication of CN111882563A publication Critical patent/CN111882563A/en
Application granted granted Critical
Publication of CN111882563B publication Critical patent/CN111882563B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image


Abstract

The invention discloses a semantic segmentation method based on a directional convolutional network, which comprises the following steps: (1) constructing a full convolution network of directional convolutions; (2) adding a pooling layer and a fully connected layer network on top of the constructed directional full convolution network to form a first depth model, and pre-training it on a large data set; (3) extracting the full convolution part of the pre-trained first depth model, initializing the parameters of the directional full convolution network with those full convolution layers, and adding a new fully connected layer to form a second depth model; (4) training the second depth model on a semantic segmentation data set until the model converges; (5) analyzing the picture to be examined with the trained second depth model, predicting the category of each pixel in the picture, and forming and outputting the semantic segmentation map of the picture. The method encourages the semantic segmentation network to learn the relation between the perception field and its central pixel, and improves the robustness of the trained model.

Description

Semantic segmentation method based on directional convolutional network
Technical Field
The invention belongs to the field of computer vision and image processing, and particularly relates to a semantic segmentation method based on a directional convolutional network.
Background
With the development and deepening research of deep learning theory, many fields and tasks in computer vision have seen rapid breakthroughs and remarkable improvements. Among them, semantic segmentation, owing to its high demands on the fineness of the vision system, is one of the most challenging computer vision tasks and a popular current research direction. A semantic segmentation task requires the vision system to predict, for a picture of arbitrary size, the object class to which each pixel belongs. The current mainstream solution adopts a full convolution network architecture, which starts from the work "Fully Convolutional Networks for Semantic Segmentation" presented at the Conference on Computer Vision and Pattern Recognition in 2014 by Jonathan Long et al. of the University of California, Berkeley. Drawing on experience from the image recognition field, that work processes images using only convolutional layers (a full convolution network) combined with bilinear interpolation, so that the output predictions correspond one-to-one with the input picture pixels. Through end-to-end training under a supervised learning framework, the neural network learns image features far superior to those of traditional learning methods. The work "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" published by Liang-Chieh Chen et al. at the European Conference on Computer Vision in 2018 introduced the DeepLab v3+ method, a leading solution in this field, which enlarges the effective perception field through techniques such as dilated (atrous) convolution.
However, the effectiveness of the full convolution network is not fully understood, and its predictions still have defects. A careful analysis of the prediction process shows that, for the prediction at a single pixel, the neural network can access all pixels in the perception field (the part of the input that can directly influence the network output), yet it only outputs the class of the central pixel. On the one hand, no mechanism explicitly guides the neural network during training to predict the pixel at the center of the perception field; on the other hand, experimental results show that the full convolution network does learn the association between the perception field and the central pixel from the data. This contrast motivates us to understand the deep mechanisms of convolutional networks and, building on this understanding, to encourage the neural network to pay more attention to the central location, thereby producing a more robust semantic segmentation system.
Disclosure of Invention
The invention provides a semantic segmentation method based on a directional convolutional network, which can promote semantic segmentation to learn the relation between a perception field and a central pixel, improve the robustness of a training model and enable the image semantic segmentation to be more accurate.
A semantic segmentation method based on a directional convolutional network is characterized by comprising the following steps:
(1) constructing a full convolution network of directional convolution;
(2) adding a pooling layer and a fully connected layer network on top of the constructed directional full convolution network to form a first depth model, and pre-training it on a large data set;
(3) extracting the full convolution part of the pre-trained first depth model, initializing the parameters of the directional full convolution network with those full convolution layers, and adding a new fully connected layer to form a second depth model;
(4) training the second depth model using a semantic segmentation data set until the model converges;
(5) analyzing the picture to be examined with the trained second depth model, predicting the category of each pixel in the picture, and forming and outputting the semantic segmentation map of the picture.
The method first constructs a full convolution network that uses only directional convolutions, then pre-trains a deep learning network consisting of the full convolution layers plus a pooling layer and a fully connected layer, which serves as the initialization of the full convolution layers, and finally trains on a semantic segmentation training data set to predict the category of each pixel of an input image. The method promotes the latent task of "predicting the pixel at the center of the perception field" in deep network learning, so that a robust semantic segmentation model is obtained more easily.
In step (1), all ordinary convolutions are replaced by directional convolutions. The directional convolution is defined precisely as follows:
for normal convolution operations, there is a linear transformation as follows:
y_co = Σ_{ci=1..C_i} Σ_{s ∈ S} w_{s,ci} · x_{s,ci} + b_co
wherein y_co is the co-th feature of the output; ci is the index of the input features, of which there are C_i in total; S is the set of pixel offsets sampled during the convolution; and w_{s,ci}, x_{s,ci} and b_co denote, respectively, the weight, the input and the bias of the linear operation. Because ordinary convolution adopts uniform sampling, the offset set S is chosen as:
S = {-1, 0, 1}^2
For directional convolution, the offset set is no longer constant as above, but is chosen from the following family of dynamic sets:
M_k = {(s1, s2) | (s1 - e1)² + (s2 - e2)² ≤ 2²; (e1, e2) = 2·(cos(2πk/16), sin(2πk/16)); s1, s2 ∈ [-2, 2]; s1, s2 ∈ ℤ} ∪ {(0, 0)}
wherein k takes integer values from 0 to 15, representing 16 different directions, and (e1, e2) is the center of the sampling region for the k-th direction; the rule for choosing S is:
S=M(ci%16)
where ci is the index of the input channel; the modulo-16 operation sorts the channels into 16 different groups, one per direction.
This turns the original 3 × 3 square sampling region into sector-shaped regions pointing in different directions. Because the central pixel is always sampled while the surrounding pixels are sampled in turn, the central pixel has more paths along which to transmit information in the resulting computation graph, which raises the attention paid to it.
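To make the construction of M_k concrete, the following sketch enumerates the 16 offset sets in pure Python. Note the hedge: the exact definition of the direction center (e1, e2) appears only as an image in the patent, so placing it on a circle of radius 2 at angle 2πk/16 is an assumption consistent with the "16 different directions" described above, and direction_center, offset_set and dirconv_i_offsets are hypothetical helper names, not the patent's own code.

```python
import math

def direction_center(k: int, radius: float = 2.0):
    # Assumed reconstruction: the defining formula for (e1, e2) is given
    # only as an image in the patent, so the k-th direction center is
    # taken to lie on a circle of radius 2 at angle 2*pi*k/16.
    angle = 2.0 * math.pi * k / 16.0
    return radius * math.cos(angle), radius * math.sin(angle)

def offset_set(k: int):
    """Build M_k: integer offsets within distance 2 of the direction
    center, restricted to the 5x5 window [-2, 2]^2, plus the center."""
    e1, e2 = direction_center(k)
    m = {(s1, s2)
         for s1 in range(-2, 3)
         for s2 in range(-2, 3)
         if (s1 - e1) ** 2 + (s2 - e2) ** 2 <= 2.0 ** 2}
    m.add((0, 0))  # the central pixel is always sampled
    return m

def dirconv_i_offsets(ci: int):
    """DirConv-I: the sampling set for input channel ci is S = M_(ci % 16)."""
    return offset_set(ci % 16)
```

Each M_k is a fan of roughly nine taps pointing in one of the 16 directions, always containing (0, 0), which is exactly the "center pixel sampled all the time, surrounding pixels sampled in turn" behaviour described above.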
The directional convolution described above is named DirConv-I, where I indicates that the direction is selected according to the input dimension. Similarly, DirConv-O selects the direction according to the output dimension, with the convolution offsets:
S=M(co%16)
the above design is based on a variant of the 3 x 3 convolution, which can be treated as a 2 x 2 type convolution to get a slim version of the directional convolution: DirConv-SI and DirConv-SO.
In step (2), to alleviate the heavy data requirements of semantic segmentation, the large-scale image recognition data set ImageNet is adopted, which accelerates the convergence and improves the training quality of the semantic segmentation model.
The specific steps of step (2) are as follows:
(2-1) adding an image pooling layer on top of the full convolution network so that the three-dimensional feature map is reduced to a feature vector, and then transforming the feature vector with a fully connected network into a 1000-dimensional vector, corresponding to the 1000 image categories of ImageNet;
(2-2) training the constructed first depth model on GPUs, with each GPU processing 32 images at a time and 8 GPUs training in parallel;
(2-3) using the SGDM optimization algorithm with an initial learning rate of 0.256; after every 30 epochs the learning rate is reduced to 10% of its value, for 90 epochs of training in total, with the momentum parameter set to 0.9, until the model converges.
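The schedule of step (2-3) can be written down directly. The helpers below are a minimal sketch with names of our own choosing; the momentum update is shown in the standard SGDM form, which the patent does not spell out.

```python
def pretrain_lr(epoch: int, base_lr: float = 0.256) -> float:
    """Step schedule of step (2-3): start at 0.256 and cut the learning
    rate to 10% of its value after every 30 epochs, over 90 epochs."""
    return base_lr * (0.1 ** (epoch // 30))

def sgdm_step(param: float, grad: float, velocity: float,
              lr: float, momentum: float = 0.9):
    """One SGD-with-momentum update (momentum 0.9 as in the patent);
    shown on scalars, but the same rule applies elementwise to tensors."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity
```

So epochs 0-29 train at 0.256, epochs 30-59 at 0.0256, and epochs 60-89 at 0.00256.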
In step (3), the parameters of the directional full convolution network are initialized from the full convolution layers obtained in the previous step; a fully connected layer is then appended that transforms the feature values into a c-dimensional vector, where c is the number of object categories in the target semantic segmentation data set. The newly added fully connected layer is initialized randomly from a Gaussian distribution.
The specific process of step (4) is as follows:
(4-1) inputting the pictures of the training set into a second depth model, and generating a feature map after calculation;
(4-2) replacing the last strided convolution in the network with a non-strided convolution, and setting the dilation rate of all subsequent convolutions to 2;
(4-3) because the convolution strides reduce the image resolution during the network computation, the final feature map is only 1/16 the size of the original image; it therefore needs to be enlarged back to the original size by bilinear interpolation;
(4-4) feeding the generated features into a softmax function to obtain the probability distribution of the prediction, using this probability distribution to compute the gradients of the network parameters with a cross-entropy loss function, and updating the parameter values with the SGDM optimization algorithm; the initial learning rate is set to 10^-3;
(4-5) repeating the above steps until the model converges.
Compared with the prior art, the invention has the following beneficial effects:
1. based on an understanding of the perception field of convolutional networks, the invention designs a novel directional convolutional network layer that highlights the neural network's attention to the center of the perception field, so that the network more easily learns the intrinsic correlation between input and output.
2. the invention has broad applicability: by replacing ordinary convolutional networks with directional convolutional networks, it can be deployed directly and effectively in most existing semantic segmentation techniques without affecting the rest of the method pipeline.
Drawings
FIG. 1 is a schematic flow chart of a semantic segmentation method based on a directional convolutional network according to the present invention;
FIG. 2 is a visualization of convolution kernels of different convolution networks and corresponding perception fields in an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in fig. 1, a semantic segmentation method based on a directional convolutional network includes the following steps:
and S01, constructing a full convolution network of the directional convolution.
The deep learning network follows the design of the residual network ResNet-101: parameters such as network depth, network width, image resolution and convolution stride are kept unchanged, and the 3 × 3 convolutions are replaced with directional convolutions. A visualization of directional and ordinary convolution kernels is shown in FIG. 2.
And S02, adding a pooling layer and a fully connected layer network on top of the constructed directional full convolution network, and pre-training on a large data set. The directional full convolution network constructed in step S01 cannot be used directly for pre-training on the image recognition task, because its output feature vectors do not match the format of the ImageNet data set. ImageNet is the ultra-large image recognition data set published by Jia Deng et al. of Stanford University at the Conference on Computer Vision and Pattern Recognition in 2009 in the article "ImageNet: A large-scale hierarchical image database". It collects Internet pictures of 1000 object classes, with image sizes around 256 × 256 and more than 1000 training pictures per class; the training set contains 1,281,167 pictures and the validation set 50,000 pictures.
The full convolution network is connected to an image pooling layer, which reduces the feature map to a feature vector; a fully connected layer then converts the feature vector into a prediction vector of length 1000. Pre-training is done on ImageNet with the same training schedule as ResNet-101. The results after pre-training are shown in Table 1; it can be seen that the directional convolutions achieve the same pre-training quality:
TABLE 1
[Table 1 appears as an image in the original publication: ImageNet pre-training results of the directional convolutions and the baseline.]
And S03, extracting the full convolution part from step S02 for subsequent training; the other parameters are randomly initialized.
And S04, training the model. During training, when the images are too large, convolution strides are used to reduce them to 1/16 resolution; during prediction, the strides can be removed to raise the prediction resolution and produce better results. This difference arises because multiple pictures must be batched together during training.
And S05, performing semantic segmentation tasks by using the trained model.
To demonstrate the effectiveness of the method of the invention, tests were performed on the Cityscapes data set. The base model is ResNet-101, and the semantic segmentation method adopts the DeepLab v3/v3+ framework. The semantic segmentation task is evaluated with the mean IoU over 21 classes of Cityscapes, and the results are shown in Table 2.
TABLE 2
[Table 2 appears as an image in the original publication: Cityscapes mean IoU and parameter counts for the four directional convolutions and the baseline.]
The results show that all four directional convolutions effectively improve the segmentation quality. Their parameter counts are also listed; DirConv-SI and DirConv-SO achieve better results with fewer parameters.
Multiple deformations of the image are further used to help the neural network make a joint prediction. Since a single input can produce unstable predictions, this embodiment combines flipped and multi-scale pictures with the stride-adjusted network to predict the final result; the results are shown in Table 3.
TABLE 3
[Table 3 appears as an image in the original publication: results of combining the directional convolutions with the OS8, MS and Flip settings.]
As shown in the table above, OS8 denotes removing the convolution stride at 2 to 3 positions so that the output stride becomes 8; MS denotes averaging the predictions over three scaled inputs with ratios [0.75, 1, 1.25]; Flip denotes additionally using the horizontally flipped image. The directional convolutions obtain consistent gains in combination with these methods. The above experiments are based on the DeepLab v3+ model.
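The MS and Flip settings amount to test-time augmentation: run the network on several rescaled inputs and on the flipped image, map each prediction back to the input resolution, and average. The sketch below illustrates the idea with our own function names and a nearest-neighbour resize for brevity, where the embodiment would use a proper rescaling.

```python
import numpy as np

def tta_predict(model, image, scales=(0.75, 1.0, 1.25), use_flip=True):
    """Average per-class probability maps over multi-scale and flipped
    inputs (the MS and Flip settings of Table 3). `model` maps a (C, H, W)
    image to a (K, H, W) array of class probabilities."""
    c, h, w = image.shape

    def resize_to(img, H, W):
        # nearest-neighbour resize, kept deliberately simple
        ys = np.minimum(np.arange(H) * img.shape[1] // H, img.shape[1] - 1)
        xs = np.minimum(np.arange(W) * img.shape[2] // W, img.shape[2] - 1)
        return img[:, ys][:, :, xs]

    preds = []
    for s in scales:
        scaled = resize_to(image, max(1, round(h * s)), max(1, round(w * s)))
        variants = [(scaled, False)]
        if use_flip:
            variants.append((scaled[:, :, ::-1], True))
        for v, flipped in variants:
            p = model(v)
            if flipped:
                p = p[:, :, ::-1]             # map the prediction back
            preds.append(resize_to(p, h, w))  # back to input resolution
    return np.mean(preds, axis=0)
```

Averaging in probability space, as here, is one reasonable reading of "averaging after prediction"; averaging logits before the softmax would be an equally plausible variant.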
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention. It should be understood that they are only specific embodiments and do not limit the invention; any modifications, additions and equivalents made within the scope of the principles of the present invention shall fall within the scope of protection of the invention.

Claims (6)

1. A semantic segmentation method based on a directional convolutional network is characterized by comprising the following steps:
(1) constructing a full convolution network of directional convolution;
(2) adding a pooling layer and a fully connected layer network on top of the constructed directional full convolution network to form a first depth model, and pre-training it on a large data set;
(3) extracting the full convolution part of the pre-trained first depth model, initializing the parameters of the directional full convolution network with those full convolution layers, and adding a new fully connected layer to form a second depth model;
(4) training the second depth model using a semantic segmentation data set until the model converges;
(5) analyzing the picture to be examined with the trained second depth model, predicting the category of each pixel in the picture, and forming and outputting the semantic segmentation map of the picture.
2. The method for semantic segmentation based on the directional convolutional network as claimed in claim 1, wherein in step (1), the definition of the directional convolution is as follows:
y_co = Σ_{ci=1..C_i} Σ_{s ∈ S} w_{s,ci} · x_{s,ci} + b_co
wherein y_co is the co-th feature of the output; ci is the index of the input features, of which there are C_i in total; S is the set of pixel offsets sampled during the convolution; and w_{s,ci}, x_{s,ci} and b_co denote, respectively, the weight, the input and the bias of the linear operation; the offset set S is selected from the following dynamic sets:
M_k = {(s1, s2) | (s1 - e1)² + (s2 - e2)² ≤ 2²; (e1, e2) = 2·(cos(2πk/16), sin(2πk/16)); s1, s2 ∈ [-2, 2]; s1, s2 ∈ ℤ} ∪ {(0, 0)}
wherein k takes integer values from 0 to 15, representing 16 different directions; the rule for choosing S is:
S=M(ci%16)
where ci is the index of the input channel; the modulo-16 operation sorts the channels into 16 different groups, one per direction.
3. The method for semantic segmentation based on the directional convolutional network as claimed in claim 1, wherein in step (2), the large-scale data set is a large-scale image recognition data set ImageNet.
4. The semantic segmentation method based on the directional convolutional network as claimed in claim 3, wherein the step (2) comprises the following steps:
(2-1) adding an image pooling layer on top of the full convolution network so that the three-dimensional feature map is reduced to a feature vector, and then transforming the feature vector with a fully connected network into a 1000-dimensional vector, corresponding to the 1000 image categories of ImageNet;
(2-2) training the constructed first depth model on GPUs, with each GPU processing 32 images at a time and 8 GPUs training in parallel;
(2-3) using the SGDM optimization algorithm with an initial learning rate of 0.256; after every 30 epochs the learning rate is reduced to 10% of its value, for 90 epochs of training in total, with the momentum parameter set to 0.9, until the model converges.
5. The method for semantic segmentation based on the directional convolutional network as claimed in claim 1, wherein in step (3), the newly added full-link layer is initialized randomly with gaussian distribution.
6. The semantic segmentation method based on the directional convolutional network as claimed in claim 1, wherein the specific process of step (4) is as follows:
(4-1) inputting the pictures of the training set into a second depth model, and generating a feature map after calculation;
(4-2) replacing the last strided convolution in the network with a non-strided convolution, and setting the dilation rate of all subsequent convolutions to 2;
(4-3) amplifying the feature map to the size of the original image by adopting bilinear interpolation;
(4-4) sending the generated features into a softmax function to obtain the probability distribution of the prediction, computing the gradients of the network parameters with a cross-entropy loss function, and updating the parameter values with the SGDM (stochastic gradient descent with momentum) optimization algorithm; the initial learning rate is set to 10^-3;
(4-5) repeating the above steps until the model converges.
CN202010669134.8A 2020-07-13 2020-07-13 Semantic segmentation method based on directional full convolution network Active CN111882563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010669134.8A CN111882563B (en) 2020-07-13 2020-07-13 Semantic segmentation method based on directional full convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010669134.8A CN111882563B (en) 2020-07-13 2020-07-13 Semantic segmentation method based on directional full convolution network

Publications (2)

Publication Number Publication Date
CN111882563A true CN111882563A (en) 2020-11-03
CN111882563B CN111882563B (en) 2022-05-27

Family

ID=73151747

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010669134.8A Active CN111882563B (en) 2020-07-13 2020-07-13 Semantic segmentation method based on directional full convolution network

Country Status (1)

Country Link
CN (1) CN111882563B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107564025A (en) * 2017-08-09 2018-01-09 浙江大学 A kind of power equipment infrared image semantic segmentation method based on deep neural network
CN108564587A (en) * 2018-03-07 2018-09-21 浙江大学 A kind of a wide range of remote sensing image semantic segmentation method based on full convolutional neural networks
CN110443805A (en) * 2019-07-09 2019-11-12 浙江大学 A kind of semantic segmentation method spent closely based on pixel
CN110826596A (en) * 2019-10-09 2020-02-21 天津大学 Semantic segmentation method based on multi-scale deformable convolution

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107564025A (en) * 2017-08-09 2018-01-09 浙江大学 A kind of power equipment infrared image semantic segmentation method based on deep neural network
CN108564587A (en) * 2018-03-07 2018-09-21 浙江大学 A kind of a wide range of remote sensing image semantic segmentation method based on full convolutional neural networks
CN110443805A (en) * 2019-07-09 2019-11-12 浙江大学 A kind of semantic segmentation method spent closely based on pixel
CN110826596A (en) * 2019-10-09 2020-02-21 天津大学 Semantic segmentation method based on multi-scale deformable convolution

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JONATHAN LONG et al.: "Fully Convolutional Networks for Semantic Segmentation", OpenAccess *
KAI ZHU et al.: "One-Shot Texture Retrieval with Global Context Metric", arXiv:1905.06656v2 *
SEBASTIAN SABOGAL et al.: "ReCoN: A Reconfigurable CNN Acceleration Framework for Hybrid Semantic Segmentation on Hybrid SoCs for Space Applications", 2019 IEEE Space Computing Conference (SCC) *
LIN Yun et al.: "Liveness detection algorithm based on semantic segmentation", Journal of Jilin University (Engineering and Technology Edition) *

Also Published As

Publication number Publication date
CN111882563B (en) 2022-05-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant