CN114494701A - Semantic segmentation method and device based on graph structure neural network - Google Patents


Info

Publication number
CN114494701A
CN114494701A (application number CN202210134177.5A)
Authority
CN
China
Prior art keywords
semantic
network
segmentation
feature map
segmentation result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210134177.5A
Other languages
Chinese (zh)
Inventor
胡浩基
白健弘
王化良
龙永文
欧阳涛
黄源甲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Foshan Shunde Midea Electrical Heating Appliances Manufacturing Co Ltd
Original Assignee
Zhejiang University ZJU
Foshan Shunde Midea Electrical Heating Appliances Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Foshan Shunde Midea Electrical Heating Appliances Manufacturing Co Ltd filed Critical Zhejiang University ZJU
Priority to CN202210134177.5A priority Critical patent/CN114494701A/en
Publication of CN114494701A publication Critical patent/CN114494701A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic segmentation method and device based on a graph-structured neural network. A class semantic enhancement module (CSE) is proposed that uses a graph model to build a graph structure over feature-map channels and outputs a "channel"-"object" relationship matrix used to reconstruct the feature map. In addition, a fully convolutional network layer fusing object prior information is built on top of the class semantic enhancement module; it generates a fine segmentation result from the coarse segmentation and the feature map. From these two modules, a class semantic enhancement network (CSENet) is constructed that captures the interdependencies among channels and the relationships between "channels" and "objects". Using a coarse-to-fine segmentation strategy, CSENet and CPFC layers are stacked in sequence so that the coarse segmentation result is gradually refined, and the fine segmentation result is output as the final network output. Experiments show that the method effectively improves the performance of existing semantic segmentation networks.

Description

Semantic segmentation method and device based on graph structure neural network
Technical Field
The invention relates to the fields of deep learning, semantic segmentation and the like, in particular to a semantic segmentation method and a semantic segmentation device based on a graph structure neural network.
Background
Semantic segmentation is a challenging basic task in computer vision, aimed at understanding and segmenting scenes. In the real world, objects in a scene are not independent, but interact to form a complex scene. Accurately capturing the interdependencies between objects helps to understand scene semantics, thus completing pixel-level segmentation in a scene.
Since the advent of the fully convolutional network (FCN)[1], FCN-based approaches have been the dominant solution. Recent efforts employing multi-scale strategies have successfully exploited object context information. The DeepLab series[2][3][4] continuously explores convolution modules with different dilation rates to enlarge the receptive field and enhance object context features. PSPNet[5] learns a global context using global average pooling. As extensions of the above methods, some works[6][7][8][9][10] obtain a wider range of semantic information by aggregating feature maps at multiple scales. Although the multi-scale approach broadens the receptive field, it causes partial loss of local information, and the correlation between objects is neglected.
"relationship" based methods have performed well in recent years because they are not limited by the scope of the receptive field. They can be divided into two groups depending on the relationship dimension employed. One is pixel relation for studying imagesThe interaction between elements. The other is a regional relationship, which aims to study the characteristics of a specific region composed of certain pixels. Adaptive regions, a priori spatial distribution of objects, and per feature mapping of channels are common definitions of regions. For methods based on pixel relationships, DANet[11]The correlation between pixels is studied and a coherent characterization of similar pixels is given. ACFNet[12]The higher order relationships between pixels were further investigated with the aim of tracking the interdependencies between objects. Although Intersectssa[13]A factorized pixel-level attention is proposed to reduce computational cost, but establishing correlations between pixels still significantly increases the complexity of the model.
Although region-relationship-based methods can reduce model complexity, their performance depends largely on the quality of region partitioning because of the complexity and variability of image scenes and semantics. Adaptive region partitioning does not perform well because it lacks strongly discriminative information about the target region. The attention module of DANet treats channels as characteristic representations of objects and enhances each channel's representation by a weighted sum over channels; because a channel's object-region characteristics are implicit and not interpretable, performance remains poor. ACFNet directly uses the coarse segmentation as a prior spatial distribution to enhance the features of each pixel. However, each channel describes the object from various aspects (local or global) and is not limited to the object's spatial distribution, which leads to inaccurate feature optimization.
[1]. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3431-3440.
[2]. Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 40(4): 834-848.
[3]. Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 801-818.
[4]. Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv preprint arXiv:1706.05587, 2017.
[5]. Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 2881-2890.
[6]. Zhang H, Dana K, Shi J, et al. Context encoding for semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7151-7160.
[7]. Yang M, Yu K, Zhang C, et al. DenseASPP for semantic segmentation in street scenes[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 3684-3692.
[8]. He J, Deng Z, Zhou L, et al. Adaptive pyramid context network for semantic segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 7519-7528.
[9]. Lin D, Shen D, Shen S, et al. ZigZagNet: Fusing top-down and bottom-up context for object segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 7490-7499.
[10]. Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 3146-3154.
[11]. Zhang H, Zhang H, Wang C, et al. Co-occurrent features in semantic segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 548-557.
[12]. Huang L, Yuan Y, Guo J, et al. Interlaced sparse self-attention for semantic segmentation[J]. arXiv preprint arXiv:1907.12273, 2019.
[13]. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
Disclosure of Invention
The method is mainly applied to the semantic segmentation problem in a scene, i.e., assigning the correct class label to each pixel of the input picture. Semantic segmentation has very wide industrial applications, including medical image segmentation, autonomous driving, face recognition, and geological survey.
Aiming at the defects of the prior art and solving the problems of inaccurate characteristic representation and higher complexity of the prior art, the invention provides a semantic segmentation method and a semantic segmentation device based on a graph structure neural network.
Firstly, a class semantic enhancement module (CSE) is constructed using a graph attention mechanism; the module creates graph structures among channels and outputs a more accurate "channel"-"object" similarity matrix to reconstruct distinguishable class semantic representations. Secondly, a fully convolutional network layer containing object prior information (CPFC) is constructed on top of the class semantic enhancement module; the CPFC generates a more refined segmentation result from the coarse segmentation result and the reconstructed feature map. These two modules together form a class semantic enhancement network (CSENet), which captures the interdependence between different dimensions of the feature map and the prior information between channels and objects, and thereby generates the final segmentation result.
The purpose of the invention can be realized by the following technical method: a semantic segmentation method of a neural network based on a graph structure comprises the following steps:
(1) acquiring an image to be segmented, and inputting the image into a residual error network for feature extraction; generating a primary segmentation result from the extracted feature map through a convolution network;
(2) inputting the preliminary segmentation result into a plurality of sequentially connected class semantic enhancement networks to gradually refine the segmentation result, wherein each class semantic enhancement network comprises a class semantic enhancement module and a fully convolutional network layer fusing object prior information; the class semantic enhancement module takes the generated feature map and the segmentation result of the previous class semantic enhancement network as input, and outputs a new feature map and a joint probability density matrix to the fully convolutional network layer fusing object prior information, which then outputs the segmentation result;
(3) restoring the segmentation result refined by the last class semantic enhancement network to the original resolution of the input image, and taking the class with the highest confidence as the final class of each pixel to obtain the semantically segmented image.
Further, the class semantic enhancement module in the class semantic enhancement network converts the feature map and the segmentation result of the previous class semantic enhancement network into matrices and multiplies them to obtain an object-channel relation matrix. Linear mappings with k sets of learnable parameters are applied to the relation matrix, and the cosine similarity between every pair of mapped dimension vectors is computed to form k adjacency matrices A_k (k = 1, 2, 3). The element A_{i,j} of A_k represents the degree of correlation between feature-map channel i and feature-map channel j; the larger the value, the more closely the two channels are correlated. A graph neural network is used to interact and aggregate semantic information over closely associated dimensions, generating a joint probability density matrix. The relation matrix and the joint probability density matrix are multiplied element by element and then multiplied by the matrix converted from the segmentation result, giving a reconstructed feature map; this is fused with the feature map generated by the previous class semantic enhancement network and output.
Further, the joint probability density matrix is defined as P ∈ R^{C×N}, where R^{C×N} is the C×N-dimensional real vector space, and is generated as follows:

P = σ(‖_k A_k R ψ_k)

where ψ_k ∈ R^{N×N} is the learnable parameter matrix of the k-th graph structure; σ denotes the sigmoid function; the element P_{ij} of the matrix P represents the probability that the i-th channel F_i of the feature map is related to the object with label number j.
Further, the fully convolutional network layer fusing object prior information processes the feature map and the joint probability density matrix as follows:

X_i = Σ_{j=1}^{C} (p_{ji} · w_{ij}) f_j + b_i

where X_i is the i-th channel of the fine segmentation result, C is the number of feature-map channels, p_{ji} is the element in row j and column i of the joint probability density matrix P, w_{ij} is the element in row i and column j of the learnable convolution kernel W, f_j is the j-th channel of the feature map, and b_i is the i-th learnable bias term.
Further, the class semantic enhancement network uses a cross-entropy function to compute the losses of the class semantic enhancement module and of the fully convolutional network layer fusing object prior information respectively. The final loss is the weighted sum of the two losses, and the proportion of each in the final loss is controlled by an exponential decay factor.
Further, the process of restoring the refined segmentation result to the original resolution of the input image is realized by bilinear interpolation.
In a second aspect, the present invention also provides an apparatus comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method for semantic segmentation of graph structure based neural networks.
In a third aspect, the present invention also provides a computer-readable storage medium for storing one or more computer programs, the one or more computer programs comprising program code for performing the above-mentioned method for semantic segmentation of a graph structure based neural network, when the computer program runs on a computer.
The invention has the beneficial effects that:
(1) The class semantic enhancement module models the correlation between channels and objects and among channels themselves, and learns class semantic information that accounts for both intra-class consistency and inter-class separability through the organic combination of the two correlations.
(2) A fully convolutional network layer fusing object prior information is proposed; it applies the object prior information to the convolution kernel parameters, an operation that effectively improves the accuracy of the segmentation result.
(3) A coarse-to-fine segmentation strategy is used: CSENet and CPFC layers are stacked in sequence so that the coarse segmentation result is gradually refined, and the fine segmentation result is output as the final network output. Experiments show that the method effectively improves the performance of existing semantic segmentation networks.
(4) The proposed network has few parameters and can be applied to most deep-neural-network-based segmentation methods, yielding significant performance improvements.
Drawings
Fig. 1 is an overall framework structure of the deep neural network of the present invention.
FIG. 2 is a diagram of a class semantic enhancement module (CSE) according to the present invention.
Fig. 3 is a schematic diagram of a full convolution network layer (CPFC) for fusing object prior information according to the present invention.
Fig. 4 is an example of an image to be segmented input by the present invention.
Fig. 5 is an example of a coarse segmentation result generated by the present invention.
FIG. 6 is an example of the final output of the present invention.
FIG. 7 is a block diagram of a semantic segmentation apparatus based on a graph-structured neural network according to the present invention.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention provides a graph-structure-based neural network for the semantic segmentation task; the specific steps are as follows:
1. problem description and variable definition
For a semantic segmentation problem with N classes in total, the goal is to assign the correct class label to each pixel of the input image. The standard deep-neural-network-based method proceeds as follows. For a given input image I ∈ R^{3×H×W}, where H and W are the image height and width, the picture I is first fed into a backbone network (e.g., a residual network) to generate a feature map F ∈ R^{C×h×w}, where C is the number of feature-map channels and h and w are the height and width after downsampling. A convolution kernel ω ∈ R^{N×C×1×1} is then convolved with the feature map to generate X_0 ∈ R^{N×h×w}. Finally, an interpolation function and the argmax function are applied to X_0 in turn to output the segmentation result Y_0 ∈ R^{H×W}.
Taking the picture to be segmented in FIG. 4 as an example, the picture contains "targets" of multiple categories, for example pedestrians, roads, and signs. The goal of the semantic segmentation task is to assign the correct label to each type of "target" in the image. The present invention aims to further process an inaccurate segmentation result (the "coarse segmentation") to obtain a more accurate one (the "fine segmentation"). FIG. 5 is an example of a coarse segmentation result: the network segments the sidewalk in the dashed box at the lower left of the picture inaccurately, and misclassifies the sign in the solid box at the upper middle of the picture. FIG. 6 shows the "fine segmentation" result output by the present invention: the "sidewalk" at the lower left is segmented more accurately and the misclassification of the "sign" is eliminated, which improves the accuracy of the neural network on the segmentation task to a certain extent.
As shown in FIG. 1, the present invention proposes a coarse-to-fine segmentation network called the class semantic enhancement network (CSENet), which can be flexibly inserted after a standard segmentation model to gradually refine the feature map F and the coarse segmentation result X_0 and generate more accurate results.
Specifically, as shown in FIG. 2, the proposed class semantic enhancement network is composed of a class semantic enhancement module (CSE) and a fully convolutional network layer fusing object prior information (CPFC), and the segmentation result is gradually refined by stacking n such networks in sequence. The CSE module reconstructs the feature map using high-order information among channels; the CPFC layer combines the class-channel relationship with the reconstructed feature map to obtain fine-grained segmentation results {X_i ∈ R^{N×h×w} | i = 1, 2, …, n}, where X_i denotes the segmentation result of the i-th unit.
2. Feature extraction network
Most deep-neural-network-based semantic segmentation methods feed the image to be segmented into a backbone network to extract picture features, outputting a high-dimensional feature map carrying semantic information for subsequent processing. The invention uses the common and high-performing residual network (ResNet) [He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778] as the feature extraction network (i.e., the backbone network). The residual network is a convolutional neural network consisting mainly of convolutional and nonlinear layers. It introduces a residual structure into the classical convolutional neural network, which to a certain extent alleviates vanishing gradients, exploding gradients, and network degradation, improving on the traditional convolutional neural network. The residual structure can be described by the following mathematical expression:
H(X)=F(X)+X
where X is the output of the previous layer, F is the mapping the current layer applies to its input, and H is the final output mapping of the layer.
From the above formula, each layer's output in a residual network is the sum of the output mapped by the network parameters and the original input; that is, the network parameters only need to fit the "residual" between the input and the ideal output.
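The identity-shortcut idea above can be illustrated with a minimal numpy sketch; the single linear map standing in for F(X) is an assumption for illustration, not the actual ResNet block:

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, weight):
    """One residual unit: H(X) = F(X) + X, where F is the learned mapping.
    Here F is illustrated by a single linear map followed by ReLU."""
    fx = np.maximum(weight @ x, 0.0)  # F(X): the "residual" the layer must fit
    return fx + x                     # identity shortcut

x = rng.standard_normal(8)
w = rng.standard_normal((8, 8)) * 0.1
h = residual_block(x, w)

# If the ideal output equals the input, F only needs to fit zero:
h_identity = residual_block(x, np.zeros((8, 8)))
assert np.allclose(h_identity, x)
```

When the weights are zero, the block reduces exactly to the identity, which is why residual networks degrade gracefully as depth grows.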
3. Description of the overall construction
The overall structure of the proposed class semantic enhancement network is first introduced, together with its relationship to the backbone network and other modules. As shown in FIG. 1, the overall flow of the proposed graph-structure-based neural network for semantic segmentation can be described by the following mathematical formulas:
F_0 = Backbone(I)
X_0 = FC(F_0)
F_i, X_i = CSENet_i(F_{i-1}, X_{i-1}),  i = 1, 2, …, n
Y_n = Argmax(Bilinear(X_n))
where I is the input image to be segmented; Backbone denotes the backbone network described above; F_0 is the feature map output by the backbone; FC is the convolutional network layer of the FCN that generates the segmentation result X_0; CSENet_i is the i-th class semantic enhancement network; F_i and X_i are the feature map and segmentation result output by the i-th CSENet; X_n is the segmentation result output after the n-th iteration; Bilinear denotes the bilinear interpolation operation; Argmax takes the maximum over the class dimension; and Y_n is the final segmentation result output by the network.
From the above formulas, the proposed class semantic enhancement network takes the feature map and the coarse segmentation result as input, and outputs a feature map with clearer class semantic information together with a more refined segmentation result. Specifically, the network first obtains the initial feature map F_0 and the preliminary segmentation result X_0 using the backbone network and the convolutional layer of the FCN. These are then fed into a sequential network consisting of n CSENets, which outputs the segmentation result X_n after n iterations. Finally, bilinear interpolation restores the result to the original resolution of the input image, the class with the highest confidence is taken as the final class of each pixel, and the output is recorded as Y_n.
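The overall coarse-to-fine flow can be sketched in numpy with stand-in functions; `backbone`, `fc_head`, and `csenet` here are hypothetical stubs (the real ones are ResNet, the FCN head, and the CSE+CPFC unit described in later sections), so only the data flow is meaningful:

```python
import numpy as np

C, N, h, w, n = 16, 4, 8, 8, 3
rng = np.random.default_rng(1)

def backbone(image):
    """Stand-in for ResNet: image -> feature map F0 of shape (C, h, w)."""
    return rng.standard_normal((C, h, w))

def fc_head(F):
    """Stand-in for the FCN head: 1x1 convolution producing coarse logits X0."""
    W = rng.standard_normal((N, C)) * 0.1
    return np.einsum('nc,chw->nhw', W, F)

def csenet(F, X):
    """Stand-in for one CSENet unit; the real unit applies CSE then CPFC."""
    return F, X

F = backbone(None)
X = fc_head(F)
for _ in range(n):          # n stacked CSENet units refine (F, X) step by step
    F, X = csenet(F, X)
Y = X.argmax(axis=0)        # per-pixel class (bilinear upsampling omitted here)
assert Y.shape == (h, w) and Y.max() < N
```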
The class semantic enhancement network proposed by the invention consists of two modules: the class semantic enhancement module and the fully convolutional network layer fusing object prior information. The inputs and outputs of the two modules can be described by the following formulas:
F_i, P_i = CSE_i(F_{i-1}, X_{i-1})
X_i = CPFC_i(F_i, P_i)
where CSE_i and CPFC_i are, respectively, the i-th class semantic enhancement module and the i-th fully convolutional network layer fusing object prior information.
As can be seen from the above formulas, the CSE module takes the feature map and the coarse segmentation result generated in the previous iteration (i.e., the (i-1)-th iteration) as input, and outputs a new feature map F_i together with a joint probability density matrix P_i; the CPFC layer takes F_i and P_i as input and outputs the segmentation result X_i after the i-th iteration.
4. CSE module
(1) Computing correlations between categories and feature maps
The invention defines the correlation between classes and feature-map channels as follows:

R_{i,j} = F_i X_j^T / Σ_m X_j^{(m)}

where F_i denotes the feature map in the i-th dimension (channel), X_j denotes the j-th coarse segmentation map, and T denotes the transpose operation; Σ_m X_j^{(m)} denotes the sum of all elements of the j-th coarse segmentation map; and R_{i,j} is the average response value of the j-th class on F_i. A higher R_{i,j} indicates that the i-th dimension and the j-th class are semantically more related, and vice versa.
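With the feature map and coarse score maps flattened to matrices, the whole relation matrix R can be computed in one product; this is a numpy sketch with illustrative shapes (C = 16 channels, N = 4 classes, h·w = 64 pixels):

```python
import numpy as np

C, N, hw = 16, 4, 64
rng = np.random.default_rng(2)
F = rng.random((C, hw))          # feature map, flattened to C x (h*w)
X = rng.random((N, hw))          # coarse class score maps, N x (h*w)

# R[i, j] = F_i . X_j^T / sum(X_j): average response of class j on channel i
R = (F @ X.T) / X.sum(axis=1)    # shape (C, N); division broadcasts per class
assert R.shape == (C, N)
```

A single entry agrees with the scalar definition: `R[0, 0] == F[0] @ X[0] / X[0].sum()`.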
(2) Learning a joint probability density matrix P
First, the invention builds K independent graph structures to represent the interdependence between different feature dimensions. Their adjacency matrices are defined as follows:

A_{i,j}^k = ⟨φ_k(R_i), φ_k(R_j)⟩ / (‖φ_k(R_i)‖ ‖φ_k(R_j)‖)

where ⟨·,·⟩ denotes the inner product, φ_k is a learnable projection function, R_i is the vector describing the strength of the relationship between the i-th feature-map dimension and the objects of each class label, and A_{i,j}^k is the edge weight between the i-th and j-th feature dimensions. A higher A_{i,j}^k indicates that F_i and F_j are more likely to contain the same class.
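A numpy sketch of the adjacency construction, assuming the cosine-similarity form stated in the disclosure and linear projections for φ_k (the random matrices here merely stand in for learned parameters):

```python
import numpy as np

C, N, K, d = 16, 4, 3, 8
rng = np.random.default_rng(3)
R = rng.random((C, N))                                   # channel-object relation matrix
phis = [rng.standard_normal((N, d)) for _ in range(K)]   # K projections (random stand-ins)

def adjacency(R, phi):
    """Pairwise cosine similarity of projected channel relation vectors."""
    Z = R @ phi                                          # project each channel's vector
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)    # unit-normalize rows
    return Zn @ Zn.T                                     # (C, C) edge-weight matrix

As = [adjacency(R, phi) for phi in phis]
assert np.allclose(np.diag(As[0]), 1.0)  # each channel is maximally similar to itself
```

Each A_k is symmetric with unit diagonal, as expected of a cosine-similarity graph.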
Secondly, a standard multi-head graph model is used to interact and aggregate class semantics over the mutually dependent dimensions, finally generating the joint probability density matrix, defined as P ∈ R^{C×N}, with mathematical expression:

P = σ(‖_k A_k R ψ_k)

where A_k is the k-th adjacency matrix between feature-map channels, R is the channel-object relationship matrix, ψ_k ∈ R^{N×N} is the learnable parameter matrix of the k-th graph structure, and σ denotes the sigmoid function. The (i, j)-th element P_{ij} of the matrix P represents the probability that F_i is related to class j.
(3) Feature reconstruction
The joint probability density matrix P is used to adjust R and further reconstruct the feature map:

F̃ = (R ⊙ P) X

where ⊙ denotes element-wise multiplication. Subsequently, a convolution kernel of dimension 2C × 1 × 1 fuses the feature map F generated by the residual network with F̃ and outputs the new feature map F′.
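The reconstruction and fusion steps can be sketched in numpy with flattened maps; the 2C×1×1 convolution acts as an independent linear map at every pixel, so it is shown as a matrix product (all parameter matrices below are random stand-ins):

```python
import numpy as np

C, N, hw = 16, 4, 64
rng = np.random.default_rng(5)
R = rng.random((C, N))        # channel-object relation matrix
P = rng.random((C, N))        # joint probability density matrix
X = rng.random((N, hw))       # coarse segmentation maps, flattened
F = rng.random((C, hw))       # feature map from the previous unit, flattened

F_rec = (R * P) @ X           # reconstructed feature map F~, shape (C, hw)

# Fusion: concatenate [F; F~] channel-wise and apply a 2C x 1 x 1 kernel,
# i.e. a (C, 2C) linear map applied at each spatial position
W_fuse = rng.standard_normal((C, 2 * C)) * 0.1
F_new = W_fuse @ np.concatenate([F, F_rec], axis=0)
assert F_new.shape == (C, hw)
```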
5. CPFC layer
As shown in FIG. 3, the CPFC layer takes the joint probability density matrix P output by the CSE module and the reconstructed feature map F′ as input, multiplies the learnable convolution kernel element-wise by P, convolves the result with F′, and outputs the fine segmentation result. The mathematical expression of this process is:

X_i = Σ_{j=1}^{C} (p_{ji} · w_{ij}) f_j + b_i

where p_{ji} is the element in row j and column i of the joint probability density matrix P, w_{ij} is the element in row i and column j of the learnable convolution kernel W, f_j is the j-th channel of the feature map F′, and b_i is the i-th learnable bias term.
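Because the kernel is 1×1, modulating it by P and convolving reduces to a matrix product over flattened pixels; a numpy sketch with random stand-in parameters:

```python
import numpy as np

C, N, hw = 16, 4, 64
rng = np.random.default_rng(6)
P = rng.random((C, N))                  # joint probability density matrix
W = rng.standard_normal((N, C)) * 0.1   # learnable 1x1 conv kernel, class x channel
b = np.zeros(N)                         # learnable bias terms
F_new = rng.random((C, hw))             # reconstructed feature map F'

# X_i = sum_j (p_ji * w_ij) * f_j + b_i, for every output class i at once:
X_fine = (W * P.T) @ F_new + b[:, None]
assert X_fine.shape == (N, hw)
```

Each kernel weight w_{ij} is rescaled by the channel-class probability p_{ji} before the convolution, which is how the object prior enters the convolution parameters.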
6. Loss function
In designing the loss function, the invention uses the cross-entropy function to compute the coarse and fine segmentation losses L_c and L_f respectively. The mathematical expression of the cross entropy is:

H(p, q) = −Σ_x p(x) log q(x)

where x is the network input, p(x) is the desired output, q(x) is the actual output of the network, and H(p, q) is the cross entropy.
The final loss L can be expressed by the following equations:

L = (a + γ) L_c + (b − γ + 0.1) L_f
γ = b · e^{−iter / iter_γ}  (for iter < iter_γ; γ = 0 otherwise)

where L_c and L_f are the coarse and fine segmentation losses computed with the cross-entropy loss above; L is the loss finally applied by the network; a and b are constants, with a = 0.4 and b = 0.9; iter is the current iteration number of the class semantic enhancement network; and γ is an exponential decay factor that, while the iteration number is below the threshold iter_γ, decays exponentially from b toward 0. This design makes the network focus more on the coarse segmentation result early in training and more on the fine segmentation result later.
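The weighting schedule L = (a + γ)L_c + (b − γ + 0.1)L_f can be made concrete with a short numpy sketch; the exact decay form of γ is not fully legible in this text, so an exponential decay b·e^{−iter/iter_γ} with an assumed threshold iter_γ = 1000 is used purely for illustration:

```python
import numpy as np

a, b = 0.4, 0.9          # constants from the text
iter_gamma = 1000        # threshold iteration count (assumed value)

def gamma(it):
    """One plausible decay: exponential from b toward 0, clamped after the threshold."""
    return b * np.exp(-it / iter_gamma) if it < iter_gamma else 0.0

def total_loss(L_c, L_f, it):
    g = gamma(it)
    return (a + g) * L_c + (b - g + 0.1) * L_f

# Early in training the coarse loss dominates; later the fine loss does.
assert total_loss(1.0, 1.0, 0) == (a + b) + 0.1
assert gamma(0) > gamma(500) > gamma(999) > gamma(1000) == 0.0
```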
7. Outputting the segmentation result
After n iterations, X_n is obtained; bilinear interpolation and a maximum operation over the class dimension are then applied to this segmentation result to obtain the final segmentation result Y_n.
Given the points (x_1, y_1), (x_1, y_2), (x_2, y_1), (x_2, y_2) and their values q_11, q_12, q_21, q_22, the value q at the point (x, y) to be interpolated is given by:

q = [ q_11 (x_2 − x)(y_2 − y) + q_21 (x − x_1)(y_2 − y) + q_12 (x_2 − x)(y − y_1) + q_22 (x − x_1)(y − y_1) ] / [ (x_2 − x_1)(y_2 − y_1) ]
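The interpolation rule above can be checked with a direct transcription (q_ab denotes the value at the corner (x_a, y_b)):

```python
def bilinear(x, y, x1, y1, x2, y2, q11, q12, q21, q22):
    """Bilinear interpolation of the value at (x, y) from the four
    corner values q11 = q(x1, y1), q12 = q(x1, y2),
    q21 = q(x2, y1), q22 = q(x2, y2)."""
    return (q11 * (x2 - x) * (y2 - y)
            + q21 * (x - x1) * (y2 - y)
            + q12 * (x2 - x) * (y - y1)
            + q22 * (x - x1) * (y - y1)) / ((x2 - x1) * (y2 - y1))
```

At a corner the formula returns that corner's value exactly, and at the centre of the cell it returns the average-weighted blend of all four.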
According to this rule, the invention upsamples X_n, of dimension N × hw, to a matrix of dimension N × HW, where H and W are the image height and width values respectively, N is the total number of classes, and l indexes the classes. Finally, for the pixel in row i and column j of the output after the nth iteration, the corresponding N × 1 dimensional vector

x_ij = (x_ij^1, x_ij^2, ..., x_ij^N)^T

is taken, and the class maximizing it is used as the final class of this pixel, namely:

Y_n(i, j) = argmax_l x_ij^l

The obtained Y_n has dimension H × W, and Y_n(i, j) represents the class label of the pixel in row i and column j.
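A minimal sketch of this final output step, with nearest-neighbour resizing standing in for the bilinear upsampling (an assumption made only to keep the code short; the argmax over the class dimension is as described):

```python
import numpy as np

def final_segmentation(X, H, W):
    """X: (N, h*w) class scores at low resolution; returns an (H, W)
    label map. A square low-resolution map is assumed, and
    nearest-neighbour resizing replaces the bilinear upsampling of
    the text for brevity."""
    N, hw = X.shape
    h = w = int(hw ** 0.5)                    # assume a square h x w map
    maps = X.reshape(N, h, w)
    ri = np.arange(H) * h // H                # nearest source row per output row
    ci = np.arange(W) * w // W                # nearest source col per output col
    up = maps[:, ri[:, None], ci[None, :]]    # upsampled scores, (N, H, W)
    return up.argmax(axis=0)                  # class label per pixel, (H, W)
```

Replacing the indexing step with true bilinear interpolation (e.g. the `bilinear` rule above applied per class map) recovers the exact procedure of the text.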
An embodiment of the invention for the autonomous driving task is as follows:
(1) preparation work
First, the dataset required by the experiment, Cityscapes, needs to be prepared. It contains 5000 images of driving scenes in urban environments that are correctly annotated, i.e. pedestrians, roads, signs, and so on are distinguished, so it can be applied effectively to the autonomous driving task.
Next, the backbone network pre-training parameters are loaded (download link: https://download.pytorch.org/models/resnet50-19c8e357.pth).
(2) Setting the hyper-parameters, which mainly comprises the following hyper-parameters:
Initial learning rate: 0.009
Dilation rate ("void fraction" in the translation): 8
a: 0.4
b: 0.9
epoch: 120
Batch size: 16
n: 3
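The hyper-parameters of the embodiment can be collected into a configuration mapping (the key names are paraphrases of the table headers, not identifiers from the source):

```python
# Hyper-parameters from the embodiment table. "dilation_rate" renders
# the mistranslated "void fraction"; "n" is the number of refinement
# iterations through the semantic-like enhancement networks.
HPARAMS = {
    "initial_learning_rate": 0.009,
    "dilation_rate": 8,
    "a": 0.4,
    "b": 0.9,
    "epochs": 120,
    "batch_size": 16,
    "n": 3,
}
```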
(3) Select any dataset to train the network, and test the accuracy of the network after training is finished. In experiments where the backbone network was ResNet-50 and the dataset was Cityscapes, the original FCN network accuracy was 72.25%; after using the CSENet proposed by the invention, the accuracy on the Cityscapes dataset was 75.18%.
Corresponding to the embodiment of the semantic segmentation method based on the graph structure neural network, the invention also provides an embodiment of a semantic segmentation device based on the graph structure neural network.
Referring to fig. 7, an embodiment of the present invention provides a semantic segmentation apparatus based on a graph structure neural network, which includes a memory and one or more processors, where the memory stores executable codes, and when the processors execute the executable codes, the semantic segmentation apparatus is configured to implement the semantic segmentation method based on a graph structure neural network in the foregoing embodiment.
The semantic segmentation apparatus based on the graph-structured neural network of the present invention can be applied to any device with data processing capability, such as a computer. The device embodiments may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a logical device, the apparatus is formed by the processor of the device reading the corresponding computer program instructions from non-volatile memory into memory and running them. In hardware terms, fig. 7 shows a hardware structure diagram of a device with data processing capability in which the semantic segmentation apparatus based on the graph-structured neural network is located; besides the processor, memory, network interface, and non-volatile memory shown in fig. 7, the device may also include other hardware according to its actual function, which is not described again here.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium, on which a program is stored, where the program, when executed by a processor, implements the graph structure neural network-based semantic segmentation method in the foregoing embodiments.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing capability device described in any of the foregoing embodiments. The computer readable storage medium may also be any external storage device of a device with data processing capabilities, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing-capable device, and may also be used for temporarily storing data that has been output or is to be output.
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit of the invention and the scope of the appended claims.

Claims (8)

1. A semantic segmentation method based on a graph structure neural network is characterized by comprising the following steps:
(1) acquiring an image to be segmented and inputting it into a residual network for feature extraction; generating a preliminary segmentation result from the extracted feature map through a convolution network;
(2) inputting the preliminary segmentation result into a plurality of sequentially connected semantic-like enhancement networks to gradually refine the segmentation result, wherein each semantic-like enhancement network comprises a semantic-like enhancement module and a full convolution network layer fusing object prior information; the semantic-like enhancement module takes the generated feature map and the segmentation result of the previous semantic-like enhancement network as input, and outputs a new feature map and a joint probability density matrix to the full convolution network layer fusing object prior information, which in turn outputs a segmentation result;
(3) restoring the segmentation result refined by the last semantic-like enhancement network to the original resolution of the input image, and taking the class with the highest confidence as the final class of each pixel to obtain the semantically segmented image.
2. The semantic segmentation method based on the graph-structured neural network according to claim 1, wherein a semantic-like enhancement module in the semantic-like enhancement network converts the feature map and the segmentation result of the previous semantic-like enhancement network into matrices and multiplies them to obtain an object-channel relation matrix; the relation matrix is linearly mapped by k learnable parameters, and cosine similarity is computed pairwise for each mapped vector to form k adjacency matrices A_k (k = 1, 2, 3); the value of the element A_{i,j} in A_k represents the degree of correlation between feature map channel i and feature map channel j, a larger value indicating a closer correlation between the two channels; a graph neural network is used to interact and aggregate semantic information over the closely associated dimensions to generate a joint probability density matrix; and the relation matrix is multiplied element by element with the joint probability density matrix and then multiplied by the matrix converted from the segmentation result to obtain a reconstructed feature map, which is fused with the feature map generated by the previous semantic-like enhancement network and output.
3. The semantic segmentation method based on the graph-structured neural network according to claim 2, wherein the specific process of generating the joint probability density matrix is as follows: it is defined as P ∈ R^{C×N}, where R^{C×N} is the C × N dimensional real vector space, and its mathematical expression is:

P = σ(‖_k A_k ψ_k)

wherein ψ_k ∈ R^{N×N} is the learnable parameter matrix of the kth graph structure; σ denotes the sigmoid function; the element P_ij in the matrix P represents the probability that the ith channel F_i of the feature map refers to the object with label number j.
4. The semantic segmentation method based on the graph-structured neural network according to claim 1, wherein the full convolution network layer fusing the object prior information processes the feature map and the joint probability density matrix as follows:

Y_i = Σ_{j=1}^{C} (p_ji · w_ij) * f_j + b_i

wherein Y_i is the ith channel of the fine segmentation result, C is the number of feature map channels, p_ji is the element in row j, column i of the joint probability density matrix P, w_ij is the element in row i, column j of the learnable convolution kernel W, f_j is the jth channel of the feature map, * denotes the convolution operation, and b_i is the ith learnable bias term.
5. The semantic segmentation method based on the graph structure neural network according to claim 1, characterized in that the semantic-like enhancement network uses a cross entropy function to calculate losses of the semantic-like enhancement module and a full convolution network layer fusing prior information of an object, respectively, the final loss is obtained by weighted summation of the two losses, and the proportion of the two losses in the final loss is controlled by an exponential decay factor.
6. The semantic segmentation method based on the graph structure neural network as claimed in claim 1, wherein the process of restoring the refined segmentation result to the original resolution of the input image is realized by bilinear interpolation.
7. An apparatus, characterized in that the apparatus comprises:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method recited in any of claims 1-6.
8. A computer-readable storage medium storing one or more computer programs, the one or more computer programs comprising program code for performing the semantic segmentation method based on the graph-structured neural network of any one of claims 1-6 when the computer program runs on a computer.
CN202210134177.5A 2022-02-14 2022-02-14 Semantic segmentation method and device based on graph structure neural network Pending CN114494701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210134177.5A CN114494701A (en) 2022-02-14 2022-02-14 Semantic segmentation method and device based on graph structure neural network


Publications (1)

Publication Number Publication Date
CN114494701A true CN114494701A (en) 2022-05-13

Family

ID=81481166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210134177.5A Pending CN114494701A (en) 2022-02-14 2022-02-14 Semantic segmentation method and device based on graph structure neural network

Country Status (1)

Country Link
CN (1) CN114494701A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115436881A (en) * 2022-10-18 2022-12-06 兰州大学 Positioning method, system, computer equipment and readable storage medium
CN115436881B (en) * 2022-10-18 2023-07-07 兰州大学 Positioning method, positioning system, computer equipment and readable storage medium
CN115601550A (en) * 2022-12-13 2023-01-13 深圳思谋信息科技有限公司(Cn) Model determination method, model determination device, computer equipment and computer-readable storage medium
CN116739992A (en) * 2023-05-17 2023-09-12 福州大学 Intelligent auxiliary interpretation method for thyroid capsule invasion
CN116739992B (en) * 2023-05-17 2023-12-22 福州大学 Intelligent auxiliary interpretation method for thyroid capsule invasion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination