CN111860517A - Semantic segmentation method under small sample based on decentralized attention network


Info

Publication number
CN111860517A
CN111860517A (application CN202010601796.1A)
Authority
CN
China
Prior art keywords
image
layer
segmented
network
semantic segmentation
Prior art date
Legal status
Granted
Application number
CN202010601796.1A
Other languages
Chinese (zh)
Other versions
CN111860517B (en)
Inventor
张磊
李欣
甄先通
常峰贵
简治平
左利云
胥亮
李镇昌
Current Assignee
Shandong Gaitech Robotics Technology Co ltd
Guangdong University of Petrochemical Technology
Original Assignee
Shandong Gaitech Robotics Technology Co ltd
Guangdong University of Petrochemical Technology
Priority date
Filing date
Publication date
Application filed by Shandong Gaitech Robotics Technology Co ltd and Guangdong University of Petrochemical Technology
Priority to CN202010601796.1A
Publication of CN111860517A
Application granted
Publication of CN111860517B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Algebra (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic segmentation method under small samples based on a decentralized attention network, belonging to the technical field of semantic segmentation. A decentralized attention network mechanism is provided for the small-sample semantic segmentation task: it can activate more pixels belonging to the object foreground and establish a more stable association between the support image and the image to be segmented, so that generalization remains good even when the support image and the image to be segmented agree poorly in shape and similar attributes. At the same time, multi-scale attention information fusion is applied to the segmentation task: the semantic information obtained from multiple layers of the deep network is passed through the decentralized attention mechanism, fused using upsampling and residual networks, and semantic segmentation is performed on the fused result. This increases robustness to changes in object scale and gives the system better performance.

Description

Semantic segmentation method under small sample based on decentralized attention network
Technical Field
The invention relates to the technical field of semantic segmentation, and in particular to a semantic segmentation method under small samples based on a decentralized attention network.
Background
Deep learning has been widely applied to semantic segmentation in computer vision, but in practical applications the performance of a learned deep model suffers when little labeled support data is available. Prototype-based methods are currently popular. A prototype is a representation of a class of objects; in the deep learning framework, it is an output generated by a deep neural network from the support image and the labeled information of its corresponding object. In other words, a prototype is an associative mapping between the input support image and the object class. Semantic segmentation under small samples is essentially based on the prototype method, and the prototype representation takes multiple forms. One approach pools the features of the support image into a prototype and uses the prototype together with the features of the image to be segmented to generate the segmentation map. Another extracts a prototype representation from the support image by masked mean pooling and predicts the segmentation map by computing the cosine distance between the prototype and the image to be segmented. However, all of these prototypes are fixed and thus lack generalization. Alternatively, a graph attention mechanism can establish pixel-to-pixel connections between the support image and the image to be segmented, but due to bias in pixel competition only a small part of the foreground object in the support image is used to build the mapping, which greatly limits the transfer of information from the support image to the image to be segmented.
Semantic segmentation in computer vision refers to separating objects of interest in an image from the background. The current common approach extracts a global description from the support image (the labeled image) as a prototype to help complete the segmentation task on the image to be segmented. This approach, however, struggles in the small-sample setting, where a simple global vector prototype may be biased and lack generalization ability. An alternative establishes a connection between the support image and the image to be segmented through an attention mechanism, but bias in pixel selection means that only a small part of the foreground object in the support image tends to be used to build the mapping, which impairs the transfer of information from the support image to the image to be segmented.
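For concreteness, the masked-mean-pooling prototype approach criticized above can be sketched as follows. This is a minimal PyTorch sketch under assumed tensor shapes and with illustrative names, not the implementation of any particular prior method:

```python
import torch
import torch.nn.functional as F

def masked_average_prototype(feat_s: torch.Tensor, mask_s: torch.Tensor) -> torch.Tensor:
    """Pool support features over the labeled foreground into one prototype.

    feat_s: (C, H, W) backbone features of the support image
    mask_s: (H, W) binary foreground mask of the support object
    """
    mask = mask_s.float().unsqueeze(0)            # (1, H, W)
    fg_sum = (feat_s * mask).sum(dim=(1, 2))      # sum features over foreground
    return fg_sum / mask.sum().clamp(min=1.0)     # (C,) prototype vector

def cosine_segmentation_map(feat_q: torch.Tensor, prototype: torch.Tensor) -> torch.Tensor:
    """Score each query pixel by cosine similarity to the fixed prototype.

    feat_q: (C, H, W) backbone features of the image to be segmented
    """
    q = F.normalize(feat_q, dim=0)                # unit norm per pixel
    p = F.normalize(prototype, dim=0)             # unit norm prototype
    return torch.einsum('chw,c->hw', q, p)        # (H, W) similarity map
```

Because the prototype is a single fixed vector per class, every query pixel is compared against the same global description; this rigidity is what the dispersed attention mechanism below is meant to remove.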
Disclosure of Invention
1. Technical problem to be solved
Aiming at the problems in the prior art, the invention provides a semantic segmentation method under small samples based on a decentralized attention network. A decentralized attention network mechanism is proposed for the small-sample semantic segmentation task: it activates more pixels belonging to the object foreground and establishes a more stable association between the support image and the image to be segmented, so that generalization remains good when the two images agree poorly in shape and similar attributes. At the same time, multi-scale attention information fusion is applied to the task: semantic information obtained from multiple layers of the deep network passes through the decentralized attention mechanism, is fused using upsampling and residual networks, and semantic segmentation is performed on the fused result, improving robustness to changes in the scale of the foreground object and giving the system better performance.
2. Technical scheme
In order to solve the above problems, the present invention adopts the following technical solutions.
A semantic segmentation method under small samples based on a decentralized attention network involves: a training data set, a deep neural network, framework parameters φ, support images, and target-label mask images for the support images. The training data set contains, for each image, a mask image with segmentation labels; the deep neural network adopts the resnet101 structure with parameters trained on ImageNet; the framework parameters φ are the parameters of the convolutional layers that produce k and v and of the convolutional layers in the decoder. The semantic segmentation learning process comprises the following steps:
S1. For each task in the training data set, randomly extract one image and its label as the image to be segmented, and take the remaining images as the support image set;
S2. Randomly initialize the parameters of the convolutional layers for k and v and of the convolutional layers in the decoder, i.e., φ;
S3. Use the resnet101 network to generate the three-level feature representation {f_q^l, f_s^l}, l = 1, 2, 3, for the image to be segmented and the support image, where the outputs of blocks 1 and 2 of resnet101 form level 1, the output of block 3 forms level 2, and the output of block 4 forms level 3;
S4. For each l from 1 to 3, perform the following operations (a code sketch of this per-layer computation follows the step list):
S4.1. For f_q^l and f_s^l, use two convolutional layers (φ parameters) to generate the corresponding key-value pairs {k_q^l, v_q^l} and {k_s^l, v_s^l}, respectively;
S4.2. Compute the matrix A, whose elements are the inner products A_{i,j} = <k_q^l(i), k_s^l(j)> between query and support key vectors;
S4.3. Average the matrix A over its rows to generate A_s;
S4.4. Sort the pixels of A_s in descending order to obtain the rank position e_j corresponding to each pixel j;
S4.5. Adjust the weight of each pixel j in A_s as a function of its rank e_j [equation omitted in source], where H and W are the height and width of the image;
S4.6. Reconstruct the matrix A as Â, whose elements are given by the adjusted weights [equation omitted in source];
S4.7. Normalize Â through a softmax layer: Ā_{i,j} = exp(Â_{i,j}) / Σ_j' exp(Â_{i,j'});
S4.8. Generate the dispersed attention map for the i-th position as f_a^l(i) = v_q^l(i) || Σ_j Ā_{i,j} v_s^l(j), where || denotes the concatenation operation;
S4.9. Repeat from S4.1 until l = 3;
S5. Bilinearly upsample the dispersed attention feature f_a^3, pass it through a residual module, and concatenate it with the attention feature f_a^2; bilinearly upsample the concatenated result, pass it through a residual module, and concatenate it with the attention feature f_a^1; a dense representation is then obtained through a convolutional layer (φ parameters);
S6. Pass the dense representation through a softmax layer to obtain the final segmentation result, foreground or background, for each pixel;
S7. Compare the result with the ground-truth segmentation mask image, compute the cross entropy, and compute its gradient with respect to φ;
S8. Update φ;
S9. Loop back to S1 until convergence.
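A minimal PyTorch sketch of one level of the dispersed attention computation (steps S4.1-S4.8) follows. The published text does not reproduce the equation images, so the rank-based reweighting of S4.5-S4.6 is written here as a simple linear-in-rank factor e_j / (H·W) folded back into A; that exact form, like the module and variable names, is an assumption for illustration only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DispersedGraphAttention(nn.Module):
    """One level of the dispersed graph attention computation (S4.1-S4.8).

    Assumptions: query and support feature maps share one spatial size, and
    the S4.5 reweighting is a linear-in-rank factor e_j / (H*W); the patent
    text does not reproduce the exact equations.
    """

    def __init__(self, in_ch: int, key_ch: int = 64, val_ch: int = 64):
        super().__init__()
        self.to_k = nn.Conv2d(in_ch, key_ch, 1)  # phi: key projection
        self.to_v = nn.Conv2d(in_ch, val_ch, 1)  # phi: value projection

    def forward(self, f_q: torch.Tensor, f_s: torch.Tensor) -> torch.Tensor:
        B, _, H, W = f_q.shape
        k_q = self.to_k(f_q).flatten(2)  # (B, Ck, HW) query keys
        v_q = self.to_v(f_q).flatten(2)  # (B, Cv, HW) query values
        k_s = self.to_k(f_s).flatten(2)  # (B, Ck, HW) support keys
        v_s = self.to_v(f_s).flatten(2)  # (B, Cv, HW) support values

        # S4.2: inner products between every query pixel i and support pixel j
        A = torch.einsum('bci,bcj->bij', k_q, k_s)       # (B, HW, HW)

        # S4.3: row-wise average -> one relevance score per support pixel
        A_s = A.mean(dim=1)                              # (B, HW)

        # S4.4: descending-order rank e_j of each support pixel (0 = largest)
        rank = A_s.argsort(dim=1, descending=True).argsort(dim=1)

        # S4.5-S4.6: damp the dominant support pixels (assumed form) and
        # fold the adjusted weights back into the full attention matrix
        damp = (rank.float() + 1.0) / (H * W)
        A_hat = A * (A_s * damp).unsqueeze(1)            # (B, HW, HW)

        # S4.7: softmax over support pixels for each query position
        A_bar = F.softmax(A_hat, dim=2)

        # S4.8: aggregate support values, concatenate with query values
        agg = torch.einsum('bij,bcj->bci', A_bar, v_s)   # (B, Cv, HW)
        f_a = torch.cat([v_q, agg], dim=1)               # (B, 2*Cv, HW)
        return f_a.view(B, -1, H, W)
```

Damping the highest-ranked support pixels in this way spreads the attention over more of the foreground instead of letting a few dominant pixels carry the whole mapping, which is the stated purpose of the mechanism.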
Further, under the one-shot condition the support set contains a single support image, and the semantic segmentation process comprises the following steps:
S1. Use the resnet101 network to generate the three-level feature representation {f_q^l, f_s^l}, l = 1, 2, 3, for the image to be segmented and the support image, where the outputs of blocks 1 and 2 of resnet101 form level 1, the output of block 3 forms level 2, and the output of block 4 forms level 3 (a backbone sketch follows these steps);
S2. For each l from 1 to 3, perform the following operations:
S2.1. For f_q^l and f_s^l, use two convolutional layers to generate the corresponding key-value pairs {k_q^l, v_q^l} and {k_s^l, v_s^l}, respectively;
S2.2. Compute the matrix A, whose elements are the inner products A_{i,j} = <k_q^l(i), k_s^l(j)>;
S2.3. Average the matrix A over its rows to generate A_s;
S2.4. Sort the pixels of A_s in descending order to obtain the rank position e_j corresponding to each pixel j;
S2.5. Adjust the weight of each pixel j in A_s as a function of its rank e_j [equation omitted in source], where H and W are the height and width of the image;
S2.6. Reconstruct the matrix A as Â, whose elements are given by the adjusted weights [equation omitted in source];
S2.7. Normalize through a softmax layer: Ā_{i,j} = exp(Â_{i,j}) / Σ_j' exp(Â_{i,j'});
S2.8. Generate the dispersed attention map for the i-th position as f_a^l(i) = v_q^l(i) || Σ_j Ā_{i,j} v_s^l(j), where || denotes the concatenation operation;
S2.9. Repeat from S2.1 until l = 3;
S3. Bilinearly upsample the dispersed attention feature f_a^3, pass it through a residual module, and concatenate it with the attention feature f_a^2; bilinearly upsample the concatenated result, pass it through a residual module, and concatenate it with the attention feature f_a^1; a dense representation is then obtained after passing through a convolutional layer;
S4. Pass the dense representation through the softmax layer to obtain the final segmentation result, namely foreground or background, for each pixel.
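The three-level feature extraction used in step S1 can be sketched with torchvision's resnet101. The mapping of the "blocks" to torchvision's stages layer1-layer4 follows the text's description and is otherwise an assumption:

```python
import torch.nn as nn
from torchvision.models import resnet101

class ThreeLevelBackbone(nn.Module):
    """Extract the three feature levels of step S1: level 1 from residual
    blocks 1-2, level 2 from block 3, level 3 from block 4 of an
    ImageNet-pretrained resnet101 (stage grouping assumed to map to
    torchvision's layer1-layer4)."""

    def __init__(self):
        super().__init__()
        net = resnet101(weights='IMAGENET1K_V1')
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.blocks12 = nn.Sequential(net.layer1, net.layer2)
        self.block3 = net.layer3
        self.block4 = net.layer4

    def forward(self, x):
        x = self.stem(x)
        f1 = self.blocks12(x)  # level-1 features
        f2 = self.block3(f1)   # level-2 features
        f3 = self.block4(f2)   # level-3 features
        return f1, f2, f3
```

The three levels have different spatial resolutions and channel widths, which is why the fusion stage upsamples and projects them before concatenation.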
Further, the framework comprises two branch structures. The lower branch takes the image to be segmented through a multi-layer convolutional neural network, whose different convolutional layers output the feature representations f_q^l covering information at different semantic levels. The upper branch takes the support image and its corresponding mask image (i.e., the label) and generates, through the different layers of a convolutional neural network with the same parameters as the lower-branch network, the feature representations f_s^l covering information at different semantic levels.
Further, f_s^l is dot-multiplied with the labeling information, and the result, together with the representation of the image to be segmented at the corresponding layer, is fed to the DGA dispersed graph attention mechanism and the RFU enhanced fusion unit; the attention features generated by the DGA link at the different layers are fused in the RFU link to produce the final segmentation label for the image to be segmented.
Further, the inputs of the DGA dispersed graph attention mechanism are the representations f_s and f_q, containing different levels of semantic information, obtained from the support image and the image to be segmented, and its output is the dispersed attention representation f_a. f_s and f_q each pass through two convolutional layers that map them into a space of key representations k and value representations v, where k is used to measure the distance between the image to be segmented and the support image, and v stores the detail information extracted from the feature maps. A is obtained from k_s and k_q by inner products, computed as A_{i,j} = <k_q(i), k_s(j)>.
further, A isi,jExpressing the relevance between the pixel i in the image to be segmented and the pixel j in the support image, and averaging the A matrix according to rows to obtain the A matrixs
Further, the RFU enhanced fusion unit includes an upsampling mechanism that employs bilinear upsampling; the attention features of adjacent layers are connected in series through a residual module, a residual module generates the dense representation through a convolutional layer, and the final output of the RFU enhanced fusion unit passes through a convolutional layer and a softmax layer, which decides independently for each pixel whether it is foreground or background.
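The RFU just described can be sketched as follows in PyTorch. The internal composition of the residual module, the channel bookkeeping (the three attention features are assumed to share one channel width), and all names are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualModule(nn.Module):
    """Small residual block used inside the RFU (composition assumed)."""

    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return F.relu(x + self.body(x))

class RFU(nn.Module):
    """Fuse the dispersed attention features f_a^3, f_a^2, f_a^1 from deep
    to shallow, then classify each pixel as foreground or background."""

    def __init__(self, ch: int):
        super().__init__()
        self.res3 = ResidualModule(ch)
        self.res2 = ResidualModule(2 * ch)
        self.head = nn.Conv2d(3 * ch, 2, 1)  # dense representation -> 2 classes

    def forward(self, fa1, fa2, fa3):
        x = F.interpolate(self.res3(fa3), size=fa2.shape[-2:],
                          mode='bilinear', align_corners=False)
        x = torch.cat([x, fa2], dim=1)       # series connection with level 2
        x = F.interpolate(self.res2(x), size=fa1.shape[-2:],
                          mode='bilinear', align_corners=False)
        x = torch.cat([x, fa1], dim=1)       # series connection with level 1
        return F.softmax(self.head(x), dim=1)  # per-pixel fg/bg probabilities
```

Fusing from the deepest level outward lets the coarse semantic evidence guide the finer levels while the bilinear upsampling restores spatial resolution step by step.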
3. Advantageous effects
Compared with the prior art, the invention has the advantages that:
(1) The invention provides a decentralized attention network mechanism for the small-sample semantic segmentation task. It activates more pixels belonging to the object foreground and establishes a more stable association between the support image and the image to be segmented, giving better generalization when the two images agree poorly in shape and similar attributes. At the same time, multi-scale attention information fusion is applied to the task: semantic information obtained from multiple layers of the deep network passes through the dispersed attention mechanism, is fused using upsampling and residual networks, and semantic segmentation is performed on the fused result, which increases robustness to changes in the scale of the foreground object and gives the system better performance.
Drawings
FIG. 1 is a schematic view of a one-shot situation according to the present invention;
FIG. 2 is a schematic illustration of the DGA dispersed graph attention mechanism of the present invention;
FIG. 3 shows examples of experimental results of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them; all other embodiments obtained by those skilled in the art without inventive work fall within the scope of the present invention.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "top/bottom", and the like indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience and simplicity of description and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.
In the description of the present invention, it should also be noted that, unless otherwise expressly specified and limited, the terms "mounted", "disposed", "sleeved/connected", "connected", and the like are to be understood broadly; for example, "connected" may be a fixed, detachable, or integral connection, a mechanical or electrical connection, a direct connection, an indirect connection through an intermediate medium, or a communication between two elements.
Example 1:
Referring to FIGS. 1-3, a semantic segmentation method under small samples based on a decentralized attention network involves: a training data set, a deep neural network, framework parameters φ, support images, and target-label mask images for the support images. The training data set contains, for each image, a mask image with segmentation labels; the deep neural network adopts the resnet101 structure with parameters trained on ImageNet; the framework parameters φ are the parameters of the convolutional layers that produce k and v and of the convolutional layers in the decoder. The semantic segmentation learning process comprises the following steps:
S1. For each task in the training data set, randomly extract one image and its label as the image to be segmented, and take the remaining images as the support image set;
S2. Randomly initialize the parameters of the convolutional layers for k and v and of the convolutional layers in the decoder, i.e., φ;
S3. Use the resnet101 network to generate the three-level feature representation {f_q^l, f_s^l}, l = 1, 2, 3, for the image to be segmented and the support image, where the outputs of blocks 1 and 2 of resnet101 form level 1, the output of block 3 forms level 2, and the output of block 4 forms level 3;
S4. For each l from 1 to 3, perform the following operations:
S4.1. For f_q^l and f_s^l, use two convolutional layers (φ parameters) to generate the corresponding key-value pairs {k_q^l, v_q^l} and {k_s^l, v_s^l}, respectively;
S4.2. Compute the matrix A, whose elements are the inner products A_{i,j} = <k_q^l(i), k_s^l(j)>;
S4.3. Average the matrix A over its rows to generate A_s;
S4.4. Sort the pixels of A_s in descending order to obtain the rank position e_j corresponding to each pixel j;
S4.5. Adjust the weight of each pixel j in A_s as a function of its rank e_j [equation omitted in source], where H and W are the height and width of the image;
S4.6. Reconstruct the matrix A as Â, whose elements are given by the adjusted weights [equation omitted in source];
S4.7. Normalize Â through a softmax layer: Ā_{i,j} = exp(Â_{i,j}) / Σ_j' exp(Â_{i,j'});
S4.8. Generate the dispersed attention map for the i-th position as f_a^l(i) = v_q^l(i) || Σ_j Ā_{i,j} v_s^l(j), where || denotes the concatenation operation;
S4.9. Repeat from S4.1 until l = 3;
S5. Bilinearly upsample the dispersed attention feature f_a^3, pass it through a residual module, and concatenate it with the attention feature f_a^2; bilinearly upsample the concatenated result, pass it through a residual module, and concatenate it with the attention feature f_a^1; a dense representation is then obtained through a convolutional layer (φ parameters);
S6. Pass the dense representation through a softmax layer to obtain the final segmentation result, foreground or background, for each pixel;
S7. Compare the result with the ground-truth segmentation mask image, compute the cross entropy, and compute its gradient with respect to φ;
S8. Update φ;
S9. Loop back to S1 until convergence (a sketch of this training loop follows).
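Steps S1-S9 amount to an episodic training loop, sketched below under the assumption of a wrapper model that composes the backbone, DGA, and RFU pieces and exposes only the φ parameters (the k/v convolutions and the decoder) as trainable; the optimizer choice, hyperparameters, and names are illustrative:

```python
import random
import torch
import torch.nn.functional as F

def train(model, tasks, lr=2.5e-4, max_iters=30000):
    """Episodic training loop for steps S1-S9 (a sketch).

    tasks: a list of episodes, each a list of (image, mask) pairs for one
    object class; model(query, supports) is assumed to return per-pixel
    class logits, with only the phi parameters marked trainable.
    """
    phi = [p for p in model.parameters() if p.requires_grad]  # S2 done by modules
    opt = torch.optim.SGD(phi, lr=lr, momentum=0.9)

    for _ in range(max_iters):                    # S9: loop until convergence
        episode = random.choice(tasks)
        i = random.randrange(len(episode))        # S1: draw the query image
        query_img, query_mask = episode[i]
        supports = episode[:i] + episode[i + 1:]  # remaining images support

        logits = model(query_img, supports)       # S3-S5: forward pass
        # S6-S7: cross_entropy applies the softmax of S6 internally
        loss = F.cross_entropy(logits, query_mask)

        opt.zero_grad()
        loss.backward()                           # S7: gradient w.r.t. phi
        opt.step()                                # S8: update phi
```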
Referring to fig. 1-3, in a one-shot situation, the support image is 1 image, and its semantics
The segmentation process comprises the following steps:
S1. Use the resnet101 network to generate the three-level feature representation {f_q^l, f_s^l}, l = 1, 2, 3, for the image to be segmented and the support image, where the outputs of blocks 1 and 2 of resnet101 form level 1, the output of block 3 forms level 2, and the output of block 4 forms level 3;
S2. For each l from 1 to 3, perform the following operations:
S2.1. For f_q^l and f_s^l, use two convolutional layers to generate the corresponding key-value pairs {k_q^l, v_q^l} and {k_s^l, v_s^l}, respectively;
S2.2. Compute the matrix A, whose elements are the inner products A_{i,j} = <k_q^l(i), k_s^l(j)>;
S2.3. Average the matrix A over its rows to generate A_s;
S2.4. Sort the pixels of A_s in descending order to obtain the rank position e_j corresponding to each pixel j;
S2.5. Adjust the weight of each pixel j in A_s as a function of its rank e_j [equation omitted in source], where H and W are the height and width of the image;
S2.6. Reconstruct the matrix A as Â, whose elements are given by the adjusted weights [equation omitted in source];
S2.7. Normalize through a softmax layer: Ā_{i,j} = exp(Â_{i,j}) / Σ_j' exp(Â_{i,j'});
S2.8. Generate the dispersed attention map for the i-th position as f_a^l(i) = v_q^l(i) || Σ_j Ā_{i,j} v_s^l(j), where || denotes the concatenation operation;
S2.9. Repeat from S2.1 until l = 3;
S3. Bilinearly upsample the dispersed attention feature f_a^3, pass it through a residual module, and concatenate it with the attention feature f_a^2; bilinearly upsample the concatenated result, pass it through a residual module, and concatenate it with the attention feature f_a^1; a dense representation is then obtained after passing through a convolutional layer;
S4. Pass the dense representation through the softmax layer to obtain the final segmentation result, namely foreground or background, for each pixel.
Referring to FIGS. 1-3, the framework comprises two branch structures. The lower branch takes the image to be segmented through a multi-layer convolutional neural network, whose different convolutional layers output the feature representations f_q^l covering information at different semantic levels. The upper branch takes the support image and its corresponding mask image (i.e., the label) and generates, through the different layers of a convolutional neural network with the same parameters as the lower-branch network, the feature representations f_s^l covering information at different semantic levels. Further, f_s^l is dot-multiplied with the labeling information, and the result, together with the representation of the image to be segmented at the corresponding layer, is fed to the DGA dispersed graph attention mechanism and the RFU enhanced fusion unit; the attention features generated by the DGA link at the different layers are fused in the RFU link to produce the final segmentation label for the image to be segmented.
Referring to FIGS. 1-3, the inputs of the DGA dispersed graph attention mechanism are the representations f_s and f_q, containing different levels of semantic information, obtained from the support image and the image to be segmented, and its output is the dispersed attention representation f_a. f_s and f_q each pass through two convolutional layers that map them into a space of key representations k and value representations v, where k is used to measure the distance between the image to be segmented and the support image, and v stores the detail information extracted from the feature maps. A is obtained from k_s and k_q by inner products, computed as A_{i,j} = <k_q(i), k_s(j)>. A_{i,j} expresses the relevance between pixel i in the image to be segmented and pixel j in the support image, and averaging the matrix A over its rows yields A_s. The RFU enhanced fusion unit includes an upsampling mechanism that employs bilinear upsampling; the attention features of adjacent layers are connected in series through a residual module, a residual module generates the dense representation through a convolutional layer, and the final output of the RFU enhanced fusion unit passes through a convolutional layer and a softmax layer, which decides independently for each pixel whether it is foreground or background.
The invention provides a decentralized attention network mechanism for the small-sample semantic segmentation task. It activates more pixels belonging to the object foreground and establishes a more stable association between the support image and the image to be segmented, giving better generalization when the two images agree poorly in shape and similar attributes. At the same time, multi-scale attention information fusion is applied to the task: semantic information obtained from multiple layers of the deep network passes through the dispersed attention mechanism, is fused using upsampling and residual networks, and semantic segmentation is performed on the fused result, which increases robustness to changes in the scale of the foreground object and gives the system better performance.
The foregoing is only a preferred embodiment of the present invention, and the scope of the invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention.

Claims (7)

1. A semantic segmentation method under small samples based on a decentralized attention network, involving: a training data set, a deep neural network, framework parameters φ, support images, and target-label mask images for the support images, wherein the training data set contains, for each image, a mask image with segmentation labels, the deep neural network adopts the resnet101 structure with parameters trained on ImageNet, and the framework parameters φ are the parameters of the convolutional layers that produce k and v and of the convolutional layers in the decoder, characterized in that the semantic segmentation learning process comprises the following steps:
S1. For each task in the training data set, randomly extract one image and its label as the image to be segmented, and take the remaining images as the support image set;
S2. Randomly initialize the parameters of the convolutional layers for k and v and of the convolutional layers in the decoder, i.e., φ;
S3. Use the resnet101 network to generate the three-level feature representation {f_q^l, f_s^l}, l = 1, 2, 3, for the image to be segmented and the support image, where the outputs of blocks 1 and 2 of resnet101 form level 1, the output of block 3 forms level 2, and the output of block 4 forms level 3;
S4. For each l from 1 to 3, perform the following operations:
S4.1. For f_q^l and f_s^l, use two convolutional layers (φ parameters) to generate the corresponding key-value pairs {k_q^l, v_q^l} and {k_s^l, v_s^l}, respectively;
S4.2. Compute the matrix A, whose elements are the inner products A_{i,j} = <k_q^l(i), k_s^l(j)>;
S4.3. Average the matrix A over its rows to generate A_s;
S4.4. Sort the pixels of A_s in descending order to obtain the rank position e_j corresponding to each pixel j;
S4.5. Adjust the weight of each pixel j in A_s as a function of its rank e_j [equation omitted in source], where H and W are the height and width of the image;
S4.6. Reconstruct the matrix A as Â, whose elements are given by the adjusted weights [equation omitted in source];
S4.7. Normalize Â through a softmax layer: Ā_{i,j} = exp(Â_{i,j}) / Σ_j' exp(Â_{i,j'});
S4.8. Generate the dispersed attention map for the i-th position as f_a^l(i) = v_q^l(i) || Σ_j Ā_{i,j} v_s^l(j), where || denotes the concatenation operation;
S4.9. Repeat from S4.1 until l = 3;
S5. Bilinearly upsample the dispersed attention feature f_a^3, pass it through a residual module, and concatenate it with the attention feature f_a^2; bilinearly upsample the concatenated result, pass it through a residual module, and concatenate it with the attention feature f_a^1; a dense representation is then obtained through a convolutional layer (φ parameters);
S6. Pass the dense representation through a softmax layer to obtain the final segmentation result, foreground or background, for each pixel;
S7. Compare the result with the ground-truth segmentation mask image, compute the cross entropy, and compute its gradient with respect to φ;
S8. Update φ;
S9. Loop back to S1 until convergence.
2. The semantic segmentation method under small samples based on a decentralized attention network according to claim 1, characterized in that: under the one-shot condition the support set contains a single support image, and the semantic segmentation process comprises the following steps:
S1. Use the resnet101 network to generate the three-level feature representation {f_q^l, f_s^l}, l = 1, 2, 3, for the image to be segmented and the support image, where the outputs of blocks 1 and 2 of resnet101 form level 1, the output of block 3 forms level 2, and the output of block 4 forms level 3;
S2. For each l from 1 to 3, perform the following operations:
S2.1. For f_q^l and f_s^l, use two convolutional layers to generate the corresponding key-value pairs {k_q^l, v_q^l} and {k_s^l, v_s^l}, respectively;
S2.2. Compute the matrix A, whose elements are the inner products A_{i,j} = <k_q^l(i), k_s^l(j)>;
S2.3. Average the matrix A over its rows to generate A_s;
S2.4. Sort the pixels of A_s in descending order to obtain the rank position e_j corresponding to each pixel j;
S2.5. Adjust the weight of each pixel j in A_s as a function of its rank e_j [equation omitted in source], where H and W are the height and width of the image;
S2.6. Reconstruct the matrix A as Â, whose elements are given by the adjusted weights [equation omitted in source];
S2.7. Normalize through a softmax layer: Ā_{i,j} = exp(Â_{i,j}) / Σ_j' exp(Â_{i,j'});
S2.8. Generate the dispersed attention map for the i-th position as f_a^l(i) = v_q^l(i) || Σ_j Ā_{i,j} v_s^l(j), where || denotes the concatenation operation;
S2.9. Repeat from S2.1 until l = 3;
S3. Bilinearly upsample the dispersed attention feature f_a^3, pass it through a residual module, and concatenate it with the attention feature f_a^2; bilinearly upsample the concatenated result, pass it through a residual module, and concatenate it with the attention feature f_a^1; a dense representation is then obtained after passing through a convolutional layer;
S4. Pass the dense representation through the softmax layer to obtain the final segmentation result, namely foreground or background, for each pixel.
3. The semantic segmentation method under small samples based on a decentralized attention network according to claim 1, characterized in that: the framework comprises two branch structures; the lower branch takes the image to be segmented through a multi-layer convolutional neural network, whose different convolutional layers output the feature representations f_q^l covering information at different semantic levels, and the upper branch takes the support image and its corresponding mask image (i.e., the label) and generates, through the different layers of a convolutional neural network with the same parameters as the lower-branch network, the feature representations f_s^l covering information at different semantic levels.
4. The semantic segmentation method under small samples based on a decentralized attention network according to claim 1, characterized in that: f_s^l is dot-multiplied with the labeling information, and the result, together with the representation of the image to be segmented at the corresponding layer, is fed to the DGA dispersed graph attention mechanism and the RFU enhanced fusion unit; the attention features generated by the DGA link at the different layers are fused in the RFU link to produce the final segmentation label for the image to be segmented.
5. The semantic segmentation method under small samples based on a decentralized attention network according to claim 4, characterized in that: the inputs of the DGA dispersed graph attention mechanism are the representations f_s and f_q, containing different levels of semantic information, obtained from the support image and the image to be segmented, and its output is the dispersed attention representation f_a; f_s and f_q each pass through two convolutional layers that map them into a space of key representations k and value representations v, where k is used to measure the distance between the image to be segmented and the support image and v stores the detail information extracted from the feature maps, and A is obtained from k_s and k_q by inner products, computed as A_{i,j} = <k_q(i), k_s(j)>.
6. The semantic segmentation method under small samples based on a decentralized attention network according to claim 5, characterized in that: A_{i,j} expresses the relevance between pixel i in the image to be segmented and pixel j in the support image, and averaging the matrix A over its rows yields A_s.
7. The semantic segmentation method under small samples based on a decentralized attention network according to claim 1, characterized in that: the RFU enhanced fusion unit includes an upsampling mechanism that employs bilinear upsampling; the attention features of adjacent layers are connected in series through a residual module, a residual module generates the dense representation through a convolutional layer, and the final output of the RFU enhanced fusion unit passes through a convolutional layer and a softmax layer, which decides independently for each pixel whether it is foreground or background.
CN202010601796.1A 2020-06-28 2020-06-28 Semantic segmentation method under small sample based on distraction network Active CN111860517B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010601796.1A CN111860517B (en) 2020-06-28 2020-06-28 Semantic segmentation method under small sample based on distraction network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010601796.1A CN111860517B (en) 2020-06-28 2020-06-28 Semantic segmentation method under small sample based on distraction network

Publications (2)

Publication Number Publication Date
CN111860517A 2020-10-30
CN111860517B CN111860517B (en) 2023-07-25

Family

ID=72988651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010601796.1A Active CN111860517B (en) 2020-06-28 2020-06-28 Semantic segmentation method under small sample based on distraction network

Country Status (1)

Country Link
CN (1) CN111860517B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110675330A (en) * 2019-08-12 2020-01-10 广东石油化工学院 Image rain removing method of encoding-decoding network based on channel level attention mechanism
CN110705340A (en) * 2019-08-12 2020-01-17 广东石油化工学院 Crowd counting method based on attention neural network field
CN110796105A (en) * 2019-11-04 2020-02-14 中国矿业大学 Remote sensing image semantic segmentation method based on multi-modal data fusion
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN110929696A (en) * 2019-12-16 2020-03-27 中国矿业大学 Remote sensing image semantic segmentation method based on multi-mode attention and self-adaptive fusion
CN111210432A (en) * 2020-01-12 2020-05-29 湘潭大学 Image semantic segmentation method based on multi-scale and multi-level attention mechanism
CN111275688A (en) * 2020-01-19 2020-06-12 合肥工业大学 Small target detection method based on context feature fusion screening of attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIANTONG ZHEN ET AL.: "One-Shot Learning for Semantic Segmentation", ARXIV *
HE CHAO ET AL.: "Multi-scale feature fusion for workpiece object semantic segmentation", Journal of Image and Graphics *
CAI YU ET AL.: "Real-time semantic segmentation algorithm based on feature fusion", Laser & Optoelectronics Progress *
GAO DAN ET AL.: "A-PSPNet: a PSPNet image semantic segmentation model fused with an attention mechanism", Journal of China Academy of Electronics and Information Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112419320A (en) * 2021-01-22 2021-02-26 湖南师范大学 Cross-modal heart segmentation method based on SAM and multi-layer UDA
CN112419320B (en) * 2021-01-22 2021-04-27 湖南师范大学 Cross-modal heart segmentation method based on SAM and multi-layer UDA
CN112819073A (en) * 2021-02-01 2021-05-18 上海明略人工智能(集团)有限公司 Classification network training method, image classification device and electronic equipment

Also Published As

Publication number Publication date
CN111860517B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN108460338B (en) Human body posture estimation method and apparatus, electronic device, storage medium, and program
CN109345575B (en) Image registration method and device based on deep learning
Park et al. Few-shot font generation with localized style representations and factorization
CN106547880B (en) Multi-dimensional geographic scene identification method fusing geographic area knowledge
CN110222140A (en) A kind of cross-module state search method based on confrontation study and asymmetric Hash
CN107180430A (en) A kind of deep learning network establishing method and system suitable for semantic segmentation
CN108304357A (en) A kind of Chinese word library automatic generation method based on font manifold
CN111860517A (en) Semantic segmentation method under small sample based on decentralized attention network
CN114049381A (en) Twin cross target tracking method fusing multilayer semantic information
CN111985532B (en) Scene-level context-aware emotion recognition deep network method
Zhai et al. Deep texton-coherence network for camouflaged object detection
CN116229056A (en) Semantic segmentation method, device and equipment based on double-branch feature fusion
Halvardsson et al. Interpretation of swedish sign language using convolutional neural networks and transfer learning
CN112131969A (en) Remote sensing image change detection method based on full convolution neural network
CN113592894A (en) Image segmentation method based on bounding box and co-occurrence feature prediction
CN116258990A (en) Cross-modal affinity-based small sample reference video target segmentation method
Yang et al. Application of multitask joint sparse representation algorithm in Chinese painting image classification
CN116681960A (en) Intelligent mesoscale vortex identification method and system based on K8s
Guo et al. Decoupling semantic and edge representations for building footprint extraction from remote sensing images
CN106462773A (en) Pattern recognition system and method using GABOR functions
Yu et al. Coupling dual graph convolution network and residual network for local climate zone mapping
CN117370578A (en) Method for supplementing food safety knowledge graph based on multi-mode information
CN117634556A (en) Training method and device for semantic segmentation neural network based on water surface data
Wang et al. Self-attention deep saliency network for fabric defect detection
Wu et al. Retentive Compensation and Personality Filtering for Few-Shot Remote Sensing Object Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant