CN114638836A - Urban street view segmentation method based on highly effective drive and multi-level feature fusion - Google Patents

Urban street view segmentation method based on highly effective drive and multi-level feature fusion

Info

Publication number
CN114638836A
Authority
CN
China
Prior art keywords
network
feature
layer
features
fusion
Prior art date
Legal status
Granted
Application number
CN202210148745.7A
Other languages
Chinese (zh)
Other versions
CN114638836B (en)
Inventor
熊炜
赵迪
孙鹏
陈奕博
田紫欣
强观臣
万相奎
李利荣
宋海娜
Current Assignee
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date
Filing date
Publication date
Application filed by Hubei University of Technology
Priority to CN202210148745.7A
Publication of CN114638836A
Application granted
Publication of CN114638836B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a city street view segmentation method based on height-driven efficient attention and multi-level feature fusion, which combines the positional prior of city street view images and adds a HEAM to the improved network, thereby enhancing the network's ability to extract features of targets at different height positions. The HEAM is embedded in the feature extraction network and in the ASPP structure, so that the extraction of deep features and multi-scale features is improved when the network performs deep convolution and multi-layer dilated convolution through the ASPP. The HEAM extracts height context information representing each horizontal partition of the image and predicts the features or categories of each horizontal partition from this information. Shallow features in the network have high resolution and contain more position and detail information, whereas deep features have low resolution and weak perception of detail but stronger semantic information. Segmentation accuracy is therefore improved by fusing shallow and deep features so that the features carry a more complete information representation.

Description

Urban street view segmentation method based on highly effective driving and multi-level feature fusion
Technical Field
The invention belongs to the technical field of digital image processing and computer vision, relates to a city street view segmentation method, and particularly relates to a city street view semantic segmentation method based on height-driven efficient attention and multi-level feature fusion.
Background
With the rapid development of computer hardware, image semantic segmentation algorithms are widely applied to computer-vision tasks in automatic driving, and many segmentation algorithms have been used for city street view segmentation with remarkable results. For the city street view segmentation task, many target objects exist in a street scene, and the various small targets are difficult to segment. High-precision segmentation of multiple target objects in city street views has therefore become a key research focus.
At present, semantic segmentation methods are mainly based on fully convolutional networks and image context knowledge, and can be roughly divided into three approaches: (1) adopting skip connections, multi-scale feature fusion, or dilated (atrous) convolution in a CNN to enlarge the receptive field; (2) introducing a conditional random field for post-processing of the segmentation after the CNN; (3) feeding the image into a recurrent neural network (RNN) as a sequence, using its memory capacity to improve segmentation performance.
An attention mechanism lets the network focus on the important characteristics of a target and ignore irrelevant information, thereby establishing the dependency relationship between pixels. For a two-dimensional image, besides its spatial size the other dimension is the number of channels; a channel attention mechanism judges the importance of each channel of the feature map and enhances or suppresses the corresponding channels for different tasks, so as to attend to local information of interest. However, these methods have poor robustness and high model complexity, and, most importantly, they do not make good use of the spatial position information of objects in city street scenes.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a city street view semantic segmentation method based on height-driven efficient attention and multi-level feature fusion, which significantly improves the city street view segmentation result without introducing excessive parameters and achieves high-precision classification and segmentation of multiple targets.
The technical scheme adopted by the invention is as follows: a city street view segmentation method based on height-driven efficient attention and multi-level feature fusion, in which a city street view image is input into a city street view segmentation network to obtain the segmented city street view;
the city street view segmentation network comprises a ResNet50 feature extraction network, a multi-level feature fusion module MFFM, a height-driven efficient attention module HEAM, and an atrous spatial pyramid pooling (ASPP) module;
the HEAM is embedded in both the ResNet50 feature extraction network and the ASPP module to improve the extraction of features along the height direction; a shallow feature map X_l ∈ R^(C_l×H_l×W_l) of the ResNet50 feature extraction network is the input of the HEAM and a deep feature map X_h ∈ R^(C_h×H_h×W_h) is its output, where C_x is the number of channels and H_x and W_x are the height and width of the feature map, with x = l, h; when the HEAM is embedded in the ASPP module, the output of the ResNet50 feature extraction network serves as the shallow feature map X_l and the output of the ASPP module serves as the deep feature map X_h, thereby realizing the attention operation of the HEAM;
the input city street view image first enters the ResNet50 feature extraction network for deep feature extraction, and the receptive field is then enlarged through the ASPP module, completing the encoding; during decoding, the multi-level feature fusion module MFFM fuses deep and shallow features of the network, and the fused features are skip-connected to the decoding end, reducing the data information loss caused by the upsampling operation.
Compared with the prior art, the invention has the following remarkable advantages:
(1) The invention provides a height-driven efficient attention module HEAM, which effectively exploits the positional prior of city street view images and embeds it into the feature extraction network, thereby enhancing the network's ability to extract features of targets at different height positions.
(2) The invention provides a multi-level feature fusion module MFFM, which deeply fuses shallow and deep features so that the features carry a more complete information representation, improving segmentation accuracy.
(3) On the test set of the CamVid dataset, the method reaches an MIoU of 68.2%, which is competitive with current SOTA segmentation results.
Drawings
FIG. 1 is a diagram of the structure of the city street view segmentation network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the improved FPN multi-scale fusion according to an embodiment of the present invention;
FIG. 3 is a diagram of the ECA network architecture according to an embodiment of the present invention;
FIG. 4 is a structural diagram of the height-driven efficient attention module HEAM according to an embodiment of the present invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
The invention provides a city street view segmentation method based on height-driven efficient attention and multi-level feature fusion, in which a city street view image is input into a city street view segmentation network to obtain the segmented city street view.
referring to fig. 1, the city street view segmentation network provided in this embodiment includes a ResNet50 feature extraction network, a multi-level feature fusion network MFFM, a highly efficient attention-driving network HEAM, and a void space pyramid pooling network ASPP;
the attention network HEAM is driven highly effectively and is respectively embedded into a ResNet50 feature extraction network and a void space pyramid pooling network ASPP network so as to improve the effective extraction of the features of the network in the height direction; shallow feature map of ResNet50 feature extraction network
Figure BDA0003509740810000031
For high-driving effective attention network HEAM input, the output is a deep characteristic diagram
Figure BDA0003509740810000032
Wherein C isxIs the number of channels, HxAnd WxThe height and width dimensions of the feature map, x ═ l, h, respectively; as shown in FIG. 1, HEAM is embedded into ASPP structure, and the output of feature extraction network is taken as shallow feature map X in FIG. 4lThe output of the ASPP structure is taken as the deep profile X in FIG. 4hThus, attention operation of the HEAM is realized.
The city street view image input by the embodiment firstly enters a ResNet50 feature extraction network to complete the deep extraction of features, and further expands the receptive field through a cavity space pyramid pooling network ASPP, so that the coding processing is completed; the multi-level feature fusion network MFFM fuses deep and shallow features of the network in a decoding process, and then the fusion features are connected to a decoding end in a jumping mode, so that data information loss caused by an up-sampling operation is reduced.
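To make this data flow concrete, a minimal PyTorch-style sketch of the encoder stages is given below. This is only an illustration: the layer names follow torchvision's ResNet50, while the ASPP, HEAM and decoder are indicated only in comments, since the patent does not give their exact layer configurations.

```python
# Minimal sketch of the encoder data flow (illustrative assumptions only:
# module internals of ASPP/HEAM/decoder are placeholders, not taken from the patent).
import torch
from torchvision.models import resnet50

backbone = resnet50(weights=None)  # pre-trained weights would be loaded in practice

def backbone_stages(x):
    """Return the four ResNet50 stage outputs used by MFFM and the encoder."""
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
    c1 = backbone.layer1(x)   # shallow: high resolution, rich detail
    c2 = backbone.layer2(c1)
    c3 = backbone.layer3(c2)
    c4 = backbone.layer4(c3)  # deep: low resolution, strong semantics
    return c1, c2, c3, c4

img = torch.randn(1, 3, 720, 720)      # preprocessed street-view image
c1, c2, c3, c4 = backbone_stages(img)
# Encoding: c4 would pass through ASPP (with HEAM embedded: X_l = c4,
# X_h = ASPP output). Decoding: two 2x upsamplings, each followed by a
# skip connection from an MFFM fusion branch (sketched below).
print([t.shape for t in (c1, c2, c3, c4)])
```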
In this embodiment, a city street view image is selected as the input data, and the input image is preprocessed with methods such as random erasing, image flipping and smoothing. The image is resized to 720 × 720 before being fed into the city street view segmentation network shown in fig. 1. In the multi-level feature fusion module MFFM of this embodiment, the ResNet50 backbone is divided into 4 layers, and the features of each layer are extracted from the backbone as feature fusion branches. In the first branch, the third-layer and fourth-layer features are concatenated along the channel dimension and then fused with the second-layer features; because the feature maps differ in size, FPN multi-scale feature fusion is required, and the result is finally skip-concatenated with the features of the first 2× upsampling in the decoding block. In the second branch, after the concatenation of the first branch and a 2× upsampling operation, the first-layer and second-layer features are extracted from the backbone for the improved FPN fusion operation, and another branch is led out and skip-connected to the features of the second 2× upsampling. The improved network refines the single 4× upsampling of the original network into two 2× upsampling operations and skip-connects multi-stage deep features from the backbone after each upsampling, which effectively reduces the information loss during image recovery in the decoding block.
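A hypothetical sketch of the two MFFM fusion branches follows, where c1..c4 denote the four backbone stage outputs from the sketch above. The fusion operator is left here as a simple resize-and-concatenate placeholder; the improved FPN fusion with ECA that the embodiment actually uses is sketched after the ECA description below.

```python
# Hypothetical wiring of the two MFFM branches; spatial alignment by bilinear
# interpolation is an assumption made for illustration.
import torch
import torch.nn.functional as F

def simple_fuse(shallow, deep):
    """Placeholder fusion: resize the deep map to the shallow map and concatenate."""
    deep = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                         align_corners=False)
    return torch.cat([shallow, deep], dim=1)

def mffm_branches(c1, c2, c3, c4):
    # Branch 1: channel-concatenate layer3/layer4 features, fuse with layer2,
    # then skip-connect to the first 2x upsampling in the decoder.
    deep34 = simple_fuse(c3, c4)
    branch1 = simple_fuse(c2, deep34)
    # Branch 2: fuse layer1 and layer2 features, skip-connect to the
    # second 2x upsampling in the decoder.
    branch2 = simple_fuse(c1, c2)
    return branch1, branch2
```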
Shallow features correspond to the local information of the image, and their abundant local detail distinguishes simple targets; deep features correspond to the global information of the image, and global cues such as color, texture and shape distinguish finer, more complex targets. A feature pyramid network can fuse the multi-level features of the backbone and realize the segmentation of multiple targets of different sizes.
In this embodiment, the FPN fusion operation applied to the first-layer and second-layer features extracted from the backbone is shown in fig. 2, where there are three feature maps of different levels whose size and resolution decrease from bottom to top; through the FPN, two feature maps of different levels can be fused into a new feature map with the same size as the shallow feature. Taking the first and second layers of the multi-level feature fusion module MFFM as an example, the second layer, serving as the deep feature, first undergoes a 1 × 1 convolution to reduce its dimensionality and is then upsampled by bilinear interpolation so that its size is expanded to that of the first-layer feature; the first layer, serving as the shallow feature, undergoes channel dimensionality reduction; the two features then pass through an ECA network to attend to important feature information, and finally the feature maps are fused by channel concatenation. The improved feature pyramid module obtains richer semantic and spatial information, and the added ECA network improves the feature extraction capability of the network, effectively enhancing the prediction accuracy of the network.
The ECA network improves on SENet, which reduces the dimensionality of the feature map; research has shown that this dimensionality reduction is detrimental to channel attention. As shown in fig. 3, without reducing dimensionality, ECA first performs channel-wise global average pooling, then captures the cross-channel interaction between each channel and its k neighboring channels through a one-dimensional convolution operation, and finally generates the channel weights through a nonlinear Sigmoid function. The channel weight in ECA can be calculated by equation (1):
ω=σ(Wy), (1)
where ω represents the channel weight of the entire ECA, σ is the Sigmoid function, W is a C × C channel weight matrix, and y is the input feature matrix. ω can in turn be represented as:
ω = σ(C1D_k(y)), (2)
where C1D denotes a one-dimensional convolution and k, the number of adjacent channels, is a preset value.
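As an illustration, the ECA weighting of equations (1) and (2) and the improved FPN fusion step described above might be sketched as follows. The kernel size k, the reduced channel count c_out and the reuse of the same ECA block on both inputs are assumptions, not specifics from the patent.

```python
# Sketch of ECA (eq. (1)-(2)) and of the improved FPN fusion that uses it.
# k, c_out and channel counts are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ECA(nn.Module):
    """Efficient Channel Attention: GAP, 1-D conv over k neighbouring channels, Sigmoid."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                      # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                 # channel-wise global average pooling
        w = self.conv(y.unsqueeze(1))          # 1-D conv across channels: C1D_k(y)
        w = torch.sigmoid(w).squeeze(1)        # omega = sigma(C1D_k(y))
        return x * w[:, :, None, None]         # reweight channels

class FPNFuse(nn.Module):
    """Improved FPN fusion: 1x1 convs to reduce dims, bilinear upsample the deep map,
    ECA attention on both inputs, then channel concatenation."""
    def __init__(self, c_shallow: int, c_deep: int, c_out: int):
        super().__init__()
        self.reduce_deep = nn.Conv2d(c_deep, c_out, kernel_size=1)
        self.reduce_shallow = nn.Conv2d(c_shallow, c_out, kernel_size=1)
        self.eca = ECA(k=3)

    def forward(self, shallow, deep):
        deep = self.reduce_deep(deep)
        deep = F.interpolate(deep, size=shallow.shape[-2:],
                             mode="bilinear", align_corners=False)
        shallow = self.reduce_shallow(shallow)
        return torch.cat([self.eca(shallow), self.eca(deep)], dim=1)
```

For example, fusing the first ResNet50 stage (256 channels) with the second (512 channels) could be written as `FPNFuse(256, 512, 64)(c1, c2)`, though the reduced channel count 64 is only an assumption.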
Referring to fig. 4, the height-driven efficient attention module HEAM provided by this embodiment compresses the width dimension to obtain per-channel weights along the height, generating channel-wise scaling factors from the height context information. The shallow feature map X_l passes through the HANet operation to generate a two-dimensional attention weight map A whose width dimension has been reduced; A is multiplied element-wise with the deep feature map X_h to obtain a brand-new three-dimensional feature map X̃_h ∈ R^(C_h×H_h×W_h) that depends on position along the height direction. In parallel, the deep feature map X_h is processed by the ECA network to generate X̂_h ∈ R^(C_h×H_h×W_h), and the two feature maps generated in parallel are added element-wise to produce the final output, realizing the height-driven enhancement of the features.
In this embodiment, the HEAM generates, via HANet, a channel attention map A ∈ R^(C_h×H_h×1) consisting of per-channel height scaling factors; X̃_h is obtained by multiplying the attention weight map A element-wise with the deep feature map X_h; X̂_h is generated from the deep feature map through the ECA network; finally X̃_h and X̂_h are fused to produce X_o, as shown in equations (3), (4) and (5):

X̃_h = F_HANet(X_l) ⊙ X_h, (3)

X̂_h = F_ECA(X_h), (4)

X_o = X̃_h ⊕ X̂_h, (5)

where ⊙ denotes element-wise multiplication and ⊕ denotes element-wise addition.
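A minimal sketch of this fusion is given below, assuming the attention map A has already been computed by F_HANet (a sketch of which follows the five steps below) and reusing the ECA module from the earlier sketch.

```python
# Minimal sketch of eq. (3)-(5); `a` is the height attention map produced by
# F_HANet from the shallow map X_l, `eca` is an ECA module as sketched above.
def heam_fuse(a, x_h, eca):
    """a: (B, C_h, H_h, 1); x_h: deep feature map (B, C_h, H_h, W_h)."""
    x_tilde = a * x_h        # eq. (3): element-wise product, broadcast along width
    x_hat = eca(x_h)         # eq. (4): ECA applied to the deep feature map
    return x_tilde + x_hat   # eq. (5): element-wise addition -> X_o
```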
F_HANet specifically comprises the following five steps (a), (b), (c), (d) and (e):
(a) Width pooling: the feature map X_l ∈ R^(C_l×H_l×W_l) is first compressed along the width dimension to generate a feature map Z ∈ R^(C_l×H_l×1), which captures the height context information of each row, as shown in equation (6):

Z = G_pool(X_l), (6)

(b) and (d) Interpolation: because the content of city street view images differs greatly along the height direction, not all row information of the matrix Z needs to be considered, so Z is downsampled by interpolation to generate a feature map Ẑ ∈ R^(C_l×Ĥ×1), where Ĥ is a hyperparameter set in the invention; step (d) then upsamples again to restore the dimension to C_l×H_l×1.
(c) Height-driven attention map computation: the feature map Ẑ is used as the input of convolution operations to generate the attention map, which accounts for the relationship between adjacent rows better than fully connected layers would. The attention map A obtained from N convolutional layers can be expressed by equation (7):

A = σ(G_up(F^N_conv(δ(F^(N-1)_conv(… δ(F^1_conv(Ẑ)) …))))), (7)

where σ denotes the Sigmoid function, δ denotes the ReLU activation function, F^i_conv denotes the i-th one-dimensional convolutional layer, and G_up denotes the upsampling operation. In the present invention the hyperparameter N is set to 3, i.e. three convolutional layers are used: the first convolution compresses the channels by a factor of r, giving Q_1 ∈ R^((C_l/r)×Ĥ×1); the second convolution stretches the channels by a factor of 2, giving Q_2 ∈ R^((2C_l/r)×Ĥ×1); the last convolution restores the channels to C_h, giving Q_3 ∈ R^(C_h×Ĥ×1).
(e) Position coding: since a person has prior knowledge of where objects appear while observing during driving, the invention is inspired to add sinusoidal position codes to the intermediate-layer feature map Q_i. The position codes are defined by equations (8) and (9):

PE(p, 2i) = sin(p / 100^(2i/C)), (8)

PE(p, 2i+1) = cos(p / 100^(2i/C)), (9)

where p represents the position factor in the vertical direction of the whole map and i indexes the vertical positions. The new feature map Q̂_i is produced by equation (10):

Q̂_i = Q_i ⊕ PE. (10)
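A compact sketch of F_HANet covering steps (a) to (e) is given below. The pooled height Ĥ (hat_h), the reduction ratio r, the 1-D kernel sizes, and the exact layer to which the position code is added are assumptions, since the patent leaves them as hyperparameters or does not state them.

```python
# Sketch of F_HANet (steps (a)-(e)); hat_h, r, kernel sizes and where PE is
# added are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sinusoidal_pe(channels: int, length: int) -> torch.Tensor:
    """Eq. (8)-(9): PE(p,2i)=sin(p/100^(2i/C)), PE(p,2i+1)=cos(p/100^(2i/C))."""
    pe = torch.zeros(channels, length)
    pos = torch.arange(length, dtype=torch.float32)            # vertical position p
    for i in range(0, channels, 2):
        div = 100.0 ** (i / channels)
        pe[i] = torch.sin(pos / div)
        if i + 1 < channels:
            pe[i + 1] = torch.cos(pos / div)
    return pe                                                   # (C, length)

class HANetAttention(nn.Module):
    def __init__(self, c_l: int, c_h: int, h_out: int, hat_h: int = 16, r: int = 8):
        super().__init__()
        self.hat_h, self.h_out = hat_h, h_out
        self.conv1 = nn.Conv1d(c_l, c_l // r, kernel_size=3, padding=1)           # compress by r
        self.conv2 = nn.Conv1d(c_l // r, 2 * c_l // r, kernel_size=3, padding=1)  # stretch x2
        self.conv3 = nn.Conv1d(2 * c_l // r, c_h, kernel_size=3, padding=1)       # restore to C_h
        self.register_buffer("pe", sinusoidal_pe(c_l // r, hat_h))

    def forward(self, x_l):                                     # x_l: (B, C_l, H_l, W_l)
        z = x_l.mean(dim=3)                                     # (a) width pooling -> (B, C_l, H_l)
        z = F.interpolate(z, size=self.hat_h, mode="linear",
                          align_corners=False)                  # (b) downsample rows
        q1 = F.relu(self.conv1(z)) + self.pe                    # (e) position code added to Q_1
        q2 = F.relu(self.conv2(q1))
        a = self.conv3(q2)                                      # (c) last 1-D convolution
        a = F.interpolate(a, size=self.h_out, mode="linear",
                          align_corners=False)                  # (d) G_up: restore height
        return torch.sigmoid(a).unsqueeze(-1)                   # A: (B, C_h, H_h, 1)
```

For instance, with a shallow map of shape (B, 512, 90, 90) and a deep map of 2048 channels and height 45, `HANetAttention(c_l=512, c_h=2048, h_out=45)(x_l)` would return a (B, 2048, 45, 1) attention map usable by `heam_fuse` above.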
in the embodiment, a ResNet50 feature extraction network is pre-trained on an Imagenet classification data set, and then a pre-training model is used for carrying out migration training on a network model; calculating a gradient by using a random gradient (SGD), wherein the initial learning rate lr is 1e-2, the momentum is 0.9, the weight attenuation degree is 5e-4, and the learning rate attenuation adopts a poly strategy; when training the city street view data set CamVid, the input size resize is 720 × 720, the batch size is 4, the number of training iterations is 14000 (300epoch), and the loss function is the cross entropy loss function.
When the trained model is tested according to this embodiment, multi-class segmentation maps of the image are output, in which different classes of semantic information are marked with different colors, assisting an automatic driving system in distinguishing city street view targets.
The method combines the positional prior of city street view images and adds a height-driven efficient attention module (HEAM) to the improved network, thereby enhancing the network's ability to extract features of targets at different height positions. The HEAM is embedded in the ResNet50 feature extraction network and in the atrous spatial pyramid pooling (ASPP) structure, so that the extraction of deep features and multi-scale features is improved when the network performs deep convolution and multi-layer dilated convolution through the ASPP. The HEAM extracts height context information representing each horizontal partition and predicts the features or categories of each horizontal partition from this information. For a multi-target semantic segmentation task, different target objects have different sizes, and segmentation with features from a single layer makes it difficult to segment multiple targets accurately. Shallow features in the network have high resolution and contain more position and detail information, whereas deep features have low resolution and weak perception of detail but stronger semantic information. Segmentation accuracy is therefore improved by fusing shallow and deep features so that the features carry a more complete information representation.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A city street view segmentation method based on height-driven efficient attention and multi-level feature fusion, characterized by comprising the following steps: inputting a city street view image into a city street view segmentation network to obtain a city street view whose segmentation performance meets preset conditions;
the city street view segmentation network comprises a ResNet50 feature extraction network, a multi-level feature fusion module MFFM, a height-driven efficient attention module HEAM, and an atrous spatial pyramid pooling (ASPP) module;
the HEAM is embedded in both the ResNet50 feature extraction network and the ASPP module; a shallow feature map X_l ∈ R^(C_l×H_l×W_l) of the ResNet50 feature extraction network is the input of the HEAM and a deep feature map X_h ∈ R^(C_h×H_h×W_h) is its output, where C_x is the number of channels and H_x and W_x are the height and width of the feature map, with x = l, h; when the HEAM is embedded in the ASPP module, the output of the ResNet50 feature extraction network serves as the shallow feature map X_l and the output of the ASPP module serves as the deep feature map X_h, thereby realizing the attention operation of the HEAM;
the input city street view image first enters the ResNet50 feature extraction network for deep feature extraction, and the receptive field is then enlarged through the ASPP module, completing the encoding; during decoding, the multi-level feature fusion module MFFM fuses deep and shallow features of the network, and the fused features are skip-connected to the decoding end, reducing the data information loss caused by the upsampling operation.
2. The city street view segmentation method based on height-driven efficient attention and multi-level feature fusion according to claim 1, characterized in that: the multi-level feature fusion module MFFM divides the ResNet50 feature extraction network into 4 layers and extracts the features of each layer from the ResNet50 feature extraction network as feature fusion branches; in the first branch, the third-layer and fourth-layer features are concatenated along the channel dimension, then fused with the second-layer features by FPN multi-scale feature fusion, and finally skip-concatenated with the features of the first 2× upsampling in the decoding block; in the second branch, after the concatenation of the first branch and a 2× upsampling operation, the first-layer and second-layer features are extracted from the ResNet50 feature extraction network for the FPN fusion operation, and another branch is led out and skip-connected to the features of the second 2× upsampling.
3. The city street view segmentation method based on height-driven efficient attention and multi-level feature fusion according to claim 2, characterized in that: in the second branch, the first-layer and second-layer features are extracted from the ResNet50 feature extraction network for the improved FPN fusion operation, in which the second layer, serving as the deep feature, first has its dimensionality reduced by a 1 × 1 convolution and is then upsampled by bilinear interpolation so that its size is expanded to that of the first-layer feature; the first layer, serving as the shallow feature, undergoes channel dimensionality reduction; the two features then pass through an ECA network to attend to important feature information, and finally the feature maps are fused by channel concatenation.
4. The city street view segmentation method based on height-driven efficient attention and multi-level feature fusion according to claim 3, characterized in that: without reducing dimensionality, the ECA network first performs channel-wise global average pooling, then captures the cross-channel interaction between each channel and its k neighboring channels through a one-dimensional convolution operation, and finally generates the channel weights through a nonlinear Sigmoid function; in the ECA, the channel weight is ω = σ(Wy), where ω represents the channel weight of the entire ECA, σ is the Sigmoid function, W is a C × C channel weight matrix, and y is the input feature matrix; ω can also be expressed as ω = σ(C1D_k(y)), where C1D denotes a one-dimensional convolution, k is the number of adjacent channels, and k is a preset value.
5. The city street view segmentation method based on height-driven efficient attention and multi-level feature fusion according to claim 1, characterized in that: in the height-driven efficient attention module HEAM, the shallow feature map X_l passes through the HANet operation to generate a two-dimensional attention weight map A whose width dimension has been reduced; the attention weight map A is multiplied element-wise with the deep feature map X_h to obtain a brand-new three-dimensional feature map X̃_h ∈ R^(C_h×H_h×W_h) that depends on position along the height direction; the deep feature map X_h is further processed by the ECA network to generate X̂_h ∈ R^(C_h×H_h×W_h); the two feature maps generated in parallel are added element-wise to produce the final output, thereby realizing the height-driven enhancement of the features.
6. The city street view segmentation method based on height-driven efficient attention and multi-level feature fusion according to claim 5, characterized in that: the height-driven efficient attention module HEAM generates, via HANet, a channel attention map A ∈ R^(C_h×H_h×1) consisting of per-channel height scaling factors; X̃_h is obtained by multiplying the attention weight map A element-wise with the deep feature map X_h; X̂_h is generated from the deep feature map X_h through the ECA network; finally X̃_h and X̂_h are fused to produce X_o, as shown in equations (3), (4) and (5):

X̃_h = F_HANet(X_l) ⊙ X_h, (3)

X̂_h = F_ECA(X_h), (4)

X_o = X̃_h ⊕ X̂_h, (5)

where ⊙ denotes element-wise multiplication and ⊕ denotes element-wise addition;
F_HANet comprises five steps (a), (b), (c), (d) and (e), in which step (a) is width pooling: the feature map X_l ∈ R^(C_l×H_l×W_l) is first compressed along the width dimension to generate a feature map Z ∈ R^(C_l×H_l×1), capturing the height context information of each row, as shown in equation (6):

Z = G_pool(X_l), (6)

steps (b) and (d) are interpolation: Z is downsampled by interpolation to generate a feature map Ẑ ∈ R^(C_l×Ĥ×1), and step (d) then restores the dimension to C_l×H_l×1 by upsampling;
step (c) is the height-driven attention map computation: the feature map Ẑ is used as the input of convolution operations to generate the attention map; the attention map A obtained from N convolutional layers is expressed by equation (7):

A = σ(G_up(F^N_conv(δ(F^(N-1)_conv(… δ(F^1_conv(Ẑ)) …))))), (7)

where σ denotes the Sigmoid function, δ denotes the ReLU activation function, F^i_conv denotes the i-th one-dimensional convolutional layer, and G_up denotes the upsampling operation; the hyperparameter N is 3, i.e. three convolutional layers are used: the first convolution compresses the channels by a factor of r, giving Q_1 ∈ R^((C_l/r)×Ĥ×1); the second convolution stretches the channels by a factor of 2, giving Q_2 ∈ R^((2C_l/r)×Ĥ×1); the last convolution restores the channels to C_h, giving Q_3 ∈ R^(C_h×Ĥ×1);
step (e) is position coding: sinusoidal position codes are added to the intermediate-layer feature map Q_i, and the position codes are defined by equations (8) and (9):

PE(p, 2i) = sin(p / 100^(2i/C)), (8)

PE(p, 2i+1) = cos(p / 100^(2i/C)), (9)

where p represents the position factor in the vertical direction of the whole map and i indexes the vertical positions; the new feature map Q̂_i is produced by equation (10):

Q̂_i = Q_i ⊕ PE. (10)
7. The city street view segmentation method based on height-driven efficient attention and multi-level feature fusion according to any one of claims 1 to 4, characterized in that: a trained city street view segmentation network is obtained after training;
the ResNet50 feature extraction network is first pre-trained on the ImageNet classification dataset, and the pre-trained model is then used for transfer training of the network model; gradients are computed with stochastic gradient descent, with an initial learning rate lr = 1e-2, momentum 0.9 and weight decay 5e-4, and the learning rate decays according to a poly policy; when training on the city street view dataset CamVid, the input is resized to 720 × 720, the batch size is 4, the number of training iterations is 14000, and the loss function is a cross-entropy loss function.
CN202210148745.7A 2022-02-18 2022-02-18 Urban street view segmentation method based on highly effective driving and multi-level feature fusion Active CN114638836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210148745.7A CN114638836B (en) 2022-02-18 2022-02-18 Urban street view segmentation method based on highly effective driving and multi-level feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210148745.7A CN114638836B (en) 2022-02-18 2022-02-18 Urban street view segmentation method based on highly effective driving and multi-level feature fusion

Publications (2)

Publication Number Publication Date
CN114638836A true CN114638836A (en) 2022-06-17
CN114638836B CN114638836B (en) 2024-04-30

Family

ID=81945671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210148745.7A Active CN114638836B (en) 2022-02-18 2022-02-18 Urban street view segmentation method based on highly effective driving and multi-level feature fusion

Country Status (1)

Country Link
CN (1) CN114638836B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160157828A1 (en) * 2014-06-05 2016-06-09 Chikayoshi Sumi Beamforming method, measurement and imaging instruments, and communication instruments
CN110188817A (en) * 2019-05-28 2019-08-30 厦门大学 A kind of real-time high-performance street view image semantic segmentation method based on deep learning
CN111563909A (en) * 2020-05-10 2020-08-21 中国人民解放军91550部队 Semantic segmentation method for complex street view image
CN112651423A (en) * 2020-11-30 2021-04-13 深圳先进技术研究院 Intelligent vision system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HE K, ZHANG X, REN S, et al., "Deep residual learning for image recognition", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 30 June 2016 (2016-06-30) *
PENG Yufeng; TAN Jianping; CHEN Hui; QUAN Lingyun: "Research and application of an image segmentation method for visual detection of the lateral offset of the moving crossbeam of a 300 MN hydraulic press", 仪表技术与传感器 (Instrument Technique and Sensor), no. 05, 15 May 2010 (2010-05-15) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446384A (en) * 2019-12-06 2021-03-05 黑芝麻智能科技(上海)有限公司 Fast instance partitioning
CN112446384B (en) * 2019-12-06 2024-05-31 黑芝麻智能科技(上海)有限公司 Fast instance partitioning
CN115035299A (en) * 2022-06-20 2022-09-09 河南大学 Improved city street view image segmentation method based on deep learning
CN115690704A (en) * 2022-09-27 2023-02-03 淮阴工学院 LG-CenterNet model-based complex road scene target detection method and device
CN115690704B (en) * 2022-09-27 2023-08-22 淮阴工学院 LG-CenterNet model-based complex road scene target detection method and device
CN116188584A (en) * 2023-04-23 2023-05-30 成都睿瞳科技有限责任公司 Method and system for identifying object polishing position based on image
CN116188584B (en) * 2023-04-23 2023-06-30 成都睿瞳科技有限责任公司 Method and system for identifying object polishing position based on image

Also Published As

Publication number Publication date
CN114638836B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN110111366B (en) End-to-end optical flow estimation method based on multistage loss
CN111582316B (en) RGB-D significance target detection method
CN112347859B (en) Method for detecting significance target of optical remote sensing image
CN114638836B (en) Urban street view segmentation method based on highly effective driving and multi-level feature fusion
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN111612807B (en) Small target image segmentation method based on scale and edge information
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN111539887B (en) Channel attention mechanism and layered learning neural network image defogging method based on mixed convolution
CN113469094A (en) Multi-mode remote sensing data depth fusion-based earth surface coverage classification method
CN111931787A (en) RGBD significance detection method based on feature polymerization
CN110929736A (en) Multi-feature cascade RGB-D significance target detection method
CN112329780B (en) Depth image semantic segmentation method based on deep learning
CN112270366B (en) Micro target detection method based on self-adaptive multi-feature fusion
CN113076957A (en) RGB-D image saliency target detection method based on cross-modal feature fusion
CN114549574A (en) Interactive video matting system based on mask propagation network
CN113538243B (en) Super-resolution image reconstruction method based on multi-parallax attention module combination
CN112991350A (en) RGB-T image semantic segmentation method based on modal difference reduction
CN114359293A (en) Three-dimensional MRI brain tumor segmentation method based on deep learning
CN115272437A (en) Image depth estimation method and device based on global and local features
CN116778165A (en) Remote sensing image disaster detection method based on multi-scale self-adaptive semantic segmentation
CN116402851A (en) Infrared dim target tracking method under complex background
CN116485867A (en) Structured scene depth estimation method for automatic driving
CN112288690A (en) Satellite image dense matching method fusing multi-scale and multi-level features
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
CN115063704A (en) Unmanned aerial vehicle monitoring target classification method based on three-dimensional feature fusion semantic segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant