WO2018076212A1 - De-convolutional neural network-based scene semantic segmentation method - Google Patents

De-convolutional neural network-based scene semantic segmentation method Download PDF

Info

Publication number
WO2018076212A1
WO2018076212A1 (PCT/CN2016/103425)
Authority
WO
WIPO (PCT)
Prior art keywords
layer
neural network
picture
scene
semantic segmentation
Prior art date
Application number
PCT/CN2016/103425
Other languages
French (fr)
Chinese (zh)
Inventor
黄凯奇
赵鑫
程衍华
Original Assignee
Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Priority to PCT/CN2016/103425 priority Critical patent/WO2018076212A1/en
Publication of WO2018076212A1 publication Critical patent/WO2018076212A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • The invention relates to the fields of pattern recognition, machine learning, and computer vision, and in particular to a scene semantic segmentation method based on a deconvolutional neural network.
  • Scene semantic segmentation uses a computer to analyze images intelligently and then determine the object category to which each pixel in the image belongs, such as floor, wall, person, or chair.
  • Traditional scene semantic segmentation algorithms generally rely only on RGB (red, green, and blue) images for segmentation, so they are easily affected by lighting changes, object color changes, and background noise; they are not robust in practical use, and their accuracy hardly meets user needs.
  • In view of the above problems in the prior art, the present invention proposes a scene semantic segmentation method based on a deconvolutional neural network to improve the accuracy of scene semantic segmentation.
  • The scene semantic segmentation method based on a deconvolutional neural network of the present invention comprises the following steps:
  • Step S1: extracting a dense feature expression for the scene picture with a fully convolutional neural network;
  • Step S2: using a locality-sensitive deconvolutional neural network and the local affinity matrix of the picture, upsampling and optimizing the dense feature expression obtained in step S1 to obtain a score map of the picture, thereby achieving fine scene semantic segmentation.
  • The local affinity matrix is obtained by extracting SIFT (scale-invariant feature transform) features, SPIN (spin image) features, and gradient features of the picture, and then applying the ucm-gPb (contour detection and hierarchical image segmentation) algorithm.
  • The locality-sensitive deconvolutional neural network is formed by stacking three modules several times; the three modules are a locality-sensitive unpooling layer, a deconvolution layer, and a locality-sensitive average pooling layer.
  • The number of stacking repetitions is 2 or 3.
  • The output of the locality-sensitive unpooling layer is obtained by the formula Y_{i,j} = A_{i,j} · x,
  • where x represents the feature vector of a pixel in the feature map and A = {A_{i,j}} is an s×s local affinity matrix centered on x that indicates whether the surrounding pixels are similar to the center pixel,
  • (i, j) and (o, o) represent an arbitrary position and the center position in the affinity matrix, respectively,
  • and Y = {Y_{i,j}} is the feature map output by the unpooling operation.
  • The scene picture includes an RGB picture and a depth picture.
  • The method further includes step S3: optimally fusing the RGB score map and the depth score map through a gated fusion layer, thereby achieving finer scene semantic segmentation.
  • The gated fusion layer includes a concatenation layer, a convolution layer, and a normalization layer.
  • The convolution layer is implemented by the function C = W ⊗ [P_rgb, P_depth], where P_rgb ∈ ℝ^{c×h×w} is the score map predicted from the RGB data, P_depth ∈ ℝ^{c×h×w} is the score map predicted from the depth data, W ∈ ℝ^{c×2c×1×1} is the filter bank learned by the gated fusion layer,
  • and C ∈ ℝ^{c×h×w} is the contribution coefficient matrix output by the convolution.
  • The normalization layer is implemented by a sigmoid function (an S-shaped function, also referred to as an S-shaped growth curve).
  • The locality-sensitive deconvolutional neural network uses local low-level information to strengthen the sensitivity of the fully convolutional neural network to local edges, thereby obtaining higher-precision scene segmentation; it can effectively overcome the inherent defect of fully convolutional networks,
  • namely that very large amounts of context information are aggregated for scene segmentation, which blurs object edges.
  • By designing the gated fusion layer, the different roles played by the RGB and depth modalities for different objects in different scenes can be learned automatically and effectively during semantic segmentation.
  • This dynamically adaptive contribution coefficient is superior to the undifferentiated treatment used by traditional algorithms and can further improve scene segmentation accuracy.
  • Figure 1 is a flow chart of one embodiment of the method of the present invention.
  • Figure 2 is a schematic diagram of the fully convolutional neural network used for dense feature extraction in the present invention.
  • Figure 3a is a schematic diagram of a locality-sensitive deconvolutional neural network according to an embodiment of the present invention.
  • Figure 3b is a schematic diagram of a locality-sensitive unpooling layer and a locality-sensitive average pooling layer in accordance with one embodiment of the present invention.
  • Figure 4 shows the gated fusion layer in accordance with an embodiment of the present invention.
  • A scene semantic segmentation method based on a deconvolutional neural network includes the following steps:
  • Step S1: extracting a low-resolution dense feature expression for the scene picture with a fully convolutional neural network;
  • Step S2: using a locality-sensitive deconvolutional neural network and the local affinity matrix of the picture, upsampling and optimizing the dense feature expression obtained in step S1 to obtain a score map of the picture, thereby achieving fine scene semantic segmentation.
  • The present invention employs a fully convolutional neural network to efficiently extract dense features of a picture, which may be an RGB picture and/or a depth picture.
  • Through repeated convolution, downsampling, and max pooling, the fully convolutional neural network aggregates rich context information
  • to describe each pixel in the picture, yielding an RGB feature map S1 and/or a depth feature map S1.
  • However, the fully convolutional neural network yields a low-resolution feature map with very blurred edges.
  • The present invention therefore embeds low-level, pixel-level information into the deconvolutional neural network to guide the training of the network.
  • The locality-sensitive deconvolutional neural network is used to perform upsampling learning and object edge optimization, yielding an RGB score map S2 and/or a depth score map S2 and thus achieving finer scene semantic segmentation.
  • In step S2, the similarity relationship between each pixel in the picture and its neighboring pixels is first calculated, and a binarized local affinity matrix is obtained.
  • SIFT, SPIN, and gradient features of the RGB and depth pictures can be extracted, and the local affinity matrix is obtained with the ucm-gPb algorithm.
  • The network structure can include three modules: a locality-sensitive unpooling layer, a deconvolution layer, and a locality-sensitive average pooling layer.
  • The inputs of the locality-sensitive unpooling layer are the feature map response of the previous layer and the local affinity matrix; the output is a feature map response at twice the resolution.
  • The main function of this network layer is to learn to recover the richer detail of the original picture and to obtain segmentations with sharper object edges.
  • The output of the locality-sensitive unpooling layer can be obtained by the formula Y_{i,j} = A_{i,j} · x,
  • where x represents the feature vector of a pixel in the feature map and A = {A_{i,j}} is an s×s binarized local affinity matrix centered on x.
  • The input of the deconvolution layer is the output of the preceding unpooling layer, and the output is a feature map response at the same resolution.
  • This network layer is mainly used to smooth the feature map: the unpooling layer tends to produce many broken object edges, and the deconvolution process learns to stitch these breaks together.
  • Deconvolution is the inverse of convolution, mapping each activation value to multiple output activations; the response map becomes relatively smoother after deconvolution.
  • The inputs of the locality-sensitive average pooling layer are the output of the preceding deconvolution layer and the local affinity matrix, and the output is a feature map response at the same resolution.
  • This network layer is mainly used to obtain a more robust feature expression for each pixel while maintaining sensitivity to object edges.
  • The invention stacks the locality-sensitive unpooling layer, the deconvolution layer, and the locality-sensitive average pooling layer several times, gradually upsampling and refining the details of the scene segmentation to obtain a finer and more accurate segmentation result.
  • The number of stacking repetitions is 2 or 3; the more repetitions, the finer and more accurate the segmentation, but the larger the amount of computation.
  • RGB color information and depth information describe different modalities of the objects in a scene.
  • RGB images can describe the appearance, color, and texture features of an object,
  • while depth data provide the spatial geometry, shape, and size of an object. Effectively fusing these two complementary sources of information can improve the accuracy of scene semantic segmentation.
  • Existing methods basically treat the data of the two modalities as equivalent and cannot distinguish their different contributions when identifying different objects in different scenes.
  • The RGB score map and the depth score map obtained in the above steps S1 and S2 are optimally fused through gated fusion to obtain a fused score map, thereby achieving finer scene semantic segmentation, as shown in Figure 4.
  • The gated fusion layer can effectively measure the importance of RGB (appearance) and depth (shape) information for identifying different objects in different scenes.
  • The gated fusion layer of the present invention is mainly composed of a concatenation layer, a convolution layer, and a normalization layer; it can automatically learn the weights of the two modalities and thus better fuse
  • their complementary information for scene semantic segmentation.
  • The features obtained from the RGB and depth networks are first concatenated through the concatenation layer.
  • Next comes the convolution operation: the convolution layer learns the weight matrix of the RGB and depth information.
  • The convolution process can be implemented as C = W ⊗ [P_rgb, P_depth], where:
  • P_rgb ∈ ℝ^{c×h×w} (a feature map of c channels, each of height h and width w) is the score map predicted from the RGB data,
  • P_depth ∈ ℝ^{c×h×w} (parameters as defined above) is the score map predicted from the depth data,
  • W ∈ ℝ^{c×2c×1×1} (c filters, each a 2c×1×1 three-dimensional matrix) is the filter bank learned by the gated fusion layer,
  • and C ∈ ℝ^{c×h×w} is the contribution coefficient matrix output by the convolution.
  • ⊙ denotes element-wise matrix multiplication. With C_rgb = C and C_depth = 1 − C, the contribution coefficients are applied as P̂_rgb = C_rgb ⊙ P_rgb and P̂_depth = C_depth ⊙ P_depth, and the RGB and depth scores are added to give the final fusion score P_fuse = P̂_rgb + P̂_depth. Based on the final score map, the semantic segmentation result can be obtained.
  • The new locality-sensitive deconvolutional neural network proposed by the present invention can be used for RGB-D indoor scene semantic segmentation.
  • The invention adapts well to the lighting changes, background clutter, many small objects, and occlusions of indoor scenes, and can exploit the complementarity of RGB and depth more effectively to obtain a more robust, more accurate scene semantic segmentation result with better-preserved object edges.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a deconvolutional neural network-based scene semantic segmentation method. The method comprises the following steps: S1, extracting a dense feature expression for a scene picture using a fully convolutional neural network; and S2, performing upsampling learning and object edge optimization on the dense feature expression obtained in step S1 with a locality-sensitive deconvolutional neural network and the local affinity matrix of the picture, so as to obtain a score map of the picture and thereby achieve refined scene semantic segmentation. Through the locality-sensitive deconvolutional neural network, local low-level information is used to strengthen the sensitivity of the fully convolutional neural network to local edges, so that higher-precision scene segmentation is obtained.

Description

Scene Semantic Segmentation Method Based on a Deconvolutional Neural Network

Technical Field

The invention relates to the fields of pattern recognition, machine learning, and computer vision, and in particular to a scene semantic segmentation method based on a deconvolutional neural network.

Background

With the rapid advance of computing power, fields such as computer vision, artificial intelligence, and machine perception have also developed rapidly. Scene semantic segmentation, one of the basic problems in computer vision, has likewise made great progress. Scene semantic segmentation uses a computer to analyze an image intelligently and then determine the object category to which each pixel in the image belongs, such as floor, wall, person, or chair. Traditional scene semantic segmentation algorithms generally rely only on RGB (red, green, and blue) pictures for segmentation, so they are easily disturbed by lighting changes, object color changes, and cluttered backgrounds; they are not robust in practical use, and their accuracy hardly meets user needs.

The development of depth sensing technology, such as Microsoft's Kinect, makes it possible to capture high-precision depth pictures, which compensates well for the above shortcomings of traditional RGB pictures and makes robust, high-precision object recognition possible. In computer vision and robotics there is a large body of research on how to use RGB and depth information effectively to improve the accuracy of scene segmentation. These algorithms basically use state-of-the-art fully convolutional neural networks for scene segmentation, but every unit of a fully convolutional network has a very large receptive field, which easily makes the edges of the segmented objects very coarse. Moreover, the simplest superposition strategy is usually adopted when fusing RGB and depth information, without considering that the data of the two modalities play very different roles when distinguishing different objects in different scenes, so many objects are misclassified during semantic segmentation.
Summary of the Invention

In view of the above problems in the prior art, the present invention proposes a scene semantic segmentation method based on a deconvolutional neural network to improve the accuracy of scene semantic segmentation.

The scene semantic segmentation method based on a deconvolutional neural network of the present invention comprises the following steps:

Step S1: extracting a dense feature expression for the scene picture with a fully convolutional neural network;

Step S2: using a locality-sensitive deconvolutional neural network and the local affinity matrix of the picture, upsampling and optimizing the dense feature expression obtained in step S1 to obtain a score map of the picture, thereby achieving fine scene semantic segmentation.

Further, the local affinity matrix is obtained by extracting SIFT (scale-invariant feature transform) features, SPIN (spin image) features, and gradient features of the picture, and then applying the ucm-gPb (contour detection and hierarchical image segmentation) algorithm.

Further, the locality-sensitive deconvolutional neural network is formed by stacking three modules several times; the three modules are a locality-sensitive unpooling layer, a deconvolution layer, and a locality-sensitive average pooling layer.

Further, the number of stacking repetitions is 2 or 3.
Further, the output of the locality-sensitive unpooling layer is obtained by the following formula:

Y_{i,j} = A_{i,j} · x

where x represents the feature vector of a pixel in the feature map, A = {A_{i,j}} is an s×s local affinity matrix centered on x that indicates whether the surrounding pixels are similar to the center pixel, (i, j) and (o, o) represent an arbitrary position and the center position in the affinity matrix, respectively, and Y = {Y_{i,j}} is the feature map output by the unpooling operation.

Further, the locality-sensitive average pooling layer is implemented by the following formula:

y = ( Σ_{(i,j)} A_{i,j} X_{i,j} ) / ( Σ_{(i,j)} A_{i,j} )

where y is the output feature vector, A = {A_{i,j}} is an s×s local affinity matrix centered on y, A_{i,j} indicates whether a surrounding pixel is similar to the center pixel, (i, j) and (o, o) represent an arbitrary position and the center position in the affinity matrix, respectively, and X = {X_{i,j}} is the input feature map.
Further, in step S1 the scene picture includes an RGB picture and a depth picture, and the method further includes step S3: optimally fusing the obtained RGB score map and depth score map through a gated fusion layer, thereby achieving finer scene semantic segmentation.

Further, the gated fusion layer includes a concatenation layer, a convolution layer, and a normalization layer.

Further, the convolution layer is implemented by the following function:

C = W ⊗ [P_rgb, P_depth]

where P_rgb ∈ ℝ^{c×h×w} is the score map predicted from the RGB data, P_depth ∈ ℝ^{c×h×w} is the score map predicted from the depth data, [P_rgb, P_depth] denotes their concatenation along the channel dimension, W ∈ ℝ^{c×2c×1×1} is the filter bank learned by the gated fusion layer, and C ∈ ℝ^{c×h×w} is the contribution coefficient matrix output by the convolution.

Further, the normalization layer is implemented by a sigmoid function (an S-shaped function, also referred to as an S-shaped growth curve).

In the present invention, the locality-sensitive deconvolutional neural network uses local low-level information to strengthen the sensitivity of the fully convolutional neural network to local edges, thereby obtaining higher-precision scene segmentation. This effectively overcomes the inherent defect of fully convolutional networks, namely that very large amounts of context information are aggregated for scene segmentation, which blurs object edges.

Further, by designing the gated fusion layer, the different roles played by the RGB and depth modalities for different objects in different scenes can be learned automatically and effectively during semantic segmentation. This dynamically adaptive contribution coefficient is superior to the undifferentiated treatment used by traditional algorithms and can further improve scene segmentation accuracy.
Brief Description of the Drawings

Figure 1 is a flow chart of one embodiment of the method of the present invention;

Figure 2 is a schematic diagram of the fully convolutional neural network used for dense feature extraction in the present invention;

Figure 3a is a schematic diagram of a locality-sensitive deconvolutional neural network according to an embodiment of the present invention;

Figure 3b is a schematic diagram of the locality-sensitive unpooling layer and the locality-sensitive average pooling layer according to an embodiment of the present invention;

Figure 4 shows the gated fusion layer according to an embodiment of the present invention.

Detailed Description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit its scope of protection.
As shown in Figure 1, a scene semantic segmentation method based on a deconvolutional neural network according to one embodiment of the present invention includes the following steps:

Step S1: extracting a low-resolution dense feature expression for the scene picture with a fully convolutional neural network;

Step S2: using a locality-sensitive deconvolutional neural network and the local affinity matrix of the picture, upsampling and optimizing the dense feature expression obtained in step S1 to obtain a score map of the picture, thereby achieving fine scene semantic segmentation.

Scene semantic segmentation is a typical dense prediction problem: the semantic category of every pixel in the picture must be predicted, so a robust feature expression must be extracted for every pixel. The present invention uses a fully convolutional neural network to efficiently extract dense features of the picture, which may be an RGB picture and/or a depth picture. As shown in Figure 2, through repeated convolution, downsampling, and max pooling, the fully convolutional network aggregates rich context information to describe every pixel of the picture, yielding an RGB feature map S1 and/or a depth feature map S1. However, because of the repeated downsampling and max pooling, the fully convolutional network produces a low-resolution feature map in which object edges are very blurred.
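For illustration only, the sketch below shows one way such a fully convolutional feature extractor could be put together; the use of PyTorch, the three-stage layout, and the channel widths are assumptions made for the example rather than details fixed by the present method.

```python
# Minimal sketch (PyTorch, assumed library) of the step-S1 idea: repeated
# convolution + max pooling aggregates context and yields a dense but
# low-resolution feature map. Channel widths and depth are illustrative.
import torch
import torch.nn as nn

class ToyFCNBackbone(nn.Module):
    def __init__(self, in_channels=3, widths=(64, 128, 256)):
        super().__init__()
        layers, c_prev = [], in_channels
        for c in widths:
            layers += [
                nn.Conv2d(c_prev, c, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),  # halves the resolution
            ]
            c_prev = c
        self.features = nn.Sequential(*layers)

    def forward(self, x):
        return self.features(x)  # dense feature map at 1/8 resolution here

rgb = torch.randn(1, 3, 480, 640)      # an RGB scene picture
feat = ToyFCNBackbone()(rgb)           # e.g. 1 x 256 x 60 x 80
print(feat.shape)
```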
To this end, the present invention embeds low-level, pixel-level information into the deconvolutional neural network to guide the training of the network. The locality-sensitive deconvolutional neural network performs upsampling learning and object edge optimization on the dense feature expression, yielding an RGB score map S2 and/or a depth score map S2 and thus achieving finer scene semantic segmentation.

Specifically, in step S2 the similarity relationship between each pixel in the picture and its neighboring pixels is first calculated, and a binarized local affinity matrix is obtained. In the present invention, SIFT, SPIN, and gradient features of the RGB and depth pictures can be extracted, and the local affinity matrix is obtained with the ucm-gPb algorithm. The local affinity matrix, together with the obtained RGB feature map S1 and/or depth feature map S1, is then fed into the locality-sensitive deconvolutional neural network, which performs upsampling learning and object edge optimization on the dense feature expression to obtain finer scene semantic segmentation.
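As an illustration of how the binarized local affinity matrix can be formed once a low-level over-segmentation of the picture is available, the NumPy sketch below builds, for every pixel, an s×s window marking which neighbors fall in the same region as that pixel; the `regions` label map standing in for the ucm-gPb output and the window size are assumptions of the example.

```python
# Sketch (NumPy) of a binarized local affinity matrix. `regions` is assumed to
# be a region label map derived from ucm-gPb over SIFT/SPIN/gradient features;
# A[p, q, i, j] = 1 iff neighbor (i, j) lies in the same region as pixel (p, q).
import numpy as np

def local_affinity(regions: np.ndarray, s: int = 5) -> np.ndarray:
    h, w = regions.shape
    r = s // 2
    padded = np.pad(regions, r, mode="edge")
    A = np.zeros((h, w, s, s), dtype=np.uint8)
    for p in range(h):
        for q in range(w):
            window = padded[p:p + s, q:q + s]
            A[p, q] = (window == regions[p, q]).astype(np.uint8)
    return A

regions = np.array([[0, 0, 1],
                    [0, 1, 1],
                    [2, 2, 1]])
A = local_affinity(regions, s=3)
print(A[1, 1])   # 3x3 window: 1 where the neighbor shares region label 1
```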
The purpose of the locality-sensitive deconvolutional neural network is to upsample and refine the coarse feature map produced by the fully convolutional network so as to obtain more accurate scene segmentation. As shown in Figure 3a, the network structure can include three modules: a locality-sensitive unpooling layer, a deconvolution layer, and a locality-sensitive average pooling layer.

As shown in the upper part of Figure 3b, the inputs of the locality-sensitive unpooling layer are the feature map response of the previous layer and the local affinity matrix; its output is a feature map response at twice the resolution. The main function of this network layer is to learn to recover the richer detail of the original picture and to produce segmentations with sharper object edges.

In the present invention, the output of the locality-sensitive unpooling layer can be obtained by the following formula:

Y_{i,j} = A_{i,j} · x

where x represents the feature vector of a pixel in the feature map, A = {A_{i,j}} is an s×s binarized local affinity matrix centered on x that indicates whether the surrounding pixels are similar to the center pixel, (i, j) and (o, o) represent an arbitrary position and the center position in the affinity matrix, respectively, and Y = {Y_{i,j}} is the feature map output by the unpooling operation. Through this unpooling operation, a segmentation map with better resolution and more detail can be obtained.
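A minimal NumPy sketch of this unpooling rule for a single coarse pixel is given below; the 3×3 window size and the particular affinity pattern are illustrative assumptions, and the code simply applies the formula Y_{i,j} = A_{i,j} · x.

```python
# Sketch (NumPy) of locality-sensitive unpooling for one coarse pixel: the
# feature vector x is copied only to the positions the binarized affinity
# window marks as belonging to the same region as the center; others stay zero.
import numpy as np

def locality_sensitive_unpool(x: np.ndarray, A: np.ndarray) -> np.ndarray:
    """x: (c,) feature vector of one coarse pixel; A: (s, s) binary affinity."""
    s = A.shape[0]
    Y = np.zeros((s, s, x.shape[0]), dtype=x.dtype)
    for i in range(s):
        for j in range(s):
            if A[i, j]:          # similar to the center pixel -> copy x
                Y[i, j] = x
    return Y

x = np.array([0.7, -1.2, 3.0])                  # one coarse feature vector
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 0]])                       # 3x3 affinity, center (1, 1)
print(locality_sensitive_unpool(x, A)[..., 0])  # x copied only inside region
```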
The input of the deconvolution layer is the output of the preceding unpooling layer, and its output is a feature map response at the same resolution. This network layer is mainly used to smooth the feature map: the unpooling layer tends to produce many broken object edges, and the deconvolution process learns to stitch these breaks together. Deconvolution is the inverse of convolution, mapping each activation value to multiple output activations, so the response map becomes relatively smoother after deconvolution.
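Sketched below, under the assumption that the network is implemented in PyTorch, is a deconvolution (transposed convolution) layer of this kind: with stride 1 and matching padding it keeps the spatial resolution unchanged, and its weights can be trained to smooth the broken edges left by unpooling; the channel count and 5×5 kernel are illustrative.

```python
# Sketch (PyTorch, assumed library) of a same-resolution deconvolution layer
# placed after unpooling. Kernel size and channel count are assumptions.
import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(in_channels=256, out_channels=256,
                            kernel_size=5, stride=1, padding=2)
unpooled = torch.randn(1, 256, 120, 160)   # output of the unpooling layer
smoothed = deconv(unpooled)                # same spatial size: 1 x 256 x 120 x 160
print(smoothed.shape)
```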
As shown in the lower part of Figure 3b, the inputs of the locality-sensitive average pooling layer are the output of the preceding deconvolution layer and the local affinity matrix; its output is a feature map response at the same resolution. This network layer is mainly used to obtain a more robust feature expression for each pixel while maintaining sensitivity to object edges.

In the present invention, the output of the locality-sensitive average pooling layer can be obtained by the following formula:

y = ( Σ_{(i,j)} A_{i,j} X_{i,j} ) / ( Σ_{(i,j)} A_{i,j} )

where y is the output feature vector, A = {A_{i,j}} is an s×s binarized local affinity matrix centered on y, A_{i,j} indicates whether a surrounding pixel is similar to the center pixel, (i, j) and (o, o) represent an arbitrary position and the center position in the affinity matrix, respectively, and X = {X_{i,j}} is the input feature map of this operation. After locality-sensitive average pooling, a very robust feature expression is obtained while sensitivity to object edges is preserved.
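The following NumPy sketch applies this locality-sensitive average pooling formula to one pixel's s×s feature window; the window size, channel count, and the fallback used when no neighbor is marked similar are assumptions of the example.

```python
# Sketch (NumPy) of locality-sensitive average pooling for one pixel:
# y = sum_ij(A_ij * X_ij) / sum_ij(A_ij), so only neighbors in the same region
# as the center contribute, which keeps the result edge-sensitive.
import numpy as np

def locality_sensitive_avg_pool(X: np.ndarray, A: np.ndarray) -> np.ndarray:
    """X: (s, s, c) input feature window; A: (s, s) binary affinity window."""
    weights = A.astype(X.dtype)
    total = weights.sum()
    if total == 0:                       # degenerate window: fall back to center
        return X[X.shape[0] // 2, X.shape[1] // 2]
    return (weights[..., None] * X).sum(axis=(0, 1)) / total

X = np.random.randn(3, 3, 256)           # 3x3 feature window around one pixel
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 0]])
y = locality_sensitive_avg_pool(X, A)     # robust feature for the center pixel
print(y.shape)
```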
The present invention stacks the locality-sensitive unpooling layer, the deconvolution layer, and the locality-sensitive average pooling layer several times, gradually upsampling and refining the details of the scene segmentation to obtain a finer and more accurate segmentation result. Preferably, the number of stacking repetitions is 2 or 3. The more repetitions, the finer and more accurate the segmentation, but the larger the amount of computation.

RGB color information and depth information describe different modalities of the objects in a scene. For example, RGB pictures can describe the appearance, color, and texture features of an object, while depth data provide the spatial geometry, shape, and size of the object. Effectively fusing these two complementary sources of information can improve the accuracy of scene semantic segmentation. Existing methods basically treat the data of the two modalities as equivalent and cannot distinguish their different contributions when identifying different objects in different scenes. Based on this, a preferred embodiment of the present invention proposes to optimally fuse the RGB score map and the depth score map obtained in steps S1 and S2 above through gated fusion, obtaining a fused score map and thereby achieving finer scene semantic segmentation, as shown in Figure 4. The gated fusion layer can effectively measure how important the RGB (appearance) and depth (shape) information is for identifying different objects in different scenes.

Preferably, the gated fusion layer of the present invention is mainly composed of a concatenation layer, a convolution layer, and a normalization layer. It can automatically learn the weights of the two modalities and thus better exploit their complementary information for scene semantic segmentation.
First, the concatenation layer concatenates the score maps obtained from the RGB and depth networks. Next comes the convolution operation: the convolution layer learns the weight matrix of the RGB and depth information. The convolution process can be implemented as follows:

C = W ⊗ [P_rgb, P_depth]

where P_rgb ∈ ℝ^{c×h×w} (a feature map of c channels, each of height h and width w) is the score map predicted from the RGB data, P_depth ∈ ℝ^{c×h×w} (parameters as defined above) is the score map predicted from the depth data, [P_rgb, P_depth] denotes their concatenation along the channel dimension, W ∈ ℝ^{c×2c×1×1} (c filters, each a 2c×1×1 three-dimensional matrix) is the filter bank learned by the gated fusion layer, and C ∈ ℝ^{c×h×w} is the contribution coefficient matrix output by the convolution. Finally comes the normalization step: preferably, each C_{k,i,j} is normalized to the interval [0, 1] by a sigmoid operation. Writing C_rgb = C and C_depth = 1 − C, the contribution coefficient matrices are applied to the original score outputs:

P̂_rgb = C_rgb ⊙ P_rgb,  P̂_depth = C_depth ⊙ P_depth

where ⊙ denotes element-wise (pointwise) matrix multiplication. The RGB and depth scores are then added to form the final fusion score:

P_fuse = P̂_rgb + P̂_depth

Based on this final score map, the semantic segmentation result can be obtained.
In the normalization step, the L1 norm can be used instead of the sigmoid function; the L1 norm sets x1 = x1 / (x1 + x2 + ... + xn), which guarantees that the probabilities sum to 1. The tanh function (hyperbolic tangent) can also be used. The sigmoid is preferred because it is simpler to implement in a neural network, optimizes better, and converges faster.
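To make the flow of the gated fusion layer concrete, the sketch below (assuming a PyTorch implementation, with a 40-class score map chosen only as an example) concatenates the RGB and depth score maps, learns the 1×1 filter bank W, squashes the result to [0, 1] with a sigmoid to obtain the contribution coefficients C, and blends the two score maps with C and 1 − C.

```python
# Sketch (PyTorch, assumed library) of the gated fusion layer:
# concatenation -> 1x1 convolution (W) -> sigmoid -> weighted blend.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # W in R^{c x 2c x 1 x 1} from the text; using a bias is an assumption.
        self.gate = nn.Conv2d(2 * num_classes, num_classes, kernel_size=1)

    def forward(self, p_rgb: torch.Tensor, p_depth: torch.Tensor) -> torch.Tensor:
        concat = torch.cat([p_rgb, p_depth], dim=1)   # concatenation layer
        C = torch.sigmoid(self.gate(concat))          # contribution coefficients
        return C * p_rgb + (1.0 - C) * p_depth        # fused score map P_fuse

p_rgb = torch.randn(1, 40, 480, 640)    # e.g. 40-class RGB score map
p_depth = torch.randn(1, 40, 480, 640)  # depth score map of the same shape
fused = GatedFusion(num_classes=40)(p_rgb, p_depth)
print(fused.shape)                      # 1 x 40 x 480 x 640
```

Coupling the depth coefficient as 1 − C keeps the two contribution coefficients for every class and position summing to one, matching the C_rgb = C, C_depth = 1 − C convention described above.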
The new locality-sensitive deconvolutional neural network proposed by the present invention can be used for RGB-D indoor scene semantic segmentation. The invention adapts well to the lighting changes, background clutter, many small objects, and occlusions of indoor scenes, and can exploit the complementarity of RGB and depth more effectively, obtaining a scene semantic segmentation result that is more robust, more accurate, and better preserves object edges.

The technical solutions of the present invention have thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the related technical features, and the technical solutions after such changes or substitutions will fall within the scope of protection of the present invention.

Claims (10)

  1. A scene semantic segmentation method based on a deconvolutional neural network, characterized in that the method comprises the following steps:
    Step S1: extracting a dense feature expression for a scene picture with a fully convolutional neural network;
    Step S2: using a locality-sensitive deconvolutional neural network and the local affinity matrix of the picture, upsampling and optimizing the dense feature expression obtained in step S1 to obtain a score map of the picture, thereby achieving fine scene semantic segmentation.
  2. The method according to claim 1, characterized in that the local affinity matrix is obtained by extracting SIFT features, SPIN features, and gradient features of the picture and then applying the ucm-gPb algorithm.
  3. The method according to claim 1, characterized in that the locality-sensitive deconvolutional neural network is formed by stacking three modules several times, the three modules being a locality-sensitive unpooling layer, a deconvolution layer, and a locality-sensitive average pooling layer.
  4. The method according to claim 3, characterized in that the number of stacking repetitions is 2 or 3.
  5. The method according to claim 3, characterized in that the output of the locality-sensitive unpooling layer is obtained by the following formula:
    Y_{i,j} = A_{i,j} · x
    where x represents the feature vector of a pixel in the feature map, A = {A_{i,j}} is an s×s local affinity matrix centered on x that indicates whether the surrounding pixels are similar to the center pixel, (i, j) and (o, o) represent an arbitrary position and the center position in the affinity matrix, respectively, and Y = {Y_{i,j}} is the feature map output by the unpooling operation.
  6. The method according to claim 3, characterized in that the locality-sensitive average pooling layer is implemented by the following formula:
    y = ( Σ_{(i,j)} A_{i,j} X_{i,j} ) / ( Σ_{(i,j)} A_{i,j} )
    where y is the output feature vector, A = {A_{i,j}} is an s×s local affinity matrix centered on y, A_{i,j} indicates whether a surrounding pixel is similar to the center pixel, (i, j) and (o, o) represent an arbitrary position and the center position in the affinity matrix, respectively, and X = {X_{i,j}} is the input feature map.
  7. The method according to any one of claims 1-6, characterized in that in step S1 the scene picture includes an RGB picture and a depth picture, and the method further includes step S3: optimally fusing the obtained RGB score map and depth score map through a gated fusion layer, thereby achieving finer scene semantic segmentation.
  8. The method according to claim 7, characterized in that the gated fusion layer includes a concatenation layer, a convolution layer, and a normalization layer.
  9. The method according to claim 8, characterized in that the convolution layer is implemented by the following function:
    C = W ⊗ [P_rgb, P_depth]
    where P_rgb ∈ ℝ^{c×h×w} is the score map predicted from the RGB data, P_depth ∈ ℝ^{c×h×w} is the score map predicted from the depth data, W ∈ ℝ^{c×2c×1×1} is the filter bank learned by the gated fusion layer, and C ∈ ℝ^{c×h×w} is the contribution coefficient matrix output by the convolution.
  10. The method according to claim 8, characterized in that the normalization layer is implemented by a sigmoid function.
PCT/CN2016/103425 2016-10-26 2016-10-26 De-convolutional neural network-based scene semantic segmentation method WO2018076212A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/103425 WO2018076212A1 (en) 2016-10-26 2016-10-26 De-convolutional neural network-based scene semantic segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/103425 WO2018076212A1 (en) 2016-10-26 2016-10-26 De-convolutional neural network-based scene semantic segmentation method

Publications (1)

Publication Number Publication Date
WO2018076212A1 true WO2018076212A1 (en) 2018-05-03

Family

ID=62023002

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103425 WO2018076212A1 (en) 2016-10-26 2016-10-26 De-convolutional neural network-based scene semantic segmentation method

Country Status (1)

Country Link
WO (1) WO2018076212A1 (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522966A (en) * 2018-11-28 2019-03-26 中山大学 A kind of object detection method based on intensive connection convolutional neural networks
CN109543502A (en) * 2018-09-27 2019-03-29 天津大学 A kind of semantic segmentation method based on the multiple dimensioned neural network of depth
CN109785435A (en) * 2019-01-03 2019-05-21 东易日盛家居装饰集团股份有限公司 A kind of wall method for reconstructing and device
CN109902755A (en) * 2019-03-05 2019-06-18 南京航空航天大学 A kind of multi-layer information sharing and correcting method for XCT slice
CN110427953A (en) * 2019-06-21 2019-11-08 中南大学 Robot is allowed to carry out the implementation method of vision place identification in changing environment based on convolutional neural networks and sequences match
CN110458939A (en) * 2019-07-24 2019-11-15 大连理工大学 The indoor scene modeling method generated based on visual angle
CN110826702A (en) * 2019-11-18 2020-02-21 方玉明 Abnormal event detection method for multitask deep network
CN110874841A (en) * 2018-09-04 2020-03-10 斯特拉德视觉公司 Object detection method and device with reference to edge image
CN110929613A (en) * 2019-11-14 2020-03-27 上海眼控科技股份有限公司 Image screening algorithm for intelligent traffic violation audit
CN111192271A (en) * 2018-11-14 2020-05-22 银河水滴科技(北京)有限公司 Image segmentation method and device
CN111242027A (en) * 2020-01-13 2020-06-05 北京工业大学 Unsupervised learning scene feature rapid extraction method fusing semantic information
CN111259901A (en) * 2020-01-13 2020-06-09 镇江优瞳智能科技有限公司 Efficient method for improving semantic segmentation precision by using spatial information
CN111311611A (en) * 2020-02-17 2020-06-19 清华大学深圳国际研究生院 Real-time three-dimensional large-scene multi-object instance segmentation method
US10692244B2 (en) 2017-10-06 2020-06-23 Nvidia Corporation Learning based camera pose estimation from images of an environment
CN111488880A (en) * 2019-01-25 2020-08-04 斯特拉德视觉公司 Method and apparatus for improving segmentation performance for detecting events using edge loss
CN111563507A (en) * 2020-04-14 2020-08-21 浙江科技学院 Indoor scene semantic segmentation method based on convolutional neural network
CN111723810A (en) * 2020-05-11 2020-09-29 北京航空航天大学 Interpretability method of scene recognition task model
CN111931689A (en) * 2020-08-26 2020-11-13 北京建筑大学 Method for extracting video satellite data identification features on line
CN112085747A (en) * 2020-09-08 2020-12-15 中国科学院计算技术研究所厦门数据智能研究院 Image segmentation method based on local relation guidance
CN112164078A (en) * 2020-09-25 2021-01-01 上海海事大学 RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN112381948A (en) * 2020-11-03 2021-02-19 上海交通大学烟台信息技术研究院 Semantic-based laser stripe center line extraction and fitting method
CN113239891A (en) * 2021-06-09 2021-08-10 上海海事大学 Scene classification system and method based on deep learning
CN113658200A (en) * 2021-07-29 2021-11-16 东北大学 Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN114332473A (en) * 2021-09-29 2022-04-12 腾讯科技(深圳)有限公司 Object detection method, object detection device, computer equipment, storage medium and program product
CN115496975A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Auxiliary weighted data fusion method, device, equipment and storage medium
CN115546271A (en) * 2022-09-29 2022-12-30 锋睿领创(珠海)科技有限公司 Visual analysis method, device, equipment and medium based on depth joint characterization
CN115953666A (en) * 2023-03-15 2023-04-11 国网湖北省电力有限公司经济技术研究院 Transformer substation field progress identification method based on improved Mask-RCNN
CN115995002A (en) * 2023-03-24 2023-04-21 南京信息工程大学 Network construction method and urban scene real-time semantic segmentation method
CN116051830A (en) * 2022-12-20 2023-05-02 中国科学院空天信息创新研究院 Cross-modal data fusion-oriented contrast semantic segmentation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389798A (en) * 2015-10-19 2016-03-09 西安电子科技大学 Synthetic aperture radar (SAR) image segmentation method based on deconvolution network and mapping inference network
CN105427313A (en) * 2015-11-23 2016-03-23 西安电子科技大学 Deconvolutional network and adaptive inference network based SAR image segmentation method
CN105488809A (en) * 2016-01-14 2016-04-13 电子科技大学 Indoor scene meaning segmentation method based on RGBD descriptor
CN105608692A (en) * 2015-12-17 2016-05-25 西安电子科技大学 PolSAR image segmentation method based on deconvolution network and sparse classification

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389798A (en) * 2015-10-19 2016-03-09 西安电子科技大学 Synthetic aperture radar (SAR) image segmentation method based on deconvolution network and mapping inference network
CN105427313A (en) * 2015-11-23 2016-03-23 西安电子科技大学 Deconvolutional network and adaptive inference network based SAR image segmentation method
CN105608692A (en) * 2015-12-17 2016-05-25 西安电子科技大学 PolSAR image segmentation method based on deconvolution network and sparse classification
CN105488809A (en) * 2016-01-14 2016-04-13 电子科技大学 Indoor scene meaning segmentation method based on RGBD descriptor

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10964061B2 (en) 2017-10-06 2021-03-30 Nvidia Corporation Learning-based camera pose estimation from images of an environment
US10692244B2 (en) 2017-10-06 2020-06-23 Nvidia Corporation Learning based camera pose estimation from images of an environment
CN110874841B (en) * 2018-09-04 2023-08-29 斯特拉德视觉公司 Object detection method and device with reference to edge image
CN110874841A (en) * 2018-09-04 2020-03-10 斯特拉德视觉公司 Object detection method and device with reference to edge image
CN109543502A (en) * 2018-09-27 2019-03-29 天津大学 A kind of semantic segmentation method based on the multiple dimensioned neural network of depth
CN109543502B (en) * 2018-09-27 2023-06-06 天津大学 Semantic segmentation method based on deep multi-scale neural network
CN111192271A (en) * 2018-11-14 2020-05-22 银河水滴科技(北京)有限公司 Image segmentation method and device
CN111192271B (en) * 2018-11-14 2023-08-22 银河水滴科技(北京)有限公司 Image segmentation method and device
CN109522966B (en) * 2018-11-28 2022-09-27 中山大学 Target detection method based on dense connection convolutional neural network
CN109522966A (en) * 2018-11-28 2019-03-26 中山大学 A kind of object detection method based on intensive connection convolutional neural networks
CN109785435A (en) * 2019-01-03 2019-05-21 东易日盛家居装饰集团股份有限公司 A kind of wall method for reconstructing and device
CN111488880A (en) * 2019-01-25 2020-08-04 斯特拉德视觉公司 Method and apparatus for improving segmentation performance for detecting events using edge loss
CN111488880B (en) * 2019-01-25 2023-04-18 斯特拉德视觉公司 Method and apparatus for improving segmentation performance for detecting events using edge loss
CN109902755A (en) * 2019-03-05 2019-06-18 南京航空航天大学 A kind of multi-layer information sharing and correcting method for XCT slice
CN110427953A (en) * 2019-06-21 2019-11-08 中南大学 Robot is allowed to carry out the implementation method of vision place identification in changing environment based on convolutional neural networks and sequences match
CN110427953B (en) * 2019-06-21 2022-11-29 中南大学 Implementation method for enabling robot to perform visual place recognition in variable environment based on convolutional neural network and sequence matching
CN110458939A (en) * 2019-07-24 2019-11-15 大连理工大学 The indoor scene modeling method generated based on visual angle
CN110458939B (en) * 2019-07-24 2022-11-18 大连理工大学 Indoor scene modeling method based on visual angle generation
CN110929613A (en) * 2019-11-14 2020-03-27 上海眼控科技股份有限公司 Image screening algorithm for intelligent traffic violation audit
CN110826702A (en) * 2019-11-18 2020-02-21 方玉明 Abnormal event detection method for multitask deep network
CN111259901A (en) * 2020-01-13 2020-06-09 镇江优瞳智能科技有限公司 Efficient method for improving semantic segmentation precision by using spatial information
CN111242027A (en) * 2020-01-13 2020-06-05 北京工业大学 Unsupervised learning scene feature rapid extraction method fusing semantic information
CN111242027B (en) * 2020-01-13 2023-04-14 北京工业大学 Unsupervised learning scene feature rapid extraction method fusing semantic information
CN111311611B (en) * 2020-02-17 2023-04-18 清华大学深圳国际研究生院 Real-time three-dimensional large-scene multi-object instance segmentation method
CN111311611A (en) * 2020-02-17 2020-06-19 清华大学深圳国际研究生院 Real-time three-dimensional large-scene multi-object instance segmentation method
CN111563507B (en) * 2020-04-14 2024-01-12 浙江科技学院 Indoor scene semantic segmentation method based on convolutional neural network
CN111563507A (en) * 2020-04-14 2020-08-21 浙江科技学院 Indoor scene semantic segmentation method based on convolutional neural network
CN111723810B (en) * 2020-05-11 2022-09-16 北京航空航天大学 Interpretability method of scene recognition task model
CN111723810A (en) * 2020-05-11 2020-09-29 北京航空航天大学 Interpretability method of scene recognition task model
CN111931689B (en) * 2020-08-26 2021-04-23 北京建筑大学 Method for extracting video satellite data identification features on line
CN111931689A (en) * 2020-08-26 2020-11-13 北京建筑大学 Method for extracting video satellite data identification features on line
CN112085747A (en) * 2020-09-08 2020-12-15 中国科学院计算技术研究所厦门数据智能研究院 Image segmentation method based on local relation guidance
CN112085747B (en) * 2020-09-08 2023-07-21 中科(厦门)数据智能研究院 Image segmentation method based on local relation guidance
CN112164078B (en) * 2020-09-25 2024-03-15 上海海事大学 RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN112164078A (en) * 2020-09-25 2021-01-01 上海海事大学 RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN112381948B (en) * 2020-11-03 2022-11-29 上海交通大学烟台信息技术研究院 Semantic-based laser stripe center line extraction and fitting method
CN112381948A (en) * 2020-11-03 2021-02-19 上海交通大学烟台信息技术研究院 Semantic-based laser stripe center line extraction and fitting method
CN113239891A (en) * 2021-06-09 2021-08-10 上海海事大学 Scene classification system and method based on deep learning
CN113658200A (en) * 2021-07-29 2021-11-16 东北大学 Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN113658200B (en) * 2021-07-29 2024-01-02 东北大学 Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN114332473A (en) * 2021-09-29 2022-04-12 腾讯科技(深圳)有限公司 Object detection method, object detection device, computer equipment, storage medium and program product
CN115496975B (en) * 2022-08-29 2023-08-18 锋睿领创(珠海)科技有限公司 Auxiliary weighted data fusion method, device, equipment and storage medium
CN115496975A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Auxiliary weighted data fusion method, device, equipment and storage medium
CN115546271B (en) * 2022-09-29 2023-08-22 锋睿领创(珠海)科技有限公司 Visual analysis method, device, equipment and medium based on depth joint characterization
CN115546271A (en) * 2022-09-29 2022-12-30 锋睿领创(珠海)科技有限公司 Visual analysis method, device, equipment and medium based on depth joint characterization
CN116051830B (en) * 2022-12-20 2023-06-20 中国科学院空天信息创新研究院 Cross-modal data fusion-oriented contrast semantic segmentation method
CN116051830A (en) * 2022-12-20 2023-05-02 中国科学院空天信息创新研究院 Cross-modal data fusion-oriented contrast semantic segmentation method
CN115953666A (en) * 2023-03-15 2023-04-11 国网湖北省电力有限公司经济技术研究院 Transformer substation field progress identification method based on improved Mask-RCNN
CN115995002A (en) * 2023-03-24 2023-04-21 南京信息工程大学 Network construction method and urban scene real-time semantic segmentation method

Similar Documents

Publication Publication Date Title
WO2018076212A1 (en) De-convolutional neural network-based scene semantic segmentation method
CN107066916B (en) Scene semantic segmentation method based on deconvolution neural network
WO2020108358A1 (en) Image inpainting method and apparatus, computer device, and storage medium
CN107578418B (en) Indoor scene contour detection method fusing color and depth information
CN106529447B (en) Method for identifying face of thumbnail
CN109583340B (en) Video target detection method based on deep learning
CN108875935B (en) Natural image target material visual characteristic mapping method based on generation countermeasure network
CN109410168B (en) Modeling method of convolutional neural network for determining sub-tile classes in an image
WO2018000752A1 (en) Monocular image depth estimation method based on multi-scale cnn and continuous crf
CN108564549B (en) Image defogging method based on multi-scale dense connection network
CN111476710B (en) Video face changing method and system based on mobile platform
TW200834459A (en) Video object segmentation method applied for rainy situations
CN111046868B (en) Target significance detection method based on matrix low-rank sparse decomposition
CN110705634B (en) Heel model identification method and device and storage medium
Huang et al. Automatic building change image quality assessment in high resolution remote sensing based on deep learning
CN112580661A (en) Multi-scale edge detection method under deep supervision
Feng et al. Low-light image enhancement algorithm based on an atmospheric physical model
Liu et al. Progressive complex illumination image appearance transfer based on CNN
CN111401209B (en) Action recognition method based on deep learning
Zhao et al. Color channel fusion network for low-light image enhancement
CN115953330B (en) Texture optimization method, device, equipment and storage medium for virtual scene image
CN116993760A (en) Gesture segmentation method, system, device and medium based on graph convolution and attention mechanism
Yuan et al. Explore double-opponency and skin color for saliency detection
CN108765384B (en) Significance detection method for joint manifold sequencing and improved convex hull
CN110910497A (en) Method and system for realizing augmented reality map

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16920114

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16920114

Country of ref document: EP

Kind code of ref document: A1