CN110751153A - Semantic annotation method for RGB-D image of indoor scene - Google Patents


Info

Publication number
CN110751153A
Authority
CN
China
Prior art keywords
superpixel
pixel
super
feature
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910886599.6A
Other languages
Chinese (zh)
Other versions
CN110751153B (en)
Inventor
王立春
刘甜
王少帆
孔德慧
李敬华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201910886599.6A priority Critical patent/CN110751153B/en
Publication of CN110751153A publication Critical patent/CN110751153A/en
Application granted granted Critical
Publication of CN110751153B publication Critical patent/CN110751153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A semantic annotation method for indoor scene RGB-D images allows the receptive field of the indoor scene semantic annotation method not to be limited to individual superpixels, constructs a semantic feature representation of superpixel groups, and further optimizes the superpixel group features with metric learning, thereby improving the accuracy of indoor scene understanding. The semantic annotation method comprises the following steps: (1) performing superpixel segmentation on the RGB-D indoor scene image with the gPb/UCM algorithm; (2) superpixel feature extraction: performing Patch feature calculation and superpixel feature representation; (3) superpixel group feature extraction: extracting example superpixel groups and their features, and class superpixel groups and their features; (4) superpixel group feature vectorization: defining the kernel distance between Gaussian components, performing example superpixel group feature vectorization and class superpixel group feature vectorization; (5) metric learning: learning the optimization matrix L and labeling test samples based on the optimization matrix L.

Description

Semantic annotation method for RGB-D image of indoor scene
Technical Field
The invention relates to the technical field of multimedia technology and computer graphics, in particular to a semantic annotation method for an indoor scene RGB-D image.
Background
Scene understanding is a very important task in the field of computer vision. With the development of artificial intelligence in recent years, many scene understanding methods and technologies have emerged; mainstream scene understanding tasks can currently be divided into outdoor scene understanding and indoor scene understanding. Outdoor scene understanding can be applied to transportation fields such as unmanned vehicles and unmanned aerial vehicles, while indoor scene understanding includes, but is not limited to, intelligent robots and the management of indoor public places. Because outdoor scene data are simpler than indoor scenes, a great deal of research is currently based on outdoor scenes. However, people spend much more time indoors than outdoors, and if machines can understand indoor scenes, indoor scene understanding can bring more convenience to our life and work; indoor scene understanding is therefore an important research hotspot.
From a data perspective, indoor scene understanding can utilize data of multiple modalities. The original data are mostly RGB images with the three channels R, G and B. After the Kinect camera came onto the market, depth acquisition no longer depended on laser, which reduced the cost of data acquisition, so RGB-D data became widely used. With the added depth information, and benefiting from depth completion, three-dimensional point clouds and triangular meshes can be reconstructed from RGB-D data, providing richer source data for indoor scene understanding.
In practice, understanding indoor scenes is much more difficult than understanding outdoor scenes. One reason is the complexity of the indoor scene itself: objects in an indoor scene are densely distributed, occlusion between objects is severe, objects of the same class vary greatly in appearance and shape and have rich and diverse textures, different placement angles can produce very different images, and changes of the light source also present complex and varied appearances. These characteristics add particular complexity and difficulty to indoor scene understanding.
Most semantic labeling in traditional methods is based on segmented regions: superpixels are obtained by over-segmentation, features are then extracted from the superpixels, and labeling is carried out according to these features. Ren et al. over-segment the image using depth-weighted gPb/UCM and describe superpixel features with five kernels, namely depth gradient, color gradient, local binary pattern, color and global geometric features; training these features directly with an SVM achieves a class average accuracy of 71.4% on NYUv1, and 76.1% after adding context information and Markov optimization. The context information alone improves the class average accuracy by 3%, but this context only uses superpixel features obtained under over-segmentations with different thresholds, rather than features with a larger receptive field built on top of the superpixel features. Deep learning is mostly applied as end-to-end semantic labeling, with only a few methods operating on superpixels; end-to-end semantic labeling has a continuously expanding receptive field. Park et al. propose RDF-Net, which performs multi-modal fusion in a residual network and achieves a class average accuracy of 62.8% on NYUv2 and 60.1% on the SUN-RGBD dataset. Fan et al. select an RNN as the base network, extend the single-modal RNN to a multi-modal RNN, and integrate HHA depth information and color information. Both networks use upsampling to expand the receptive field; the receptive field can also be expanded with dilated convolution without losing resolution.
In general, both traditional methods and deep learning can handle many tasks in scene understanding, and deep learning has a certain advantage in accuracy on classification tasks compared with traditional methods. One reason is that deep learning has a continuously expanding receptive field, mainly realized through dilated convolution and pooling layers. Most traditional methods are region-based and depend on superpixel segmentation; they do not have a continuously expanding receptive field and lack the use of global context information.
Disclosure of Invention
In order to overcome the defects of the prior art, the technical problem to be solved by the invention is to provide a semantic annotation method for indoor scene RGB-D images in which the receptive field is not limited to individual superpixels, the semantic features of superpixel groups are optimized, and the accuracy of indoor scene understanding is improved.
The technical scheme of the invention is as follows: the semantic annotation method for an indoor scene RGB-D image comprises the following steps:
(1) performing superpixel segmentation on the RGB-D indoor scene image with the gPb/UCM algorithm;
(2) superpixel feature extraction: performing Patch feature calculation and superpixel feature representation;
(3) superpixel group feature extraction: extracting example superpixel groups and their features, and class superpixel groups and their features;
(4) superpixel group feature vectorization: defining the kernel distance between Gaussian components, performing example superpixel group feature vectorization and class superpixel group feature vectorization;
(5) metric learning: learning the optimization matrix L and labeling test samples based on the optimization matrix L.
The method performs superpixel segmentation of the RGB-D indoor scene image with the gPb/UCM algorithm; several superpixels that most likely constitute one example are called a superpixel group, and a Gaussian mixture model is used to build a semantic feature representation of the superpixel group from the superpixel features; the Gaussian components, which lie on a Riemannian manifold, are mapped into Euclidean space with a Kullback-Leibler Divergence kernel distance to obtain the feature vector representation of the superpixel group; the feature vectors are optimized with large-margin nearest-neighbor metric learning, and finally the superpixel groups are semantically annotated based on the optimized feature representation. In this way the receptive field of the indoor scene semantic annotation method is no longer limited to individual superpixels, the semantic features of the superpixel groups are optimized, and the accuracy of indoor scene understanding is improved.
Drawings
FIG. 1 is a flow chart of a semantic annotation method for an RGB-D image of an indoor scene according to the invention.
FIG. 2 is a flowchart of an embodiment of a semantic annotation method for an RGB-D image of an indoor scene according to the invention.
Detailed Description
As shown in fig. 1, the semantic annotation method for an indoor scene RGB-D image includes the following steps:
(1) performing superpixel segmentation on the RGB-D indoor scene image with the gPb/UCM algorithm;
(2) superpixel feature extraction: performing Patch feature calculation and superpixel feature representation;
(3) superpixel group feature extraction: extracting example superpixel groups and their features, and class superpixel groups and their features;
(4) superpixel group feature vectorization: defining the kernel distance between Gaussian components, performing example superpixel group feature vectorization and class superpixel group feature vectorization;
(5) metric learning: learning the optimization matrix L and labeling test samples based on the optimization matrix L.
The method performs superpixel segmentation of the RGB-D indoor scene image with the gPb/UCM algorithm; several superpixels that most likely constitute one example are called a superpixel group, and a Gaussian mixture model is used to build a semantic feature representation of the superpixel group from the superpixel features; the Gaussian components, which lie on a Riemannian manifold, are mapped into Euclidean space with a Kullback-Leibler Divergence kernel distance to obtain the feature vector representation of the superpixel group; the feature vectors are optimized with large-margin nearest-neighbor metric learning, and finally the superpixel groups are semantically annotated based on the optimized feature representation. In this way the receptive field of the indoor scene semantic annotation method is no longer limited to individual superpixels, the semantic features of the superpixel groups are optimized, and the accuracy of indoor scene understanding is improved.
Preferably, the superpixel segmentation in step (1) uses the gPb/UCM algorithm, which computes from the local and global features of the image the probability value P_b(x, y) that a pixel belongs to a boundary. The gPb/UCM algorithm is applied to the color image and to the depth image separately, yielding P_rgb(x, y), the probability value computed from the color image that a pixel belongs to a boundary, and P_depth(x, y), the probability value computed from the depth image that a pixel belongs to a boundary, which are combined according to formula (1). Different probability thresholds tr are set for the probability values obtained by formula (1) to obtain a multi-level segmentation result.
Preferably, the probability thresholds tr are 0.06 and 0.08, and pixels whose probability values are smaller than the set threshold are connected into regions according to eight-connectivity, each region being one superpixel.
Preferably, in step (2), a Patch is defined as a 16 × 16 grid; with a step of n pixels, where n = 2, the grid is slid from the upper-left corner of the color image RGB and of the depth image Depth to the right and downward, so that a dense grid finally covers the color image RGB and the depth image Depth, and four types of features are calculated for each Patch: depth gradient features, color gradient features, color features and texture features.
Preferably, the superpixel feature F_seg in step (2) is defined by formula (5) and consists of the aggregated Patch features and the superpixel geometric features:
F_seg = [F_seg^gd, F_seg^gc, F_seg^col, F_seg^tex, F_seg^geo]    (5)
F_seg^gd, F_seg^gc, F_seg^col and F_seg^tex denote the superpixel depth gradient feature, the superpixel color gradient feature, the superpixel color feature and the superpixel texture feature respectively, and are defined by formula (6):
F_seg^gd = (1/n) Σ_{i=1}^{n} F_g_d(i),  F_seg^gc = (1/n) Σ_{i=1}^{n} F_g_c(i),  F_seg^col = (1/n) Σ_{i=1}^{n} F_col(i),  F_seg^tex = (1/n) Σ_{i=1}^{n} F_tex(i)    (6)
where F_g_d(i), F_g_c(i), F_col(i), F_tex(i) denote the features of the i-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center positions fall within the superpixel seg;
the superpixel geometric features F_seg^geo are defined according to formula (7) and consist of the components A_seg, P_seg, R_seg, the second-order Hu moments, D_mean, D_sq, D_var, D_miss and N_seg, which are defined as follows:
superpixel area A_seg = Σ_{s∈seg} 1, where s ranges over the pixels within the superpixel seg; the superpixel perimeter P_seg, defined by formula (8), is the number of boundary pixels of the superpixel:
P_seg = |B_seg|,  B_seg = { s ∈ seg | ∃ s' ∈ N_4(s), s' ∈ seg', seg' ≠ seg }    (8)
where M, N denote the horizontal and vertical resolution of the RGB scene image respectively; seg, seg' denote different superpixels; N_4(s) is the four-neighborhood of pixel s; B_seg is the set of boundary pixels of the superpixel seg;
the area-to-perimeter ratio R_seg of the superpixel is defined by formula (9):
R_seg = A_seg / P_seg    (9)
the second-order (2+0=2 or 0+2=2) Hu moments are computed from the x coordinate s_x of pixel s, the y coordinate s_y, and the product of the x and y coordinates respectively, and are defined by formulas (10), (11), (12):
H_xx = (1/A_seg) Σ_{s∈seg} (s_x/Width)² − (s̄_x)²    (10)
H_yy = (1/A_seg) Σ_{s∈seg} (s_y/Height)² − (s̄_y)²    (11)
H_xy = (1/A_seg) Σ_{s∈seg} (s_x/Width)(s_y/Height) − s̄_x·s̄_y    (12)
where s̄_x, s̄_y, (s̄_x)², (s̄_y)² denote the mean of the x coordinates, the mean of the y coordinates, the square of the mean of the x coordinates and the square of the mean of the y coordinates of the pixels contained in the superpixel, defined by formula (13):
s̄_x = (1/A_seg) Σ_{s∈seg} s_x / Width,   s̄_y = (1/A_seg) Σ_{s∈seg} s_y / Height    (13)
Width and Height denote the width and height of the image respectively, i.e. the calculation is based on the normalized pixel coordinate values;
D_mean, D_sq and D_var denote, respectively, the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance of the depth values, defined by formula (14):
D_mean = (1/A_seg) Σ_{s∈seg} s_d,  D_sq = (1/A_seg) Σ_{s∈seg} (s_d)²,  D_var = D_sq − (D_mean)²    (14)
D_miss is the proportion of pixels within the superpixel that are missing depth information, defined by formula (15):
D_miss = (1/A_seg) Σ_{s∈seg} [s_d is missing]    (15)
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where the principal normal vector of the point cloud corresponding to the superpixel is estimated by Principal Component Analysis (PCA).
Preferably, the example superpixel group and its feature extraction performed in step (3) are as follows:
the K superpixels assumed to constitute one example on an image form an example superpixel group; the feature of the k-th superpixel is denoted F_segk, and the feature of the example superpixel group is denoted F = {F_seg1, F_seg2, …, F_segK}; a Gaussian mixture model of the form of formula (16) is built on the superpixel group feature F with the Expectation-Maximization algorithm EM, and the example superpixel group feature is represented by the set of Gaussian components G = {g_1, g_2, …, g_m}:
G(F_seg) = Σ_{i=1}^{m} ω_i · g_i(F_seg)    (16)
the Gaussian mixture model G(F_seg) is represented by the weighted sum of several Gaussian components g_i(F_seg), where F_seg is a random variable and the i-th Gaussian component g_i obeys an ordinary Gaussian distribution, g_i(F_seg) ~ N(F_seg; μ_i, Σ_i); the i-th weight ω_i is obtained by the Expectation-Maximization algorithm EM; μ_i is the expectation of the i-th Gaussian component and is a vector; Σ_i is the variance of the i-th Gaussian component and is a square matrix;
the class superpixel group and its feature extraction in step (3) are as follows:
only training samples can be used to construct class superpixel groups; given all training sample images, the set of superpixel blocks labeled as the j-th class is called a class superpixel group; the j-th class contains P superpixels, and the class feature is denoted F_j = {F_seg1, F_seg2, …, F_segP}; a Gaussian mixture model is likewise built on F_j with the EM algorithm, yielding m_j Gaussian components, and the class superpixel group feature is represented by the set H_j = {h_1^j, h_2^j, …, h_{m_j}^j}; the training samples contain N classes, and the class superpixel group features are represented by the set H_all = {H_1, H_2, …, H_N}, containing N_all = Σ_{j=1}^{N} m_j Gaussian components in total.
Preferably, the kernel distance between Gaussian components defined in step (4) is as follows:
the distance between two Gaussian components adopts the Kullback-Leibler Divergence distance, and the distance between Gaussian components g_i and g_j is computed according to formula (17):
KLD(g_i||g_j) = (1/2)[ tr(Σ_j^{-1}Σ_i) + (μ_j − μ_i)^T Σ_j^{-1} (μ_j − μ_i) − d + ln(|Σ_j|/|Σ_i|) ]    (17)
where d is the dimension of the superpixel feature; substituting formula (17) into formula (18) yields the kernel distance K(g_i, g_j) between the two Gaussian components g_i and g_j:
K(g_i, g_j) = exp{−[KLD(g_i||g_j) + KLD(g_j||g_i)]/2t²}    (18)
where t is an empirical parameter, set to 70 in the verification experiments;
the example superpixel group feature vectorization is performed as follows:
for the i-th Gaussian component g_i of the example superpixel group feature G, the kernel distances between g_i and all Gaussian components of the class superpixel group feature H_all are computed and taken as the feature vector x_i of the example superpixel group, according to formula (19):
x_i = [K(g_i, h_1), K(g_i, h_2), …, K(g_i, h_{N_all})]    (19)
the example superpixel group is represented by its set of feature vectors {x_1, x_2, …, x_m};
1) Example superpixel feature vectorization of training samples: example superpixel group features are extracted from all training sample images according to step (3), the Gaussian component features of each example superpixel group are vectorized according to formula (19), and the vectorized example superpixel group feature vectors form the training sample example feature set S_example, as in formula (20):
S_example = ∪_{t=1}^{T} X_t    (20)
where X_t is the set of vectorized Gaussian component features of the t-th example superpixel group and T is the number of examples in all training samples;
2) Example superpixel feature vectorization of test samples: all Gaussian component features of an example superpixel group are vectorized according to formula (19) to form the vector set X_test = {x_1^test, x_2^test, …, x_m^test}, where x_n^test denotes the vectorized feature of the n-th Gaussian component;
the class superpixel group feature vectorization is performed as follows:
for each Gaussian component h_k of the class superpixel group feature H_all, the kernel distances between h_k and all Gaussian components of H_all are computed and taken as the feature vector of that Gaussian component, according to formula (21):
x_k = [K(h_k, h_1), K(h_k, h_2), …, K(h_k, h_{N_all})]    (21)
all vectorized class superpixel group features form the training sample class feature set S_class, as in formula (22):
S_class = {x_1, x_2, …, x_{N_all}}    (22)
Preferably, the learning of the optimization matrix L in step (5) is as follows:
formula (23) is the objective function of the metric learning, where M is the positive semidefinite Mahalanobis matrix to be optimized; M is optimized using sample points i, whose corresponding feature vector is x_i; j⇝i indicates that sample j is an ideal neighbor of sample point i, and the feature vector of sample j is denoted x_j; l is a sample lying too close to sample i, with feature vector x_l; ξ_ijl is a non-negative slack term; μ = 0.5 balances the weight between the pulling force and the repulsive force; y_il = 1 when the labels of samples i and l are consistent, otherwise y_il = 0:
min_M Σ_{i,j⇝i} d_M(x_i, x_j) + μ Σ_{i,j⇝i} Σ_l (1 − y_il) ξ_ijl    (23)
subject to:
(1) d_M(x_i, x_l) − d_M(x_i, x_j) ≥ 1 − ξ_ijl
(2) ξ_ijl ≥ 0
(3) M ⪰ 0
where d_M(a, b) = (a − b)^T M (a − b);
solving formula (23) yields the positive semidefinite matrix M, which is decomposed as M = LL^T, L being the optimization matrix; the union of the training sample example feature set S_example and the training sample class feature set S_class is taken to obtain S_train = S_example ∪ S_class, and the optimization matrix L is learned based on S_train;
the labeling of test samples based on the optimization matrix L is as follows:
the test example superpixel group to be labeled is represented by the vector set X_test = {x_1^test, x_2^test, …, x_m^test}, and the class label of the test sample is calculated according to formula (24):
v_i = label( argmin_{x^class ∈ S_train} ‖L^T(x_i^test − x^class)‖ )    (24)
where x_i^test is a feature vector of the test example and x^class is a training sample feature vector whose class label is class; the x^class at minimum distance from x_i^test is found, its class label is denoted v_i, and v_i is the class of the test example.
The invention was tested on the NYU v1 RGB-D dataset, which contains 2284 scenes covering 13 categories in total. The dataset is partitioned into two disjoint subsets for training and testing, respectively. The training set contains 1370 scenes and the test set contains 914 scenes.
The method provided by the invention comprises the following specific steps:
1. Superpixel segmentation
The superpixel segmentation of the invention uses the gPb/UCM algorithm, which computes from the local and global features of the image the probability value P_b(x, y) that a pixel belongs to a boundary. The gPb/UCM algorithm is applied to the color image and to the depth image separately, and the results are combined according to formula (1).
In formula (1), P_rgb(x, y) is the probability value, computed from the color image, that a pixel belongs to a boundary, and P_depth(x, y) is the probability value, computed from the depth image, that a pixel belongs to a boundary; combining them yields the boundary probability P_b(x, y).
Different probability thresholds tr are set for the probability values P_b(x, y) obtained by formula (1) to obtain a multi-level segmentation result.
The probability thresholds tr set in the invention are 0.06 and 0.08; pixels whose probability values are smaller than the set threshold are connected into regions according to eight-connectivity, and each region is one superpixel.
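As an illustration of this step, the following sketch fuses the two boundary-probability maps, thresholds them and extracts 8-connected regions as superpixels. Formula (1) is not reproduced above, so the fusion is assumed here to be a plain average of the color-based and depth-based boundary probabilities, and pb_rgb, pb_depth are assumed to be precomputed gPb/UCM outputs; this is a minimal sketch, not the exact patented computation.

```python
import numpy as np
from scipy import ndimage

def superpixels_from_boundaries(pb_rgb, pb_depth, tr=0.06):
    """Split an image into superpixels from two boundary-probability maps.

    pb_rgb, pb_depth : HxW arrays in [0, 1] produced by gPb/UCM on the
    color image and on the depth image. The fusion below (plain average)
    stands in for formula (1), which is not reproduced in the text.
    """
    pb = 0.5 * (pb_rgb + pb_depth)           # assumed fusion rule
    non_boundary = pb < tr                   # pixels below the threshold
    eight_conn = np.ones((3, 3), dtype=int)  # 8-connectivity structure
    labels, num_segments = ndimage.label(non_boundary, structure=eight_conn)
    return labels, num_segments              # label 0 marks boundary pixels

# Running the same routine with tr = 0.06 and tr = 0.08 gives the
# multi-level segmentation mentioned in the text.
```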
2. Superpixel feature extraction
2.1 Patch feature calculation
A Patch is defined as a 16 × 16 grid (the grid size can be modified according to the actual data). With a step of n pixels (n = 2 in the experiments of the invention), the grid is slid from the upper-left corner of the color image (RGB) and of the depth image (Depth) to the right and downward, so that a dense grid finally covers both the color image and the depth image. Taking an image of size N × M as an example, the number of Patches finally obtained is ⌊(M − 16)/n + 1⌋ · ⌊(N − 16)/n + 1⌋.
Four types of features are calculated for each Patch: depth gradient features, color gradient features, color features and texture features.
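The dense Patch grid can be sketched as follows; the routine enumerates the top-left corners of all 16 × 16 windows with stride n and confirms the Patch count given above (the image size used in the usage line is only an example).

```python
import numpy as np

def patch_grid(height, width, patch=16, step=2):
    """Top-left corners of a dense grid of patch x patch windows."""
    ys = np.arange(0, height - patch + 1, step)
    xs = np.arange(0, width - patch + 1, step)
    corners = [(y, x) for y in ys for x in xs]
    # Number of patches: floor((H-16)/n + 1) * floor((W-16)/n + 1)
    assert len(corners) == len(ys) * len(xs)
    return corners

corners = patch_grid(480, 640)   # e.g. a 640 x 480 frame
print(len(corners))              # 233 * 313 = 72929 patches
```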
2.1.1 Depth gradient feature
A Patch in the depth image is denoted Z_d. For each Z_d the depth gradient feature F_g_d is computed, the value of its t-th component being defined by equation (1):
F_g_d(t) = Σ_{z∈Z_d} m̃_d(z) · α_t^T [ k_g(θ̃_d(z)) ⊗ k_s(z) ]    (1)
In formula (1), z ∈ Z_d denotes the relative two-dimensional coordinate position of pixel z within the depth Patch; θ̃_d(z) and m̃_d(z) denote the depth gradient direction and the gradient magnitude of pixel z respectively; k_g(θ̃_d(z)) is the vector of Gaussian-kernel responses of the gradient direction to the d_g depth gradient basis vectors, and k_s(z) is the vector of Gaussian-kernel responses of the position to the d_s position basis vectors, both sets of basis vectors being predefined values; d_g and d_s denote the numbers of depth gradient basis vectors and position basis vectors respectively; α_t is the mapping coefficient of the t-th principal component obtained by Kernel Principal Component Analysis (KPCA), and ⊗ denotes the Kronecker product; the depth gradient Gaussian kernel function and the position Gaussian kernel function each have a corresponding kernel parameter. Finally, the depth gradient feature is transformed with the EMK (Efficient Match Kernel) algorithm; the transformed feature vector is still denoted F_g_d.
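The sketch below illustrates the computation pattern of one kernel-descriptor component as described above: Gaussian kernels compare each pixel's gradient orientation and position with predefined basis vectors, the two kernel-response vectors are combined with a Kronecker product, and the KPCA coefficients of the t-th principal component project the sum to a scalar. The basis vectors, kernel bandwidths and KPCA coefficients are placeholder inputs here (in the method they are predefined or learned), and representing the orientation on the unit circle is an assumption of this sketch.

```python
import numpy as np

def gaussian_kernel_vec(x, basis, gamma):
    """Vector of Gaussian kernel values between x and each basis vector."""
    diff = basis - x                       # (n_basis, dim)
    return np.exp(-gamma * np.sum(diff ** 2, axis=1))

def depth_gradient_component(grad_mag, grad_dir, positions,
                             grad_basis, pos_basis, alpha_t,
                             gamma_g=5.0, gamma_s=3.0):
    """t-th depth-gradient kernel-descriptor component of one Patch.

    grad_mag, grad_dir : per-pixel gradient magnitude and orientation
    positions          : per-pixel relative 2-D coordinates in the Patch
    grad_basis, pos_basis : predefined basis vectors (d_g x 2, d_s x 2)
    alpha_t            : KPCA projection coefficients, length d_g * d_s
    """
    feat = 0.0
    for m, theta, z in zip(grad_mag, grad_dir, positions):
        # the orientation is compared on the unit circle (an assumption)
        k_o = gaussian_kernel_vec(np.array([np.cos(theta), np.sin(theta)]),
                                  grad_basis, gamma_g)
        k_p = gaussian_kernel_vec(np.asarray(z), pos_basis, gamma_s)
        feat += m * alpha_t @ np.kron(k_o, k_p)
    return feat
```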
2.1.2 Color gradient feature
A Patch in the color image is denoted Z_c. For each Z_c the color gradient feature F_g_c is computed, the value of its t-th component being defined by equation (2):
F_g_c(t) = Σ_{z∈Z_c} m̃_c(z) · α_t^T [ k_g(θ̃_c(z)) ⊗ k_s(z) ]    (2)
In formula (2), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z within the color image Patch; θ̃_c(z) and m̃_c(z) denote the gradient direction and the gradient magnitude of pixel z respectively; k_g(θ̃_c(z)) is the vector of Gaussian-kernel responses of the gradient direction to the c_g color gradient basis vectors, and k_s(z) is the vector of Gaussian-kernel responses of the position to the c_s position basis vectors, both sets of basis vectors being predefined values; c_g and c_s denote the numbers of color gradient basis vectors and position basis vectors respectively; α_t is the mapping coefficient of the t-th principal component obtained by Kernel Principal Component Analysis (KPCA), and ⊗ denotes the Kronecker product; the color gradient Gaussian kernel function and the position Gaussian kernel function each have a corresponding kernel parameter. Finally, the color gradient feature is transformed with the EMK (Efficient Match Kernel) algorithm; the transformed feature vector is still denoted F_g_c.
2.1.3 Color feature
A Patch in the color image is denoted Z_c. For each Z_c the color feature F_col is computed, the value of its t-th component being defined by equation (3):
F_col(t) = Σ_{z∈Z_c} α_t^T [ k_c(r(z)) ⊗ k_s(z) ]    (3)
In formula (3), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z within the color image Patch; r(z) is a three-dimensional vector, the RGB value of pixel z; k_c(r(z)) is the vector of Gaussian-kernel responses of r(z) to the c_c color basis vectors, and k_s(z) is the vector of Gaussian-kernel responses of the position to the c_s position basis vectors, both sets of basis vectors being predefined values; c_c and c_s denote the numbers of color basis vectors and position basis vectors respectively; α_t is the mapping coefficient of the t-th principal component obtained by Kernel Principal Component Analysis (KPCA), and ⊗ denotes the Kronecker product; the color Gaussian kernel function and the position Gaussian kernel function each have a corresponding kernel parameter. Finally, the color feature is transformed with the EMK (Efficient Match Kernel) algorithm; the transformed feature vector is still denoted F_col.
2.1.4 Texture feature
The RGB scene image is first converted into a gray-scale image, and a Patch in the gray-scale image is denoted Z_g. For each Z_g the texture feature F_tex is computed, the value of its t-th component being defined by equation (4):
F_tex(t) = Σ_{z∈Z_g} s(z) · α_t^T [ k_b(LBP(z)) ⊗ k_s(z) ]    (4)
In formula (4), z ∈ Z_g denotes the relative two-dimensional coordinate position of pixel z within the Patch; s(z) denotes the standard deviation of the pixel gray values in the 3 × 3 region centered on pixel z; LBP(z) is the Local Binary Pattern (LBP) feature of pixel z; k_b(LBP(z)) is the vector of Gaussian-kernel responses of LBP(z) to the g_b local binary pattern basis vectors, and k_s(z) is the vector of Gaussian-kernel responses of the position to the g_s position basis vectors, both sets of basis vectors being predefined values; g_b and g_s denote the numbers of local binary pattern basis vectors and position basis vectors respectively; α_t is the mapping coefficient of the t-th principal component obtained by Kernel Principal Component Analysis (KPCA), and ⊗ denotes the Kronecker product; the local binary pattern Gaussian kernel function and the position Gaussian kernel function each have a corresponding kernel parameter. Finally, the texture feature is transformed with the EMK (Efficient Match Kernel) algorithm; the transformed feature vector is still denoted F_tex.
2.2 Superpixel feature
The superpixel feature F_seg is defined by formula (5) and consists of the aggregated Patch features and the superpixel geometric features:
F_seg = [F_seg^gd, F_seg^gc, F_seg^col, F_seg^tex, F_seg^geo]    (5)
F_seg^gd, F_seg^gc, F_seg^col and F_seg^tex denote the superpixel depth gradient feature, the superpixel color gradient feature, the superpixel color feature and the superpixel texture feature respectively, and are defined by formula (6):
F_seg^gd = (1/n) Σ_{i=1}^{n} F_g_d(i),  F_seg^gc = (1/n) Σ_{i=1}^{n} F_g_c(i),  F_seg^col = (1/n) Σ_{i=1}^{n} F_col(i),  F_seg^tex = (1/n) Σ_{i=1}^{n} F_tex(i)    (6)
In formula (6), F_g_d(i), F_g_c(i), F_col(i), F_tex(i) denote the features of the i-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center positions fall within the superpixel seg.
The superpixel geometric features F_seg^geo are defined according to formula (7) and consist of the components A_seg, P_seg, R_seg, the second-order Hu moments, D_mean, D_sq, D_var, D_miss and N_seg. The components are defined as follows:
Superpixel area A_seg = Σ_{s∈seg} 1, where s ranges over the pixels within the superpixel seg. The superpixel perimeter P_seg, defined by formula (8), is the number of boundary pixels of the superpixel:
P_seg = |B_seg|,  B_seg = { s ∈ seg | ∃ s' ∈ N_4(s), s' ∈ seg', seg' ≠ seg }    (8)
In formula (8), M, N denote the horizontal and vertical resolution of the RGB scene image respectively; seg, seg' denote different superpixels; N_4(s) is the four-neighborhood of pixel s; B_seg is the set of boundary pixels of the superpixel seg.
The area-to-perimeter ratio R_seg of the superpixel is defined by formula (9):
R_seg = A_seg / P_seg    (9)
The second-order (2+0=2 or 0+2=2) Hu moments are computed from the x coordinate s_x of pixel s, the y coordinate s_y, and the product of the x and y coordinates respectively, and are defined by formulas (10), (11), (12):
H_xx = (1/A_seg) Σ_{s∈seg} (s_x/Width)² − (s̄_x)²    (10)
H_yy = (1/A_seg) Σ_{s∈seg} (s_y/Height)² − (s̄_y)²    (11)
H_xy = (1/A_seg) Σ_{s∈seg} (s_x/Width)(s_y/Height) − s̄_x·s̄_y    (12)
In formulas (10)-(12), s̄_x, s̄_y, (s̄_x)², (s̄_y)² denote the mean of the x coordinates, the mean of the y coordinates, the square of the mean of the x coordinates and the square of the mean of the y coordinates of the pixels contained in the superpixel, defined by formula (13):
s̄_x = (1/A_seg) Σ_{s∈seg} s_x / Width,   s̄_y = (1/A_seg) Σ_{s∈seg} s_y / Height    (13)
Width and Height denote the width and height of the image respectively, i.e. the calculation is based on the normalized pixel coordinate values.
D_mean, D_sq and D_var denote, respectively, the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance of the depth values, defined by formula (14):
D_mean = (1/A_seg) Σ_{s∈seg} s_d,  D_sq = (1/A_seg) Σ_{s∈seg} (s_d)²,  D_var = D_sq − (D_mean)²    (14)
D_miss is the proportion of pixels within the superpixel that are missing depth information, defined by formula (15):
D_miss = (1/A_seg) Σ_{s∈seg} [s_d is missing]    (15)
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where the principal normal vector of the point cloud corresponding to the superpixel is estimated by Principal Component Analysis (PCA).
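The assembly of the superpixel feature can be sketched as follows: Patch features are averaged over the Patches whose centers fall inside the superpixel (formula (6)), and several of the geometric components of formula (7) (area, perimeter, area/perimeter ratio, depth statistics and the missing-depth ratio) are computed from the superpixel mask. The Hu moments and the PCA normal length are omitted for brevity, depth value 0 is assumed to mark missing depth, and all names are illustrative.

```python
import numpy as np

def aggregate_patch_features(patch_feats, patch_centers, seg_mask):
    """Mean of the Patch features whose centers fall inside the superpixel."""
    inside = [f for f, (y, x) in zip(patch_feats, patch_centers) if seg_mask[y, x]]
    return np.mean(inside, axis=0) if inside else np.zeros_like(patch_feats[0])

def geometric_features(seg_mask, depth):
    """Area, perimeter, area/perimeter ratio and depth statistics."""
    area = int(seg_mask.sum())
    padded = np.pad(seg_mask, 1, constant_values=False)
    all_4_nbrs_inside = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                         padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((seg_mask & ~all_4_nbrs_inside).sum())  # boundary pixel count
    d = depth[seg_mask]
    valid = d > 0                     # assumption: depth 0 marks missing values
    d_mean = float(d[valid].mean()) if valid.any() else 0.0
    d_var = float(d[valid].var()) if valid.any() else 0.0
    d_miss = 1.0 - float(valid.mean())
    return np.array([area, perimeter, area / max(perimeter, 1),
                     d_mean, d_var, d_miss])
```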
3. Superpixel group feature extraction
Formula (16) is the general expression of a Gaussian mixture model:
G(F_seg) = Σ_{i=1}^{m} ω_i · g_i(F_seg)    (16)
The Gaussian mixture model G(F_seg) is represented by the weighted sum of several Gaussian components g_i(F_seg), where F_seg is a random variable and the i-th Gaussian component g_i obeys an ordinary Gaussian distribution, i.e. g_i(F_seg) ~ N(F_seg; μ_i, Σ_i). The i-th weight ω_i is calculated automatically by the EM algorithm; μ_i is the expectation of the i-th Gaussian component and is a vector; Σ_i is the variance of the i-th Gaussian component and is a square matrix.
3.1 Example superpixel group and its feature extraction
The K superpixels on an image that are most likely to constitute one example form an example superpixel group. The feature of the k-th superpixel is denoted F_segk, and the feature of the example superpixel group is denoted F = {F_seg1, F_seg2, …, F_segK}. A Gaussian mixture model of the form of formula (16) is built on the superpixel group feature F with the Expectation-Maximization (EM) algorithm, and the example superpixel group feature is then represented by the resulting set of Gaussian components G = {g_1, g_2, …, g_m}.
3.2 Class superpixel group and its feature extraction
Only training samples can be used to construct class superpixel groups. Given all training sample images, the set of superpixel blocks labeled as the j-th class is called a class superpixel group. The j-th class contains P superpixels, and the class feature is denoted F_j = {F_seg1, F_seg2, …, F_segP}. A Gaussian mixture model is likewise built on F_j with the EM algorithm, yielding m_j Gaussian components, and the class superpixel group feature is represented by the set H_j = {h_1^j, h_2^j, …, h_{m_j}^j}. The training samples contain N classes, so the class superpixel group features are represented by the set H_all = {H_1, H_2, …, H_N}, containing N_all = Σ_{j=1}^{N} m_j Gaussian components in total.
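A sketch of this section using the EM-based GaussianMixture estimator of scikit-learn to fit formula (16) to the features of one superpixel group; the number of components m is treated here as a hyperparameter, and the library choice is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_superpixel_group_gmm(F, m=3):
    """Fit a GMM (formula (16)) to the K x d matrix of superpixel features F.

    Returns the weights, means and covariances of the m Gaussian components,
    i.e. the set G = {g_1, ..., g_m} representing the superpixel group
    (K must be at least m for the fit to be meaningful).
    """
    gmm = GaussianMixture(n_components=m, covariance_type='full',
                          reg_covar=1e-6).fit(F)
    return gmm.weights_, gmm.means_, gmm.covariances_
```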
4. Superpixel group feature vectorization
This section vectorizes the Gaussian mixture model features: the Gaussian mixture model, i.e. the set of Gaussian components, is represented as a vector of kernel distances. First the kernel distance between Gaussian components is defined.
4.1 Kernel distance between Gaussian components
1) The distance between two Gaussian components adopts the KLD (Kullback-Leibler Divergence) distance; the distance between Gaussian components g_i and g_j is computed according to formula (17):
KLD(g_i||g_j) = (1/2)[ tr(Σ_j^{-1}Σ_i) + (μ_j − μ_i)^T Σ_j^{-1} (μ_j − μ_i) − d + ln(|Σ_j|/|Σ_i|) ]    (17)
where d is the dimension of the superpixel feature.
2) Substituting formula (17) into formula (18) yields the kernel distance K(g_i, g_j) between the two Gaussian components g_i and g_j:
K(g_i, g_j) = exp{−[KLD(g_i||g_j) + KLD(g_j||g_i)]/2t²}    (18)
In formula (18), t is an empirical parameter, set to 70 in the verification experiments of the invention.
4.2 Example superpixel group feature vectorization
For the i-th Gaussian component g_i of the example superpixel group feature G, the kernel distances between g_i and all Gaussian components of the class superpixel group feature H_all are computed and taken as the feature vector x_i of the example superpixel group, according to formula (19):
x_i = [K(g_i, h_1), K(g_i, h_2), …, K(g_i, h_{N_all})]    (19)
The example superpixel group is then represented by its set of feature vectors {x_1, x_2, …, x_m}.
1) Example superpixel feature vectorization of training samples: example superpixel group features are extracted from all training sample images according to section 3.1, the Gaussian component features of each example superpixel group are vectorized according to formula (19), and the vectorized example superpixel group feature vectors form the training sample example feature set S_example, as in formula (20):
S_example = ∪_{t=1}^{T} X_t    (20)
where X_t is the set of vectorized Gaussian component features of the t-th example superpixel group and T is the number of examples in all training samples.
2) Example superpixel feature vectorization of test samples: all Gaussian component features of an example superpixel group are vectorized according to formula (19) to form the vector set X_test = {x_1^test, x_2^test, …, x_m^test}, where x_n^test denotes the vectorized feature of the n-th Gaussian component.
4.3 Class superpixel group feature vectorization
For each Gaussian component h_k of the class superpixel group feature H_all, the kernel distances between h_k and all Gaussian components of H_all are computed and taken as the feature vector of that Gaussian component, according to formula (21):
x_k = [K(h_k, h_1), K(h_k, h_2), …, K(h_k, h_{N_all})]    (21)
All vectorized class superpixel group features form the training sample class feature set S_class, as in formula (22):
S_class = {x_1, x_2, …, x_{N_all}}    (22)
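The vectorization of this section can be sketched as follows, with each Gaussian component represented as a (mean, covariance) pair; the closed-form KL divergence used for formula (17) is the standard expression for multivariate Gaussians.

```python
import numpy as np

def kl_divergence(mu1, cov1, mu2, cov2):
    """Closed-form KL divergence between two multivariate Gaussians (formula (17))."""
    d = mu1.shape[0]
    inv2 = np.linalg.inv(cov2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - d
                  + np.log(np.linalg.det(cov2) / np.linalg.det(cov1)))

def kernel_distance(g_i, g_j, t=70.0):
    """Symmetrized KLD kernel between two Gaussian components (formula (18))."""
    kld = kl_divergence(*g_i, *g_j) + kl_divergence(*g_j, *g_i)
    return np.exp(-kld / (2.0 * t ** 2))

def vectorize_component(g, H_all, t=70.0):
    """Feature vector of one Gaussian component: its kernel distances to every
    component of the class superpixel group feature H_all (formulas (19)/(21))."""
    return np.array([kernel_distance(g, h, t) for h in H_all])
```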
5. Metric learning
After the vector feature representation of the superpixel groups has been obtained, the features are optimized with metric learning: a pulling force reduces the feature distance between samples of the same class, and a repulsive force increases the feature distance between samples of different classes.
For a sample x_i to be optimized, the surrounding positive samples determine a local neighborhood range, i.e. a margin, that minimally encloses all positive samples x_j; these positive samples are called ideal neighbors. Within this range there may also be some negative samples x_l, called impostor neighbors. A linear transformation matrix of the feature space is learned so that the repulsive force pushes the impostor neighbors away from x_i as far as possible, and the pulling force draws the ideal neighbors as close to x_i as possible.
5.1 Learning the optimization matrix L
Formula (23) is the objective function of the metric learning. M is the positive semidefinite Mahalanobis matrix to be optimized; μ = 0.5 balances the weight between the pulling force and the repulsive force; j⇝i indicates that sample j is an ideal neighbor of sample point i; l is an impostor neighbor; ξ_ijl is a non-negative slack term; y_il = 1 when the labels of samples i and l are consistent, otherwise y_il = 0:
min_M Σ_{i,j⇝i} d_M(x_i, x_j) + μ Σ_{i,j⇝i} Σ_l (1 − y_il) ξ_ijl    (23)
subject to:
(1) d_M(x_i, x_l) − d_M(x_i, x_j) ≥ 1 − ξ_ijl
(2) ξ_ijl ≥ 0
(3) M ⪰ 0
where d_M(a, b) = (a − b)^T M (a − b).
Solving formula (23) yields the positive semidefinite matrix M, which is decomposed as M = LL^T; L is the optimization matrix.
The union of the training sample example feature set S_example and the training sample class feature set S_class is taken to obtain S_train = S_example ∪ S_class, and the optimization matrix L is learned based on S_train.
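A compact sketch of the large-margin nearest-neighbor objective of formula (23): the loss pulls every sample toward its ideal (target) neighbors and applies a hinge penalty whenever a differently labeled sample intrudes into the unit margin, and L is obtained by numerical minimization. This is a didactic implementation for small data sets (it relies on finite-difference gradients), not the solver used for the invention.

```python
import numpy as np
from scipy.optimize import minimize

def target_neighbors(X, y, k=3):
    """Indices of the k nearest same-class neighbors of every sample."""
    targets = {}
    for i in range(len(X)):
        same = np.where((y == y[i]) & (np.arange(len(X)) != i))[0]
        d = np.sum((X[same] - X[i]) ** 2, axis=1)
        targets[i] = same[np.argsort(d)[:k]]
    return targets

def lmnn_loss(L_flat, X, y, targets, mu=0.5):
    """Objective of formula (23) written in terms of L (M = L L^T)."""
    d = X.shape[1]
    Xp = X @ L_flat.reshape(d, d)   # rows are (L^T x_i)^T, so row distances equal d_M
    pull, push = 0.0, 0.0
    impostors = {c: np.where(y != c)[0] for c in np.unique(y)}
    for i, js in targets.items():
        for j in js:
            dij = np.sum((Xp[i] - Xp[j]) ** 2)
            pull += dij
            dil = np.sum((Xp[impostors[y[i]]] - Xp[i]) ** 2, axis=1)
            push += np.maximum(0.0, 1.0 + dij - dil).sum()   # hinge slack
    return pull + mu * push

def learn_L(X, y, k=3, mu=0.5):
    d = X.shape[1]
    res = minimize(lmnn_loss, np.eye(d).ravel(),
                   args=(X, y, target_neighbors(X, y, k), mu),
                   method='L-BFGS-B')
    return res.x.reshape(d, d)
```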
5.2 Labeling test samples based on the optimization matrix L
The test example superpixel group to be labeled is represented, as in section 4.2, by the vector set X_test = {x_1^test, x_2^test, …, x_m^test}. The class label of the test sample is calculated according to formula (24):
v_i = label( argmin_{x^class ∈ S_train} ‖L^T(x_i^test − x^class)‖ )    (24)
where x_i^test is a feature vector of the test example and x^class is a training sample feature vector whose class label is class; the x^class at minimum distance from x_i^test is found, its class label is denoted v_i, and v_i is the class of the test example.
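Labeling under the learned matrix L (formula (24)) then reduces to a nearest-neighbor search in the transformed space; combining the per-component labels of a test superpixel group by majority vote is an assumption of this sketch.

```python
import numpy as np

def label_test_component(x_test, train_feats, train_labels, L):
    """Formula (24): nearest training feature vector under the learned metric."""
    diffs = (train_feats - x_test) @ L        # rows are (L^T (x_c - x_test))^T
    idx = np.argmin(np.sum(diffs ** 2, axis=1))
    return train_labels[idx]

def label_test_group(X_test, train_feats, train_labels, L):
    """Label every Gaussian-component vector of a test example superpixel group;
    a majority vote over the components gives a single group label (the vote is
    an assumption, the per-component labels follow formula (24))."""
    votes = [label_test_component(x, train_feats, train_labels, L) for x in X_test]
    return max(set(votes), key=votes.count)
```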
In order to ensure that the example superpixel groups are as close as possible to true examples, the ground truth is used to delineate the example superpixel groups in the verification experiment, so as to obtain the upper limit of the model. Meanwhile, the typical SLIC superpixel segmentation algorithm is selected to divide each picture into 30 regions, and each region is treated as one superpixel group, so as to test the general performance of the algorithm. The specific process of obtaining superpixel groups from the regions and the superpixels is as follows:
Input: the set of regions obtained by segmenting an image with the SLIC algorithm is denoted R = {r_1, …, r_30}, and the i-th segmented region is denoted r_i; the set of superpixels obtained from the same image with the gPb/UCM algorithm is denoted S = {sp_1, …, sp_J}, and the j-th superpixel is denoted sp_j.
Output: a mapping table map, where map(j) = i denotes that the category label of the j-th superpixel sp_j is i.
The procedure constructs the mapping table map from these two segmentations.
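The mapping table can be built, for example, by assigning every gPb/UCM superpixel to the SLIC region with which it overlaps most; this overlap rule is an assumption, since the pseudo-code itself is not reproduced here.

```python
import numpy as np

def build_map(slic_labels, sp_labels):
    """map[j] = i: assign gPb/UCM superpixel j to the SLIC region i
    with which it shares the largest number of pixels."""
    mapping = {}
    for j in np.unique(sp_labels):
        if j == 0:                      # 0 marks boundary pixels, skip it
            continue
        regions, counts = np.unique(slic_labels[sp_labels == j],
                                    return_counts=True)
        mapping[int(j)] = int(regions[np.argmax(counts)])
    return mapping
```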
The result (GT) represents the example superpixel sets of the test samples built using the ground truth; its recognition result is the theoretical upper limit of the proposed model. The result (SLIC) represents the example superpixel sets built using SLIC; its recognition result is relative to a particular superpixel segmentation (here SLIC). The experimental results listed in Table 1 show that the accuracy of the proposed algorithm reaches 82.1% when the examples are accurate, and 52.2% when the examples are determined from the SLIC segmentation result; that is, whether the superpixel groups determined from the superpixel segmentation result are accurate examples has a great influence on the proposed model.
TABLE 1
Example superpixel group construction    Accuracy
Ground truth (GT)                        82.1%
SLIC segmentation                        52.2%
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (8)

1. A semantic annotation method for an indoor scene RGB-D image, characterized by comprising the following steps:
(1) performing superpixel segmentation on the RGB-D indoor scene image with the gPb/UCM algorithm;
(2) superpixel feature extraction: performing Patch feature calculation and superpixel feature representation;
(3) superpixel group feature extraction: extracting example superpixel groups and their features, and class superpixel groups and their features;
(4) superpixel group feature vectorization: defining the kernel distance between Gaussian components, performing example superpixel group feature vectorization and class superpixel group feature vectorization;
(5) metric learning: learning the optimization matrix L and labeling test samples based on the optimization matrix L.
2. The semantic annotation method for an indoor scene RGB-D image according to claim 1, characterized in that: in step (1) the superpixel segmentation uses the gPb/UCM algorithm, which computes from the local and global features of the image the probability value P_b(x, y) that a pixel belongs to a boundary; the gPb/UCM algorithm is applied to the color image and to the depth image separately, yielding P_rgb(x, y), the probability value computed from the color image that a pixel belongs to a boundary, and P_depth(x, y), the probability value computed from the depth image that a pixel belongs to a boundary, which are combined according to formula (1); different probability thresholds tr are set for the probability values obtained by formula (1) to obtain a multi-level segmentation result.
3. The semantic annotation method for an indoor scene RGB-D image according to claim 2, characterized in that: the probability thresholds tr are 0.06 and 0.08, and pixels whose probability values are smaller than the set threshold are connected into regions according to eight-connectivity, each region being one superpixel.
4. The semantic annotation method for an indoor scene RGB-D image according to claim 3, characterized in that: in step (2), a Patch is defined as a 16 × 16 grid; with a step of n pixels, where n = 2, the grid is slid from the upper-left corner of the color image RGB and of the depth image Depth to the right and downward, so that a dense grid finally covers the color image RGB and the depth image Depth, and four types of features are calculated for each Patch: depth gradient features, color gradient features, color features and texture features.
5. The semantic annotation method for an indoor scene RGB-D image according to claim 4, characterized in that: the superpixel feature F_seg in step (2) is defined by formula (5) and consists of the aggregated Patch features and the superpixel geometric features:
F_seg = [F_seg^gd, F_seg^gc, F_seg^col, F_seg^tex, F_seg^geo]    (5)
F_seg^gd, F_seg^gc, F_seg^col and F_seg^tex denote the superpixel depth gradient feature, the superpixel color gradient feature, the superpixel color feature and the superpixel texture feature respectively, and are defined by formula (6):
F_seg^gd = (1/n) Σ_{i=1}^{n} F_g_d(i),  F_seg^gc = (1/n) Σ_{i=1}^{n} F_g_c(i),  F_seg^col = (1/n) Σ_{i=1}^{n} F_col(i),  F_seg^tex = (1/n) Σ_{i=1}^{n} F_tex(i)    (6)
where F_g_d(i), F_g_c(i), F_col(i), F_tex(i) denote the features of the i-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center positions fall within the superpixel seg;
the superpixel geometric features F_seg^geo are defined according to formula (7) and consist of the components A_seg, P_seg, R_seg, the second-order Hu moments, D_mean, D_sq, D_var, D_miss and N_seg, which are defined as follows:
superpixel area A_seg = Σ_{s∈seg} 1, where s ranges over the pixels within the superpixel seg; the superpixel perimeter P_seg, defined by formula (8), is the number of boundary pixels of the superpixel:
P_seg = |B_seg|,  B_seg = { s ∈ seg | ∃ s' ∈ N_4(s), s' ∈ seg', seg' ≠ seg }    (8)
where M, N denote the horizontal and vertical resolution of the RGB scene image respectively; seg, seg' denote different superpixels; N_4(s) is the four-neighborhood of pixel s; B_seg is the set of boundary pixels of the superpixel seg;
the area-to-perimeter ratio R_seg of the superpixel is defined by formula (9):
R_seg = A_seg / P_seg    (9)
the second-order (2+0=2 or 0+2=2) Hu moments are computed from the x coordinate s_x of pixel s, the y coordinate s_y, and the product of the x and y coordinates respectively, and are defined by formulas (10), (11), (12):
H_xx = (1/A_seg) Σ_{s∈seg} (s_x/Width)² − (s̄_x)²    (10)
H_yy = (1/A_seg) Σ_{s∈seg} (s_y/Height)² − (s̄_y)²    (11)
H_xy = (1/A_seg) Σ_{s∈seg} (s_x/Width)(s_y/Height) − s̄_x·s̄_y    (12)
where s̄_x, s̄_y, (s̄_x)², (s̄_y)² denote the mean of the x coordinates, the mean of the y coordinates, the square of the mean of the x coordinates and the square of the mean of the y coordinates of the pixels contained in the superpixel, defined by formula (13):
s̄_x = (1/A_seg) Σ_{s∈seg} s_x / Width,   s̄_y = (1/A_seg) Σ_{s∈seg} s_y / Height    (13)
Width and Height denote the width and height of the image respectively, i.e. the calculation is based on the normalized pixel coordinate values;
D_mean, D_sq and D_var denote, respectively, the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance of the depth values, defined by formula (14):
D_mean = (1/A_seg) Σ_{s∈seg} s_d,  D_sq = (1/A_seg) Σ_{s∈seg} (s_d)²,  D_var = D_sq − (D_mean)²    (14)
D_miss is the proportion of pixels within the superpixel that are missing depth information, defined by formula (15):
D_miss = (1/A_seg) Σ_{s∈seg} [s_d is missing]    (15)
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where the principal normal vector of the point cloud corresponding to the superpixel is estimated by Principal Component Analysis (PCA).
6. The semantic annotation method for an indoor scene RGB-D image according to claim 5, characterized in that: the example superpixel group and its feature extraction performed in step (3) are as follows:
the K superpixels assumed to constitute one example on an image form an example superpixel group; the feature of the k-th superpixel is denoted F_segk, and the feature of the example superpixel group is denoted F = {F_seg1, F_seg2, …, F_segK}; a Gaussian mixture model of the form of formula (16) is built on the superpixel group feature F with the Expectation-Maximization algorithm EM, and the example superpixel group feature is represented by the set of Gaussian components G = {g_1, g_2, …, g_m}:
G(F_seg) = Σ_{i=1}^{m} ω_i · g_i(F_seg)    (16)
the Gaussian mixture model G(F_seg) is represented by the weighted sum of several Gaussian components g_i(F_seg), where F_seg is a random variable and the i-th Gaussian component g_i obeys an ordinary Gaussian distribution, g_i(F_seg) ~ N(F_seg; μ_i, Σ_i); the i-th weight ω_i is obtained by the Expectation-Maximization algorithm EM; μ_i is the expectation of the i-th Gaussian component and is a vector; Σ_i is the variance of the i-th Gaussian component and is a square matrix;
the class superpixel group and its feature extraction in step (3) are as follows:
only training samples can be used to construct class superpixel groups; given all training sample images, the set of superpixel blocks labeled as the j-th class is called a class superpixel group; the j-th class contains P superpixels, and the class feature is denoted F_j = {F_seg1, F_seg2, …, F_segP}; a Gaussian mixture model is likewise built on F_j with the EM algorithm, yielding m_j Gaussian components, and the class superpixel group feature is represented by the set H_j = {h_1^j, h_2^j, …, h_{m_j}^j}; the training samples contain N classes, and the class superpixel group features are represented by the set H_all = {H_1, H_2, …, H_N}, containing N_all = Σ_{j=1}^{N} m_j Gaussian components in total.
7. The semantic annotation method for an indoor scene RGB-D image according to claim 6, characterized in that: the kernel distance between Gaussian components defined in step (4) is as follows: the distance between two Gaussian components adopts the Kullback-Leibler Divergence distance, and the distance between Gaussian components g_i and g_j is computed according to formula (17):
KLD(g_i||g_j) = (1/2)[ tr(Σ_j^{-1}Σ_i) + (μ_j − μ_i)^T Σ_j^{-1} (μ_j − μ_i) − d + ln(|Σ_j|/|Σ_i|) ]    (17)
where d is the dimension of the superpixel feature; substituting formula (17) into formula (18) yields the kernel distance K(g_i, g_j) between the two Gaussian components g_i and g_j:
K(g_i, g_j) = exp{−[KLD(g_i||g_j) + KLD(g_j||g_i)]/2t²}    (18)
where t is an empirical parameter, set to 70 in the verification experiments;
the example superpixel group feature vectorization is performed as follows:
for the i-th Gaussian component g_i of the example superpixel group feature G, the kernel distances between g_i and all Gaussian components of the class superpixel group feature H_all are computed and taken as the feature vector x_i of the example superpixel group, according to formula (19):
x_i = [K(g_i, h_1), K(g_i, h_2), …, K(g_i, h_{N_all})]    (19)
the example superpixel group is represented by its set of feature vectors {x_1, x_2, …, x_m};
1) example superpixel feature vectorization of training samples: example superpixel group features are extracted from all training sample images according to step (3), the Gaussian component features of each example superpixel group are vectorized according to formula (19), and the vectorized example superpixel group feature vectors form the training sample example feature set S_example, formula (20):
S_example = ∪_{t=1}^{T} X_t    (20)
where X_t is the set of vectorized Gaussian component features of the t-th example superpixel group and T is the number of examples in all training samples;
2) example superpixel feature vectorization of test samples: all Gaussian component features of an example superpixel group are vectorized according to formula (19) to form the vector set X_test = {x_1^test, x_2^test, …, x_m^test}, where x_n^test denotes the vectorized feature of the n-th Gaussian component;
the class superpixel group feature vectorization is performed as follows:
for each Gaussian component h_k of the class superpixel group feature H_all, the kernel distances between h_k and all Gaussian components of H_all are computed and taken as the feature vector of that Gaussian component, according to formula (21):
x_k = [K(h_k, h_1), K(h_k, h_2), …, K(h_k, h_{N_all})]    (21)
all vectorized class superpixel group features form the training sample class feature set S_class, formula (22):
S_class = {x_1, x_2, …, x_{N_all}}    (22)
8. The semantic annotation method for an indoor scene RGB-D image according to claim 7, characterized in that: the learning of the optimization matrix L in step (5) is as follows:
formula (23) is the objective function of the metric learning, where M is the positive semidefinite Mahalanobis matrix to be optimized; M is optimized using sample points i, whose corresponding feature vector is x_i; j⇝i indicates that sample j is an ideal neighbor of sample point i, and the feature vector of sample j is denoted x_j; l is a sample lying too close to sample i, with feature vector x_l; ξ_ijl is a non-negative slack term; μ = 0.5 balances the weight between the pulling force and the repulsive force; y_il = 1 when the labels of samples i and l are consistent, otherwise y_il = 0:
min_M Σ_{i,j⇝i} d_M(x_i, x_j) + μ Σ_{i,j⇝i} Σ_l (1 − y_il) ξ_ijl    (23)
subject to:
(1) d_M(x_i, x_l) − d_M(x_i, x_j) ≥ 1 − ξ_ijl
(2) ξ_ijl ≥ 0
(3) M ⪰ 0
where d_M(a, b) = (a − b)^T M (a − b);
solving formula (23) yields the positive semidefinite matrix M, which is decomposed as M = LL^T, L being the optimization matrix; the union of the training sample example feature set S_example and the training sample class feature set S_class is taken to obtain S_train = S_example ∪ S_class, and the optimization matrix L is learned based on S_train;
the labeling of test samples based on the optimization matrix L is as follows:
the test example superpixel group to be labeled is represented by the vector set X_test = {x_1^test, x_2^test, …, x_m^test}, and the class label of the test sample is calculated according to formula (24):
v_i = label( argmin_{x^class ∈ S_train} ‖L^T(x_i^test − x^class)‖ )    (24)
where x_i^test is a feature vector of the test example and x^class is a training sample feature vector whose class label is class; the x^class at minimum distance from x_i^test is found, its class label is denoted v_i, and v_i is the class of the test example.
CN201910886599.6A 2019-09-19 2019-09-19 Semantic annotation method for indoor scene RGB-D image Active CN110751153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910886599.6A CN110751153B (en) 2019-09-19 2019-09-19 Semantic annotation method for indoor scene RGB-D image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910886599.6A CN110751153B (en) 2019-09-19 2019-09-19 Semantic annotation method for indoor scene RGB-D image

Publications (2)

Publication Number Publication Date
CN110751153A true CN110751153A (en) 2020-02-04
CN110751153B CN110751153B (en) 2023-08-01

Family

ID=69276776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910886599.6A Active CN110751153B (en) 2019-09-19 2019-09-19 Semantic annotation method for indoor scene RGB-D image

Country Status (1)

Country Link
CN (1) CN110751153B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116156517A (en) * 2023-03-16 2023-05-23 华能伊敏煤电有限责任公司 RIS deployment method under indoor scene

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014205231A1 (en) * 2013-06-19 2014-12-24 The Regents Of The University Of Michigan Deep learning framework for generic object detection
US20150030255A1 (en) * 2013-07-25 2015-01-29 Canon Kabushiki Kaisha Method and apparatus for classifying pixels in an input image and image processing system
CN107103326A (en) * 2017-04-26 2017-08-29 苏州大学 The collaboration conspicuousness detection method clustered based on super-pixel
CN107944428A (en) * 2017-12-15 2018-04-20 北京工业大学 A kind of indoor scene semanteme marking method based on super-pixel collection
CN109829449A (en) * 2019-03-08 2019-05-31 北京工业大学 A kind of RGB-D indoor scene mask method based on super-pixel space-time context

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014205231A1 (en) * 2013-06-19 2014-12-24 The Regents Of The University Of Michigan Deep learning framework for generic object detection
US20150030255A1 (en) * 2013-07-25 2015-01-29 Canon Kabushiki Kaisha Method and apparatus for classifying pixels in an input image and image processing system
CN107103326A (en) * 2017-04-26 2017-08-29 苏州大学 The collaboration conspicuousness detection method clustered based on super-pixel
CN107944428A (en) * 2017-12-15 2018-04-20 北京工业大学 A kind of indoor scene semanteme marking method based on super-pixel collection
CN109829449A (en) * 2019-03-08 2019-05-31 北京工业大学 A kind of RGB-D indoor scene mask method based on super-pixel space-time context

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KILIAN Q. WEINBERGER等: "Distance Metric Learning for Large Margin Nearest Neighbor Classification", 《THE JOURNAL OF MACHINE LEARNING RESEARCH》 *
WEN WANG等: "Discriminant analysis on Riemannian manifold of Gaussian distributions for face recognition with image sets", 《2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116156517A (en) * 2023-03-16 2023-05-23 华能伊敏煤电有限责任公司 RIS deployment method under indoor scene

Also Published As

Publication number Publication date
CN110751153B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
Sankaranarayanan et al. Learning from synthetic data: Addressing domain shift for semantic segmentation
CN109829449B (en) RGB-D indoor scene labeling method based on super-pixel space-time context
CN111259936B (en) Image semantic segmentation method and system based on single pixel annotation
CN106920243A (en) The ceramic material part method for sequence image segmentation of improved full convolutional neural networks
US9449253B2 (en) Learning painting styles for painterly rendering
CN103886619B (en) A kind of method for tracking target merging multiple dimensioned super-pixel
CN107944428B (en) Indoor scene semantic annotation method based on super-pixel set
CN112837344B (en) Target tracking method for generating twin network based on condition countermeasure
CN112862792B (en) Wheat powdery mildew spore segmentation method for small sample image dataset
CN111368759B (en) Monocular vision-based mobile robot semantic map construction system
CN109376787B (en) Manifold learning network and computer vision image set classification method based on manifold learning network
CN110827304B (en) Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
CN108537168A (en) Human facial expression recognition method based on transfer learning technology
CN111311702B (en) Image generation and identification module and method based on BlockGAN
Han et al. Weakly-supervised learning of category-specific 3D object shapes
CN103714556A (en) Moving target tracking method based on pyramid appearance model
CN107657276B (en) Weak supervision semantic segmentation method based on searching semantic class clusters
CN103593639A (en) Lip detection and tracking method and device
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
CN112329830B (en) Passive positioning track data identification method and system based on convolutional neural network and transfer learning
CN110751153B (en) Semantic annotation method for indoor scene RGB-D image
CN116109656A (en) Interactive image segmentation method based on unsupervised learning
Li et al. Few-shot meta-learning on point cloud for semantic segmentation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant