CN111611919B - Road scene layout analysis method based on structured learning - Google Patents

Road scene layout analysis method based on structured learning

Info

Publication number
CN111611919B
Authority
CN
China
Prior art keywords
sub
scene
hidden variable
labels
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010431561.2A
Other languages
Chinese (zh)
Other versions
CN111611919A (en
Inventor
李垚辰
袁建
董子坤
王雨潇
刘跃虎
Current Assignee
RESEARCH INSTITUTE OF XI'AN JIAOTONG UNIVERSITY IN SUZHOU
Original Assignee
RESEARCH INSTITUTE OF XI'AN JIAOTONG UNIVERSITY IN SUZHOU
Priority date
Filing date
Publication date
Application filed by RESEARCH INSTITUTE OF XI'AN JIAOTONG UNIVERSITY IN SUZHOU filed Critical RESEARCH INSTITUTE OF XI'AN JIAOTONG UNIVERSITY IN SUZHOU
Priority to CN202010431561.2A priority Critical patent/CN111611919B/en
Publication of CN111611919A publication Critical patent/CN111611919A/en
Application granted granted Critical
Publication of CN111611919B publication Critical patent/CN111611919B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

A road scene layout analysis method based on structured learning collects and expands a traffic scene image data set, and labels and preprocesses the data set according to scene platform classification. The image is segmented into sub-regions: superpixel segmentation is performed on the image, a boosted decision tree regressor is trained with the superpixel features and labels to obtain an initial segmentation result, and the initial segmentation result is then optimized with a Markov random field to obtain the final segmentation result. Next, features are extracted from the sub-regions, an SVM classifier is trained with the sub-region features and hidden variable labels, and the combination of sub-region hidden variables of each picture is predicted. Finally, a decision tree is constructed from the correspondence between sub-region hidden variable combinations and scene platform labels, and the scene platform label corresponding to a group of hidden variable labels is found through the decision tree. Based on road scene pictures and videos of simple road traffic scene environments, the method effectively predicts the traffic scene platform and is accurate, simple and effective.

Description

Road scene layout analysis method based on structured learning
Technical Field
The invention belongs to the field of image processing, computer vision and pattern recognition, and particularly relates to a road scene layout analysis method based on structured learning.
Background
Estimation of the layout of traffic scenes has very important applications in the field of unmanned driving, with broad prospects in practical problems such as three-dimensional reconstruction of road scenes. Common traffic scene layout estimation methods are based either on probabilistic graphical model inference or on convolutional neural network prediction. Methods based on probabilistic graphical model inference, such as the method proposed by Geiger et al. (Geiger A, Lauer M, Wojek C, et al. 3D Traffic Scene Understanding From Movable Platforms [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5): 1012-), involve a large amount of computation and are not easy to handle. Prediction methods based on convolutional neural networks, such as the method proposed by Wu et al. (F.-Y. Wu, S.-Y. Yan, J. S. Smith, and B.-L. Zhang, "Traffic scene recognition based on deep CNN and VLAD spatial pyramids," in Machine Learning and Cybernetics (ICMLC), 2017 International Conference on, vol. 1, pp. 156-161, IEEE, 2017), extract CNN features from image patches generated by a region proposal algorithm, reduce their dimension, encode them with VLAD, and feed them to a classifier, finally dividing traffic scenes into 10 classes; however, such methods have complicated steps and a large amount of computation.
Disclosure of Invention
In order to solve the problems in the prior art, the present invention provides a road scene layout analysis method based on structured learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
a road scene layout analysis method based on structured learning comprises the following steps:
step 1: collecting traffic scene images, forming a traffic scene image data set, and carrying out labeling and preprocessing on the traffic scene image data set according to scene platform classification;
step 2: performing subregion segmentation on the marked and preprocessed image based on supervised training and graph model optimization to obtain a subregion segmentation result;
Step 3: modeling the sub-topics of the sub-regions in the segmentation result with hidden variables, training a classifier with the cutting-plane structured SVM method with N slack variables, continuously and iteratively updating the weight until the value of the loss function is minimal, obtaining the optimized classifier parameters, and inferring hidden variable labels with the optimized classifier parameters to obtain the hidden variables of the sub-regions;
Step 4: constructing a decision tree with the CART algorithm, and inferring the scene platform label corresponding to the combination of sub-region hidden variable labels.
The further improvement of the invention is that in step 1 the specific processes of labeling and preprocessing are as follows: each pixel in the traffic scene image data set is labeled with a sub-region label, the labeled data set is cleaned, samples with missing labels are filtered out, and the pictures are then resized to 256 × 256.
The invention has the further improvement that the specific process of step 2 is as follows: the labeled and preprocessed traffic scene image data set is split into a training set and a test set; superpixel segmentation is performed on all images in the training set and the features of each superpixel are extracted; a boosted decision tree regressor is then trained with the sub-region labels and the extracted superpixel features, and its output is taken as the initial segmentation result; finally, a Markov random field is constructed on the initial segmentation result and optimized to obtain the sub-region segmentation result.
A further refinement of the invention is that the features of each superpixel include SIFT features, the RGB color mean and variance, GIST appearance features and location features;
the specific process of constructing a Markov random field to optimize the initial segmentation result into the sub-region segmentation result is to minimize the energy function J(c):

J(c) = Σ_{s_i ∈ SP} E_data(s_i, c_i) + λ Σ_{(s_i, s_j) ∈ A} E_smooth(c_i, c_j)   (1)

where SP is the set of superpixels, s_i is the i-th superpixel and c_i is its corresponding sub-region category label; s_j is the j-th superpixel and c_j is its corresponding sub-region category label; A is the set of adjacent superpixel pairs, and (s_i, s_j) is one such pair; E_data and E_smooth are the data term and the smoothing term, respectively, and λ is the weight of the smoothing term.
A further development of the invention is that the data term E_data and the smoothing term E_smooth take the following concrete form:

E_data(s_i, c_i) = −w_i σ(L(s_i, c_i))   (2)

E_smooth(c_i, c_j) = −log[(P(c_i|c_j) + P(c_j|c_i))/2] × δ[c_i ≠ c_j]   (3)

In the data term, L(s_i, c_i) is the likelihood that the i-th superpixel belongs to sub-region c_i, σ is the sigmoid function, and w_i is the weight of the i-th superpixel; δ is an indicator function.
A further improvement of the invention is that P(c_i|c_j) in the smoothing term is the conditional probability that a superpixel belongs to sub-region c_i given that its neighboring superpixel belongs to sub-region c_j; the indicator function δ is 1 when the condition c_i ≠ c_j holds and 0 when it does not.
The invention is further improved in that the specific process of step 3 is as follows: the sub-topic of a sub-region in the segmentation result is first modeled with a hidden variable, then a classifier is trained with the cutting-plane structured SVM method with N slack variables, and the weight is continuously and iteratively updated until the value of the loss function is minimal, yielding the optimized classifier parameters.
The invention is further improved in that the specific process of step 3 is as follows: when training the SVM classifier, the extracted feature vector x_i of the sub-region and the hidden variable label z_i are input for supervised training; the extracted sub-region features include HOG, Gabor, LBP and RGB, and the loss function is defined as follows:

min_{ω, ξ} (1/2)||ω||² + λ Σ_{i=1}^{M} ξ_i   (4)

s.t. ∀ i, ∀ z ∈ Z:

ξ_i ≥ Δ(z_i, z) + F(z, x_i; ω) − F(z_i, x_i; ω)

where ξ_i is the slack variable of the i-th sample, ω is the weight, λ is the penalty parameter, x_i is an L-dimensional feature vector, z_i is the hidden variable label of the i-th sample, z ranges over all labels in the hidden variable label set Z, Δ(z_i, z) is the distance between the hidden variable label of the i-th sample and a label in the set, and F is the objective function defined as follows:
F(x_i, z_i; ω) = ω · φ(x_i, z_i)   (5)

where x_i is an L-dimensional feature vector, ω is the weight, φ(x_i, z_i) is the feature mapping function, and M is the number of sub-region samples. The feature mapping function φ(x_i, z_i) has the form:

φ(x_i, z_i) = (0, …, 0, x_iᵀ, 0, …, 0)ᵀ   (6)

that is, the only non-zero segment of φ(x_i, z_i) equals x_i and occupies the z_i-th position in φ(x_i, z_i); ω* is the optimization parameter that minimizes the loss function.
The invention is further improved in that, when inferring the hidden variable label, all hidden variable labels z are enumerated and the label z* that maximizes the objective function F(x, z; ω*) is taken as the inference result:

z* = argmax_{z ∈ Z} F(x, z; ω*)   (7)

where ω* is the optimization parameter that minimizes the loss function and Z is the set of hidden variable labels.
The invention is further improved in that the specific process of step 4 is as follows: 14 scene platform labels are defined according to scene layout and structure, and the hidden variable combinations z* related to each kind of scene platform are collected as data; a decision tree is constructed with the CART algorithm, a group of hidden variable labels is input into the decision tree, and the scene platform label finally corresponding to those hidden variable labels can be found through the decision tree.
Compared with the prior art, the invention has the following beneficial effects:
Firstly, a traffic scene image data set is collected and expanded, and the data set is labeled and preprocessed according to scene platform classification; secondly, superpixel segmentation is performed on the image, a boosted decision tree regressor is trained with the superpixel features and labels to segment the image into sub-regions, and the initial segmentation result is optimized with a Markov random field to obtain the final segmentation result; then features are extracted from the segmented sub-regions, an SVM classifier is trained with the sub-region features and artificially defined hidden variable labels, and the combination of sub-region hidden variables of each picture is predicted; finally, a decision tree is constructed from the correspondence between sub-region hidden variable combinations and scene platform labels, through which the scene platform label corresponding to a group of sub-region hidden variable labels can be found simply and conveniently. The method has high accuracy and is simple and effective. In image sub-region segmentation, compared with existing methods such as unsupervised clustering, the method performs supervised training with the low-level features of the images and optimizes the result with a graph model, so the result is more accurate.
In scene platform prediction, compared with methods that analyze picture layout and structure with a neural network, the method models the sub-topics of the sub-regions with hidden variables, extracts low-level features, and mines the high-level semantics of each part of the image from the bottom up, overcoming the drawback that a holistic representation cannot model a picture; no complex network structure is needed, training consumes fewer resources, and the confidence is higher. On the other hand, the scene platform labels corresponding to the hidden variable combinations are inferred with a decision tree; compared with a supervised training method, constructing the decision tree requires little computation, and the inference accuracy can reach 100% when the input hidden variable combination is predicted correctly.
Drawings
Fig. 1 is a schematic view of a road scene platform.
Fig. 2 is a schematic diagram of road image segmentation.
FIG. 3 is a schematic road scene platform inference diagram.
FIG. 4 is a comparison graph of various SVM model tests.
Fig. 5 is a road scene platform decision tree.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
The specific method of the invention is as follows:
Step 1: collecting traffic scene images to form a traffic scene image data set, and labeling and preprocessing the traffic scene image data set according to scene platform classification. The specific processes of labeling and preprocessing are as follows: each pixel in the traffic scene image data set is labeled with a sub-region label, the labeled data set is cleaned, samples with missing labels are filtered out, and the pictures are then resized to 256 × 256.
Step 2: performing sub-region segmentation on the resized images: the images are segmented into sub-regions based on supervised training and graph model optimization to obtain the sub-region segmentation result. The specific process is as follows:
The resized traffic scene image data set is split into a training set and a test set. Superpixel segmentation is performed on all images in the training set and the features of each superpixel are extracted; the extracted features include SIFT features, the RGB color mean and variance, GIST appearance features and location features. A boosted decision tree regressor is then trained with the sub-region labels and the extracted superpixel features. The output of the regressor is the initial segmentation result, namely the likelihood that each superpixel belongs to each sub-region category; each superpixel is assigned to the sub-region with the maximum likelihood.
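As a minimal illustrative sketch of this assignment step (not the patent's implementation; the likelihood scores below are hypothetical numbers standing in for the boosted decision tree regressor's output), each superpixel is mapped to the sub-region class with the largest score:

```python
import numpy as np

def initial_segmentation(likelihoods):
    """Assign each superpixel to the sub-region class whose
    likelihood score (the regressor's output) is largest."""
    # likelihoods: array of shape (num_superpixels, num_classes)
    return np.argmax(likelihoods, axis=1)

# Toy example: 3 superpixels, 4 sub-region classes (class order is
# hypothetical, e.g. "background", "ground", "sky", ...).
scores = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.6, 0.2, 0.1, 0.1],
                   [0.2, 0.2, 0.2, 0.4]])
labels = initial_segmentation(scores)  # one class index per superpixel
```

This per-superpixel hard assignment is only the initial result; the Markov random field optimization described next refines it by trading off these likelihoods against label smoothness between neighbours.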
Finally, a Markov random field is constructed to optimize the initial segmentation result into the final segmentation result, i.e. the sub-region segmentation result. The specific process is to minimize the following energy function J(c):

J(c) = Σ_{s_i ∈ SP} E_data(s_i, c_i) + λ Σ_{(s_i, s_j) ∈ A} E_smooth(c_i, c_j)   (1)

where SP is the set of superpixels, s_i is the i-th superpixel and c_i is its corresponding sub-region category label; s_j is the j-th superpixel and c_j is its corresponding sub-region category label; A is the set of adjacent superpixel pairs, and (s_i, s_j) is one such pair; λ is the weight of the smoothing term; E_data and E_smooth are the data term and the smoothing term, respectively. Their concrete form is as follows:

E_data(s_i, c_i) = −w_i σ(L(s_i, c_i))   (2)

E_smooth(c_i, c_j) = −log[(P(c_i|c_j) + P(c_j|c_i))/2] × δ[c_i ≠ c_j]   (3)

In the data term, L(s_i, c_i) is the likelihood that the i-th superpixel belongs to sub-region c_i, σ is the sigmoid function, and w_i is the weight of the i-th superpixel. In the smoothing term, P(c_i|c_j) is the conditional probability that a superpixel belongs to sub-region c_i given that its neighboring superpixel belongs to sub-region c_j, and vice versa; δ is an indicator function that is 1 when the condition c_i ≠ c_j holds and 0 otherwise.
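A minimal sketch of evaluating the energy J(c) of eqs. (1)-(3) for a given labeling (the likelihoods, weights and conditional probabilities here are hypothetical toy numbers; minimizing J over all labelings, e.g. by graph cuts, is assumed to happen elsewhere):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def energy(labels, likelihood, weights, cond_prob, adjacent, lam):
    """Energy J(c): data terms over superpixels plus a
    lambda-weighted smoothing term over adjacent superpixel pairs."""
    # Data term, eq. (2): E_data = -w_i * sigmoid(L(s_i, c_i))
    e_data = sum(-weights[i] * sigmoid(likelihood[i][labels[i]])
                 for i in range(len(labels)))
    # Smoothing term, eq. (3): penalize neighbours with different labels,
    # scaled by how unlikely the label pair is.
    e_smooth = 0.0
    for i, j in adjacent:
        ci, cj = labels[i], labels[j]
        if ci != cj:  # indicator delta[c_i != c_j]
            e_smooth += -math.log((cond_prob[ci][cj] + cond_prob[cj][ci]) / 2)
    return e_data + lam * e_smooth

# Toy example with hypothetical numbers: 2 adjacent superpixels, 2 classes.
likelihood = [[2.0, -1.0], [1.5, 0.5]]   # L(s_i, c)
weights = [1.0, 1.0]                      # w_i
cond_prob = [[0.9, 0.1], [0.1, 0.9]]      # P(c_i | c_j)
adjacent = [(0, 1)]
same = energy([0, 0], likelihood, weights, cond_prob, adjacent, lam=0.5)
diff = energy([0, 1], likelihood, weights, cond_prob, adjacent, lam=0.5)
```

With these numbers the labeling where both neighbours agree has lower energy than the disagreeing one, which is exactly the smoothing behaviour the MRF is meant to impose.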
Step 3: prediction of sub-region hidden variables: the sub-topics of the sub-regions in the segmentation result are modeled with hidden variables, a classifier is trained with the cutting-plane structured SVM method with N slack variables, the weight is continuously and iteratively updated until the value of the loss function is minimal, the optimized classifier parameters are obtained, and hidden variable labels are inferred with the optimized classifier parameters to obtain the hidden variables of the sub-regions. The specific process is as follows:
First, the sub-topics of the sub-regions in the segmentation result are modeled with hidden variables. The specific process is as follows: K values are used to represent the hidden variable label z corresponding to the sub-topic of a sub-region, with z ∈ {1, 2, …, K}. Hidden variables may represent sub-topics such as sky, road, left tree, right tree, etc.
Then, a classifier is trained with the cutting-plane structured SVM method with N slack variables, the weight is continuously and iteratively updated, and training stops when the value of the loss function is minimal, yielding the optimized classifier parameters. The specific process is as follows:
when training the SVM classifier, inputting the feature vector x of the extracted sub-region i And hidden variable label z i Supervised training is performed. The extracted characteristics of the sub-region include HOG, Gabor, LBP, RGB, and the like. Training a classifier by adopting a tangent plane structured SVM method with N relaxation variables, wherein a loss function is defined as follows:
Figure BDA0002500782530000071
Figure BDA0002500782530000072
ξ i ≥Δ(z i ,z)+F(z,x i ;ω)-F(z i ,x i ;ω)
where ξ is the relaxation variable and ω isWeight, λ is a penalty parameter, x i Is an L-dimensional feature vector, z i Is the hidden variable label of the ith sample, z is all labels contained in the hidden variable label set, Δ (z) i Z) is the distance value between the hidden variable label of the ith sample and a label in the set of hidden variable labels, and F is the objective function defined as follows:
Figure BDA0002500782530000081
wherein x is i Is an L-dimensional feature vector, ω is a weight, ω is a C × L-dimensional vector matrix, φ (x) i ,z i ) For the feature mapping function, M is the number of sub-region samples. Feature mapping function phi (x) i ,z i ) The form of (1) is as follows:
Figure BDA0002500782530000082
wherein the content of the first and second substances,
Figure BDA0002500782530000083
is phi (x) i ,z i ) A non-zero vector of value x i Are the same and are in phi (x) i ,z i ) The known weight ω and the feature mapping function φ (x) at the z-th position i ,z i ) The product of (c) is a constant. Omega * An optimization parameter to minimize the loss function. Continuously iterating by gradient descent method to obtain weight omega for minimizing loss function *
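The iterative weight update is described only at the level of "gradient descent on the loss". One such update can be sketched under simplifying assumptions not stated in the patent (a 0/1 distance Δ, a subgradient of the structured hinge term, and ω stored as a C × L matrix so that F(x, z; ω) = ω_z · x per eqs. (5)-(6)):

```python
import numpy as np

def loss_augmented_infer(x, z_true, omega):
    """argmax_z [ Delta(z_true, z) + F(z, x; omega) ] with 0/1 distance."""
    C = omega.shape[0]
    scores = omega.dot(x) + (np.arange(C) != z_true)  # +1 where z != z_true
    return int(np.argmax(scores))

def subgradient_step(x, z_true, omega, lr=1.0):
    """One subgradient-descent update of the hinge term
    max_z [Delta + F(z, x; omega)] - F(z_true, x; omega)."""
    z_hat = loss_augmented_infer(x, z_true, omega)
    omega = omega.copy()
    if z_hat != z_true:
        omega[z_hat] -= lr * x   # push the violating label's score down
        omega[z_true] += lr * x  # push the true label's score up
    return omega

# Toy run: 2 hidden labels, 2-dimensional features, weights start at zero.
omega = np.zeros((2, 2))
x, z_true = np.array([1.0, 0.0]), 0
omega = subgradient_step(x, z_true, omega)
prediction = int(np.argmax(omega.dot(x)))  # now predicts the true label
```

This is a sketch of one update direction only; the cutting-plane method the patent names additionally maintains a working set of the most violated constraints rather than updating on a single one.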
When inferring the hidden variable label, all hidden variable labels z are enumerated and the label z* that maximizes the objective function F(x, z; ω*) is taken as the inference result:

z* = argmax_{z ∈ Z} F(x, z; ω*)   (7)

where ω* is the optimization parameter that minimizes the loss function and Z is the set of hidden variable labels.
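The feature mapping of eq. (6) and the exhaustive inference of eq. (7) can be sketched as follows, with toy dimensions and a hypothetical learned weight ω* (not the trained model):

```python
import numpy as np

C, L = 3, 2  # number of hidden variable labels, feature dimension

def phi(x, z):
    """Eq. (6): a C*L vector whose z-th L-block is x, all else zero."""
    out = np.zeros(C * L)
    out[z * L:(z + 1) * L] = x
    return out

def F(x, z, omega):
    """Eq. (5): F(x, z; omega) = omega . phi(x, z)."""
    return float(omega.ravel().dot(phi(x, z)))

def infer(x, omega):
    """Eq. (7): enumerate every label z and keep the maximizer."""
    return max(range(C), key=lambda z: F(x, z, omega))

# Hypothetical optimized weight omega*, one L-dimensional row per label.
omega_star = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [-1.0, -1.0]])
x = np.array([0.2, 0.9])
z_star = infer(x, omega_star)
```

Because φ zeroes out every block except the z-th, the dot product ω · φ(x, z) reduces to ω_z · x, which is why exhaustive enumeration over the K labels is cheap.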
Step 4: predicting scene platform labels: a decision tree is constructed with the CART algorithm, and the scene platform label y corresponding to the combination of sub-region hidden variable labels is inferred. The specific process is as follows:
Labels of 14 scene platforms are defined according to scene layout and structure, and the hidden variable combinations related to each type of scene platform are collected as data. A decision tree is constructed with the CART algorithm: attribute selection is measured with the Gini index, the attribute with the smallest Gini index is chosen for splitting, and the Gini-index computation and splitting are invoked recursively on the two child nodes until no attribute can further subdivide the remaining data, completing the construction. A group of sub-region hidden variable labels is input into the decision tree, and the scene platform label y finally corresponding to that group of hidden variable labels is found through the decision tree.
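A minimal sketch of the Gini-index split selection at one node of the CART construction (the hidden-variable combinations and scene-platform ids below are hypothetical toy data, and the split test `attr == 1` is an assumption for these binary indicators):

```python
from collections import Counter

def gini(labels):
    """Gini index of a label set: 1 minus sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(samples, labels, attr):
    """Weighted Gini index of a binary split on attribute `attr`."""
    left = [y for x, y in zip(samples, labels) if x[attr] == 1]
    right = [y for x, y in zip(samples, labels) if x[attr] != 1]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

def best_attribute(samples, labels):
    """CART's choice: the attribute with the smallest weighted Gini index."""
    n_attrs = len(samples[0])
    return min(range(n_attrs), key=lambda a: split_gini(samples, labels, a))

# Toy data: each sample is a combination of hidden-variable indicators,
# each label a hypothetical scene-platform id.
samples = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ['A', 'A', 'B', 'B']
attr = best_attribute(samples, labels)
```

Here attribute 0 separates the two platforms perfectly (weighted Gini 0), so CART splits on it first; the same computation is then repeated recursively on each child node.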
Compared with existing methods based on probabilistic graphical model inference and prediction methods based on convolutional neural networks, the algorithm has high prediction accuracy, can effectively generate a holistic model of the traffic scene, requires little computation, and is simple and effective.
The framework of the algorithm is implemented based on the SSVM and a decision tree. For the experimental data, 1000 pictures of different scenes were selected as the data set, and the single road image data set was split into a training set and a test set in a 7:3 ratio. Fig. 1 shows 6 scene platforms, each with a real scene and the corresponding scene wireframe model. The road is divided into regions such as "background", "left wall", "right wall", "ground" and "sky" according to image content: a patch filled with four-pointed stars represents the "background", a patch filled with five-pointed stars the "left wall", a patch filled with diamonds the "right wall", a patch filled with circles the "ground", and a patch filled with straight lines the "sky".
Fig. 2 shows the road image segmentation principle, and the process can be divided into two steps of classification and optimization.
Fig. 3 shows a road scene platform inference process, and the whole process is divided into prediction of a sub-region hidden variable by a sub-region feature and prediction of a scene platform label by a combination of the hidden variables.
FIG. 4 shows an experimental comparison, when predicting the hidden variables of sub-regions, of classifiers trained on RGB, HOG, Gabor and LBP features with the cutting-plane structured SVM method with N slack variables against a single-slack-variable Structured SVM model, a LibSVM model, a Subgradient Structured SVM model, a Frank-Wolfe Block Structured SVM model and a Frank-Wolfe Batch Structured SVM model. It can be seen that the Structured SVM model with N slack variables used here has good accuracy under different features.
Fig. 5 shows a decision tree constructed according to the CART algorithm when predicting scene platform tags.
The algorithm of the invention is compared with convolutional neural networks, as shown in Table 1. The compared data come from 1000 pictures of different scenes, and the single road image data set is split into a training set and a test set in a 7:3 ratio. The algorithm of the invention is compared with three neural network models, AlexNet, VGG16 and ResNet_101; the quantitative evaluation criteria are accuracy, precision, recall and F1 score. The comparison shows that the algorithm has higher classification accuracy.
TABLE 1 quantitative evaluation of scene platform classification results
The method is based on the road scene pictures and videos of the simple road traffic scene environment, can effectively realize the prediction of the traffic scene platform, and is accurate in prediction effect and simple and effective.

Claims (7)

1. A road scene layout analysis method based on structured learning is characterized by comprising the following steps:
step 1: collecting traffic scene images, forming a traffic scene image data set, and carrying out labeling and preprocessing on the traffic scene image data set according to scene platform classification;
Step 2: performing sub-region segmentation on the labeled and preprocessed images based on supervised training and graph model optimization to obtain the sub-region segmentation result; the specific process is as follows: the labeled and preprocessed traffic scene image data set is split into a training set and a test set; superpixel segmentation is performed on all images in the training set and the features of each superpixel are extracted; a boosted decision tree regressor is then trained with the sub-region labels and the extracted superpixel features, and its output is taken as the initial segmentation result; finally, a Markov random field is constructed on the initial segmentation result and optimized to obtain the sub-region segmentation result;
the features of each superpixel include SIFT features, the RGB color mean and variance, GIST appearance features and location features;
the specific process of constructing a Markov random field to optimize the initial segmentation result into the sub-region segmentation result is to minimize the energy function J(c):

J(c) = Σ_{s_i ∈ SP} E_data(s_i, c_i) + λ Σ_{(s_i, s_j) ∈ A} E_smooth(c_i, c_j)   (1)

where SP is the set of superpixels, s_i is the i-th superpixel and c_i is its corresponding sub-region category label; s_j is the j-th superpixel and c_j is its corresponding sub-region category label; A is the set of adjacent superpixel pairs, and (s_i, s_j) is one such pair; E_data and E_smooth are the data term and the smoothing term, respectively, and λ is the weight of the smoothing term;
the data term E_data and the smoothing term E_smooth take the following concrete form:

E_data(s_i, c_i) = −w_i σ(L(s_i, c_i))   (2)

E_smooth(c_i, c_j) = −log[(P(c_i|c_j) + P(c_j|c_i))/2] × δ[c_i ≠ c_j]   (3)

in the data term, L(s_i, c_i) is the likelihood that the i-th superpixel belongs to sub-region c_i, σ is the sigmoid function, and w_i is the weight of the i-th superpixel; δ is an indicator function;
Step 3: modeling the sub-topics of the sub-regions in the segmentation result with hidden variables, training a classifier with the cutting-plane structured SVM method with N slack variables, continuously and iteratively updating the weight until the value of the loss function is minimal to obtain the optimized classifier parameters, and inferring hidden variable labels with the optimized classifier parameters to obtain the hidden variables of the sub-regions;
and step 4: constructing a decision tree with the CART algorithm and inferring the scene platform label corresponding to the combination of sub-region hidden-variable labels.
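The MRF minimization of step 2 can be sketched in a few lines of numpy. The patent does not name a solver for Eq. (1), so iterated conditional modes (ICM) is used here purely as an illustrative stand-in (graph cuts / α-expansion would be a common alternative); the function name `mrf_icm` and all array shapes are assumptions, not part of the claimed method.

```python
import numpy as np

def mrf_icm(unary, edges, cooccur, lam=1.0, iters=10):
    """Minimize J(c) = sum_i E_data + lam * sum_(i,j) E_smooth by ICM.

    unary:   (S, K) array; unary[i, k] = E_data for superpixel i, label k
             (e.g. -w_i * sigmoid(L(s_i, k)) from the regressor scores)
    edges:   list of (i, j) adjacent superpixel pairs (the set A)
    cooccur: (K, K) array of conditional probabilities P(c_i | c_j)
    """
    S, K = unary.shape
    # pairwise cost of Eq. (3); the indicator zeroes the diagonal (c_i == c_j)
    pair = -np.log((cooccur + cooccur.T) / 2.0 + 1e-12)
    np.fill_diagonal(pair, 0.0)
    labels = unary.argmin(axis=1)      # initial segmentation from the regressor
    nbrs = [[] for _ in range(S)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(iters):
        changed = False
        for i in range(S):             # greedily re-label one superpixel at a time
            cost = unary[i].copy()
            for j in nbrs[i]:
                cost += lam * pair[:, labels[j]]
            best = int(cost.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

ICM only finds a local minimum, but it makes the role of λ visible: with λ = 0 the result is just the per-superpixel regressor output, while larger λ smooths labels across adjacent superpixels.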
2. The road scene layout analysis method based on structured learning according to claim 1, wherein in step 1 the specific labeling and preprocessing process is as follows: labeling each pixel in the traffic-scene image data set with a sub-region label, cleaning the labeled data set by screening out samples with missing labels, and then resizing the images to 256 × 256.
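A minimal sketch of the claim-2 preprocessing, assuming images and label masks arrive as numpy arrays; the nearest-neighbour resize and the `preprocess`/`resize_nearest` names are illustrative choices, not specified by the patent.

```python
import numpy as np

def resize_nearest(arr, size=256):
    """Nearest-neighbour resize of an H x W (or H x W x C) array to size x size."""
    h, w = arr.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return arr[rows][:, cols]

def preprocess(image, label_mask, num_classes):
    """Screen out samples with missing/invalid sub-region labels (data
    cleaning), then reset both image and mask to 256 x 256."""
    if label_mask.min() < 0 or label_mask.max() >= num_classes:
        return None                      # annotation incomplete -> screened out
    # nearest-neighbour keeps label ids intact (no interpolated classes)
    return resize_nearest(image), resize_nearest(label_mask)
```

Nearest-neighbour interpolation matters for the mask: any interpolating resize would blend neighbouring sub-region ids into labels that do not exist.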
3. The road scene layout analysis method based on structured learning as claimed in claim 1, wherein P(c_i | c_j) in the smoothing term is the conditional probability that a superpixel belongs to sub-region c_i given that its neighboring superpixel belongs to sub-region c_j; the indicator function δ equals 1 when the condition c_i ≠ c_j holds and 0 when it does not.
4. The road scene layout analysis method based on structured learning according to claim 1, wherein the specific process of step 3 is as follows: first modeling the sub-topics of the sub-regions in the segmentation result with hidden variables, then training a classifier with a cutting-plane structural SVM method using N slack variables, iteratively updating the weights and stopping training when the loss function reaches its minimum to obtain the optimized classifier parameters.
5. The road scene layout analysis method based on structured learning according to claim 1, wherein the specific process of step 3 is as follows: when training the SVM classifier, inputting the extracted sub-region feature vector x_i and hidden-variable label z_i for supervised training; the extracted sub-region features include HOG, Gabor, LBP, and RGB features, and the loss function is defined as follows:
min_ω  (λ/2) ‖ω‖² + Σ_{i=1}^{M} ξ_i   (4)

subject to, for every sample i and every label z in the hidden-variable label set:

ξ_i ≥ Δ(z_i, z) + F(z, x_i; ω) − F(z_i, x_i; ω)
where ξ_i is the slack variable, ω is the weight vector, λ is the penalty parameter, x_i is an L-dimensional feature vector, z_i is the hidden-variable label of the i-th sample, z ranges over all labels contained in the hidden-variable label set, Δ(z_i, z) is the distance between the hidden-variable label of the i-th sample and a label in the hidden-variable label set, and F is the objective function defined as follows:
F(x_i, z_i; ω) = ω^T φ(x_i, z_i)   (5)
where x_i is an L-dimensional feature vector, ω is the weight vector, φ(x_i, z_i) is the feature-mapping function, and M is the number of sub-region samples; the feature-mapping function φ(x_i, z_i) has the following form:
φ(x_i, z_i) = (0, …, 0, x_i, 0, …, 0)   (6)

where x_i is the only non-zero segment of φ(x_i, z_i); the value of this segment equals x_i and it occupies the z_i-th block position of φ(x_i, z_i); ω* is the optimization parameter that minimizes the loss function.
6. The method as claimed in claim 5, wherein, when the hidden-variable label is inferred, the candidate labels z are enumerated exhaustively, and the label z* that maximizes the objective function F(x, z; ω*) is taken as the inference result:
z* = argmax_{z ∈ Z} F(x, z; ω*)   (7)
where ω* is the optimization parameter that minimizes the loss function and Z is the set of hidden-variable labels.
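Equations (5)-(7) amount to a block-structured linear model: since φ zeroes every block except the z-th, F(x, z; ω) reduces to the dot product of x with the z-th block of ω, and exhaustive inference is a linear multi-class decision rule. A minimal numpy sketch (function names `phi`, `F`, `infer` are illustrative):

```python
import numpy as np

def phi(x, z, num_labels):
    """Joint feature map of Eq. (6): a (num_labels * L) vector whose only
    non-zero segment is x, placed in the z-th block position."""
    L = x.shape[0]
    out = np.zeros(num_labels * L)
    out[z * L:(z + 1) * L] = x
    return out

def F(x, z, w, num_labels):
    """Objective F(x, z; w) = w^T phi(x, z) of Eq. (5)."""
    return float(w @ phi(x, z, num_labels))

def infer(x, w, num_labels):
    """Eq. (7): exhaustive search over the hidden-variable label set."""
    return int(np.argmax([F(x, z, w, num_labels) for z in range(num_labels)]))
```

Exhaustive enumeration is feasible here because the hidden-variable label set per sub-region is small; with |Z| labels and L-dimensional features, inference is O(|Z| · L).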
7. The road scene layout analysis method based on structured learning according to claim 1, wherein the specific process of step 4 is as follows: defining 14 scene platform labels according to the scene layout and structure, and collecting, for each type of scene platform, the data of its associated hidden-variable combinations z*; constructing a decision tree with the CART algorithm; when a group of hidden-variable labels is input into the decision tree, the scene platform label corresponding to those hidden-variable labels is found through the tree.
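The step-4 mapping can be sketched with scikit-learn, whose `DecisionTreeClassifier` implements an optimized CART variant. The training rows below are toy stand-in data: the real method derives 14 platform labels from scene layout and structure, and the `scene_platform` helper is an assumed name, not part of the patent.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in data: each row is one combination of sub-region hidden-variable
# labels; each target is the scene-platform label it corresponds to.
Z = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]])
platforms = np.array([0, 1, 2, 3])

# CART with the Gini criterion, as in scikit-learn's default configuration
tree = DecisionTreeClassifier(criterion="gini").fit(Z, platforms)

def scene_platform(hidden_labels):
    """Look up the scene-platform label for one hidden-variable combination."""
    return int(tree.predict(np.asarray(hidden_labels).reshape(1, -1))[0])
```

Because the hidden-variable combinations are discrete and low-dimensional, the tree effectively learns a compact lookup table while still generalizing to combinations not seen during training.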
CN202010431561.2A 2020-05-20 2020-05-20 Road scene layout analysis method based on structured learning Active CN111611919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010431561.2A CN111611919B (en) 2020-05-20 2020-05-20 Road scene layout analysis method based on structured learning


Publications (2)

Publication Number Publication Date
CN111611919A CN111611919A (en) 2020-09-01
CN111611919B true CN111611919B (en) 2022-08-16

Family

ID=72200816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010431561.2A Active CN111611919B (en) 2020-05-20 2020-05-20 Road scene layout analysis method based on structured learning

Country Status (1)

Country Link
CN (1) CN111611919B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818775B (en) * 2021-01-20 2023-07-25 北京林业大学 Forest road rapid identification method and system based on regional boundary pixel exchange
CN115374498B (en) * 2022-10-24 2023-03-10 北京理工大学 Road scene reconstruction method and system considering road attribute characteristic parameters

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105844292A (en) * 2016-03-18 2016-08-10 南京邮电大学 Image scene labeling method based on conditional random field and secondary dictionary study
CN110032952A (en) * 2019-03-26 2019-07-19 西安交通大学 A kind of road boundary point detecting method based on deep learning

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
CN103413307A (en) * 2013-08-02 2013-11-27 北京理工大学 Method for image co-segmentation based on hypergraph
CN103440501A (en) * 2013-09-01 2013-12-11 西安电子科技大学 Scene classification method based on nonparametric space judgment hidden Dirichlet model
US20160253816A1 (en) * 2014-02-27 2016-09-01 Andrew Niemczyk Method of Carrying Out Land Based Projects Using Aerial Imagery Programs
CN104809187B * 2015-04-20 2017-11-21 南京邮电大学 A kind of indoor scene semantic marking method based on RGB-D data
CN105389584B (en) * 2015-10-13 2018-07-10 西北工业大学 Streetscape semanteme marking method based on convolutional neural networks with semantic transfer conjunctive model
CN106446914A (en) * 2016-09-28 2017-02-22 天津工业大学 Road detection based on superpixels and convolution neural network
CN107292234B (en) * 2017-05-17 2020-06-30 南京邮电大学 Indoor scene layout estimation method based on information edge and multi-modal features
CN107292253B (en) * 2017-06-09 2019-10-18 西安交通大学 A kind of visible detection method in road driving region
CN107369158B (en) * 2017-06-13 2020-11-13 南京邮电大学 Indoor scene layout estimation and target area extraction method based on RGB-D image
CN109829449B (en) * 2019-03-08 2021-09-14 北京工业大学 RGB-D indoor scene labeling method based on super-pixel space-time context
CN109993082B (en) * 2019-03-20 2021-11-05 上海理工大学 Convolutional neural network road scene classification and road segmentation method
CN110084136A (en) * 2019-04-04 2019-08-02 北京工业大学 Context based on super-pixel CRF model optimizes indoor scene semanteme marking method


Non-Patent Citations (1)

Title
Road scene segmentation algorithm using multilayer graph model inference; Deng Yanzi et al.; Journal of Xi'an Jiaotong University; 2017-12-31 (No. 12); pp. 67-72 *


Similar Documents

Publication Publication Date Title
US11853903B2 (en) SGCNN: structural graph convolutional neural network
Ghadai et al. Learning localized features in 3D CAD models for manufacturability analysis of drilled holes
Kae et al. Augmenting CRFs with Boltzmann machine shape priors for image labeling
JP6725547B2 (en) Relevance score assignment for artificial neural networks
US20190236411A1 (en) Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks
CN108830237B (en) Facial expression recognition method
US8331669B2 (en) Method and system for interactive segmentation using texture and intensity cues
CN109993102B (en) Similar face retrieval method, device and storage medium
CN105809672B (en) A kind of image multiple target collaboration dividing method constrained based on super-pixel and structuring
Yang et al. Multilayer graph cuts based unsupervised color–texture image segmentation using multivariate mixed student's t-distribution and regional credibility merging
Liu et al. Vehicle-related scene understanding using deep learning
CN114049381A (en) Twin cross target tracking method fusing multilayer semantic information
CN112287839A (en) SSD infrared image pedestrian detection method based on transfer learning
JP2008217706A (en) Labeling device, labeling method and program
Zanjani et al. Cancer detection in histopathology whole-slide images using conditional random fields on deep embedded spaces
CN108595558B (en) Image annotation method based on data equalization strategy and multi-feature fusion
US11816185B1 (en) Multi-view image analysis using neural networks
CN111611919B (en) Road scene layout analysis method based on structured learning
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
Xie et al. Robust segmentation of nucleus in histopathology images via mask R-CNN
CN111325237A (en) Image identification method based on attention interaction mechanism
Mudumbi et al. An approach combined the faster RCNN and mobilenet for logo detection
Lima et al. Automatic design of deep neural networks applied to image segmentation problems
Bressan et al. Semantic segmentation with labeling uncertainty and class imbalance
CN116993760A (en) Gesture segmentation method, system, device and medium based on graph convolution and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant