CN111553193B - Visual SLAM closed-loop detection method based on lightweight deep neural network - Google Patents


Info

Publication number
CN111553193B
Authority
CN
China
Prior art keywords
neural network
image
training
model
loop
Legal status
Active
Application number
CN202010249172.8A
Other languages
Chinese (zh)
Other versions
CN111553193A (en)
Inventor
金世俊
刘泽
Current Assignee
Southeast University
Original Assignee
Southeast University
Application filed by Southeast University
Priority to CN202010249172.8A
Publication of CN111553193A
Application granted
Publication of CN111553193B
Current legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visual SLAM closed-loop detection method based on a lightweight deep neural network. The image recognition model is a lightweight deep neural network, which is trained on a dataset of similar scenes and optimized until it reaches a target accuracy. The goal is for the trained neural network model to learn, from the training samples, the probability distribution corresponding to the image samples, so that loop closures can be detected by extracting scene features and computing image similarity, in preparation for subsequent SLAM map optimization. The method achieves better detection under complex illumination, speeds up the model in practical deployment, and substantially improves accuracy at a lower computational cost. It has significant practical value for closed-loop detection and related tasks.

Description

Visual SLAM closed-loop detection method based on lightweight deep neural network
Technical Field
The invention belongs to the field of computer vision and robot motion closed-loop detection, and relates to a closed-loop detection method based on a lightweight deep neural network.
Background
Closed-loop detection is the problem of determining whether a mobile robot has returned to a previously visited location. It is a key module in SLAM: it reduces the errors accumulated while building the environment map, corrects the drift of the location estimate over time, and is essential for building a consistent environment map. A popular and successful approach to closed-loop detection is to recognize previously visited locations by the similarity between the robot's current view and the views stored in its map, in the same way that the human eye distinguishes two similar places. Viewed this way, closed-loop detection is essentially an image matching problem.
Image matching generally consists of two steps: image description and similarity measurement. The image descriptor, which compresses the image into a one-dimensional vector that is more compact and more discriminative than the original image, is the most critical step in visual loop closure detection. Many image description techniques have been applied successfully to visual closed-loop detection. However, most conventional appearance-based methods use hand-crafted features, that is, features designed through feature engineering, in which human expertise and insight steer the design toward the desired properties. Descriptors based on hand-crafted features share common weaknesses, including poor robustness to illumination changes and high computational cost.
With improvements in computing performance and the rapid rise of GPUs in recent years, computer vision has advanced considerably, and deep learning offers a new approach to image description. Deep learning methods learn features automatically from raw data and adapt better to complex environmental changes. Deep neural network models learn to extract increasingly abstract features from visual data and have achieved good results in image classification, image denoising and other fields, and closed-loop detection that uses deep learning to improve recognition is also developing rapidly. However, neural-network-based closed-loop detection still has to overcome large numbers of model parameters and poor real-time performance. Consequently, a general-purpose deep learning method applied directly cannot adapt well to the variety of real-world scenes.
Disclosure of Invention
The invention aims to solve these problems and provides a stable and reliable robot motion closed-loop detection method based on a lightweight deep neural network. For a limited image dataset, an image feature discrimination model based on a convolutional neural network is designed and its parameters are optimized. Once optimized, the network can convert any scene image into a set of feature vectors, and the normalized feature vectors are used to build a similarity matrix for judging loop closures.
To achieve this purpose, the invention adopts the following technical solution: a visual SLAM closed-loop detection method based on a lightweight deep neural network, comprising the following steps:
Step 1: Select a closed-loop detection dataset. Training a CNN model is a supervised learning process, and the model cannot be trained on data without label information. To address this, the lightweight deep neural network model is trained on a large-scale labeled scene dataset, the trained model is used as a feature extractor for scene images, and the extracted features are finally applied to closed-loop detection;
Step 2: Construct the lightweight deep neural network. The training and test data fed to the model are preprocessed and uniformly resized to 224×224 (other sizes can be used as needed). Convolution kernels search for particular characteristics of the images; these features are passed through the model, related to the output and classified, and the output of the final fully connected layer is taken as the image's feature vector;
Step 3: Optimize the network model. Load the image samples of the dataset, first initialize the weights of the neural network model with MSRA initialization, feed real image samples into the lightweight deep neural network model, train it with the defined forward propagation process, and optimize the model parameters by alternating forward propagation with backpropagation. After training, save the trained model so that it can be reused directly. Test with the saved model, and continue training until the network reaches the required discrimination accuracy;
Step 4: Perform closed-loop detection with the network model. Step 3 yields a deep neural network model that can be used to extract image features. The network computes a feature descriptor for each query image (the current robot view). The raw CNN features are then enhanced by an additional preprocessing step, principal component analysis (PCA) followed by whitening, which markedly improves their ability to represent images while also improving computational efficiency; the resulting descriptors are used for loop detection. After normalization, a similarity matrix between images is computed from Euclidean distances, and the rank of the matrix is reduced to suppress noise. The similarity is measured to decide whether a loop closure has occurred, and after all images in the dataset have been processed, precision-recall pairs are obtained. In this way, the similarity relationships between images are obtained and the closed-loop detection problem for a moving robot is solved.
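The feature post-processing in step 4 (PCA with whitening, normalization, Euclidean distance matrix) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: whitening is done through an eigendecomposition of the feature covariance, the descriptors are random stand-ins, and the number of retained components (`n_components=32`) is a hypothetical choice that the text does not specify.

```python
import numpy as np


def pca_whiten(features: np.ndarray, n_components: int, eps: float = 1e-8) -> np.ndarray:
    """Project (n_samples, dim) features onto the top principal components and whiten them."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(features) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]      # indices of the largest eigenvalues
    # Dividing by sqrt(eigenvalue) gives every retained component unit variance.
    return centered @ eigvecs[:, top] / np.sqrt(eigvals[top] + eps)


def l2_normalize(features: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


def euclidean_distance_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between rows via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b."""
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    return np.sqrt(np.maximum(d2, 0.0))                 # clip tiny negative round-off


# Usage with random stand-ins for 365-dimensional CNN descriptors of 50 keyframes.
rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 365))
whitened = pca_whiten(raw, n_components=32)
descriptors = l2_normalize(whitened)
dist = euclidean_distance_matrix(descriptors)
```

Because the descriptors have unit length, every pairwise distance lies in [0, 2], so a single distance threshold has the same meaning for every query.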
As an improvement of the invention, in step 1 the image sample dataset is built from Places365-Standard. Image samples are 224×224. Because the method uses supervised learning, training, validation and test sets are required, and model training is completed on a large labeled scene dataset. To improve processing efficiency, the training data are stored in multiple TFRecord files. Samples are then read back from the TFRecord files and parsed: a file list of the raw data is specified, the data are read from the files, preprocessed (gray-value and mean preprocessing), and assembled into batches that serve as the network input. In addition, to make the training samples sufficiently representative, sample collection should cover a variety of scenes, such as different terrain, distances, illumination conditions and camera viewing angles.
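The preprocessing and batching described above can be sketched without the TFRecord layer. This minimal NumPy sketch assumes NHWC `uint8` images, interprets the gray-value preprocessing as scaling pixels to [0, 1] followed by per-channel mean subtraction, and uses a hypothetical batch size of 4; it is not the patent's actual input pipeline.

```python
import numpy as np


def preprocess(images_uint8: np.ndarray, channel_mean=None):
    """Scale 8-bit NHWC pixels to [0, 1] and subtract a per-channel mean."""
    images = images_uint8.astype(np.float32) / 255.0
    if channel_mean is None:                      # estimate the mean from this (training) set
        channel_mean = images.mean(axis=(0, 1, 2))
    return images - channel_mean, channel_mean


def iterate_batches(images: np.ndarray, labels: np.ndarray, batch_size: int, rng):
    """Yield shuffled (images, labels) mini-batches; the last batch may be smaller."""
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]


# Usage: ten random 224x224 RGB images with integer scene labels.
rng = np.random.default_rng(0)
raw_images = rng.integers(0, 256, size=(10, 224, 224, 3), dtype=np.uint8)
labels = np.arange(10)
x, mean = preprocess(raw_images)
batches = list(iterate_batches(x, labels, batch_size=4, rng=rng))
```

The mean is returned so that the same value can be reused when preprocessing validation and test images.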
As an improvement of the invention, the lightweight deep neural network model in step 2 follows a conventional network layout and consists of an input layer, two convolutional layers, two max-pooling layers, two blocks and a fully connected layer. The input layer takes a 224×224 three-channel image. Convolutional layers Conv1 and Conv2 extract features from the input data, and max-pooling layers Pool1 and Pool2 filter out the useful information. The convolutional and pooling layers use the concatenated rectified linear unit (CReLU) activation, which concatenates the negated and the original responses and thereby reduces the number of convolution kernel parameters required; batch normalization follows each layer to speed up convergence. Next come two custom modules, block1 and block2, inspired by residual networks; both use skip connections to address the difficulty of optimizing deep neural networks. To control the dimensionality of the feature maps, reduce the number of parameters and make the network run faster, a bottleneck structure is used on the main branch of each residual module. Because neural models typically suffer from many parameters and slow inference, the network uses pointwise group convolution and channel shuffle to avoid impeded information flow between groups. The feature maps from the previous layer are split into two groups; inspired by the Inception structure, several convolution kernels of different sizes are applied to the feature maps of the same layer to obtain features at different scales, which are then concatenated, and such combined features are often better than those from a single kernel. Depthwise convolution replaces standard convolution, which removes a large number of parameters while also giving better results: each channel is learned with its own filter rather than all channels sharing the same filter, so for the same number of weight parameters the computation is smaller and inference is faster. The residual structure lets gradients flow more easily into the shallow layers and avoids the vanishing-gradient problem caused by increasing depth. To improve generalization, a channel shuffle follows each split operation to fuse features across groups; each group convolution is followed by a shuffle before the next group convolution, and this pattern repeats. Reasonable regularization is required to avoid overfitting. Finally, a fully connected layer Fc fuses and classifies the incoming feature maps; the number of output neurons equals the number of categories in the dataset. The zero-sum game solution of network training can be reached only if the number of real samples is far larger than the number of parameters of the generative model. Furthermore, to give the discriminative model good adaptability and discriminative power, its training is additionally regularized with Dropout and L2 regularization.
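Two of the building blocks named above, CReLU and channel shuffle, are easy to misread in prose, so here is a small NumPy sketch of each. It is illustrative only and assumes an NHWC tensor layout (as used by TensorFlow); the patent does not state the layout or the number of groups.

```python
import numpy as np


def crelu(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Concatenated ReLU: stack ReLU(x) and ReLU(-x), doubling the channel count."""
    return np.concatenate([np.maximum(x, 0), np.maximum(-x, 0)], axis=axis)


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """ShuffleNet-style channel shuffle for an NHWC tensor."""
    n, h, w, c = x.shape
    if c % groups:
        raise ValueError("channel count must be divisible by groups")
    x = x.reshape(n, h, w, groups, c // groups)
    x = x.transpose(0, 1, 2, 4, 3)        # swap the group axis and the within-group axis
    return x.reshape(n, h, w, c)


# Six channels in two groups [0 1 2 | 3 4 5] are interleaved as [0 3 1 4 2 5].
shuffled = channel_shuffle(np.arange(6).reshape(1, 1, 1, 6), groups=2)
activated = crelu(np.array([[-1.0, 2.0]]))
```

CReLU lets a convolution learn only half as many filters, since the negated responses are supplied for free; the shuffle ensures the next group convolution sees channels from every group.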
As an improvement of the invention, in step 3 the training of the deep neural network can be described as optimizing the model parameters according to the model's outputs. The method optimizes against the true labels of the samples and the outputs generated by the model. The cross-entropy loss function L of the Softmax classifier is used as the objective function of training and is expressed as:
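$$L(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} \mathbf{1}\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}$$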
where m is the number of samples in each training batch; θ is the parameter matrix of the network model to be optimized; x(i) is the i-th image sample; y(i) is the true label of the i-th sample; and k is the number of classes. To avoid overfitting, a regularization term that weights each parameter w is added to the loss function, introducing a measure of model complexity that suppresses model noise and reduces overfitting. During training, all parameters of the neural network are updated continually, and stochastic gradient descent (SGD) drives the loss down steadily, producing a more accurate neural network model.
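A minimal NumPy sketch of this regularized objective: the mean Softmax cross-entropy over a batch plus an L2 penalty on the weights. The penalty coefficient `l2` and the plain sum-of-squares form are assumptions, since the text does not give the regularization weight.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log of the Softmax probabilities, row by row."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def regularized_cross_entropy(logits, labels, weights, l2=1e-4) -> float:
    """Mean cross-entropy over the m samples plus l2 * sum of squared weights."""
    m = logits.shape[0]
    data_loss = -np.mean(log_softmax(logits)[np.arange(m), labels])
    penalty = l2 * sum(float(np.sum(w ** 2)) for w in weights)
    return float(data_loss) + penalty


# With all-zero logits over k = 365 classes, each sample costs log(365).
uniform = regularized_cross_entropy(np.zeros((4, 365)), np.array([0, 1, 2, 3]), weights=[])
penalized = regularized_cross_entropy(
    np.zeros((4, 365)), np.array([0, 1, 2, 3]), weights=[np.ones((2, 2))], l2=0.5
)
```

The all-zero example is the loss of a model that guesses uniformly, a useful sanity check for the starting loss of a freshly initialized 365-class network.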
To suppress the oscillation of SGD, inertia (momentum) is added to the gradient descent, and the Adam algorithm is introduced on top of SGD:
$$g(t) = \nabla_{\theta} L\left(\theta_{t-1}\right)$$
$$m(t) = \beta_1\, m(t-1) + (1 - \beta_1)\, g(t)$$
$$V(t) = \beta_2\, V(t-1) + (1 - \beta_2)\, g(t)^2$$
where m(t) is the exponential moving average of the gradient and V(t) is the uncentered second moment (variance) of the gradient. Adam (Adaptive Moment Estimation) computes an adaptive learning rate for each parameter. Like AdaDelta, it stores an exponentially decaying average of past squared gradients, V(t), and in addition it maintains an exponentially decaying average of past gradients, m(t), which acts like momentum. β1 and β2 control the exponential decay rates, with empirical values β1 = 0.9 and β2 = 0.999. The momentum term updates only the parameters related to the current samples, reducing unnecessary parameter updates; this gives faster and more stable convergence and reduces oscillation.
The specific procedure of step 3 is as follows:
Step 301: Because scene recognition is a multi-class image classification problem, the network uses a Softmax classifier to classify the input image. Softmax maps the outputs of multiple neurons into the interval (0, 1) and interprets them as class probabilities. The input images are mean-centered, fed into the neural network in batches, and propagated forward through the model, and the prediction s is obtained from the discriminant formula:
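$$h_{\theta}\left(x^{(i)}\right) = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ \vdots \\ e^{\theta_k^{T} x^{(i)}} \end{bmatrix}$$
$$s = \arg\max_{j} \; h_{\theta}\left(x^{(i)}\right)_j$$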
where θ is the parameter matrix of the network model and k is the number of classes;
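The discriminant step can be sketched as a linear Softmax layer followed by an arg-max. This assumes, for illustration only, that the classifier is a single matrix `theta` of shape (k, d) applied to d-dimensional inputs; in the actual network the logits come from the final fully connected layer.

```python
import numpy as np


def predict(features: np.ndarray, theta: np.ndarray):
    """Return Softmax class probabilities and the predicted class s for each row."""
    logits = features @ theta.T                          # (batch, k)
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)            # each row now sums to 1
    return probs, probs.argmax(axis=1)


probs, s = predict(np.array([[1.0, 5.0, 2.0], [0.0, 0.0, 9.0]]), np.eye(3))
```

Subtracting the row maximum does not change the Softmax output but prevents overflow in `np.exp` for large logits.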
Step 302: Optimize the model parameters with stochastic gradient descent and an adaptive-learning-rate gradient descent optimizer. After the discriminant formula yields a prediction, the parameters are updated from the training samples and their expected values so as to minimize the loss. Each update uses a sample drawn at random from the training set, so every update is fast and can be performed online. The weights, biases and other network parameters are updated by the following equations:
Figure BDA0002434861420000051
Figure BDA0002434861420000052
Figure BDA0002434861420000053
$$\theta_t = \theta_{t-1} - V_t$$
where $\hat{m}_t$ and $\hat{V}_t$ are the bias-corrected values of $m_t$ and $V_t$; $V_t$ in the update above is the step taken along the negative gradient at the t-th iteration; $\nabla_{\theta} L\left(\theta; x^{(i)}, y^{(i)}\right)$ is the partial derivative of the loss function with respect to the parameters; $x^{(i)}, y^{(i)}$ are the training samples; and $\theta_t$ is the parameter value at the t-th iteration.
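For reference, the standard bias-corrected Adam update with the β1 = 0.9 and β2 = 0.999 values quoted above can be sketched in NumPy. This follows the usual Kingma and Ba formulation, in which the step is lr · m̂_t / (√V̂_t + ε); the learning rate `lr` and `eps` are assumed values, and the toy quadratic objective is hypothetical, chosen only to show convergence.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based iteration count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad               # first moment: EMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2          # second moment: EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)                     # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Minimise f(theta) = ||theta - target||^2, whose gradient is 2 (theta - target).
target = np.array([3.0, -2.0])
theta = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 2001):
    grad = 2.0 * (theta - target)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

The bias correction matters early on: because m and v start at zero, the uncorrected averages would underestimate the moments during the first iterations.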
As an improvement of the invention, in step 4 the image feature is 365-dimensional. A feature threshold is applied to build a keyframe library; for each keyframe image, the Euclidean distances between feature vectors are computed to obtain a similarity matrix and find closed-loop frames. The nearest neighbor is found and a distance threshold decides whether a loop closure has occurred: if the similarity exceeds the set threshold, the loop is closed. Varying the distance threshold yields a precision-recall curve. The closed-loop detection precision-recall curve and the detected loop closures are output for subsequent SLAM map optimization. In practice, different training hyperparameter settings can be tried and the best-performing model selected.
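Sweeping the distance threshold to obtain the precision-recall curve can be sketched as follows. The sketch assumes one best-match distance per query frame and a boolean ground-truth flag saying whether that query is a true revisit; it simplifies matters by not checking which earlier frame was matched, and it reports precision as 1.0 when nothing is detected (a common convention, assumed here).

```python
import numpy as np


def precision_recall_curve(distances, is_true_loop, thresholds):
    """Return (threshold, precision, recall) for each distance threshold.

    distances[q] is the best-match distance for query q; is_true_loop[q] says whether
    q is a genuine revisit. A query is reported as a loop when its distance < threshold.
    """
    distances = np.asarray(distances, dtype=float)
    truth = np.asarray(is_true_loop, dtype=bool)
    total_true = int(truth.sum())
    curve = []
    for tau in thresholds:
        predicted = distances < tau
        tp = int(np.sum(predicted & truth))
        fp = int(np.sum(predicted & ~truth))
        precision = tp / (tp + fp) if tp + fp else 1.0   # convention when nothing is reported
        recall = tp / total_true if total_true else 0.0
        curve.append((float(tau), precision, recall))
    return curve


curve = precision_recall_curve([0.1, 0.4, 0.2, 0.9], [True, True, False, False], [0.15, 0.5, 1.0])
```

Raising the threshold accepts more matches, which increases recall at the expense of precision; the operating point is chosen from this curve.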
(1) Keyframe selection. To prevent keyframes from being so close together that neighboring keyframes are overly similar, the frames used for loop detection should be sparse, mutually distinct, and cover the whole environment. A new keyframe is taken and stored each time the camera has moved a certain distance. In addition, loop-closure images are determined by restricting the matching range of the current image, with a threshold S setting the range of images to be searched. Specifically, if the current image index is N and S images are excluded, loop closure can occur only with images other than the S frames preceding the current image.
(2) Build a candidate keyframe library from the keyframes. Instead of matching the current keyframe directly against all possible closed-loop frames, the system first collects nearby keyframes that contain at least W nonzero categories, forming the candidate set. W must be chosen carefully: if W is too small, too many keyframes are collected and the computational load increases; if W is too large, it may exceed the number of categories and no closed-loop frame can be found.
(3) Compute the similarity score between the keyframe and each frame in the keyframe library. The vectors are first normalized, and the similarity between images is measured by the Euclidean distance between their feature vectors. Because distance and similarity are negatively correlated, a decreasing function of the distance is used as the score, so a smaller distance gives a higher similarity score. A distance threshold $\tau_i$ is then applied to decide whether a loop closure has occurred;
Figure BDA0002434861420000061
Figure BDA0002434861420000062
In the formulas above, dis(i, j) is the distance between images $I_i$ and $I_j$, G is the similarity score, and $k_1$, $k_2$ are process parameters with $k_1 < 0$. The similarity score is normalized to [0, 1] before loop closures are detected; for measurement, normalized distances are used so that scores lie in [0, 1].
(4) Apply rank reduction to the similarity matrix to suppress noise. The similarity scores between every pair of keyframes form a matrix M that describes their relationships. M is a real symmetric n×n matrix, so there exist an orthogonal matrix V and a diagonal matrix D such that M satisfies the following formula, where $v_i$ are the eigenvectors and $d_i$ the eigenvalues on the diagonal:
$$M = V D V^{T} = \sum_{i=1}^{n} d_i\, v_i v_i^{T}$$
The dominant eigenvectors of M correspond to themes that pervade a particular environment. They are detrimental to loop closure detection, because the repetitive nature of different scenes creates ambiguity and leads to false-positive detections. Removing the largest eigenvalues through rank reduction lowers this noise while preserving true loops, which helps reduce detection ambiguity.
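$$H(M, r) = -\sum_{i=r}^{n} \bar{\lambda}_i \log \bar{\lambda}_i, \qquad \bar{\lambda}_i = \frac{\lambda_i}{\sum_{j=r}^{n} \lambda_j}$$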
This formula measures the complexity of the decomposition of M by the entropy of the proportions that each $\lambda_i$ takes among $\lambda_r$ to $\lambda_n$. Outer products are removed from M one at a time, and the r that maximizes H(M, r) is selected:
$$M_r = M - \sum_{i=1}^{r-1} \lambda_i\, v_i v_i^{T}, \qquad r^{*} = \arg\max_{r} H(M, r)$$
This yields a similarity matrix in which no single theme dominates, and the reduced matrix replaces M. Decomposing the similarity matrix into a series of outer products removes the effect of common similarity without removing the images themselves, and it also improves discrimination in closed-loop detection. Candidate loop frames for the current frame are obtained by examining the high-score regions of the matrix.
(5) Loop frames detected within the i frames preceding the current frame must be checked for a direct connection to the best candidate loop frame; the best candidate that passes this spatial-continuity check is confirmed as the loop frame. After all images in the dataset have been processed, precision-recall pairs are obtained. Once a loop is found, the spanning tree of adjacent frames is computed and the entire trajectory is optimized.
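Items (1), (3) and (4) can be combined into a short NumPy sketch: entropy-guided rank reduction of the similarity matrix, followed by a best-match search that ignores the S most recent frames. This is an illustrative sketch under assumptions: indices are 0-based (removing components 0..r-1 and taking the entropy over the remaining eigenvalues matches the 1-based formula above), negative eigenvalues are clipped to zero for the entropy, the score 1 - d/2 for unit-length descriptors is a hypothetical stand-in for the patent's own k1/k2 score function, and `threshold` is a hypothetical value. Candidate-library filtering (2) and the continuity check (5) are omitted.

```python
import numpy as np


def spectral_entropy(eigvals: np.ndarray, r: int) -> float:
    """H(M, r): entropy of the eigenvalues from index r onward, each divided by their sum."""
    tail = np.clip(eigvals[r:], 0.0, None)   # assumption: negative eigenvalues are ignored
    total = tail.sum()
    if total <= 0.0:
        return 0.0
    p = tail[tail > 0] / total
    return float(-np.sum(p * np.log(p)))


def reduce_rank(similarity: np.ndarray):
    """Subtract the r leading outer products lambda_i v_i v_i^T, with r maximising H(M, r)."""
    sym = 0.5 * (similarity + similarity.T)        # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    r_best = max(range(len(eigvals)), key=lambda r: spectral_entropy(eigvals, r))
    reduced = sym.copy()
    for i in range(r_best):
        reduced -= eigvals[i] * np.outer(eigvecs[:, i], eigvecs[:, i])
    return reduced, r_best


def detect_loop(similarity: np.ndarray, current: int, exclude_recent: int, threshold: float):
    """Best earlier frame outside the last `exclude_recent` frames, or None if below threshold."""
    limit = current - exclude_recent               # only frames j < current - S are eligible
    if limit <= 0:
        return None
    scores = similarity[current, :limit]
    best = int(np.argmax(scores))
    return best if scores[best] > threshold else None


# Usage: 12 random unit descriptors; frame 10 revisits the place seen in frame 2.
rng = np.random.default_rng(1)
desc = rng.normal(size=(12, 8))
desc[10] = desc[2]
desc /= np.linalg.norm(desc, axis=1, keepdims=True)
dist = np.sqrt(np.maximum(2.0 - 2.0 * desc @ desc.T, 0.0))  # valid for unit vectors
sim = 1.0 - dist / 2.0                                        # hypothetical score in [0, 1]
reduced, r = reduce_rank(sim)
match = detect_loop(sim, current=10, exclude_recent=3, threshold=0.99)
```

The exclusion window is what keeps a frame from "closing a loop" with its own immediate predecessors, which are trivially similar.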
Compared with the prior art, the invention has the following advantages:
(1) The method is based on a lightweight deep convolutional neural network. It can express semantic information that hand-crafted features struggle to capture; image features obtained with the neural network model strengthen perception and effectively learn the texture and distribution characteristics of the image samples. Moreover, the lightweight design speeds up the model in practical deployment, improving detection speed while maintaining high accuracy.
(2) The invention designs an enhancement step for the raw features produced by the network model in order to obtain the final loop-closure decision. The raw features are preprocessed, a keyframe library is built algorithmically, and pairwise distances between candidate keyframes are computed to decide loops. The rank-reduction operation markedly improves the representational power for images and improves computational efficiency, and verification after the decision improves the accuracy of the detection results.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a simplified diagram of the lightweight deep neural network architecture of the present invention;
FIG. 3 is a structure diagram of block1 of the network of the present invention;
FIG. 4 is a structure diagram of block2 of the network of the present invention;
FIG. 5 is a schematic diagram of loop detection in the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
A robot walking closed-loop detection method based on a lightweight deep neural network; as shown in the flow chart of FIG. 1, the method comprises the following steps:
Step 1: Select a closed-loop detection dataset. Training a CNN model is a supervised learning process, and the model cannot be trained on data without label information. To address this, the lightweight deep neural network model is trained on a large labeled scene dataset, the trained model is used as a feature extractor for scene images, and the extracted features are finally applied to closed-loop detection.
The image sample dataset is built from Places365-Standard, and image samples are 224×224. Because the method uses supervised learning, training, validation and test sets are required, and model training is completed on a large labeled scene dataset. To improve processing efficiency, the training data are stored in multiple TFRecord files. Samples are then read back from the TFRecord files and parsed: a file list of the raw data is specified, the data are read from the files, preprocessed (gray-value and mean preprocessing), and assembled into batches that serve as the network input. In addition, to make the training samples sufficiently representative, sample collection should cover a variety of scenes, such as different terrain, distances, illumination conditions and camera viewing angles.
Step 2: Construct the lightweight deep neural network. The training and test data fed to the model are preprocessed and uniformly resized to 224×224 (other sizes can be used as needed). The model consists of convolutional layers, pooling layers, blocks and a fully connected layer. Convolution kernels search for particular characteristics of the images; these features are passed through the model, related to the output and classified, and the output of the last fully connected layer serves as the image's feature vector.
The lightweight deep neural network follows a conventional network structure. A simplified network diagram is shown in Fig. 2: it comprises an input layer, two convolutional layers, two max-pooling layers, two blocks, and a fully connected layer; the simplified structures of block1 and block2 are shown in Fig. 3 and Fig. 4, respectively. The input layer takes a 224×224 three-channel image. Convolutional layers Conv1 and Conv2 extract features from the input data, and max-pooling layers Pool1 and Pool2 filter out the effective information. The convolutional and pooling layers are activated by the concatenated rectified linear unit C.ReLU, which concatenates the negated response with the original response and thereby reduces the number of parameters the convolution kernels need; batch normalization is applied after each layer to accelerate convergence. Two custom modules inspired by residual networks, block1 and block2, follow. Both use skip connections to ease the optimization of deep networks, and the main branch of each residual module uses a bottleneck structure to control the dimensionality of the feature maps, reduce the number of parameters, and improve runtime efficiency. Because neural models tend to have many parameters and run slowly, the network uses pointwise group convolution and channel shuffle so that information still flows between groups. The feature maps from the preceding layer are split into two groups and, inspired by the Inception structure, convolution kernels of several different sizes are applied to feature maps of the same layer to obtain features at different scales, which are then combined; such multi-scale features generally outperform those obtained with a single kernel size. The residual structure lets gradients flow more easily into shallow layers, avoiding the vanishing-gradient problem that arises as the network becomes deeper. To improve generalization, a channel shuffle is performed after each split so that features from different groups are fused; each group convolution is followed by a shuffle before the next group convolution, and this cycle repeats. Appropriate regularization is needed to avoid overfitting. Finally, the fully connected layer Fc fuses and classifies the incoming feature maps; the number of output neurons equals the number of classes in the dataset. Training can reach the zero-sum game solution only if the number of real samples is far larger than the number of parameters of the generative model. In addition, to give the discriminative model good adaptability and discriminative power, the model is trained with the aid of Dropout and L2 regularization.
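As a rough illustration of two operations named above, the following pure-Python sketch implements a concatenated ReLU and a ShuffleNet-style channel shuffle on a toy feature map stored as a list of channels. It assumes that "C.ReLU" denotes the concatenated ReLU (output = [ReLU(x), ReLU(-x)] stacked along the channel axis) and that channel shuffle uses the reshape-transpose-flatten scheme; the function names are hypothetical and this is not the patent's own code.

```python
from typing import List

# A feature map is modelled as a list of channels; each channel is a flat list
# of floats (the spatial layout does not matter for these two operations).
Channels = List[List[float]]


def crelu(x: Channels) -> Channels:
    """Concatenated ReLU: stack ReLU(x) and ReLU(-x) along the channel axis.

    The output has twice as many channels as the input, so a convolution
    followed by C.ReLU needs only half as many kernels for the same width.
    """
    pos = [[max(v, 0.0) for v in ch] for ch in x]
    neg = [[max(-v, 0.0) for v in ch] for ch in x]
    return pos + neg


def channel_shuffle(x: Channels, groups: int) -> Channels:
    """ShuffleNet-style shuffle: view as (groups, c/groups), transpose, flatten."""
    n = len(x)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Channel k of group g sits at index g * per_group + k; after the shuffle,
    # channels from all groups are interleaved for the next group convolution.
    return [x[g * per_group + k] for k in range(per_group) for g in range(groups)]


if __name__ == "__main__":
    print(crelu([[1.0, -2.0]]))                                 # [[1.0, 0.0], [0.0, 2.0]]
    print(channel_shuffle([[float(i)] for i in range(6)], 2))   # channels 0,3,1,4,2,5
```

With six channels in two groups, the shuffle interleaves them as 0, 3, 1, 4, 2, 5, so each group of the next group convolution receives channels from both previous groups.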
Step 3: Optimize the network model. The image samples of the dataset are loaded, and the weights of the neural network model are first initialized with MSRA initialization. Real image samples are fed into the lightweight deep neural network model, the model is trained using the defined forward-propagation process, and the model parameters are optimized by alternating forward and backward propagation. When training finishes, the trained model is saved so that it can be used directly next time. The saved model is then tested, and training continues until the network model reaches the required classification accuracy.
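MSRA initialization is commonly understood as the scheme of He et al., which draws weights from a zero-mean Gaussian with variance 2/fan_in. A minimal sketch under that assumption (the helper name `msra_init` and the row-per-output layout are hypothetical):

```python
import math
import random
from typing import List


def msra_init(fan_in: int, fan_out: int, seed: int = 0) -> List[List[float]]:
    """He (MSRA) initialization: each weight ~ N(0, 2 / fan_in).

    For a convolution, fan_in = kernel_h * kernel_w * in_channels.
    Returns fan_out rows of fan_in weights each.
    """
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


if __name__ == "__main__":
    # 3x3 kernels over 3 input channels -> fan_in = 27, with 64 output channels.
    w = msra_init(fan_in=27, fan_out=64)
    flat = [v for row in w for v in row]
    var = sum(v * v for v in flat) / len(flat)
    print(f"empirical variance {var:.4f}, target {2 / 27:.4f}")
```

The variance 2/fan_in compensates for ReLU-type activations zeroing roughly half of their inputs, which keeps activation magnitudes stable across layers.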
In step 3, training the deep neural network can be described as optimizing the model parameters according to the results the model produces. The method optimizes against the true labels of the samples and the model outputs. The cross-entropy loss function L associated with the Softmax classifier is taken as the objective function of the training process:
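$$L(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} \mathbf{1}\{y^{(i)} = j\}\,\log\frac{e^{\theta_j^{\mathsf T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\mathsf T} x^{(i)}}}$$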
where m is the number of samples in each training batch; θ is the parameter matrix of the network model to be optimized; x(i) is the i-th image sample; y(i) is the true label of the i-th sample; and k is the number of classes. To avoid overfitting, a regularization term that penalizes each weight w is added to the loss function. This introduces a measure of model complexity, which suppresses model noise and reduces overfitting. During training, all parameters of the network are adjusted continuously, and stochastic gradient descent (SGD) steadily decreases the loss function, yielding a more accurate neural network model.
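A minimal sketch of this objective, assuming the standard softmax cross-entropy averaged over a batch plus an L2 penalty (weight_decay / 2)·Σw². The coefficient name `weight_decay` is illustrative; the patent does not give its value.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    mx = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy_l2(logits: Sequence[Sequence[float]], labels: Sequence[int],
                     weights: Sequence[float], weight_decay: float) -> float:
    """Mean softmax cross-entropy over a batch plus an L2 penalty.

    logits[i][j] is the score of class j for sample i (j = 0..k-1),
    labels[i] is the true class y(i), and `weights` are the parameters w
    that the penalty (weight_decay / 2) * sum(w^2) is applied to.
    """
    m = len(labels)
    data_loss = -sum(math.log(softmax(z)[y]) for z, y in zip(logits, labels)) / m
    reg = 0.5 * weight_decay * sum(w * w for w in weights)
    return data_loss + reg
```

For uniform scores over k classes the data term equals log k, which is a quick sanity check for an untrained classifier.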
To suppress the oscillation of SGD, momentum (inertia) is added to the gradient descent, and the Adam algorithm is introduced on top of SGD:
$$m(t) = \beta_1\, m(t-1) + (1-\beta_1)\,\nabla_\theta L$$
$$V(t) = \beta_2\, V(t-1) + (1-\beta_2)\,\left(\nabla_\theta L\right)^2$$
where m(t) is the exponential moving average of the gradient and V(t) is the uncentered variance (second raw moment) of the gradient. Adam (Adaptive Moment Estimation) computes an adaptive learning rate for each parameter. Like AdaDelta, it stores an exponentially decaying average of past squared gradients, V(t); in addition, it keeps an exponentially decaying average of past gradients, m(t), which is similar to momentum. The parameters β1 and β2 control the exponential decay rates; their empirical values are 0.9 and 0.999, respectively. The momentum term reduces unnecessary parameter updates, which gives faster and more stable convergence with less oscillation.
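To illustrate how m(t) and V(t) evolve, and why Adam's usual bias correction is needed when both start at zero, here is a small sketch assuming the standard Adam recurrences with β1 = 0.9 and β2 = 0.999 applied to a scalar gradient stream (the function name is hypothetical):

```python
from typing import List, Tuple


def adam_moments(grads: List[float], beta1: float = 0.9, beta2: float = 0.999
                 ) -> List[Tuple[float, float, float, float]]:
    """Track m(t), V(t) and their bias-corrected versions for scalar gradients.

    Returns (m_t, V_t, m_hat_t, V_hat_t) for t = 1..len(grads).
    """
    m = v = 0.0
    out = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g        # first moment: EMA of g
        v = beta2 * v + (1 - beta2) * g * g    # second raw moment: EMA of g^2
        m_hat = m / (1 - beta1 ** t)           # remove the bias towards the zero start
        v_hat = v / (1 - beta2 ** t)
        out.append((m, v, m_hat, v_hat))
    return out


if __name__ == "__main__":
    for row in adam_moments([2.0] * 3):
        print(row)
```

With a constant gradient of 2.0, the raw m(1) is only 0.2 and V(1) only 0.004, while the corrected estimates are 2.0 and 4.0 from the first step onward.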
The specific procedure of step 3 is as follows:
Step 301: Because scene recognition is a multi-class image classification problem, the network finally uses a Softmax classifier to classify the input image. Softmax maps the outputs of several neurons into the interval (0, 1) and interprets them as class probabilities. The input images are mean-centered, fed into the neural network in batches, and propagated forward through the network model, and the prediction s is obtained from the following discriminant formula:
$$p(y = j \mid x; o) = \frac{e^{o_j^{\mathsf T} x}}{\sum_{l=1}^{k} e^{o_l^{\mathsf T} x}}, \qquad j = 1, \dots, k$$
$$s = \arg\max_{j}\; p(y = j \mid x; o)$$
where o is the parameter matrix of the network model and k is the number of classes;
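A minimal sketch of the step-301 forward pass, assuming a linear score o_j·x per class followed by softmax and an argmax to obtain s. The `mean_center` helper subtracts the batch mean, which is a simplification (a dataset-wide mean image is more typical); both names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def predict(o: Sequence[Sequence[float]], x: Sequence[float]) -> Tuple[int, List[float]]:
    """Softmax classifier: p_j = exp(o_j . x) / sum_l exp(o_l . x).

    `o` has k rows, one weight vector per class; returns (s, probabilities).
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in o]
    mx = max(scores)                          # stabilize the exponentials
    exps = [math.exp(v - mx) for v in scores]
    total = sum(exps)
    probs = [e / total for e in exps]         # every value lies in (0, 1)
    s = max(range(len(probs)), key=probs.__getitem__)
    return s, probs


def mean_center(batch: List[List[float]]) -> List[List[float]]:
    """Subtract the per-feature batch mean from every sample."""
    n = len(batch)
    means = [sum(col) / n for col in zip(*batch)]
    return [[v - mu for v, mu in zip(row, means)] for row in batch]
```

For example, `predict([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [2.0, 1.0])` scores the three classes as 2, 1, 0 and therefore returns s = 0.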
Step 302: The model parameters are optimized with stochastic gradient descent combined with an adaptive-learning-rate gradient descent optimizer. After the discriminant formula produces a prediction, the parameters are updated from the training samples and their expected values so as to minimize the loss. Each update uses one sample drawn at random from the training set, so each step is fast and can be performed online. Parameters such as the weights and biases of the network are updated by the following equations:
[Three update equations, given as images in the original, are not reproduced here.]
$$\theta_t = \theta_{t-1} - V_t$$
where $\hat{m}_t$ and $\hat{V}_t$ are the bias corrections of $m_t$ and $V_t$; $V_t$ is the parameter update value of the t-th iteration; λ is the learning rate applied to the negative gradient; $\partial L(\theta; x^{(i)}, y^{(i)}) / \partial \theta$ is the partial derivative of the loss function with respect to the parameters, with x(i), y(i) the training samples; and $\theta_t$ is the parameter value at the t-th iteration.
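The sketch below shows the per-sample stochastic update with standard Adam (Kingma and Ba), assuming ε = 1e-8 and a toy one-parameter squared-error problem instead of the network's cross-entropy loss. The variable `update` plays the role of the per-iteration update value subtracted from θ; the function name and data are hypothetical, and this is an illustration rather than the patent's training code.

```python
import math
import random
from typing import List, Tuple


def train_adam_sgd(data: List[Tuple[float, float]], lr: float = 0.05,
                   beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
                   steps: int = 2000, seed: int = 0) -> float:
    """Fit y ~ theta * x by drawing ONE random sample per step and applying Adam.

    The squared loss L = 0.5 * (theta * x - y)^2 only keeps the toy small.
    """
    rng = random.Random(seed)
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        x, y = rng.choice(data)                 # stochastic: one random sample
        g = (theta * x - y) * x                 # dL/dtheta for that sample
        m = beta1 * m + (1 - beta1) * g         # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g     # second-moment estimate
        m_hat = m / (1 - beta1 ** t)            # bias corrections
        v_hat = v / (1 - beta2 ** t)
        update = lr * m_hat / (math.sqrt(v_hat) + eps)
        theta = theta - update                  # theta_t = theta_{t-1} - update
    return theta


if __name__ == "__main__":
    samples = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
    print(train_adam_sgd(samples))              # approaches 3.0
```

Because each step looks at a single sample, every update is cheap and can be applied online, as the text describes; the adaptive denominator keeps the step size comparable across samples whose gradients differ in scale.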
Step 4: Perform loop-closure detection with the network model, as shown in Fig. 5. Step 3 produces a trained deep neural network model that can extract image features. For each query image (the robot's current view), the network computes a feature descriptor. The raw CNN features are then post-processed with an enhancement step consisting of principal component analysis (PCA) and whitening, which markedly improves their ability to represent images as well as computational efficiency; the resulting descriptors are used for loop detection. After normalization, a similarity matrix between images is computed from Euclidean distances, and its rank is reduced to suppress noise. Whether a loop closure has occurred is decided by measuring similarity, and after all images in the dataset have been processed, precision-recall pairs are obtained. In this way the similarity relationships between images are obtained, solving the loop-closure detection problem for a walking robot.
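A sketch of the descriptor post-processing described above, assuming NumPy: L2-normalize the raw descriptors, project them onto the top principal components, whiten by the inverse square root of the eigenvalues, re-normalize, and compute pairwise Euclidean distances. The number of components, the `eps` guard, and fitting PCA on the same images are assumptions; in practice the PCA-whitening transform is often learned on a separate image set.

```python
import numpy as np


def pca_whiten(features: np.ndarray, n_components: int, eps: float = 1e-6) -> np.ndarray:
    """L2-normalize, project onto top principal components, whiten, re-normalize.

    features: (N, D) array of raw CNN descriptors, one row per image.
    """
    x = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / max(len(xc) - 1, 1)          # (D, D) covariance
    evals, evecs = np.linalg.eigh(cov)             # eigh returns ascending order
    order = np.argsort(evals)[::-1][:n_components]
    proj = xc @ evecs[:, order] / np.sqrt(evals[order] + eps)  # whitening
    return proj / (np.linalg.norm(proj, axis=1, keepdims=True) + eps)


def distance_matrix(desc: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the rows of desc."""
    sq = np.sum(desc ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * desc @ desc.T
    return np.sqrt(np.maximum(d2, 0.0))            # clamp round-off negatives


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((20, 365))                  # 20 images, 365-D features
    dist = distance_matrix(pca_whiten(feats, 8))
    print(dist.shape)
```

Whitening equalizes the variance of the retained components, so no single dominant direction of the raw features controls the distance.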
In step 4, the image features are 365-dimensional. A feature threshold is applied to build a keyframe library; for each keyframe image, the Euclidean distances between feature vectors are computed to obtain a similarity matrix, from which loop-closure frames are found. For each image the nearest neighbour is located, and a distance threshold decides whether a loop closure has occurred: if the similarity exceeds the set threshold, the pair is declared a loop closure. Varying the distance threshold yields a precision-recall curve, and candidate loops are retrieved by searching the keyframes. The precision-recall curve of loop-closure detection and the detected loops are output for subsequent SLAM map optimization. In actual model training, different training hyperparameter settings can be tested and the best-performing model selected.
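The threshold sweep can be sketched in pure Python as follows. Assumptions: `dist` is a precomputed distance matrix, each frame is matched to its nearest earlier frame outside the S-frame exclusion window of item (1) below, ground-truth loops are given as (i, j) pairs, and precision is taken as 1.0 when nothing is detected. All names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def detect_loops(dist: Sequence[Sequence[float]], threshold: float, s: int) -> Set[Pair]:
    """Match each frame i to its nearest earlier frame outside the last s frames."""
    loops: Set[Pair] = set()
    for i in range(len(dist)):
        candidates = range(0, i - s)        # frames i-s .. i-1 are excluded
        if not candidates:
            continue
        j = min(candidates, key=lambda c: dist[i][c])
        if dist[i][j] < threshold:          # small distance = high similarity
            loops.add((i, j))
    return loops


def precision_recall(detected: Set[Pair], truth: Set[Pair]) -> Tuple[float, float]:
    tp = len(detected & truth)
    precision = tp / len(detected) if detected else 1.0  # convention: no detections
    recall = tp / len(truth) if truth else 1.0
    return precision, recall


def pr_curve(dist: Sequence[Sequence[float]], truth: Set[Pair],
             thresholds: List[float], s: int) -> List[Tuple[float, float, float]]:
    """One (threshold, precision, recall) triple per distance threshold."""
    return [(t, *precision_recall(detect_loops(dist, t, s), truth)) for t in thresholds]
```

Raising the threshold accepts more matches, which can only keep or raise recall while usually lowering precision; sweeping it traces the curve.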
(1) Keyframe selection. To avoid keyframes that lie so close together that they are overly similar, the frames used for loop detection must be sparse, mutually distinct, and cover the whole environment. A new keyframe is taken and stored each time the camera has moved a certain interval. In addition, the image-matching range for the current position is restricted, and the excluded range is set by a threshold S. Specifically, if there are currently N images and S images are excluded, loop closures are searched only among images other than the S frames immediately preceding the current image.
(2) Build the candidate keyframe library from the keyframes. Rather than matching the current keyframe directly against every possible loop frame, the system first collects the nearby keyframes that share at least W nonzero categories with the current keyframe and uses them as the candidate set. W must be chosen carefully: if it is too small, too many keyframes are retrieved and computation grows; if it is too large, it may exceed the number of categories and no loop frame can be found.
(3) Compute the similarity score between the current keyframe and each frame in the keyframe library. The feature vectors are first normalized, and the similarity between images is measured by the Euclidean distance between their feature vectors. Because distance and similarity are negatively correlated, a decreasing function of the distance is used as the score: the smaller the distance, the higher the similarity score. A distance threshold $\tau_i$ is then applied to decide whether a loop closure has occurred;
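[Equation images: the distance dis(i, j) between the normalized feature vectors of images $I_i$ and $I_j$, and the similarity score G as a function of dis(i, j) with parameters $k_1$, $k_2$.]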
In the above formulas, dis(i, j) is the distance between images $I_i$ and $I_j$, G is the similarity score, and $k_1$, $k_2$ are process parameters with $k_1 < 0$. Before loop closures are detected, the similarity scores are normalized to [0, 1]; for this purpose, the normalized distance is used so that the score lies in [0, 1].
(4) Apply rank reduction to the similarity matrix to suppress noise. The similarity scores of all pairs of keyframes form a matrix M describing their relationships. M is a real symmetric n×n matrix, so there exist an orthogonal matrix V and a diagonal matrix D such that M satisfies the following formula, where $v_i$ are the eigenvectors and $d_i$ are the eigenvalues on the diagonal:
$$M = V D V^{\mathsf T} = \sum_{i=1}^{n} d_i\, v_i v_i^{\mathsf T}$$
The dominant eigenvectors of M correspond to themes that pervade a particular environment. They are harmful to loop-closure detection: because different scenes share repetitive content, they create ambiguity and lead to false-positive detections. Removing the largest eigenvalues with a rank-reduced matrix lowers this noise while preserving true loops, which helps reduce detection ambiguity.
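$$\bar{\lambda}_i = \frac{\lambda_i}{\sum_{j=r}^{n} \lambda_j}, \qquad H(M, r) = -\sum_{i=r}^{n} \bar{\lambda}_i \log \bar{\lambda}_i$$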
The above formula computes the share of each $\lambda_i$ among $\lambda_r$ to $\lambda_n$ and uses the entropy of this distribution to measure the complexity of the decomposition of M. Outer products are removed from M one at a time, and the $r_l$ that maximizes H(M, r) is selected:
$$r_l = \arg\max_{r} H(M, r)$$
This yields a similarity matrix in which no single theme dominates, and the reduced-order matrix replaces M. Decomposing the similarity matrix into a series of outer products removes the influence of features common to many images without removing the images themselves, and improves discriminability in loop-closure detection. The best candidate loop frame for the current frame is obtained by examining the high-score regions of the matrix.
(5) For loop frames detected in the i frames preceding the current frame, verify whether they have a direct connection to the best candidate loop frame; the best candidate loop frame that passes this spatial-continuity check is accepted as the loop frame. After all images in the dataset have been processed, precision-recall pairs are obtained. Once a loop is found, the spanning tree of adjacent frames is computed and the entire trajectory is optimized.
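Step (4) can be sketched with NumPy as below. It ranks eigenvalues by magnitude (a similarity matrix may have negative eigenvalues), evaluates H(M, r) over the normalized tail of magnitudes, keeps at least two terms in the tail, and subtracts the outer products in front of the entropy-maximizing index. These choices and the function name are assumptions, not the patent's exact procedure.

```python
import numpy as np


def reduce_rank_by_entropy(m: np.ndarray) -> tuple:
    """Remove dominant outer products d_i v_i v_i^T from a real symmetric matrix.

    Returns (reduced matrix, r) where r is the 0-based number of removed terms.
    """
    d, v = np.linalg.eigh(m)                 # m must be real symmetric
    order = np.argsort(np.abs(d))[::-1]      # largest magnitude first
    d, v = d[order], v[:, order]
    mag = np.abs(d)
    best_r, best_h = 0, -np.inf
    for r in range(len(d) - 1):              # keep at least two terms in the tail
        tail = mag[r:]
        total = tail.sum()
        if total <= 0:
            break
        p = tail / total                     # share of each eigenvalue in the tail
        p = p[p > 0]
        h = -np.sum(p * np.log(p))           # entropy H(M, r)
        if h > best_h:
            best_r, best_h = r, h
    # Subtract sum_{i < best_r} d_i v_i v_i^T (nothing when best_r == 0).
    reduced = m - (v[:, :best_r] * d[:best_r]) @ v[:, :best_r].T
    return reduced, best_r


if __name__ == "__main__":
    # One dominant "theme" (eigenvalue 10) on top of three equal eigenvalues.
    m = np.eye(4) + 2.25 * np.ones((4, 4))
    reduced, r = reduce_rank_by_entropy(m)
    print(r)
    print(np.round(reduced, 3))
```

In this example the entropy peaks once the single dominant eigenvalue is excluded, so exactly one outer product is removed and the shared component disappears from every pairwise score.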

Claims (5)

1. A robot walking closed-loop detection method based on a lightweight deep neural network is characterized by comprising the following steps:
step 1: selecting a closed-loop detection dataset, training the lightweight deep neural network model on a large-scale labeled scene dataset, using the trained model as a feature extractor for scene images, and finally applying the extracted features to closed-loop detection;
step 2: building a lightweight deep neural network and preprocessing the training and test data to be input to the model, wherein the model consists of convolutional layers, pooling layers, blocks, and a fully connected layer; convolution kernels extract features of particular aspects of the image, the features are input into the model, related to the results, and classified, and the output of the final fully connected layer is taken as the feature vector of the image;
step 3: optimizing the network model: loading the image samples of the dataset; initializing the weights of the neural network model with MSRA (He) initialization; inputting real image samples into the lightweight deep neural network model; training the model using the defined forward-propagation process and optimizing the model parameters by alternating forward and backward propagation; saving the trained model after training so that it can be used directly next time; testing with the saved model; and training the network model until it reaches a certain classification accuracy;
step 4: performing closed-loop detection with the network model: using the deep neural network model trained in step 3 to obtain image features; computing a feature descriptor for each query image with the neural network; post-processing the raw CNN features with an enhancement step of principal component analysis and whitening, and using the resulting descriptors for loop detection; after normalization, obtaining a similarity matrix between the images from Euclidean distances and reducing the rank of the matrix to suppress noise; determining whether a loop closure has occurred by measuring similarity; and obtaining precision-recall pairs after all images in the dataset have been processed.
2. The robot walking closed-loop detection method based on the lightweight deep neural network as claimed in claim 1, wherein: in step 1, the Places365-Standard dataset is used to build the image sample set, with image samples of size 224×224; the training data are stored in TFRecord files; samples are then read from the TFRecord files and parsed by specifying a list of raw-data files and reading the data from them; and after gray-value preprocessing and mean subtraction, the data are combined into batches that serve as the neural network input.
3. The robot walking closed-loop detection method based on the lightweight deep neural network as claimed in claim 1, wherein: the lightweight deep neural network model in step 2 consists of an input layer, two convolutional layers, two max-pooling layers, two blocks, and a fully connected layer; the input layer is a 224×224 three-channel image; convolutional layers Conv1 and Conv2 extract features from the input data, max-pooling layers Pool1 and Pool2 filter out the effective information, and the convolutional and pooling layers are activated by the concatenated rectified linear unit C.ReLU.
4. The robot walking closed-loop detection method based on the lightweight deep neural network as claimed in claim 1, wherein: in step 3, optimization is performed against the true labels of the samples and the model outputs, and the cross-entropy loss function L associated with the Softmax classifier is taken as the objective function of the training process, expressed as:
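$$L(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} \mathbf{1}\{y^{(i)} = j\}\,\log\frac{e^{\theta_j^{\mathsf T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\mathsf T} x^{(i)}}}$$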
where m is the number of samples in each training batch; θ is the parameter matrix of the network model to be optimized; x(i) is the i-th image sample; y(i) is the true label of the i-th sample; and k is the number of classes;
a regularization term penalizing each weight w is added to the loss function, introducing a model-complexity measure that suppresses model noise and reduces overfitting; during training, all parameters of the network are adjusted continuously, and stochastic gradient descent (SGD) steadily decreases the loss function to train a more accurate neural network model;
momentum (inertia) is added to gradient descent to suppress the oscillation of SGD, and the Adam algorithm is introduced on the basis of SGD:
$$m(t) = \beta_1\, m(t-1) + (1-\beta_1)\,\nabla_\theta L$$
$$V(t) = \beta_2\, V(t-1) + (1-\beta_2)\,\left(\nabla_\theta L\right)^2$$
where m(t) is the exponential moving average of the gradient and V(t) is the uncentered variance (second raw moment) of the gradient;
the specific procedure of step 3 is as follows:
step 301: classifying the input image with a Softmax classifier, which maps the outputs of several neurons into the interval (0, 1) as class probabilities; mean-centering the input images, feeding them into the neural network in batches, performing forward computation in the network model, and obtaining the prediction s through the discriminant formula:
$$p(y = j \mid x; o) = \frac{e^{o_j^{\mathsf T} x}}{\sum_{l=1}^{k} e^{o_l^{\mathsf T} x}}, \qquad j = 1, \dots, k$$
$$s = \arg\max_{j}\; p(y = j \mid x; o)$$
where o is the parameter matrix of the network model and k is the number of classes;
step 302: optimizing the model parameters with stochastic gradient descent and an adaptive-learning-rate gradient descent optimizer: after the discriminant formula yields a prediction, the parameters are updated from the training samples and expected values; the optimal parameters are found iteratively with stochastic gradient descent (SGD), one sample being drawn at random from the training set for each update; and parameters such as the weights and biases of the network are updated by the following formulas:
[Three update equations, given as images in the original, are not reproduced here.]
$$\theta_t = \theta_{t-1} - V_t$$
where $\hat{m}_t$ and $\hat{V}_t$ are the bias corrections of $m_t$ and $V_t$; $V_t$ is the parameter update value of the t-th iteration; λ is the learning rate applied to the negative gradient; $\partial L(\theta; x^{(i)}, y^{(i)}) / \partial \theta$ is the partial derivative of the loss function with respect to the parameters, with x(i), y(i) the training samples; and $\theta_t$ is the parameter value at the t-th iteration.
5. The robot walking closed-loop detection method based on the lightweight deep neural network according to claim 1, characterized in that: the specific flow of the step 4 is as follows:
(1) judging keyframes: taking and storing a new keyframe each time the camera moves a certain interval, and restricting the image-matching range for the current position with a threshold S; if there are currently N images and S images are excluded, loop closures are searched only among images other than the S frames immediately preceding the current image;
(2) obtaining a candidate keyframe library from the keyframes: first collecting the nearby keyframes that share at least W nonzero categories with the current keyframe, and choosing the value of W appropriately;
(3) computing the similarity score between the current keyframe and each frame in the keyframe library: first normalizing the vectors, measuring the similarity between images by the Euclidean distance between their feature vectors, and recording the score with a decreasing function of the distance, so that a smaller distance gives a higher similarity score; then applying a distance threshold $\tau_i$ to determine whether a loop closure has occurred;
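[Equation images: the distance dis(i, j) between the normalized feature vectors of images $I_i$ and $I_j$, and the similarity score G as a function of dis(i, j) with parameters $k_1$, $k_2$.]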
where dis(i, j) is the distance between images $I_i$ and $I_j$, G is the similarity score, and $k_1$, $k_2$ are process parameters with $k_1 < 0$; before loop closures are detected, the similarity scores are normalized to [0, 1] by using the normalized distance;
(4) applying rank reduction to the similarity matrix to suppress noise: the similarity scores of all pairs of keyframes form a matrix M describing their relationships; M is a real symmetric n×n matrix, so there exist an orthogonal matrix V and a diagonal matrix D such that M satisfies the following formula, where $v_i$ are the eigenvectors and $d_i$ are the eigenvalues on the diagonal:
$$M = V D V^{\mathsf T} = \sum_{i=1}^{n} d_i\, v_i v_i^{\mathsf T}$$
the dominant eigenvectors of M correspond to themes that pervade a particular environment and are harmful to loop-closure detection, because the repetitive nature of different scenes creates ambiguity and leads to false-positive detections; removing the largest eigenvalues with a rank-reduced matrix lowers this noise, preserves true loops, and reduces detection ambiguity:
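$$\bar{\lambda}_i = \frac{\lambda_i}{\sum_{j=r}^{n} \lambda_j}, \qquad H(M, r) = -\sum_{i=r}^{n} \bar{\lambda}_i \log \bar{\lambda}_i$$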
the above formula computes the share of each $\lambda_i$ among $\lambda_r$ to $\lambda_n$ and uses its entropy to measure the complexity of the decomposition of M; outer products are removed from M one at a time, and the $r_l$ that maximizes H(M, r) is obtained:
$$r_l = \arg\max_{r} H(M, r)$$
replacing M with the reduced-order matrix: decomposing the similarity matrix into a series of outer products removes the influence of features common to many images without removing the images themselves and enhances discriminability in loop-closure detection; the best candidate loop frame of the current frame is obtained by examining the high-score regions of the matrix;
(5) verifying whether the loop frames detected in the i frames preceding the current frame have a direct connection to the best candidate loop frame, and accepting the best candidate loop frame that passes this spatial-continuity check as the loop frame; after all images in the dataset have been processed, precision-recall pairs are obtained, and once a loop is found, the spanning tree of adjacent frames is computed and the whole trajectory is optimized.
CN202010249172.8A 2020-04-01 2020-04-01 Visual SLAM closed-loop detection method based on lightweight deep neural network Active CN111553193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010249172.8A CN111553193B (en) 2020-04-01 2020-04-01 Visual SLAM closed-loop detection method based on lightweight deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010249172.8A CN111553193B (en) 2020-04-01 2020-04-01 Visual SLAM closed-loop detection method based on lightweight deep neural network

Publications (2)

Publication Number Publication Date
CN111553193A CN111553193A (en) 2020-08-18
CN111553193B true CN111553193B (en) 2022-11-11

Family

ID=72003800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010249172.8A Active CN111553193B (en) 2020-04-01 2020-04-01 Visual SLAM closed-loop detection method based on lightweight deep neural network

Country Status (1)

Country Link
CN (1) CN111553193B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183742B (en) * 2020-09-03 2023-05-12 南强智视(厦门)科技有限公司 Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112086198B (en) * 2020-09-17 2023-09-26 西安交通大学口腔医院 System and method for establishing age assessment model based on deep learning technology
CN112464989B (en) * 2020-11-02 2024-02-20 北京科技大学 Closed loop detection method based on target detection network
CN112258580B (en) * 2020-11-02 2024-05-17 上海应用技术大学 Visual SLAM loop detection method based on deep learning
CN112836719B (en) * 2020-12-11 2024-01-05 南京富岛信息工程有限公司 Indicator diagram similarity detection method integrating two classifications and triplets
CN112419317B (en) * 2020-12-15 2024-02-02 东北大学 Visual loop detection method based on self-coding network
CN112465067B (en) * 2020-12-15 2022-07-15 上海交通大学 Cryoelectron microscope single-particle image clustering implementation method based on image convolution self-encoder
CN112733067B (en) * 2020-12-22 2023-05-09 上海机器人产业技术研究院有限公司 Data set selection method for robot target detection algorithm
CN112766305B (en) * 2020-12-25 2022-04-22 电子科技大学 Visual SLAM closed loop detection method based on end-to-end measurement network
CN112906626A (en) * 2021-03-12 2021-06-04 李辉 Fault identification method based on artificial intelligence
CN113033555B (en) * 2021-03-25 2022-12-23 天津大学 Visual SLAM closed loop detection method based on metric learning
CN113377987B (en) * 2021-05-11 2023-03-28 重庆邮电大学 Multi-module closed-loop detection method based on ResNeSt-APW
CN113052152B (en) * 2021-06-02 2021-07-30 中国人民解放军国防科技大学 Indoor semantic map construction method, device and equipment based on vision
CN113378788A (en) * 2021-07-07 2021-09-10 华南农业大学 Robot vision SLAM loop detection method, computer equipment and storage medium
CN113361654A (en) * 2021-07-12 2021-09-07 广州天鹏计算机科技有限公司 Image identification method and system based on machine learning
CN113780102B (en) * 2021-08-23 2024-05-03 广州密码营地科技有限公司 Intelligent robot vision SLAM closed loop detection method, device and storage medium
CN114445661B (en) * 2022-01-24 2023-08-18 电子科技大学 Embedded image recognition method based on edge calculation
CN114219049B (en) * 2022-02-22 2022-05-10 天津大学 Fine-grained curbstone image classification method and device based on hierarchical constraint
CN115546626B (en) * 2022-03-03 2024-02-02 中国人民解放军国防科技大学 Data double imbalance-oriented depolarization scene graph generation method and system
CN114973330B (en) * 2022-06-16 2023-05-30 深圳大学 Cross-scene robust personnel fatigue state wireless detection method and related equipment
CN115063609B (en) * 2022-06-28 2024-03-26 华南理工大学 Deep learning-based heat pipe liquid absorption core oxidation grading method
CN116721302B (en) * 2023-08-10 2024-01-12 成都信息工程大学 Ice and snow crystal particle image classification method based on lightweight network
CN117220318B (en) * 2023-11-08 2024-04-02 国网浙江省电力有限公司宁波供电公司 Power grid digital driving control method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330357A (en) * 2017-05-18 2017-11-07 东北大学 Vision SLAM closed loop detection methods based on deep neural network
CN110555881A (en) * 2019-08-29 2019-12-10 桂林电子科技大学 Visual SLAM testing method based on convolutional neural network
CN110781790A (en) * 2019-10-19 2020-02-11 北京工业大学 Visual SLAM closed loop detection method based on convolutional neural network and VLAD

Also Published As

Publication number Publication date
CN111553193A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111553193B (en) Visual SLAM closed-loop detection method based on lightweight deep neural network
US11195051B2 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
CN109919031B (en) Human behavior recognition method based on deep neural network
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN108615010B (en) Facial expression recognition method based on parallel convolution neural network feature map fusion
CN105975573B (en) A kind of file classification method based on KNN
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN110263697A (en) Pedestrian based on unsupervised learning recognition methods, device and medium again
CN109766936B (en) Image change detection method based on information transfer and attention mechanism
CN112580590A (en) Finger vein identification method based on multi-semantic feature fusion network
CN109166094A (en) A kind of insulator breakdown positioning identifying method based on deep learning
CN109443382A (en) Vision SLAM closed loop detection method based on feature extraction Yu dimensionality reduction neural network
CN112052772A (en) Face shielding detection algorithm
CN112633257A (en) Potato disease identification method based on improved convolutional neural network
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN114511710A (en) Image target detection method based on convolutional neural network
CN113220926A (en) Footprint image retrieval method based on multi-scale local attention enhancement network
CN110751005B (en) Pedestrian detection method integrating depth perception features and kernel extreme learning machine
CN114861761A (en) Loop detection method based on twin network characteristics and geometric verification
CN111242114B (en) Character recognition method and device
CN117333948A (en) End-to-end multi-target broiler behavior identification method integrating space-time attention mechanism
CN116977725A (en) Abnormal behavior identification method and device based on improved convolutional neural network
CN111860601A (en) Method and device for predicting large fungus species
CN113723482B (en) Hyperspectral target detection method based on multi-example twin network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant