CN112634367A - Anti-occlusion object pose estimation method based on deep neural network - Google Patents
- Publication number
- CN112634367A (application number CN202011562092.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention provides an anti-occlusion object pose estimation method based on a deep neural network, comprising the following steps. A labeled training-set picture database and a test-set picture database are constructed automatically with 3D modeling software. A deep neural network is built: it comprises four sub-branch networks, each an independent convolutional neural network. A processing algorithm for the network's predicted output is constructed: each of the four sub-branches outputs a 6-dimensional prediction at its final densely connected layer, representing the pose of the object to be estimated; because of occlusion, the result of some branch may be in error and the four outputs may contain abnormal values, so five algorithms are constructed to optimize these outliers and improve resistance to occlusion interference. The deep neural network model is then trained with the samples in the training set, and finally tested with test sets of different occlusion ratios.
Description
Technical Field
The invention belongs to the field of object pose estimation and relates to a method for estimating an object's pose with strong interference resistance using a deep neural network.
Background
The pose of an object covers all of its spatial information, including both position and orientation. In modern industrial production and in daily life, object pose information is of great significance and plays a pivotal role. Accurate estimation of object pose underlies many current industrial applications. In robotics, for example, accurately acquiring a target's position and orientation is a principal task of robot vision and the basis of subsequent operations such as grasping. In autonomous driving for the Internet of Things, accurate estimation of obstacle poses is a precondition for, and guarantee of, safe driving. Estimating object pose accurately and quickly is therefore of great importance.
Compared with other deep-learning architectures such as Deep Belief Networks (DBNs) and Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs) have significant advantages for processing image information and are therefore the usual choice for image-related applications. A convolution kernel extracts information by sliding over a feature map: shallow feature maps capture visual information such as texture and contours, while deep feature maps capture more abstract semantic information and integrate regional information across the image. This weight-sharing convolutional scheme greatly reduces the number of parameters in the network and markedly improves training and convergence speed. Convolutional neural networks have achieved good results in image classification, object detection, pattern recognition, and similar applications, and in recent years have been applied to object pose estimation as well.
The rise of computer vision and deep neural networks has greatly simplified object pose estimation, overcoming the complex equipment and procedures of traditional approaches. Yet deep neural networks have inherent shortcomings of their own. A network's accuracy depends on a large-scale training set and on the consistency between training and test samples; good estimation is achieved only when test samples closely resemble training samples. In real industrial production and applications, however, factors such as occlusion, noise, and illumination are ubiquitous, consistency between test and training samples cannot be guaranteed, and the performance of a trained model degrades markedly under occlusion interference.
In target detection with convolutional neural networks, occlusion is an important interference factor affecting detection performance [1]. When occlusion is present, training and test samples differ substantially, and the network cannot intelligently recognize and compensate for the difference, causing large errors and degraded performance. Occlusion generally falls into two types. The first is mutual occlusion between two target objects; this case has been studied extensively and many solutions have been proposed. The second, in which the target is occluded by an interfering object, is more common in industrial applications, yet at present it can only be mitigated by increasing the number and diversity of training samples; no effective solution exists.
A pose estimation method with anti-occlusion capability therefore has substantial industrial and application value. Convolutional neural networks, with their excellent data-representation capability [2], are a natural foundation for such a method.
The related documents are:
[1] Chu Xingxian, Zhao Hepeng, Qi Peng, et al. Research on detection and recognition technology for occluded targets [J]. Digital Technology and Application, 2013(9): 73-75.
[2] Liu Dong, Li Su, Cao Zhidong. A review of deep learning and its application in image object classification and detection [J]. Computer Science, 2016(12): 13-23.
Disclosure of Invention
Aiming at the problem of occlusion interference in object pose estimation, the invention provides an anti-occlusion pose estimation method based on a deep neural network. Using a monocular vision system, the method takes an image of the object to be estimated as input and outputs the object's pose information end to end. The deep convolutional neural network ensures fast, real-time estimation, while a processing algorithm applied to the network's predictions ensures accuracy and strong interference resistance. The technical scheme is as follows:
an anti-occlusion object pose estimation method based on a deep neural network comprises the following steps:
In the first step, labeled training-set and test-set picture databases are constructed automatically using 3D modeling software.
(1) Construct a regular cylindrical object as the target to be estimated, with a checkerboard icon as the marker;
(2) Place the target in front of the camera with the marker centered in the camera's field of view; the center of the target, the marker, and the center of the camera lens lie on the same horizontal line, which serves as the reference position;
(3) Move and rotate the object according to a script to vary its spatial position and orientation, capture a photo of the marker in each pose, and use the corresponding six-dimensional coordinates as the training sample's label;
(4) Obtain photos in batches as training samples and convert their labels into the data format required for network input;
(5) Construct occlusion test sets in the same manner, except that the markers are occluded at different percentages;
In the second step, the deep neural network is constructed. It comprises four sub-branch networks, each an independent convolutional neural network consisting of 6 convolution layers, 4 max-pooling layers, 1 flattening layer, and 3 densely connected layers. A max-pooling layer follows every one or two convolution layers; the numbers of convolution kernels are 32, 32, 64, 128, 256 in order, and the densely connected layers have 2048 and 6 units in order.
In the third step, a processing algorithm for the network's predicted output is constructed. Each of the four sub-branches outputs a 6-dimensional prediction at its final densely connected layer, representing the pose of the object to be estimated. Because of occlusion, the result of some branch may be in error and the four outputs may contain abnormal values; five algorithms are constructed to optimize these outliers and improve resistance to occlusion interference:
(1) Weighted-average method: the 4 branches each output a 6-dimensional prediction of the target pose; the predictions are combined dimension by dimension as a weighted average, which is taken as the final 6-dimensional output;
(2) Euclidean-distance method: the 6 dimensions are treated independently. For each dimension: compute, for every branch, its average Euclidean distance to the other 3 branches in that dimension; set a threshold on this average distance; compute the difference between each branch's distance and the overall average, and when the difference exceeds the threshold, judge that branch's prediction abnormal in the current dimension and delete it; average the predictions of the remaining branches as the output for that dimension. Repeating this over all 6 dimensions yields the final 6-dimensional prediction;
(3) Point-cluster density method: the 6 dimensions are treated independently. For each dimension: compute each branch's point-cluster density with respect to the other 3 branches, where density is inversely related to distance (the larger the distance, the smaller the density) and is represented by the reciprocal of the Euclidean distance; set a density threshold, and when a branch's density falls below the preset threshold, judge that branch's prediction abnormal in the current dimension and delete it; average the predictions of the remaining branches as the output for that dimension. Repeating this over all 6 dimensions yields the final 6-dimensional prediction;
(4) Joint Euclidean-distance method: the 4 branches each output 6-dimensional predictions, and abnormality in one dimension is correlated with the others, so the linkage among all 6 dimensions is considered jointly, with Euclidean distance as the criterion. First, for each dimension, compute each branch's average Euclidean distance to the other 3 branches. Second, for each branch, take a weighted average of its distances over the 6 dimensions to obtain a confidence for that branch. Third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e., the largest weighted-average Euclidean distance, and exclude it. Fourth, combine the remaining 3 branches with the weighted-average method of algorithm (1) and output the 6-dimensional prediction;
(5) Joint point-cluster density method: as in (4), the linkage among the 6 dimensions is considered jointly, with point-cluster density as the criterion. First, for each dimension, compute each branch's point-cluster density with respect to the other 3 branches. Second, for each branch, take a weighted average of its densities over the 6 dimensions to obtain a confidence. Third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e., the smallest weighted-average density, and exclude it. Fourth, combine the remaining 3 branches with the weighted-average method of algorithm (1) and output the 6-dimensional prediction;
In the fourth step, the deep neural network model is trained using the samples in the training set;
In the fifth step, the model is tested using test sets with different occlusion ratios.
Using a convolutional neural network together with 5 outlier-detection algorithms, the invention provides a strongly interference-resistant object pose estimation method based on a deep neural network. Photos of the unoccluded marker form the training set for the deep convolutional network model, and photos of the partially occluded marker form the test set for evaluating its interference resistance. Compared with the prior art, the method is markedly more robust, and pose estimation accuracy under occlusion is greatly improved.
Drawings
FIG. 1 is a flow chart of strong anti-interference object pose estimation based on a deep neural network
FIG. 2 Euclidean distance algorithm flow chart
FIG. 3 is a flow chart of a point group density algorithm
FIG. 4 is a flow chart of a joint Euclidean distance algorithm
FIG. 5 flow chart of the joint point group density algorithm
FIG. 6 comparison of effects under different test sets
FIG. 7 is a comparison graph of attitude estimation effects of 5 algorithms under different test sets
FIG. 8 is a comparison graph of position estimation effects of 5 algorithms under different test sets
Detailed Description
The invention adopts a convolutional-network framework to learn the mapping between extracted features and vision, and builds 5 different mathematical algorithms on top of the network's predicted output to process the predictions, improving prediction capability in the presence of occlusion interference. The scheme greatly simplifies object pose estimation, omits image-processing stages such as feature extraction and feature matching, and achieves end-to-end estimation. Compared with the prior art, processing the network's predictions with these output algorithms further improves pose estimation accuracy and occlusion resistance, making object pose estimation more convenient, fast, accurate, and efficient.
To make the technical scheme clearer, the invention is further explained below with reference to the drawings. The invention is realized in the following steps:
In the first step, labeled training-set and test-set picture databases are constructed automatically using 3D modeling software.
(1) Construct a regular cylinder of radius 100 mm and height 200 mm as the target to be estimated, with a checkerboard icon as the marker.
(2) Place the object 0.5 m in front of the camera, with the marker centered in the camera's field of view; the center of the cylinder, the marker, and the camera lens lie on the same horizontal line, which serves as the reference position.
(3) Move and rotate the object according to a script to vary its spatial position and orientation, capture a photo of the marker in each pose, and use the corresponding six-dimensional coordinates as the training sample's label.
(4) Obtain 50000 photos in batches as training samples and convert their labels into the data format required for network input.
(5) Construct occlusion test sets in the same manner, except that the markers are occluded at rates of 0%, 4%, 9%, 12%, 16%, 20%, and 25%.
In the second step, the deep neural network is constructed. It comprises four sub-branch networks, each an independent convolutional neural network consisting of 6 convolution layers, 4 max-pooling layers, 1 flattening layer, and 3 densely connected layers. A max-pooling layer follows every one or two convolution layers; the numbers of convolution kernels are 32, 32, 64, 128, 256 in order, and the densely connected layers have 2048 and 6 units in order.
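As a concreteness check, the per-branch layer stack can be traced shape by shape in plain Python. This is a sketch under stated assumptions, not the patent's implementation: the text lists only five kernel counts for six convolution layers, so a sixth value of 64 is assumed here, along with 'same'-padded convolutions, 2×2 pooling at assumed positions, and a hypothetical 128×128 input.

```python
def branch_shapes(h, w, in_ch=3):
    """Trace feature-map shapes through one sub-branch:
    6 'same' convolutions, 4 max-pooling layers, flatten, 3 dense layers.
    Kernel counts and pooling positions are assumptions, not patent text."""
    kernels = [32, 32, 64, 64, 128, 256]   # assumed: a 64 inserted as 4th value
    pool_after = {1, 3, 4, 5}              # assumed: pool after these conv layers
    shapes = []
    ch = in_ch
    for idx, k in enumerate(kernels):
        ch = k                             # 'same' padding keeps h and w
        if idx in pool_after:
            h, w = h // 2, w // 2          # 2x2 max pooling halves each side
        shapes.append((h, w, ch))
    flat = h * w * ch                      # flattening layer
    dense = [2048, 2048, 6]                # assumed widths; patent gives "2048, 6"
    return shapes, flat, dense

shapes, flat, dense = branch_shapes(128, 128)
print(shapes[-1], flat, dense[-1])         # final map, flattened size, 6-D pose
```

Under these assumptions a 128×128 input ends as an 8×8×256 feature map, flattened to 16384 values before the dense layers produce the 6-dimensional pose.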
In the third step, the processing algorithm for the network's predicted output is constructed. Each of the four sub-branches outputs a 6-dimensional estimate at its final densely connected layer, representing the pose of the object to be estimated. Because of occlusion, the result of some branch may be in error and the 4 outputs may contain abnormal values; 5 algorithms are constructed to optimize these results and improve interference resistance.
Notation: N = 4 is the number of branches; n ∈ {1, 2, 3, 4} indexes a specific branch; i ∈ {1, …, 6} indexes the 6 degrees of freedom; and P(i) denotes the predicted value of the i-th degree of freedom. The weighted-average, Euclidean-distance, and point-cluster density methods analyze each dimension's prediction independently, while the joint Euclidean-distance and joint point-cluster density methods analyze the 6-dimensional prediction jointly.
Weighted-average method: the 4 branches each output a 6-dimensional prediction of the target pose, so the network output has dimensions 4 × 6; the predictions are combined dimension by dimension as a weighted average, which is taken as the final 6-dimensional output. When the region corresponding to some branch is occluded, that branch's prediction has a large error; averaging it with the accurately predicting branches weakens and reduces that error.
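The weighted-average rule is simple enough to state in a few lines of NumPy. The function name, uniform default weights, and the (4, 6) array layout are illustrative choices, not from the patent:

```python
import numpy as np

def weighted_average(preds, weights=None):
    """Combine branch outputs dimension by dimension.
    preds: (4, 6) array, one 6-D pose prediction per branch.
    weights: per-branch weighting factors; uniform if omitted."""
    preds = np.asarray(preds, dtype=float)
    if weights is None:
        weights = np.full(preds.shape[0], 1.0 / preds.shape[0])
    weights = np.asarray(weights, dtype=float)
    return weights @ preds          # per-dimension weighted sum over branches
```

With uniform weights this reduces to the plain mean; a single occluded branch still shifts the result, which is what motivates the outlier-rejection methods (2) through (5).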
Equation (1) gives the average of the i-th degree of freedom, and equation (2) its weighted average, where s(i) is a weighting factor.
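Equations (1) and (2) were not reproduced in this text; from the surrounding description (N branches, predicted values P_n(i), weighting factors s(i)), a plausible reconstruction is:

```latex
% (1) plain average of the i-th degree of freedom over the N = 4 branches
\bar{P}(i) = \frac{1}{N}\sum_{n=1}^{N} P_n(i)

% (2) weighted average with weighting factors s_n(i), \sum_{n} s_n(i) = 1
P(i) = \sum_{n=1}^{N} s_n(i)\, P_n(i)
```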
Euclidean-distance method: the 6 dimensions are treated independently. For each dimension, compute each branch's average Euclidean distance to the other 3 branches; set a threshold on this average distance; compute the difference between each branch's distance and the overall average, and when the difference exceeds the threshold, judge that branch's prediction abnormal in the current dimension and delete it; average the remaining branches' predictions as the output for that dimension. Repeating this over all 6 dimensions yields the final 6-dimensional prediction. The algorithm flow is shown in fig. 2. Equation (3) computes the distance dis(n, i) of the n-th branch from the other N-1 branches in dimension i, and equation (4) the average distance dis(i) over the 4 branches. The Euclidean-distance threshold is set to 0.2; when the gap between a branch's distance dis(n, i) and the average distance dis(i) exceeds the threshold, that estimate is judged an abnormal value and excluded.
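A minimal NumPy sketch of this per-dimension rejection, assuming scalar per-dimension values (so the Euclidean distance reduces to an absolute difference) and the 0.2 threshold quoted above; function and variable names are illustrative:

```python
import numpy as np

def euclidean_filter(preds, threshold=0.2):
    """Per-dimension outlier rejection over 4 branches x 6 dimensions."""
    preds = np.asarray(preds, dtype=float)
    n_branch, n_dim = preds.shape
    out = np.empty(n_dim)
    for i in range(n_dim):
        col = preds[:, i]
        # dis(n, i): mean distance of branch n to the other branches, eq. (3)
        dis = np.array([np.abs(col[n] - np.delete(col, n)).mean()
                        for n in range(n_branch)])
        # eq. (4): average distance over all branches; drop branches whose
        # distance exceeds the average by more than the threshold
        keep = (dis - dis.mean()) <= threshold
        if not keep.any():
            keep[:] = True               # fallback: never delete every branch
        out[i] = col[keep].mean()
    return out
```

For example, three branches near 1.0 and one occluded branch at 5.0 yield an output near 1.0 rather than the contaminated mean of 2.0.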
Point-cluster density method: the 6 dimensions are treated independently. For each dimension, compute each branch's point-cluster density with respect to the other 3 branches; density is inversely related to distance (the larger the distance, the smaller the density) and is represented by the reciprocal of the Euclidean distance. Set a density threshold; when a branch's density falls below the average density by more than the threshold, judge that branch's prediction abnormal in the current dimension and delete it; average the remaining branches' predictions as the output for that dimension. Repeating this over all 6 dimensions yields the final 6-dimensional prediction.
The algorithm flow is shown in fig. 3. Geometrically, an outlier has a small point-cluster density, so outliers are excluded by comparing densities in each dimension. Equation (5) computes the point-cluster density of the n-th branch, and equation (6) finds the branch with the minimum density.
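The same example can be run through a density-based sketch. The reciprocal-distance density follows the text; the numeric threshold (0.3) and the below-average test are assumptions, since the patent does not state a density threshold value:

```python
import numpy as np

def density_filter(preds, threshold=0.3, eps=1e-12):
    """Per-dimension rejection using point-cluster density (inverse distance)."""
    preds = np.asarray(preds, dtype=float)
    n_branch, n_dim = preds.shape
    out = np.empty(n_dim)
    for i in range(n_dim):
        col = preds[:, i]
        dis = np.array([np.abs(col[n] - np.delete(col, n)).mean()
                        for n in range(n_branch)])
        dens = 1.0 / (dis + eps)         # eq. (5): density as inverse distance
        # assumed rule: a branch whose density falls below the average
        # by more than `threshold` is judged abnormal and deleted
        keep = (dens.mean() - dens) <= threshold
        if not keep.any():
            keep[:] = True               # fallback: never delete every branch
        out[i] = col[keep].mean()
    return out
```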
Since the 6 degrees of freedom of an object are correlated, their mutual linkage can be considered jointly: the distance between the j-th and k-th branches, taken as a whole over all 6 dimensions, is expressed by B(j, k) in equation (7).
Joint Euclidean-distance method: the 4 branches each output 6-dimensional predictions, and abnormality in one dimension is correlated with the others, so the linkage among all 6 dimensions is considered jointly, with Euclidean distance as the criterion. First, for each dimension, compute each branch's average Euclidean distance to the other 3 branches. Second, for each branch, take a weighted average of its distances over the 6 dimensions to obtain a confidence for that branch. Third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e., the largest weighted-average Euclidean distance, and exclude it. Fourth, combine the remaining 3 branches with the weighted-average method of algorithm (1) and output the 6-dimensional prediction. The algorithm flow chart is shown in fig. 4. Following the Euclidean-distance algorithm, an abnormal value is determined with the 6-dimensional joint Euclidean distance as the criterion. Equation (8) gives the Euclidean distance of the n-th branch in dimension i; equation (9) weights the dimensions to obtain the joint Euclidean distance of the n-th branch; and equation (10) selects the abnormal branch with the largest joint Euclidean distance.
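The four steps of the joint method can be sketched directly; dimension weights default to uniform, and all names are illustrative:

```python
import numpy as np

def joint_euclidean(preds, dim_weights=None):
    """Exclude the one least-confident branch using the 6-D joint distance,
    then average the remaining branches (algorithm (1), uniform weights)."""
    preds = np.asarray(preds, dtype=float)
    n_branch, n_dim = preds.shape
    if dim_weights is None:
        dim_weights = np.full(n_dim, 1.0 / n_dim)
    # eq. (8): per-dimension mean distance of each branch to the others
    dis = np.array([[np.abs(preds[n, i] - np.delete(preds[:, i], n)).mean()
                     for i in range(n_dim)] for n in range(n_branch)])
    joint = dis @ dim_weights            # eq. (9): weighted joint distance
    worst = int(np.argmax(joint))        # eq. (10): lowest-confidence branch
    keep = np.delete(np.arange(n_branch), worst)
    return preds[keep].mean(axis=0)
```

Unlike the per-dimension methods, exactly one branch is excluded, so a branch that is slightly off in every dimension is caught even if no single dimension crosses a threshold.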
Joint point-cluster density method: the 4 branches each output 6-dimensional predictions, and abnormality in one dimension is correlated with the others, so the linkage among all 6 dimensions is considered jointly, with point-cluster density as the criterion. First, for each dimension, compute each branch's point-cluster density with respect to the other 3 branches. Second, for each branch, take a weighted average of its densities over the 6 dimensions to obtain a confidence. Third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e., the smallest weighted-average density, and exclude it. Fourth, combine the remaining 3 branches with the weighted-average method of algorithm (1) and output the 6-dimensional prediction. The algorithm flow chart is shown in fig. 5. The mutual linkage among the 6 dimensions is considered jointly, with the joint point-cluster density as the criterion for judging abnormal values. Equation (11) computes the point-cluster density of the n-th branch in dimension i; equation (12) the 6-dimension weighted joint density of the n-th branch; and equation (13) selects the abnormal branch with the minimum joint density.
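The joint density variant differs from the joint Euclidean-distance sketch only in the confidence score (inverse distance, with the minimum excluded); again a sketch with illustrative names:

```python
import numpy as np

def joint_density(preds, dim_weights=None, eps=1e-12):
    """Exclude the branch with the smallest joint point-cluster density,
    then average the remaining branches."""
    preds = np.asarray(preds, dtype=float)
    n_branch, n_dim = preds.shape
    if dim_weights is None:
        dim_weights = np.full(n_dim, 1.0 / n_dim)
    dis = np.array([[np.abs(preds[n, i] - np.delete(preds[:, i], n)).mean()
                     for i in range(n_dim)] for n in range(n_branch)])
    dens = 1.0 / (dis + eps)             # eq. (11): per-dimension density
    joint = dens @ dim_weights           # eq. (12): weighted joint density
    worst = int(np.argmin(joint))        # eq. (13): minimum-density branch
    keep = np.delete(np.arange(n_branch), worst)
    return preds[keep].mean(axis=0)
```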
In the fourth step, the deep neural network model is trained using the samples in the training set. The specific training parameters are as follows:
(1) Each epoch randomly selects 3000 pictures from the training set as that round's training samples;
(2) mini_batch = 2: 2 pictures are input to the network at a time, and back-propagation aims to reduce the loss on these two pictures;
(3) nb_epoch = 6: each selection of pictures is iterated over 6 times before proceeding to the next round;
(4) the currently trained network model is saved as an .h5 file (every 600 steps), and the next round continues training from the current model, i.e., another 3000 training pictures are drawn;
(5) stochastic gradient descent with back-propagation reduces the loss.
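The stated schedule can be sanity-checked with a small pure-Python sketch. The "save every 600 steps" reading of point (4) is an assumption (the original wording is garbled), as is the number of outer rounds:

```python
import random

def training_schedule(train_size=50000, per_round=3000, mini_batch=2,
                      nb_epoch=6, save_every=600, rounds=1, seed=0):
    """Count the SGD steps and .h5 checkpoint saves the schedule implies."""
    rng = random.Random(seed)
    steps = saves = 0
    for _ in range(rounds):                              # outer training rounds
        picks = rng.sample(range(train_size), per_round) # 3000 random pictures
        for _ in range(nb_epoch):                        # repeated 6 times
            for start in range(0, len(picks), mini_batch):
                steps += 1                               # one backprop step
                if steps % save_every == 0:
                    saves += 1                           # checkpoint (.h5)
    return steps, saves
```

Under these assumptions one round amounts to 6 × 3000 / 2 = 9000 mini-batch steps and 15 checkpoint saves.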
The fourth step: test the deep neural network model on test sets with different occlusion ratios, and process the network's predicted output values with the 5 algorithms. Fig. 6 compares the pose estimation performance, under different occlusion ratios, of a single network SBN without any prediction processing algorithm and the 4-branch network MBN-4 using the joint Euclidean distance algorithm; MBN-4 maintains high estimation accuracy even as the occlusion ratio increases. Fig. 7 compares the average performance of the 5 algorithms on object pose across all test sets. Fig. 8 compares the average performance of the 5 algorithms on object position across all test sets.
Claims (1)
1. An anti-occlusion object pose estimation method based on a deep neural network, comprising the following steps:
firstly, a labeled training-set picture database and a labeled test-set picture database are automatically constructed using 3D modeling software:
(1) constructing a regular cylindrical object as the target to be estimated, with a checkerboard icon as the marker;
(2) placing the target to be estimated in front of the camera, with the marker in the middle of the camera view; the center of the target to be estimated, the marker, and the center of the camera lens lie on the same horizontal center line, which serves as the reference position;
(3) moving and rotating the object to be estimated according to a script to change its spatial position and posture, capturing marker photos in the corresponding postures, and taking the corresponding six-dimensional coordinates as the labels of the training samples;
(4) obtaining photos in batches as training-set samples, and converting their labels into the required data format to meet the network input requirements;
(5) constructing occlusion test sets in the same manner, except that the markers are occluded at rates of 0%, 4%, 9%, 12%, 16%, 20%, and 25%;
secondly, constructing the deep neural network: the network comprises four sub-branch networks, each an independent convolutional neural network consisting of 6 convolutional layers, 4 max-pooling layers, 1 flatten layer, and 3 densely connected layers; every one or two convolutional layers are followed by one max-pooling layer; the numbers of convolution kernels are 32, 32, 64, 128, 256 in order, and the densely connected layers have 2048 and 6 units in order;
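A quick shape trace through one such sub-branch helps make the layer counts concrete. The text lists five kernel counts for six convolutional layers and two sizes for three dense layers, so the sketch below fills the gaps with assumed repeats (32, 32, 64, 64, 128, 256 and 2048, 2048, 6) and assumes 3x3 'same'-padded convolutions with 2x2 pooling and a square RGB input; all of these are assumptions, not claim text.

```python
def branch_flatten_size(h=96, w=96):
    """Trace the feature-map shape through one sub-branch CNN.

    Assumed layout: four blocks of (n_convs, filters), each block ending
    in a 2x2 max-pool, so only pooling shrinks the spatial size; the
    flatten output then feeds dense layers of (assumed) 2048, 2048, 6.
    """
    blocks = [(2, 32), (2, 64), (1, 128), (1, 256)]  # 6 convs + 4 pools
    channels = 3                                     # RGB input (assumption)
    for n_convs, filters in blocks:
        channels = filters          # convolutions change the channel count only
        h, w = h // 2, w // 2       # each max-pool halves height and width
    return h * w * channels         # size fed into the flatten layer
```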
then, constructing the network prediction output processing algorithms: each of the four sub-branch networks outputs a 6-dimensional predicted value at its last densely connected layer, representing the pose information of the object to be estimated; because of occlusion, the result of some branch may be erroneous, so abnormal values exist among the outputs of the 4 branches; 5 algorithms are constructed to handle these abnormal values and improve resistance to occlusion interference;
(1) weighted average method: the 4 branches each output a 6-dimensional predicted value of the target to be estimated; the predictions are combined dimension by dimension with a weighted average, and the resulting 6-dimensional value is taken as the final output;
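For a 4x6 prediction matrix this baseline is a single NumPy call. A minimal sketch, assuming equal branch weights when none are supplied; the function name is illustrative.

```python
import numpy as np

def weighted_average_fuse(preds, weights=None):
    """Fuse the four 6-D branch outputs with a per-branch weighted average."""
    preds = np.asarray(preds, dtype=float)          # shape (4, 6)
    if weights is None:
        weights = np.ones(len(preds)) / len(preds)  # equal weights (assumption)
    return np.average(preds, axis=0, weights=weights)
```

Algorithms (4) and (5) below reuse this step on the three branches that survive outlier rejection.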
(2) Euclidean distance method: the estimates of the 6 dimensions are considered separately; for each dimension: calculate the average Euclidean distance between each branch and the other 3 branches in that dimension; set an average-Euclidean-distance threshold; compute the difference between each branch's distance and the average distance; when the difference exceeds the threshold, the branch's predicted value in the current dimension is judged abnormal and the branch is removed; average the predicted values of the remaining branches as the predicted output for that dimension; perform this operation for each of the 6 dimensions, and finally output the 6-dimensional predicted values;
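A per-dimension sketch of this rejection rule. Because each dimension is handled separately, the pairwise Euclidean distances reduce to absolute differences; the threshold value is illustrative, not from the claim.

```python
import numpy as np

def euclidean_outlier_fuse(preds, threshold=1.0):
    """Per-dimension fusion: drop branches whose mean distance to the other
    branches exceeds the average by more than `threshold`, then average."""
    preds = np.asarray(preds, dtype=float)           # shape (4, 6)
    n, d = preds.shape
    out = np.empty(d)
    for i in range(d):
        col = preds[:, i]
        # mean 1-D distance from each branch to the other 3 in this dimension
        dist = np.abs(col[:, None] - col[None, :]).sum(axis=1) / (n - 1)
        keep = (dist - dist.mean()) <= threshold     # drop far-above-average branches
        out[i] = col[keep].mean()
    return out
```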
(3) point group density method: the estimates of the 6 dimensions are considered separately; for each dimension: calculate the point group density between each branch and the other 3 branches in that dimension; the density is related to distance (the larger the distance, the smaller the density) and is represented by the reciprocal of the Euclidean distance; set a point group density threshold and compute the difference between each branch's density and the average density; when a branch's difference falls below the preset threshold, its predicted value in the current dimension is judged abnormal and the branch is removed; average the predicted values of the remaining branches as the predicted output for that dimension, perform this operation for each of the 6 dimensions, and finally output the 6-dimensional predicted values;
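The same per-dimension loop with densities, i.e. reciprocal distances, as the criterion. A sketch: the epsilon and the threshold value are illustrative choices, not part of the claim.

```python
import numpy as np

def density_outlier_fuse(preds, threshold=0.3):
    """Per-dimension fusion: drop branches whose point-group density falls
    more than `threshold` below the average density, then average."""
    preds = np.asarray(preds, dtype=float)           # shape (4, 6)
    n, d = preds.shape
    out = np.empty(d)
    for i in range(d):
        col = preds[:, i]
        dist = np.abs(col[:, None] - col[None, :]).sum(axis=1) / (n - 1)
        dens = 1.0 / (dist + 1e-9)                   # density = reciprocal distance
        keep = (dens - dens.mean()) >= -threshold    # drop far-below-average density
        out[i] = col[keep].mean()
    return out
```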
(4) joint Euclidean distance method: each of the 4 branches outputs a 6-dimensional predicted value, and an abnormality in one dimension is correlated with the others, so the mutual linkage among the 6 dimensions is considered jointly, with Euclidean distance as the judgment criterion; first, for each dimension: calculate the average Euclidean distance between each branch and the other 3 branches in that dimension; second, for each branch: take a weighted average of the 6 per-dimension average Euclidean distances to obtain the branch's confidence; third, sort the confidences of the 4 branches, find the branch with the lowest confidence (i.e., the branch with the largest weighted-average Euclidean distance from the second step), and exclude it; fourth, fuse the remaining 3 branches with the weighted average method of algorithm (1) and output the final 6-dimensional predicted values;
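A sketch of this joint variant (illustrative function name; equal dimension weights assumed). No threshold is needed here, since the single least-consistent branch is always excluded.

```python
import numpy as np

def joint_euclidean_fuse(preds, weights=None):
    """Fuse 4 branch outputs (shape (4, 6)) by dropping the branch whose
    weighted-average distance to the other branches is largest."""
    preds = np.asarray(preds, dtype=float)
    n_branches, n_dims = preds.shape
    if weights is None:
        weights = np.ones(n_dims) / n_dims      # equal dimension weights (assumption)
    mean_dist = np.empty((n_branches, n_dims))
    for k in range(n_branches):
        others = np.delete(preds, k, axis=0)    # the other 3 branches
        mean_dist[k] = np.abs(others - preds[k]).mean(axis=0)
    score = mean_dist @ weights                 # large score = low confidence
    worst = int(np.argmax(score))               # least consistent branch
    return np.delete(preds, worst, axis=0).mean(axis=0), worst
```

The joint point group density method (5) is the same procedure with reciprocal distances and `argmin`.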
(5) joint point group density method: each of the 4 branches outputs a 6-dimensional predicted value, and an abnormality in one dimension is correlated with the others, so the mutual linkage among the 6 dimensions is considered jointly, with point group density as the judgment criterion; first, for each dimension: calculate the point group density between each branch and the other 3 branches in that dimension; second, for each branch: take a weighted average of the 6 per-dimension density values to obtain the branch's confidence; third, sort the confidences of the 4 branches, find the branch with the lowest confidence (i.e., the branch with the smallest weighted-average point group density from the second step), and exclude it; fourth, fuse the remaining 3 branches with the weighted average method of algorithm (1) and output the final 6-dimensional predicted values;
thirdly, training the deep neural network model: the training of the deep neural network is completed using the samples in the training set;
fourthly, testing the deep neural network model on test sets with different occlusion ratios.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011562092.4A CN112634367A (en) | 2020-12-25 | 2020-12-25 | Anti-occlusion object pose estimation method based on deep neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112634367A true CN112634367A (en) | 2021-04-09 |
Family
ID=75324868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011562092.4A Pending CN112634367A (en) | 2020-12-25 | 2020-12-25 | Anti-occlusion object pose estimation method based on deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112634367A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114627359A (en) * | 2020-12-08 | 2022-06-14 | 山东新松工业软件研究院股份有限公司 | Out-of-order stacked workpiece grabbing priority evaluation method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491880A (en) * | 2018-03-23 | 2018-09-04 | 西安电子科技大学 | Object classification based on neural network and position and orientation estimation method |
US20190147234A1 (en) * | 2017-11-15 | 2019-05-16 | Qualcomm Technologies, Inc. | Learning disentangled invariant representations for one shot instance recognition |
CN109816725A (en) * | 2019-01-17 | 2019-05-28 | 哈工大机器人(合肥)国际创新研究院 | A kind of monocular camera object pose estimation method and device based on deep learning |
WO2019144575A1 (en) * | 2018-01-24 | 2019-08-01 | 中山大学 | Fast pedestrian detection method and device |
CN110322510A (en) * | 2019-06-27 | 2019-10-11 | 电子科技大学 | A kind of 6D position and orientation estimation method using profile information |
CN111339903A (en) * | 2020-02-21 | 2020-06-26 | 河北工业大学 | Multi-person human body posture estimation method |
Non-Patent Citations (2)
Title |
---|
YANG Jiachen et al.: "Robust Six Degrees of Freedom Estimation for IIoT Based on Multibranch Network", IEEE, 23 March 2020 (2020-03-23) * |
LEI Yutian; YANG Jiachen; MAN Jiabao; XI Meng: "Adaptive spacecraft situation analysis ***", Astronautical Systems Engineering Technology, no. 01, 15 January 2020 (2020-01-15) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||