CN112634367A - Anti-occlusion object pose estimation method based on deep neural network - Google Patents

Anti-occlusion object pose estimation method based on deep neural network

Info

Publication number
CN112634367A
Authority
CN
China
Prior art keywords
branch
dimensions
neural network
branches
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011562092.4A
Other languages
Chinese (zh)
Inventor
杨嘉琛 (Yang Jiachen)
奚萌 (Xi Meng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202011562092.4A
Publication of CN112634367A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/70 — Determining position or orientation of objects or cameras
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/20081 — Training; Learning
    • G06T2207/20084 — Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an anti-occlusion object pose estimation method based on a deep neural network, comprising the following steps: a labeled training-set picture database and a test-set picture database are constructed automatically using 3D modeling software; a deep neural network is constructed, comprising four sub-branch networks, each an independent convolutional neural network; a network prediction output processing algorithm is constructed: each of the four sub-branch networks outputs a 6-dimensional predicted value at its last densely connected layer, representing the pose of the object to be estimated; because occlusion may make the result of some branch erroneous, outliers may appear among the 4 branch outputs, so 5 algorithms are constructed to suppress these outliers and improve resistance to occlusion interference; the deep neural network model is trained with the training-set samples; finally, the model is tested with test sets of different occlusion ratios.

Description

Anti-occlusion object pose estimation method based on deep neural network
Technical Field
The invention belongs to the field of object pose estimation and relates to a method for estimating object pose with strong interference resistance using a deep neural network.
Background
The pose of an object covers all of its spatial information, including position and attitude. In modern industrial production and in many other fields, object pose information is of great importance. Accurate estimation of object pose is the basis of many current industrial applications. In robotics, for example, accurate acquisition of target position and attitude is a primary task of robot vision and the foundation of subsequent operations such as grasping. In autonomous driving, accurate estimation of obstacle pose is a precondition and guarantee for safe driving. Estimating object pose accurately and quickly is therefore of great practical significance.
Compared with other deep-learning architectures such as Deep Belief Networks (DBNs) and Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs) have significant advantages in processing image information and are therefore widely used in image-related applications. A convolution kernel extracts information by sliding over a feature map: shallow feature maps capture visual information such as texture and contours, while deep feature maps capture more abstract semantic information and integrate regional feature information on the image. This convolutional learning scheme greatly reduces the number of parameters in the network and markedly improves training and convergence speed. CNNs have achieved good results in image classification, target detection, pattern recognition and similar applications, and in recent years have been further applied to the field of object pose estimation.
The rise of computer vision and deep neural networks has greatly simplified object pose estimation, overcoming the complex equipment and procedures of traditional technical schemes. However, deep neural network technology has inherent shortcomings: the accuracy of a network depends on a large-scale training set and on the consistency between training and test samples. Good estimation is achieved only when the test samples closely resemble the training samples. In actual industrial production and application, factors such as occlusion, noise and illumination are ubiquitous, so good consistency between test and training samples cannot be guaranteed, and the performance of a trained network model degrades markedly under occlusion interference.
In target detection with convolutional neural networks, occlusion is an important interference factor affecting detection performance [1]. When occlusion is present, training and test samples differ substantially, and the network cannot intelligently recognize and compensate for the difference, causing large errors and degraded performance. Occlusion can generally be divided into two types. One is the mutual occlusion of two target objects; many scholars have studied this interference extensively and proposed numerous solutions. The other is occlusion of the target by an interfering object, which is more common in industrial applications; at present it can only be mitigated by increasing the number and diversity of training samples, and no effective solution exists.
Therefore, an object pose estimation method with anti-occlusion capability has very important industrial and application value. The convolutional neural network, with its excellent data-characterization capability [2], provides a suitable foundation for such a method.
The related documents are:
[1] Chua Xingxian, Zhao Hepeng, Qi Peng, et al. Research on detection and recognition of occluded targets [J]. Digital Technology and Application, 2013(9): 73-75.
[2] Liu Dong, Li, Cao Shidong. A review of deep learning and its application in image object classification and detection [J]. Computer Science, 2016(12): 13-23.
Disclosure of Invention
The invention provides an anti-occlusion object pose estimation method based on a deep neural network, aiming at the problem of occlusion interference in object pose estimation. The method uses a monocular vision system: an image of the object to be estimated is input, and its pose information is output end to end. The deep convolutional neural network ensures fast, real-time estimation, while a network prediction-value processing algorithm ensures accuracy and strong interference resistance. The technical scheme is as follows:
an anti-occlusion object pose estimation method based on a deep neural network comprises the following steps:
firstly, a labeled training-set picture database and a test-set picture database are constructed automatically using 3D modeling software:
(1) construct a regular cylindrical object as the target to be estimated, with a checkerboard icon as the marker;
(2) place the target in front of the camera, with the marker centered in the camera view and the center of the target, the marker and the center of the camera lens on the same horizontal center line, taken as the reference position;
(3) move and rotate the target according to a script to change its spatial position and attitude, capture marker photos in the corresponding poses, and take the corresponding six-dimensional coordinates as the labels of the training samples;
(4) obtain a batch of photos as training-set samples and process their labels into the data format required for network input;
(5) construct occlusion test sets in the same manner, except that the markers are occluded at different percentages;
secondly, constructing a deep neural network: the network comprises four sub-branch networks, each an independent convolutional neural network consisting of 6 convolution layers, 4 max-pooling layers, 1 flattening layer and 3 densely connected layers; every one or two convolution layers are followed by a max-pooling layer, the numbers of convolution kernels are 32, 32, 64, 128, 256 in order, and the densely connected layer sizes are 2048 and 6 in order.
Thirdly, constructing a network prediction output processing algorithm: each of the four sub-branch networks outputs a 6-dimensional predicted value at its last densely connected layer, representing the pose of the object to be estimated; because occlusion may make the result of some branch erroneous, outliers may appear among the 4 branch outputs; 5 algorithms are constructed to suppress these outliers and improve resistance to occlusion interference;
(1) weighted average method: the 4 branches each output a 6-dimensional prediction of the target; the predictions are combined dimension by dimension as a weighted average, which is taken as the final 6-dimensional output;
(2) Euclidean distance method: the 6 dimensions are considered separately; for each dimension, calculate the average Euclidean distance from each branch to the other 3 branches in that dimension; set an average-distance threshold; compute the difference between each branch's distance and the average distance, and when the difference exceeds the threshold, judge the branch's prediction abnormal in the current dimension and delete it; average the predictions of the remaining branches as the output of the current dimension; perform these operations in each of the 6 dimensions and finally output the 6-dimensional prediction;
(3) point group density method: the 6 dimensions are considered separately; for each dimension, calculate the point group density between each branch and the other 3 branches, where density decreases as distance increases and is represented by the reciprocal of the Euclidean distance; set a density threshold, compute the difference between each branch's density and the average density, and when the difference for some branch falls below the threshold, judge its prediction abnormal in the current dimension and delete it; average the remaining branches' predictions as the output of the current dimension; perform this in each of the 6 dimensions and finally output the 6-dimensional prediction;
(4) joint Euclidean distance method: the 4 branches each output a 6-dimensional prediction, and the presence of an abnormality is correlated across the 6 dimensions, so their mutual influence is considered jointly, with Euclidean distance as the criterion; first, for each dimension, calculate the average Euclidean distance from each branch to the other 3 branches; second, for each branch, take a weighted average of its 6 per-dimension distances to obtain a confidence for the branch; third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e. the largest weighted average Euclidean distance from the second step, and exclude it; fourth, apply the weighted average of algorithm (1) to the remaining 3 branches and output the 6-dimensional prediction;
(5) joint point group density method: the 4 branches each output a 6-dimensional prediction, and the presence of an abnormality is correlated across the 6 dimensions, so their mutual influence is considered jointly, with point group density as the criterion; first, for each dimension, calculate the point group density between each branch and the other 3 branches; second, for each branch, take a weighted average of its 6 per-dimension densities to obtain a confidence for the branch; third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e. the smallest weighted point group density from the second step, and exclude it; fourth, apply the weighted average of algorithm (1) to the remaining 3 branches and output the 6-dimensional prediction;
fourthly, training the deep neural network model: complete the training of the deep neural network using the training-set samples;
fifthly, testing the deep neural network model with test sets of different occlusion ratios.
Using a convolutional neural network and 5 outlier detection algorithms, the invention designs a strongly interference-resistant object pose estimation method based on a deep neural network. The method takes photos of the unoccluded marker as the training set for training the deep convolutional neural network model, and photos of partially occluded markers as the test set for testing the model's interference resistance. Compared with the prior art, the method has stronger interference resistance, and the accuracy of object pose estimation under occlusion is greatly improved.
Drawings
FIG. 1 is a flow chart of strong anti-interference object pose estimation based on a deep neural network
FIG. 2 Euclidean distance algorithm flow chart
FIG. 3 is a flow chart of a point group density algorithm
FIG. 4 is a flow chart of a joint Euclidean distance algorithm
FIG. 5 flow chart of the joint point group density algorithm
FIG. 6 comparison of effects under different test sets
FIG. 7 is a comparison graph of attitude estimation effects of 5 algorithms under different test sets
FIG. 8 is a comparison graph of position estimation effects of 5 algorithms under different test sets
Detailed Description
The invention adopts a convolutional neural network framework to complete feature extraction and the learning of the mapping between vision and pose, and constructs 5 different mathematical algorithms on the network's predicted outputs to improve prediction capability under occlusion interference. The scheme greatly simplifies object pose estimation, omits image-processing steps such as feature extraction and feature matching, and realizes end-to-end estimation. Compared with the prior art, processing the network's predicted values with the output algorithms further improves pose estimation accuracy and occlusion resistance, making object pose estimation more convenient, fast, accurate and efficient.
In order to make the technical scheme of the invention clearer, it is further explained below with reference to the drawings. The invention is realized by the following steps:
firstly, a training set picture database and a testing set picture database with labels are automatically constructed by using 3D modeling software.
(1) Construct a regular cylindrical object with radius 100 mm and height 200 mm as the target to be estimated, with a checkerboard icon as the marker.
(2) Place the object to be estimated 0.5 m in front of the target camera, with the marker centered in the camera view and the center of the cylinder, the marker and the camera lens on the same horizontal center line, taken as the reference position.
(3) Move and rotate the object according to a script to change its spatial position and attitude, capture the marker photo in each pose, and take the corresponding six-dimensional coordinates as the label of the training sample.
(4) Obtain 50000 photos in batches as training-set samples and process their labels into the data format required for network input.
(5) Construct occlusion test sets in the same manner, except that the markers are occluded at rates of 0%, 4%, 9%, 12%, 16%, 20% and 25%.
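As an illustration of step (5), the following sketch masks a given fraction of a marker image with a square patch. The patch shape, its top-left placement and the fill value are assumptions: the patent only specifies the occlusion percentages, not how the occlusions are rendered.

```python
import numpy as np

def occlude(image, ratio, value=0):
    """Cover approximately `ratio` of the image area with a square patch.

    A sketch of how the 0%-25% occluded test sets might be generated;
    the square patch anchored at the top-left corner is an assumption."""
    h, w = image.shape[:2]
    side = int(round(np.sqrt(ratio * h * w)))  # square side covering ~ratio of the area
    out = image.copy()
    out[:side, :side] = value                  # paint the occluding patch
    return out

img = np.full((100, 100), 255, dtype=np.uint8)  # stand-in for a marker photo
occ = occlude(img, 0.09)                        # 9% occlusion, as in one test set
```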
And secondly, constructing the deep neural network. The network comprises four sub-branch networks, each an independent convolutional neural network consisting of 6 convolution layers, 4 max-pooling layers, 1 flattening layer and 3 densely connected layers. Every one or two convolution layers are followed by a max-pooling layer; the numbers of convolution kernels are 32, 32, 64, 128, 256 in order, and the densely connected layer sizes are 2048 and 6 in order.
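The layer arrangement of one sub-branch can be traced as a shape calculation. This is a sketch under assumptions not stated in the text: a 128×128×3 input, 'same'-padded convolutions that preserve height and width, and a repeated 64-kernel layer (the patent lists five kernel counts for six convolution layers).

```python
# Shape trace of one sub-branch CNN; input size, padding behaviour and the
# duplicated 64-kernel layer are assumptions, not taken from the patent.
def branch_shapes(h=128, w=128, c_in=3):
    convs = [32, 32, 64, 64, 128, 256]  # assumed kernel counts for the 6 conv layers
    pool_after = {1, 3, 4, 5}           # max-pool after "every two layers or one layer"
    shapes = []
    c = c_in
    for i, k in enumerate(convs):
        c = k                           # 'same'-padded conv keeps H and W (assumption)
        shapes.append(("conv", h, w, c))
        if i in pool_after:
            h, w = h // 2, w // 2       # 2x2 max pooling halves H and W
            shapes.append(("pool", h, w, c))
    flat = h * w * c                    # flattening layer output size
    return shapes, [flat, 2048, 6]      # dense sizes 2048 and 6 follow the text

shapes, dense = branch_shapes()
```

With these assumptions the flattened feature vector has 8 × 8 × 256 = 16384 elements feeding the 2048-unit dense layer.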
And thirdly, constructing the network prediction output processing algorithm. Each of the four sub-branch networks outputs a 6-dimensional estimate at its last densely connected layer, representing the pose of the object to be estimated. Because occlusion may make the result of some branch erroneous, outliers may appear among the 4 outputs; 5 algorithms are constructed to suppress them and improve interference resistance.
Let N = 4 be the number of branches, n = 1, 2, 3, 4 index a specific branch, i = 1, 2, ..., 6 index the 6 degrees of freedom, and let p_n(i) denote branch n's predicted value of the i-th degree of freedom. The weighted average, Euclidean distance and point group density methods analyze each dimension's prediction independently; the joint Euclidean distance and joint point group density methods analyze the 6-dimensional prediction jointly.
Weighted average method: the 4 branches each output a 6-dimensional prediction of the target, so the network output has dimension 4 × 6; the predictions are combined dimension by dimension as a weighted average, which is taken as the final 6-dimensional output. When the region corresponding to some branch is occluded, that branch's prediction has a large error; averaging with the accurately predicting branches weakens and reduces this error.
Equation (1) gives the plain average of the i-th degree of freedom over the branches, and equation (2) the weighted average, where s_n(i) is the weighting factor of branch n in dimension i.

$$P(i)=\frac{1}{N}\sum_{n=1}^{N}p_n(i) \quad (1)$$

$$P(i)=\sum_{n=1}^{N}s_n(i)\,p_n(i), \qquad \sum_{n=1}^{N}s_n(i)=1 \quad (2)$$
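The weighted average method can be sketched as follows; the branch predictions are illustrative values (branch 4 plays the occluded, erroneous branch), not data from the patent.

```python
import numpy as np

# 4 branches x 6 pose dimensions; the last branch is assumed occlusion-disturbed
preds = np.array([
    [1.0, 2.0, 3.0, 0.1, 0.2, 0.3],
    [1.1, 2.1, 3.1, 0.1, 0.2, 0.3],
    [0.9, 1.9, 2.9, 0.1, 0.2, 0.3],
    [5.0, 6.0, 7.0, 0.9, 0.8, 0.9],
])

def weighted_average(preds, s=None):
    """Eqs. (1)-(2): per-dimension weighted mean over the N branch outputs;
    equal weights reduce (2) to the plain average (1)."""
    n = preds.shape[0]
    s = np.full(n, 1.0 / n) if s is None else np.asarray(s, float) / np.sum(s)
    return s @ preds  # fused 6-dimensional prediction

fused = weighted_average(preds)
```

With equal weights the outlier branch is diluted rather than removed, which motivates the outlier-rejection variants below.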
Euclidean distance method: the 6 dimensions are considered separately. For each dimension: calculate the average Euclidean distance from each branch to the other 3 branches in that dimension; set an average-distance threshold; compute the difference between each branch's distance and the average distance, and when the difference exceeds the threshold, judge the branch's prediction abnormal in the current dimension and delete it; average the remaining branches' predictions as the output of the current dimension. These operations are performed in each of the 6 dimensions, and the 6-dimensional prediction is output. The algorithm flow is shown in fig. 2. Equation (3) calculates the average distance of the n-th branch from the other N−1 branches in dimension i, and equation (4) the average distance over the 4 branches in dimension i. The distance threshold is set to 0.2; when the difference between a branch's distance dis(n, i) and the average distance dis(i) exceeds the threshold, its estimate is judged an outlier and excluded.
$$dis(n,i)=\frac{1}{N-1}\sum_{\substack{m=1\\m\neq n}}^{N}\bigl|p_n(i)-p_m(i)\bigr| \quad (3)$$

$$\overline{dis}(i)=\frac{1}{N}\sum_{n=1}^{N}dis(n,i) \quad (4)$$
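The per-dimension Euclidean distance screening can be sketched as below; the sample predictions and the way the surviving branches are averaged are illustrative (the 0.2 threshold follows the description).

```python
import numpy as np

preds = np.array([  # 4 branches x 6 dimensions; last branch assumed disturbed
    [1.0, 2.0, 3.0, 0.1, 0.2, 0.3],
    [1.1, 2.1, 3.1, 0.1, 0.2, 0.3],
    [0.9, 1.9, 2.9, 0.1, 0.2, 0.3],
    [5.0, 6.0, 7.0, 0.9, 0.8, 0.9],
])

def euclidean_filter(preds, threshold=0.2):
    """Eqs. (3)-(4): per dimension, drop any branch whose average distance to
    the other branches exceeds the overall average by more than the threshold,
    then average the surviving branches."""
    n_branch, n_dim = preds.shape
    out = np.empty(n_dim)
    for i in range(n_dim):
        col = preds[:, i]
        dis = np.array([np.abs(col[n] - np.delete(col, n)).mean()
                        for n in range(n_branch)])        # eq. (3) per branch
        keep = (dis - dis.mean()) <= threshold            # eq. (4) comparison
        out[i] = col[keep].mean()
    return out

fused = euclidean_filter(preds)
```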
Point group density method: the 6 dimensions are considered separately. For each dimension: calculate the point group density between each branch and the other 3 branches in that dimension; the density is related to distance (the larger the distance, the smaller the density) and is represented by the reciprocal of the Euclidean distance. Set a density threshold, compute the difference between each branch's density and the average density, and when the difference for some branch falls below the threshold, judge its prediction abnormal in the current dimension and delete it; average the remaining branches' predictions as the output of the current dimension. These operations are performed in each of the 6 dimensions, and the 6-dimensional prediction is output.
The algorithm flow is shown in fig. 3. Geometrically, an outlier has a small point group density, so outliers are excluded by comparing densities in each dimension. Equation (5) calculates the point group density of the n-th branch, and equation (6) finds the branch with the minimum density.
$$den(n,i)=\frac{1}{dis(n,i)} \quad (5)$$

$$n^{*}(i)=\arg\min_{n}\,den(n,i) \quad (6)$$
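A sketch of the point group density screening follows; the sample predictions and the threshold value of −0.3 are assumptions, since the patent does not give a numeric density threshold.

```python
import numpy as np

preds = np.array([  # 4 branches x 6 dimensions; last branch assumed disturbed
    [1.0, 2.0, 3.0, 0.1, 0.2, 0.3],
    [1.1, 2.1, 3.1, 0.1, 0.2, 0.3],
    [0.9, 1.9, 2.9, 0.1, 0.2, 0.3],
    [5.0, 6.0, 7.0, 0.9, 0.8, 0.9],
])

def density_filter(preds, threshold=-0.3):
    """Eqs. (5)-(6): density is the reciprocal of the average distance; per
    dimension, drop any branch whose density falls below the mean density by
    more than |threshold|, then average the surviving branches."""
    n_branch, n_dim = preds.shape
    out = np.empty(n_dim)
    for i in range(n_dim):
        col = preds[:, i]
        dis = np.array([np.abs(col[n] - np.delete(col, n)).mean()
                        for n in range(n_branch)])
        den = 1.0 / np.maximum(dis, 1e-12)   # eq. (5): density falls as distance grows
        keep = (den - den.mean()) >= threshold
        out[i] = col[keep].mean()
    return out

fused = density_filter(preds)
```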
Since the 6 degrees of freedom of an object are correlated, their mutual influence is considered jointly; the distance between the j-th and k-th branches over all 6 dimensions can be expressed as B(j, k) in formula (7).
$$B(j,k)=\sqrt{\sum_{i=1}^{6}\bigl(p_j(i)-p_k(i)\bigr)^{2}} \quad (7)$$
Joint Euclidean distance method: the 4 branches each output a 6-dimensional prediction, and the presence of an abnormality is correlated across the 6 dimensions, so their mutual influence is considered jointly, with Euclidean distance as the criterion. First, for each dimension: calculate the average Euclidean distance from each branch to the other 3 branches. Second, for each branch: take a weighted average of its 6 per-dimension distances to obtain a confidence for the branch. Third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e. the largest weighted average Euclidean distance from the second step, and exclude it. Fourth, apply the weighted average of algorithm (1) to the remaining 3 branches and output the 6-dimensional prediction. The algorithm flow chart is shown in fig. 4. Following the Euclidean distance algorithm, the 6-dimensional joint Euclidean distance is used as the criterion to determine the outlier. Equation (8) gives the distance of the n-th branch in dimension i. Equation (9) weights the dimensions to obtain the joint Euclidean distance of the n-th branch. Equation (10) finds the outlier branch with the largest joint Euclidean distance.
$$dis(n,i)=\frac{1}{N-1}\sum_{\substack{m=1\\m\neq n}}^{N}\bigl|p_n(i)-p_m(i)\bigr| \quad (8)$$

$$Dis(n)=\sum_{i=1}^{6}w(i)\,dis(n,i) \quad (9)$$

$$n^{*}=\arg\max_{n}\,Dis(n) \quad (10)$$
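The four steps of the joint Euclidean distance method can be sketched as one function; the sample predictions and the equal dimension weights w(i) are assumptions.

```python
import numpy as np

preds = np.array([  # 4 branches x 6 dimensions; last branch assumed disturbed
    [1.0, 2.0, 3.0, 0.1, 0.2, 0.3],
    [1.1, 2.1, 3.1, 0.1, 0.2, 0.3],
    [0.9, 1.9, 2.9, 0.1, 0.2, 0.3],
    [5.0, 6.0, 7.0, 0.9, 0.8, 0.9],
])

def joint_euclidean(preds, w=None):
    """Eqs. (8)-(10): combine each branch's per-dimension average distances
    with weights w(i) into one joint distance; exclude the branch with the
    largest joint distance (lowest confidence) and average the rest."""
    n_branch, n_dim = preds.shape
    w = np.full(n_dim, 1.0 / n_dim) if w is None else np.asarray(w, float)
    dis = np.array([[np.abs(preds[n, i] - np.delete(preds[:, i], n)).mean()
                     for i in range(n_dim)] for n in range(n_branch)])  # eq. (8)
    worst = np.argmax(dis @ w)                                          # eqs. (9)-(10)
    return np.delete(preds, worst, axis=0).mean(axis=0)

fused = joint_euclidean(preds)
```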
Joint point group density method: the 4 branches each output a 6-dimensional prediction, and the presence of an abnormality is correlated across the 6 dimensions, so their mutual influence is considered jointly, with point group density as the criterion. First, for each dimension: calculate the point group density between each branch and the other 3 branches. Second, for each branch: take a weighted average of its 6 per-dimension densities to obtain a confidence for the branch. Third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e. the smallest weighted point group density from the second step, and exclude it. Fourth, apply the weighted average of algorithm (1) to the remaining 3 branches and output the 6-dimensional prediction. The algorithm flow chart is shown in fig. 5. The mutual influence among the 6 dimensions is considered jointly, with the joint point group density as the criterion for judging outliers. Equation (11) calculates the point group density of the n-th branch in dimension i. Equation (12) calculates the 6-dimensional weighted joint point group density of the n-th branch. Equation (13) finds the outlier branch with the minimum joint point group density.
$$den(n,i)=\frac{1}{dis(n,i)} \quad (11)$$

$$Den(n)=\sum_{i=1}^{6}w(i)\,den(n,i) \quad (12)$$

$$n^{*}=\arg\min_{n}\,Den(n) \quad (13)$$
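The joint point group density method differs only in using reciprocal distances; a sketch with the same assumed predictions and equal weights follows.

```python
import numpy as np

preds = np.array([  # 4 branches x 6 dimensions; last branch assumed disturbed
    [1.0, 2.0, 3.0, 0.1, 0.2, 0.3],
    [1.1, 2.1, 3.1, 0.1, 0.2, 0.3],
    [0.9, 1.9, 2.9, 0.1, 0.2, 0.3],
    [5.0, 6.0, 7.0, 0.9, 0.8, 0.9],
])

def joint_density(preds, w=None):
    """Eqs. (11)-(13): per-dimension densities (reciprocal average distances)
    are weighted into one joint density per branch; the branch with the
    smallest joint density is excluded and the rest averaged."""
    n_branch, n_dim = preds.shape
    w = np.full(n_dim, 1.0 / n_dim) if w is None else np.asarray(w, float)
    den = np.empty((n_branch, n_dim))
    for i in range(n_dim):
        col = preds[:, i]
        d = np.array([np.abs(col[n] - np.delete(col, n)).mean()
                      for n in range(n_branch)])
        den[:, i] = 1.0 / np.maximum(d, 1e-12)  # eq. (11)
    worst = np.argmin(den @ w)                  # eqs. (12)-(13)
    return np.delete(preds, worst, axis=0).mean(axis=0)

fused = joint_density(preds)
```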
And fourthly, training the deep neural network model using the samples in the training set. The specific training parameters are as follows:
(1) In each round, 3000 pictures are randomly selected from the training set as that round's training samples;
(2) mini_batch = 2: 2 pictures are input to the network at a time, and backpropagation aims to reduce the loss on those two pictures;
(3) nb_epoch = 6: the selected samples are trained for 6 epochs before proceeding to the next round;
(4) After each round, the currently trained network model is saved as an h5 file; the next round continues training from the current model, i.e. another 3000 training pictures are drawn;
(5) Stochastic gradient descent with backpropagation is used to reduce the loss.
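The training schedule above can be sketched with a toy stand-in model. The linear model, data sizes, learning rate and round length are illustrative assumptions, not values from the patent; only the pattern (random subset per round, mini-batches of 2, SGD with backpropagation, a checkpoint per round) follows the description.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))        # 300 samples stand in for the 50000 photos
y = X @ np.arange(1.0, 7.0)          # toy 6-D -> scalar regression target

w = np.zeros(6)
nb_round, per_round, mini_batch, lr = 6, 100, 2, 0.01
initial_loss = float(np.mean((X @ w - y) ** 2))
for rnd in range(nb_round):
    idx = rng.choice(len(X), size=per_round, replace=False)  # random subset per round
    for start in range(0, per_round, mini_batch):
        b = idx[start:start + mini_batch]
        err = X[b] @ w - y[b]
        w -= lr * X[b].T @ err / mini_batch  # SGD step on the mini-batch loss
    # here the patent's pipeline would checkpoint the model to an .h5 file

final_loss = float(np.mean((X @ w - y) ** 2))
```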
The fourth step: the deep neural network model was tested using different occlusion ratio test sets, and the network prediction output values were processed using 5 algorithms. FIG. 6 compares the pose estimation effects of a single network SBN without a network predicted value processing algorithm and a 4-branch network MBN-4 using a combined Euclidean distance algorithm under different occlusion proportions, and it can be seen that when the occlusion proportion of the MBN-4 increases, higher estimation accuracy can still be ensured. Fig. 7 compares the average effect of 5 algorithms on object pose across all test sets. Figure 8 compares the average effect of 5 algorithms on object position across all test sets.

Claims (1)

1. An anti-occlusion object pose estimation method based on a deep neural network comprises the following steps:
firstly, automatically constructing a labelled training set picture database and a labelled testing set picture database by using 3D modeling software:
(1) constructing a cylindrical regular object as the target to be estimated, and using a checkerboard pattern as the marker;
(2) placing the target to be estimated in front of the target camera, with the marker located in the middle of the camera's field of view; the center of the target to be estimated, the marker and the center of the camera lens lie on the same horizontal center line, which serves as the reference position;
(3) moving and rotating the object to be estimated according to a script to change its spatial position and attitude, capturing photos of the marker in the corresponding poses, and taking the corresponding six-dimensional coordinates as the labels of the training samples;
(4) obtaining a large number of photos in batches as training set samples, and converting their labels into the required data format to meet the network input requirements;
(5) constructing occlusion test sets in the same manner, except that the marker is occluded at rates of 0%, 4%, 9%, 12%, 16%, 20% and 25%;
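The occlusion rates listed above could be realised, for instance, by masking a region of the marker image whose area equals the given fraction. The patent does not specify the occluder's shape or position, so the centred-square choice and image format below are assumptions:

```python
import math

def occlude(img, ratio):
    """Zero out a centred square covering `ratio` of the image area.

    `img` is a 2-D list of grayscale pixel values (illustrative format);
    the original image is left unmodified.
    """
    h, w = len(img), len(img[0])
    side = int(math.sqrt(ratio * h * w))      # square side giving the target area
    top, left = (h - side) // 2, (w - side) // 2
    out = [row[:] for row in img]             # copy so the input stays intact
    for r in range(top, top + side):
        for c in range(left, left + side):
            out[r][c] = 0
    return out

# one occluded variant per rate listed in the claim
OCCLUSION_RATES = [0.0, 0.04, 0.09, 0.12, 0.16, 0.20, 0.25]
```

A rate of 0% leaves the marker untouched, giving the unoccluded baseline test set.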
secondly, constructing a deep neural network: the network comprises four sub-branch networks, each of which is an independent convolutional neural network structure consisting of 6 convolutional layers, 4 maximum pooling layers, 1 flattening layer and 3 dense connection layers; each maximum pooling layer follows one or two convolutional layers, the numbers of convolution kernels are 32, 32, 64, 128, 256 in order, and the parameters of the dense connection layers are 2048, 6 in order.
Thirdly, constructing a network prediction output processing algorithm: each of the four sub-branch networks outputs a 6-dimensional predicted value at its last dense connection layer, representing the pose information of the object to be estimated; because of occlusion, the result of some branch may contain an error, so abnormal values exist among the outputs of the 4 branches; 5 algorithms are constructed to handle such abnormal values and improve resistance to occlusion interference;
(1) Weighted average method: the 4 branches each output a 6-dimensional predicted value of the target to be estimated; for each dimension, the corresponding values are added and a weighted average is computed, and the resulting 6-dimensional value is taken as the final output;
(2) Euclidean distance method: the estimates of the 6 dimensions are considered separately. For each dimension: calculate the average Euclidean distance between each branch and the other 3 branches in that dimension; set an average Euclidean distance threshold; compute the difference between each branch's distance and the average Euclidean distance, and when the difference exceeds the threshold, judge the branch's predicted value in the current dimension to be abnormal and delete the branch; average the predicted values of the remaining branches as the predicted output of the current dimension. This operation is performed for each of the 6 dimensions, finally outputting a 6-dimensional predicted value;
(3) Point group density method: the estimates of the 6 dimensions are considered separately. For each dimension: calculate the point group density between each branch and the other 3 branches in that dimension; the density is inversely related to distance (the larger the distance, the smaller the density) and is represented by the reciprocal of the Euclidean distance; set a point group density threshold, compute the difference between each branch's density and the average point group density, and when the difference on a branch falls below the preset threshold, judge that branch's predicted value in the current dimension to be abnormal and delete the branch; average the predicted values of the remaining branches as the predicted output of the current dimension. This operation is performed for each of the 6 dimensions, finally outputting a 6-dimensional predicted value;
(4) Joint Euclidean distance method: the 4 branches each output a 6-dimensional predicted value, and the presence or absence of abnormality is correlated across the 6 dimensions, so their mutual influence is considered jointly, with the Euclidean distance as the judgment criterion. First, for each dimension: calculate the average Euclidean distance between each branch and the other 3 branches in that dimension. Second, for each branch: take the weighted average of its 6 per-dimension average Euclidean distances to obtain the branch's confidence. Third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e. the branch with the largest weighted-average Euclidean distance from the second step, and exclude it. Fourth, apply the weighted average method of algorithm (1) to the remaining 3 branches and output the 6-dimensional predicted value;
(5) Joint point group density method: the 4 branches each output a 6-dimensional predicted value, and the presence or absence of abnormality is correlated across the 6 dimensions, so their mutual influence is considered jointly, with the point group density as the judgment criterion. First, for each dimension: calculate the point group density between each branch and the other 3 branches in that dimension. Second, for each branch: take the weighted average of its 6 per-dimension point group density values to obtain the branch's confidence. Third, rank the confidences of the 4 branches, find the branch with the lowest confidence, i.e. the branch with the smallest weighted-average point group density from the second step, and exclude it. Fourth, apply the weighted average method of algorithm (1) to the remaining 3 branches and output the 6-dimensional predicted value;
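The five processing algorithms above can be sketched numerically over the 4 branch outputs. The following is a minimal pure-Python sketch, assuming equal weights wherever the claim says "weighted average" and illustrative threshold values (`tau`); neither the weights nor the thresholds are specified in the patent:

```python
EPS = 1e-6  # guards against zero distance when taking reciprocal densities

def fuse_mean(preds):
    """(1) Weighted average (equal weights assumed): per-dimension mean."""
    n, dims = len(preds), len(preds[0])
    return [sum(p[d] for p in preds) / n for d in range(dims)]

def fuse_euclid(preds, tau=0.5):
    """(2) Euclidean distance method: per dimension, drop branches whose
    average distance to the other branches exceeds the overall mean by tau."""
    n, dims = len(preds), len(preds[0])
    out = []
    for d in range(dims):
        xs = [p[d] for p in preds]
        avg = [sum(abs(xs[i] - xs[j]) for j in range(n) if j != i) / (n - 1)
               for i in range(n)]
        mean_avg = sum(avg) / n
        kept = [xs[i] for i in range(n) if avg[i] - mean_avg <= tau]
        out.append(sum(kept) / len(kept))
    return out

def fuse_density(preds, tau=0.5):
    """(3) Point group density method: density is the mean reciprocal distance
    to the other branches; drop branches whose density falls below the
    overall mean by more than tau."""
    n, dims = len(preds), len(preds[0])
    out = []
    for d in range(dims):
        xs = [p[d] for p in preds]
        dens = [sum(1.0 / (abs(xs[i] - xs[j]) + EPS)
                    for j in range(n) if j != i) / (n - 1)
                for i in range(n)]
        mean_dens = sum(dens) / n
        kept = [xs[i] for i in range(n) if mean_dens - dens[i] <= tau]
        out.append(sum(kept) / len(kept))
    return out

def fuse_joint_euclid(preds):
    """(4) Joint Euclidean distance method: one confidence per branch
    (mean of its per-dimension average distances); exclude the worst branch."""
    n, dims = len(preds), len(preds[0])
    conf = []
    for i in range(n):
        dists = [sum(abs(preds[i][d] - preds[j][d])
                     for j in range(n) if j != i) / (n - 1)
                 for d in range(dims)]
        conf.append(sum(dists) / dims)   # larger distance = lower confidence
    worst = conf.index(max(conf))
    return fuse_mean([preds[i] for i in range(n) if i != worst])

def fuse_joint_density(preds):
    """(5) Joint point group density method: exclude the branch whose
    mean per-dimension density is lowest."""
    n, dims = len(preds), len(preds[0])
    conf = []
    for i in range(n):
        dens = [sum(1.0 / (abs(preds[i][d] - preds[j][d]) + EPS)
                    for j in range(n) if j != i) / (n - 1)
                for d in range(dims)]
        conf.append(sum(dens) / dims)    # smaller density = lower confidence
    worst = conf.index(min(dens for dens in conf))
    return fuse_mean([preds[i] for i in range(n) if i != worst])
```

With three agreeing branches and one occluded outlier branch, methods (2) to (5) all recover the consensus value, while the plain average (1) is pulled toward the outlier; this is the anti-occlusion effect the claim describes.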
fourthly, training the deep neural network model: the training of the deep neural network is completed using the samples in the training set;
fifthly, testing the deep neural network model using test sets with different occlusion ratios.
CN202011562092.4A 2020-12-25 2020-12-25 Anti-occlusion object pose estimation method based on deep neural network Pending CN112634367A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011562092.4A CN112634367A (en) 2020-12-25 2020-12-25 Anti-occlusion object pose estimation method based on deep neural network

Publications (1)

Publication Number Publication Date
CN112634367A true CN112634367A (en) 2021-04-09

Family

ID=75324868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011562092.4A Pending CN112634367A (en) 2020-12-25 2020-12-25 Anti-occlusion object pose estimation method based on deep neural network

Country Status (1)

Country Link
CN (1) CN112634367A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627359A (en) * 2020-12-08 2022-06-14 山东新松工业软件研究院股份有限公司 Out-of-order stacked workpiece grabbing priority evaluation method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491880A (en) * 2018-03-23 2018-09-04 西安电子科技大学 Object classification based on neural network and position and orientation estimation method
US20190147234A1 (en) * 2017-11-15 2019-05-16 Qualcomm Technologies, Inc. Learning disentangled invariant representations for one shot instance recognition
CN109816725A (en) * 2019-01-17 2019-05-28 哈工大机器人(合肥)国际创新研究院 A kind of monocular camera object pose estimation method and device based on deep learning
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN110322510A (en) * 2019-06-27 2019-10-11 电子科技大学 A kind of 6D position and orientation estimation method using profile information
CN111339903A (en) * 2020-02-21 2020-06-26 河北工业大学 Multi-person human body posture estimation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIACHEN YANG ET AL.: "Robust Six Degrees of Freedom Estimation for IIoT Based on Multibranch Network", IEEE, 23 March 2020 (2020-03-23) *
LEI, YUTIAN; YANG, JIACHEN; MAN, JIABAO; XI, MENG: "Adaptive spacecraft situation analysis ***", Astronautical Systems Engineering Technology, no. 01, 15 January 2020 (2020-01-15) *

Similar Documents

Publication Publication Date Title
CN108171748B (en) Visual identification and positioning method for intelligent robot grabbing application
CN108280856B (en) Unknown object grabbing pose estimation method based on mixed information input network model
CN108010078B (en) Object grabbing detection method based on three-level convolutional neural network
CN109685141B (en) Robot article sorting visual detection method based on deep neural network
CN106886216B (en) Robot automatic tracking method and system based on RGBD face detection
CN109559320A (en) Realize that vision SLAM semanteme builds the method and system of figure function based on empty convolution deep neural network
CN109118473B (en) Angular point detection method based on neural network, storage medium and image processing system
CN110969660B (en) Robot feeding system based on three-dimensional vision and point cloud deep learning
CN111553949B (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN110509273B (en) Robot manipulator detection and grabbing method based on visual deep learning features
CN113284179B (en) Robot multi-object sorting method based on deep learning
CN113888461A (en) Method, system and equipment for detecting defects of hardware parts based on deep learning
CN110006444B (en) Anti-interference visual odometer construction method based on optimized Gaussian mixture model
CN111598172B (en) Dynamic target grabbing gesture rapid detection method based on heterogeneous depth network fusion
CN111259837B (en) Pedestrian re-identification method and system based on part attention
CN111414875B (en) Three-dimensional point cloud head posture estimation system based on depth regression forest
KR101460313B1 (en) Apparatus and method for robot localization using visual feature and geometric constraints
CN113657551B (en) Robot grabbing gesture task planning method for sorting and stacking multiple targets
CN113752255A (en) Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning
CN114387513A (en) Robot grabbing method and device, electronic equipment and storage medium
CN110992378A (en) Dynamic update visual tracking aerial photography method and system based on rotor flying robot
Wei et al. Novel green-fruit detection algorithm based on D2D framework
CN112669452B (en) Object positioning method based on convolutional neural network multi-branch structure
CN112634367A (en) Anti-occlusion object pose estimation method based on deep neural network
CN114998573B (en) Grabbing pose detection method based on RGB-D feature depth fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination