CN114029943A

CN114029943A - Target grabbing and positioning method and system based on image data processing

Info

Publication number: CN114029943A
Application number: CN202111168660.7A
Authority: CN
Inventors: 毕登科; 朱亮
Original assignee: Nanjing Fulian Micro Network Technology Co ltd
Current assignee: Nanjing Fulian Micro Network Technology Co ltd
Priority date: 2021-09-30
Filing date: 2021-09-30
Publication date: 2022-02-11

Abstract

The invention provides a target grabbing and positioning method and system based on image data processing, wherein the method specifically comprises the following steps: step 1, collecting image data of a target object on a workbench through a camera; step 2, preprocessing image data of the target object image data; step 3, constructing an identification model for identifying the target object, and receiving the preprocessed image data of the target object; step 4, the recognition model analyzes according to the received image data and outputs an analysis result; step 5, generating a grabbing instruction according to an analysis result output by the recognition model; and 6, triggering a mechanical structure by the grabbing robot according to the generated grabbing instruction to operate, grabbing the target object and carrying the target object to an object carrying and transporting table for subsequent cargo loading. According to the method, the preprocessing mode of the acquired image data is optimized, so that the operation efficiency of image feature extraction is effectively improved, the possibility of misjudgment is reduced, and the identification efficiency of the object is further improved.

Description

Target grabbing and positioning method and system based on image data processing

Technical Field

The invention relates to a target grabbing and positioning method and system based on image data processing, in particular to the technical field of visual image data processing.

Background

With the vigorous advance of science and technology, an intelligent industrial chain gradually replaces an artificial operation industrial chain, and an intelligent industrial robot gradually permeates into various industries of mass life along with the development of artificial intelligence technology. In the process of grabbing industrial operation, because the traditional grabbing robot can only repeat simple structured actions, the grabbing efficiency is not high, or the problem that the surface of an object to be grabbed is damaged due to grabbing errors often occurs.

In the prior art, a mode of combining visual image processing is often adopted to further analyze a target object, so that the positioning accuracy of the target object is improved, but the traditional image data processing algorithm often causes the problems of insufficient recognition accuracy, low robustness and the like due to the reasons that the edge extraction is not accurate, the image data processing accuracy is not enough and the like.

Disclosure of Invention

The purpose of the invention is as follows: a target grabbing and positioning method and system based on image data processing are provided to solve the problems in the prior art.

The technical scheme is as follows: in a first aspect, a target grabbing and positioning method based on image data processing is provided, which is characterized by specifically including the following steps:

step 1, collecting image data of a target object on a workbench through a camera;

step 2, preprocessing image data of the target object image data;

step 3, constructing an identification model for identifying the target object, and receiving the preprocessed image data of the target object;

step 4, the recognition model analyzes according to the received image data and outputs an analysis result;

step 5, generating a grabbing instruction according to an analysis result output by the recognition model;

and 6, triggering a mechanical structure by the grabbing robot according to the generated grabbing instruction to operate, grabbing the target object and carrying the target object to an object carrying and transporting table for subsequent cargo loading.

In some realizations of the first aspect, in the step 2, an edge detection method is adopted to achieve the purpose of extracting useful results from different visual objects, so as to perform preprocessing on the image data; further, the pretreatment process comprises the following steps:

2.1, constructing a Gaussian filter, and performing convolution on the image by using the Gaussian filter to smooth the image and eliminate image noise;

step 2.2, calculating first-order derivative values of all pixel points in the image data in the horizontal direction and the vertical direction by adopting an edge detection operator, so as to obtain the gradient and the direction of the pixel points to drink;

step 2.3, eliminating stray response by a preset non-maximum value inhibition method;

2.4, using double threshold detection to define real and potential edges;

and 2.5, realizing the detection of the final edge by inhibiting an isolated weak edge mode.

In some realizations of the first aspect, the gaussian filter constructed in step 2.1 is further a gaussian filter kernel with a size of (2k +1) × (2k +1), and the corresponding generation expression is:

wherein k represents the dimension of the kernel matrix; σ represents the standard deviation;

the expression of calculating the first derivative values of all pixel points in the image data in the horizontal direction and the vertical direction by adopting an edge detection operator is as follows:

in the formula, G represents the gradient of a pixel point; theta represents the direction of the pixel point; g_xRepresenting the first derivative of all pixel points in the horizontal direction; g_yThe first derivative of all the pixels in the vertical direction is represented.

When the edge detection operator is adopted to calculate the first derivative values of all the pixel points in the image data in the horizontal direction and the vertical direction in the step 2.2, the operator involved in the calculation process is optimized by combining the characteristic of capturing the target image data.

Further, in the double-threshold processing process in step 2.4, the phenomenon that edge fracture and discontinuity are caused by overhigh threshold parameters is used, a method of maximum inter-class variance is used for replacing a setting method of manually preset parameters, and the definition of the generated edge image is improved by a self-adaptive obtaining mode meeting the double thresholds.

The maximum between-class variance is further: adaptively determining a threshold according to gray level histogram information of an image, dividing pixels of acquired target object image data into two categories of C1 smaller than the threshold omega and C2 larger than the threshold omega through the existing threshold omega, presetting L gray levels in the image data, and when the gray level of the image data is k, respectively presetting the mean values of the two categories of pixels as M₁、M₂Global image data pixel mean value MG, probability ρ of pixel being classified into C1 category₁Probability ρ of a pixel being classified into the class C2₂The expressions satisfied by each other are:

ρ₁*M₁+ρ₂*M₂＝MG

ρ₁+ρ₂＝1

where the probability ρ of a pixel being classified into the C1 class₁The expression is further:

probability ρ of a pixel being classified into C2 class₂The expression is further:

in the formula, p_i＝n_ithe/N represents the probability distribution of the image after the ith gray level normalization; wherein n is_iThe number of pixels with the gray value of i is represented,

representing the total number of pixel points;

the between-class variance expression is:

in the formula, σ²A value representing the variance; rho₁Representing the probability, ρ, that a pixel is classified into the C1 class₂Represents the probability of the pixel being classified into the C2 category; MG represents the global image data pixel mean; m represents the accumulated mean of the gray level k. Further, 0 to 255 gray levels are searched in a traversing mode, and the gray level k meeting the condition that the inter-class variance is maximized is obtained, so that the optimal high and low threshold values are obtained through a self-adaptive method.

In some realizable manners of the first aspect, for the problem of light interference in the actual working condition, step 2 adjusts the brightness of the acquired image by an adaptive method for ambient light during image data preprocessing;

further, a pixel threshold range is preset, and when the acquired image data is lower than or higher than the preset range, the image data is automatically adjusted, so that the recognition degree of the target object in the acquired image is enhanced;

the brightness calculation expression for obtaining the pixel point is as follows:

wherein F (x, y) represents an average pixel value; r (x, y) represents a pixel value of the R channel; g (x, y) represents a pixel value of the G channel; b (x, y) represents a pixel value of the B channel.

In some implementation manners of the first aspect, in order to overcome a misjudgment phenomenon that occurs easily when a circular object is identified in an actual working condition, the number of scattered point samples selected on a circular edge is increased and the number of the scattered point samples is set as a threshold value in a parameter space according to a geometric characteristic that the distance from a unique circle center of the circular object to any point on the circular edge is equal.

In a second aspect, an object capture positioning system based on image data processing is provided, where the system specifically includes:

a first module for acquiring image data of a target object;

a second module for preprocessing image data;

a third module for constructing a recognition model of the recognition target object;

a fourth module for separating processing and outputting analysis results;

a fifth module for generating a grab instruction;

a sixth module for executing the grab instruction.

In some realizations of the second aspect, the first module obtains image data of a target object on the workbench through the information acquisition device, and transmits the image data to the second module for preprocessing the image data; then, the third module constructs a recognition model for recognizing the target object and receives the image data preprocessed in the second module; the fourth module analyzes the data received by the third module according to the data and outputs a corresponding analysis result to the fifth module; the fifth module generates a grabbing instruction for triggering grabbing operation according to the received analysis result; and the sixth module triggers the mechanical structure to perform grabbing operation according to the generated grabbing instruction, grabs the target object and carries the target object to the object carrying and transporting table for subsequent cargo loading.

In some realizations of the second aspect, the process of preprocessing the image data by the second module further includes: useful structures are extracted from different visual objects through an edge detection method, and the data volume to be processed is reduced; meanwhile, aiming at the acquired image data, the brightness of the image data is adaptively adjusted in a mode of setting a threshold value.

Furthermore, when the target object is a circular target, aiming at the condition of easy misjudgment, the number of scattered point samples selected on the circular edge is increased and the number of the scattered point samples is set as a threshold value in a parameter space according to the geometric characteristic that the distance from the unique circle center of the circular object to any point of the circular edge is equal, so that the misjudgment of the non-circular target is overcome.

Has the advantages that: the invention provides a target grabbing and positioning method and system based on image data processing, which effectively improve the operation efficiency of image feature extraction by optimizing the preprocessing mode of the acquired image data, and simultaneously adjust the light brightness of the image data to reduce the possibility of misjudgment, thereby further improving the identification efficiency of objects, increasing the effectiveness of grabbing operation and improving the overall operation speed and robustness.

Drawings

FIG. 1 is a flow chart of data processing according to the present invention.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.

In view of the problems that in the prior art, a conventional grabbing robot can only repeat simple simplified motions, so that grabbing efficiency is often low, or grabbing errors occur or even the surface of an object to be grabbed is damaged, the present embodiment provides an object grabbing positioning method based on image data processing, as shown in fig. 1, the method specifically includes the following steps:

step 2, preprocessing image data of the target object image data;

In a further embodiment, in order to facilitate effective analysis of the received image data by the recognition model, when the image data is preprocessed in step 2, an edge detection method is further adopted to achieve the purpose of extracting useful results from different visual objects, so that the data volume to be processed is reduced. Specifically, firstly, a Gaussian filter is constructed, and the Gaussian filter is utilized to carry out convolution with an image, so that the image is smoothed, and image noise is eliminated; secondly, calculating first-order derivative values of all pixel points in the image data in the horizontal direction and the vertical direction by adopting an edge detection operator, thereby obtaining the gradient and the direction of the pixel points to the drink; thirdly, eliminating the spurious response by a preset non-maximum value inhibition method; from time to time, true and potential edges are delineated using dual threshold detection; and finally, realizing the detection of the final edge by a mode of restraining the isolated weak edge.

Wherein, the constructed gaussian filter is further a gaussian filter kernel with a size of (2k +1) × (2k +1), and the corresponding generation expression is:

wherein k represents the dimension of the kernel matrix; σ denotes the standard deviation.

In a further embodiment, when the edge detection method is used for extracting useful results from different visual objects, the edge fracture and discontinuous appearance can be caused by overhigh threshold parameter in the process of delineating real and potential edges by using dual-threshold detection; in addition, too low a threshold parameter setting also results in more false edges. Therefore, the invention aims at the edge detection algorithm principle and defects in the prior art and combines the characteristic of image capture to improve the operator involved in the calculation process.

Specifically, on the aspect of setting high and low thresholds, the manual setting in the prior art is replaced by the method of maximum inter-class variance, and the definition of the generated edge image is effectively improved by meeting the self-adaptive acquisition mode of double thresholds. Wherein the maximum between-class variance is further a method for adaptively determining a threshold according to gray histogram information of an image, pixels of the acquired image data are divided into two classes of C1 smaller than the threshold and C2 larger than the threshold through an existing threshold omega,the preset image data has L gray levels, and when the gray level of the image data is k, the average values of the two types of pixels are respectively set as M₁、M₂Global image data pixel mean value MG, probability ρ of pixel being classified into C1 category₁Probability ρ of a pixel being classified into the class C2₂And satisfy the following expression therebetween:

ρ₁*M₁+ρ₂*M₂＝MG

ρ₁+ρ₂＝1

where the probability ρ of a pixel being classified into the C1 class₁The expression is as follows:

representing the total number of pixels.

C1 pixel mean M less than threshold₁The expression is as follows:

c2 pixel mean M greater than threshold₂The expression is as follows:

the cumulative mean M expression for gray level k is:

the expression for the image mean value MG for the gray level k is:

in summary, the inter-class variance expression is:

through traversal search of 0-255 gray levels, the gray level k meeting the condition of maximization of the inter-class variance is obtained, so that the optimal high-low threshold is obtained in a self-adaptive mode, and the definition of the edge image of the target object is improved.

In a further embodiment, because there is the problem because of ambient light among the operating condition to lead to the picture data that gathers not to satisfy actual demand, and then make the recognition result inaccurate, to the problem of light interference among the operating condition, this embodiment provides a method to ambient light self-adaptation and carries out the adjustment of luminance to the image of gathering. Specifically, a pixel threshold range is preset, and when the acquired image data is lower than or higher than the preset range, the image data is automatically adjusted, so that the recognition degree of the target object in the acquired image is enhanced. The brightness calculation expression of the pixel point is as follows:

wherein F (x, y) represents an average pixel value; r (x, y) represents a pixel value of the R channel; g (x, y) represents a pixel value of the G channel; b (x, y) represents a pixel value of the B channel. In order to overcome the erroneous judgment phenomenon easily occurring when the circular object is identified in the actual working condition, the number of scattered point samples selected on the circular edge is increased and the number of the scattered point samples is set as the threshold value in the parameter space according to the geometric characteristic that the distance from the center of the circle to any point of the circular edge of the circular object is equal, so that the erroneous judgment on the non-circular object is solved.

In a further embodiment, the identification model for identifying the object is constructed and subjected to data analysis after receiving the preprocessed image data of the object. Wherein, the identification model analysis process comprises depth separable convolution, channel attention and channel shuffling, and the process of depth separable convolution realization comprises depth convolution and point-by-point convolution for separating a channel domain and a space domain for processing.

In particular, deep convolution is a two-dimensional planar mode of operation for applying a single convolution kernel to each channel; the point-by-point convolution is used to adjust the number of channels by convolution. In the preferred embodiment, by comparing the calculated scalar product of the depth separable convolution and the normal convolution, it can be seen that when the convolution kernel size is preset to 3 x 3, the depth separable convolution can reduce the parameter number to around 1/9 of the normal convolution, increasing the operating speed of the network.

The channel shuffling is analyzed from the aspect of the feature maps, and in the preferred embodiment, the problem of information non-circulation between different groups is solved by combining the feature maps from different branches together by using a concat function. Compared with the simple splicing in the prior art, the channel shuffling method can complete information mixing between channels on the premise of not increasing calculated amount and parameter amount, and enhances the classification effect.

In the prior art, in an image data analysis model, an attention mechanism has become a widely used method, and soft attention is a common way of a neural network, and mainly includes: and the channel attention and the space attention are used for generating corresponding weights according to the importance degree of the image characteristics so as to assist reasonable distribution of computing power. When the positive and negative activation values are balanced, the problem that the information of the intermediate feature map is extracted by adopting global average pooling and global random pooling to reduce the loss of the information is further solved by the embodiment, aiming at the problem that the information of the intermediate feature map is blurred by only adopting the global average pooling and the global random pooling. The global average pooling can better reserve background information, the global random pooling selects characteristics according to probability values, and has stronger generalization, and the information loss can be reduced by combining the global average pooling and the global random pooling.

The identification model constructed in the embodiment adopts a characteristic splicing mode, realizes the lightweight of the network by combining a channel attention mechanism and a depth separable convolution through a multi-branch parallel mode, overcomes the problem of high requirement on operating memory in the prior art, reduces the parameter quantity of the model, and reduces the complexity.

In an embodiment, a system for implementing the method is provided for a proposed target capture positioning method based on image data processing, and specifically, the system is a target capture positioning system based on image data processing, and specifically includes:

a first module for acquiring image data of a target object;

a second module for preprocessing image data;

a fourth module for separating processing and outputting analysis results;

a fifth module for generating a grab instruction;

a sixth module for executing the grab instruction.

In a further embodiment, the first module acquires image data of a target object on a workbench through information acquisition equipment, and transmits the data to the second module for preprocessing the image data; then, the third module constructs a recognition model for recognizing the target object and receives the image data preprocessed in the second module; the fourth module analyzes the data received by the third module according to the data and outputs a corresponding analysis result to the fifth module; the fifth module generates a grabbing instruction for triggering grabbing operation according to the received analysis result; and the sixth module triggers the mechanical structure to perform grabbing operation according to the generated grabbing instruction, grabs the target object and carries the target object to the object carrying and transporting table for subsequent cargo loading.

In a further embodiment, the process of preprocessing the image data by the second module further comprises: useful structures are extracted from different visual objects through an edge detection method, and the data volume to be processed is reduced; meanwhile, aiming at the acquired image data, the brightness of the image data is adaptively adjusted in a mode of setting a threshold value.

As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A target grabbing and positioning method based on image data processing is characterized by comprising the following steps:

step 2, preprocessing image data of the target object image data;

2. The method for grabbing and positioning an object based on image data processing according to claim 1, wherein in step 2, an edge detection method is used to achieve the purpose of extracting useful results from different visual objects, so as to preprocess the image data;

further, the pretreatment process comprises the following steps:

2.4, using double threshold detection to define real and potential edges;

3. The method according to claim 2, wherein the gaussian filter constructed in step 2.1 is further a gaussian filter kernel with a size of (2k +1) × (2k +1), and the corresponding generation expression is:

4. The method according to claim 2, wherein when calculating the first derivative values of all pixel points in the image data in the horizontal direction and the vertical direction by using an edge detection operator in step 2.2, the operator involved in the calculation process is optimized by combining the characteristics of the captured target image data;

5. The method according to claim 4, wherein the maximum between-class variance further comprises:

adaptively determining a threshold according to gray level histogram information of an image, dividing pixels of acquired target object image data into two categories of C1 smaller than the threshold omega and C2 larger than the threshold omega through the existing threshold omega, presetting L gray levels in the image data, and when the gray level of the image data is k, respectively presetting the mean values of the two categories of pixels as M₁、M₂Global image data pixel mean value MG, probability ρ of pixel being classified into C1 category₁Probability ρ of a pixel being classified into the class C2₂The expressions satisfied by each other are:

ρ₁*M₁+ρ₂*M₂＝MG

ρ₁+ρ₂＝1

representing the total number of pixel points;

the between-class variance expression is:

in the formula, σ²A value representing the variance; pi₁Representing the probability, π, that a pixel is classified into the C1 class₂Represents the probability of the pixel being classified into the C2 category; MG represents the global image data pixel mean; m represents the accumulated mean of the gray level k;

further, 0 to 255 gray levels are searched in a traversing mode, and the gray level k meeting the condition that the inter-class variance is maximized is obtained, so that the optimal high and low threshold values are obtained through a self-adaptive method.

6. The method for grabbing and positioning an object based on image data processing according to claim 1, wherein for the problem of light interference in practical working conditions, step 2 is to adjust the brightness of the collected image by an adaptive method for ambient light during image data preprocessing;

7. The method for grabbing and positioning an object based on image data processing according to claim 1,

in order to overcome the phenomenon of misjudgment easily occurring when a circular object is identified in the actual working condition, the number of scattered point samples selected on the circular edge is increased and the number of the scattered point samples is set as a threshold value in a parameter space according to the unique geometric characteristic that the distance from the center of a circle to any point of a circular edge is equal.

8. An object grabbing and positioning system based on image data processing, which is used for implementing the method of any one of claims 1 to 7, and is characterized by specifically comprising:

a first module for acquiring image data of a target object;

a second module for preprocessing image data;

a fourth module for separating processing and outputting analysis results;

a fifth module for generating a grab instruction;

a sixth module for executing the grab instruction.

9. The system according to claim 8, wherein the first module acquires image data of an object on the workbench through an information acquisition device, and transmits the image data to the second module for image data preprocessing; then, the third module constructs a recognition model for recognizing the target object and receives the image data preprocessed in the second module; the fourth module analyzes the data received by the third module according to the data and outputs a corresponding analysis result to the fifth module; the fifth module generates a grabbing instruction for triggering grabbing operation according to the received analysis result; and the sixth module triggers the mechanical structure to perform grabbing operation according to the generated grabbing instruction, grabs the target object and carries the target object to the object carrying and transporting table for subsequent cargo loading.

10. The system of claim 8, wherein the second module preprocesses the image data further comprising:

useful structures are extracted from different visual objects through an edge detection method, and the data volume to be processed is reduced; meanwhile, aiming at the acquired image data, the brightness of the image data is adaptively adjusted in a mode of setting a threshold value;