CN112926503B

CN112926503B - Automatic generation method of grabbing data set based on rectangular fitting

Info

Publication number: CN112926503B
Application number: CN202110307109.XA
Authority: CN
Inventors: 李育文; 张智辉
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2021-03-23
Filing date: 2021-03-23
Publication date: 2023-07-18
Anticipated expiration: 2041-03-23
Also published as: CN112926503A

Abstract

The invention discloses an automatic generation method of a grabbing data set based on rectangular fitting, which adopts a mechanical arm end effector to clamp the edge of an object, drives the object to rotate around the center of the object, and collects images of all angles of the object through a camera; determining an object area and generating an object mask; detecting outline information of a target object; performing Hough transformation on the image, detecting a straight line in the outline of the object, and merging the short line segments; sequencing the lengths of the straight lines, detecting corresponding parallel lines according to the lengths of the straight lines, and fitting the outline of the object by using a plurality of rectangles; generating a plurality of grabbing rectangles suitable for the double-finger paws by the fitting rectangles according to an equidistant sampling mode; combining the background image and the object image generates a captured dataset image while generating a corresponding label. The invention has simple operation, convenient and quick realization and no need of extra equipment. The problem that the manual labeling work of the data sets in the grabbing task is time-consuming and labor-consuming is solved, and a convenient way is provided for manufacturing training sets required by deep learning models in the grabbing task.

Description

Automatic generation method of grabbing data set based on rectangular fitting

Technical Field

The invention belongs to the field of robots, and particularly relates to an automatic grabbing data set generation method based on rectangular fitting.

Background

With the development of computer vision and artificial intelligence, robotics have received increasing attention, with intelligent robots being more considered as the main stream of development in the future. In recent years, various methods for detecting object grabbing gestures are proposed in journals and conferences in the robot field, and particularly, detection of scene object grabbing gestures under a deep learning framework requires a huge target grabbing data set with labeling information for training and testing of models. The dataset needs to have object image information and a corresponding plurality of gripping poses.

In the robot field, there are currently capture detection data sets Cornell Grasp Dataset, jacquard data set, VMRD, etc., which are all manually labeled by researchers. Each sample in the dataset may contain a plurality of objects, and each object has a plurality of possible grabbing postures, the manual preparation of grabbing the dataset wastes time and energy, is influenced by human subjectivity, the marked grabbing dataset has a certain orientation, and the marked grabbing positions only contain a part of available grabbing positions and cannot comprehensively represent the grabbing positions of the objects.

Disclosure of Invention

The invention aims to solve the problems in the existing grabbing data set manufacturing process and provides an automatic grabbing data set generation method based on rectangular fitting. The operation is simple, the time and the labor are saved, and the production work of grabbing the database can be completed in a short time.

In order to achieve the above purpose, the invention adopts the following technical scheme:

a grabbing data set automatic generation method based on rectangular fitting comprises the following steps:

step one, object image acquisition:

the multi-angle image information of the object is acquired by combining the mechanical arm with the camera;

determining an object area and generating an object mask:

according to the position relation among all the components in the data collection system, primarily determining the approximate position of the object in the image, and removing most of background noise; dividing an object region from the background by using a background subtraction method to obtain an object mask;

detecting outline information of a target object:

detecting the edge of an object by using a canny operator, and representing the outline of the object by a large number of short lines;

fourth, detecting a straight line by Hough transformation:

performing Hough transformation on the object contour curve, detecting straight lines in the object contour, integrating line segments meeting merging conditions, and reducing the number of lines;

fifthly, fitting the outline by utilizing a plurality of rectangles:

sequencing the straight lines according to the length of the straight lines, and sequentially detecting corresponding parallel lines according to the sequencing of the length of the straight lines; the rectangle used to fit the contour of an object needs to satisfy four conditions simultaneously:

(1) the included angle of the two line segments is smaller than 20 degrees;

(2) the distance between parallel lines is greater than threshold th ₁ ＝50；

(3) The number of pixels between parallel lines is not zero;

(4) the shorter line segment has a projection on the longer line segment;

mutually projecting parallel lines meeting the conditions (1) - (4) at the same time to generate a plurality of rectangles for fitting the outline of the object;

step six, generating a grabbing rectangle:

according to the width w of the finger, a plurality of grabbing rectangles suitable for double-finger paws are generated on the fitting rectangle in an equidistant sampling mode;

step seven, synthesizing a grabbing data set image, and manufacturing a grabbing data set:

and D, combining the color image of the object with a preset plurality of backgrounds by using the object mask generated in the step II to generate a plurality of grabbing data set images, and simultaneously manufacturing a label file for all samples in the data set according to the position of the object in the image and the grabbing rectangle generated in the step six.

Preferably, in the first step, the data acquisition is performed by combining a mechanical arm and a camera, and the camera is mounted directly above the background desktop and vertically downward through a frame. To facilitate later object segmentation, the background desktop is a single color, here white. Controlling the claws to clamp the edge of the object and enable the object to be located under the camera, driving the object to rotate around X, Y, Z coordinate axes of the claw coordinate system by plus or minus 30 degrees respectively, and collecting an object image by the camera after the system is stable.

Preferably, in the second step, coordinates (u, v) of a midpoint position (X, Y, Z) of the line connecting the two fingertips in the coordinate system of the mechanical arm in the image are calculated by using a relation between the camera and the mechanical arm through kinematic positive solution, so as to determine the position of the object in the image:

wherein,,representing the relation of the mechanical arm coordinate system relative to the camera coordinate system, f _x ,f _y ,c _x ,c _y Is an internal reference of the camera. Assuming that the object is smaller than a cube with a side length of 20cm, the object can be determined in the image by projecting the cubeThe clipping can contain the smallest square of the area, assuming a side length of e. The region contains only the target object and the background image, then the background is subtracted from the image, and a threshold th is set _v 10, the portion where the difference is less than the threshold is the object region.

Preferably, in the third step, edge detection is performed on the object mask image obtained in the second step, the Canny operator is used to detect the edge of the object, and gradient values of all points in the image are calculated. If the point with the gradient value smaller than 100 is set as a non-boundary point, the point with the gradient value larger than 150 is set as a boundary point. The points in the interval are judged according to whether the adjacent points are boundary points or not, if the adjacent points are adjacent to the boundary points, the points are considered as boundary points, and if the adjacent points are not adjacent to the boundary points, the points are not boundary points.

Preferably, in the fourth step, all points on the contour are mapped into the hough space through hough transformation, points in the hough space correspond to straight lines in the image space, and the number of straight lines corresponding to each intersection point in the hough space is counted to be larger than the threshold th _c Is mapped to a straight line in image space, here th _c =30. And then calculating the included angle between any two line segments and the distance between the two nearest points of the two line segments to judge whether the two line segments can be combined, and if the included angle of the straight line where the two line segments are positioned is smaller than 10 degrees and the distance between the nearest points is smaller than 20 pixels, considering that the two line segments are part of the real line segments. K segment pairs available for merging are generated through pairwise matching. And clustering the k line segment pairs, and gathering all the line segments with the matching relationship into one type. And calculating the minimum circumscribed rectangle formed by all points on each type of straight line, and connecting the midpoints of two short sides in the rectangle to represent the combined straight line.

Preferably, in the fifth step, all the lengths of the combined straight lines are ordered, and longer straight lines are preferentially used to generate the rectangle. Judging whether the longer straight line and the residual straight line can generate a rectangle for fitting an object in sequence, wherein the rectangle for fitting the object needs to meet the following conditions:

1. the two line segments are parallel lines;

2. the distance between parallel lines is greater than threshold th ₁ ＝50；

3. Non-empty between parallel lines;

4. the shorter line segment has a projection on the longer line segment.

And establishing a fitting rectangle according to the conditions. Calculating an included angle by using the remainder of the included angles between the vectorsThe distance between the midpoints of the two line segments is simplified to represent the distance between the two line segments; and respectively connecting two endpoints of one line segment with two endpoints of the other line segment to form four vectors, and if vectors generated by the two endpoints of one line segment at least have one vector and form an acute angle with the line segment, indicating that projections exist between the two line segments. Connecting the corresponding endpoints of the two line segments to form a quadrilateral, and counting the number n of non-zero points in the quadrilateral. If the four conditions cannot be met at the same time, judging the next straight line until the complete straight line is traversed. And calculating the projection point of the shorter line segment on the longer line segment for the line meeting the condition, and if the projection point does not exist, making a perpendicular line from the side end point of the longer line segment to the short line segment, and updating the intersection point to the end point of the short line segment. Calculating the midpoint, rotation angle, width and height of the fitting rectangle, wherein the abscissa of the midpoint is the average value of the abscissas of four endpoints of the line segment, and the rotation angle theta _r The width W is equal to the distance between the midpoints of the line segments, and the height H is the shorter length value in the line segments.

Preferably, in the sixth step, the width w of the finger is measured, and the projection width w' of the finger in the image is calculated according to the relationship between the eyes and the hands. And taking w'/2 as a step length, and sampling from one end of the rectangle to the other end of the rectangle along the direction of the edge H in the fitted rectangle to generate a plurality of grabbing rectangles. Width w of grabbing rectangle _g =w', highThe grabbing rectangle and the fitting rectangle are mutually perpendicular in axis. The center point of the grabbing rectangle falls on the axis of the fitting rectangle, and the grabbing rectangle is ensured not to exceed the fitting rectangle.

PreferablyIn the seventh step, both the monochrome background and the cluttered background are selected as the background. Clipping the object region in the image by using the object mask obtained in the second step, then pasting to the center position on the background image and rotating θ _▽ ，θ _▽ Values from 0-360 ° were taken at 30 ° intervals, with single object symbiosis of 2×12=24 samples. For color images, the object image is directly used for replacing the background, and when the mask value is true, I _B (x,y)＝I _F (x, y), the other position values remain unchanged. For depth images, subtracting the object region value from the background value, and when the mask value is true, I _B (x,y)＝I _B (x,y)-I _F (x, y), the other position values remain unchanged. Based on the position and angle of the object in the background image and the grab rectangle obtained in step six, the grab rectangle (c 'of the object in the composite image is calculated' _x ,c' _y ,w' _g ,h' _g ,θ' _g ) WhereinThe abscissa representing the center point of the grabbing rectangle in the composite image, +.>Representing the ordinate, w 'of the centre point' _g ＝w _g Represents the width of the finger in the grabbing rectangle, h' _g ＝h _g Represents the opening size of the paw in the grabbing rectangle, theta' _g ＝θ _g +θ _▽ Representing the angle between the grabbing rectangle and the horizontal axis of the image.

Compared with the prior art, the invention has the following obvious prominent substantive features and obvious advantages:

1. the invention has simple and time-saving manufacturing operation for the grabbing data set, only needs to manually clamp the object in the middle, does not need excessive manual intervention, and can finish the manufacturing work of grabbing the data set in the shortest time;

2. the method is simple, high in efficiency and easy to realize.

Drawings

Fig. 1 is a flow chart of the present invention.

FIG. 2 is a schematic diagram of a data acquisition system of the present invention.

FIG. 3 is a schematic representation of a fitted rectangle calculation of the present invention.

Fig. 4 is a schematic diagram of a grab frame sampling of the present invention.

FIG. 5 is a schematic view of the object segmentation result of the present invention.

Fig. 6 is a schematic diagram of the contour detection of the present invention.

Fig. 7 is a schematic representation of the rectangular fitting results of the present invention.

FIG. 8 is a schematic diagram of a sample of a captured data set in accordance with the present invention.

Detailed Description

The invention is further illustrated by the following examples in conjunction with the accompanying drawings:

embodiment one:

referring to fig. 1, a method for automatically generating a captured data set based on rectangular fitting includes the following steps:

step one, object image acquisition:

determining an object area and generating an object mask:

detecting outline information of a target object:

fourth, detecting a straight line by Hough transformation:

fifthly, fitting the outline by utilizing a plurality of rectangles:

(1) the included angle of the two line segments is smaller than 20 degrees;

(3) The number of pixels between parallel lines is not zero;

(4) the shorter line segment has a projection on the longer line segment;

step six, generating a grabbing rectangle:

The method of the embodiment has simple and time-saving manufacturing operation for the grabbing data set, only needs to manually clamp the object in the middle, does not need excessive manual intervention, and can finish the manufacturing work for grabbing the data set in the shortest time.

Embodiment two:

this embodiment is substantially the same as the first embodiment, and is specifically as follows:

in the first step, a fixed Kinect camera is used as data acquisition equipment, the object is clamped by a gripper at the tail end of the mechanical arm to change the posture of the object, and image information of the object under various angles is acquired.

In the fifth step, using four conditions for forming a rectangle, detecting whether any two parallel lines can be used for synthesizing a fitting rectangle, and establishing the fitting rectangle from the two parallel lines in a mutual projection mode.

In the step six, the projection w of the finger width is taken as the width of the grabbing rectangle, the H of the fitting rectangle which is 6/5 times of the width of the grabbing rectangle is taken as the height H of the grabbing rectangle, and sampling is carried out at intervals of 0.5w along the H edge of the fitting rectangle to construct the grabbing rectangle.

And 5, in the step seven, merging the object image and the background image to form a grabbing data set image, and generating grabbing parameters according to the position of the object image in the background.

The automatic generation method of the grabbing data set based on rectangular fitting is simple to operate, time-saving and labor-saving, and the manufacturing work of the grabbing database can be completed in a short time.

Embodiment III:

referring to fig. 1 to 8, a method for automatically generating a captured data set based on rectangular fitting includes the following steps:

step one, a data collection system is built according to the structure shown in fig. 2, a camera is installed right above a background desktop through a frame and vertically downward, and a mechanical arm is fixed on the desktop. To facilitate later object segmentation, a white desktop background is selected. The camera is connected with the computer through an L-to-USB data line, and the mechanical arm controller is connected with the computer through a twisted pair. And powering up all the devices after all the hardware is successfully connected, and initializing a data acquisition system. The hand-operated control mechanical arm gripper clamps the edge of the object to be collected and drags the object to the position right below the camera, the object is driven to rotate around X, Y, Z coordinate axes of the gripper coordinate system by plus or minus 30 degrees respectively, and the camera acquires an object image after the system is stable.

And secondly, establishing a coordinate system at the midpoint position of the connecting line of the two fingertips, and calculating the projection point of the fingertips in the image by utilizing the pose relation between the camera and the mechanical arm so as to determine the approximate position of the object. A rectangle with a side length of e of 50 pixels is established, the rectangle is parallel to the image edge, the projection point of the fingertip is positioned at 3/4 of the axis of the rectangle, and the rectangle limits the area of the object. The region contains only the target object and the background image, then the background is subtracted from the image, and a threshold th is set _v 10, where the difference is less than the thresholdThe division into object areas is effected as shown in fig. 5.

And step three, detecting the edge of the object by using the cvCanny () function provided in OpenCV for the object mask image obtained in step two. The mask image is input as a function, and parameters threshhold 1=100, threshhold 2=150, aperture_size=3 are set. The two-dimensional matrix with the same size as the input is output, the outline position of the object is represented by '1', the other parts are represented by '0', and the effect is shown in figure 6.

And step four, detecting a point set which is possibly a straight line in the outline by using a probability HoughLinesP () function provided in OpenCV. Setting a search step rho=2, a search radian interval theta=pi/180, an accumulated point number threshold=30, a line segment maximum interval maxlinegap=10 and a line segment minimum length=100 by taking the output in the step three as an input. The function output is a two-dimensional matrix L consisting of a plurality of line segments, the first dimension representing the number of detected line segments, and the second dimension being the abscissa and ordinate of the two endpoints of each line segment. In order to simplify the calculation amount in line segment integration, line segments are clustered first. Firstly, initializing a set S for storing line segments used for integrating the straight line segments, sequentially extracting one line segment from all detected line segments, and deleting the line segment from the L. Judging whether the line segment which can be combined with the line segment exists in the S, if so, adding the line segment into a corresponding line segment set, otherwise, establishing a new line segment set until all the detected line segments are traversed. And calculating the minimum circumscribed rectangle formed by all points on each type of straight line, and connecting the midpoints of two short sides in the rectangle to represent the combined straight line.

And fifthly, calculating the lengths of all the combined straight lines, and sequencing the straight lines by using a quick sequencing algorithm. Preferentially using longer straight lines to generate rectangles, and sequentially judging whether the longer straight lines and the remaining straight lines can generate rectangles for fitting objects, wherein the rectangles for fitting objects need to meet the following conditions: 1. the included angle of the two line segments is smaller than 20 degrees; 2. the distance between parallel lines is greater than threshold th _l =50; 3. the number of pixels between parallel lines is not zero; 4. the shorter line segment has a projection on the longer line segment. And establishing a fitting rectangle according to the conditions. Assume thatAny given two straight lines have a length of l ₁ The short line is l ₂ ，l ₁ P is the left end point of (2) ₁ The right end point is p ₂ ，l ₂ P is the left end point of (2) ₃ The right end point is p ₄ As shown in fig. 3. Connection point p ₁ 、p ₂ Form vector V ₁ Connection point p ₃ 、p ₄ Form vector V ₂ . Calculating the included angle between the two vectorsBy calculating l ₂ Midpoint to l ₁ The distance d of (2) represents the distance between the two line segments. Respectively p ₃ 、p ₄ Point and p ₁ 、p ₂ The point links form a vector V ₁₃ 、V ₁₄ 、V ₂₃ 、V ₂₄ If V ₁₃ 、V ₁₄ At least one vector and vector V ₁ An included angle of an acute angle, V ₂₃ 、V ₂₄ At least one vector and vector V ₁ The included angle is an acute angle, which indicates that projections exist between the two line segments. The successive connection points p ₁ 、p ₂ 、p ₄ 、p ₃ 、p ₁ Forming a quadrangle, and counting the number n of non-zero points in the quadrangle. If the four conditions cannot be met at the same time, judging the next straight line until the complete straight line is traversed. For a line meeting the condition, a point p is calculated ₃ 、p ₄ At vector V ₁ Projection point p 'on' ₃ 、p' ₄ Abscissa of projection pointOrdinate->Where i is {3,4}, α ₁ Representing vector V ₁ The included angle between the straight line and the horizontal line. If p' ₃ At vector V ₁ On p' ₁ =p3. Otherwise, a passing point p is established ₁ And is connected with vector V ₂ Straight line perpendicular to V ₂ The intersection of the straight lines being taken as point p' ₁ ，p' ₃ ＝p ₁ . Similarly, if p' ₄ At vector V ₁ On p' ₂ ＝p ₄ . Otherwise, a passing point p is established ₂ And is connected with vector V ₂ Straight line perpendicular to V ₂ The intersection of the straight lines being taken as point p' ₂ ，p' ₄ ＝p ₂ . Calculating the midpoint (x, y) of the fitted rectangle, wherein +.> Calculating the rotation angle +.>Calculating the width W of the fitting rectangle to be equal to the point p' ₁ And p' ₂ From the midpoint to point p' ₃ And p' ₄ Calculating the height H of the fitting rectangle as the distance of the midpoint of the point p' ₁ And p' ₂ Distance between and point p' ₃ And p' ₄ The distance between them is small. The fitting is finally completed by a rectangle, as shown in fig. 7.

And step six, measuring the width w of the finger, and calculating the projection width w' of the finger in the image according to the hand-eye relation. Taking w'/2 as a step length, sampling from the end P1 to the end P2 of the rectangle along the direction of the edge H in the fitting rectangle to generate a plurality of grabbing rectangles, as shown in fig. 4. Width w of grabbing rectangle _g =w', highWhen theta is as _r Rotation angle theta of right grabbing rectangle _g ＝θ _r -90 DEG, when theta _r θ when the value is negative _g ＝θ _r +90°. Grabbing the abscissa c of the rectangular center point _x ＝x _P1 +k·w _g cos(θ _r ) Ordinate c _y ＝y _P1 +k·w _g sin(θ _r ) Where k= 0.5,1,1.5,2 ….

And step seven, selecting two situations of a monochromatic background and a clutter background as the background, wherein the size is 300 x 300. Utilization stepIn the second step, the object area in the object mask clipping image is obtained, and a matrix with values only existing in the object area and other parts being 0 is obtained. Rotating the image using a warp Affine () function, the rotation center being set as the center of the image, the boundary filling value being 0, the rotation angle θ _▽ Take values from 0-360 deg. at intervals of 30 deg.. The rotated image was added to the background image corresponding element and the single object was co-generated into 2 x 12 = 24 samples. Based on the position and angle of the object in the background image and the grab rectangle obtained in step six, the grab rectangle (c 'of the object in the composite image is calculated' _x ,c' _y ,w' _g ,h' _g ,θ' _g ) WhereinThe abscissa representing the center point of the grabbing rectangle in the composite image, +.>Representing the ordinate, w 'of the centre point' _g ＝w _g Represents the width of the finger in the grabbing rectangle, h' _g ＝h _g Represents the opening size of the paw in the grabbing rectangle, theta' _g ＝θ _g +θ _▽ An example of the acquired grabbing data set is shown in fig. 8, representing the angle between the grabbing rectangle and the horizontal axis of the image.

According to the automatic grabbing data set generation method based on rectangular fitting, an end effector of a mechanical arm is used for clamping the edge of an object, the object is driven to rotate around the center of the object, and images of all angles of the object are collected through a camera; determining an object area and generating an object mask; detecting outline information of a target object; performing Hough transformation on the image, detecting a straight line in the outline of the object, and merging the short line segments; sequencing the lengths of the straight lines, detecting corresponding parallel lines according to the lengths of the straight lines, and fitting the outline of the object by using a plurality of rectangles; generating a plurality of grabbing rectangles applicable to the double-finger paw in a mode of equidistant sampling from the fitting rectangles; combining the background image and the object image generates a captured dataset image while generating a corresponding label. The method of the embodiment is simple to operate, convenient and quick to realize, and does not need extra equipment. The problem that the manual labeling work of the data sets in the grabbing task is time-consuming and labor-consuming is solved, and a convenient way is provided for manufacturing training sets required by deep learning models in the grabbing task.

The above detailed description of the specific embodiments of the present invention with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and all changes, substitutions or simplifications made according to the technical scheme principles of the present invention are equivalent substitution modes, or direct or indirect application in other related fields are included in the scope of the patent protection of the present application.

Claims

1. The automatic generation method of the grabbing data set based on rectangular fitting is characterized by comprising the following steps of:

step one, object image acquisition:

determining an object area and generating an object mask:

in the second step, the coordinates (u, v) of the midpoint position (X, Y, Z) of the connecting line of the two fingertips in the coordinate system of the mechanical arm in the image are calculated by using the relation between the camera and the mechanical arm through kinematic positive solution, so as to determine the position of the object in the image:

wherein,,representing the relation of the mechanical arm coordinate system relative to the camera coordinate system, f _x ,f _y ,c _x ,c _y Is an internal reference of the camera;

detecting outline information of a target object:

fourth, detecting a straight line by Hough transformation:

fifthly, fitting the outline by utilizing a plurality of rectangles:

(1) the included angle of the two line segments is smaller than 20 degrees;

(3) The number of pixels between parallel lines is not zero;

(4) the shorter line segment has a projection on the longer line segment;

step six, generating a grabbing rectangle:

2. The automatic generation method of a grabbing data set based on rectangular fitting according to claim 1, wherein: in the first step, a fixed Kinect camera is used as data acquisition equipment, the object is clamped by a gripper at the tail end of the mechanical arm to change the posture of the object, and image information of the object under various angles is acquired.

3. The automatic generation method of a grabbing data set based on rectangular fitting according to claim 1, wherein: in the fifth step, using four conditions for forming a rectangle, detecting whether any two parallel lines can be used for synthesizing a fitting rectangle, and establishing the fitting rectangle from the two parallel lines in a mutual projection mode.

4. The automatic generation method of a grabbing data set based on rectangular fitting according to claim 1, wherein: in the step six, the projection w of the finger width is taken as the width of the grabbing rectangle, the H of the fitting rectangle which is 6/5 times of the width of the grabbing rectangle is taken as the height H of the grabbing rectangle, and sampling is carried out at intervals of 0.5w along the H edge of the fitting rectangle to construct the grabbing rectangle.

5. The automatic generation method of a grabbing data set based on rectangular fitting according to claim 1, wherein: in the seventh step, the object image and the background image are combined to form a grabbing data set image, and grabbing parameters are generated according to the position of the object image in the background.