WO2023124734A1 - Object grasping point estimation method, apparatus and system, model training method, apparatus and system, and data generation method, apparatus and system - Google Patents

Object grasping point estimation method, apparatus and system, model training method, apparatus and system, and data generation method, apparatus and system Download PDF

Info

Publication number
WO2023124734A1
WO2023124734A1 PCT/CN2022/135705 CN2022135705W WO2023124734A1 WO 2023124734 A1 WO2023124734 A1 WO 2023124734A1 CN 2022135705 W CN2022135705 W CN 2022135705W WO 2023124734 A1 WO2023124734 A1 WO 2023124734A1
Authority
WO
WIPO (PCT)
Prior art keywords
point
grasping
quality
image
model
Prior art date
Application number
PCT/CN2022/135705
Other languages
English (en)
Chinese (zh)
Inventor
周韬
Original Assignee
广东美的白色家电技术创新中心有限公司
美的集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广东美的白色家电技术创新中心有限公司, 美的集团股份有限公司 filed Critical 广东美的白色家电技术创新中心有限公司
Publication of WO2023124734A1 publication Critical patent/WO2023124734A1/fr

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Definitions

  • the present disclosure relates to but is not limited to artificial intelligence technology, and specifically relates to a method, device and system for object grasping point estimation, model training and data generation.
  • the challenge encountered by the robot vision system is to guide the robot to grab thousands of different stock keeping units (SKU for short).
  • These objects are usually unknown to the system, or due to the variety, it is too costly to maintain physical models or texture templates for all SKUs.
  • the simplest example is in the depalletizing application: although the objects to be grasped are all rectangular (e.g., cartons or boxes), the texture and size of the objects change from scene to scene. Therefore, classic object localization or recognition schemes based on template matching are difficult to apply in such scenarios.
  • in addition, many objects have irregular shapes; the most common are box-like objects and bottle-like objects. The task is to pick them one by one from a stacked state, perform subsequent scanning or identification operations, and place them into the appropriate target material box.
  • having the robot vision system estimate the most suitable grasping point (which may be, but is not limited to, a suction point) from the scene captured by the camera, without prior knowledge of the object, and then guide the robot to perform the grasping action is still a problem that needs to be solved.
  • An embodiment of the present disclosure provides a method for generating training data of an object grasping point estimation model, including:
  • a target grasping quality of pixels in the sample image is generated according to the grasping quality of the sampling points of the first object.
  • An embodiment of the present disclosure also provides a device for generating training data of an object grasping point estimation model, including a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the method for generating training data of an object grasping point estimation model described in any embodiment of the present disclosure is implemented.
  • the method and device of the above-mentioned embodiments of the present disclosure realize automatic labeling of sample images, can generate training data efficiently and with high quality, and avoid problems such as heavy workload and unstable labeling quality caused by manual labeling.
  • An embodiment of the present disclosure provides a method for training an estimation model of object grasping points, including:
  • the training data includes a sample image and the target grasping quality of pixels in the sample image
  • the estimation model includes a backbone network using a semantic segmentation network architecture and a multi-branch network, and the multi-branch network adopts a multi-task learning network architecture.
  • An embodiment of the present disclosure also provides a training device for an estimation model of an object grasping point, including a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the method for training the estimation model of the object grasping point described in any embodiment of the present disclosure is implemented.
  • the method and device of the above-mentioned embodiments of the present disclosure learn the grasping quality of pixels in a 2D image through training, which gives better accuracy and stability than directly predicting the optimal grasping point.
  • An embodiment of the present disclosure provides a method for estimating a grasping point of an object, including:
  • An embodiment of the present disclosure also provides a device for estimating the grasping point of an object, including a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the method for estimating object grasping points described in any embodiment of the present disclosure is implemented.
  • An embodiment of the present disclosure also provides a robot vision system, including:
  • a camera configured to shoot a scene image containing an object to be captured, where the scene image includes a 2D image, or includes a 2D image and a depth image;
  • a control device, including the object grasping point estimation device according to an embodiment of the present disclosure, configured to determine the position of the grasping point of the object to be grasped according to the scene image captured by the camera, and to control the grasping action performed by the robot according to the position of the grasping point;
  • a robot configured to perform said grasping action.
  • the estimation method, device and robot vision system of the above-mentioned embodiments of the present disclosure can improve the accuracy of object grasping point estimation, thereby improving the success rate of grasping.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for generating training data of the object grasping point estimation model described in any embodiment of the present disclosure, or the method for training the object grasping point estimation model described in any embodiment of the present disclosure, or the method for estimating object grasping points described in any embodiment of the present disclosure.
  • FIG. 1 is a flowchart of a method for generating training data for an object grasping point estimation model according to an embodiment of the present disclosure
  • Fig. 2 is a flow chart of generating labeled data according to the grasping quality of sampling points in Fig. 1;
  • FIG. 3 is a schematic diagram of a device for generating training data according to an embodiment of the present disclosure
  • Fig. 4 is a flowchart of a training method for an estimation model of an object grasping point according to an embodiment of the present disclosure
  • Fig. 5 is a network structure diagram of an estimation model according to an embodiment of the present disclosure.
  • FIG. 6 is a flowchart of a method for estimating an object grasping point according to an embodiment of the present disclosure
  • FIG. 7 is a schematic structural diagram of a robot vision system according to an embodiment of the present disclosure.
  • words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment described in this disclosure as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments.
  • "And/or" in this document describes the relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A exists alone, A and B exist simultaneously, or B exists alone.
  • “A plurality” means two or more than two.
  • words such as "first" and "second" are used to distinguish identical or similar items having substantially the same function and effect. Those skilled in the art will understand that such words do not limit quantity or execution order, nor do they necessarily imply that the items are different.
  • in a related scheme, the point cloud is segmented by plane segmentation or Euclidean-distance clustering in an attempt to separate and detect the different objects in the scene; the center point of each segment is then taken as a grasping point candidate, the candidates are ranked with a series of heuristic rules, and the robot is finally guided to grasp at the best-ranked point.
  • a feedback system is introduced to record the success or failure of each grasp; if the grasp succeeds, the current object is used as a template to match the grasping point of the next grasp.
  • the problem with this scheme is that ordinary point cloud segmentation performs relatively poorly, produces many wrong grasping points, and tends to fail when the objects are closely arranged.
  • in another related scheme, a deep learning framework is used: a limited amount of data is manually labeled with the direction and area of the grasping points to obtain training data, and a neural network model is trained on these data.
  • the vision system can then process pictures similar to the training set and estimate the grasping points of the objects.
  • the problem with this solution is that the cost of data collection and labeling is high, especially at the labeling level: labeling the direction and area of grasping points is difficult and requires strong technical skill from the labeler; at the same time, the labels contain many human factors and the labeling quality cannot be controlled systematically, so a model with systematic quality assurance cannot be produced.
  • An embodiment of the present disclosure provides a method for generating training data of an object grasping point estimation model, as shown in FIG. 1 , including:
  • Step 110 acquiring the 3D model of the sample object, sampling the grabbing points based on the 3D model of the sample object and evaluating the grabbing quality of the sampling points;
  • the sample objects can be various box-like objects, bottle-like objects, a mixture of box-like and bottle-like objects, or objects of other shapes.
  • the sample object can usually be selected from the actual items to be grabbed, but it is not required to cover all types of the actual items to be grabbed.
  • items with typical geometric shapes among the items to be grasped can be selected as sample objects, but this disclosure does not require that the sample objects must cover the shapes of all items to be grasped.
  • the trained model can still perform grasping point estimation for objects of other shapes.
  • Step 120 rendering the simulated scene loaded with the 3D model of the first object, and generating a sample image for training, the first object being selected from the sample objects;
  • the loaded first object may be randomly selected by the system from sample objects, or manually selected, or selected according to configured rules.
  • the selected first objects may include one type of sample object or multiple types of sample objects, and one or more objects of each type; this embodiment is not limited in this respect.
  • Step 130 generating a target grasping quality of pixels in the sample image according to the grasping quality of the sampling points of the first object.
  • the target grasping quality of pixels in the sample image here can be the target grasping quality of some pixels in the sample image or of all pixels in the sample image, and it can be labeled pixel by pixel or for a set of multiple pixels, such as a region of the sample image containing two or more pixels. Because the labeled grasping quality of pixels in the sample image is used as the target data during training, it is referred to herein as the target grasping quality of the pixels.
  • in the embodiments of the present disclosure, the 3D model of the sample object is obtained first, and grasping point sampling and evaluation of the grasping quality of the sampling points are performed based on the 3D model; because the geometric shape of the 3D model itself is accurate, the grasping quality can be evaluated with high quality. After the 3D models of the selected first objects are loaded to generate the simulation scene, the position and posture of each 3D model are tracked during loading, so the positional relationship between the sampling points and the pixels in the sample image can be calculated and the grasping quality of the sampling points can be transferred to the corresponding pixels in the sample image.
  • the training data generated by the embodiments of the present disclosure include sample images and annotation data (including but not limited to the target grasping quality), so the embodiments of the present disclosure realize automatic annotation of sample images and can generate training data efficiently and with high quality, avoiding the heavy workload and unstable labeling quality caused by manual labeling.
  • the acquiring of the 3D model of the sample object includes: creating or collecting the 3D model of the sample object, and normalizing it so that the center of mass of the sample object is located at the origin of the model coordinate system of the 3D model and the main axis of the sample object is aligned with the direction of one coordinate axis of the model coordinate system.
  • the so-called normalization can be embodied as a unified modeling rule, that is, the origin of the model coordinate system is established at the center of mass of the sample object, and one coordinate axis of the model coordinate system is aligned with the main axis direction of the object. If an already-created 3D model is collected, normalization can be achieved by translating and rotating the 3D model until the center of mass lies at the origin and the main axis is aligned with a coordinate axis.
  • the performing of grasping point sampling based on the 3D model of the sample object includes: performing point cloud sampling on the 3D model of the sample object, and determining and recording the first position and grabbing direction of each sampling point in the 3D model; the first position is represented by the coordinates of the sampling point in the model coordinate system of the 3D model, and the grabbing direction is determined according to the normal vector of the sampling point in the 3D model.
  • uniform sampling can be performed on the surface of the sample object, and the specific algorithm is not limited.
  • the sampling points on the surface of the sample object should have an appropriate density to avoid missing suitable grasping points.
  • a normal vector of a plane fitted by all points within a set neighborhood range of a sampling point may be used as a normal vector of the sampling point.
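  • As an illustration of the neighborhood plane fit just described, the following sketch (not part of the original disclosure; it assumes a NumPy point cloud `points` of shape (N, 3) and an illustrative neighborhood radius) estimates the normal vector of one sampling point by a least-squares plane fit over its neighbors:

```python
import numpy as np

def estimate_normal(points, sample_idx, radius=0.01):
    """Estimate the normal of one sampling point from a plane fitted
    to all cloud points within `radius` of it (least-squares via SVD)."""
    center = points[sample_idx]
    neighbors = points[np.linalg.norm(points - center, axis=1) < radius]
    if len(neighbors) < 3:
        raise ValueError("not enough neighbors to fit a plane")
    # The normal is the direction of least variance of the neighborhood,
    # i.e. the last right-singular vector of the centered neighbor points.
    centered = neighbors - neighbors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    return normal / np.linalg.norm(normal)
```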
  • the evaluation of the grasping quality of the sampling points includes: in a scenario where a single suction cup is used to pick up the sample object, estimating the grasping quality of each sampling point according to its sealing quality and resisting quality; wherein the sealing quality is determined by the airtightness between the suction cup and the object surface when the suction cup sucks the sample object at the sampling point with its axis aligned with the grabbing direction of the sampling point, and the resisting quality is determined by the gravitational moment of the sample object and the degree to which the moments generated when the suction cup holds the object can resist that gravitational moment.
  • the gravitational moment tends to make the sample object rotate and fall (the mass of the sample object is assigned during configuration), while the suction force of the suction cup on the sample object and the friction force between the end of the suction cup and the sample object can provide a moment that resists the gravitational moment and prevents the object from falling.
  • the suction force and friction force can be taken from configuration information or calculated from configuration information (such as suction cup parameters, object material, etc.); the degree of resistance therefore reflects the stability of the object during suction and can be calculated according to the relevant formulas.
  • the above sealing quality and resisting quality can be scored separately, and then the sum, the average, or a weighted average of the two scores can be used as the grasping quality of the sampling point.
  • the sealing quality and resisting quality of a sampling point are determined by the local geometric characteristics of the 3D model and fully reflect the relationship between the local geometric information of the object and the quality of the grasping point, so an accurate evaluation of the grasping quality of the sampling point can be achieved.
  • although the embodiments of the present disclosure take picking up an object with a single suction cup as an example, the present disclosure is not limited thereto.
  • for grasping methods that pick up an object through multiple suction points or clamp an object at multiple contact points, the grasping quality of the sampling points can likewise be evaluated using indices such as grasping efficiency, the stability of the held object, and the probability of success.
  • in some embodiments, the simulation scene is obtained by loading the 3D models of the first objects into an initial scene; an exemplary loading process is described below.
  • the embodiments of the present disclosure can simulate various object stacking scenes through the above loading process, and the training data generated from such scenes make the trained model suitable for object grasping point estimation in complex stacking scenes, solving the problem that grasping points are difficult to estimate in such scenes.
  • for example, a simulated material frame can be placed in the initial scene, the 3D models of the first objects are loaded into the material frame, and the collisions between the first objects and between the first objects and the material frame are simulated, so that the resulting simulated stacking scene is closer to the real scene; the material frame, however, is not mandatory.
  • a simulated scene in which the first objects are stacked in an orderly manner may also be loaded, depending on the need for simulating the actual working scene.
  • the first object may be loaded multiple times in different ways to obtain multiple simulation scenes.
  • the different manners may be, for example, different types and/or quantities of the loaded first objects, different initial positions and postures of the 3D models during loading, and the like.
  • the rendering of the simulated scene to generate sample images for training includes: rendering each simulated scene at least twice to obtain at least two sets of sample images for training; wherein, at each rendering, a simulated camera is added to the simulated scene, a light source is set, textures are added to the loaded first objects, and a 2D image and a depth image are rendered as one set of sample images; between any two of the multiple renderings, at least one of the following parameters differs: object texture, simulated camera parameters, and light parameters.
  • the simulated environment is illuminated during rendering; by adjusting the parameters of the simulated camera (such as intrinsic parameters, position, angle, etc.), the light parameters (such as the color and intensity of the lighting), the texture of the objects, and so on, the degree of data randomization can be strengthened, the content of the sample images enriched, and the number of sample images increased, thereby improving the quality of the training data and, in turn, the performance of the trained estimation model.
  • adding textures to the loaded first objects at each rendering includes: at each rendering, for each first object loaded into the simulation scene, randomly selecting one of multiple collected real textures and pasting it on the surface of that first object; or, at each rendering, for each type of first object loaded into the simulation scene, randomly selecting one of the collected real textures and pasting it on the surfaces of the first objects of that type.
  • This example compensates for domain differences between real and simulated data through a randomization technique.
  • the real textures can be collected from images of actual objects, images of real texture materials, and the like. Randomly pasting the selected textures on the surfaces of the first objects stacked randomly in the simulated scene makes it possible to render multiple images with different textures.
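  • One possible way to realize the per-object random texture selection and the randomized camera and light settings described above is sketched below; the texture directory and the callables `apply_texture`, `set_camera` and `set_light` are placeholders for whichever rendering engine is actually used, not APIs from the disclosure:

```python
import random
from pathlib import Path

def randomize_render_pass(objects, texture_dir, apply_texture, set_camera, set_light):
    """One rendering pass: give every loaded object a random real texture and
    jitter the simulated camera and light (engine-specific calls are injected)."""
    textures = list(Path(texture_dir).glob("*.png"))
    for obj in objects:
        apply_texture(obj, random.choice(textures))        # random texture per object
    set_camera({"position_jitter_m": 0.02, "angle_jitter_deg": 5.0})
    set_light({"intensity": random.uniform(0.5, 1.5),
               "color": [random.uniform(0.8, 1.0) for _ in range(3)]})
```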
  • in this way, the estimation model can use local geometric information to predict the grasping quality of grasping points, thereby achieving generalization to unknown objects.
  • in some embodiments, the sample image includes a 2D image and a depth image; generating the target grasping quality of the pixels in the sample image according to the grasping quality of the sampling points of the first objects includes:
  • Step 210: according to the intrinsic parameters of the simulated camera during rendering and the rendered depth image, obtain the point cloud of the visible first objects in the simulated scene;
  • Step 220: determine the position of each target sampling point in the point cloud according to the first position of the target sampling point in the 3D model and the second position and posture of the 3D model in the simulation scene after loading, where a target sampling point refers to a sampling point of a visible first object;
  • Step 230: according to the grasping quality of the target sampling points and their positions in the point cloud, determine the grasping quality of the points in the point cloud and mark it as the target grasping quality of the corresponding pixels in the 2D image.
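  • A minimal sketch of Step 210, back-projecting the rendered depth image into a camera-frame point cloud with the simulated camera's pinhole intrinsics; the parameter names (fx, fy, cx, cy) are conventional assumptions rather than values from the disclosure:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters, HxW) into an (N, 3) point cloud
    in the camera frame using pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    cloud = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return cloud[cloud[:, 2] > 0]   # drop pixels with no rendered depth
```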
  • the determining of the grasping quality of the points in the point cloud according to the grasping quality of the target sampling points and their positions in the point cloud includes any of the following:
  • the first: for each target sampling point, setting the grasping quality of the points in the point cloud adjacent to that target sampling point to the grasping quality of the target sampling point;
  • the second: for a point in the point cloud, obtaining its grasping quality by interpolating the grasping qualities of the target sampling points adjacent to that point;
  • the third: for each target sampling point, setting the grasping quality of the points in the point cloud adjacent to that target sampling point to the grasping quality of the target sampling point; after the grasping quality of all points adjacent to target sampling points has been determined, obtaining the grasping quality of the remaining points in the point cloud by interpolation.
  • Embodiments of the present disclosure provide various methods for transferring the grasping quality of target sampling points to the points of the point cloud.
  • the first is to assign the grasping quality of the target sampling point to the adjacent points in the point cloud.
  • the adjacent points can be one or more points in the point cloud closest to the target sampling point; for example, they can be filtered according to a set distance threshold, and the points in the point cloud whose distance to the target sampling point is less than the distance threshold are taken as the points adjacent to that target sampling point.
  • the second is an interpolation method. A point in the point cloud can be interpolated according to the grasping quality of multiple nearby target sampling points.
  • for example, an interpolation method based on Gaussian filtering can be used; alternatively, each of the multiple target sampling points can be given a weight according to its distance to the point (the larger the distance, the smaller the weight), and the grasping qualities of those target sampling points are weighted and averaged with these weights to obtain the grasping quality of the point; other interpolation methods can also be used in this embodiment.
  • the target sampling points adjacent to the point can also be filtered according to a set distance threshold; if only one adjacent target sampling point is found for the point, the grasping quality of that target sampling point can be assigned to the point.
  • the third method first determines the grasping quality of the points in the point cloud adjacent to the target sampling points, and then obtains the grasping quality of the other points in the point cloud by interpolation according to the grasping quality of those points.
  • both the second and the third method can obtain the grasping quality of all points in the point cloud; after mapping the grasping quality of these points to the corresponding pixels in the 2D image, a grasping quality heat map of the 2D image can be drawn.
  • using the first method, it is also possible to obtain the grasping quality of only some points in the point cloud and then, through mapping, the grasping quality of only some pixels in the 2D image; in that case, during training, only the predicted grasping quality of those pixels is compared with the target grasping quality, the loss is calculated, and the model is optimized according to the loss.
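  • The second transfer strategy above could be sketched as a k-nearest-neighbor, Gaussian-weighted average using SciPy's cKDTree; the neighbor count and bandwidth below are illustrative assumptions. The first strategy corresponds to plain nearest-neighbor assignment instead of the weighted average.

```python
import numpy as np
from scipy.spatial import cKDTree

def transfer_quality(cloud_points, sample_points, sample_quality, k=4, sigma=0.01):
    """Interpolate the grasping quality of nearby target sampling points onto
    every point of the rendered cloud (larger distance -> smaller weight)."""
    tree = cKDTree(sample_points)
    dist, idx = tree.query(cloud_points, k=k)            # (N, k) distances / indices
    weights = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    weights /= np.clip(weights.sum(axis=1, keepdims=True), 1e-12, None)
    return (weights * sample_quality[idx]).sum(axis=1)   # (N,) per-point quality
```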
  • in some embodiments, the generating method further includes: for each target sampling point, taking the grabbing direction of the target sampling point as the grabbing direction of the points adjacent to it in the point cloud and, considering the relative positional relationship between the visible first objects, when it is determined that the grabbing space at a point adjacent to the target sampling point is smaller than the required grabbing space, adjusting downwards the grasping quality of the points in the point cloud whose distance to that target sampling point is smaller than a set distance threshold.
  • the embodiments of the present disclosure consider that, in the stacked state, a grasping point of good quality on an object may not have enough space for the grasping operation because of adjacent objects; therefore, after the grasping quality of the points in the point cloud is determined, the grasping space is checked and the grasping quality of points affected by insufficient grasping space is adjusted downwards, for example below a set quality threshold, so that they will not be selected.
  • in some embodiments, the sample image includes a 2D image, and the generating method further includes: labeling the classification of each pixel in the 2D image, where the classification includes foreground and background, and the foreground is the first objects in the image.
  • The classification of pixels can be used to train the estimation model to distinguish foreground from background and to accurately select the foreground points (that is, the points on the first objects) from the sample image input to the estimation model, so that the predicted grasping quality only needs to be estimated for the foreground points.
  • the classification of the pixels in the 2D image can also be obtained from the classification of the points in the point cloud: by mapping the boundary between the first objects and the background in the simulation scene onto the point cloud, the classification of each point in the point cloud, that is, foreground point or background point, can be determined.
  • An embodiment of the present disclosure also provides a method for generating training data of an object grasping point estimation model, including:
  • Step 1 Collect 3D models of various sample objects, and normalize the 3D models so that the origin of the model coordinate system is placed at the center of mass of the sample object, and the first coordinate axis of the model coordinate system is consistent with the main axis of the sample object.
  • a 3D model in a format such as STereoLithography (STL for short) can be used; the position of the center of mass of the sample object can be approximated by computing the centroid of all vertices from the vertex and face information of the 3D model, and the origin of the model coordinate system is then translated to the center of mass of the sample object.
  • the principal component analysis (PCA) method can be used to confirm the main axis direction of the sample object, and then the 3D model of the sample object is rotated so that the direction of a coordinate axis of the model coordinate system is in the same direction as the main axis of the sample object.
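  • A simplified sketch of the Step 1 normalization, assuming the centroid of the mesh vertices approximates the center of mass and PCA of the vertices gives the main axis; `vertices` is an (N, 3) array read from the STL file:

```python
import numpy as np

def normalize_model(vertices):
    """Translate the vertex centroid to the origin and rotate the model so its
    principal axis (largest-variance PCA direction) lies along the x axis."""
    centered = vertices - vertices.mean(axis=0)          # centroid -> origin
    cov = np.cov(centered.T)
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending eigenvalues
    axes = eigvecs[:, ::-1]                              # columns: main axis first
    if np.linalg.det(axes) < 0:                          # keep a right-handed frame
        axes[:, -1] *= -1
    return centered @ axes                               # coordinates in the new frame
```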
  • Step 2 sampling the grabbing points of the 3D model of the sample object, obtaining and recording the first position and grabbing direction of each sampling point;
  • the sampling process in this embodiment is to perform point cloud sampling on the object model, and use the sampled point cloud to estimate the normal vector in a fixed neighborhood, and each point and its normal vector represent a sampling point.
  • the voxel sampling method or other sampling methods, such as farthest point sampling, can be used.
  • all points within a certain range of neighborhood where each sampling point is located are used to estimate the direction of the normal vector of the sampling point.
  • the method of estimating the normal vector can be to use the random sample consensus algorithm (RANSAC for short) to fit all points in the neighborhood of the sampling point to estimate a plane, and the normal vector of the plane is approximately the normal vector of the sampling point.
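  • A plain NumPy sketch of the farthest point sampling mentioned above, which picks well-spread candidate points from the surface point cloud (voxel sampling would serve equally well):

```python
import numpy as np

def farthest_point_sampling(points, num_samples):
    """Iteratively pick the point farthest from the already-selected set,
    giving roughly uniform coverage of the surface point cloud."""
    selected = [np.random.randint(len(points))]
    dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(num_samples - 1):
        next_idx = int(np.argmax(dist))
        selected.append(next_idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[next_idx], axis=1))
    return points[selected]
```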
  • Step 3 assessing the quality of the sampling points
  • the quality assessment process includes calculating the sealing quality during suction and the resisting quality against the gravitational moment during suction (the suction must be able to resist the gravitational moment to achieve a stable grasp), and estimating the grasping quality of each sampling point according to its sealing quality and resisting quality.
  • it is necessary to evaluate whether the sampled suction point (that is, the sampling point) is a suction point at which the sample object can be stably picked up.
  • The evaluation includes two aspects; the first is the sealing quality.
  • the sealing quality can be measured by approximating the end of the suction cup, with a set radius, as a polygon, projecting this polygon onto the surface of the 3D model along the grasping direction of the sampling point, and then comparing the overall perimeter of the projected polygon with the original perimeter. If the perimeter increases markedly after projection, the sealing performance is poor; conversely, if the change is small, the sealing performance is good.
  • the degree of increase can be expressed as the ratio of the increase to the original perimeter, and the ratio can be given a score according to the interval it falls into.
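  • The rim-projection check above might be approximated on a dense surface point cloud as follows; snapping the rim vertices to their nearest surface points stands in for the true projection along the grasp direction, and the 0.2 growth threshold is an illustrative assumption:

```python
import numpy as np
from scipy.spatial import cKDTree

def _perimeter(poly):
    """Sum of edge lengths of a closed polygon given as an (M, 3) array."""
    return np.linalg.norm(np.roll(poly, -1, axis=0) - poly, axis=1).sum()

def seal_quality(surface_points, contact, normal, cup_radius=0.015, n_rim=16):
    """Lay a regular polygon of the suction-cup rim around the contact point,
    snap each rim vertex to the nearest surface point, and score how much the
    rim perimeter grows after snapping (1 = tight seal, 0 = poor seal)."""
    normal = np.asarray(normal, dtype=float)
    normal /= np.linalg.norm(normal)
    # Orthonormal basis (u, v) spanning the plane perpendicular to the normal.
    u = np.cross(normal, [1.0, 0.0, 0.0])
    if np.linalg.norm(u) < 1e-6:
        u = np.cross(normal, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    angles = np.linspace(0.0, 2.0 * np.pi, n_rim, endpoint=False)
    rim = contact + cup_radius * (np.outer(np.cos(angles), u) + np.outer(np.sin(angles), v))
    # "Project" the rim onto the object by snapping to the nearest surface points.
    _, idx = cKDTree(surface_points).query(rim)
    projected = surface_points[idx]
    growth = max(0.0, _perimeter(projected) / _perimeter(rim) - 1.0)
    return float(np.clip(1.0 - growth / 0.2, 0.0, 1.0))
```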
  • the other aspect is to calculate the resisting quality against the gravitational moment when the suction cup sucks the sample object at the sampling point along the grasping direction (also called the suction point direction).
  • the resisting quality can be calculated through a "wrench resistance" modeling scheme.
  • a "wrench" is a six-dimensional vector whose first three dimensions are forces and whose last three dimensions are moments.
  • the space formed by these six-dimensional vectors is the "wrench space", and "wrench resistance" indicates whether the combined wrench of the forces and moments acting at a point can be resisted; if the gravity wrench is contained in the wrench space provided by the suction force and the torque generated by the friction force, stable suction can be provided, otherwise it cannot.
  • by converting the sealing quality and the resisting quality into scores between 0 and 1 and summing them, the suction quality evaluation result of each suction point is obtained.
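  • A heavily simplified stand-in for the wrench-resistance evaluation and the 0-to-1 score combination described above; it only compares the gravity moment about the suction point with a crude friction-based resisting moment at the cup rim, and all parameters are assumptions rather than values from the disclosure:

```python
def resisting_quality(mass_kg, com_offset_m, suction_force_n,
                      friction_coeff, cup_radius_m, g=9.81):
    """Score in [0, 1]: can the suction preload plus rim friction resist the
    gravity moment (mass * g * lateral offset of the center of mass)?"""
    if suction_force_n < mass_kg * g:          # cannot even lift the object
        return 0.0
    gravity_moment = mass_kg * g * com_offset_m
    resisting_moment = friction_coeff * suction_force_n * cup_radius_m
    return min(1.0, resisting_moment / (gravity_moment + 1e-9))

def suction_quality(seal_score, resist_score):
    """Per the scheme above: both scores lie in [0, 1] and are summed."""
    return seal_score + resist_score           # combined score in [0, 2]
```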
  • Step 4: build the initial simulation data collection scene, that is, the initial scene, load multiple first objects selected from the sample objects into it, and use a physics engine to simulate the falling dynamics and final stacking postures of the first objects.
  • for example, the 3D models of the first objects can be loaded into the simulation environment at random positions and postures, and a certain mass can be assigned to each 3D model.
  • the 3D models of the first objects then fall randomly into the material frame under simulated gravity, and the physics engine also computes the collision information between the different first objects, so that the first objects form a stacking state very close to the real scene. Based on such a scheme, second positions and postures of the first objects that closely resemble real random stacking are obtained in the simulation scene.
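  • The drop-and-settle loading of Step 4 could look like the following PyBullet sketch (PyBullet being one possible physics engine; the mesh paths, masses, step counts, and the plane used in place of the material frame are placeholders):

```python
import random
import pybullet as p
import pybullet_data

def simulate_stacking(mesh_paths, n_objects=10, mass=0.2, steps=1000):
    """Drop randomly posed object meshes under gravity and return their
    settled positions and orientations (the 'second position and posture')."""
    p.connect(p.DIRECT)
    p.setAdditionalSearchPath(pybullet_data.getDataPath())
    p.setGravity(0, 0, -9.81)
    p.loadURDF("plane.urdf")                       # stand-in for the material frame floor
    bodies = []
    for _ in range(n_objects):
        col = p.createCollisionShape(p.GEOM_MESH, fileName=random.choice(mesh_paths))
        pos = [random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1), random.uniform(0.3, 0.6)]
        orn = p.getQuaternionFromEuler([random.uniform(0, 3.14) for _ in range(3)])
        bodies.append(p.createMultiBody(baseMass=mass, baseCollisionShapeIndex=col,
                                        basePosition=pos, baseOrientation=orn))
    for _ in range(steps):                         # let the pile settle
        p.stepSimulation()
    poses = [p.getBasePositionAndOrientation(b) for b in bodies]
    p.disconnect()
    return poses
```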
  • Step 5 generating annotation data of the sample image rendered based on the simulated scene according to the capture quality of the sampling points.
  • this step maps the sampling points obtained by grasping point sampling on the 3D models, together with the grasping quality of each sampling point obtained through evaluation, into the simulated scene of stacked objects. Since the second position and posture of the 3D model of each first object are available from the simulation of the simulated scene, and the positions of the sampling points are expressed in the model coordinate system of the 3D model, it is easy to calculate the positions of these sampling points in the simulated scene.
  • a simulated camera is added at a set position in the simulated environment, and a ray-tracing-based rendering engine is used to efficiently render the 2D image (such as a texture image) and depth map of the first object in the stacked scene.
  • the rendered depth image can be converted into a point cloud. Based on the calculated positions of the sampling points in the simulated scene and the rendered point cloud of the first object, the position of each sampling point in the point cloud of the first object to which it belongs can be determined.
  • a Gaussian filter may be performed on the grasping qualities of these sampling points based on the positions of the sampling points in the same first object.
  • the target grasping quality of other pixels in the 2D image may be obtained by interpolation according to the target grasping quality of the corresponding pixel.
  • the grasping quality of pixels with insufficient grasping space can be adjusted downwards, so that when the optimal grasping point is selected, low-quality grasping points that would cause collisions are filtered out.
  • the adjustment of the capture quality here may also be performed on corresponding pixels in the 2D image.
  • the capture quality heat map of the 2D image rendered by the simulated scene can be obtained.
  • the grasping quality heat map is output as the annotation data of the sample image, but the annotation data do not have to take the form of a heat map, as long as they contain the information about the target grasping quality of the pixels in the 2D image.
  • the estimation model may be driven to learn or fit the grasping quality heat map during training.
  • An embodiment of the present disclosure also provides a device for generating training data of an object grasping point estimation model, as shown in FIG. 3 , including a processor 60 and a memory 50 storing a computer program, wherein the processor 60 executes The computer program implements the method for generating training data of an object grasping point estimation model according to any embodiment of the present disclosure.
  • the processor in the embodiments of the present disclosure and other embodiments may be an integrated circuit chip, which has a signal processing capability.
  • the processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • in the embodiments of the present disclosure, the object physical model (i.e., the 3D model) and the object geometric information are used to evaluate the quality of the grasping points based on basic physics principles, so as to ensure the soundness of the grasping point labeling.
  • domain randomization technology is used to generate a large amount of synthetic data for training the estimation model by means of random textures, random illumination, random camera positions, and so on; the estimation model can therefore bridge the domain gap between synthetic data and real data and learn the local geometric features of objects, so as to accurately complete the task of estimating object grasping points.
  • An embodiment of the present disclosure also provides a method for training an estimation model of an object grasping point, as shown in FIG. 4 , including:
  • Step 310 acquiring training data, the training data including sample images and target capture quality of pixels in the sample images;
  • Step 320: using the sample images as input data, training the estimation model of the object grasping point by machine learning; during training, the loss is computed according to the difference between the predicted grasping quality of the pixels in the sample image output by the estimation model and the target grasping quality.
  • the training method of the estimation model in the embodiments of the present disclosure learns the grasping quality of the pixels in the 2D image and then selects the optimal grasping point according to the predicted grasping quality of those pixels, which gives better accuracy and stability than directly predicting the optimal grasping point.
  • the machine learning in the embodiments of the present disclosure may be supervised deep learning, non-deep learning machine learning, and the like.
  • the training data is generated according to the method for generating training data of the object grasping point estimation model described in any embodiment of the present disclosure.
  • the network architecture of the estimation model is shown in Figure 5, including:
  • the backbone network (Backbone) 10 adopts a semantic segmentation network architecture (such as DeepLab, UNet, etc.), and is set to extract features from the input 2D image and depth image;
  • the multi-branch network 20 adopts a multi-task learning network architecture and is configured to perform prediction based on the extracted features, so as to output the predicted capture quality of pixels in the 2D image.
  • the multi-branch network (also referred to as a network head or a detection head) includes:
  • the first branch network 21 learns semantic segmentation information to distinguish the foreground and background, and is configured to output the classification confidence of each pixel in the 2D image, and the classification includes foreground and background;
  • the second branch network 23 learns the grasping quality information of pixels in the 2D image, and is configured to output the predicted grasping quality of pixels classified as foreground determined according to the classification confidence in the 2D image. For example, pixels classified as foreground with confidence greater than a set confidence threshold may be referred to as foreground pixels.
  • This example involves classification, so the training data needs to include classified data.
  • the sample image includes a 2D image and a depth image; both the backbone network 10 and the multi-branch network 20 include depth channels, and the convolutional layers therein may adopt a 3D convolutional structure.
  • the loss of the first branch network 21 is calculated based on the classification loss of all pixels in the 2D image; the loss of the second branch network 23 is based on the predicted capture of some or all pixels classified as foreground The difference between the quality and the target grasping quality is calculated; the loss of the backbone network 10 is calculated according to the total loss of the first branch network 21 and the second branch network 23 .
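  • A compact PyTorch sketch of the multi-branch head and the loss combination described above; the backbone is assumed to be any semantic segmentation network (for example a UNet variant) that returns per-pixel features, and the channel counts and loss weights are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraspHead(nn.Module):
    """Two 1x1-conv branches on shared backbone features: pixel-wise
    foreground/background logits and pixel-wise grasping quality."""
    def __init__(self, backbone, feat_ch=64):
        super().__init__()
        self.backbone = backbone                     # assumed: (B,4,H,W) -> (B,feat_ch,H,W)
        self.cls_branch = nn.Conv2d(feat_ch, 2, kernel_size=1)
        self.quality_branch = nn.Conv2d(feat_ch, 1, kernel_size=1)

    def forward(self, rgb, depth):
        feats = self.backbone(torch.cat([rgb, depth], dim=1))  # fuse color + depth channels
        return self.cls_branch(feats), self.quality_branch(feats).squeeze(1)

def grasp_loss(cls_logits, quality_pred, cls_target, quality_target, w_quality=1.0):
    """Classification loss over all pixels plus quality regression restricted
    to labeled foreground pixels, as in the training scheme above."""
    loss_cls = F.cross_entropy(cls_logits, cls_target)          # (B,2,H,W) vs (B,H,W)
    fg = (cls_target == 1) & torch.isfinite(quality_target)     # only labeled foreground
    loss_q = F.mse_loss(quality_pred[fg], quality_target[fg]) if fg.any() else quality_pred.sum() * 0
    return loss_cls + w_quality * loss_q
```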
  • the parameters of each network can be optimized using the gradient descent algorithm until the loss is minimized and the model converges.
  • during training, the depth image can also be randomly masked in blocks, for example 64*64 pixels at a time, so that the network better utilizes the structured information in the depth image.
  • after using the training data to train the above estimation model for multiple iterations, validation data are used to verify the accuracy of the trained estimation model.
  • the validation data can be generated in the same way as the training data; once the accuracy of the estimation model meets the requirements, the estimation model is considered trained and can be used, and if the accuracy does not meet the requirements, training continues.
  • the 2D image and the depth image containing the actual object to be grasped are input, and the predicted grasping quality of the pixels in the 2D image is output.
  • the embodiments of the present disclosure use a multi-task learning framework based on deep learning principles to build a grasping point estimation model, which can effectively solve the problems of high error rate and inability to distinguish adjacent objects in a simple point cloud segmentation scheme.
  • An embodiment of the present disclosure also provides a training device for an estimation model of an object grasping point, referring to FIG. 3 , which includes a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the following is implemented: A method for training an estimation model of an object grasping point described in any embodiment of the present disclosure.
  • the estimation model obtained by the training method predicts the grasping quality of pixels in a 2D image through pixel-level dense prediction: one branch performs pixel-level foreground/background classification prediction, while another branch outputs a grasping quality prediction value, i.e., the predicted grasping quality, for each pixel classified as foreground in the 2D image.
  • both the backbone network and the branch networks of the estimation model in the embodiments of the disclosure include depth channels; at the input end, the depth image containing the depth channel information is fed into the backbone network, the features learned from the depth channel are fused along the channel dimension with the features of the color 2D image, and pixel-by-pixel multi-task prediction is performed, which helps the estimation model better handle grasping point estimation in scenes where the objects to be grasped are stacked.
  • An embodiment of the present disclosure also provides a method for estimating an object grasping point, as shown in FIG. 6 , including:
  • Step 410 acquiring a scene image containing an object to be captured, where the scene image includes a 2D image, or includes a 2D image and a depth image;
  • Step 420 input the scene image into the estimation model of the object grasping point, wherein the estimation model is an estimation model trained by the training method described in any embodiment of the present disclosure;
  • Step 430 Determine the position of the grasping point of the object to be grasped according to the predicted grasping quality of the pixels in the 2D image output by the estimation model.
  • the embodiments of the present disclosure implement camera driving, and scene images of the objects to be grasped, such as 2D images and depth images, can be captured by depth cameras adapted to various industrial scenes; after the color 2D image and the depth image are acquired from the depth camera, they are cropped and scaled to the input image size required by the estimation model and then fed into the estimation model.
  • determining the position of the grasping point of the object to be grasped according to the predicted grasping quality of the pixels in the 2D image output by the estimation model includes:
  • the obtained candidate grasping points are sorted based on a predetermined rule, and an optimal candidate grasping point is determined as the grasping point of the object to be grasped according to the ranking.
  • when sorting the obtained candidate grasping points based on predetermined rules, the sorting may use predetermined heuristic rules; the heuristic rules may be set, for example, according to the distance of the grasping point from the camera, whether the grasping point lies inside the actual material frame, and whether grasping at the point would cause collisions. Using this information, the candidate grasping points are sorted and the best candidate is determined as the grasping point of the object to be grasped.
  • An embodiment of the present disclosure also provides a device for estimating the grasping point of an object, including a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the method for estimating object grasping points described in any embodiment of the present disclosure is implemented.
  • the above-mentioned embodiments of the present disclosure rely on the trained estimation model: the 2D image and the depth image captured by the camera are fed into the estimation model for forward inference, which outputs the predicted grasping quality of the pixels in the 2D image. If the number of pixels whose predicted grasping quality is greater than the set quality threshold exceeds a set number, a set number of pixels with the best predicted grasping quality, such as the top 50 or top 100, can be selected. After clustering the selected pixels and computing one or more class centers, the pixel in the 2D image nearest to each class center (it can be a single pixel or a pixel within a region) can be used as a candidate grasping point. Since the adopted estimation model achieves good accuracy, the estimation method and device of this embodiment can improve the accuracy of object grasping point estimation and thereby the success rate of grasping.
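  • The threshold / top-K / clustering selection described above could be sketched with scikit-learn's KMeans as follows; the quality threshold, K, and number of clusters are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_grasp_pixels(quality_map, fg_mask, q_thresh=0.7, top_k=100, n_clusters=3):
    """Keep foreground pixels above a quality threshold, trim to the top-K,
    cluster them, and return the pixel nearest each cluster center."""
    ys, xs = np.nonzero(fg_mask & (quality_map > q_thresh))
    if len(ys) == 0:
        return np.empty((0, 2), dtype=int)
    q = quality_map[ys, xs]
    order = np.argsort(q)[::-1][:top_k]                  # best predicted quality first
    pts = np.stack([xs[order], ys[order]], axis=1).astype(float)
    k = min(n_clusters, len(pts))
    centers = KMeans(n_clusters=k, n_init=10).fit(pts).cluster_centers_
    nearest = [pts[np.argmin(np.linalg.norm(pts - c, axis=1))] for c in centers]
    return np.asarray(nearest, dtype=int)                # candidate grasp pixels (x, y)
```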
  • An embodiment of the present disclosure also provides a robot vision system, as shown in FIG. 7 , including:
  • the camera 1 is configured to shoot a scene image containing an object to be captured, and the scene image includes a 2D image, or includes a 2D image and a depth image;
  • the control device 2 includes the estimation device of the object grasping point according to claim 20, the control device is configured to determine the position of the grasping point of the object to be grasped according to the scene image captured by the camera ; and, controlling the grabbing action performed by the robot according to the position of the grabbing point;
  • the robot 3 is configured to perform the grasping action.
  • the robot vision system of the embodiments of the present disclosure can improve the accuracy of object grasping point estimation, thereby improving the success rate of grasping.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for generating training data of the object grasping point estimation model described in any embodiment of the present disclosure, or the method for training the object grasping point estimation model described in any embodiment of the present disclosure, or the method for estimating object grasping points described in any embodiment of the present disclosure.
  • Computer-readable media may include computer-readable storage media that correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, eg, according to a communication protocol.
  • a computer-readable medium may generally correspond to a non-transitory tangible computer-readable storage medium or a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
  • a computer program product may comprise a computer readable medium.
  • such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk or other magnetic storage, flash memory, or may be used to store instructions or data Any other medium that stores desired program code in the form of a structure and that can be accessed by a computer.
  • any connection may also properly be termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then those media are included in the definition of medium.
  • disk and disc, as used here, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • processors can be implemented by one or more processors such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits.
  • processors may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
  • the technical solutions of the embodiments of the present disclosure may be implemented in a wide variety of devices or devices, including a wireless handset, an integrated circuit (IC), or a set of ICs (eg, a chipset).
  • Various components, modules, or units are described in the disclosed embodiments to emphasize functional aspects of devices configured to perform the described techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in combination with suitable software and/or firmware.
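As referenced above, the following is a minimal, hypothetical sketch of how a trained object grasping point estimation model might be applied at run time; it is an illustration under assumptions, not the implementation disclosed in this application. The model is assumed to be a callable that maps an RGB-D image to a per-pixel grasp-quality map; the highest-scoring pixel is selected and back-projected to a 3D grasping point using its depth value and the camera intrinsics. All names (estimate_grasp_point, the model callable, the intrinsics tuple) are placeholders.

```python
# Hypothetical usage sketch: pick the best grasping point from a predicted
# per-pixel grasp-quality map. Not the patent's disclosed implementation.
import numpy as np

def estimate_grasp_point(model, rgb, depth, intrinsics):
    """model: callable mapping (rgb, depth) to an HxW grasp-quality map in [0, 1].
    intrinsics: (fx, fy, cx, cy) of the camera. Returns a 3D point and its score."""
    quality_map = model(rgb, depth)
    v, u = np.unravel_index(np.argmax(quality_map), quality_map.shape)  # best pixel

    # Back-project pixel (u, v) to a 3D point in the camera frame using its depth.
    fx, fy, cx, cy = intrinsics
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z]), float(quality_map[v, u])
```

In practice the selected point would still be transformed from the camera frame to the robot base frame and checked for reachability before the grasp is executed.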

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an object grasping point estimation method, apparatus and system, a model training method, apparatus and system, and a data generation method, apparatus and system. The data generation method comprises: sampling a grasping point on the basis of a 3D model of a sample object and evaluating the grasping quality of the sampled point; and rendering a simulation scene, in which a 3D model of a first object is loaded, so as to generate a sample image for training and a target grasping quality of pixel points therein. The sample image and the target grasping quality are used as training data to train an object grasping point estimation model, and the trained model is used to estimate an object grasping point. The embodiments of the present disclosure enable automatic labeling of sample images, efficiently generate high-quality training data, and improve the accuracy of grasping point estimation.
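As a rough illustration of the data-generation idea summarized above, the sketch below samples candidate grasp points on an object's 3D model, scores them with a toy quality heuristic, projects them into a simulated camera view, and writes the scores into a per-pixel target grasp-quality map that could serve as a training label. This is a simplified sketch under stated assumptions, not the disclosed method: the upward-normal heuristic stands in for whatever physical grasp metric is actually used, a pinhole projection stands in for full scene rendering, and all function names are illustrative placeholders.

```python
# Simplified sketch: build one target grasp-quality map (training label) from a
# 3D object model and a simulated camera pose. Hypothetical placeholder code.
import numpy as np

def grasp_quality(normal):
    """Toy score favoring points whose surface normal points upward."""
    return float(np.clip(np.dot(normal, np.array([0.0, 0.0, 1.0])), 0.0, 1.0))

def make_training_sample(vertices, normals, intrinsics, pose, image_size=(480, 640)):
    """vertices, normals: (N, 3) arrays of the object model; intrinsics: 3x3 camera
    matrix; pose: 4x4 object-to-camera transform. Returns an HxW target map."""
    # 1) Sample candidate grasp points on the model surface and score each one.
    idx = np.random.choice(len(vertices), size=min(256, len(vertices)), replace=False)
    points = vertices[idx]
    qualities = np.array([grasp_quality(n) for n in normals[idx]])

    # 2) Project the sampled points with a pinhole camera (a full pipeline would
    #    render RGB/depth images of the scene instead). Assumes positive depth.
    cam = pose[:3, :3] @ points.T + pose[:3, 3:4]      # 3xK points in camera frame
    uv = intrinsics @ cam
    uv = (uv[:2] / uv[2]).T.astype(int)                # Kx2 integer pixel coordinates

    # 3) Splat each point's quality into the per-pixel target map (training label).
    h, w = image_size
    target = np.zeros((h, w), dtype=np.float32)
    for (u, v), q in zip(uv, qualities):
        if 0 <= u < w and 0 <= v < h:
            target[v, u] = max(target[v, u], q)
    return target
```

The sample image itself would come from rendering the simulation scene with the same camera parameters, so that each pixel of the rendered image is aligned with its target grasp quality.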
PCT/CN2022/135705 2021-12-29 2022-11-30 Procédé, appareil et système d'estimation de point de saisie d'objet, procédé, appareil et système d'entraînement de modèle et procédé, appareil et système de génération de données WO2023124734A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111643324.3A CN116416444B (zh) 2021-12-29 2021-12-29 Object grasping point estimation, model training and data generation method, apparatus and system
CN202111643324.3 2021-12-29

Publications (1)

Publication Number Publication Date
WO2023124734A1 true WO2023124734A1 (fr) 2023-07-06

Family

ID=86997564

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/135705 WO2023124734A1 (fr) 2021-12-29 2022-11-30 Procédé, appareil et système d'estimation de point de saisie d'objet, procédé, appareil et système d'entraînement de modèle et procédé, appareil et système de génération de données

Country Status (2)

Country Link
CN (1) CN116416444B (fr)
WO (1) WO2023124734A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116841914A (zh) * 2023-09-01 2023-10-03 星河视效科技(北京)有限公司 Rendering engine calling method, apparatus, device, and storage medium
CN117656083B (zh) * 2024-01-31 2024-04-30 厦门理工学院 Seven-degree-of-freedom grasping pose generation method, apparatus, medium, and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108058172A (zh) * 2017-11-30 2018-05-22 深圳市唯特视科技有限公司 Manipulator grasping method based on an autoregressive model
CN109598264A (zh) * 2017-09-30 2019-04-09 北京猎户星空科技有限公司 Object grasping method and apparatus
US20200061811A1 (en) * 2018-08-24 2020-02-27 Nvidia Corporation Robotic control system
CN111553949A (zh) * 2020-04-30 2020-08-18 张辉 Method for locating and grasping irregular workpieces based on deep learning from single-frame RGB-D images

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2579462B (en) * 2017-08-24 2022-02-16 Toyota Motor Europe System and method for label augmentation in video data
JP2020532440A (ja) * 2017-09-01 2020-11-12 カリフォルニア大学The Regents of the University of California Robotic systems and methods for robustly grasping and targeting objects
CN108818586B (zh) * 2018-07-09 2021-04-06 山东大学 Object center-of-gravity detection method suitable for automatic manipulator grasping
CN109159113B (zh) * 2018-08-14 2020-11-10 西安交通大学 Robot operation method based on visual reasoning
CN109523629B (zh) * 2018-11-27 2023-04-07 上海交通大学 Physics-simulation-based method for generating object semantics and pose datasets
CN109658413B (zh) * 2018-12-12 2022-08-09 达闼机器人股份有限公司 Method for detecting grasping positions of target objects for a robot
CN111127548B (zh) * 2019-12-25 2023-11-24 深圳市商汤科技有限公司 Grasping position detection model training method, and grasping position detection method and apparatus
CN111161387B (zh) * 2019-12-31 2023-05-30 华东理工大学 Method and system for synthesizing images of stacked scenes, storage medium, and terminal device
CN212553849U (zh) * 2020-05-26 2021-02-19 腾米机器人科技(深圳)有限责任公司 Object grasping manipulator
CN111844101B (zh) * 2020-07-31 2022-09-06 中国科学技术大学 Sorting planning method for a multi-fingered dexterous hand
CN113034526B (zh) * 2021-03-29 2024-01-16 深圳市优必选科技股份有限公司 Grasping method, grasping device, and robot
CN113297701B (zh) * 2021-06-10 2022-12-20 清华大学深圳国际研究生院 Method and apparatus for generating simulation datasets of stacked scenes of multiple types of industrial parts
CN113436293B (zh) * 2021-07-13 2022-05-03 浙江大学 Intelligent grasping image generation method based on a conditional generative adversarial network

Also Published As

Publication number Publication date
CN116416444A (zh) 2023-07-11
CN116416444B (zh) 2024-04-16

Similar Documents

Publication Publication Date Title
US11763550B2 (en) Forming a dataset for fully-supervised learning
WO2023124734A1 (fr) Procédé, appareil et système d'estimation de point de saisie d'objet, procédé, appareil et système d'entraînement de modèle et procédé, appareil et système de génération de données
Depierre et al. Jacquard: A large scale dataset for robotic grasp detection
CN109584298B (zh) Online self-learning method for robot autonomous object-picking tasks
Marton et al. Hierarchical object geometric categorization and appearance classification for mobile manipulation
WO2021113408A1 (fr) Synthesis of images from 3D models
CN111906782B (zh) Intelligent robot grasping method based on three-dimensional vision
CN110929795B (zh) Rapid identification and positioning method for solder joints of a high-speed wire bonding machine
CN115816460B (zh) Manipulator grasping method based on deep-learning object detection and image segmentation
CN110136130A (zh) Method and device for detecting product defects
Wada et al. Instance segmentation of visible and occluded regions for finding and picking target from a pile of objects
CN117124302B (zh) Part sorting method and apparatus, electronic device, and storage medium
CN113034575A (zh) Model construction method, pose estimation method, and object picking device
CN113894058A (zh) Deep-learning-based quality inspection and sorting method, system, and storage medium
Madessa et al. Leveraging an instance segmentation method for detection of transparent materials
CN111240195A (zh) Machine-vision-based automatic control model training and target object recovery method and device
CN115359119A (zh) Workpiece pose estimation method and device for unordered sorting scenes
Pattar et al. Automatic data collection for object detection and grasp-position estimation with mobile robots and invisible markers
Fang et al. A pick-and-throw method for enhancing robotic sorting ability via deep reinforcement learning
Sedlar et al. Imitrob: Imitation learning dataset for training and evaluating 6d object pose estimators
CN111783537A (zh) Two-stage fast grasp detection method based on object detection features
Martinson Interactive training of object detection without imagenet
Keaveny Experimental Evaluation of Affordance Detection Applied to 6-DoF Pose Estimation for Intelligent Robotic Grasping of Household Objects
Yang et al. Integrating Deep Learning Models and Depth Cameras to Achieve Digital Transformation: A Case Study in Shoe Company
TWI845797B (zh) Object recognition device and object recognition method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913989

Country of ref document: EP

Kind code of ref document: A1