CN114882214A - Method for predicting object grabbing sequence from image based on deep learning - Google Patents

Method for predicting object grabbing sequence from image based on deep learning

Info

Publication number
CN114882214A
Authority
CN
China
Prior art keywords
feature
sequence
objects
grabbing
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210344226.8A
Other languages
Chinese (zh)
Inventor
林梓尧
贾奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cross Dimension Shenzhen Intelligent Digital Technology Co ltd
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210344226.8A priority Critical patent/CN114882214A/en
Publication of CN114882214A publication Critical patent/CN114882214A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/047: Probabilistic or stochastic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for predicting an object grabbing order from an image based on deep learning, which comprises the following steps: 1) acquiring a picture of an unordered grabbing scene; 2) detecting the detection frames and segmentation masks of all objects to be grabbed in the picture with a deep segmentation network; 3) pooling the feature map regions corresponding to the different segmentation masks into feature vectors of equal length, and pooling the global feature map into a global feature vector; 4) concatenating the feature vector of each segmentation mask with the global feature vector to form the feature vector of each object, and feeding these object features, in arbitrary order, into a dedicated recurrent neural network that outputs the grabbing order of the objects. The method can predict a reasonable grabbing order in complex stacked-object scenes, which is crucial in industrial settings: it accelerates robotic grabbing and reduces collisions.

Description

Method for predicting object grabbing sequence from image based on deep learning
Technical Field
The invention belongs to the field of computer vision, and in particular relates to a method for predicting an object grabbing order from images based on deep learning.
Background
With the transition from conventional manufacturing to intelligent manufacturing, industry and computer vision research have increasingly focused on how AI technology can be used to build intelligent functions. To gradually replace tedious, customization-heavy links in traditional manufacturing with AI technology, many enterprises turn to computer vision. In industrial production, static tasks such as defect detection and volume measurement are already solved with visual AI. For industrial tasks that involve robotic grabbing, such as unordered picking and loading/unloading, several methods already attempt to recover an object's grasping pose with AI and feed it to the robot as input.
In real-world scenarios, however, the robot's interaction with the environment does not depend on a single target object alone; it is also influenced by other target instances and other scene objects. In a stacked scene, for example, an upper object may need to be grabbed before the object underneath it. To reduce collisions of the mechanical arm, existing methods compute collision-free paths with collision-avoidance algorithms. Inferring a reasonable object grabbing order from the scene picture with vision techniques both reduces collisions during actual grabbing and accelerates the whole grabbing process.
In the prior art, "Grasp Planning Based on Scene Approximation in Unrestrained Environment" plans grasps for scenes composed of basic geometric bodies so that grasping avoids collisions with scene objects. The grasp plan is obtained by ranking the grasp scores of the objects; this approach is limited and not applicable to arbitrary objects.
Disclosure of Invention
Aiming at the problem of predicting a reasonable object grabbing order when objects are stacked, the invention provides a method for predicting the object grabbing order in stacked grabbing scenes, and also provides a means of generating training data to support end-to-end training of the algorithm.
The invention is realized by at least one of the following technical schemes.
A method of predicting an object grabbing order from an image based on deep learning, comprising the steps of:
step 1, detecting all foreground objects in an image using a segmentation network, outputting a segmentation mask for every foreground object, and retaining the global feature map of the image and the object feature map preceding the mask output;
step 2, using each object's segmentation mask to cut the feature map at the mask position out of the object feature map and pooling it into the object's local feature vector; pooling the global feature map into a global feature vector; and concatenating the global feature vector to each object's local feature vector to obtain the object feature of each object;
step 3, using a recurrent neural network as an encoder and feeding the object feature vectors of all objects into the encoder in sequence to obtain a fixed-length feature vector;
step 4, taking the feature vector encoded in step 3 as the hidden feature, randomly generating an input vector, and inputting both into a grabbing order predictor; at each step the grabbing order predictor accepts a fixed-length input vector and the hidden feature obtained in the previous step and outputs an index, which points to one feature in the object feature sequence; the object corresponding to that feature is the object predicted to be grabbed at the current step; the number of prediction steps equals the number of detected objects, and the predicted index sequence is finally the grabbing order of the objects.
Further, the segmentation network comprises a two-class classifier that separates foreground objects from background objects.
Further, the step 2 comprises the following steps:
21. use the segmentation network to detect the masks Mask_i, i ∈ {1, 2, …, N}, of all foreground objects, where N is the number of objects detected by the segmentation network in the current picture; use each foreground object's mask to mask the feature layer preceding the predicted object mask, pool the masked features, and then use a linear network to convert the number of feature channels into a fixed-length local object feature f_i^local, thereby generating a local feature vector for every object;
22. directly pool the global feature layer with the most complete resolution, and use another linear network to convert the pooled features into the scene's global feature f_global;
23. concatenate the local feature and global feature of an object into the object feature f_i^obj = [f_i^local; f_global].
Further, at each cycle the encoder takes one object feature as input and outputs a corresponding hidden feature h_i^enc; the last encoded hidden feature h_N^enc serves as the feature encoding of the object feature sequence: h_N^enc = Encoder(f_1^obj, f_2^obj, …, f_N^obj), where h_N^enc is the hidden feature output by the last encoding step, f_i^obj is an object feature, N is the total number of objects, and h_N^enc is therefore the result of encoding the features of all objects.
Further, step 4 comprises the steps of:
41. use an LSTM recurrent neural network as the grabbing order predictor; take the feature vector encoded in step 3 as the first hidden feature h_1^dec, randomly generate a first input vector x_1 ∈ R^m, where m is the fixed input feature length, and feed the hidden feature and the input vector into the grabbing order predictor;
at each step the grabbing order predictor accepts one hidden feature h_j^dec and one input vector x_j, and outputs an output vector o_j, where j indicates that the current, j-th cycle is predicting the j-th grabbing target and x_j is the feature corresponding to the grabbing target predicted in the previous cycle; j = 1 denotes the first prediction step, in which case x_1 is a randomly generated vector used as input;
42. for the feature vector output at each step of the grabbing order predictor, compute an index into the object feature sequence using the PointerNet mechanism, and take the object corresponding to that index as the object grabbed at this step;
43. repeat steps 41 to 42 h times, where h is the number of detected objects, to obtain an index sequence of length h; this index sequence is the object grabbing order.
Further, the labeled data is a large batch of data with grabbing-order labels generated automatically by simulation and rendering; the grabbing-order labels are generated with a heuristic algorithm whose specific steps are as follows (a code sketch of this procedure is given after the list):
51. starting the construction of a scene, randomly importing n objects into a simulator, copying m instances of each object, and randomly generating the number of the objects and instance data in each scene construction;
52. dividing a p multiplied by p grid by taking the world center of the simulator as an origin, wherein the size of each square of the grid is the average diameter of an introduced object plus a fixed constant d;
53. place objects starting from the edge of the grid: each time, randomly select one instance from the object instances, place it at the center of the corresponding grid square, lift it along the z axis, and then apply a random translation in the xy plane; a texture is randomly assigned to the instance each time;
54. repeat step 53 until the grid is fully occupied, then continue with a (p-2) × (p-2) grid with the same center and the same square size;
56. repeat steps 53-54 until three layers have been placed; if the number of instances is insufficient or other conditions are not met, stop the scene construction and enter the next stage; the final scene has the shape of a suspended pyramid;
57. take the world coordinate origin as the center of a sphere, generate a hemispherical surface in the positive z direction, and uniformly sample o positions on this surface at which to place a virtual camera;
58. render the sampled camera positions one by one, with the camera always facing the world origin; randomly perturb the lighting and object surface materials at every rendering, then render an image;
59. and filtering out objects with the shielding ratio exceeding a set ratio.
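The following is a minimal Python sketch of the placement heuristic above. The simulator handle sim and its methods copy_instance and drop are hypothetical placeholders, since no specific simulator API is named in this description; the texture and camera randomization steps are omitted.

import random

def build_stacked_scene(sim, object_models, p, avg_diameter, d=0.05,
                        lift_range=(0.05, 0.10), jitter=0.02, layers=3):
    # object_models: the n randomly imported objects; m copies are made of each (steps 51-52).
    m = random.randint(1, 10)
    instances = [sim.copy_instance(obj) for obj in object_models for _ in range(m)]
    random.shuffle(instances)
    cell = avg_diameter + d                      # grid square = average diameter + constant d
    placed = []                                  # placement order later yields the grab-order labels
    for layer in range(layers):                  # p x p, then (p-2) x (p-2), ... with the same center
        side = p - 2 * layer
        if side <= 0:
            break
        half = (side - 1) / 2.0
        for i in range(side):
            for j in range(side):
                if not instances:                # abort when the instances run out
                    return placed
                inst = instances.pop()
                x = (i - half) * cell + random.uniform(-jitter, jitter)
                y = (j - half) * cell + random.uniform(-jitter, jitter)
                z = random.uniform(*lift_range) + layer * avg_diameter
                sim.drop(inst, x, y, z)          # hypothetical call: lift the instance and let it fall
                placed.append(inst)
    return placed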
Alternatively, the labeled data is obtained by capturing RGB images of a stacked scene from multiple angles with a camera, detecting the object frames with a trained segmentation network, and labeling the grabbing order of the objects manually.
Furthermore, images synthesized by simulation and rendering are used as labeled data; data synthesized in this way is labeled automatically and needs no additional manual annotation.
Furthermore, the segmentation network produces a feature map of the image while obtaining the segmentation masks of the objects in the image, and it detects all foreground objects in the image; the PointerNet network used in the grabbing-order prediction stage performs cyclic prediction over any number of objects and is not limited to a fixed number. The invention can therefore effectively predict the grabbing order of any number of objects to be grabbed in the scene.
Further, when extracting object features, the image features of the region where the object is located are used as the object's features, and a global feature is concatenated to them, so that the resulting features carry both local and global information and provide sufficient information for the subsequent grabbing-order prediction.
Compared with the prior art, the invention has the following beneficial effects:
the invention is based on a deep learning method, uses global information and local information to construct the relation between an object and a scene, and infers a reasonable object grabbing sequence from the relation. The object grabbing of the stacked scene is performed by utilizing the grabbing sequence, so that the collision can be reduced, and the grabbing process is accelerated. The invention is not limited to simple basic geometries, but applies to any object. Meanwhile, the invention uses an end-to-end algorithm, does not need to carry out more searches and simplifies the whole capturing process.
Drawings
FIG. 1 is a flowchart of the method for predicting an object grabbing order from an image based on deep learning according to the present invention;
FIG. 2 is an architecture diagram of the method for predicting an object grabbing order from an image based on deep learning according to the present invention;
FIG. 3 is a side view of a scene constructed in the present embodiment;
FIG. 4 is a schematic diagram of a picture generated in the present embodiment.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
A method of predicting an object grabbing order from an image based on deep learning, comprising the steps of:
step 1, foreground object detection and segmentation mask prediction: all target objects, i.e. foreground objects, are treated as one class, and irrelevant background objects as another class. A segmentation network detects all foreground objects in the image and outputs a segmentation mask for each of them, while the global feature map F_global of the image and the object feature map F_obj preceding the mask output are retained.
A segmentation network is used to separate foreground objects from background objects. The segmentation network is adapted from an existing segmentation network, mainly by changing the training procedure: the invention only needs to detect all foreground objects of interest, without distinguishing concrete object classes, so the classification head of the segmentation network is turned into a two-class classifier that separates foreground from background objects. The adapted segmentation network can be trained on existing segmentation datasets; the only difference is that all foreground-like objects in the dataset are assigned to the foreground class and background-like objects to the background class, i.e. the number of object classes in the training dataset is 2.
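As an illustration of the two-class setup described above (a sketch under stated assumptions, not the patented code), an existing Mask R-CNN such as the torchvision implementation could be re-headed for foreground/background as follows; the helper name build_binary_maskrcnn is illustrative.

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_binary_maskrcnn(num_classes: int = 2):
    # num_classes = 2 means background (0) and foreground (1).
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box classification head with a two-class head.
    in_feat = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
    # Replace the mask head accordingly.
    in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)
    return model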
Step 2, generation of foreground object feature vectors: after the segmentation masks of all foreground objects and the feature maps of the whole image are obtained, each object's segmentation mask is used to cut the feature map at the mask position out of the object feature map F_obj and pool it into the object's local feature vector; the global feature map F_global is feature-pooled into a global feature vector, which is concatenated to every object's local feature vector to obtain the object feature f_i^obj of each object.
Feature vector generation for foreground objects thus combines local and global features. The specific steps are:
21. use the segmentation network to detect the masks Mask_i, i ∈ {1, 2, …, N}, of all foreground objects, where N is the number of objects the segmentation network detects in the image; use each mask to mask the feature layer preceding the predicted object mask, pool the masked features, and convert the number of feature channels with one linear network layer into a fixed-length local object feature f_i^local, thereby generating a local feature vector for every object;
22. directly pool the global feature layer with the most complete resolution, and use another linear network to convert the pooled features into the scene's fixed-length global feature f_global;
23. concatenate the local feature and global feature of an object into the object feature f_i^obj = [f_i^local; f_global].
Adding the global image feature to the object's local feature lets the grabbing-order prediction network better capture the object's position relative to the scene and to the other objects.
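A minimal sketch of step 2 follows, assuming masked average pooling for the local features, max pooling for the global feature, and the 256/512 dimensions used in the embodiments below; the module name and the exact pooling choices are illustrative, not prescribed by this description.

import torch
import torch.nn as nn

class ObjectFeatureBuilder(nn.Module):
    # Builds per-object features by concatenating mask-pooled local features
    # with a pooled global scene feature.
    def __init__(self, obj_channels: int, global_channels: int, dim: int = 256):
        super().__init__()
        self.local_proj = nn.Linear(obj_channels, dim)
        self.global_proj = nn.Linear(global_channels, dim)

    def forward(self, obj_fmap, global_fmap, masks):
        # obj_fmap: (C_o, H, W) feature map preceding the mask output
        # global_fmap: (C_g, H', W') full-resolution global feature map
        # masks: (N, H, W) binary (0/1 float) segmentation masks of the N detected objects
        g = global_fmap.flatten(1).max(dim=1).values            # (C_g,) max pooling
        f_global = self.global_proj(g)                           # (dim,)
        feats = []
        for m in masks:
            area = m.sum().clamp(min=1)
            pooled = (obj_fmap * m).flatten(1).sum(dim=1) / area # masked average pooling, (C_o,)
            f_local = self.local_proj(pooled)                    # (dim,)
            feats.append(torch.cat([f_local, f_global]))         # (2*dim,) per object
        return torch.stack(feats)                                # (N, 2*dim) object feature sequence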
Step 3, feature encoding: a recurrent neural network is used as the encoder, and the object feature vectors of all objects are fed into the encoding network in sequence to obtain a fixed-length feature vector. The object feature sequence is unordered, so the order in which features are fed into the encoding network does not matter. A recurrent neural network is chosen as the encoder because different scenes contain different numbers of foreground object instances, and a recurrent network can accommodate a varying number of objects.
The feature encoder in step 3 encodes the object feature sequence with a recurrent neural network. Step 2 yields the object feature sequence (f_1^obj, f_2^obj, …, f_N^obj), i.e. the sequence of object features of all objects. At each step the feature encoder takes one object feature as input and outputs a hidden feature h_i^enc. The last encoded hidden feature h_N^enc serves as the feature encoding of the object feature sequence: h_N^enc = Encoder(f_1^obj, f_2^obj, …, f_N^obj), where h_N^enc is the hidden feature output by the last encoding step and N is the total number of objects; h_N^enc is therefore the result of encoding the features of all objects.
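A minimal sketch of the step-3 encoder, assuming a single-layer LSTM as in the embodiments; the hidden size of 512 is an assumption chosen here to match the 512-dimensional object features.

import torch
import torch.nn as nn

class ObjectSequenceEncoder(nn.Module):
    # Encodes a variable-length, unordered set of object features with a single-layer LSTM
    # and returns the last hidden state as the fixed-length code h_N^enc.
    def __init__(self, feat_dim: int = 512, hidden_dim: int = 512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=1, batch_first=True)

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        # obj_feats: (N, feat_dim); the order is irrelevant, so a random permutation is used.
        perm = torch.randperm(obj_feats.size(0))
        seq = obj_feats[perm].unsqueeze(0)       # (1, N, feat_dim)
        _, (h_n, _) = self.lstm(seq)             # h_n: (1, 1, hidden_dim)
        return h_n.squeeze(0).squeeze(0)         # (hidden_dim,) fixed-length code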
Step 4, grabbing order prediction: the feature vector encoded in step 3 is taken as the first hidden feature, a first input vector is randomly generated, and both are input into the grabbing order predictor, which is also a recurrent neural network. At each step the grabbing order predictor accepts a fixed-length input feature vector and the hidden feature vector obtained in the previous step, selects one object from the object feature sequence, and outputs its index in the object feature sequence as the object selected at the current step. The number of prediction steps equals the number of detected objects, so an index sequence over the objects is finally predicted, and this sequence is the grabbing order of the objects. The specific steps are:
41. a recurrent neural network is used as the grabbing order predictor. At each step the predictor accepts one hidden feature h_j^dec and one input vector x_j, where j indicates that the current, j-th cycle is predicting the j-th grabbing target and x_j is the feature corresponding to the grabbing target predicted in the previous cycle; when j = 1, i.e. at the very first prediction step, x_1 is a randomly generated vector used as input. At each step the predictor also outputs an output vector o_j.
More specifically, in order to be able to output one index step by step from the object feature sequence, the present embodiment uses PointerNet as a grab order predictor.
42. the sequence feature obtained after the feature encoding of step 3 is used as the first hidden feature h_1^dec of the object grabbing order predictor, and a simple vector is generated as the input feature x_1 ∈ R^m of the first step, where m is the fixed input feature length.
43. for the feature vector output at each step of the predictor, an index is computed from the object feature sequence with the PointerNet mechanism, in a manner similar to an attention mechanism, and the object corresponding to the index is taken as the object grabbed at this step. PointerNet is adopted as the grabbing-order prediction network because 1) the number of objects is variable and different scene images contain different numbers of objects, and 2) the output of the network is a discrete number at each step, representing the object that should be grabbed at that step.
44. The above steps are cycled h times, h being the number of detected objects. Thus, an index sequence with the length h is obtained, and the index sequence is the object grabbing sequence.
Step 5, training the segmentation network and the recurrent neural networks with automatically synthesized labeled data: mass training uses data produced by combined simulation and rendering, and real data is used for fine-tuning. To reduce the cost of manual annotation, the invention also provides a method for automatically generating large batches of training data with grabbing-order labels from simulation-rendered synthetic data. A simulation engine builds virtual unordered stacked grabbing scenes in which objects are placed so that they gradually overlap, so the grabbing-order labels can be acquired automatically; a renderer then renders large amounts of training data with grabbing orders from many viewpoints while randomizing lighting, materials, and similar parameters. Additional data is collected from real scenes with an RGB camera and annotated manually to obtain grabbing orders that match human intuition. The base network trained on synthetic data is then fine-tuned with the manually labeled real training data so that it generalizes better to real scene pictures.
Training data is acquired from three sources. The first is to automatically generate large batches of data with grabbing-order labels by simulation and rendering. The grabbing-order labels are generated with a simple heuristic algorithm, whose specific steps are:
51. start the construction of a scene.
52. randomly import n objects into the simulator and copy m instances of each object; the number of objects and the instance data are randomly generated each time a scene is constructed.
53. divide a p × p grid with the world center of the simulator as the origin; the size of each grid square is the average diameter of the imported objects plus a fixed constant d.
54. objects are placed from left to right and top to bottom: each time one instance is randomly selected from the object instances, placed at the center of the corresponding grid square, and lifted a certain distance along the z axis, for example 5 cm; this distance is a parameter to be tuned and is generally sampled randomly from 5-10 cm. The instance is then translated randomly within a certain radius in the xy plane, and a texture is randomly assigned to it each time.
55. repeat step 54 until the grid is fully occupied; then continue with a (p-2) × (p-2) grid with the same center and the same square size.
56. repeat steps 54-55 until three layers have been placed. If the number of instances is insufficient or other conditions are not met, the scene construction is aborted and the next phase is entered. The resulting scene has the shape of a suspended pyramid. Fig. 3 shows a side view of one scene constructed in this embodiment.
57. the world coordinate origin is taken as the center of a sphere and a certain distance as the radius; this radius generally corresponds to the distance between the camera and the target scene in the actual application, so the camera-to-scene-center distance of the application can be taken as the midpoint, extended by plus or minus 0.5 meter, and the final radius sampled randomly from this range. A hemispherical surface is generated in the positive z direction, and o positions are uniformly sampled on the surface at which to place a virtual camera.
58. the sampled camera positions are rendered one by one, with the camera always facing the world origin; the lighting and the object surface materials are randomly perturbed at every rendering, after which an image is rendered. For example, a random number in [0, 1] is sampled before each rendering, and when the number is smaller than 0.5, i.e. with probability one half, a material is randomly selected from a material package and applied to the object surface through the simulator. Material packages can be downloaded directly from the web, for example the cctexture texture package.
59. objects that are occluded beyond a certain proportion are filtered out. By projecting each object separately, its complete segmentation mask Mask_full is obtained, and the area ratio between the segmentation mask visible on the object in the actual rendering and the complete mask is computed:
p_mask = Area(Mask_visib) / Area(Mask_full)
where Area() is the area occupied by a segmentation mask and Mask_visib is the visible segmentation mask of the object in the actual rendered image, i.e. with the parts occluded by other objects removed. The occlusion threshold can be set to 0.5: when p_mask < 0.5 the object is removed because it is occluded too much by other objects.
Because the objects are placed by successive free fall, an object placed later can be assumed to be grabbed earlier, which automatically yields the grabbing-order labels. The bounding box and segmentation mask of each object can likewise be computed from the renderer output. Data obtained in this way can therefore be used to train the entire neural network of the method.
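A sketch of the occlusion filtering and automatic grabbing-order labeling described above; the dictionary keys placement_index, mask_visible, and mask_full are illustrative names introduced here, not part of this description.

import numpy as np

def occlusion_ratio(mask_visible: np.ndarray, mask_full: np.ndarray) -> float:
    # p_mask = Area(Mask_visib) / Area(Mask_full): fraction of the object still visible.
    full = int(mask_full.astype(bool).sum())
    return float(mask_visible.astype(bool).sum()) / max(full, 1)

def label_scene(objects, threshold: float = 0.5):
    # objects: list of dicts with 'placement_index', 'mask_visible', 'mask_full' (HxW boolean arrays).
    kept = [o for o in objects
            if occlusion_ratio(o["mask_visible"], o["mask_full"]) >= threshold]
    # Later-placed objects rest on top, so they are grabbed first.
    kept.sort(key=lambda o: o["placement_index"], reverse=True)
    for rank, o in enumerate(kept):
        o["grab_order"] = rank
    return kept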
Another way to acquire training data is to use a real camera to acquire RGB images of a stacked scene at multiple angles, detect an object frame of an object using a segmentation network trained in advance, and label the grasping order of the object by a manual labeling way.
Example 2
As shown in fig. 1, the method for predicting the object grabbing sequence from the image based on the deep learning of the present invention includes the following steps:
s1, foreground object detection and segmentation mask generation: this step detects the masks of all foreground objects from the image, and in the process, obtains the feature map of the objects and the global feature map of the scene image.
Specifically, any prior-art object segmentation neural network can be used to generate the segmentation masks; this embodiment adopts MaskRCNN as the segmentation network to predict the segmentation masks and generate the feature maps. The segmentation network in fig. 2 is illustrated with the MaskRCNN framework.
A characteristic of the method is that it does not classify objects into specific categories but divides all objects in the image into a scene (foreground) object class and a background object class. The background class mainly comprises the tabletop, the floor, the bin in which objects are placed, and the like. Using an existing segmentation neural network in this method therefore only requires setting the number of output classes of the segmentation network to 2, and adjusting the class ID of each object during training to foreground (1) or background (0).
S2, generating local features and global features of the object, connecting the local features and the global features to generate the object features: in order to enable the capture sequence prediction network to perceive the relative position relation of each object relative to other objects and scenes, the method fuses the global features and the local features of the objects in the process of generating the object features.
Specifically, for the local features of an object, the feature map preceding the object segmentation mask output is used as the feature source; the object's segmentation mask is laid over it and the mask-covered part is extracted. This step is illustrated at label 1 in fig. 2. The mask-covered feature area is feature-pooled, and the pooled feature vector is then mapped into a fixed-length feature space with a one-dimensional convolution or a linear layer. This embodiment maps the features to a feature space of dimension 256, i.e. the local object feature f_i^local ∈ R^256. Feature spaces of different dimensions may suit scenes of different complexity, so 256 is not a strictly required number here; other values are possible.
The global feature of the image is illustrated at label 2 in fig. 2. Global pooling is performed directly on the feature map of the image with the most complete resolution; common pooling operations such as mean pooling and max pooling may be used, and this embodiment uses max pooling. The pooled global feature is mapped to a fixed-length feature space, again of dimension 256 in this embodiment, i.e. f_global ∈ R^256.
The local feature f_i^local of each object is concatenated with the global feature to obtain the object feature f_i^obj = [f_i^local; f_global]. In this embodiment the object feature is a feature vector of length 512, where f_global is the global feature vector obtained in the previous step.
S3, the object feature sequence is feature-encoded with an encoder to generate the combined feature of the object feature sequence: the previous steps yield the object feature sequence (f_1^obj, f_2^obj, …, f_N^obj), where N is the number of detected foreground objects. The encoder encodes this feature sequence into one feature vector that fuses local and global information.
Further, the encoder here uses a recurrent neural network, both to accommodate different numbers of objects and because a recurrent network encodes sequence-like content well. Variants of various recurrent neural networks may be used as the encoder; this embodiment uses a single-layer LSTM network. The object feature sequence is in fact unordered, so at each cycle step of the encoder a feature vector that has not yet been selected is chosen at random from the feature sequence as the encoder input of the current step. This is repeated N times, N being the number of objects, and the hidden feature output at the N-th step is used as the combined feature of the object feature sequence.
S4, predicting the object grabbing order: in order to pick the objects to be grabbed step by step from the object feature sequence and finally recover an object grabbing order, the grabbing order predictor uses a special recurrent neural network; more specifically, this embodiment uses PointerNet as the order predictor. At each cycle step the predictor receives one hidden feature h_j^dec and one input feature x_j, and the output feature o_j of each step is combined with the object feature sequence (f_1^obj, …, f_N^obj) to compute an object selection probability distribution vector, from which the index with the highest probability is selected as the object output of this step. The probability distribution vector is computed as:
1. u_j^i = v^T tanh(W_1 f_i^obj + W_2 o_j), i ∈ {1, …, N}
2. p_j = softmax(u_j), p_j ∈ R^N
In this calculation, v^T, W_1 and W_2 are all learnable network parameters. u_j^i is the correlation coefficient between the feature output at the j-th predictor step and the i-th object feature; each step of the predictor output computes correlation coefficients with all object features, i.e. with the whole object feature sequence, giving the correlation coefficient vector u_j. p_j is the probability distribution computed from the correlation coefficient vector with the softmax function, and p_j^i, the i-th value of p_j, is the predicted probability of grabbing object i at the j-th cycle step. N is the number of objects in the image, so p_j is a vector of length N, p_j ∈ R^N. The index with the highest probability in p_j is selected as the object of this step, and the index is added to the pool of already selected indexes so that later cycles do not select it again.
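A minimal sketch of the pointer-style selection above; the attention dimension of 256, the bias-free linear layers, and the explicit masking of already selected indexes are illustrative implementation choices, with the standard PointerNet scoring form assumed.

import torch
import torch.nn as nn

class PointerSelector(nn.Module):
    # Per-step scoring: u_j[i] = v^T tanh(W1 f_i^obj + W2 o_j), p_j = softmax(u_j).
    def __init__(self, obj_dim: int, out_dim: int, att_dim: int = 256):
        super().__init__()
        self.W1 = nn.Linear(obj_dim, att_dim, bias=False)
        self.W2 = nn.Linear(out_dim, att_dim, bias=False)
        self.v = nn.Linear(att_dim, 1, bias=False)

    def forward(self, obj_feats, o_j, selected):
        # obj_feats: (N, obj_dim); o_j: (out_dim,); selected: list of already chosen indexes
        u_j = self.v(torch.tanh(self.W1(obj_feats) + self.W2(o_j))).squeeze(-1)  # (N,)
        if selected:                                # forbid re-selecting an object
            mask = torch.zeros_like(u_j, dtype=torch.bool)
            mask[torch.tensor(selected)] = True
            u_j = u_j.masked_fill(mask, float("-inf"))
        p_j = torch.softmax(u_j, dim=0)             # probability distribution over the N objects
        return int(torch.argmax(p_j)), p_j          # index grabbed at this step, and p_j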
S5, network training on large-scale data with automatically synthesized labeled data: existing datasets containing an object grabbing order are small, so obtaining sufficient training data is also part of the invention. Manual labeling is a common way of acquiring data; in this embodiment, the steps of acquiring data by manual annotation are as follows.
a) within a working range, usually a material bin or a worktable, 5-20 instances of 2-3 objects are placed at random; images are captured by a camera from different angles above the scene, and 10-30 images can be acquired for each scene.
b) foreground and background object detection and segmentation are performed with a segmentation network trained on other datasets.
c) the more accurate detection frames are screened manually, and the segmentation masks are optimized manually at the same time.
d) the grabbing order of the detected objects is labeled using human expert knowledge.
Further, the method uses a simulation and rendering mode to generate larger-scale training data, and the specific steps are as follows
1) start the construction of a scene.
2) randomly import n objects into the simulator and copy m instances of each object; the number of objects and the instance data are randomly generated each time a scene is constructed. In this embodiment n is sampled from 2-5 and m from 1-10.
3) divide a p × p grid with the world center of the simulator as the origin; in this embodiment p is randomly chosen from 3-7 each time. The size of each grid square is the average diameter of the imported objects plus a fixed constant d, set to 5 cm in this embodiment.
4) objects are placed from left to right and top to bottom: each time one instance is randomly selected from the object instances, placed at the center of the corresponding grid square, and lifted along the z axis by a distance sampled randomly from 3-8 cm; a random translation within a certain radius is then applied in the xy plane, and a texture is randomly assigned to the instance each time.
5) Repeating step 4) until the grid is fully placed. The grid of (p-2) × (p-2) is then continued with the same center and the same size.
6) Repeating steps 4) -5) until three layers are placed. If the number of instances is insufficient or other conditions are not met, the scene build is aborted and the next phase is entered. The resulting scene exhibits a shape of a suspended pyramid pattern. A side view of one constructed scene in this embodiment is given in fig. 3.
7) the world coordinate origin is used as the center of a sphere and a certain distance as the radius; a hemispherical surface is generated in the positive z direction, and o positions are uniformly sampled on the surface at which to place a virtual camera. Here o is not a constant value and may generally be 50 to 200.
8) the sampled camera positions are rendered one by one, with the camera always facing the world origin; the lighting and the object surface materials are randomly perturbed at every rendering, after which an image is rendered.
9) objects that are occluded beyond a certain proportion are filtered out; in this embodiment, objects occluded by more than 50% are removed. Fig. 4 gives an example of a generated picture; the number on each box represents the grabbing order of the object in that box.
10) because the objects are placed by successive free fall, an object placed later can be assumed to be grabbed earlier, which automatically labels the grabbing order. The bounding box and segmentation mask of each object can likewise be computed from the renderer output, so data obtained in this way can be used to train the entire neural network of the method.
Example 3
As shown in fig. 1, the method for predicting the object grabbing sequence from the image based on the deep learning of the present invention includes the following steps:
s1, foreground object detection and segmentation mask generation: this step detects the masks of all foreground objects from the image, and in the process, obtains the feature map of the objects and the global feature map of the scene image.
Specifically, the segmentation masks may be generated with any prior-art object segmentation neural network; this embodiment adopts the MaskRCNN object segmentation method, and the segmentation network in fig. 2 is illustrated with the MaskRCNN framework.
S2, a camera is set up and a number of real scenes are constructed, with images acquired from different camera positions for each scene. Approximately 10-30 scenes are constructed, each with 25-50 real images taken from different angles. Labeling is performed with a labeling tool such as LabelMe. The annotation types include: foreground/background object classes, foreground/background object detection frames, foreground/background object segmentation masks, and the foreground object grabbing order.
The parameters of the backbone of a MaskRCNN pre-trained on other datasets, for example a MaskRCNN trained on the COCO dataset, are fixed, and the class output of the network is changed to 2 classes. The segmentation network is then retrained with the labeled data.
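As a usage illustration of this fine-tuning setup, reusing the illustrative build_binary_maskrcnn helper sketched earlier in this description, the COCO-pretrained backbone can be frozen and only the re-initialized two-class heads trained; the optimizer settings below are arbitrary examples, not values given in this description.

import torch

model = build_binary_maskrcnn()                    # two-class Mask R-CNN from the earlier sketch
for p in model.backbone.parameters():              # keep the COCO-pretrained trunk fixed
    p.requires_grad = False
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.005, momentum=0.9)   # illustrative settings
# The 2-class box classifier and mask predictor heads are then retrained on the labeled data.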
S3, generating object local features and global features, connecting the two features to generate object features: in order to enable the capture sequence prediction network to perceive the relative position relation of each object relative to other objects and scenes, the method fuses the global features and the local features of the objects in the process of generating the object features.
Specifically, for the local features of an object, the feature map preceding the object segmentation mask output is used as the feature source; the object's segmentation mask is laid over it and the mask-covered part is extracted. This step is illustrated at label 1 in fig. 2. The mask-covered feature area is feature-pooled, and the pooled feature vector is then mapped into a fixed-length feature space with a one-dimensional convolution or a linear layer. This embodiment maps the features to a feature space of dimension 256, i.e. the local object feature f_i^local ∈ R^256. Feature spaces of different dimensions may suit scenes of different complexity, so 256 is not a strictly required number here; other values are possible.
The global feature of the image is illustrated at label 2 in fig. 2. Global pooling is performed directly on the feature map of the image with the most complete resolution; common pooling operations such as mean pooling and max pooling may be used, and this embodiment uses max pooling. The pooled global feature is mapped to a fixed-length feature space, again of dimension 256 in this embodiment, i.e. f_global ∈ R^256.
The local feature f_i^local of each object is concatenated with the global feature to obtain the object feature f_i^obj = [f_i^local; f_global]. In this embodiment the object feature is a feature vector of length 512, where f_global is the global feature vector obtained in the previous step.
S4, the object feature sequence is feature-encoded with an encoder to generate the combined feature of the object feature sequence: the previous steps yield the object feature sequence (f_1^obj, f_2^obj, …, f_N^obj), where N is the number of detected foreground objects. The encoder encodes this feature sequence into one feature vector that fuses local and global information.
Further, the encoder here uses a recurrent neural network, both to accommodate different numbers of objects and because a recurrent network encodes sequence-like content well. Variants of the recurrent neural network may be used as the encoder; this embodiment uses a single-layer LSTM recurrent neural network. The object feature sequence is in fact unordered, so at each cycle step of the encoder a feature vector that has not yet been selected is chosen at random from the feature sequence as the encoder input of the current step. This is repeated N times, N being the number of objects, and the hidden feature output at the N-th step is used as the combined feature of the object feature sequence.
S5, predicting the object grabbing order: in order to pick the objects to be grabbed step by step from the object feature sequence and finally recover an object grabbing order, the grabbing order predictor uses a special recurrent neural network; more specifically, this embodiment uses PointerNet as the order predictor. At each cycle step the predictor receives one hidden feature h_j^dec and one input feature x_j, and the output feature o_j of each step is combined with the object feature sequence (f_1^obj, …, f_N^obj) to compute an object selection probability distribution vector, from which the index with the highest probability is selected as the object output of this step. The probability distribution vector is computed as:
u_j^i = v^T tanh(W_1 f_i^obj + W_2 o_j), i ∈ {1, …, N}
p_j = softmax(u_j), p_j ∈ R^N
In this calculation, v^T, W_1 and W_2 are all learnable network parameters. u_j^i is the correlation coefficient between the feature output at the j-th predictor step and the i-th object feature; each step of the predictor output computes correlation coefficients with all object features, i.e. with the whole object feature sequence, giving the correlation coefficient vector u_j. p_j is the probability distribution computed from the correlation coefficient vector with the softmax function, and p_j^i, the i-th value of p_j, is the predicted probability of grabbing object i at the j-th cycle step. N is the number of objects in the image, so p_j is a vector of length N, p_j ∈ R^N. The index with the highest probability in p_j is selected as the object of this step, and the index is added to the pool of already selected indexes so that later cycles do not select it again.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (10)

1. A method for predicting an object capture order from an image based on deep learning is characterized in that: the method comprises the following steps:
step 1, detecting all foreground objects in an image by using a segmentation network, simultaneously outputting segmentation masks for all foreground objects, and reserving a global feature map of the image and an object feature map before outputting the masks;
step 2, cutting out a feature map of a mask position from the object feature map by using a segmentation mask of the object, pooling the feature map to obtain a local feature vector of the object, pooling the global feature map to obtain a global feature vector, and connecting the global feature vector to the local feature vector of each object to obtain the object feature of each object;
step 3, using a cyclic neural network as an encoder, and sequentially sending object feature vector sequences of all objects into the encoder to finally obtain a feature vector with a fixed length;
step 4, taking the feature vector encoded in step 3 as a hidden feature and randomly generating an input vector, inputting both into a grabbing order predictor; each step of the grabbing order predictor receives a fixed-length input vector and the hidden feature obtained in the previous step and outputs an index, which points to one feature in the object feature sequence; the object corresponding to that feature is the object predicted to be grabbed at the current step; the number of cyclic prediction steps is the number of detected objects, and the predicted index sequence is finally the grabbing order of the objects.
2. The method of claim 1, wherein the method for predicting the object grabbing sequence from the image based on the deep learning is characterized in that: the segmentation network includes a two-class classifier for separating foreground objects from background objects.
3. The method of claim 1, wherein the method for predicting the object grabbing sequence from the image based on the deep learning is characterized in that: the step 2 comprises the following steps:
21. use the segmentation network to detect the masks Mask_i, i ∈ {1, 2, …, N}, of all foreground objects, where N is the number of objects detected by the segmentation network in the current picture; use the foreground object's mask to mask the feature layer preceding the predicted object mask, pool the masked features, and then use a linear network to convert the number of feature channels into a fixed-length local object feature f_i^local, thereby generating a local feature vector for every object;
22. directly pool the global feature layer with the most complete resolution, and use another linear network to convert the pooled features into the scene's global feature f_global;
23. concatenate the local feature and global feature of an object into the object feature f_i^obj = [f_i^local; f_global].
4. The method of claim 1, wherein the method for predicting the object grabbing sequence from the image based on the deep learning is characterized in that: each time the encoder cycles, one object feature is used as input and a hidden feature h_i^enc is correspondingly output; the last encoded hidden feature h_N^enc serves as the feature encoding of the object feature sequence: h_N^enc = Encoder(f_1^obj, f_2^obj, …, f_N^obj), wherein h_N^enc is the hidden feature output by the last encoding step, f_i^obj is an object feature, N is the total number of objects, and h_N^enc is the result of encoding the features of all objects.
5. The method of claim 1, wherein the method for predicting the object grabbing sequence from the image based on the deep learning is characterized in that: step 4 comprises the following steps:
41. use an LSTM recurrent neural network as the grabbing order predictor; take the feature vector encoded in step 3 as the first hidden feature h_1^dec, randomly generate a first input vector x_1 ∈ R^m, where m is the fixed input feature length, and input the hidden feature and the input vector into the grabbing order predictor;
at each step the grabbing order predictor accepts one hidden feature h_j^dec and one input vector x_j and outputs an output vector o_j, where j indicates that the current, j-th cycle is predicting the j-th grabbing target and x_j is the feature corresponding to the grabbing target predicted in the previous cycle; j = 1 denotes the first prediction step, in which case x_1 is a randomly generated vector used as input;
42. for the feature vector output at each step of the grabbing order predictor, compute an index into the object feature sequence using the PointerNet mechanism, and take the object corresponding to that index as the object grabbed at this step;
43. repeat steps 41-42 h times, where h is the number of detected objects, to obtain an index sequence of length h; this index sequence is the object grabbing order.
6. The method for predicting an object grabbing sequence from an image based on deep learning according to claim 1, characterized in that: the labeled data is a large batch of data with grabbing order labels generated automatically by means of simulation and rendering, and a heuristic algorithm is used to generate the grabbing order labels, with the following specific steps:
51. Start scene construction: randomly import n objects into the simulator and copy m instances of each object; the number of objects and the instance data are generated randomly at each scene construction;
52. Divide a p × p grid with the world center of the simulator as the origin, where the side length of each grid cell is the average diameter of the imported objects plus a fixed constant d;
53. Place an object along the edge of the grid: each time, randomly select one instance from the object instances, place it at the center of the corresponding grid cell, lift it along the z-axis, and then randomly translate it in the xy-plane; a texture is randomly assigned to the instance each time;
54. Repeat step 53 until the grid is fully occupied, and then continue by dividing a (p-2) × (p-2) grid with the same center and the same cell size;
56. Repeat steps 53-54 until three layers have been placed; if the number of instances is insufficient or another condition is not met, stop scene construction and enter the next stage; the finally generated scene has the shape of a suspended pyramid;
57. With the world coordinate origin as the center of a sphere, generate a hemisphere in the positive z direction, and uniformly sample positions on the hemisphere at which to place a virtual camera;
58. Render from the sampled camera positions one by one, with the camera facing the world origin each time; in each rendering, randomly perturb the lighting and the object surface materials, and then render an image;
59. Filter out objects whose occlusion ratio exceeds a set ratio.
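As a rough illustration of the layered grid placement in steps 52-56, the sketch below computes candidate placement centers for shrinking p × p, (p-2) × (p-2), ... grids and lifts each layer along the z-axis with a small random xy jitter; simulator import, texture assignment, stopping conditions, and rendering are omitted, and the function name, constants, and jitter range are all assumptions.

```python
import random

def pyramid_placement_centers(p: int, cell_size: float, layers: int = 3, lift: float = 0.2):
    """Yield (x, y, z) candidate centers for a layered, shrinking grid.
    Layer k uses a (p - 2k) x (p - 2k) grid with the same center and cell size,
    lifted higher along z; the random xy offset loosely mimics step 53."""
    for k in range(layers):
        n = p - 2 * k
        if n <= 0:
            break
        half = (n - 1) / 2.0
        z = (k + 1) * lift                       # each layer is lifted further along z
        for i in range(n):
            for j in range(n):
                x = (i - half) * cell_size + random.uniform(-0.1, 0.1) * cell_size
                y = (j - half) * cell_size + random.uniform(-0.1, 0.1) * cell_size
                yield (x, y, z)

# example: candidate centers for a 5 x 5 base grid with 0.3 m cells
# positions = list(pyramid_placement_centers(p=5, cell_size=0.3))
```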
7. The method for predicting an object grabbing sequence from an image based on deep learning according to claim 1, characterized in that: the labeled data is obtained by collecting RGB images of stacked scenes from multiple angles with a camera, detecting the object frames with a trained segmentation network, and labeling the grabbing order of the objects by manual annotation.
8. The method for predicting an object grabbing sequence from an image based on deep learning according to claim 1, characterized in that: the labeled data uses images synthesized by simulation rendering as the data, and the data synthesized by simulation rendering is labeled automatically.
9. The method for predicting an object grabbing sequence from an image based on deep learning according to claim 1, characterized in that: the segmentation network generates a feature map of the image while obtaining the segmentation masks of the objects in the image and can detect all foreground objects in the image; in the grabbing sequence prediction stage, the PointerNet network can perform cyclic prediction over an arbitrary number of objects.
10. The method for predicting an object grabbing sequence from an image based on deep learning according to claim 1, characterized in that: when feature extraction is performed on an object, the image features of the area where the object is located are taken as the features of the object.
CN202210344226.8A 2022-04-02 2022-04-02 Method for predicting object grabbing sequence from image based on deep learning Pending CN114882214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210344226.8A CN114882214A (en) 2022-04-02 2022-04-02 Method for predicting object grabbing sequence from image based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210344226.8A CN114882214A (en) 2022-04-02 2022-04-02 Method for predicting object grabbing sequence from image based on deep learning

Publications (1)

Publication Number Publication Date
CN114882214A (en) 2022-08-09

Family

ID=82669635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210344226.8A Pending CN114882214A (en) 2022-04-02 2022-04-02 Method for predicting object grabbing sequence from image based on deep learning

Country Status (1)

Country Link
CN (1) CN114882214A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116184892A (en) * 2023-01-19 2023-05-30 盐城工学院 AI identification control method and system for robot object taking
CN116184892B (en) * 2023-01-19 2024-02-06 盐城工学院 AI identification control method and system for robot object taking

Similar Documents

Publication Publication Date Title
CN110837778B (en) Traffic police command gesture recognition method based on skeleton joint point sequence
CN108491880B (en) Object classification and pose estimation method based on neural network
CN114627360B (en) Substation equipment defect identification method based on cascade detection model
CN111553949B (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN110532897A (en) The method and apparatus of components image recognition
US11475589B2 (en) 3D pose estimation by a 2D camera
CN114821014B (en) Multi-mode and countermeasure learning-based multi-task target detection and identification method and device
CN112288809B (en) Robot grabbing detection method for multi-object complex scene
CN110969660A (en) Robot feeding system based on three-dimensional stereoscopic vision and point cloud depth learning
CN115147488B (en) Workpiece pose estimation method and grabbing system based on dense prediction
CN115937774A (en) Security inspection contraband detection method based on feature fusion and semantic interaction
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
US11554496B2 (en) Feature detection by deep learning and vector field estimation
CN114549507A (en) Method for detecting fabric defects by improving Scaled-YOLOv4
CN114119753A (en) Transparent object 6D attitude estimation method facing mechanical arm grabbing
CN112613478A (en) Data active selection method for robot grabbing
CN114882214A (en) Method for predicting object grabbing sequence from image based on deep learning
CN113681552B (en) Five-dimensional grabbing method for robot hybrid object based on cascade neural network
CN113139432B (en) Industrial packaging behavior identification method based on human skeleton and partial image
Shah et al. Detection of different types of blood cells: A comparative analysis
CN113496526A (en) 3D gesture detection by multiple 2D cameras
CN115937492B (en) Feature recognition-based infrared image recognition method for power transformation equipment
CN116664843A (en) Residual fitting grabbing detection network based on RGBD image and semantic segmentation
CN111401203A (en) Target identification method based on multi-dimensional image fusion
CN116071299A (en) Insulator RTV spraying defect detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20240228
Address after: 510641 Industrial Building, Wushan South China University of Technology, Tianhe District, Guangzhou City, Guangdong Province
Applicant after: Guangzhou South China University of Technology Asset Management Co.,Ltd.
Country or region after: China
Address before: 510640 No. five, 381 mountain road, Guangzhou, Guangdong, Tianhe District
Applicant before: SOUTH CHINA University OF TECHNOLOGY
Country or region before: China
TA01 Transfer of patent application right
Effective date of registration: 20240410
Address after: 518057, Building 4, 512, Software Industry Base, No. 19, 17, and 18 Haitian Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province
Applicant after: Cross dimension (Shenzhen) Intelligent Digital Technology Co.,Ltd.
Country or region after: China
Address before: 510641 Industrial Building, Wushan South China University of Technology, Tianhe District, Guangzhou City, Guangdong Province
Applicant before: Guangzhou South China University of Technology Asset Management Co.,Ltd.
Country or region before: China