CN111191650A - Object positioning method and system based on RGB-D image visual saliency - Google Patents

Object positioning method and system based on RGB-D image visual saliency

Info

Publication number
CN111191650A
Authority
CN
China
Prior art keywords
rgb
image
visual saliency
saliency
salient
Prior art date
Legal status
Granted
Application number
CN202010003692.0A
Other languages
Chinese (zh)
Other versions
CN111191650B (en)
Inventor
王松涛
靳薇
曲寒冰
李彬
Current Assignee
BEIJING INSTITUTE OF NEW TECHNOLOGY APPLICATIONS
Original Assignee
BEIJING INSTITUTE OF NEW TECHNOLOGY APPLICATIONS
Priority date
Filing date
Publication date
Application filed by BEIJING INSTITUTE OF NEW TECHNOLOGY APPLICATIONS filed Critical BEIJING INSTITUTE OF NEW TECHNOLOGY APPLICATIONS
Publication of CN111191650A publication Critical patent/CN111191650A/en
Application granted granted Critical
Publication of CN111191650B publication Critical patent/CN111191650B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00 - Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02 - Sensing devices
    • B25J19/021 - Optical sensing devices
    • B25J19/023 - Optical sensing devices including video camera means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)
  • Manipulator (AREA)

Abstract

An article positioning method and system based on RGB-D image visual saliency. The system mainly comprises a camera, a mechanical arm and an operating platform; the articles to be grasped are stacked on the operating platform, the mechanical arm is a UR5 mechanical arm, and the operating platform is a horizontal panel. During system initialization, the camera calibrates the operating platform, providing a reference plane for the mechanical arm to position and grasp articles. First, the camera acquires an RGB-D image of the operating-platform scene; then a visual saliency map, namely the RGB-D image saliency map, is computed from the RGB-D image; finally, articles are positioned based on the visual saliency map and manipulator operation information is provided. A pixel-level visual saliency map and salient-object position information can be generated simultaneously, supporting multiple manipulator operations.

Description

Object positioning method and system based on RGB-D image visual saliency
Technical Field
The invention relates to the field of computer vision target positioning, and in particular to an article positioning method and system based on RGB-D image visual saliency.
Background
When the scene is complex, especially when various items are scattered about, quickly positioning items with a vision-based mechanical arm is a challenging task. Whether the mechanical arm successfully grasps a scene object depends on the order in which objects are grasped; that is, the type and position of the object best suited for grasping in the current scene must be determined.
Mechanical-arm grasping based on visual perception is suitable for general article-stacking scenes such as logistics warehouses; it can replace manual sorting and enable fully automatic, intelligent logistics management in unmanned factories, unmanned warehouses and the like.
Currently, mechanical-arm article-grasping application systems generally collect scene visual information with an RGB-D camera. A feedback map (affordance map) is computed from the RGB-D image, and suitable operation points are located from it. If the feedback map contains no suitable points, a deep reinforcement learning strategy is adopted to actively try to change the spatial distribution of the scene objects, and the process repeats until the feedback map contains suitable points; as a result, the grasping success rate is not high.
When the scene is complex, objects overlap, stack and occlude one another, and the feedback-map method cannot identify an optimal positioning point; the placement of objects in the scene must be actively disturbed, i.e. a method based on the feedback map and reinforcement learning is adopted. However, because the consequences of active intervention must be assessed by reinforcement learning, the risk may become uncontrolled, i.e. the system may fall into an ineffective dead loop. Therefore, object positioning based on a feedback map with a reinforcement learning mechanism suffers from complexity, uncontrollability, high computational cost and other drawbacks.
Therefore, how to achieve fast positioning and grasping by a mechanical arm in a complex scene with many disorderly distributed articles, i.e. how to devise a new, fast, convenient and flexible method for mechanical-arm positioning and grasping with low computational cost, has become a technical problem to be solved urgently.
Disclosure of Invention
To develop a flexible and effective positioning method and system that completes rapid scene object positioning, the invention provides a positioning method that rapidly analyzes scenes based on visual saliency, simulating the human visual attention mechanism. Visual-saliency-based analysis can accomplish specific visual tasks using prior knowledge, rules and the like. The human visual attention mechanism can quickly browse a scene according to degree of saliency; the first region or target noticed is often related to one's own experience and a specific purpose, as well as to the relative saliency of that region or target in the scene, so how to realize object positioning based on visual saliency is a technical difficulty that must be overcome. To solve this technical problem, the invention computes visual saliency by sorting scene articles based on semantic information, and uses the visual saliency value as the basis for ordering mechanical-arm grasping.
To solve the above technical problem, according to an aspect of the present invention, there is provided an article positioning method based on RGB-D image visual saliency, comprising the following steps:
acquiring an RGB-D image of an operation platform scene by a camera;
secondly, calculating a visual saliency map based on the RGB-D image, namely the RGB-D image saliency map;
thirdly, positioning the articles based on the visual saliency map and providing mechanical arm article operation information;
In step two, visual saliency detection is performed on the RGB-D image, and the visual saliency map is computed as shown in equation (1):
p(zs|IRGB-D) = p(zs|xc,xd) = p(zs,xc,xd)/p(xc,xd) (1)
wherein p(zs|IRGB-D) represents the visual saliency of the current scene, i.e. the visual saliency map, and is defined as the probability p(zs|xc,xd) of whether a pixel of the RGB-D image is salient; IRGB-D represents the RGB-D image; xc and xd represent the RGB-image and depth-image salient features respectively, each extracted with a CNN; p(zs,xc,xd) represents the joint probability distribution and p(xc,xd) the salient-feature probability distribution; the visualization is rendered as a temperature map in which larger saliency values appear warmer and smaller values colder;
based on the RGB-D image saliency map, the salient object position is estimated as shown in equation (6):
p(O,zs|IRGB-D) = p(IRGB-D|O,zs)p(O,zs)/p(IRGB-D) (6)
wherein O represents the salient object position coordinates and zs represents the visual saliency of the salient object; p(O,zs|IRGB-D) represents the joint distribution of salient objects and visual saliency, p(IRGB-D|O,zs) represents the distribution of the RGB-D image saliency map given the target coordinates and visual saliency, p(O,zs) represents the joint distribution of targets and visual saliency, and p(IRGB-D) represents the RGB-D image feature distribution.
Preferably, when zs is given, O and IRGB-D are conditionally independent, which gives equation (7):
p(IRGB-D|O,zs)=p(IRGB-D|zs) (7)
when the posterior probability of visual saliency in the target region is used as the constraint condition for the salient target, and the image feature distribution is unchanged, equation (6) is approximately transformed into equation (8):
p(O,zs|IRGB-D)∝p(zs|IRGB-D)L(O)C(O,zs) (8)
wherein L(O) represents the target detection region and C(O,zs) represents the constraint, defined in the form:
C(O,zs) = 1 if zs(bj) = 1, and C(O,zs) = 0 otherwise
wherein bj denotes a detected target region and zs(bj) = 1 indicates that the region is visually salient; and L(O) is obtained by detecting target regions in the RGB image with an object detection algorithm, namely the Faster R-CNN algorithm.
Preferably, the camera is a Kinect camera.
Preferably, the manipulator used has two operating functions, suction and clamping; when the scene is complex, i.e. articles are stacked together and severely occluded, a pixel-level saliency map is generated, supporting the manipulator 'suck' operation; when the scene allows a rectangular target detection region to be provided, a salient target rectangle can be obtained from the saliency map, supporting the manipulator 'clamp' operation.
Preferably, an operation stop threshold is set on the saliency value; manipulator operations are driven sequentially in descending order of visual saliency value until the saliency value falls below the threshold, at which point the visual saliency of the scene's RGB-D image is recomputed.
To solve the above technical problem, according to another aspect of the present invention, there is provided an article positioning system based on RGB-D image visual saliency using the method of claim 1, comprising: a camera, a mechanical arm and an operating platform, wherein the articles to be grasped are stacked on the operating platform, the mechanical arm is a UR5 mechanical arm, and the operating platform is a horizontal panel; when the system is initialized, the camera calibrates the operating platform, providing a reference plane for the mechanical arm to position articles and for the manipulator to grasp them.
Preferably, when zs is given, O and IRGB-D are conditionally independent, which gives equation (7):
p(IRGB-D|O,zs)=p(IRGB-D|zs) (7)
when the posterior probability of visual saliency in the target region is used as the constraint condition for the salient target, and the image feature distribution is unchanged, equation (6) is approximately transformed into equation (8):
p(O,zs|IRGB-D)∝p(zs|IRGB-D)L(O)C(O,zs) (8)
wherein L(O) represents the target detection region and C(O,zs) represents the constraint, defined in the form:
C(O,zs) = 1 if zs(bj) = 1, and C(O,zs) = 0 otherwise
wherein bj denotes a detected target region and zs(bj) = 1 indicates that the region is visually salient; and L(O) is obtained by detecting target regions in the RGB image with an object detection algorithm, namely the Faster R-CNN algorithm.
Preferably, the camera is a Kinect camera.
Preferably, the manipulator used has two operating functions, suction and clamping; when the scene is complex, i.e. articles are stacked together and severely occluded, a pixel-level saliency map is generated, supporting the manipulator 'suck' operation; when the scene allows a rectangular target detection region to be provided, a salient target rectangle can be obtained from the saliency map, supporting the manipulator 'clamp' operation.
Preferably, an operation stop threshold is set on the saliency value; manipulator operations are driven sequentially in descending order of visual saliency value until the saliency value falls below the threshold, at which point the visual saliency of the scene's RGB-D image is recomputed.
The invention has the beneficial effects that:
1. The object positioning method based on RGB-D image visual saliency uses visual saliency as the basis for the mechanical arm's article-selection decision, avoiding the complexity of training a deep reinforcement learning strategy.
2. The pixel-level visual saliency map and salient-target position information can be generated simultaneously, supporting multiple manipulator operations and overcoming the limited interpretability of the decision basis.
3. The problem of learning an article positioning order strategy is simplified, and the article positioning order criterion has universality and generality.
4. Article selection based on visual saliency only requires sorting the visual saliency values to determine priority, without special training for a particular scene. Visual saliency reflects the attention order of scene articles and serves as the basis for the mechanical arm to select and position articles. Only scene objects need to be detected; no reinforcement learning for specific scenes is required.
5. The proposed estimation method accurately estimates the salient target position, provides a basis for the subsequent sequential positioning of articles, improves system operating efficiency, increases the operation success rate and effectively reduces operation time.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and together with the description serve to explain the principles of the invention. The above and other objects, features and advantages of the present invention will become more apparent from the detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a robotic arm item positioning system;
FIG. 2 is a block diagram of a method for positioning items based on visual saliency facing the operation of a robotic arm;
FIG. 3 is a diagram of item operation priority ordering based on the saliency map, provided for robotic arm operation;
FIG. 4 is a drawing of the test experiment "suck" operation;
FIG. 5 is a diagram of the test experiment "clamp" operation.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limitations of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
In addition, the embodiments of the present invention and the features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
As shown in fig. 1, the mechanical-arm article positioning system based on RGB-D image visual saliency includes a Kinect camera, a mechanical arm, a manipulator and an operating platform. The articles to be grasped are stacked on the operating platform, the mechanical arm is a UR5 mechanical arm, and the operating platform is a horizontal panel; when the system is initialized, the Kinect camera calibrates the operating platform, providing a reference plane for the mechanical arm to position articles and for the manipulator to grasp them. First, an RGB-D image of the operating-platform scene is acquired by the Kinect camera. Then, a visual saliency map is computed from the RGB-D image. Articles are positioned based on the visual saliency map, and manipulator operation information is provided. The manipulator used has two operating functions, suction and clamping. In the specific operating process, the executed flow is as follows: when the scene is complex, i.e. articles are stacked together and severely occluded, a pixel-level saliency map is generated, supporting the manipulator 'suck' operation; when the scene allows a rectangular target detection region to be provided, a salient target rectangle can be obtained from the saliency map, supporting the manipulator 'clamp' operation. Finally, an operation stop threshold is set on the saliency value; the manipulator is driven sequentially in descending order of visual saliency value until the saliency value falls below the threshold, then the visual saliency of the scene's RGB-D image is recomputed and the above steps are repeated.
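The workflow above reduces to an acquire-compute-operate loop. The following is a minimal sketch of that loop, assuming illustrative helper and driver names (capture_rgbd, compute_saliency_map, locate_salient_objects, suck, clamp) that are not part of any published API:

```python
def run_positioning_loop(camera, arm, stop_thresh=0.3):
    """Saliency-driven positioning loop sketched from the described workflow.

    `camera` and `arm` stand in for the Kinect and UR5/manipulator drivers;
    all helper names used here are illustrative assumptions.
    """
    while True:
        rgb, depth = camera.capture_rgbd()                 # acquire RGB-D image
        sal_map = compute_saliency_map(rgb, depth)         # RGB-D saliency map
        targets = locate_salient_objects(sal_map, rgb)     # positions + saliency
        targets.sort(key=lambda t: t.saliency, reverse=True)
        if not targets or targets[0].saliency < stop_thresh:
            break                                          # nothing salient enough
        for t in targets:
            if t.saliency < stop_thresh:
                break                                      # re-image the scene
            if t.box is None:
                arm.suck(t.pixel_position)                 # pixel-level map -> "suck"
            else:
                arm.clamp(t.box)                           # target rectangle -> "clamp"
        # loop: re-acquire the RGB-D image and recompute visual saliency
```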
In this way, scene analysis generates visual saliency at different scales, enabling multiple mechanical-arm operations, so the grasping system is suitable for fast article positioning and operating tasks in different scenes.
FIG. 2 is a block diagram of the salient article positioning method oriented to mechanical-arm operation. A Kinect device collects the RGB-D image, visual saliency detection is performed on the RGB-D image based on a DMNB (mixed-membership naive Bayes) model, and a scene saliency map is computed, i.e. the saliency map is obtained from the RGB-D image. The order of scene article operations is then sorted by saliency value, as shown in FIG. 3.
To calculate the visual saliency of an RGB-D image, a binary random variable zs is defined to represent whether a pixel of the RGB-D image is salient, as shown in equation (1):
p(zs|IRGB-D) = p(zs|xc,xd) = p(zs,xc,xd)/p(xc,xd) (1)
wherein p(zs|IRGB-D) represents the visual saliency of the current scene, i.e. the saliency map, and is defined as the probability p(zs|xc,xd) of whether a pixel of the RGB-D image is salient; the visualization is rendered as a temperature map in which larger saliency values appear warmer and smaller values colder; IRGB-D represents the RGB-D image; xc and xd represent the RGB-image and depth-image salient features respectively, each extracted with a CNN; p(zs,xc,xd) represents the joint probability distribution, and p(xc,xd) represents the salient-feature probability distribution.
Expanding equation (1) using Bayes' theorem gives equation (2):
p(zs|xc,xd) = p(xc,xd|zs)p(zs)/p(xc,xd) (2)
Since xc and xd are conditionally independent given the hidden variable zs, equation (3) holds:
p(xc,xd|zs)=p(xc|zs)p(xd|zs) (3)
Combining equation (3), equation (2) is transformed into equation (4):
p(zs|xc,xd) = p(zs)p(xc|zs)p(xd|zs)/p(xc,xd) (4)
wherein p(zs) represents the prior distribution, p(xc|zs) and p(xd|zs) represent the visual saliency distributions based on the color feature and the depth feature, and p(xc,xd) represents the salient-feature probability distribution, which is dropped for computational efficiency. Finally, the saliency value is calculated by equation (5):
p(zs|xc,xd)∝p(zs)p(xc|zs)p(xd|zs) (5)
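As a concrete illustration of equation (5), the sketch below fuses per-pixel RGB and depth evidence with a naive-Bayes rule; it assumes the class-conditional likelihood maps have already been produced by the two CNN branches and the DMNB model (those stages are outside the sketch), and all array names are illustrative:

```python
import numpy as np

def fuse_saliency(p_xc_sal, p_xc_bg, p_xd_sal, p_xd_bg, prior_salient=0.5):
    """Per-pixel naive-Bayes fusion of RGB and depth saliency evidence.

    Implements p(zs=1|xc,xd) proportional to p(zs=1) p(xc|zs=1) p(xd|zs=1)
    from equation (5), normalised against the non-salient hypothesis zs=0.
    Inputs are HxW arrays of per-pixel likelihoods (illustrative names).
    """
    num = prior_salient * p_xc_sal * p_xd_sal
    den = num + (1.0 - prior_salient) * p_xc_bg * p_xd_bg
    return num / np.clip(den, 1e-12, None)   # HxW saliency map in [0, 1]
```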
(1) Salient object position estimation based on the RGB-D image saliency map
To obtain effective visual saliency values whose ordering determines priority, the invention provides a salient target estimation method, shown in equation (6):
p(O,zs|IRGB-D) = p(IRGB-D|O,zs)p(O,zs)/p(IRGB-D) (6)
wherein O represents the salient object position coordinates and zs represents the visual saliency of the salient object; p(O,zs|IRGB-D) represents the joint distribution of salient objects and visual saliency, p(IRGB-D|O,zs) represents the distribution of the RGB-D image saliency map given the target coordinates and visual saliency, p(O,zs) represents the joint distribution of targets and visual saliency, and p(IRGB-D) represents the RGB-D image feature distribution.
Using this estimation method, the salient target position is estimated accurately, providing a basis for the subsequent sequential positioning of articles, improving system operating efficiency, increasing the operation success rate and effectively reducing operation time.
When zs is given, O and IRGB-D are conditionally independent, which gives equation (7):
p(IRGB-D|O,zs)=p(IRGB-D|zs) (7)
When the posterior probability of visual saliency in the target region is used as the constraint condition for the salient target, and the image feature distribution is unchanged, equation (6) is, for computational efficiency, approximately transformed into equation (8):
p(O,zs|IRGB-D)∝p(zs|IRGB-D)L(O)C(O,zs) (8)
wherein L(O) represents the target detection region and C(O,zs) represents the constraint, defined in the form:
C(O,zs) = 1 if zs(bj) = 1, and C(O,zs) = 0 otherwise
wherein bj denotes a detected target region and zs(bj) = 1 indicates that the region is visually salient. L(O) is obtained by detecting target regions in the RGB image with an object detection algorithm, namely the Faster R-CNN algorithm.
With the non-maximum suppression algorithm resolving duplicate detections of the same region, the bounding rectangle of a salient object can be located when articles are sparsely distributed in the scene, which is suitable for the manipulator's grasping ('clamp') operation.
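For completeness, a compact greedy non-maximum suppression sketch over the scored boxes, under the same illustrative box layout as above:

```python
def non_max_suppression(scored_boxes, iou_thresh=0.5):
    """Greedy NMS over (box, score) pairs so each object is reported once."""
    def iou(a, b):
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        ix1, iy1 = max(ax1, bx1), max(ay1, by1)
        ix2, iy2 = min(ax2, bx2), min(ay2, by2)
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    for box, score in sorted(scored_boxes, key=lambda t: t[1], reverse=True):
        if all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```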
In order to verify the effectiveness of the object positioning method based on the visual saliency of the RGB-D image, the following test experiments are carried out:
Forty different objects were selected to construct different scenes, and grasping was performed with the manipulator shown in fig. 1; the grasping experiments are shown in fig. 4 and fig. 5, where fig. 4 shows the 'suck' operation and fig. 5 the 'clamp' operation.
If a conventional feedback map is used and the article corresponding to its maximum cannot be manipulated by the mechanical arm, the robot will repeat this failed operation, since the environment and the feedback map are unchanged. Therefore, if the manipulator fails three times on the same object, the test operation is defined as a failure; the test is defined as successful if the first 10 objects in the scene are successfully operated on by the robot. On this basis, three indexes are defined (a small sketch of their computation follows the list):
(1) the average number of successful grasps per test scenario;
(2) 'suck' operation success rate, defined as the number of successfully grasped objects divided by the number of lifting operations;
(3) test success rate, defined as the number of successful tests divided by the number of tests.
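The three indexes can be computed directly from logged trials. A small sketch, assuming each test is recorded as a dict with illustrative keys "grasped", "lift_attempts" and "success":

```python
def summarize_tests(tests):
    """Compute the three reported indexes from a list of logged test runs."""
    n = len(tests)
    avg_grasped = sum(t["grasped"] for t in tests) / n               # index (1)
    suck_rate = (sum(t["grasped"] for t in tests)
                 / sum(t["lift_attempts"] for t in tests))           # index (2)
    test_rate = sum(1 for t in tests if t["success"]) / n            # index (3)
    return avg_grasped, suck_rate, test_rate

# Example call: summarize_tests([{"grasped": 9, "lift_attempts": 12, "success": True}])
```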
Table 1 records all test results for 20 different scenarios. The experiments show that, after active optimization of the feedback map with reinforcement learning, the suction success rate and test success rate improve somewhat over the plain feedback-map method, but the time complexity increases greatly. The visual-saliency-based method solves object positioning from the perspective of visual saliency without relying on reinforcement learning; it greatly improves the success rate without adding excessive time complexity.
It follows that when relying solely on static feedback maps to obtain operational decisions, failures are likely to occur in cluttered scenarios.
When feedback maps are combined with active detection optimization to improve the success rate, the system keeps nearby items sparsely placed by actively disturbing the item distribution.
The present method introduces salient object position estimation, can automatically detect whether scene objects are sparse, and avoids failed operations, so the grasping success rate is greatly improved. Meanwhile, for situations in which articles are severely overlapped and occluded, the method can output a pixel-level saliency map, giving it stronger adaptability to the scene. A more reliable decision can therefore be obtained.
TABLE 1 System article location test results
The technical scheme of the invention solves the problems of complexity and of the limited interpretability of the decision basis when positioning and grasping articles with machine vision in conventional mechanical-arm operation. The problem of learning an article positioning order strategy is simplified, and the article positioning order criterion has universality and generality. The object positioning method based on RGB-D image visual saliency uses visual saliency as the basis for the mechanical arm's article-selection decision, avoiding the complexity of training a deep reinforcement learning strategy. The method obtains pixel-level and target-level saliency maps, supports multiple manipulator operations, and overcomes the limitation of the interpretation basis.
Strategy training based on deep reinforcement learning requires a large amount of labeled video data and specific hardware support, and yields a strategy that is optimal only for a specific scene. The visual-saliency-based article selection method only requires sorting the visual saliency values to determine priority, without special training for specific scenes.
Scene perception outputs multi-scale positioning information, supporting multiple manipulator positioning requirements. The original feedback-map-based method outputs only pixel-level information and generates no target-level information. The invention computes visual saliency with multiple output modes, suitable for both pixel-level and target-level positioning. Each pixel in the image corresponds to a visual saliency value; the visual saliency map is suitable for the manipulator suction operation, and segmenting the image based on the per-pixel saliency values yields object contours, which is suitable for the manipulator clamping operation.
Mechanical-arm object positioning and grasping can also be based on 6D object pose estimation, but when the scene is complex and objects are stacked and severely occluded, pose estimation is unreliable. Moreover, whether an object's pose suits the manipulator's operation type must be quantified, which requires on-site debugging of the actual scene.
The method and the system for positioning the articles based on the visual saliency of the RGB-D images can be used in the fields of unmanned sorting of warehouses, service robots and the like.
Visual saliency reflects the attention order of scene articles and serves as the basis for the mechanical arm to select and position articles. Only scene objects need to be detected; no reinforcement learning for specific scenes is required. In addition, the invention can simultaneously generate a pixel-level visual saliency map and salient-target position information, supporting multiple manipulator operations.
So far, the technical solutions of the present invention have been described with reference to the preferred embodiments shown in the drawings, but it should be understood by those skilled in the art that the above embodiments are only for clearly illustrating the present invention, and not for limiting the scope of the present invention, and it is apparent that the scope of the present invention is not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. An article positioning method based on RGB-D image visual saliency is characterized by comprising the following steps:
acquiring an RGB-D image of an operation platform scene by the camera;
secondly, calculating a visual saliency map based on the RGB-D image, namely the RGB-D image saliency map;
thirdly, positioning the articles based on the visual saliency map and providing mechanical arm article operation information;
In step two, visual saliency detection is performed on the RGB-D image, and the visual saliency map is computed as shown in equation (1):
p(zs|IRGB-D) = p(zs|xc,xd) = p(zs,xc,xd)/p(xc,xd) (1)
wherein p(zs|IRGB-D) represents the visual saliency of the current scene, i.e. the visual saliency map, and is defined as the probability p(zs|xc,xd) of whether a pixel of the RGB-D image is salient; IRGB-D represents the RGB-D image; xc and xd represent the RGB-image and depth-image salient features respectively, each extracted with a CNN; p(zs,xc,xd) represents the joint probability distribution and p(xc,xd) the salient-feature probability distribution; the visualization is rendered as a temperature map in which larger saliency values appear warmer and smaller values colder;
based on the RGB-D image saliency map, the salient object position is estimated as shown in equation (6):
p(O,zs|IRGB-D) = p(IRGB-D|O,zs)p(O,zs)/p(IRGB-D) (6)
wherein O represents the salient object position coordinates and zs represents the visual saliency of the salient object; p(O,zs|IRGB-D) represents the joint distribution of salient objects and visual saliency, p(IRGB-D|O,zs) represents the distribution of the RGB-D image saliency map given the target coordinates and visual saliency, p(O,zs) represents the joint distribution of targets and visual saliency, and p(IRGB-D) represents the RGB-D image feature distribution.
2. The RGB-D image visual saliency-based item positioning method of claim 1,
when zs is given, O and IRGB-D are conditionally independent, which gives equation (7):
p(IRGB-D|O,zs)=p(IRGB-D|zs) (7)
when the posterior probability of visual saliency in the target region is used as the constraint condition for the salient target, and the image feature distribution is unchanged, equation (6) is approximately transformed into equation (8):
p(O,zs|IRGB-D)∝p(zs|IRGB-D)L(O)C(O,zs) (8)
wherein L(O) represents the target detection region and C(O,zs) represents the constraint, defined in the form:
C(O,zs) = 1 if zs(bj) = 1, and C(O,zs) = 0 otherwise
wherein bj denotes a detected target region and zs(bj) = 1 indicates that the region is visually salient; and L(O) is obtained by detecting target regions in the RGB image with an object detection algorithm, namely the Faster R-CNN algorithm.
3. The RGB-D image visual saliency-based item positioning method of claim 1,
the camera is a Kinect camera.
4. The RGB-D image visual saliency-based item positioning method of claim 1,
the manipulator used has two operating functions, suction and clamping; when the scene is complex, i.e. articles are stacked together and severely occluded, a pixel-level saliency map is generated, supporting the manipulator 'suck' operation; when the scene allows a rectangular target detection region to be provided, a salient target rectangle can be obtained from the saliency map, supporting the manipulator 'clamp' operation.
5. The RGB-D image visual saliency-based item positioning method of claim 1,
an operation stop threshold is set on the saliency value; manipulator operations are driven sequentially in descending order of visual saliency value until the saliency value falls below the threshold, at which point the visual saliency of the scene's RGB-D image is recomputed.
6. An article positioning system based on RGB-D image visual saliency employing the method of claim 1, comprising: a camera, a mechanical arm and an operating platform, wherein the articles to be grasped are stacked on the operating platform, the mechanical arm is a UR5 mechanical arm, and the operating platform is a horizontal panel; when the system is initialized, the camera calibrates the operating platform, providing a reference plane for the mechanical arm to position articles and grasp them.
7. An article positioning system based on RGB-D image visual saliency as claimed in claim 6,
when zs is given, O and IRGB-D are conditionally independent, which gives equation (7):
p(IRGB-D|O,zs)=p(IRGB-D|zs) (7)
when the posterior probability of visual saliency in the target region is used as the constraint condition for the salient target, and the image feature distribution is unchanged, equation (6) is approximately transformed into equation (8):
p(O,zs|IRGB-D)∝p(zs|IRGB-D)L(O)C(O,zs) (8)
wherein L(O) represents the target detection region and C(O,zs) represents the constraint, defined in the form:
C(O,zs) = 1 if zs(bj) = 1, and C(O,zs) = 0 otherwise
wherein bj denotes a detected target region and zs(bj) = 1 indicates that the region is visually salient; and L(O) is obtained by detecting target regions in the RGB image with an object detection algorithm, namely the Faster R-CNN algorithm.
8. An article positioning system based on RGB-D image visual saliency as claimed in claim 6,
the camera is a Kinect camera.
9. An article positioning system based on RGB-D image visual saliency as claimed in claim 6,
the manipulator used has two operating functions, suction and clamping; when the scene is complex, i.e. articles are stacked together and severely occluded, a pixel-level saliency map is generated, supporting the manipulator 'suck' operation; when the scene allows a rectangular target detection region to be provided, a salient target rectangle can be obtained from the saliency map, supporting the manipulator 'clamp' operation.
10. An article positioning system based on RGB-D image visual saliency as claimed in claim 6,
an operation stop threshold is set on the saliency value; manipulator operations are driven sequentially in descending order of visual saliency value until the saliency value falls below the threshold, at which point the visual saliency of the scene's RGB-D image is recomputed.
CN202010003692.0A 2019-12-30 2020-01-02 Article positioning method and system based on RGB-D image visual saliency Active CN111191650B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911402160 2019-12-30
CN2019114021608 2019-12-30

Publications (2)

Publication Number Publication Date
CN111191650A true CN111191650A (en) 2020-05-22
CN111191650B CN111191650B (en) 2023-07-21

Family

ID=70709757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010003692.0A Active CN111191650B (en) 2019-12-30 2020-01-02 Article positioning method and system based on RGB-D image visual saliency

Country Status (1)

Country Link
CN (1) CN111191650B (en)



Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150169989A1 (en) * 2008-11-13 2015-06-18 Google Inc. Foreground object detection from multiple images
US20150117783A1 (en) * 2013-10-24 2015-04-30 Adobe Systems Incorporated Iterative saliency map estimation
CN103679740A (en) * 2013-12-30 2014-03-26 中国科学院自动化研究所 ROI (Region of Interest) extraction method of ground target of unmanned aerial vehicle
CN103824284A (en) * 2014-01-26 2014-05-28 中山大学 Key frame extraction method based on visual attention model and system
US20150310303A1 (en) * 2014-04-29 2015-10-29 International Business Machines Corporation Extracting salient features from video using a neurosynaptic system
CN104408733A (en) * 2014-12-11 2015-03-11 武汉大学 Object random walk-based visual saliency detection method and system for remote sensing image
US20160180188A1 (en) * 2014-12-19 2016-06-23 Beijing University Of Technology Method for detecting salient region of stereoscopic image
CN105389550A (en) * 2015-10-29 2016-03-09 北京航空航天大学 Remote sensing target detection method based on sparse guidance and significant drive
US20180285683A1 (en) * 2017-03-30 2018-10-04 Beihang University Methods and apparatus for image salient object detection
CN106997478A (en) * 2017-04-13 2017-08-01 安徽大学 RGB-D image salient target detection method based on salient center prior
CN107992874A (en) * 2017-12-20 2018-05-04 武汉大学 Image well-marked target method for extracting region and system based on iteration rarefaction representation
CN108846416A (en) * 2018-05-23 2018-11-20 北京市新技术应用研究所 The extraction process method and system of specific image
CN109146925A (en) * 2018-08-23 2019-01-04 郑州航空工业管理学院 Conspicuousness object detection method under a kind of dynamic scene
CN109740613A (en) * 2018-11-08 2019-05-10 深圳市华成工业控制有限公司 A kind of Visual servoing control method based on Feature-Shift and prediction
CN109598268A (en) * 2018-11-23 2019-04-09 安徽大学 A kind of RGB-D well-marked target detection method based on single flow depth degree network

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
GERMÁN M. GARCÍA: "Saliency-based object discovery on RGB-D data with a late-fusion approach" *
JIANHUA ZHANG: "Objectness ranking by uniform Bayesian model with multimodal and global cues" *
夏辰: "Research on reconstruction-based bottom-up visual attention models" *
杜杰: "Research on salient object detection based on regional feature fusion" *
王松涛: "Research on visual saliency detection methods for RGB-D images based on feature fusion" *
黄子超: "Research on salient object detection methods guided by prior fusion and features" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022037389A1 (en) * 2020-08-18 2022-02-24 维数谷智能科技(嘉兴)有限公司 Reference plane-based high-precision method and system for estimating multi-degree-of-freedom attitude of object
CN112077842A (en) * 2020-08-21 2020-12-15 上海明略人工智能(集团)有限公司 Clamping method, clamping system and storage medium
CN112223288A (en) * 2020-10-09 2021-01-15 南开大学 Visual fusion service robot control method
CN112223288B (en) * 2020-10-09 2021-09-14 南开大学 Visual fusion service robot control method
CN113222003A (en) * 2021-05-08 2021-08-06 北方工业大学 RGB-D-based indoor scene pixel-by-pixel semantic classifier construction method and system
CN113222003B (en) * 2021-05-08 2023-08-01 北方工业大学 Construction method and system of indoor scene pixel-by-pixel semantic classifier based on RGB-D

Also Published As

Publication number Publication date
CN111191650B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN111191650A (en) Object positioning method and system based on RGB-D image visual saliency
US10124489B2 (en) Locating, separating, and picking boxes with a sensor-guided robot
US11209265B2 (en) Imager for detecting visual light and projected patterns
JP7352260B2 (en) Robot system with automatic object detection mechanism and its operating method
Schwarz et al. Fast object learning and dual-arm coordination for cluttered stowing, picking, and packing
US9802317B1 (en) Methods and systems for remote perception assistance to facilitate robotic object manipulation
US9649767B2 (en) Methods and systems for distributing remote assistance to facilitate robotic object manipulation
WO2020034872A1 (en) Target acquisition method and device, and computer readable storage medium
EP3186777B1 (en) Combination of stereo and structured-light processing
US9259844B2 (en) Vision-guided electromagnetic robotic system
US9205558B1 (en) Multiple suction cup control
US20230260071A1 (en) Multicamera image processing
CN111571581B (en) Computerized system and method for locating grabbing positions and tracks using image views
JP7377627B2 (en) Object detection device, object grasping system, object detection method, and object detection program
CN113538459A (en) Multi-mode grabbing obstacle avoidance detection optimization method based on drop point area detection
Xu et al. A vision-guided robot manipulator for surgical instrument singulation in a cluttered environment
US20210001488A1 (en) Silverware processing systems and methods
EP4249178A1 (en) Detecting empty workspaces for robotic material handling
WO2023092519A1 (en) Grabbing control method and apparatus, and electronic device and storage medium
Su et al. Pose-Aware Placement of Objects with Semantic Labels-Brandname-based Affordance Prediction and Cooperative Dual-Arm Active Manipulation
WO2023073780A1 (en) Device for generating learning data, method for generating learning data, and machine learning device and machine learning method using learning data
WO2024053150A1 (en) Picking system
US20230364787A1 (en) Automated handling systems and methods
Piao et al. Robotic tidy-up tasks using point cloud-based pose estimation
CN117726896A (en) Computer-implemented method for generating (training) images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant