WO2020142498A1 - Robot having visual memory - Google Patents

Robot having visual memory

Info

Publication number
WO2020142498A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
robot
objects
visual memory
class
Prior art date
Application number
PCT/US2019/069080
Other languages
French (fr)
Inventor
Remus Boca
Thomas Fuhlbrigge
Johnny Holmberg
Magnus Wahlstrom
Zhou TENG
Original Assignee
Abb Schweiz Ag
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Abb Schweiz Ag filed Critical Abb Schweiz Ag
Publication of WO2020142498A1

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/39 Robotics, robotics to robotics hand
    • G05B2219/39484 Locate, reach and grasp, visual guided grasping
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/40 Robotics, robotics mapping to robotics vision
    • G05B2219/40532 Ann for vision processing

Definitions

  • the present invention generally relates to robots and robot training, and more particularly, but not exclusively, to robotic systems and training for robotic systems.
  • One embodiment of the present invention is a unique robotic system.
  • Other embodiments include apparatuses, systems, devices, hardware, methods, and combinations for training robotic systems.
  • Further embodiments, forms, features, aspects, benefits, and advantages of the present application shall become apparent from the description and figures provided herewith.
  • FIG. 1 depicts an embodiment of a robot system.
  • FIG. 2 depicts an embodiment of a computer.
  • FIG. 3 depicts an embodiment of a robot system.
  • FIG. 4 depicts a flow chart embodiment of the present application.
  • FIG. 5 depicts an embodiment of a robot which can be provided directly to a customer with minimal to no training required.
  • a schematic of a robot 50 which includes a number of moveable robot components 52 along with an effector 54 useful to manipulate and/or sense a target 56.
  • the robot 50 can include a single moveable robot component 52. Additionally and/or alternatively, the robot 50 need not include an end effector 54.
  • the robot 50 can be mobile in some embodiments, but in other embodiments the robot 50 can be mounted upon a stationary base (e.g. FIG. 1).
  • the robot components 52 can take any variety of forms such as arms, links, beams, etc which can be used to position the effector 54.
  • the robot 50 can include any number of moveable components 52 which can take on different sizes, shapes, and other features.
  • the components 52 can be interconnected with one another through any variety of useful mechanisms such as links and gears 58, to set forth just two examples.
  • the components 52 can be actuated via any suitable device such as electric actuators, pneumatic or hydraulic pistons, etc.
  • the effector 54 can take any variety of forms such as a gripper, suction effector, belt, etc.
  • the target 56 will be understood to be a training object 56 which will be imaged as the robot interacts with the object (e.g. to determine the type of part of the object and/or to determine a technique to capture the object through a variety of end effector(s) (such as but not limited to a gripper), etc) to obtain training and evaluation data useful in the creation of a system capable of quickly recognizing and/or locating objects in working runtime.
  • the training object 56 can be imaged using image capture device 57 which can take the form of any variety of devices/systems including via cameras, radar, light curtains, etc. In some forms the device 57 can be other types of sensors and/or computed values such as force contact sensor, range finder, proximity sensors, etc.
  • image capture device 57 can refer to a variety of devices capable of detecting electromagnetic radiation, whether in the visible range, infrared range, etc. Such “cameras” can also refer to 2D and/or 3D cameras. Reference may be made below to image capture device 57 as a camera 57, but no limitation is hereby intended that such device 57 is limited to a “camera” unless explicitly or inherently understood to the contrary.
  • the position and/or orientation of the training object 56 can be manipulated by a positioner 59.
  • the positioner 59 can take a variety of forms useful to manipulate the training object 56.
  • the positioner 59 can include one or more moveable components and/or effectors useful to change the orientation of the training object 56 for purposes of being imaged by the camera 57 at different orientations.
  • the positioner 59 can be stationary, but in some forms may also be moveable.
  • the robot 50, camera 57, and/or positioner 59 can be operated via a controller 55.
  • the controller 55 can be one or more different devices useful to control one or more of the robot 50, camera 57, and positioner 59.
  • Computer 60 suitable to host the controller 55 for operating the robot 50.
  • Computer 60 includes a processing device 64, an input/output device 66, memory 68, and operating logic 70. Furthermore, computer 60 can be configured to communicate with one or more external devices 72.
  • the input/output device 66 may be any type of device that allows the computer 60 to communicate with the external device 72.
  • the input/output device may be a network adapter, network card, or a port (e.g., a USB port, serial port, parallel port, VGA, DVI, HDMI, FireWire, CAT 5, or any other type of port).
  • the input/output device 66 may be comprised of hardware, software, and/or firmware. It is contemplated that the input/output device 66 includes more than one of these adapters, cards, or ports.
  • the external device 72 may be any type of device that allows data to be inputted or outputted from the computer 60.
  • the external device 72 can be the camera 57, positioner 59, etc.
  • the external device 72 may be another computer, a server, a printer, a display, an alarm, an illuminated indicator, a keyboard, a mouse, mouse button, or a touch screen display.
  • the external device 72 may be integrated into the computer 60.
  • the computer 60 may be a smartphone, a laptop computer, or a tablet computer. It is further contemplated that there may be more than one external device in communication with the computer 60.
  • the external device can be co-located with the computer 60 or alternatively located remotely from the computer.
  • Processing device 64 can be of a programmable type, a dedicated, hardwired state machine, or a combination of these; and can further include multiple processors, Arithmetic-Logic Units (ALUs), Central Processing Units (CPUs), or the like. For forms of processing device 64 with multiple processing units, distributed, pipelined, and/or parallel processing can be utilized as appropriate. Processing device 64 may be dedicated to performance of just the operations described herein or may be utilized in one or more additional applications. In the depicted form, processing device 64 is of a programmable variety that executes algorithms and processes data in accordance with operating logic 70 as defined by programming instructions (such as software or firmware) stored in memory 68.
  • programming instructions such as software or firmware
  • operating logic 70 for processing device 64 is at least partially defined by hardwired logic or other hardware.
  • Processing device 64 can be comprised of one or more components of any type suitable to process the signals received from input/output device 66 or elsewhere, and provide desired output signals. Such components may include digital circuitry, analog circuitry, or a combination of both.
  • Memory 68 may be of one or more types, such as a solid-state variety, electromagnetic variety, optical variety, or a combination of these forms.
  • memory 68 can be volatile, nonvolatile, or a mixture of these types, and some or all of memory 68 can be of a portable variety, such as a disk, tape, memory stick, cartridge, or the like.
  • memory 68 can store data that is manipulated by the operating logic 70 of processing device 64, such as data representative of signals received from and/or sent to input/output device 66 in addition to or in lieu of storing programming instructions defining operating logic 70, just to name one example.
  • memory 68 may be included with processing device 64 and/or coupled to the processing device 64.
  • the operating logic 70 can include the algorithms and steps of the controller, whether the controller includes the entire suite of algorithms necessary to effect movement and actions of the robot 50, or whether the controller includes just those necessary to receive data from the camera 57, determine a point cloud, find edges, utilize object recognition, and/or resolve position of the objects, among other functions.
  • the operating logic can be saved in a memory device whether of the volatile or nonvolatile type, and can be expressed in any suitable type such as but not limited to source code, object code, and machine code.
  • in FIG. 3 another embodiment of a robot 50 and training object 56 environment is depicted.
  • FIG. 3 includes a robot controller 74, edge computer 76, and computer server 78, all of which are depicted as separate components but can also be understood to be the same as or associated with any of the computer 60 and/or controller 55.
  • the edge computer 76 can be structured to be useful in image processing and/or the learning described herein, and is contemplated to be a distributed device node (e.g. a smart device or edge device akin to the geographic distribution of nodes such as in the Internet of Things) as opposed to those functions being performed in a centralized cloud environment. Any data generated herein, however, can be shared to remote stations such as through cloud computing.
  • the edge device can be structured to receive data generated from the camera 57 and operate upon the data to generate point clouds, detect edges, etc.
  • the edge computer can be structured to detect edges in objects, such as by detecting discontinuities in brightness within an image. Such discontinuities can correspond to discontinuities in depth and/or surface orientation, changes in material property, variations in scene illumination, etc.
  • Edge detection can sometimes, but not exclusively, be grouped in two categories: search-based and zero-crossing.
  • the edge computer 76 may also be useful in filtering an image, as well as edge thinning to remove unwanted spurious points on the edges in an image. In some forms the edge computer 76 can also be based upon second-order derivatives of the intensity. Still further, the edge computer 76 can utilize techniques such as the phase stretch transform. In short, the edge computer 76 can be constructed using a variety of techniques and employ a number of associated computations which assist in identifying edges of a training object 56. FIG. 3 also depicts an illumination source 80 useful to radiate the training object 56 which is positioned using the positioner 59.
  • the illumination source 80 can be constructed to radiate in any number of manners, from focused to diffuse, low to high brightness, specific to broad based electromagnetic radiations, etc. Such electromagnetic radiation can occur over any range of suitable frequencies, from infrared, visible, ultraviolet, etc. In one non-limiting form, the illumination source emits light in the visible electromagnetic band at a variety of intensities. Additionally and/or alternatively, the illumination source 80 can be positioned at a variety of locations around the training object 56, at a variety of distances, and be oriented to project at a variety of angles relative to the training object 56. Not all embodiments need include an illumination source. In some forms different illumination sources can be used.
  • Robots are widely used in assembly and production lines today to automate many manufacturing procedures. However, to execute pre-programmed manufacturing operations on target parts/objects, robots must first locate them very accurately in the assembly or production lines.
  • Embodiments disclosed herein introduce a self-adaptive optimization system for robots with mounted sensors, which can automatically learn new target parts/objects and locate them in the working runtime.
  • target parts/objects are placed on a positioner in different pose/scene settings; sensor(s) mounted on the robot (e.g. robot arms) are moved around target parts/objects to capture a number of representative images in different robot/sensor settings; then, all the captured images are preprocessed by edge computers and sent to computer servers for training. Based on the evaluation results of the trained system from existing images, computer servers might ask for more images of target parts/objects to continue the training until a robust performance of the trained system is achieved.
  • the trained system with the learned knowledge of target parts/objects can be directly used to locate them in the working runtime.
  • Sensors are mounted on robots and used to capture a number of representative images of target parts/objects in different settings. Multiple sensors could be mounted on the same robot arm and be in different types to capture different types of part/object information;
  • Robots are used to move the sensors around target parts/objects to capture images in different robot/sensor settings;
  • Edge computers control the motion of robot arms through robot controllers, preprocess all captured images and send images to computer servers for learning; and
  • Servers run machine learning algorithms based on all collected images to model the knowledge of target parts/objects for different kinds of application purposes.
  • the training and evaluation useful to develop the generalized object learning described herein can involve the capture of several images set at different orientations of the training object 56 by the positioner 59, and/or as set using different lighting of the illumination source 80, and/or as set using different positions of the robot 50 and device 57.
  • the training can be conducted in any number of manners, with the step of evaluation useful to determine when training is sufficient.
  • a matrix of image capture locations, positioner orientations, and/or lighting conditions from the illuminator can be used to develop a range of images useful in the embodiments herein in the creation of a dataset for a class-of-objects library.
  • a validation dataset of images can be used to test whether the training matrix has sufficiently covered the training object 56.
  • Training based on the images obtained during the training can include any number of different methodologies as will be appreciated.
  • Learning can take the form of supervised, semi-supervised, or unsupervised learning, although many embodiments herein contemplate the use of unsupervised learning.
  • Principal component analysis and cluster analysis can be used.
  • Artificial neural networks, support vector machines, Bayesian networks, and genetic algorithms are contemplated in the learning stage.
  • deep learning techniques are also contemplated.
  • any techniques and methods useful to conduct learning of the training object are contemplated herein.
  • the learning can be conducted in the same computer (server 78, controller 55, etc) or can be separate from the machine that performs any other particular function of the embodiments described herein.
  • FIG. 4 depicts an embodiment of a self-adaptive optimization procedure of object learning. The procedure starts with the selection of a robot/sensor/object/scene setting.
  • Such a setting can include a position and/or orientation of the robot 50 (with accompanying position and/or orientation of the camera 57, position and/or orientation of the training object 56 via the positioner 59, and one or more parameters associated with the illumination source 80 (if used)). Images are then taken via the camera 57 which are used to train the model. The data can be split into a learning data set and an evaluation data set.
  • a validation data set (one or more images designated to test the trained model) is used to evaluate the robustness of the trained model. Any suitable test can be used to assess whether the training was successful or not. In many instances the test can result in a score. If the robustness test fails, the procedure loops to the beginning to select a new robot/sensor/object/scene setting.
  • the ability to pick a new robot/sensor/object/scene setting can be through serial testing of a pre-defined test matrix, but can also be adaptive in the sense that intermediate robot/sensor/object/scene settings can be selected based upon the results of the evaluation step. It is also possible that entirely new robot/sensor/object/scene settings are developed based upon the results of the evaluation step.
  • examples of such robot settings can be feedback settings, position settings, etc; examples of such sensor settings can include sensitivity settings, aperture settings, resolution settings, etc; examples of such object settings can be position settings, etc; and examples of such scene settings can include illumination type, illumination intensity, etc.
  • the learning can include:
  • New images and/or sensor data are preprocessed by edge computers and sent to computer servers for new or continuous object training;
  • the images made available by step (4) can be split between a learning data set and an evaluation data set.
  • new data may be required to assist in furthering the training.
  • the new data can be collected into a new data set which can be used exclusively for new learning, or can be split into learning and evaluation data sets as above.
  • New data can be collected at the same settings as those collected above, but new data at other settings are also contemplated.
  • the new settings can be completely new robot settings, sensor settings, object settings, and scene settings, or a combination of new and old settings of the same. If new settings are developed, the robot can develop such settings according to a pre-structured update schedule (e.g. finer resolution between object positions, incremental increase in illumination along a preprogrammed step increase, etc), but other new settings can be developed in an unsupervised manner such as, but not limited to, a random walk, etc.
  • the new settings can be identified around a point of weakness identified by the robot during the evaluation step. If, for example, the robot identifies a weakness in its evaluation of robustness around a certain illumination range, increases in illumination can be generated and new data collected for further learning and/or evaluation.
  • Embodiments described herein can be structured to robustly conduct object recognition, object detection, and pose estimation.
  • an extension of the embodiments above includes adding one or more robots to the single robot 50 above and/or one or more positioners to the single positioner 59.
  • Such extension can greatly speed up the training and evaluation of the previous system and improve the performance of the system in the working runtime.
  • the robots 50 can be the same, but can also be different.
  • the positioners 59 used can be the same but can also be different.
  • the extension of the embodiments above by the inclusion of multiple robots 50 and/or multiple positioners 59 can be characterized by:
  • robot/sensor/object/scene settings can be collected simultaneously;
  • Images and/or sensor information of multiple different target objects/parts can be collected and the knowledge of them can be trained and evaluated by the same system simultaneously; (5) The system can be trained and evaluated for multiple application purposes simultaneously.
  • multiple robots 50 and positioners 59 can be utilized in the following ways:
  • Robot(s): at least two robots are used if there is only one positioner;
  • Positioner(s): multiple positioners with the same or different target objects/parts can be used if there is only one robot;
  • a robot with the current robot and sensor settings can collect images and/or sensor information of different settings of same/different target
  • a positioner with the current pose/scene setting of a target object/part can be used to collect images and/or sensor information by multiple robots in different robot and/or sensor parameters simultaneously;
  • Training and evaluation of the system can be done by multiple robots simultaneously.
  • one robot can be used to collect new images and/or sensor information of target objects/parts with different robot and/or sensor settings for training, and the other can be used to collect new images and/or sensor information of target
  • More than one robot can be used to collect new images and/or sensor information of target objects/parts in different robot and/or sensor parameters for training simultaneously; (7) More than one robot can be used to collect new images and/or sensor information of target objects/parts in different robot and/or sensor parameters to evaluate the performance of the trained system simultaneously;
  • images and/or sensor information of different settings of same target objects/parts with different scene settings on different positioners can be collected for training and evaluation simultaneously;
  • the training and evaluation procedure of the system can be driven by specific application cases; all robot/sensor/object/scene settings in the training and evaluation procedure are optimized based on the requirements of specific application cases;
  • Target objects/parts might be only set in certain ranges of poses in the working runtime. As a result, it is not necessary to consider all possible poses of target objects/parts in the training and evaluation;
  • the embodiments described above can be extended to a system that accounts for objects of the same class.
  • the training object 56 that is used to develop a system that robustly recognizes the object in various poses, lighting conditions, etc can itself be a type of object that fits within a broader class of objects.
  • the training object 56 may be a threaded machine screw having a rounded head that fits within a larger class of “screws.”
  • the embodiments described above can be extended to train a system on one particular type of screw, further train the system on another type of screw (wood screw), and so on.
  • Any number of different objects within a class can be used for training purposes with the intent that the trained system will be able to recognize any of the objects from within the class of objects.
  • a second training object 56 that falls within the same class as the original training object 56 can be put through the same range of robotic image capture locations, positioner orientations, lighting conditions, etc to develop its own range of images useful in the creation of a dataset for that particular object.
  • the ability to form a visual data set on a class of objects using the procedures described herein can be applied to any of the embodiments (one or more robots, one or more positioners, edge detectors, etc).
  • the system can be used to train object recognition over a class of objects, but can also be used to train the system how to capture the objects using one or more different end effectors.
  • the system can be used to train the system to use one or more grippers to capture the objects within the class of objects.
  • References may be made herein to training the system to recognize objects in the class of objects, but it will be appreciated that the same techniques apply to training the system to recognize how to interact with the objects through various end effectors. No limitation is hereby intended to be limited to strictly object recognition and not object interaction unless specifically expressed to the contrary.
  • a system can be developed and provided to an end user which takes into account and utilizes information provided from the various training features described above.
  • a vision system integrated with industrial robots typically requires extensive training and evaluation for each new task or part. The training and evaluation are performed by a knowledgeable and experienced robot operator.
  • perception controller visual memory and perception functions provide the ability to deliver a robot system that can work out of the box, eliminating the need for training and evaluation. This can result in cost and time reductions, simplification of the robot interaction, and possible deployment of a large number of robots in a short period of time.
  • visual memory will be understood to include any suitable model, dataset, and/or algorithm which is capable of being
  • “worktime” can be contrasted with “training time” in that “worktime” involves tasks associated with a production process intended to result in a sales or servicing event for a customer, while “training time” involves tasks associated with training a robot and which may not result in a sales or servicing event.
  • “worktime” may result in no training, and a “training time” event may result in no actual work production.
  • Industrial robots can utilize vision / perception systems to perform flexible and adaptive tasks.
  • the vision systems integrated with industrial robots typically need to be trained before use. There are many tasks a vision system integrated with industrial robots can be used for, such as: material handling, machine tending, welding, dispensing, painting, machining and so on.
  • the vision system is used, for example, to recognize and locate parts / objects of interest in the scene. These parts of interest usually are standard within a customer, across segments or industries. Examples of such parts are: screws, nuts, bolts, plates, gears, chains, hinges, latches, hasps, catches and so on.
  • the vision system integrated with industrial robots typically needs to be trained and tested for each part type in production conditions. For example, the training usually consists of selecting an area of interest in an image to select a part or section of the part, or of collecting many images with different scales, viewpoints, and light conditions, labeling them, and then using a neural network to calculate the object model or models.
  • a robotic perception system that has built-in perception tasks.
  • a robot, for example, is able to recognize, locate, sort, count, and calculate grasps or relative positions of the tool to the part of interest with built-in / visual memory functions that are deployed with a (brand new) robot.
  • In addition to being able to perform perception tasks, such a robot can have a tool specialized for handling screws.
  • Such a robot, for example, is a robot that knows how to handle screws.
  • such a system will include all the hardware and software components to handle only a class of parts, such as: vision sensors mounted on the arm or statically mounted, computational hardware, robot tool, end effector, software and algorithms. It will be appreciated that the systems described herein that can provide a robot “out of the box” ready to detect and manipulate objects can provide the following features:
  • A robot tool suitable to handle the class of objects is built into the system
  • a built-in visual memory for a class of objects, the associated robot tool or tools to handle / manipulate the objects from the visual memory, and the perception tasks / functions integrated in the robot controller are the critical components of a robot system that knows how to pick parts without training.
  • Such a system is a robotic system that knows how to handle parts, e.g., a robot that knows how to pick screws, or nuts, or bolts, or gears ... out of the box.
  • the robot includes all the tools, hardware and software, to start working the moment it is set ready to work.
  • This robot system will know how to handle / manipulate the objects that exist in the robot visual memory, and this means that the robot perception functions can identify, recognize, check for presence, count, sort, locate, and generate grasps for the known objects.
  • the robot perception functions are optimized for the known objects from the visual memory.
  • the robot system knows how to handle a specific class of parts. The robot system knows, for example, how to pick a screw from a box of screws, and the robot operator needs to specify what the robot does with the screw after it is picked. The logic of the robot sequence after the screw is picked, for example, is specific to each installation and has to be introduced by the robot operator.
  • The robot arm can have visual sensors integrated. There is an option to provide visual sensors that are statically mounted. In this case, additional steps are needed to install the visual sensors before operation. In both cases, the robot visual memory includes complete functionality to solve perception tasks for the known objects;
  • Robot controller: responsible for the robot arm motion and necessary functions
  • Perception controller: responsible for keeping the class of objects database and knowledge about the robot geometric structure with the kinematic parameters. It also provides the connections to the visual sensors.
  • the perception tasks / functions are implemented in the perception controller.
  • Robot visual memory is a collection of one or more models, e.g., neural network models, 3D models ... that allow a wide range of perception tasks to be solved;
  • Robot perception tasks include a set of perception functions that can be solved using the models from the robot visual memory and the visual data measured with the visual sensors;
  • Robot geometric and kinematic data stored on the perception controller that can include: CAD models of the robot arm and robot tool, kinematic parameters of the robot arm and tool.
  • An illustration of hardware components of an “out of the box” robot capable of detecting and manipulating objects from a class of objects is depicted in FIG. 5.
  • the perception controller is in communication with the robot controller structured to control one or more functions of the robot (e.g. activate an actuator), visual sensors used to detect robot surroundings (e.g. a camera), a visual memory which includes the class of objects dataset, as well as the perception tasks. A simplified illustrative sketch of one such arrangement follows this list.
  • Such an “out of the box” robot can be produced by a manufacturer and trained on a class of objects (recognition and/or interaction therewith), and can be sold to a customer for installation at a customer facility. The customer can then develop the robot further to include actions to be performed after the part is recognized and interacted with (e.g. captured).
  • Such customer specific actions can include how to dispose of the object (e.g. insertion into an opening, moved to a specific bin, etc).
  • Such “out of the box” robots prepared to recognize and capture objects at the manufacturer can increase the speed by which customers integrate a robot into their operations.
  • An aspect of the present application includes an apparatus comprising a robot having an end effector structured to interact with a worktime object, a camera structured to image the worktime object, and a perception controller in data communication with a visual memory structured to identify an object from within a class of objects.
  • a feature of the present application includes wherein the class of objects associated with the visual memory includes a plurality of different objects in a common class of objects.
  • Another feature of the present application includes wherein the visual memory includes information related to identifying a robot capture useful to capture a particular object.
  • Yet another feature of the present application includes wherein the robot capture is a robot grasp, and wherein the robot grasp is a function of the type of effector of the robot.
  • Still another feature of the present application includes wherein the class of objects associated with the visual memory includes a type of capture suitable to interact with at least one of the plurality of objects in the common class of objects.
  • Yet still another feature of the present application includes wherein the robot is an out-of-the-box robot structured to be positioned into place and enabled, the robot lacking any requirement for object training prior to introduction to a workflow process.
  • Still yet another feature of the present application includes wherein the out-of-the-box robot is a modular system capable of being packaged and delivered without need to conduct object training at a customer location.
  • a further feature of the present application includes wherein the workflow process includes object recognition and object grasping, wherein the object recognition and object grasping are provided by the visual memory.
  • a still further feature of the present application includes wherein the workflow process further includes disposal of the object once it is grasped by the robot using the visual memory.
  • a yet still further feature of the present application includes wherein a training for a disposal of the object in the workflow process is not provided in the visual memory upon delivery to a customer.
  • Another aspect of the present application includes a method comprising providing a robot having an end effector structured to interact with a plurality of training objects, the robot in cooperative communication with a camera structured to image the plurality of training objects, training a learning system to recognize an interaction with a first training object of the plurality of training objects, training the learning system to recognize an interaction with a second training object of the plurality of training objects, the second training object different from the first training object but in a common class as the first training object, forming a visual memory structured to identify an object from within a class of objects based on the training the learning system with the first training object and the training the learning system with the second training object.
  • a feature of the present application includes wherein the training the learning system with the first training object includes training the system to capture the first training object with a first end effector.
  • Another feature of the present application includes wherein the visual memory includes information related to identifying a robot grasp for a particular object.
  • Still another feature of the present application includes wherein the training the learning system with the first training object includes training the system to capture the first training object with a second end effector.
  • Yet still another feature of the present application includes wherein the training the learning system with the first training object includes evaluating the training against a training data set, and if the evaluating results in an evaluation score below a threshold a new training is initiated to collect additional data.
  • Still yet another feature of the present application further includes forming a perception controller that includes the visual memory.
  • a further feature of the present application includes delivering another robot with the visual memory to a customer, wherein the robot is an out-of-the- box robot structured to be positioned into place and enabled, the robot lacking any requirement for object training of a working object prior to introduction to a workflow process, and wherein the out-of-the-box robot is a modular system capable of being packaged and delivered without need to conduct object training at a customer location.
  • a yet further feature of the present application includes wherein the workflow process is developed by a customer at a customer location.
  • a still yet further feature of the present application includes wherein the workflow process further includes training the perception controller to dispose of the working object once it is grasped by the robot using the visual memory.
  • Still yet another feature of the present application includes wherein a training to dispose of the working object in the workflow process is not provided in the visual memory upon delivery to a customer.
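To make the relationship between the perception controller, the visual memory, and the perception tasks described in the bullets above more concrete, the following is a minimal illustrative sketch in Python. It is not the patent's implementation; all class, method, and variable names (ObjectModel, VisualMemory, PerceptionController, identify, count, generate_grasp) are assumptions introduced only for illustration.

```python
# Illustrative sketch only: hypothetical names, not the patent's actual implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

import numpy as np


@dataclass
class ObjectModel:
    """One entry in the visual memory: a learned model for one object type
    (e.g. a particular screw) plus the grasps known to work for it."""
    name: str
    classifier: Callable[[np.ndarray], float]                      # image -> match score
    grasps: List[Tuple[str, float]] = field(default_factory=list)  # (tool, approach angle)


@dataclass
class VisualMemory:
    """Built-in memory for a class of objects (e.g. 'screws') shipped with the robot."""
    object_class: str
    models: Dict[str, ObjectModel] = field(default_factory=dict)

    def add(self, model: ObjectModel) -> None:
        self.models[model.name] = model


class PerceptionController:
    """Hosts perception tasks that operate on the visual memory and sensor data."""

    def __init__(self, memory: VisualMemory):
        self.memory = memory

    def identify(self, image: np.ndarray) -> Tuple[str, float]:
        # Score the image against every model in memory; return the best match.
        scores = {name: m.classifier(image) for name, m in self.memory.models.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]

    def count(self, images: List[np.ndarray], threshold: float = 0.5) -> int:
        # Presence check / counting over a set of candidate detections.
        return sum(1 for img in images if self.identify(img)[1] >= threshold)

    def generate_grasp(self, object_name: str, tool: str) -> Tuple[str, float]:
        # Pick a stored grasp for this object that matches the installed tool.
        for grasp_tool, angle in self.memory.models[object_name].grasps:
            if grasp_tool == tool:
                return grasp_tool, angle
        raise LookupError(f"no stored grasp for {object_name} with tool {tool}")


if __name__ == "__main__":
    # Toy memory: mean brightness stands in for a real learned classifier.
    memory = VisualMemory("screws")
    memory.add(ObjectModel("round_head_machine_screw",
                           classifier=lambda img: float(img.mean() > 0.5),
                           grasps=[("parallel_gripper", 90.0)]))
    controller = PerceptionController(memory)
    print(controller.identify(np.ones((8, 8))))
    print(controller.generate_grasp("round_head_machine_screw", "parallel_gripper"))
```

In this sketch the classifier stand-ins would in practice be the trained models produced by the training and evaluation procedures described earlier, and the stored grasps correspond to the tool-specific capture information held in the visual memory.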

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)
  • Image Analysis (AREA)

Abstract

A robot system includes a moveable robot and camera useful to image a target object, such as a worktime object. A controller can receive an image from the camera and compare the image against a visual memory. In one form the visual memory represents a memory associated with a class of objects (e.g. screws, nuts, bolts, plates, gears, chains, hinges, latches, hasps, catches, etc). The robot can be delivered in a ready to use condition which includes a perception controller, visual memory, visual sensors, and perception task definition.

Description

ROBOT HAVING VISUAL MEMORY
TECHNICAL FIELD
The present invention generally relates to robots and robot training, and more particularly, but not exclusively, to robotic systems and training for robotic systems.
BACKGROUND
Providing flexibility and increased productivity in robotic systems and training for robotic systems remains an area of interest. Some existing systems have various shortcomings relative to certain applications. Accordingly, there remains a need for further contributions in this area of technology.
SUMMARY
One embodiment of the present invention is a unique robotic system. Other embodiments include apparatuses, systems, devices, hardware, methods, and combinations for training robotic systems. Further embodiments, forms, features, aspects, benefits, and advantages of the present application shall become apparent from the description and figures provided herewith.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 depicts an embodiment of a robot system.
FIG. 2 depicts an embodiment of a computer.
FIG. 3 depicts an embodiment of a robot system.
FIG. 4 depicts a flow chart embodiment of the present application.
FIG. 5 depicts an embodiment of a robot which can be provided directly to a customer with minimal to no training required.
DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS
For the purposes of promoting an understanding of the principles of the invention, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended. Any alterations and further modifications in the described embodiments, and any further applications of the principles of the invention as described herein are contemplated as would normally occur to one skilled in the art to which the invention relates.
With reference to FIG. 1, a schematic of a robot 50 is shown which includes a number of moveable robot components 52 along with an effector 54 useful to manipulate and/or sense a target 56. In some forms the robot 50 can include a single moveable robot component 52. Additionally and/or alternatively, the robot 50 need not include an end effector 54. The robot 50 can be mobile in some embodiments, but in other embodiments the robot 50 can be mounted upon a stationary base (e.g. FIG. 1). The robot components 52 can take any variety of forms such as arms, links, beams, etc., which can be used to position the effector 54. The robot 50 can include any number of moveable components 52 which can take on different sizes, shapes, and other features. The components 52, furthermore, can be interconnected with one another through any variety of useful mechanisms such as links and gears 58, to set forth just two examples. The components 52 can be actuated via any suitable device such as electric actuators, pneumatic or hydraulic pistons, etc. The effector 54 can take any variety of forms such as a gripper, suction effector, belt, etc.
In many embodiments described herein, the target 56 will be understood to be a training object 56 which will be imaged as the robot interacts with the object (e.g. to determine the type of part of the object and/or to determine a technique to capture the object through a variety of end effector(s) (such as but not limited to a gripper), etc) to obtain training and evaluation data useful in the creation of a system capable of quickly recognizing and/or locating objects in working runtime. The training object 56 can be imaged using image capture device 57 which can take the form of any variety of devices/systems including via cameras, radar, light curtains, etc. In some forms the device 57 can be other types of sensors and/or computed values such as force contact sensor, range finder, proximity sensors, etc. Although a single image capture device 57 is depicted in FIG. 1, it will be appreciated that other embodiments may include two or more devices 57. As will be understood, the term “camera” can refer to a variety of devices capable of detecting electromagnetic radiation, whether in the visible range, infrared range, etc. Such “cameras” can also refer to 2D and/or 3D cameras. Reference may be made below to image capture device 57 as a camera 57, but no limitation is hereby intended that such device 57 is limited to a “camera” unless explicitly or inherently understood to the contrary.
The position and/or orientation of the training object 56 can be manipulated by a positioner 59. The positioner 59 can take a variety of forms useful to manipulate the training object 56. In one form the positioner 59 can include one or more moveable components and/or effectors useful to change the orientation of the training object 56 for purposes of being imaged by the camera 57 at different orientations. As with the robot 50, the positioner 59 can be stationary, but in some forms may also be moveable.
The robot 50, camera 57, and/or positioner 59 can be operated via a controller 55. The controller 55 can be one or more different devices useful to control one or more of the robot 50, camera 57, and positioner 59.
Turning now to FIG. 2, and with continued reference to FIG. 1 , a schematic diagram is depicted of a computer 60 suitable to host the controller 55 for operating the robot 50. Computer 60 includes a processing device 64, an input/output device 66, memory 68, and operating logic 70. Furthermore, computer 60 can be configured to communicate with one or more external devices 72.
The input/output device 66 may be any type of device that allows the computer 60 to communicate with the external device 72. For example, the input/output device may be a network adapter, network card, or a port (e.g., a USB port, serial port, parallel port, VGA, DVI, HDMI, FireWire, CAT 5, or any other type of port). The input/output device 66 may be comprised of hardware, software, and/or firmware. It is contemplated that the input/output device 66 includes more than one of these adapters, cards, or ports.
The external device 72 may be any type of device that allows data to be inputted or outputted from the computer 60. In one non-limiting example the external device 72 can be the camera 57, positioner 59, etc. To set forth just a few additional non-limiting examples, the external device 72 may be another computer, a server, a printer, a display, an alarm, an illuminated indicator, a keyboard, a mouse, mouse button, or a touch screen display. Furthermore, it is contemplated that the external device 72 may be integrated into the computer 60. For example, the computer 60 may be a smartphone, a laptop computer, or a tablet computer. It is further contemplated that there may be more than one external device in communication with the computer 60. The external device can be co-located with the computer 60 or alternatively located remotely from the computer.
Processing device 64 can be of a programmable type, a dedicated, hardwired state machine, or a combination of these; and can further include multiple processors, Arithmetic-Logic Units (ALUs), Central Processing Units (CPUs), or the like. For forms of processing device 64 with multiple processing units, distributed, pipelined, and/or parallel processing can be utilized as appropriate. Processing device 64 may be dedicated to performance of just the operations described herein or may be utilized in one or more additional applications. In the depicted form, processing device 64 is of a programmable variety that executes algorithms and processes data in accordance with operating logic 70 as defined by programming instructions (such as software or firmware) stored in memory 68. Alternatively or additionally, operating logic 70 for processing device 64 is at least partially defined by hardwired logic or other hardware. Processing device 64 can be comprised of one or more components of any type suitable to process the signals received from input/output device 66 or elsewhere, and provide desired output signals. Such components may include digital circuitry, analog circuitry, or a combination of both.
Memory 68 may be of one or more types, such as a solid-state variety, electromagnetic variety, optical variety, or a combination of these forms.
Furthermore, memory 68 can be volatile, nonvolatile, or a mixture of these types, and some or all of memory 68 can be of a portable variety, such as a disk, tape, memory stick, cartridge, or the like. In addition, memory 68 can store data that is manipulated by the operating logic 70 of processing device 64, such as data representative of signals received from and/or sent to input/output device 66 in addition to or in lieu of storing programming instructions defining operating logic 70, just to name one example. As shown in FIG. 2, memory 68 may be included with processing device 64 and/or coupled to the processing device 64.
The operating logic 70 can include the algorithms and steps of the controller, whether the controller includes the entire suite of algorithms necessary to effect movement and actions of the robot 50, or whether the controller includes just those necessary to receive data from the camera 57, determine a point cloud, find edges, utilize object recognition, and/or resolve position of the objects, among other functions. The operating logic can be saved in a memory device whether of the volatile or nonvolatile type, and can be expressed in any suitable type such as but not limited to source code, object code, and machine code.
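As an illustration of the kind of operating logic described above (receiving camera data, determining a point cloud, finding edges, and resolving object position), the following Python sketch shows one plausible, highly simplified pipeline. The function names and the pinhole-camera parameters are assumptions introduced for illustration only, not the controller's actual algorithms.

```python
# Illustrative sketch only; names and camera parameters are assumptions.
import numpy as np


def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project a depth image (meters) into an N x 3 point cloud."""
    v, u = np.indices(depth.shape)          # v = row index, u = column index
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


def find_edges(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Very simple gradient-magnitude edge map (stand-in for the edge computer)."""
    grad_rows, grad_cols = np.gradient(gray.astype(float))
    return np.hypot(grad_rows, grad_cols) > threshold


def resolve_position(points: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Estimate the object position as the centroid of the masked points."""
    return points[mask.ravel()].mean(axis=0)


if __name__ == "__main__":
    depth = np.full((4, 4), 0.5)            # toy 4x4 depth image, 0.5 m everywhere
    gray = np.zeros((4, 4)); gray[:, 2:] = 1.0
    cloud = depth_to_point_cloud(depth, fx=500, fy=500, cx=2, cy=2)
    edges = find_edges(gray)
    print(resolve_position(cloud, edges))   # centroid of the edge points
```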
Turning now to FIG. 3, another embodiment of a robot 50 and training object 56 environment is depicted. FIG. 3 includes a robot controller 74, edge computer 76, and computer server 78, all of which are depicted as separate components but can also be understood to be the same as or associated with any of the computer 60 and/or controller 55. The edge computer 76 can be structured to be useful in image processing and/or the learning described herein, and is contemplated to be a distributed device node (e.g. a smart device or edge device akin to the geographic distribution of nodes such as in the Internet of Things) as opposed to those functions being performed in a centralized cloud environment. Any data generated herein, however, can be shared to remote stations such as through cloud computing. In one form the edge device can be structured to receive data generated from the camera 57 and operate upon the data to generate point clouds, detect edges, etc. In one nonlimiting form the edge computer can be structured to detect edges in objects, such as by detecting discontinuities in brightness within an image. Such discontinuities can
correspond to discontinuities in depth and/or surface orientation, change in material property, variations in scene illumination, etc. Edge detection can sometimes, but not exclusively, be grouped in two categories: search-based and zero-crossing. The edge computer 76 may also be useful in filtering an image, as well as edge thinning to remove unwanted spurious points on the edges in an image. In some forms the edge computer 76 can also be based upon second-order derivatives of the intensity. Still further, the edge computer 76 can utilize techniques such as the phase stretch transform. In short, the edge computer 76 can be constructed using a variety of techniques and employ a number of associated computations which assist in identifying edges of a training object 56. FIG. 3 also depicts an illumination source 80 useful to radiate the training object 56 which is positioned using the positioner 59. The illumination source 80 can be constructed to radiate in any number of manners, from focused to diffuse, low to high brightness, specific to broad based electromagnetic radiations, etc. Such electromagnetic radiation can occur over any range of suitable frequencies, from infrared, visible, ultraviolet, etc. In one non-limiting form, the illumination source emits light in the visible electromagnetic band at a variety of intensities. Additionally and/or alternatively, the illumination source 80 can be positioned at a variety of locations around the training object 56, at a variety of distances, and be oriented to project at a variety of angles relative to the training object 56. Not all embodiments need include an illumination source. In some forms different illumination sources can be used.
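The two edge-detection families mentioned above can be illustrated with the short Python sketch below, which assumes OpenCV and NumPy are available; the filter sizes and thresholds are illustrative values, not parameters from the patent. The search-based branch looks for first-derivative (gradient) maxima, while the zero-crossing branch marks sign changes of a second-order derivative.

```python
# Hedged sketch of the two edge-detection families; parameters are illustrative.
import cv2
import numpy as np


def search_based_edges(gray: np.ndarray) -> np.ndarray:
    # Search-based family: smooth, then find first-derivative (gradient) maxima.
    # Canny also performs edge thinning (non-maximum suppression) and hysteresis.
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)
    return cv2.Canny(blurred, 50, 150)


def zero_crossing_edges(gray: np.ndarray) -> np.ndarray:
    # Zero-crossing family: second-order derivative (Laplacian of Gaussian),
    # then mark sign changes, which correspond to brightness discontinuities.
    blurred = cv2.GaussianBlur(gray.astype(np.float64), (5, 5), 1.4)
    lap = cv2.Laplacian(blurred, cv2.CV_64F)
    sign = lap > 0
    crossings = np.zeros_like(sign)
    crossings[:-1, :] |= sign[:-1, :] != sign[1:, :]
    crossings[:, :-1] |= sign[:, :-1] != sign[:, 1:]
    return crossings.astype(np.uint8) * 255


if __name__ == "__main__":
    img = np.zeros((64, 64), dtype=np.uint8)
    img[:, 32:] = 255                      # a single vertical step edge
    print(search_based_edges(img).max(), zero_crossing_edges(img).max())
```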
Robots are widely used in assembly and production lines today to automate many manufacturing procedures. However, to execute pre-programmed manufacturing operations on target parts/objects, robots must first locate them very accurately in the assembly or production lines.
Most existing solutions can be categorized into two methods: one is to take advantage of fixtures to always place target parts/objects in fixed locations with fixed poses, and the other is to develop robust part/object features based on geometry or color to locate them through mounted sensors. However, in both methods, different kinds of solutions must be specifically developed for different kinds of parts/objects in different kinds of applications, because of the constraints of the specially designed fixtures or manually selected part/object features. Therefore, the weak generality and reusability of existing methods cause significant additional development time and cost for new parts/objects or new applications. In extreme cases, everything has to be redone from the beginning.
Embodiments disclosed herein introduce a self-adaptive optimization system for robots with mounted sensors, which can automatically learn new target parts/objects and locate them in the working runtime.
In the training time, target parts/objects are placed on a positioner in different pose/scene settings; sensor(s) mounted on the robot (e.g. robot arms) are moved around target parts/objects to capture a number of representative images in different robot/sensor settings; then, all the captured images are preprocessed by edge computers and sent to computer servers for training. Based on the evaluation results of the trained system from existing images, computer servers might ask for more images of target parts/objects to continue the training until a robust performance of the trained system is achieved.
After the learning is finished, the trained system with the learned knowledge of target parts/objects can be directly used to locate them in the working runtime.
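As a hedged illustration of that working-runtime step, the sketch below sweeps a trained scoring function over a camera image to locate a learned part; the function `score_patch`, the window size, and the threshold are hypothetical stand-ins, not elements of the patent.

```python
# Illustrative runtime-locate sketch; `score_patch` stands in for a trained model.
import numpy as np


def locate_part(image: np.ndarray, score_patch, patch: int = 32,
                stride: int = 8, threshold: float = 0.3):
    """Return (row, col) of the best-scoring window, or None if below threshold."""
    best, best_rc = -np.inf, None
    for r in range(0, image.shape[0] - patch + 1, stride):
        for c in range(0, image.shape[1] - patch + 1, stride):
            score = score_patch(image[r:r + patch, c:c + patch])
            if score > best:
                best, best_rc = score, (r, c)
    return best_rc if best >= threshold else None


if __name__ == "__main__":
    img = np.zeros((64, 64))
    img[20:40, 20:40] = 1.0                     # the "part" to be located
    # Toy stand-in for the trained model: bright patches score high.
    print(locate_part(img, score_patch=lambda p: p.mean()))
```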
Embodiments described herein may be used to:
(1) Automatically optimize robot/sensor/scene settings to collect new images for object training which can improve the runtime performance of the trained system; (2) Automatically optimize robot/sensor/scene settings to collect new images of objects to evaluate the runtime performance of the trained system in all different kinds of application scenarios;
(3) A complete control loop which manages and optimizes both the training and the evaluation of the system until the robust performance of the system is achieved;
(4) All image data collection in the training and the evaluation is optimized. Our system tries to use the least amount of image data of target objects in training and evaluation to achieve the best runtime performance; and
(5) Automatically learn a system which tries to achieve the best runtime performance in different kinds of application scenarios.
In many embodiments five separate hardware functions/components are included, such as:
(1) A positioner. Target parts/objects are placed on it in different pose/scene settings for learning;
(2) Sensors. Sensors are mounted on robots and used to capture a number of representative images of target parts/objects in different settings. Multiple sensors could be mounted on the same robot arm and be in different types to capture different types of part/object information;
(3) Robot arms and controllers (robots). Robots are used to move the sensors around target parts/objects to capture images in different robot/sensor settings; (4) Edge computers. Edge computers control the motion of robot arms through robot controllers, preprocess all captured images and send images to computer servers for learning; and
(5) Computer servers. Servers run machine learning algorithms based on all collected images to model the knowledge of target parts/objects for different kinds of application purposes.
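One plausible way to picture how these five components cooperate is sketched below in Python; the class and method names (Setting, EdgeComputer, ComputerServer, collect, receive, train) are assumptions for illustration, not the interfaces used in the patent.

```python
# Hedged sketch of the positioner/sensor/robot/edge/server data flow.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Setting:
    """One robot/sensor/object/scene setting used during training."""
    robot_pose: tuple          # e.g. joint angles or tool pose
    positioner_pose: tuple     # pose of the training object on the positioner
    illumination: float        # scene/illumination parameter


class EdgeComputer:
    """Drives the robot to each setting, preprocesses images, forwards to the server."""

    def __init__(self, capture, preprocess, server):
        self.capture = capture          # sensor interface: Setting -> image
        self.preprocess = preprocess    # e.g. filtering / edge extraction
        self.server = server

    def collect(self, settings: List[Setting]) -> None:
        for setting in settings:
            image = self.capture(setting)
            self.server.receive(self.preprocess(image), setting)


class ComputerServer:
    """Accumulates preprocessed images and runs the learning algorithms."""

    def __init__(self):
        self.dataset = []

    def receive(self, image: np.ndarray, setting: Setting) -> None:
        self.dataset.append((image, setting))

    def train(self) -> str:
        # Placeholder for the machine-learning step described in the text.
        return f"trained on {len(self.dataset)} images"


if __name__ == "__main__":
    server = ComputerServer()
    edge = EdgeComputer(capture=lambda s: np.random.rand(16, 16),
                        preprocess=lambda img: img - img.mean(),
                        server=server)
    edge.collect([Setting((0, 0, 0), (0,), 0.5), Setting((0, 0, 10), (90,), 0.8)])
    print(server.train())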
The training and evaluation useful to develop the generalized object learning described herein can involve the capture of several images set at different orientations of the training object 56 by the positioner 59, and/or as set using different lighting of the illumination source 80, and/or as set using different positions of the robot 50 and device 57. The training can be conducted in any number of manners, with the step of evaluation useful to determine when training is sufficient. In one form a matrix of image capture locations, positioner orientations, and/or lighting conditions from the illuminator can be used to develop a range of images useful in the embodiments herein in the creation of a dataset for a class-of-objects library. A validation dataset of images can be used to test whether the training matrix has sufficiently covered the training object 56.
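A minimal sketch of such a training matrix, assuming illustrative discretizations of camera position, positioner orientation, and illumination level, could look like the following; the specific values are placeholders rather than values from the patent.

```python
# Sketch of a training matrix over illustrative setting values.
from itertools import product

camera_positions = ["front", "left", "right", "top"]      # robot/camera poses
positioner_orientations = [0, 90, 180, 270]               # degrees
illumination_levels = [0.3, 0.6, 1.0]                     # relative intensity

# Every combination becomes one robot/sensor/object/scene setting to image.
training_matrix = list(product(camera_positions,
                               positioner_orientations,
                               illumination_levels))
print(len(training_matrix), "settings, e.g.", training_matrix[0])

# A held-out validation set (settings not in the matrix) can then be imaged
# to check whether the matrix sufficiently covered the training object.
validation_settings = [("front", 45, 0.45), ("top", 135, 0.8)]
```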
Training based on the images obtained during the training can include any number of different methodologies as will be appreciated. Learning can take the form of supervised, semi-supervised, or unsupervised learning, although many embodiments herein contemplate the use of unsupervised learning. Principal component analysis and cluster analysis can be used. Artificial neural networks, support vector machines, Bayesian networks, and genetic algorithms are contemplated in the learning stage. In one non-limiting embodiment, deep learning techniques are also contemplated. In short, any techniques and methods useful to conduct learning of the training object are contemplated herein. The learning can be conducted in the same computer (server 78, controller 55, etc) or can be separate from the machine that performs any other particular function of the embodiments described herein.
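As one hedged example of the unsupervised option named above (principal component analysis followed by cluster analysis), the following sketch uses scikit-learn on synthetic data standing in for flattened training images; it is not the learning procedure of the patent.

```python
# Sketch of PCA + clustering on synthetic stand-in image data (scikit-learn assumed).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Pretend these are flattened training images of two object appearances.
images = np.vstack([rng.normal(0.0, 0.1, size=(50, 256)),
                    rng.normal(1.0, 0.1, size=(50, 256))])

features = PCA(n_components=8).fit_transform(images)   # reduce to 8 components
labels = KMeans(n_clusters=2, n_init=10).fit_predict(features)
print("cluster sizes:", np.bincount(labels))            # expect roughly 50 / 50
```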
FIG. 4 depicts an embodiment of a self-adaptive optimization procedure of object learning. The procedure starts with the selection of a robot/sensor/object/scene setting. Such a setting can include a position and/or orientation of the robot 50 (with accompanying position and/or orientation of the camera 57, position and/or orientation of the training object 56 via the positioner 59, and one or more parameters associated with the illumination source 80 (if used)). Images are then taken via the camera 57 which are used to train the model. The data can be split into a learning data set and an evaluation data set.
A validation data set (one or more images designated to test the trained model) is used to evaluate the robustness of the trained model. Any suitable test can be used to assess whether the training was successful or not. In many instances the test can result in a score. If the robustness test fails, the procedure loops to the beginning to select a new robot/sensor/object/scene setting. The ability to pick a new robot/sensor/object/scene setting can be through serial testing of a pre-defined test matrix, but can also be adaptive in the sense that intermediate robot/sensor/object/scene settings can be selected based upon the results of the evaluation step. It is also possible that entirely new robot/sensor/object/scene settings are developed based upon the results of the evaluation step. It will be appreciated that examples of such robot settings can be feedback settings, position settings, etc; examples of such sensor settings can include sensitivity settings, aperture settings, resolution settings, etc; examples of such object settings can be position settings, etc; and examples of such scene settings can include illumination type, illumination intensity, etc. Once the evaluation step is passed, the procedure can cease and a validated, trained model can be used.
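The control flow of this self-adaptive loop can be sketched as follows. The functions capture_images, train_model, and robustness_score are placeholders for the capture, learning, and validation-test steps described above, and the threshold value is an assumed example rather than a value specified in this application.

```python
# Placeholder functions stand in for the capture, learning, and scoring steps;
# only the control flow of the self-adaptive loop of FIG. 4 is sketched here.
ROBUSTNESS_THRESHOLD = 0.95     # assumed pass/fail score for the validation test

def self_adaptive_training(candidate_settings, capture_images, train_model,
                           robustness_score):
    model = None
    for setting in candidate_settings:        # serial walk of a pre-defined test
        images = capture_images(setting)      # matrix, or adaptively chosen settings
        split = int(0.8 * len(images))
        learning_set, evaluation_set = images[:split], images[split:]
        model = train_model(model, learning_set)
        if robustness_score(model, evaluation_set) >= ROBUSTNESS_THRESHOLD:
            return model                      # validated, trained model
    return model                              # best effort if no setting passed
```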
Additionally and/or alternatively, the learning can include:
(1) Place target parts/objects in certain pose/scene settings on the positioner;
(2) Adjust robot/sensor/scene parameters;
(3) Move the robot arm and collect new images and/or sensor information of target parts/objects;
(4) New images and/or sensor data are preprocessed by edge computers and sent to computer servers for new or continuous object training;
(5) Run machine learning algorithms to train new object models. Depending on the application purposes, different kinds of training criteria and methods can be used;
(6) Evaluate the performance of the trained system. For example, an optimized set of new images of target parts/objects can be collected for evaluation, combined with all failure images from previous evaluation tests;
(7) If the required system performance has not been achieved, optimize robot/sensor/object/scene parameters to collect new images for more training of object models.
Repeat the procedures (1)-(7) until the required system performance is achieved. Not all of procedures (1)-(7) need be repeated, depending on the application.
As will be appreciated, the images made available by step (4) can be split between a learning data set and an evaluation data set.
As will be appreciated, when any of procedures (1)-(7) are repeated, new data may be required to assist in furthering the training. The new data can be collected into a new data set which can be used exclusively for new learning, or can be split into learning and evaluation data sets as above. New data can be collected at the same settings as those collected above, but new data at other settings are also contemplated. The new settings can be completely new robot settings, sensor settings, object settings, and scene settings, or a combination of new and old settings of the same. If new settings are developed, the robot can develop such settings according to a pre-structured update schedule (e.g. finer resolution between object positions, incremental increase in illumination along a preprogrammed step increase, etc), but other new settings can be developed in an unsupervised manner such as, but not limited to, a random walk, etc.
Whether the new settings are determined via a pre-structured update or are unsupervised (e.g. random walk), it will be appreciated that the new settings can be identified around a point of weakness identified by the robot during the evaluation step. If, for example, the robot identifies a weakness in its evaluation of robustness around a certain illumination range, increases in illumination can be generated and new data collected for further learning and/or evaluation.
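One possible, non-limiting sketch of generating new settings around an identified weakness is shown below; the weak illumination range, the step count, and the jitter term are illustrative stand-ins for a pre-structured update and a random-walk-like unsupervised update, respectively.

```python
import numpy as np

def settings_around_weakness(weak_low, weak_high, n_new=5, jitter=0.1, seed=0):
    """Propose new illumination levels concentrated on a weak range.

    An even step through the range mimics a pre-structured update; the random
    jitter term mimics an unsupervised (random-walk-like) alternative.
    """
    rng = np.random.default_rng(seed)
    base = np.linspace(weak_low, weak_high, n_new)           # finer resolution
    return base + rng.uniform(-jitter, jitter, size=n_new)   # random perturbation

# e.g. evaluation showed weak robustness between illumination levels 0.6 and 0.8
new_levels = settings_around_weakness(0.6, 0.8)
```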
Embodiments described herein can be structured to robustly conduct object recognition, object detection, and pose estimation.
Building upon the embodiments described above with respect to FIGS. 1 - 4, an extension of the embodiments above (including all variations in the embodiments above) includes adding one or more robots to the single robot 50 above and/or one or more positioners to the single positioner 59. Such an extension can greatly speed up the training and evaluation of the previous system and improve the performance of the system in the working runtime. The robots 50 can be the same, but can also be different. Likewise, the positioners 59 used can be the same but can also be different. The extension of the embodiments above by the inclusion of multiple robots 50 and/or multiple positioners 59 can be characterized by:
(1) Multiple robots and positioners are utilized in the training and evaluation of the system;
(2) Training and evaluation of the system can be done simultaneously;
(3) Images for object training and evaluation in different robot/sensor/object/scene settings can be collected simultaneously;
(4) Images and/or sensor information of multiple different target objects/parts can be collected and the knowledge of them can be trained and evaluated by the same system simultaneously;
(5) The system can be trained and evaluated for multiple application purposes simultaneously.
To speed up the training and evaluation procedure, multiple robots 50 and positioners 59 can be utilized in the following ways:
(1) Robot(s): at least two robots can be used if only one positioner is present;
(2) Positioner(s): multiple positioners with same or different target objects/parts can be used if only one robot is present;
(3) A robot with the current robot and sensor settings can collect images and/or sensor information of different settings of same/different target objects/parts from multiple positioners;
(4) A positioner with the current pose/scene setting of a target object/part can be used to collect images and/or sensor information by multiple robots in different robot and/or sensor parameters simultaneously;
(5) Training and evaluation of the system can be done by multiple robots simultaneously. In a simple case of two robots with one positioner, for example, one robot can be used to collect new images and/or sensor information of target objects/parts with different robot and/or sensor settings for training, and the other can be used to collect new images and/or sensor information of target objects/parts to evaluate the performance of the trained system simultaneously (a simplified sketch of this two-robot case follows this list);
(6) More than one robot can be used to collect new images and/or sensor information of target objects/parts in different robot and/or sensor parameters for training simultaneously;
(7) More than one robot can be used to collect new images and/or sensor information of target objects/parts in different robot and/or sensor parameters to evaluate the performance of the trained system simultaneously;
(8) With multiple positioners and robots, images and/or sensor information of different settings of same target objects/parts with different scene settings on different positioners can be collected for training and evaluation simultaneously;
(9) With multiple positioners and robots, different target objects/parts can be set on different positioners for image and/or sensor collecting simultaneously; therefore, different target objects/parts can be trained and evaluated simultaneously;
(10) With multiple positioners and robots, different settings of same/different target objects/parts for different application purposes can be trained and evaluated by the same system simultaneously.
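The two-robot case of item (5) above can be sketched as follows, with one robot collecting training images while the other collects evaluation images in parallel; the collect_images function and the setting names are placeholders for the actual robot motion and capture code, which is not specified here.

```python
from concurrent.futures import ThreadPoolExecutor

def collect_images(robot_id, settings):
    # Placeholder for driving one robot through its capture settings and
    # returning the images gathered; the real motion and I/O are omitted.
    return [f"{robot_id}:{setting}" for setting in settings]

training_settings = ["pose_a", "pose_b", "pose_c"]
evaluation_settings = ["pose_d", "pose_e"]

# One robot gathers training data while the other gathers evaluation data.
with ThreadPoolExecutor(max_workers=2) as pool:
    train_job = pool.submit(collect_images, "robot_1", training_settings)
    eval_job = pool.submit(collect_images, "robot_2", evaluation_settings)
    training_images, evaluation_images = train_job.result(), eval_job.result()
```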
With continued reference to the embodiments described above (single robot, multiple robots, edge detectors, training, camera(s), etc), additional constraints can be placed upon the system to further speed and/or improve the training and evaluation process. The training and evaluation procedure in the previously introduced system can be very general, and attempts to achieve the best performance in all kinds of possible application scenarios. The learned system is very knowledgeable about target objects/parts, but such knowledge can consume a large amount of time and hardware resources to finish the training and evaluation. On the other hand, most applications do not require such a knowledgeable system about target objects/parts, for the reason that there are only a limited number of different kinds of cases to deal with for most specific applications. Therefore, for most specific applications, such a knowledgeable system is overqualified and it wastes a lot of time and hardware resources to achieve such a knowledgeable system about target objects/parts.
Again, with continued reference to the embodiments described above (single robot, multiple robots, edge detectors, training, camera(s), etc), further additional refinements introduce an application-case driven approach for object learning with robots in the previous system. The main idea is that, for most specific applications, a lot of settings/parameters about robots, sensors, target objects/parts and runtime scenes are known and fixed in certain specific ranges. All these known settings/parameters should be used in the training and evaluation procedure of the system. By adding more constraints in the training and evaluation procedure, we can make it much faster and also achieve the same robust performance in the working runtime for target applications.
Given the additional constraints in the refinements of the embodiments described above, the following features will be appreciated in these refinements:
(1) The training and evaluation procedure of the system can be driven by specific application cases; all robot/sensor/object/scene settings in the training and evaluation procedure are optimized based on the requirements of specific application cases;
(2) Other specific knowledge of each application which can help improve the system performance is also utilized in object learning of the system for that application. The training and evaluation procedure in the refinements discussed immediately above can be sped up in the following ways, based on the constraints and requirements of different specific application cases:
(1) Restrict the motion range of target objects/parts in the images. In most specific application cases, target objects/parts might only appear in a specific range of locations in the images of the working runtime. Therefore, it is not necessary to always run the algorithms of training and evaluation over the entire images, and less background distraction is introduced (a sketch of such a region-of-interest restriction follows this list);
(2) Restrict the settings/parameters of sensors. For example, distance, exposure time, viewpoints, and so on. Images for training and evaluation of target objects/parts are only needed to be captured in the application-case required settings/parameters of sensors;
(3) Restrict the settings of target objects/parts. Target objects/parts might be only set in certain ranges of poses in the working runtime. As a result, it is not necessary to consider all possible poses of target objects/parts in the training and evaluation;
(4) Restrict the settings of the scenes according to specific application cases, for example, light conditions and occlusions;
All of (2), (3) and (4) can bring in less variance in the appearance of target objects/parts in the training and evaluation;
(5) If certain objects always co-appear with target objects/parts in the application scenarios, for example, bins, they can also be used to help locate target objects/parts and they can be included in the training and evaluation; and/or
(6) With more constraints as above and smaller image regions to focus on, high resolution images which provide more details of target objects/parts can be used for training and evaluation, and they can help improve the performance of the system.
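Item (1) of the list above, restricting training and evaluation to the image region in which target parts can appear, can be sketched as a simple region-of-interest crop; the image size and the crop bounds used here are illustrative only.

```python
import numpy as np

# Application-specific region of interest in which target parts are known to
# appear at runtime; the bounds are illustrative only.
ROI = (slice(100, 500), slice(200, 800))     # rows, columns

def crop_to_roi(image: np.ndarray) -> np.ndarray:
    """Return only the region the training and evaluation algorithms process."""
    return image[ROI]

full_image = np.zeros((1080, 1920), dtype=np.uint8)
patch = crop_to_roi(full_image)              # 400 x 600 patch instead of the full frame
```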
The embodiments described above (single robot, multiple robots, edge detectors, training, camera(s), etc) as well as the additional constraints discussed immediately above to further speed and/or improve the training and evaluation process (e.g. the application specific cases) can be extended to a system that accounts for objects of the same class. For example, the training object 56 that is used to develop a system that robustly recognizes the object in various poses, lighting conditions, etc can itself be a type of object that fits within a broader class of objects. To set forth just one possibility, the training object 56 may be a threaded machine screw having a rounded head that fits within a larger class of “screws.” The embodiments described above can be extended to train a system on one particular type of screw, further train the system on another type of screw (e.g. a wood screw), and so on. Any number of different objects within a class can be used for training purposes, with the intent that the trained system will be able to recognize any of the objects from within the class of objects. For example, a second training object 56 that falls within the same class as the original training object 56 can be put through the same range of robotic image capture locations, positioner orientations, lighting conditions, etc to develop its own range of images useful in the creation of a dataset for that particular object. The ability to form a visual data set on a class of objects using the procedures described herein can be applied to any of the embodiments (one or more robots, one or more positioners, edge detectors, etc). As will be appreciated, not only can the system be used to train object recognition over a class of objects, but it can also be used to train the system how to capture the objects using one or more different end effectors. In one nonlimiting example, the system can be trained to use one or more grippers to capture the objects within the class of objects. References may be made herein to training the system to recognize objects in the class of objects, but it will be appreciated that the same techniques apply to training the system to recognize how to interact with the objects through various end effectors. No limitation to strictly object recognition, as opposed to object interaction, is hereby intended unless specifically expressed to the contrary.
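A minimal sketch of pooling per-object image sets into a single class-level dataset (here labeled "screw") is shown below; the object names and file names are hypothetical and stand in for the image collections produced by the training procedures above.

```python
# Illustrative pooling of per-object image sets into one class-level dataset.
# Each entry pairs an image (here just a file name) with the class label "screw",
# so the trained model answers "is this a screw?" rather than "is this one screw type?".
per_object_datasets = {
    "round_head_machine_screw": ["rhms_001.png", "rhms_002.png"],
    "wood_screw": ["ws_001.png", "ws_002.png"],
}

class_dataset = [(image, "screw")
                 for images in per_object_datasets.values()
                 for image in images]
```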
With continued reference to the embodiments described above (single robot, multiple robots, edge detectors, training, camera(s), etc) as well as the additional constraints discussed immediately above to further speed and/or improve the training and evaluation process (e.g. the application specific cases) and the creation of a system capable of detecting objects within a class of objects, a system can be developed and provided to an end user which takes into account and utilizes information provided from the various training features described above. A vision system integrated with industrial robots typically requires extensive training and evaluation for each new task or part. The training and evaluation is performed by a knowledgeable and experienced robot operator. In this patent application, we develop a system including a built-in perception controller and visual memory in which models for a class of objects are known. By providing the knowledge of known objects, the training of the robotic vision system for each part is removed because the class of parts/objects is already defined in the robot visual memory. The improvement of having a built-in perception controller, visual memory and perception functions provides the ability to deliver a robot system that can work out of the box, eliminating the need for training and evaluation. This can result in cost and time reductions, simplification of the robot interaction, and possible deployment of a large number of robots in a short period of time.
As used herein, the term “visual memory” will be understood to include any suitable model, dataset, and/or algorithm which is capable of being referenced and evaluated when a worktime image is collected and an attempt is made to identify the object in the worktime image from the visual memory.
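One non-limiting way such a visual memory could be referenced at worktime is sketched below; the score_fn placeholder stands in for whatever model evaluation (neural network inference, 3D model matching, etc.) the visual memory provides, and the minimum score is an assumed example value.

```python
def identify_from_visual_memory(worktime_image, visual_memory, score_fn,
                                min_score=0.5):
    """Return the best-matching known object name, or None if nothing matches.

    `visual_memory` maps object names to stored models (neural networks,
    3D models, etc.); `score_fn` is a placeholder for evaluating one stored
    model against the captured worktime image.
    """
    best_name, best_score = None, min_score
    for name, model in visual_memory.items():
        score = score_fn(model, worktime_image)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```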
Also as used herein, “worktime” can be contrasted with “training time” in that “worktime” involves tasks associated with a production process intended to result in a sales or servicing event for a customer, while “training time” involves tasks associated with training a robot and which may not result in a sales or servicing event. In this regard, “worktime” may result in no training, and a “training time” event may result in no actual work production.
Industrial robots can utilize vision/perception systems to perform flexible and adaptive tasks. The vision systems integrated with industrial robots typically need to be trained before use. There are many tasks a vision system integrated with industrial robots can be used for, such as: material handling, machine tending, welding, dispensing, painting, machining and so on. The vision system is used, for example, to recognize and locate parts/objects of interest in the scene. These parts of interest usually are standard within a customer, across segments or industries. Examples of such parts are: screws, nuts, bolts, plates, gears, chains, hinges, latches, hasps, catches and so on. The vision system integrated with industrial robots typically needs to be trained and tested for each part type in production conditions. For example, the training usually consists of selecting an area of interest in an image to select a part or a section of the part, or collecting many images with different scales, viewpoints, and light conditions, labeling them, and then using a neural network to calculate the object model or models.
In order to simplify the robot operation for standard parts, described herein is a robotic perception system that has built-in perception tasks. A robot, for example, is able to recognize, locate, sort, count, and calculate grasps or relative positions of the tool to the part of interest with built-in / visual memory functions that are deployed with a (brand new) robot. In addition to being able to perform perception tasks, such a robot has a tool specialized for handling screws. Such a robot is a robot that knows how to handle screws, for example. In conclusion, such a system will include all the hardware and software components to handle only a class of parts, such as: vision sensors mounted on the arm or statically mounted, computational hardware, robot tool, end effector, software and algorithms. It will be appreciated that the systems described herein that can provide a robot “out of the box” ready to detect and manipulate objects can provide the following features:
1) Complete robotic system, ready to be deployed for handling standard parts;
2) Built-in visual memory for a class of standard parts;
3) Robot tool suitable to handle the class of objects built into the system; and/or
4) Perception algorithms suited for the class of parts present in the visual memory.
A built-in visual memory for a class of objects, the associated robot tool or tools to handle/manipulate the objects from the visual memory, and the perception tasks/functions integrated in the robot controller are the critical components of a robot system that knows how to pick parts without training. Such a system is a robotic system that knows how to handle parts, e.g., a robot that knows how to pick screws, or nuts, or bolts, or gears ... out of the box. The robot includes all the tools, hardware and software, to start working the moment it is set ready to work.
This robot system will know how to handle/manipulate the objects that exist in the robot visual memory, and this means that the robot perception functions can identify, recognize, check for presence, count, sort, locate, and generate grasps for the known objects. The robot perception functions are optimized for the known objects from the visual memory. The robot system knows how to handle a specific class of parts. The robot system knows, for example, how to pick a screw from a box of screws, and the robot operator needs to specify what the robot does with the screw after it is picked. The logic of the robot sequence after the screw is picked, for example, is specific to each installation, and has to be introduced by the robot operator.
Details about the hardware components:
a) Robot arm can have visual sensors integrated. There is an option to provide visual sensors that are statically mounted. In this case, additional steps are needed to install the visual sensors before operation. In both cases, the robot visual memory includes complete functionality to solve perception tasks for the known objects;
b) Robot tool suitable for the known objects;
c) Robot controller responsible for the robot arm motion and necessary functions; and/or
d) Perception controller responsible for keeping the class-of-objects database and knowledge about the robot geometric structure with the kinematic parameters. It also provides the connections to the visual sensors. The perception tasks/functions are implemented in the perception controller.
Details about the software controller (a sketch of one possible arrangement follows this list):
a) Robot visual memory is a collection of one or more models, e.g., neural network models, 3D models ... that allow a wide range of perception tasks to be solved;
b) Robot perception tasks include a set of perception functions that can be solved using the models from the robot visual memory and the visual data measured with the visual sensors; and/or
c) Robot geometric and kinematic data stored on the perception controller that can include: CAD models of the robot arm and robot tool, kinematic parameters of the robot arm and tool.
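The items a) through c) above could be grouped, purely by way of illustration, into a perception controller record such as the following sketch; all names and example values are hypothetical and do not represent a specific implementation of this application.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class PerceptionController:
    # a) visual memory: a collection of models keyed by object name
    visual_memory: Dict[str, Any] = field(default_factory=dict)
    # b) perception tasks: functions (identify, count, locate, grasp, ...) that
    #    combine the stored models with live visual sensor data
    perception_tasks: Dict[str, Callable] = field(default_factory=dict)
    # c) robot geometric and kinematic data (CAD models, kinematic parameters)
    robot_geometry: Dict[str, Any] = field(default_factory=dict)

controller = PerceptionController(
    visual_memory={"screw": "trained_screw_model"},
    perception_tasks={"count": lambda image, model: 0},   # placeholder task
    robot_geometry={"kinematics": "dh_parameters"},
)
```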
An illustration of hardware components of an “out of the box” robot capable of detecting and manipulating objects from a class of objects is depicted in FIG. 5. The perception controller is in communication with the robot controller structured to control one or more functions of the robot (e.g. activate an actuator), visual sensors used to detect robot surroundings (e.g. a camera), a visual memory which includes the class of objects dataset, as well as the perception tasks. Such an “out of the box” robot can be produced by a manufacturer and trained on a class of objects (recognition and/or interaction therewith), and can be sold to a customer for installation at a customer facility. The customer can then develop the robot further to include actions to be performed after the part is recognized and interacted with (e.g. captured).
Such customer-specific actions can include how to dispose of the object (e.g. insertion into an opening, movement to a specific bin, etc). Such “out of the box” robots prepared to recognize and capture objects at the manufacturer can increase the speed by which customers integrate a robot into their operations.
It will be appreciated that any of the embodiments described herein (training and evaluation, multiple robot, robot trained to recognize a class of objects, restricted learning, etc) can be used with any of the other embodiments. No limitation is hereby intended unless expressly contemplated to the contrary.
An aspect of the present application includes an apparatus comprising a robot having an end effector structured to interact with a worktime object, a camera structured to image the worktime object, and a perception controller in data communication with a visual memory structured to identify an object from within a class of objects.
A feature of the present application includes wherein the class of objects associated with the visual memory includes a plurality of different objects in a common class of objects.
Another feature of the present application includes wherein the visual memory includes information related to identifying a robot capture useful to capture a particular object.
Yet another feature of the present application includes wherein the robot capture is a robot grasp, and wherein the robot grasp is a function of the type of effector of the robot.
Still another feature of the present application includes wherein the class of objects associated with the visual memory includes a type of capture suitable to interact with at least one of the plurality of objects in the common class of objects.
Yet still another feature of the present application includes wherein the robot is an out-of-the-box robot structured to be positioned into place and enabled, the robot lacking any requirement for object training prior to introduction to a workflow process.
Still yet another feature of the present application includes wherein the out-of-the-box robot is a modular system capable of being packaged and delivered without need to conduct object training at a customer location.
A further feature of the present application includes wherein the workflow process includes object recognition and object grasping, wherein the object recognition and object grasping are provided by the visual memory.
A still further feature of the present application includes wherein the workflow process further includes disposal of the object once it is grasped by the robot using the visual memory.
A yet still further feature of the present application includes wherein a training for a disposal of the object in the workflow process is not provided in the visual memory upon delivery to a customer.
Another aspect of the present application includes a method comprising providing a robot having an end effector structured to interact with a plurality of training objects, the robot in cooperative communication with a camera structured to image the plurality of training objects, training a learning system to recognize an interaction with a first training object of the plurality of training objects, training the learning system to recognize an interaction with a second training object of the plurality of training objects, the second training object different from the first training object but in a common class as the first training object, forming a visual memory structured to identify an object from within a class of objects based on the training the learning system with the first training object and the training the learning system with the second training object.
A feature of the present application includes wherein the training the learning system with the first training object includes training the system to capture the first training object with a first end effector.
Another feature of the present application includes wherein the visual memory includes information related to identifying a robot grasp for a particular object.
Still another feature of the present application includes wherein the training the learning system with the first training object includes training the system to capture the first training object with a second end effector.
Yet still another feature of the present application includes wherein the training the learning system with the first training object includes evaluating the training against a training data set, and if the evaluating results in an evaluation score below a threshold a new training is initiated to collect additional data.
Still yet another feature of the present application further includes forming a perception controller that includes the visual memory.
A further feature of the present application includes delivering another robot with the visual memory to a customer, wherein the robot is an out-of-the-box robot structured to be positioned into place and enabled, the robot lacking any requirement for object training of a working object prior to introduction to a workflow process, and wherein the out-of-the-box robot is a modular system capable of being packaged and delivered without need to conduct object training at a customer location.
A yet further feature of the present application includes wherein the workflow process is developed by a customer at a customer location.
A still yet further feature of the present application includes wherein the workflow process further includes training the perception controller to dispose of the working object once it is grasped by the robot using the visual memory.
Still yet another feature of the present application includes wherein a training to dispose of the working object in the workflow process is not provided in the visual memory upon delivery to a customer.
While the invention has been illustrated and described in detail in the drawings and foregoing description, the same is to be considered as illustrative and not restrictive in character, it being understood that only the preferred embodiments have been shown and described and that all changes and modifications that come within the spirit of the inventions are desired to be protected. It should be understood that while the use of words such as preferable, preferably, preferred or more preferred utilized in the description above indicate that the feature so described may be more desirable, it nonetheless may not be necessary and embodiments lacking the same may be contemplated as within the scope of the invention, the scope being defined by the claims that follow. In reading the claims, it is intended that when words such as “a,” “an,” “at least one,” or “at least one portion” are used there is no intention to limit the claim to only one item unless specifically stated to the contrary in the claim. When the language “at least a portion” and/or “a portion” is used the item can include a portion and/or the entire item unless specifically stated to the contrary. Unless specified or limited otherwise, the terms “mounted,” “connected,” “supported,” and “coupled” and variations thereof are used broadly and encompass both direct and indirect mountings, connections, supports, and couplings. Further, “connected” and “coupled” are not restricted to physical or mechanical connections or couplings.

Claims

WHAT IS CLAIMED IS:
1. An apparatus comprising:
a robot having an end effector structured to interact with a worktime object;
a camera structured to image the worktime object; and
a perception controller in data communication with a visual memory structured to identify an object from within a class of objects.
2. The apparatus of claim 1, wherein the class of objects associated with the visual memory includes a plurality of different objects in a common class of objects.
3. The apparatus of claim 2, wherein the visual memory includes information related to identifying a robot capture useful to capture a particular object.
4. The apparatus of claim 3, wherein the robot capture is a robot grasp, and wherein the robot grasp is a function of the type of effector of the robot.
5. The apparatus of claim 3, wherein the class of objects associated with the visual memory includes a type of capture suitable to interact with at least one of the plurality of objects in the common class of objects.
6. The apparatus of claim 2, wherein the robot is an out-of-the-box robot structured to be positioned into place and enabled, the robot lacking any requirement for object training prior to introduction to a workflow process.
7. The apparatus of claim 6, wherein the out-of-the-box robot is a modular system capable of being packaged and delivered without need to conduct object training at a customer location.
8. The apparatus of claim 6, wherein the workflow process includes object recognition and object grasping, wherein the object recognition and object grasping are provided by the visual memory.
9. The apparatus of claim 8, wherein the workflow process further includes disposal of the object once it is grasped by the robot using the visual memory.
10. The apparatus of claim 9, wherein a training for a disposal of the object in the workflow process is not provided in the visual memory upon delivery to a customer.
11. A method comprising: providing a robot having an end effector structured to interact with a plurality of training objects, the robot in cooperative communication with a camera structured to image the plurality of training objects;
training a learning system to recognize an interaction with a first training object of the plurality of training objects;
training the learning system to recognize an interaction with a second training object of the plurality of training objects, the second training object different from the first training object but in a common class as the first training object;
forming a visual memory structured to identify an object from within a class of objects based on the training the learning system with the first training object and the training the learning system with the second training object.
12. The method of claim 11, wherein the training the learning system with the first training object includes training the system to capture the first training object with a first end effector.
13. The method of claim 12, wherein the visual memory includes information related to identifying a robot grasp for a particular object.
14. The method of claim 13, wherein the training the learning system with the first training object includes training the system to capture the first training object with a second end effector.
15. The method of claim 13, wherein the training the learning system with the first training object includes evaluating the training against a training data set, and if the evaluating results in an evaluation score below a threshold a new training is initiated to collect additional data.
16. The method of claim 15, which further includes forming a perception controller that includes the visual memory.
17. The method of claim 16, delivering another robot with the visual memory to a customer, wherein the robot is an out-of-the-box robot structured to be positioned into place and enabled, the robot lacking any requirement for object training of a working object prior to introduction to a workflow process, and wherein the out-of-the-box robot is a modular system capable of being packaged and delivered without need to conduct object training at a customer location.
18. The method of claim 17, wherein the workflow process is developed by a customer at a customer location.
19. The method of claim 18, wherein the workflow process further includes training the perception controller to dispose of the working object once it is grasped by the robot using the visual memory.
20. The method of claim 19, wherein a training to dispose of the working object in the workflow process is not provided in the visual memory upon delivery to a customer.
PCT/US2019/069080 2018-12-31 2019-12-31 Robot having visual memory WO2020142498A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862786689P 2018-12-31 2018-12-31
US62/786,689 2018-12-31

Publications (1)

Publication Number Publication Date
WO2020142498A1 true WO2020142498A1 (en) 2020-07-09

Family

ID=71407432

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/069080 WO2020142498A1 (en) 2018-12-31 2019-12-31 Robot having visual memory

Country Status (1)

Country Link
WO (1) WO2020142498A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060149421A1 (en) * 2004-12-21 2006-07-06 Fanuc Ltd Robot controller
US9126336B2 (en) * 2008-08-29 2015-09-08 Abb Research Ltd. Compliant end of arm tooling for a robot
US9227327B2 (en) * 2011-04-15 2016-01-05 Kabushiki Kaisha Yaskawa Denki Robot system and method for operating robot system
US20160221187A1 (en) * 2013-03-15 2016-08-04 Industrial Perception, Inc. Object Pickup Strategies for a Robotic Device
US9586320B2 (en) * 2013-10-04 2017-03-07 GM Global Technology Operations LLC System and method for controlling a vision guided robot assembly
US20170252922A1 (en) * 2016-03-03 2017-09-07 Google Inc. Deep machine learning methods and apparatus for robotic grasping

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113524166A (en) * 2021-01-08 2021-10-22 腾讯科技(深圳)有限公司 Robot control method and device based on artificial intelligence and electronic equipment
CN117295589A (en) * 2021-03-16 2023-12-26 谷歌有限责任公司 System and method for using simulated learning in training and refining robot control strategies
CN113524194A (en) * 2021-04-28 2021-10-22 重庆理工大学 Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
CN113524194B (en) * 2021-04-28 2023-03-21 重庆理工大学 Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning

Similar Documents

Publication Publication Date Title
EP3615282B1 (en) Component feature detector for robotic systems
WO2020142498A1 (en) Robot having visual memory
JP5806301B2 (en) Method for physical object selection in robotic systems
WO2020142499A1 (en) Robot object learning system and method
Maiolino et al. Flexible robot sealant dispensing cell using RGB-D sensor and off-line programming
Astanin et al. Reflective workpiece detection and localization for flexible robotic cells
Farag et al. Real-time robotic grasping and localization using deep learning-based object detection technique
WO2020142496A1 (en) Application-case driven robot object learning
Cruz‐Ramírez et al. Vision‐based hierarchical recognition for dismantling robot applied to interior renewal of buildings
EP0380513B1 (en) An adaptive vision-based controller
Wojciechowski et al. Optical scanner assisted robotic assembly
CN115038554A (en) Construction of complex scenarios for autonomous machines based on sensors
WO2020142495A1 (en) Multiple robot and/or positioner object learning system and method
Aliev et al. Analysis of cooperative industrial task execution by mobile and manipulator robots
Chala et al. The Use of Neural Networks for the Technological Objects Recognition Tasks in Computer-Integrated Manufacturing
Arents et al. Construction of a smart vision-guided robot system for manipulation in a dynamic environment
CN109213101B (en) Method and system for preprocessing under robot system
CN115461199A (en) Task-oriented 3D reconstruction for autonomous robotic operation
Kristiansen et al. A novel strategy for automatic error classification and error recovery for robotic assembly in flexible production
Bogue Bin picking: a review of recent developments
US20200202178A1 (en) Automatic visual data generation for object training and evaluation
Lin et al. Inference of 6-DOF robot grasps using point cloud data
EP4367644A1 (en) Synthetic dataset creation for object detection and classification with deep learning
Wang et al. A smart operator assistance system using deep learning for angle measurement
Shin et al. Conveyor visual tracking using robot vision

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19907632

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19907632

Country of ref document: EP

Kind code of ref document: A1