CN115351780A - Method for controlling a robotic device - Google Patents

Method for controlling a robotic device

Info

Publication number
CN115351780A
Authority
CN
China
Prior art keywords
trajectory
manifold
robot
combination
demonstration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210527848.4A
Other languages
Chinese (zh)
Inventor
L. Rozo (L·洛佐)
V. Dave (V·达夫)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Robert Bosch GmbH
Original Assignee
Robert Bosch GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch GmbH filed Critical Robert Bosch GmbH
Publication of CN115351780A


Classifications

    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B25 — HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J — MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 — Programme-controlled manipulators
    • B25J9/16 — Programme controls
    • B25J9/1602 — Programme controls characterised by the control system, structure, architecture
    • B25J9/161 — Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B25J9/1628 — Programme controls characterised by the control loop
    • B25J9/163 — Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • B25J9/1653 — Programme controls characterised by the control loop parameters identification, estimation, stiffness, accuracy, error analysis
    • B25J9/1656 — Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661 — Programme controls characterised by task planning, object-oriented languages
    • B25J9/1664 — Programme controls characterised by motion, path, trajectory planning
    • G05B19/00 — Programme-control systems
    • G05B19/02 — Programme-control systems electric
    • G05B19/42 — Recording and playback systems, i.e. in which the programme is recorded from a cycle of operations and this record is played back on the same machine
    • G05B19/423 — Teaching successive positions by walk-through, i.e. the tool head or end effector being grasped and guided directly, with or without servo-assistance, to follow a path
    • G05B2219/36401 — Record play back, teach position and record it then play back
    • G05B2219/39001 — Robot, manipulator control
    • G05B2219/39298 — Trajectory learning
    • G05B2219/40519 — Motion, trajectory planning

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Manipulator (AREA)

Abstract

Method for controlling a robotic device. According to various embodiments, a method for controlling a robotic device is described, comprising: providing demonstrations of a robot skill, wherein each demonstration demonstrates a trajectory comprising a sequence of robot configurations, each robot configuration being described by an element of a predetermined configuration space having a Riemannian manifold structure; determining a representation of each trajectory as a weight vector over predetermined basic movements of the robotic device, by searching for the weight vector that minimizes a distance metric between the demonstrated trajectory and the combination of the basic movements according to the weight vector, wherein the combination is mapped onto the manifold; determining a probability distribution of the weight vectors by fitting it to the weight vectors determined for the demonstrated trajectories; and controlling the robotic device by performing the basic movements according to the determined probability distribution of the weight vectors.

Description

Method for controlling a robotic device
Prior Art
The present disclosure relates to a method for controlling a robotic device.
In many applications, it is desirable that a robot can act autonomously in a potentially dynamic and unstructured environment. For this, robots need to learn how to move in and interact with their surroundings. To do so, a robot may rely on a skill base that can be used to perform simple movements or to perform complex tasks as a combination of several skills. One way to learn motor skills is from human examples, known as learning from demonstration (LfD). This requires a (usually human) expert to demonstrate one or several specific movements that are to be imitated by the robot.
The publication "Using probabilistic movement primitives in robotics" by A. Paraschos et al. (Autonomous Robots, 42:529-551, 2018) describes probabilistic movement primitives (ProMPs), a probabilistic framework for learning and synthesizing robot motor skills. A ProMP represents a trajectory distribution based on a compact basis-function representation. Its probabilistic formulation enables the exploitation of variance information for movement modulation, parallel movement activation, and control.
While ProMPs have been used to learn Cartesian movements, their formulation does not permit handling orientation movements in the form of quaternion trajectories. Quaternions, however, have advantageous characteristics for robot control: they provide a nearly minimal representation and strong stability in closed-loop orientation control. Accordingly, a method that allows robot control to be learned from demonstrations that include quaternion trajectories is desirable.
Disclosure of Invention
According to various embodiments, a method for controlling a robotic device is provided, comprising providing demonstrations of a robot skill, wherein each demonstration demonstrates a trajectory comprising a sequence of robot configurations, and wherein each robot configuration is described by an element of a predetermined configuration space having a Riemannian manifold structure. The method further comprises, for each demonstration trajectory, determining a representation of the trajectory as a weight vector over predetermined basic movements of the robotic device by searching for the weight vector that minimizes a distance metric between the demonstration trajectory and the combination of the basic movements according to the weight vector, wherein the combination is mapped onto the manifold. The method further comprises determining a probability distribution of the weight vectors by fitting the probability distribution to the weight vectors determined for the demonstration trajectories, and controlling the robotic device by performing the basic movements according to the determined probability distribution of the weight vectors.
According to various embodiments, the above-described method provides robot control that uses the Riemannian manifold approach to encode, reproduce, and adapt probabilistic movement primitives (using multivariate geodesic regression, as described in detail below). In particular, according to various embodiments, the space of quaternion trajectories is treated as a Riemannian manifold. This approach allows robots to learn and reproduce skills while being less prone to encoding data inaccurately or reproducing distorted trajectories, compared with geometry-unaware methods such as classical ProMP. The model is also more interpretable because it does not rely on coarse approximations. Furthermore, the method provides additional adaptation capabilities, such as modulation of trajectory via-points and blending of movement primitives.
According to various embodiments, each demonstration trajectory is represented as a weight vector by geodesic regression. This means that a geodesic can be seen as being fitted to each demonstration trajectory.
Various examples are given below.
Example 1 is a method for controlling a robotic device as described above.
Example 2 is the method of example 1, wherein the probability distribution of the weight vectors is determined by fitting a Gaussian distribution to the weight vectors determined for the demonstration trajectories.
Training and reproduction using Gaussian distributions provide reliable control even for control scenarios that were not seen in the demonstrations.
Example 3 is the method of example 1 or 2, wherein each demonstration trajectory comprises a robot configuration for each time point of a predetermined sequence of time points, and each combination of basic movements according to a weight vector specifies a robot configuration for each time point of the sequence. The weight vector for each demonstration trajectory is determined by searching, over a set of possible weight vectors, for the weight vector for which the distance between the combination of basic movements according to the weight vector (mapped onto the manifold) and the demonstration trajectory is smallest in the set, wherein this distance is given by a sum of terms over the time points of the sequence, the term for each time point comprising a value or a power of the manifold metric between the manifold element given by the mapped combination of basic movements at that time point and the corresponding point of the demonstration trajectory.
This provides an efficient way to represent a demonstration trajectory by a weight vector, namely by fitting the weight vector to the demonstration trajectory. The combination may be mapped onto the manifold by selecting a point on the manifold and applying the exponential map of the tangent space of the manifold at the selected point.
Example 4 is the method of any one of examples 1 to 3, comprising, for one of the demonstration trajectories, searching for a point of the manifold and the weight vector such that the point and the weight vector minimize the distance metric between the combination of the basic movements according to the weight vector and the demonstration trajectory, wherein the combination is mapped from the tangent space at that point onto the manifold, and wherein, for each demonstration trajectory, the mapping of the respective combination onto the manifold is performed by mapping the combination from the tangent space at the selected point.
In other words, a tangent space (i.e., the point of the manifold at which the tangent space is taken) is determined for one demonstration trajectory by optimizing jointly over the weights and the point. This tangent space is then used to map the combinations (or any combinations needed during the search) onto the manifold for all demonstration trajectories. That is, the same tangent space, and thus the same exponential map, is used for all demonstration trajectories. This provides an efficient way to overcome the problem that using different tangent spaces for different trajectories may lead to very dissimilar weight vectors.
Example 5 is the method of any one of examples 1 to 4, wherein the trajectory is an orientation trajectory, each demonstration further demonstrates a position trajectory, and each robot configuration comprises a position described by a vector in three-dimensional space and an orientation described by an element of the predetermined configuration space.
Thus, the skills may be learned by demonstrating a sequence of robot poses, such as end effector position and orientation, where the model for orientation is learned using a Riemannian manifold-based approach.
Example 6 is the method of any one of examples 1 to 5, comprising: providing demonstrations of more than one robot skill, determining for each skill the trajectory representations as weight vectors and the probability distribution of the weight vectors, and controlling the robotic device by: determining for each skill a Riemannian Gaussian distribution of manifold points (at each time point) from the probability distribution of the weight vectors, determining the product distribution of the skills' Riemannian Gaussian distributions, and sampling from the determined product distribution (at each time point).
This allows blending of skills learned from demonstrations on the Riemannian manifold.
Example 7 is a robotic device controller configured to carry out the method of any one of examples 1 to 6.
Example 8 is a computer program comprising instructions that, when executed by a processor, cause the processor to perform a method according to any one of examples 1 to 6.
Example 9 is a computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform a method according to any one of examples 1 to 6.
Drawings
In the drawings, like reference characters generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various aspects are described with reference to the following drawings, in which:
fig. 1 shows a robot.
FIG. 2 shows a spherical manifold $S^2$, the points of which may each represent, for example, a possible orientation of the robot end effector.
FIG. 3 illustrates multivariate general linear regression on the spherical manifold $S^2$ in accordance with an embodiment.
Fig. 4 shows an example of applying the embodiment to letters on a sphere for illustration purposes.
FIG. 5 illustrates, for purposes of illustration, a mixing process for letters on a sphere, according to an embodiment.
Fig. 6 shows a flow chart illustrating a method for controlling a robotic device.
Detailed Description
The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and aspects of the disclosure in which the invention may be practiced. Other aspects may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the present invention. Various aspects of the present disclosure are not necessarily mutually exclusive, as some aspects of the present disclosure may be combined with one or more other aspects of the present disclosure to form new aspects.
Hereinafter, various examples will be described in more detail.
Fig. 1 shows a robot 100.
The robot 100 includes a robot arm 101, such as an industrial robot arm for handling or assembling a workpiece (or one or more other objects). The robot arm 101 includes manipulators 102, 103, 104 and a base (or support) 105 supporting the manipulators 102, 103, 104. The term "manipulator" refers to a movable member of the robotic arm 101 whose actuation enables physical interaction with the environment, such as performing a task. For control, the robot 100 comprises a (robot) controller 106, the (robot) controller 106 being configured to enable interaction with the environment according to a control program. The last member 104 (furthest from the support 105) of the manipulators 102, 103, 104 is also referred to as an end effector 104 and may include one or more tools, such as a welding torch, a grasping instrument, a spraying device, and the like.
The other manipulators 102, 103 (closer to the support 105) may form a positioning device such that together with the end effector 104, a robotic arm 101 is provided having the end effector 104 at its end. The robotic arm 101 is a robotic arm (possibly with a tool at its end) that may provide similar functionality as a human arm.
The robotic arm 101 may comprise joint elements 107, 108, 109 interconnecting the manipulators 102, 103, 104 with each other and with the support 105. The joint elements 107, 108, 109 may have one or more joints, each of which may provide rotatable movement (i.e., rotation) and/or translatory movement (i.e., displacement) of the associated manipulators relative to one another. The movement of the manipulators 102, 103, 104 may be initiated by means of actuators controlled by the controller 106.
The term "actuator" may be understood as a component adapted to affect a mechanism or process in response to being driven. The actuator may implement the command issued by the controller 106 (so-called activation) as a mechanical movement. An actuator, such as an electromechanical converter, may be configured to convert electrical energy into mechanical energy in response to a drive.
The term "controller" may be understood as any type of logic-implemented entity, which may include, for example, circuitry and/or a processor capable of executing software stored in a storage medium, firmware, or a combination thereof, and which may, for example, issue instructions to actuators in this example. The controller may be configured, for example by program code (e.g., software), to control the operation of the system, which in this example is a robot.
In this example, the controller 106 includes one or more processors 110 and a memory 111 storing code and data, based on which the processor 110 controls the robotic arm 101. According to various embodiments, the controller 106 controls the robotic arm 101 based on the machine learning model 112 stored in the memory 111.
According to various embodiments, the Riemannian manifold approach is used for learning orientation movement primitives with ProMP, i.e. an extension of classical ProMP using a Riemannian manifold formulation is provided, denoted "orientation ProMP".
The original (i.e. classical) probabilistic movement primitive (ProMP) approach handles robot skills in Euclidean space, making the learning and reproduction of quaternion trajectories (representing robot orientations) infeasible.
The Riemannian formulation of ProMP described below enables the learning and reproduction of quaternion data. Furthermore, owing to the general treatment given herein, it allows the use of generic Riemannian manifolds.
In the following, an introduction to ProMP for handling robot skills in Euclidean space is given.
The following notation is used hereinafter. For a single movement execution, a trajectory $\tau = \{\mathbf{y}_t\}_{t=1}^{T}$ is denoted as a time series of variables $\mathbf{y}_t$. Here, $\mathbf{y}_t \in \mathbb{R}^d$, also called the robot configuration for time step $t$, may be the joint angles (or the Cartesian position in task space) at time $t$ (additional time derivatives of $\mathbf{y}_t$ may also be considered). Following the classical ProMP notation, $\mathbf{y}_t$ is a $d$-dimensional vector of a system with $d$ degrees of freedom (DoF), for example a robotic arm 101 with 7 degrees of freedom.
The trajectory $\tau$ can be expressed as a linear basis function model

$$\mathbf{y}_t = \boldsymbol{\Psi}_t \mathbf{w} + \boldsymbol{\epsilon}_y, \qquad (1)$$

where $\mathbf{w} \in \mathbb{R}^{Bd}$ is the weight vector, $\boldsymbol{\Psi}_t$ is a $d \times Bd$ block-diagonal matrix containing the time-dependent basis functions for each DoF (the basis functions for one DoF are also called basic movements, e.g. a movement in a certain direction or a rotation around a certain axis), $B$ denotes the number of basis functions, and $\boldsymbol{\epsilon}_y \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_y)$ is zero-mean i.i.d. (independent and identically distributed) Gaussian noise with covariance $\boldsymbol{\Sigma}_y$ accounting for uncertainty.
ProMP assumes that each demonstration is characterized by a different value of the weight vector $\mathbf{w}$, which gives rise to a distribution $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \,|\, \boldsymbol{\mu}_w, \boldsymbol{\Sigma}_w)$. The complete trajectory can then be modeled as the composition of the basis functions at each $t$ with weights $\mathbf{w}$ drawn from $p(\mathbf{w})$. Thus, the distribution of the state $\mathbf{y}_t$ at time $t$ can be computed as

$$p(\mathbf{y}_t) = \int \mathcal{N}(\mathbf{y}_t \,|\, \boldsymbol{\Psi}_t \mathbf{w}, \boldsymbol{\Sigma}_y)\,\mathcal{N}(\mathbf{w} \,|\, \boldsymbol{\mu}_w, \boldsymbol{\Sigma}_w)\,d\mathbf{w} = \mathcal{N}\bigl(\mathbf{y}_t \,|\, \boldsymbol{\Psi}_t \boldsymbol{\mu}_w,\; \boldsymbol{\Psi}_t \boldsymbol{\Sigma}_w \boldsymbol{\Psi}_t^{\top} + \boldsymbol{\Sigma}_y\bigr), \qquad (2)$$

from which both mean and variance are estimated at each time step $t$.
Example trajectories typically differ in duration when learning from demonstrations. ProMP overcomes this problem by introducing a phase variable that decouples the data from the time instances, which in turn allows time modulation. In this case, the phase of a demonstration ranges from $z_1 = 0$ to $z_T = 1$, redefining the demonstration trajectory as $\tau = \{\mathbf{y}_{z_t}\}_{t=1}^{T}$. The basis functions $\psi$ consequently also depend on the phase variable $z$. In particular, ProMP uses Gaussian basis functions for stroke-based movements, defined with width $h$ and centers $c_i$ as

$$b_i(z_t) = \exp\Bigl(-\frac{(z_t - c_i)^2}{2h}\Bigr),$$

where $h$ and $c_i$ are usually designed experimentally. These Gaussian basis functions are then normalized, resulting in

$$\psi_i(z_t) = \frac{b_i(z_t)}{\sum_{j=1}^{B} b_j(z_t)}.$$
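For illustration, the phase-dependent normalized basis functions and the block-diagonal basis matrix described above can be sketched as follows (a minimal numpy sketch; the number of basis functions, the width, the centers, and the 7-DoF dimension are illustrative assumptions, not values prescribed by the disclosure):

```python
import numpy as np

def normalized_basis(z, centers, h):
    """Normalized Gaussian basis values psi_i(z) for a scalar phase z in [0, 1]."""
    b = np.exp(-(z - centers) ** 2 / (2.0 * h))   # unnormalized b_i(z)
    return b / b.sum()                            # psi_i(z) = b_i(z) / sum_j b_j(z)

def basis_matrix(z, centers, h, d):
    """d x (B*d) block-diagonal matrix Psi_t for a d-DoF system at phase z."""
    psi = normalized_basis(z, centers, h)         # shape (B,)
    return np.kron(np.eye(d), psi)                # one psi-block per DoF

B, d, h = 10, 7, 0.01                             # illustrative: 10 bases, 7 DoF
centers = np.linspace(0.0, 1.0, B)
Psi_t = basis_matrix(0.5, centers, h, d)
print(Psi_t.shape)                                # (7, 70)

# A trajectory point y_t = Psi_t w for a random weight vector
w = np.random.default_rng(0).normal(size=B * d)
y_t = Psi_t @ w
print(y_t.shape)                                  # (7,)
```

Evaluating `basis_matrix` over a grid of phases and stacking the results yields the full regressor matrix used for weight estimation below.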
In general, the learning process of ProMP mainly consists of estimating the weight distribution $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \,|\, \boldsymbol{\mu}_w, \boldsymbol{\Sigma}_w)$. To do so, the weight vector $\mathbf{w}_i$ of the $i$-th demonstration, as represented in (1), is estimated by maximum likelihood estimation. This results in a linear ridge regression solution of the form

$$\mathbf{w}_i = \bigl(\boldsymbol{\Psi}^{\top}\boldsymbol{\Psi} + \lambda \mathbf{I}\bigr)^{-1} \boldsymbol{\Psi}^{\top} \mathbf{Y}_i, \qquad (3)$$

where $\mathbf{Y}_i$ concatenates all observed trajectory points and $\boldsymbol{\Psi}$ concatenates the basis function matrices $\boldsymbol{\Psi}_t$ for all time instances. Then, given a set of $N$ demonstrations, the weight distribution parameters can be estimated by maximum likelihood as

$$\boldsymbol{\mu}_w = \frac{1}{N}\sum_{i=1}^{N} \mathbf{w}_i, \qquad \boldsymbol{\Sigma}_w = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{w}_i - \boldsymbol{\mu}_w)(\mathbf{w}_i - \boldsymbol{\mu}_w)^{\top}.$$
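This learning step can be sketched end-to-end on synthetic data (a hedged numpy sketch; all sizes, the regularizer, and the noise level are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(1)
T, B, d, lam = 50, 10, 2, 1e-6                       # illustrative sizes, ridge term
centers, h = np.linspace(0.0, 1.0, B), 0.01

def Psi(z):
    """d x (B*d) block-diagonal basis matrix at phase z."""
    b = np.exp(-(z - centers) ** 2 / (2.0 * h))
    return np.kron(np.eye(d), b / b.sum())

Z = np.linspace(0.0, 1.0, T)
Psi_all = np.vstack([Psi(z) for z in Z])             # stacked over all time instances

# Synthetic demonstrations: noisy realizations of one ground-truth weight vector
w_true = rng.normal(size=B * d)
demos = [Psi_all @ w_true + 0.01 * rng.normal(size=T * d) for _ in range(5)]

# Per-demonstration weights via the linear ridge regression solution
G = Psi_all.T @ Psi_all + lam * np.eye(B * d)
W = np.array([np.linalg.solve(G, Psi_all.T @ Y) for Y in demos])

# Maximum-likelihood estimate of the weight distribution
mu_w = W.mean(axis=0)
Sigma_w = np.cov(W.T, bias=True)
print(float(np.linalg.norm(Psi_all @ (mu_w - w_true))))   # small reconstruction error
```

The mean trajectory reconstructed from `mu_w` stays close to the noiseless ground truth even though each individual demonstration is noisy.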
To adapt to new situations, ProMP allows trajectory modulation through via-points or target positions by conditioning the movement to pass through a desired trajectory point $\mathbf{y}_t^{*}$ with associated covariance $\boldsymbol{\Sigma}_y^{*}$. This results in the conditional probability $p(\mathbf{w} \,|\, \mathbf{y}_t^{*})$, whose parameters can be computed as follows (assuming Gaussian distributions):

$$\boldsymbol{\mu}_w^{\mathrm{new}} = \boldsymbol{\mu}_w + \mathbf{K}\bigl(\mathbf{y}_t^{*} - \boldsymbol{\Psi}_t \boldsymbol{\mu}_w\bigr), \qquad \boldsymbol{\Sigma}_w^{\mathrm{new}} = \boldsymbol{\Sigma}_w - \mathbf{K}\,\boldsymbol{\Psi}_t \boldsymbol{\Sigma}_w, \qquad \mathbf{K} = \boldsymbol{\Sigma}_w \boldsymbol{\Psi}_t^{\top}\bigl(\boldsymbol{\Sigma}_y^{*} + \boldsymbol{\Psi}_t \boldsymbol{\Sigma}_w \boldsymbol{\Psi}_t^{\top}\bigr)^{-1}.$$
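The Gaussian conditioning on a desired via-point can be sketched as follows (all numbers are hypothetical; the update is standard Gaussian conditioning):

```python
import numpy as np

rng = np.random.default_rng(2)
Bd = 6                                         # illustrative weight dimension
mu_w = rng.normal(size=Bd)
A = rng.normal(size=(Bd, Bd))
Sigma_w = A @ A.T + np.eye(Bd)                 # some positive-definite covariance

Psi_t = rng.normal(size=(2, Bd))               # basis matrix at the via-point time
y_star = np.array([0.3, -0.1])                 # desired trajectory point
Sigma_star = 1e-4 * np.eye(2)                  # how tightly to enforce the via-point

# Conditioning p(w | y*_t): Kalman-style update of the weight distribution
K = Sigma_w @ Psi_t.T @ np.linalg.inv(Sigma_star + Psi_t @ Sigma_w @ Psi_t.T)
mu_new = mu_w + K @ (y_star - Psi_t @ mu_w)
Sigma_new = Sigma_w - K @ Psi_t @ Sigma_w

print(Psi_t @ mu_new)                          # ≈ y_star, since Sigma_star is small
```

Choosing a larger `Sigma_star` turns the hard via-point into a soft preference, letting the conditioned trajectory trade off between the demonstrations and the new target.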
By computing the product of trajectory distributions, different movement primitives can be blended into a single movement. In particular, for a set of $S$ different ProMPs $\{p_s(\mathbf{y}_t)\}_{s=1}^{S}$, the blended trajectory at each time step $t$ follows the distribution

$$p(\mathbf{y}_t) \propto \prod_{s=1}^{S} p_s(\mathbf{y}_t)^{\alpha_{s,t}},$$

where the influence of each primitive on the final movement may vary according to the blending weights $\alpha_{s,t}$. Then $p(\mathbf{y}_t)$ is easily estimated as the weighted product of Gaussian distributions,

$$p(\mathbf{y}_t) = \mathcal{N}\Bigl(\mathbf{y}_t \,\Big|\, \boldsymbol{\Sigma}_t \sum_{s=1}^{S} \alpha_{s,t}\,\boldsymbol{\Sigma}_{s,t}^{-1}\boldsymbol{\mu}_{s,t},\; \boldsymbol{\Sigma}_t\Bigr), \qquad \boldsymbol{\Sigma}_t = \Bigl(\sum_{s=1}^{S} \alpha_{s,t}\,\boldsymbol{\Sigma}_{s,t}^{-1}\Bigr)^{-1},$$

where $\boldsymbol{\mu}_{s,t}$ and $\boldsymbol{\Sigma}_{s,t}$ denote the mean and covariance of the $s$-th primitive at time $t$.
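The weighted product of Gaussians used for blending can be sketched as follows (means, covariances, and blending weights are made-up illustration values):

```python
import numpy as np

def blend(mus, Sigmas, alphas):
    """Weighted product of Gaussians: precision-weighted combination of moments."""
    P = sum(a * np.linalg.inv(S) for a, S in zip(alphas, Sigmas))   # blended precision
    Sigma = np.linalg.inv(P)
    mu = Sigma @ sum(a * np.linalg.inv(S) @ m
                     for a, S, m in zip(alphas, Sigmas, mus))
    return mu, Sigma

mu1, S1 = np.array([0.0, 0.0]), 0.1 * np.eye(2)
mu2, S2 = np.array([1.0, 1.0]), 0.1 * np.eye(2)
mu, Sigma = blend([mu1, mu2], [S1, S2], alphas=[0.5, 0.5])
print(mu)        # ≈ [0.5 0.5]: halfway between equal-weight, equal-covariance primitives
```

With time-varying `alphas`, the same function smoothly hands control over from one primitive to another along the trajectory.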
Task parameters permit adapting the robot movement, for example, to a target object in a reaching task. Such information is often available during the demonstrations and can be integrated into the ProMP formulation. Formally, ProMP takes into account an external condition (task parameter) $\mathbf{s}$ and learns a mapping from $\mathbf{s}$ to the mean weight vector $\boldsymbol{\mu}_w$, resulting in a joint probability distribution over the trajectory and the task parameter, where the mapping from $\mathbf{s}$ to $\boldsymbol{\mu}_w$ is learned using linear ridge regression.
As mentioned above, quaternions have advantageous characteristics for robot control. However, since the (unit) quaternions used for robot control satisfy a unit-norm constraint, they do not form a vector space, and it is therefore insufficient to process and analyze quaternion-valued variables using conventional Euclidean-space methods. According to various embodiments, ProMPs over the quaternion space are therefore formulated using Riemannian geometry.
A Riemannian manifold $\mathcal{M}$ is an $m$-dimensional topological space for which each point locally resembles the Euclidean space $\mathbb{R}^m$, and which has a globally defined differential structure. For each point $\mathbf{x} \in \mathcal{M}$ there exists a tangent space $\mathcal{T}_{\mathbf{x}}\mathcal{M}$, a vector space consisting of the tangent vectors of all possible smooth curves passing through $\mathbf{x}$. A Riemannian manifold is equipped with a smoothly varying positive definite inner product, the Riemannian metric, which permits defining curve lengths in $\mathcal{M}$. The minimum-length curves between two points in $\mathcal{M}$ are called geodesics and are the generalization of straight lines in Euclidean space to Riemannian manifolds.
FIG. 2 shows a spherical manifold $S^2$, the points of which may each represent, for example, a possible orientation of the robot end effector.
Two points x and y are indicated on the sphere that the controller 106 can use to represent two different orientations of the robotic end effector 104.
The shortest distance between two points in the surrounding space will be a straight line 201, while the shortest path on the manifold is a geodesic line 202.
To exploit Euclidean tangent spaces, mappings back and forth between a tangent space $\mathcal{T}_{\mathbf{x}}\mathcal{M}$ and the manifold $\mathcal{M}$ may be used; these are denoted the exponential and logarithmic maps, respectively. The exponential map $\mathrm{Exp}_{\mathbf{x}} : \mathcal{T}_{\mathbf{x}}\mathcal{M} \to \mathcal{M}$ maps a point $\mathbf{u}$ in the tangent space of $\mathbf{x}$ to a point $\mathbf{y}$ on the manifold such that $\mathbf{y}$ lies on the geodesic starting at $\mathbf{x}$ in the direction of $\mathbf{u}$, and such that the geodesic distance $d_{\mathcal{M}}(\mathbf{x}, \mathbf{y})$ equals the norm $\lVert\mathbf{u}\rVert$. The inverse operation is called the logarithmic map $\mathrm{Log}_{\mathbf{x}} : \mathcal{M} \to \mathcal{T}_{\mathbf{x}}\mathcal{M}$, i.e. $\mathbf{u} = \mathrm{Log}_{\mathbf{x}}(\mathbf{y})$.
Another useful operation on the manifold is the parallel transport $\Gamma_{\mathbf{x} \to \mathbf{y}} : \mathcal{T}_{\mathbf{x}}\mathcal{M} \to \mathcal{T}_{\mathbf{y}}\mathcal{M}$, which moves elements between tangent spaces such that the inner product between two elements of a tangent space remains constant. For example, in FIG. 2, two tangent vectors are transported in parallel from $\mathcal{T}_{\mathbf{x}}\mathcal{M}$ to $\mathcal{T}_{\mathbf{y}}\mathcal{M}$ (for simplicity, the index $\mathcal{M}$ is omitted in the figure).
For the following, a random variable $\mathbf{x} \in \mathcal{M}$ is introduced that follows a Riemannian Gaussian distribution with mean $\boldsymbol{\mu} \in \mathcal{M}$ and covariance $\boldsymbol{\Sigma}$,

$$\mathcal{N}_{\mathcal{M}}(\mathbf{x} \,|\, \boldsymbol{\mu}, \boldsymbol{\Sigma}) \propto \exp\Bigl(-\tfrac{1}{2}\,\mathrm{Log}_{\boldsymbol{\mu}}(\mathbf{x})^{\top}\,\boldsymbol{\Sigma}^{-1}\,\mathrm{Log}_{\boldsymbol{\mu}}(\mathbf{x})\Bigr),$$

which corresponds to an approximate maximum-entropy distribution on the Riemannian manifold.
The following are the expressions for the Riemannian distance, the exponential and logarithmic maps, and the parallel transport operation on the spherical manifold $\mathcal{S}^m$ (with $\theta = d_{\mathcal{M}}(\mathbf{x}, \mathbf{y})$, $\mathbf{u} = \mathrm{Log}_{\mathbf{x}}(\mathbf{y})$, and $\bar{\mathbf{u}} = \mathbf{u}/\lVert\mathbf{u}\rVert$):

$$d_{\mathcal{M}}(\mathbf{x}, \mathbf{y}) = \arccos(\mathbf{x}^{\top}\mathbf{y}),$$
$$\mathrm{Exp}_{\mathbf{x}}(\mathbf{u}) = \mathbf{x}\cos(\lVert\mathbf{u}\rVert) + \frac{\mathbf{u}}{\lVert\mathbf{u}\rVert}\sin(\lVert\mathbf{u}\rVert),$$
$$\mathrm{Log}_{\mathbf{x}}(\mathbf{y}) = \theta\,\frac{\mathbf{y} - (\mathbf{x}^{\top}\mathbf{y})\,\mathbf{x}}{\lVert \mathbf{y} - (\mathbf{x}^{\top}\mathbf{y})\,\mathbf{x} \rVert},$$
$$\Gamma_{\mathbf{x} \to \mathbf{y}}(\mathbf{v}) = \Bigl(\mathbf{I} - \sin(\theta)\,\mathbf{x}\,\bar{\mathbf{u}}^{\top} + (\cos(\theta) - 1)\,\bar{\mathbf{u}}\,\bar{\mathbf{u}}^{\top}\Bigr)\mathbf{v}.$$
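These four sphere operations can be sketched directly in numpy (unit-norm inputs assumed; antipodal and coincident points are not specially handled):

```python
import numpy as np

def dist(x, y):
    """Riemannian (geodesic) distance on the unit sphere."""
    return np.arccos(np.clip(x @ y, -1.0, 1.0))

def exp_map(x, u):
    """Exp_x(u): follow the geodesic from x in direction u for length ||u||."""
    n = np.linalg.norm(u)
    if n < 1e-12:
        return x.copy()
    return np.cos(n) * x + np.sin(n) * u / n

def log_map(x, y):
    """Log_x(y): tangent vector at x pointing toward y, with norm d(x, y)."""
    p = y - (x @ y) * x                   # project y onto the tangent space at x
    n = np.linalg.norm(p)
    if n < 1e-12:
        return np.zeros_like(x)
    return dist(x, y) * p / n

def transport(x, y, v):
    """Parallel transport of tangent vector v from T_x to T_y along the geodesic."""
    u = log_map(x, y)
    n = np.linalg.norm(u)
    if n < 1e-12:
        return v.copy()
    ub = u / n
    return v - np.sin(n) * (ub @ v) * x + (np.cos(n) - 1.0) * (ub @ v) * ub

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
u = log_map(x, y)
print(np.round(dist(x, y), 4))            # 1.5708 (i.e. pi/2)
print(np.allclose(exp_map(x, u), y))      # True: Exp and Log are mutually inverse
```

Transporting a tangent vector with `transport` keeps it tangent to the sphere at the destination point and preserves its norm, which is what the inner-product-preservation property above requires.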
According to various embodiments, geodesic regression, which generalizes linear regression to the Riemannian manifold setting, is used (e.g., by the controller 106). The geodesic regression model is defined as

$$\mathbf{y} = \mathrm{Exp}_{\hat{\mathbf{y}}}(\boldsymbol{\epsilon}), \qquad \hat{\mathbf{y}} = \mathrm{Exp}_{\mathbf{p}}(x\,\mathbf{v}), \qquad (8)$$

where $\mathbf{y} \in \mathcal{M}$ and $x \in \mathbb{R}$ are the output and input variables, respectively, $\mathbf{p} \in \mathcal{M}$ is a base point on the manifold, $\mathbf{v} \in \mathcal{T}_{\mathbf{p}}\mathcal{M}$ is a vector in the tangent space at $\mathbf{p}$, and the error term $\boldsymbol{\epsilon}$ is a random variable taking values in the tangent space at $\hat{\mathbf{y}}$. In analogy with linear regression, $(\mathbf{p}, \mathbf{v})$ can be interpreted as the intercept $\mathbf{p}$ and the slope $\mathbf{v}$.
Now, consider a set of points $\{(x_t, \mathbf{y}_t)\}_{t=1}^{T}$ with $x_t \in \mathbb{R}$ and $\mathbf{y}_t \in \mathcal{M}$. The objective of geodesic regression is to find the geodesic curve that best models the relationship between $x_t$ and $\mathbf{y}_t$ for all $T$ pairs. To achieve this, the model minimizes the sum of squared Riemannian distances (i.e. errors) between the model estimates and the observations,

$$E(\mathbf{p}, \mathbf{v}) = \frac{1}{2}\sum_{t=1}^{T} d_{\mathcal{M}}\bigl(\hat{\mathbf{y}}_t, \mathbf{y}_t\bigr)^{2}, \qquad \hat{\mathbf{y}}_t = \mathrm{Exp}_{\mathbf{p}}(x_t\,\mathbf{v}),$$

where $\hat{\mathbf{y}}_t$ is the model estimate on the manifold $\mathcal{M}$ and $\boldsymbol{\epsilon}_t = \mathrm{Log}_{\hat{\mathbf{y}}_t}(\mathbf{y}_t)$ is the Riemannian error, an element of the tangent bundle $\mathcal{T}\mathcal{M}$. The least-squares estimator of the geodesic model can be formulated as the minimizer of the above sum of squared Riemannian distances, i.e.,

$$(\mathbf{p}^{*}, \mathbf{v}^{*}) = \operatorname*{arg\,min}_{(\mathbf{p}, \mathbf{v})}\; \frac{1}{2}\sum_{t=1}^{T} d_{\mathcal{M}}\bigl(\mathrm{Exp}_{\mathbf{p}}(x_t\,\mathbf{v}), \mathbf{y}_t\bigr)^{2}. \qquad (9)$$
however, (9) does not give an analytical solution like (3). A solution can be obtained by gradient descent, which requires the calculation of the derivative of the riemann distance function and the derivative of the exponential mapping. The latter is split into a first point p and a first speed
Figure 338224DEST_PATH_IMAGE091
The derivative of (c). These gradients can be computed from the Jacobian field (i.e., the solution of a second order equation under the Riemann curvature tensor subject to certain initial conditions).
It should be noted that the geodesic model described above only considers a scalar independent variable $x \in \mathbb{R}$, which means that the derivative is obtained from the Jacobi field along a single geodesic parameterized by the single tangent vector $v$. The computation of the Jacobi field depends on so-called adjoint operators, which in practice act as a parallel transport on the geodesic-regression error terms. Extending this to $x \in \mathbb{R}^n$ requires a slightly different approach, namely the identification of multiple geodesics (which can be viewed as the analogue of basis vectors in Euclidean space). The multivariate general linear model (MGLM) on Riemannian manifolds provides a solution to this problem.
MGLM uses a geodesic basis formed by multiple tangent vectors $v_1, \ldots, v_n \in T_p\mathcal{M}$, one for each dimension of $x \in \mathbb{R}^n$. The problem (9) can then be reformulated as

$(\hat{p}, \hat{V}) = \operatorname*{arg\,min}_{(p, V)} \frac{1}{2} \sum_{t=1}^{T} \mathrm{d}\big(\mathrm{Exp}_p(V x_t),\, y_t\big)^2, \qquad (10)$

where $V = [v_1, \ldots, v_n]$. To solve (10), the corresponding gradients can be computed by exploiting the insight that the adjoint operators resemble parallel transport operations. In this way, the obstacle of designing special adjoint operators for the multivariate case is overcome; instead, parallel transport operations are used to approximate the necessary gradients. This multivariate framework serves, in analogy to (3), the purpose of computing a weight vector for each demonstration lying on the Riemannian manifold $\mathcal{M}$.
In the following, it is explained how MGLM may be used when the demonstration data corresponds to quaternion trajectories, i.e. trajectories on $\mathcal{S}^3$.

When human demonstrations are characterized by Cartesian motion patterns (via kinesthetic teaching or teleoperation), a learning model 112 is needed that encapsulates both the translational and the rotational movements of the robot end effector. This means that a given demonstration trajectory now consists of data points $(x_t, q_t) \in \mathbb{R}^3 \times \mathcal{S}^3$, representing the full Cartesian pose of the end effector at time step $t$. In this case, the challenge is to learn a ProMP in the orientation space, because the Euclidean part $x_t \in \mathbb{R}^3$ follows the classical ProMP.
First, within the MGLM framework an equivalent formulation of $\mathrm{Exp}_p(V x)$ is used so that it resembles the linear basis-function model in (1). Specifically, the estimate is written as

$\hat{y} = \mathrm{Exp}_p(V x) = \mathrm{Exp}_p\big((x^T \otimes I)\, \omega\big),$

where $\omega = \mathrm{vec}(V)$ and $\otimes$ denotes the Kronecker product. This equivalence proves useful when establishing the analogy between the classical formulation of ProMPs and the proposed method for orientation trajectories.
Similar to the case of (1), the points $q_t \in \mathcal{S}^3$ of an orientation trajectory can be expressed via a geodesic basis-function model

$p(q_t \mid \omega) = \mathcal{N}_{\mathcal{M}}\big(q_t;\, \mathrm{Exp}_p(\Psi_t^T \omega),\, \Sigma_q\big), \qquad (12)$

where $p$ is a fixed base point on $\mathcal{M}$, $\omega$ is the large weight vector formed by concatenating the weight vectors $w_m$, $\Psi_t$ is the same time-dependent basis-function matrix as in (1), and $\Sigma_q$ is a covariance matrix encoding the uncertainty on the tangent space. Two aspects of this formulation deserve particular attention: (i) the mean of the Riemannian Gaussian distribution in (12), i.e. $\mathrm{Exp}_p(\Psi_t^T \omega)$, uses the aforementioned equivalent MGLM formulation, and (ii) the weight vector $\omega$ in (12) corresponds to the vector of geodesic bases of the composite MGLM.
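A minimal sketch of evaluating the mean of the geodesic basis-function model (12) on $\mathcal{S}^2$ might look as follows; the basis-function width and all names are illustrative assumptions, not values from the patent:

```python
import numpy as np

def s_exp(p, v):
    n = np.linalg.norm(v)
    return p.copy() if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def rbf_basis(ts, n_basis, width=0.02):
    """Time-dependent, normalized Gaussian basis functions (as in classical ProMP)."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-(ts[:, None] - centers[None, :]) ** 2 / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)     # rows sum to 1

def geodesic_basis_mean(p, W, phi_t):
    """Mean of the Riemannian Gaussian in (12): Exp_p(Psi_t^T w).
    W stacks one tangent weight vector per basis function (one per row)."""
    v = phi_t @ W                 # linear combination taken in the tangent space T_p
    v = v - np.dot(v, p) * p      # numerical safety: project onto T_p
    return s_exp(p, v)
```

With all tangent weight vectors equal, the combination reduces to that common tangent vector (the normalized basis weights sum to one), so the mean is constant over time.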
Since each demonstration is characterized by a different weight vector $\omega$, a distribution $p(\omega) = \mathcal{N}(\omega;\, \mu_\omega, \Sigma_\omega)$ can again be obtained. Therefore, the trajectory distribution can be computed as the marginal

$p(q_t; \theta) = \int \mathcal{N}_{\mathcal{M}}\big(q_t;\, \mathrm{Exp}_p(\Psi_t^T \omega),\, \Sigma_q\big)\, \mathcal{N}(\omega;\, \mu_\omega, \Sigma_\omega)\, \mathrm{d}\omega, \qquad (13)$

where the marginal depends on two probability distributions lying on different manifolds (the time indices are omitted here and in the following for simplicity). However, the mean $\mathrm{Exp}_p(\Psi^T \omega)$ depends on the single fixed point $p \in \mathcal{M}$, and $\Psi^T \omega$ lies on the tangent space $T_p\mathcal{M}$. These two observations are used to solve (13) on the tangent space $T_p\mathcal{M}$ as follows:

$p\big(\mathrm{Log}_p(q); \theta\big) = \mathcal{N}\big(\mathrm{Log}_p(q);\, \Psi^T \mu_\omega,\, \tilde{\Sigma}_q + \Psi^T \Sigma_\omega \Psi\big),$

where $\tilde{\Sigma}_q$ is the covariance $\Sigma_q$ parallel-transported from $\mathrm{Exp}_p(\Psi^T \mu_\omega)$ to $p$. It should be noted that this marginal still lies on the tangent space $T_p\mathcal{M}$ and is therefore mapped back onto $\mathcal{M}$ using the exponential map, resulting in the final marginal

$p(q; \theta) = \mathcal{N}_{\mathcal{M}}\big(q;\, \mathrm{Exp}_p(\Psi^T \mu_\omega),\, \hat{\Sigma}\big), \qquad (14)$

where $\hat{\Sigma}$ is the tangent-space covariance parallel-transported back accordingly.
As described above, the learning process of a ProMP boils down to estimating the weight distribution $\mathcal{N}(\mu_\omega, \Sigma_\omega)$. To do so, for each demonstration $i$, the controller 106 estimates a weight vector $\omega_i$ using MGLM. To begin with, the previously introduced basis-function matrix $\Psi_t = \varphi_t \otimes I$ is used, where $\varphi_t$ is the vector of basis-function activations at time $t$ and $M$ is the number of basis functions. Furthermore, consider a demonstrated quaternion trajectory $\{q_t\}_{t=1}^{T}$ with $q_t \in \mathcal{S}^3$. Then, in analogy to (3) in the Euclidean case, the weight estimate is obtained by using (10), resulting in

$(\hat{p}, \hat{\omega}) = \operatorname*{arg\,min}_{(p, \omega)} \frac{1}{2} \sum_{t=1}^{T} \mathrm{d}\big(\mathrm{Exp}_p(\Psi_t^T \omega),\, q_t\big)^2, \qquad (15)$

where $\omega$ comprises the set of estimated tangent weight vectors $w_1, \ldots, w_M$ (i.e. the $M$ tangent vectors emerging from the point $p \in \mathcal{M}$).
FIG. 3 illustrates multivariate general linear regression on the sphere manifold $\mathcal{S}^2$ for learning the weights of an orientation ProMP. Given a trajectory $y$, the origin $p$ of the tangent space $T_p\mathcal{M}$ and the tangent weight vectors $w_m$ are estimated via (15).
To solve (15), the gradients of the error function with respect to $p$ and each $w_m$ are computed. As explained above, these gradients depend on so-called adjoint operators which, loosely speaking, bring each error term $e_t = \mathrm{Log}_{\mu_t}(q_t)$ from $T_{\mu_t}\mathcal{M}$ to $T_p\mathcal{M}$, where $\mu_t = \mathrm{Exp}_p(\Psi_t^T \omega)$. These adjoint operators can therefore be approximated by parallel transport operations. This leads to the following reformulation of the error function of (15):

$E(p, \omega) = \frac{1}{2} \sum_{t=1}^{T} \big\| \Gamma_{\mu_t \to p}\big(\mathrm{Log}_{\mu_t}(q_t)\big) \big\|^2,$

from which the corresponding approximate gradients of the error function $E$ with respect to $p$ and the tangent weight vectors $w_m$ follow.
Using the gradients described above, the controller 106 can, for each demonstration $i$, estimate both the base point $p$ and the weight vector $\omega_i$ formed by the $M$ tangent vectors $w_m$. It should be noted that each demonstration may lead to a different estimate of $p$, which defines the origin on the manifold $\mathcal{M}$ of the tangent space used to estimate the tangent weight vectors. This may lead to different tangent spaces across demonstrations and thus to very diverse tangent weight vectors. An effective way to overcome this problem is to assume that all demonstrations share the same tangent-space origin $p$, which is the same assumption made when defining the geodesic basis-function model (12). Thus, according to various embodiments, the controller 106 estimates $p$ from a single demonstration and uses it to estimate all tangent weight vectors for the entire set of demonstrations. Then, given a set of $N$ demonstrations, the weight-distribution parameters $(\mu_\omega, \Sigma_\omega)$ can be estimated by standard maximum likelihood as

$\mu_\omega = \frac{1}{N} \sum_{i=1}^{N} \omega_i, \qquad \Sigma_\omega = \frac{1}{N} \sum_{i=1}^{N} (\omega_i - \mu_\omega)(\omega_i - \mu_\omega)^T.$
An example of an algorithm for learning the robot control model 112 by orientation ProMP is as follows; the controller 106 may execute the algorithm after having been provided with a set of $N$ demonstrations (e.g. provided by a user by manually guiding the robot arm 101): first, estimate the tangent-space origin $p$ from a single demonstration; then, for each demonstration $i$, estimate the tangent weight vectors $\omega_i$ via (15) while keeping $p$ fixed; finally, fit the weight distribution $\mathcal{N}(\mu_\omega, \Sigma_\omega)$ by maximum likelihood as described above.
As in classical ProMP, the controller 106 can perform trajectory modulation (i.e. adaptation to new situations, such as a new control scenario) by conditioning the motion to pass through a desired trajectory point $q^*$ with an associated covariance $\Sigma^*$. This leads to the conditional probability $p(\omega \mid q^*)$, which, similar to (13), depends on two probability distributions lying on different manifolds. Here, the following fact is again exploited: the mean $\mathrm{Exp}_p(\Psi^T \omega)$ depends on the single fixed point $p$, which in turn is the base of the tangent space on which the weight distribution lies. This allows rewriting the conditional distribution as

$p(\omega \mid q^*) \propto \mathcal{N}\big(\mathrm{Log}_p(q^*);\, \Psi^T \omega,\, \tilde{\Sigma}^*\big)\, \mathcal{N}(\omega;\, \mu_\omega, \Sigma_\omega),$

where $\mathrm{Log}_p(q^*)$ and $\tilde{\Sigma}^*$ are the parameters used to estimate the resulting conditional distribution. Since both distributions now lie on the Euclidean tangent space $T_p\mathcal{M}$, the new distribution parameters can be estimated analogously to the classical ProMP conditioning procedure, with special care taken to parallel-transport the covariance matrix. The new weight-distribution parameters are then

$\mu_\omega^{\mathrm{new}} = \mu_\omega + \Sigma_\omega \Psi \big(\tilde{\Sigma}^* + \Psi^T \Sigma_\omega \Psi\big)^{-1} \big(\mathrm{Log}_p(q^*) - \Psi^T \mu_\omega\big), \qquad (18)$

$\Sigma_\omega^{\mathrm{new}} = \Sigma_\omega - \Sigma_\omega \Psi \big(\tilde{\Sigma}^* + \Psi^T \Sigma_\omega \Psi\big)^{-1} \Psi^T \Sigma_\omega. \qquad (19)$

From the resulting new weight distribution, a new marginal distribution can now be obtained via (14).
With respect to blending, classical ProMP blends a set of movement primitives by using products of Gaussian distributions. When blending primitives on $\mathcal{M}$, one needs to take into account that each trajectory distribution is parameterized by a set of weight vectors lying on a different tangent space. Therefore, the weighted product of Gaussian distributions needs to be reformulated. To do so, according to various embodiments, a Gaussian-product formulation on Riemannian manifolds is used, in which the log-likelihood of the product is iteratively maximized using a gradient-based approach.

Formally, the log-likelihood of the product of Riemannian Gaussian distributions is given (factoring out the constant terms) by

$\mathcal{L}(q) = -\frac{1}{2} \sum_{s} \alpha_s\, \mathrm{Log}_{\mu_s}(q)^T\, \Sigma_s^{-1}\, \mathrm{Log}_{\mu_s}(q), \qquad (20)$

where $\mu_s$ and $\Sigma_s$ are the parameters of the marginal distribution (14) for skill $s$. Note that the logarithmic maps in (20) operate on different tangent spaces $T_{\mu_s}\mathcal{M}$. To perform the log-likelihood maximization, the bases and arguments of the maps are switched while ensuring that the original log-likelihood function remains unchanged. To do so, the relation $\mathrm{Log}_x(y) = -\Gamma_{y \to x}\big(\mathrm{Log}_y(x)\big)$ and parallel transport operations can be exploited, resulting in

$\mathcal{L}(q) = -\frac{1}{2} \sum_{s} \alpha_s\, \mathrm{Log}_{q}(\mu_s)^T\, \tilde{\Sigma}_s^{-1}\, \mathrm{Log}_{q}(\mu_s), \qquad (21)$

where $q$ is the (estimated) mean of the resulting Gaussian and $\tilde{\Sigma}_s$ is the correspondingly parallel-transported covariance. Equation (21) can be rewritten by defining a vector stacking the terms $\mathrm{Log}_q(\mu_s)$ and a block-diagonal matrix of the matrices $\tilde{\Sigma}_s^{-1}$.
This results in an objective function of the same form as the one used for computing the empirical mean of a Gaussian distribution on the Riemannian manifold $\mathcal{M}$. The mean can therefore be calculated iteratively as

$q^{(k+1)} = \mathrm{Exp}_{q^{(k)}}\big(\Delta^{(k)}\big),$

where the update step $\Delta^{(k)}$ is formed from the stacked logarithmic maps, the block-diagonal precision matrix, and the Jacobian of the basis of the tangent space $T_{q^{(k)}}\mathcal{M}$ with respect to $\mathcal{M}$ at $q^{(k)}$.
The controller 106 may now perform an iterative estimation of the covariance of the blended distribution similar to that of the mean. After convergence at iteration $K$, the controller 106 obtains the final parameters $\mu$ and $\Sigma$ of the resulting blended distribution.
as explained above, classical ProMP allows to adapt the weight distribution
Figure 91207DEST_PATH_IMAGE204
Figure 563777DEST_PATH_IMAGE205
As external task parameters
Figure 709850DEST_PATH_IMAGE206
Wherein it is assumed that there are pairs for each presentation
Figure 431818DEST_PATH_IMAGE206
Access to the value. Task parameterization is similarly applied to oriented ProMP as a weight vector
Figure 187284DEST_PATH_IMAGE207
And therefore only task parameters
Figure 45519DEST_PATH_IMAGE206
Is euclidean and can be applied directly (6). However, if
Figure 329870DEST_PATH_IMAGE206
Belonging to the Riemann manifold, a more general approach is needed.
When the task parameters $\kappa$ are given, the controller 106 may learn the joint probability distribution $p(\kappa, \omega)$ using a Gaussian mixture model on the Riemannian manifold. Subsequently, when a new task parameter $\hat{\kappa}$ is provided, the controller 106 may employ Gaussian mixture regression to compute $p(\omega \mid \hat{\kappa})$ during reproduction.
To better illustrate how model learning, trajectory reproduction, via-point adaptation and skill blending work in orientation ProMP, a handwritten-letters dataset was used. The original trajectories lie in $\mathbb{R}^2$ and were projected onto the sphere $\mathcal{S}^2$ by a simple mapping to unit-norm vectors. Each letter of the dataset was demonstrated $N = 8$ times, and a simple smoothing filter was applied to each trajectory, mainly for visualization purposes. Four ProMP models were trained, one for each of the letters G, I, J and S. The models for I and J were trained using a set of basis functions with evenly distributed centers; for the letters G and S, a larger number of basis functions was used. The orientation ProMP models were trained according to the algorithm given above, using a given initial learning rate and a corresponding upper bound.
FIG. 4 shows demonstration data, the marginal distribution computed via (13), and the via-point adaptation obtained from (18) and (19) on $\mathcal{S}^2$, corresponding to the models trained on the letters G and S. The mean of the marginal distribution follows the demonstration patterns, and the corresponding covariance contours capture the variability of the demonstrations on $\mathcal{S}^2$. Noteworthy is the complexity of the trajectories of the letters G and S, which display very rich movement patterns, possibly more intricate than what is observed in real robot settings. With respect to via-point adaptation, a random point $q^*$ with a small associated covariance $\Sigma^*$ was used (i.e. high precision is demanded when passing through $q^*$).
As shown in FIG. 4, orientation ProMP is able to smoothly fit both the trajectory and the associated covariance profile, while passing exactly through the given via-point.
FIG. 5 illustrates the orientation ProMP blending process for two pairs of letter models.

The goal is to generate a trajectory that starts by following the contour of the first letter of the pair and then smoothly switches, in the middle, to the trajectory profile of the second letter. In FIG. 5, the resulting blended trajectory is shown for the two aforementioned cases, where orientation ProMP smoothly blends the two given trajectory distributions by following the blending process for orientation ProMP described above. It should be noted that the blending behavior strongly depends on the time evolution of the weight $\alpha_s$ associated with each skill $s$. In this set of experiments, sigmoid-like functions were used for the blending weights, such that one weight smoothly decreases over time while the other increases. These results show that orientation ProMP successfully learns and reproduces trajectory distributions on $\mathcal{S}^2$, and provides full via-point adaptation and blending capabilities.
Experiments have shown that this holds similarly in a robot setup, e.g. for a reorientation skill, which corresponds to lifting a previously grasped object, rotating the end effector 104, and placing the object back in its original position, but with a modified orientation. This robot skill features significant position and orientation changes and is therefore well suited to demonstrating the capabilities of orientation ProMP.
For training robot skills like the reorientation skill, each demonstration provides, e.g., a full-pose robot end-effector trajectory $\{(x_t, q_t)\}_{t=1}^{T}$ with $x_t \in \mathbb{R}^3$ and $q_t \in \mathcal{S}^3$. Here, $x_t$ denotes the end-effector position at time step $t$. Thus, each demonstration provides a position trajectory (comprising a time series of positions, each described by an element of $\mathbb{R}^3$) and an orientation trajectory (comprising a time series of orientations, each described by an element of $\mathcal{S}^3$). The raw trajectory data can be used to train the ProMP model 112, which comprises a sub-model for position and a sub-model for orientation, where the position model is learned using the classical ProMP method and the orientation model is learned using the orientation ProMP method (e.g. the algorithm described above). For both sub-models, the same set of basis functions may be used, but for different components (for each position component in the position sub-model and for each orientation component in the orientation sub-model).
In summary, according to various embodiments, a method as illustrated in fig. 6 is provided.
Fig. 6 shows a flow chart 600 illustrating a method for controlling a robotic device.
In 601, demonstrations are provided for a robot skill, wherein each demonstration demonstrates a trajectory comprising a sequence of robot configurations, wherein each robot configuration is described by an element of a predetermined configuration space having a Riemannian manifold structure.
In 602, for each demonstration trajectory, a representation of the trajectory as a weight vector over predetermined basic movements of the robotic device is determined by searching for a weight vector that minimizes a distance measure between the demonstration trajectory and the combination of the basic movements according to the weight vector, wherein the combination is mapped onto the manifold.
In 603, a probability distribution of the weight vectors is determined by fitting a probability distribution to the weight vectors determined for the demonstration trajectories.
In 604, the robotic device is controlled by performing basic movements according to the determined probability distribution of the weight vectors.
This may include sampling from the probability distribution of the weight vectors and performing the basic movements according to the sampled vector (cf. equation (1)). It is also possible to derive a probability distribution over trajectories (according to equation (14)), from which trajectories can be sampled for control and which can be used for higher-level operations such as trajectory blending as explained above.
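Sampling a trajectory from the learned model can be sketched as follows; we sample a weight vector and roll it out through the geodesic basis-function model, cf. (12) and (14) (all names are illustrative, not from the patent):

```python
import numpy as np

def s_exp(p, v):
    n = np.linalg.norm(v)
    return p.copy() if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def sample_orientation_trajectory(p, mu_w, Sigma_w, Phi, rng):
    """Sample a weight vector w ~ N(mu_w, Sigma_w) and roll out the
    orientation trajectory q_t = Exp_p(Psi_t^T w) over the basis matrix Phi."""
    d = p.shape[0]
    w = rng.multivariate_normal(mu_w, Sigma_w)
    traj = []
    for phi_t in Phi:
        Psi_t = np.kron(phi_t[:, None], np.eye(d))
        v = Psi_t.T @ w
        v = v - np.dot(v, p) * p        # keep the step in the tangent space T_p
        traj.append(s_exp(p, v))
    return np.array(traj)
```

Every generated point lies on the manifold by construction (unit norm), and with a near-zero weight covariance the rollout is effectively deterministic.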
The method of fig. 6 may be performed by one or more computers comprising one or more data processing units. The term "data processing unit" may be understood as any type of entity that allows the processing of data or signals. For example, data or signals may be processed according to at least one (i.e. one or more) specific function performed by the data processing unit. A data processing unit may include or be formed from analog circuitry, digital circuitry, mixed-signal circuitry, logic circuitry, a microprocessor, a microcontroller, a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field programmable gate array (FPGA) integrated circuit, or any combination thereof. Any other way of implementing a corresponding function may also be understood as a data processing unit or a logic circuit. It will be understood that one or more of the method steps described in detail herein may be performed (e.g. carried out) by a data processing unit through one or more specific functions performed by the data processing unit.
Various embodiments may receive and use sensor data from various sensors, such as video (camera), radar, LiDAR, ultrasonic, thermal-imaging, or sonar sensors, for example to obtain the data for a demonstration.
The method of fig. 6 may be used to calculate control signals for controlling a physical system, such as, for example, a computer controlled machine, such as a robot, a vehicle, a household appliance, a power tool, a manufacturing machine, a personal assistant, or an access control system. According to various embodiments, a policy for controlling a physical system may be learned, and then the physical system may be operated accordingly.
According to one embodiment, the method is computer-implemented.
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. This application is intended to cover any adaptations or variations of the specific embodiments discussed herein. Accordingly, it is intended that this invention be limited only by the claims and the equivalents thereof.

Claims (9)

1. A method for controlling a robotic device, comprising:
providing demonstrations of a robot skill, wherein each demonstration demonstrates a trajectory comprising a sequence of robot configurations, wherein each robot configuration is described by an element of a predetermined configuration space having a Riemannian manifold structure;
for each demonstration trajectory, determining a representation of the trajectory as a weight vector over predetermined basic movements of the robotic device by searching for a weight vector that minimizes a distance measure between the demonstration trajectory and the combination of the basic movements according to the weight vector, wherein the combination is mapped onto the manifold;
determining a probability distribution of the weight vectors by fitting a probability distribution to the weight vectors determined for the demonstration trajectories; and
controlling the robotic device by performing basic movements according to the determined probability distribution of the weight vectors.
2. The method of claim 1, wherein the probability distribution of the weight vectors is determined by fitting a Gaussian distribution to the weight vectors determined for the demonstration trajectories.
3. The method according to claim 1 or 2, wherein each demonstration trajectory comprises a robot configuration for each time point of a predetermined sequence of time points, wherein each combination of the basic movements according to a weight vector specifies a robot configuration for each time point of the predetermined sequence of time points, and wherein the weight vector is determined for each demonstration trajectory by determining, from a set of possible weight vectors, the weight vector for which the distance between the combination of the basic movements according to the weight vector, mapped onto the manifold, and the demonstration trajectory is smallest within the set of possible weight vectors, wherein this distance is given by a sum of terms over the time points of the sequence of time points, the sum comprising, for each time point, a term containing the value or a power of the manifold's distance measure between the element of the manifold given by the combination of the basic movements at the time point, when mapped onto the manifold, and the demonstration trajectory at that time point.
4. The method according to any one of claims 1 to 3, comprising, for one of the demonstration trajectories, searching for a point of the manifold and a weight vector such that the point and the weight vector minimize a distance measure between the demonstration trajectory and the combination of the basic movements according to the weight vector, wherein the combination is mapped from the tangent space at the point onto the manifold, and wherein, for each demonstration trajectory, the mapping of the respective combination onto the manifold is performed by mapping the combination from the tangent space at the selected point.
5. The method according to any one of claims 1 to 4, wherein the trajectory is an orientation trajectory, wherein each demonstration further demonstrates a position trajectory, and wherein each robot configuration comprises a pose with a position described by a vector in three-dimensional space and an orientation described by an element of the predetermined configuration space.
6. The method according to any one of claims 1 to 5, comprising providing demonstrations of more than one robot skill, determining for each skill the trajectory representations as weight vectors and the probability distribution of the weight vectors, and controlling the robotic device by determining, for each skill, a Riemannian Gaussian distribution of trajectory points from the probability distribution of the weight vectors, determining the product distribution of the skills' Riemannian Gaussian distributions, and sampling from the determined product distribution.
7. A robotic device controller configured to carry out the method of any one of claims 1 to 6.
8. A computer program comprising instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 6.
9. A computer readable medium storing instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 6.
CN202210527848.4A 2021-05-17 2022-05-16 Method for controlling a robotic device Pending CN115351780A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102021204961.3A DE102021204961B4 (en) 2021-05-17 2021-05-17 Method of controlling a robotic device
DE102021204961.3 2021-05-17

Publications (1)

Publication Number Publication Date
CN115351780A true CN115351780A (en) 2022-11-18

Family

ID=83806295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210527848.4A Pending CN115351780A (en) 2021-05-17 2022-05-16 Method for controlling a robotic device

Country Status (4)

Country Link
JP (1) JP2022176917A (en)
KR (1) KR20220155921A (en)
CN (1) CN115351780A (en)
DE (1) DE102021204961B4 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115946130A (en) * 2023-03-13 2023-04-11 中国人民解放军国防科技大学 Man-machine cooperation method based on Riemann manifold
CN116985144A (en) * 2023-09-26 2023-11-03 珞石(北京)科技有限公司 With C 2 Continuous robot tail end gesture planning method

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102022206381A1 (en) 2022-06-24 2024-01-04 Robert Bosch Gesellschaft mit beschränkter Haftung Method for controlling a robotic device
DE102022212638B3 (en) 2022-11-25 2024-03-14 Robert Bosch Gesellschaft mit beschränkter Haftung Device and method for controlling a robot

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9403273B2 (en) 2014-05-23 2016-08-02 GM Global Technology Operations LLC Rapid robotic imitation learning of force-torque tasks
WO2017129200A1 (en) 2016-01-28 2017-08-03 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. A system for real-world continuous motion optimization and control
US11400587B2 (en) 2016-09-15 2022-08-02 Google Llc Deep reinforcement learning for robotic manipulation
EP3753684B1 (en) 2019-06-21 2022-08-10 Robert Bosch GmbH Method and system for robot manipulation planning
DE102019209616A1 (en) 2019-07-01 2021-01-07 Kuka Deutschland Gmbh Carrying out a given task with the aid of at least one robot
DE102019216229B4 (en) 2019-10-07 2022-11-10 Robert Bosch Gmbh Apparatus and method for controlling a robotic device
EP3812972A1 (en) 2019-10-25 2021-04-28 Robert Bosch GmbH Method for controlling a robot and robot controller
DE102019216560B4 (en) 2019-10-28 2022-01-13 Robert Bosch Gmbh Method and device for training manipulation skills of a robot system
EP3838503B1 (en) 2019-12-16 2024-05-01 Robert Bosch GmbH Method for controlling a robot and robot controller
DE102020207085A1 (en) 2020-06-05 2021-12-09 Robert Bosch Gesellschaft mit beschränkter Haftung METHOD OF CONTROLLING A ROBOT AND ROBOT CONTROL UNIT


Also Published As

Publication number Publication date
KR20220155921A (en) 2022-11-24
JP2022176917A (en) 2022-11-30
DE102021204961B4 (en) 2023-06-07
DE102021204961A1 (en) 2022-11-17

Similar Documents

Publication Publication Date Title
Jiang et al. State-of-the-Art control strategies for robotic PiH assembly
CN115351780A (en) Method for controlling a robotic device
Ficuciello et al. FEM-based deformation control for dexterous manipulation of 3D soft objects
JP2008238396A (en) Apparatus and method for generating and controlling motion of robot
CN115605326A (en) Method for controlling a robot and robot controller
US20220161424A1 (en) Device and method for controlling a robotic device
US20220105625A1 (en) Device and method for controlling a robotic device
Khansari-Zadeh et al. Learning to play minigolf: A dynamical system-based approach
Chen et al. A probabilistic framework for uncertainty-aware high-accuracy precision grasping of unknown objects
Colomé et al. Reinforcement learning of bimanual robot skills
Nemec et al. Learning by demonstration and adaptation of finishing operations using virtual mechanism approach
Nandikolla et al. Teleoperation Robot Control of a Hybrid EEG‐Based BCI Arm Manipulator Using ROS
Wang et al. Learning robotic insertion tasks from human demonstration
Langsfeld Learning task models for robotic manipulation of nonrigid objects
Fomena et al. Towards practical visual servoing in robotics
CN109542094B (en) Mobile robot vision stabilization control without desired images
US11883962B2 (en) Object manipulation with collision avoidance using complementarity constraints
JP7205752B2 (en) ROBOT CONTROL DEVICE, ROBOT CONTROL METHOD, AND ROBOT CONTROL PROGRAM
Lin Embedding Intelligence into Robotic Systems-Programming, Learning, and Planning
Afzali et al. A Modified Convergence DDPG Algorithm for Robotic Manipulation
Al-Junaid ANN based robotic arm visual servoing nonlinear system
Fang et al. Learning from wearable-based teleoperation demonstration
Pichler et al. User centered framework for intuitive robot programming
Luz et al. Model Predictive Control for Assistive Robotics Manipulation
Shao et al. Trajectory Optimization for Manipulation Considering Grasp Selection and Adjustment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination