CN108983804A - Biped robot gait planning method based on deep reinforcement learning - Google Patents

Biped robot gait planning method based on deep reinforcement learning

Info

Publication number
CN108983804A
Authority
CN
China
Prior art keywords
gait
robot
data
feature
noise reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810979187.2A
Other languages
Chinese (zh)
Other versions
CN108983804B (en)
Inventor
吴晓光
刘绍维
杨磊
张天赐
李艳会
王挺进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN201810979187.2A priority Critical patent/CN108983804B/en
Publication of CN108983804A publication Critical patent/CN108983804A/en
Application granted granted Critical
Publication of CN108983804B publication Critical patent/CN108983804B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05D — SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 — Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/08 — Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0891 — Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for land vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a biped robot gait planning method based on deep reinforcement learning. Exploiting the stability and compliance of human gait, it combines deep reinforcement learning with human motion data to control the gait of a biped robot effectively, and comprises the following steps: 1) establish a passive biped robot model; 2) acquire and process human gait data and target gait data; 3) use denoising autoencoders to extract the hidden features of the biped robot gait data and the human gait data respectively; 4) learn the human gait features with deep reinforcement learning and thereby plan the biped robot's gait. By combining deep reinforcement learning with human gait data, the invention controls the biped robot to walk as stably and compliantly as a human.

Description

Biped robot gait planning method based on deep reinforcement learning
Technical field
The present invention relates to the technical field of biped robots, and in particular to a biped robot gait planning method based on deep reinforcement learning.
Background technique
At present, the locomotion modes of mobile robots include tracked, wheeled and biped locomotion. Compared with tracked and wheeled robots, biped robots are more adaptable: they can move not only on flat ground but also in irregular environments (climbing stairs, walking on uneven ground, etc.). However, a biped robot is itself a highly nonlinear hybrid dynamical system, and gait planning has always been a difficult problem.
Gait planning for a biped robot must consider not only the stability of walking but also the efficiency, compliance and environmental adaptability of the walking motion. Methods based on simplified models are commonly used: starting from the kinematics and dynamics of biped walking, they reduce the main features of the robot to basic models such as the inverted pendulum, the two-link model and the four-bar linkage, and then plan the gait of the biped robot on the basis of these simplified models. Because they ignore part of the robot's physical characteristics, simplified-model methods leave the biped robot with weak disturbance rejection, high sensitivity to the environment and a single, inflexible gait. Gait planning methods based on intelligent algorithms, including neural networks, support vector machines, fuzzy control and reinforcement learning, have become a research hotspot because they are learnable, adaptive and fault-tolerant. In general, however, an intelligent algorithm can guarantee stable walking but cannot guarantee an efficient and compliant gait at the same time, and sometimes it even produces a stiff, unnatural gait.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing a biped robot gait planning method based on deep reinforcement learning. The invention exploits the structural and kinematic similarity between a kneed biped robot model and the human body during walking, and combines it with data-driven deep reinforcement learning, thereby overcoming the weak disturbance rejection of model-based gait planning methods and the stiff gaits of conventional intelligent gait planning methods, and improving the stability and compliance of the robot's walking.
To achieve the above object, the present invention is realized by the following technical solution:
A biped robot gait planning method based on deep reinforcement learning, comprising the following steps:
Step S1: establish a biped robot model and describe the robot walking process;
Step S2: acquire and process human gait data and target gait data;
Step S3: use denoising autoencoders to extract the hidden features of the biped robot gait data and the human gait data respectively;
Step S4: learn the human gait features with a deep reinforcement learning method, and thereby plan the gait of the biped robot.
In the above technical solution, step S1 specifically includes the following steps:
Step S101: establish a four-link kneed biped robot model with arc feet. The model comprises two thighs, two shanks and two arc feet; the legs are rigid rods connected by frictionless hinges, and the arc feet are fixed to the shanks. The stance leg and the swing leg have identical mass and geometric parameters, and the mass of each leg is uniformly distributed. A limit mechanism is placed at each knee joint of the model to imitate the function of the human knee, and two motors are placed at the hip joint to apply control torques to the stance leg and the swing leg respectively;
Step S102: take the right side of the direction of advance during walking as the viewpoint, analyse the gait process of the model, and select dimensionless physical quantities that characterise the robot state in real time. The selected quantities are defined as the robot walking state Θ_r, which consists of the following angles and angular velocities:
Counter-clockwise rotation is taken as positive; θ_r1 and θ̇_r1 are the angle and angular velocity of the swing-leg shank with respect to the vertical; θ_r2 and θ̇_r2 are the angle and angular velocity of the swing-leg thigh with respect to the vertical; θ_r3 and θ̇_r3 are the angle and angular velocity of the stance-leg shank with respect to the vertical.
In the above technical solution, step S2 specifically includes the following steps:
Step S201: define one gait cycle, for both the human and the robot, as the process from the start of the swing of the swing leg until the swing leg collides with the ground;
Step S202: choose a data set of normal human walking from the CMU human motion capture database, segment it by body parts and resolve it, obtaining a description of the human walking process;
Step S203: taking the robot model as reference, take the 2D sagittal plane of human walking and define the human walking state as Θ_m; express all data in the human walking process description as Θ_m, and combine the Θ_m row vectors to obtain the human gait data Θ_M;
Step S204: from the human gait data Θ_M, choose one gait cycle as the robot's learning object, extract the odd-numbered frames of the learning-object data to form a new data set, and define it as the target gait data Θ_S; any row vector of Θ_S is an extracted Θ_m;
Step S205: sample the robot walking state Θ_r within one gait cycle at the sampling frequency of Θ_S to form the robot gait data Θ_R; any row vector of Θ_R is a sampled Θ_r.
In the above technical solution, step S3 specifically includes: according to the data structure of Θ_r and Θ_m, construct two denoising autoencoders of identical structure and perform feature extraction on the robot gait data Θ_R and the target gait data Θ_S. Feed the row vectors of Θ_R and Θ_S into the denoising autoencoders one by one, arrange the resulting features in their original order to form the robot gait feature data H_R and the target gait feature data H_S, and normalise H_R and H_S uniformly for use in deep reinforcement learning. Each denoising autoencoder works through the following steps:
S301: take one row vector Θ of Θ_R or Θ_S and feed it into the denoising autoencoder. The autoencoder uses a binomial distribution to selectively erase elements of the original gait data Θ, setting the erased entries to 0 and obtaining the corrupted gait data Θ̃. The encoding function f maps Θ̃ to the hidden layer, giving the hidden feature h = s_f(wΘ̃ + p),
wherein w is the weight matrix between the input layer and the hidden layer and s_f is the activation function of the encoding function f, taken as the sigmoid function;
S302: the decoding function g maps the hidden feature h to the output layer, giving the reconstruction output y = s_g(w̃h + q). The reconstruction output y retains the information of the original gait data, and the overall error is expressed by the overall loss function J_DAE of the denoising autoencoder on the given training set, J_DAE(θ_DAE) = Σ L(Θ, y),
wherein w̃ is the weight matrix between the hidden layer and the output layer and satisfies w̃ = wᵀ, and s_g is the activation function of the decoding function, likewise the sigmoid function;
wherein θ_DAE denotes the parameters of the denoising autoencoder, namely w, p and q, and L is the reconstruction error, which measures how close y is to Θ
over the n dimensions of the input and output layers;
S303: during training, the denoising autoencoder uses gradient descent to iteratively minimise J_DAE(θ); the update of θ_DAE is θ_DAE ← θ_DAE − α·∂J_DAE/∂θ_DAE,
wherein α is the learning rate, with a value in [0, 1].
In the above technical solution, in step S4 the deep deterministic policy gradient algorithm DDPG is selected as the learning algorithm of the biped robot. The robot gait feature data H_R produced by the denoising autoencoder is used as the input data s_t of the DDPG algorithm, the target gait feature data H_S is used as the basis for computing the reward r_t, and the DDPG algorithm outputs the torques a_t to be executed by the motors. The robot collects data of different gaits during continuous walking and supplies them to the training of the DDPG algorithm, so that the algorithm finally acquires the ability to control the robot to reach the target gait.
In the above technical solution, the policy network of the DDPG algorithm is a five-layer convolutional neural network consisting of an input layer, two convolutional layers, a fully connected layer and an output layer, wherein the input layer receives s_t and the output layer outputs the torques a_t to be executed by the motors.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention combines deep reinforcement learning with human gait data, overcoming the weak disturbance rejection of model-based gait planning methods and the stiff gaits of conventional intelligent gait planning methods. The denoising autoencoders both extract the features of the gait data and eliminate the influence of model differences and noise. Compared with conventional reinforcement learning, DDPG solves more complex problems in less time and meets higher control requirements. Using the target gait feature data H_S as the basis for computing r_t allows DDPG to exploit the human gait data effectively, so that r_t evaluates both the stability and the compliance of the robot gait. After training, DDPG can finally control the robot to walk as stably and compliantly as a human.
Detailed description of the invention
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the planning method of the invention;
Fig. 2 is a schematic diagram of the four-link kneed arc-foot biped robot model;
Fig. 3 is a schematic diagram of the biped robot walking;
Fig. 4 is a schematic diagram of the 2D human walking process obtained from the human motion database;
Fig. 5 is a schematic diagram of the operation of the denoising autoencoder DAE;
Fig. 6 is a structural diagram of the deep deterministic policy gradient algorithm DDPG;
Fig. 7 is a flow chart of the training of the deep deterministic policy gradient algorithm DDPG.
Specific embodiment
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of them.
Fig. 1 is a schematic diagram of the planning method of the invention. As shown in Fig. 1, the biped robot gait planning method based on deep reinforcement learning of the invention comprises:
Step S1: establish a biped robot model and describe the robot walking process. Step S1 specifically includes the following steps:
Step S101: establish a four-link kneed biped robot model with arc feet;
Step S102: take the right side of the direction of advance during walking as the viewpoint, analyse the gait process of the model, select dimensionless physical quantities that characterise the robot state in real time, and define the selected quantities as the robot walking state Θ_r.
Specifically, the biped robot model established in this embodiment is a four-link kneed biped robot model with arc feet, as shown in Fig. 2. The robot consists of two thighs, two shanks and two arc feet. The legs are rigid rods connected by frictionless hinges, and the arc feet are fixed to the shanks. The stance leg and the swing leg of the model have identical mass and geometric parameters, and the mass of each leg is uniformly distributed. A limit mechanism is placed at each knee joint to imitate the function of the human knee. Two motors are placed at the hip joint and can apply control torques to the stance leg and the swing leg respectively.
This embodiment models only the 2D motion seen from the right side of the direction of advance. One walking step of the robot is shown in Fig. 3 and can be described as:
Stage I: the knee joint of the stance leg locks and the stance leg performs an inverted-pendulum motion, with no relative sliding or displacement between the stance leg and the ground; the knee joint of the swing leg is released, the swing leg swings forward and the hip joint moves forward.
Stage II: the swing leg swings past the stance leg; when the swing leg reaches its maximum flexion, its thigh and shank collide at the limit mechanism. This collision is completed instantaneously, after which the limit mechanism locks and remains locked.
Stage III: the swing leg swings backward relative to the stance leg while the hip joint still moves forward.
Stage IV: the swing leg strikes the ground; the collision is instantaneous and there is no rebound. The stance leg and the swing leg then exchange roles.
During the whole walking process, the robot walking state can be described in real time by the following angles and angular velocities:
Counter-clockwise rotation is taken as positive; θ_r1 and θ̇_r1 are the angle and angular velocity of the swing-leg shank with respect to the vertical; θ_r2 and θ̇_r2 are the angle and angular velocity of the swing-leg thigh with respect to the vertical; θ_r3 and θ̇_r3 are the angle and angular velocity of the stance-leg shank with respect to the vertical.
Step S2: acquire and process human gait data and target gait data.
Step S2 specifically includes the following steps:
Step S201: define one gait cycle, for both the human and the robot, as the process from the start of the swing of the swing leg until the swing leg collides with the ground;
Step S202: choose a data set of normal human walking from the CMU human motion capture database, segment it by body parts and resolve it, obtaining a description of the human walking process;
Step S203: taking the robot model as reference, take the 2D sagittal plane of human walking and define the human walking state as Θ_m. Express all data in the human walking process description as Θ_m, and combine the Θ_m row vectors to obtain the human gait data Θ_M;
Step S204: from the human gait data Θ_M, choose one gait cycle as the robot's learning object, extract the odd-numbered frames of the learning-object data to form a new data set, and define it as the target gait data Θ_S;
Step S205: sample the robot walking state Θ_r within one gait cycle at the sampling frequency of Θ_S to form the robot gait data Θ_R.
Specifically, in this embodiment, in order for the biped robot to learn human gait, human motion capture technology is used to provide the target gait data for the robot. The quality of the gait data directly affects the robot's final learning result, so its reliability is particularly important in this embodiment. Reliable gait data can be obtained from well-known human motion capture databases; the open-source motion capture data provided by these databases have been used by many researchers and have high accuracy and reliability.
In this embodiment, the open-source human motion capture database of the CMU Graphics Lab at Carnegie Mellon University is used. The laboratory records human motion with 12 infrared cameras at 120 Hz in a rectangular room of 3 m × 8 m and produces standard data files. From the data selected from the database, the human body in the gait data can be divided into 16 segments according to adult body inertial parameter standards; spurious noise is then removed by filtering, and the density, inertia tensor, moment of inertia and centroid position of each limb segment are derived from multiple regression equations of human physiological structure.
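By way of illustration only, the following sketch shows one common way to filter 120 Hz motion-capture joint-angle trajectories before limb-segment parameters are computed. The patent does not name the filter; the zero-phase Butterworth low-pass, its order and its cutoff frequency are assumptions, and the function name is hypothetical.

```python
# Hypothetical filtering step for CMU joint-angle data (frames x channels at 120 Hz).
import numpy as np
from scipy.signal import butter, filtfilt

def filter_gait_channels(angles: np.ndarray, fs: float = 120.0, cutoff: float = 6.0) -> np.ndarray:
    """Zero-phase low-pass filter each joint-angle channel to remove spurious noise."""
    b, a = butter(N=4, Wn=cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, angles, axis=0)
```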
Human walking involves a large number of joint degrees of freedom; even after these degrees of freedom are partitioned, their number is still too large for the robot. In order to make the robot and the human consistent in terms of gait data, the joint degrees of freedom involved in human walking need to be simplified.
Taking the kneed biped robot model as reference and taking the 2D sagittal plane of human walking, the human walking state is defined by the following angles and angular velocities:
Counter-clockwise rotation is taken as positive; θ_m1 and θ̇_m1 are the angle and angular velocity of the swing-leg shank with respect to the vertical; θ_m2 and θ̇_m2 are the angle and angular velocity of the swing-leg thigh with respect to the vertical; θ_m3 and θ̇_m3 are the angle and angular velocity of the stance-leg shank with respect to the vertical.
A data set of normal human walking is chosen from the CMU motion capture database, segmented by body parts and resolved, giving the 2D human walking process shown in Fig. 4. In this embodiment the gait process data obtained from the data set are simplified according to the definition of the human walking state, producing the final human gait data Θ_M, any row vector of which is an extracted Θ_m.
In this embodiment, one gait cycle, for both the human and the robot, is the process from the start of the swing of the swing leg until the swing leg collides with the ground. One gait cycle is chosen from the human gait data Θ_M as the robot's learning object; considering the time required for the motor torques to change, the odd-numbered frames of the learning-object data are extracted to form a new data set, defined as the target gait data Θ_S. The robot gait data within one gait cycle are sampled at the sampling frequency of Θ_S to form the robot gait data Θ_R, any row vector of which is a sampled Θ_r. If the dimensions of Θ_R and Θ_S differ, a resize operation is applied to make them identical.
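A minimal sketch of the frame selection and dimension matching described above is given below. The patent does not specify how the "resize" operation is implemented; linear interpolation along the time axis is one possible realisation, and the function names are illustrative.

```python
# Sketch: build the target gait data from odd-numbered frames and resize Θ_R to match Θ_S.
import numpy as np

def target_gait_from_cycle(theta_m_cycle: np.ndarray) -> np.ndarray:
    """Keep the odd-numbered frames of one human gait cycle as the target gait data Θ_S."""
    return theta_m_cycle[::2]

def resize_to_match(theta_r: np.ndarray, n_target: int) -> np.ndarray:
    """Resample the robot gait data Θ_R so it has the same number of rows as Θ_S."""
    old_t = np.linspace(0.0, 1.0, len(theta_r))
    new_t = np.linspace(0.0, 1.0, n_target)
    return np.stack(
        [np.interp(new_t, old_t, theta_r[:, j]) for j in range(theta_r.shape[1])],
        axis=1,
    )
```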
Step S3: use denoising autoencoders to extract the hidden features of the biped robot gait data and the human gait data respectively. Step S3 specifically includes: according to the data structure of Θ_r and Θ_m, construct two denoising autoencoders of identical structure and perform feature extraction on the robot gait data Θ_R and the target gait data Θ_S. Feed the row vectors of Θ_R and Θ_S into the denoising autoencoders one by one, arrange the resulting features in their original order to form the robot gait feature data H_R and the target gait feature data H_S, and normalise H_R and H_S uniformly for use in deep reinforcement learning.
Because the geometric parameters of the human and the robot differ, and considering the generality of the invention and the noise present in the gait data, this embodiment further processes the robot and human gait data with the denoising autoencoder DAE, so as to extract and encode more robust features from the available gait data while eliminating the influence of model parameters and noise, allowing the robot to learn the human gait better.
The DAE is an improved algorithm based on the autoencoder; it has a simple structure and fast computation, is commonly used for the preliminary processing of data for deep learning networks, and can extract and encode more robust features from known data while eliminating possible noise.
The DAE used in this embodiment is a single-hidden-layer neural network composed of three layers. The first layer is the input layer, which receives the original gait data and adds noise to obtain the corrupted data. The second layer is the hidden layer, in which the DAE encodes the corrupted data; the encoding result can be regarded as the hidden features of the original gait. The third layer is the output layer, in which the hidden features are decoded and reconstructed; after training, the reconstruction output of the DAE should be identical to the original gait data. The DAE updates its network parameters by gradient descent.
The DAE adjusts its network parameters by training until the loss between the original input x and the reconstruction output y is very small; the hidden-layer output can then be regarded as a representation of the original input x, called the features of x, and these features serve as a good representation of the original input signal. By adding noise to the training data, the DAE forces the hidden layer to learn to remove the noise while still expressing the original gait information completely, which compels the DAE to learn a more robust representation of the input signal. The DAE workflow is shown in Fig. 5; taking the robot gait data Θ_R as an example, it can be described as:
S301: take one row vector Θ_r of Θ_R and feed it into the DAE. The DAE uses a binomial distribution to selectively erase elements of the original gait data Θ_r, setting the erased entries to 0 and obtaining the corrupted gait data Θ̃_r. The encoding function f maps Θ̃_r to the hidden layer, giving the hidden feature h = s_f(wΘ̃_r + p),
wherein w is the weight matrix between the input layer and the hidden layer and s_f is the activation function of the encoding function f, taken as the sigmoid function;
S302: the decoding function g maps the hidden feature h to the output layer, giving the reconstruction output y = s_g(w̃h + q). The reconstruction output y retains the information of the original gait data, which guarantees that the hidden feature h characterises the original gait data; the overall error of the reconstruction output y is expressed by the overall loss function J_DAE,
wherein w̃ is the weight matrix between the hidden layer and the output layer and satisfies w̃ = wᵀ, and s_g is the activation function of the decoding function, likewise the sigmoid function. The overall loss function of the DAE on the given training set is J_DAE(θ_DAE) = Σ L(Θ_r, y),
wherein θ_DAE denotes the parameters of the DAE, namely w, p and q, and L is the reconstruction error, which measures how close y is to Θ_r
over the n dimensions of the input and output layers;
S303: during training, the DAE uses gradient descent to iteratively minimise J_DAE(θ); the gradient-descent update of θ_DAE is θ_DAE ← θ_DAE − α·∂J_DAE/∂θ_DAE,
wherein α is the learning rate, with a value in [0, 1].
This embodiment constructs two DAE networks of identical structure, DAE_R and DAE_M, trained with the robot gait data Θ_R and the human gait data Θ_M respectively. After training on a large amount of data, DAE_R and DAE_M extract the hidden features of the robot and human gait data described in this embodiment; the extracted robot and human gait features are denoted h_r and h_m. DAE_M is used to extract the features of each row vector of Θ_S, and the extracted features are arranged in their original order to obtain the target gait feature data H_S. Θ_R is handled in the same way to obtain the robot gait feature data H_R. H_S and H_R are uniformly normalised and supplied to the deep reinforcement learning stage. H_S and H_R characterise the robot and human gait data effectively and greatly reduce the influence of noise and of geometric parameter differences on the deep reinforcement learning.
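For illustration, the following numpy sketch implements a DAE of the kind described above: sigmoid encoder and decoder, tied weights (w̃ = wᵀ), binomial masking noise, a squared-error reconstruction loss and plain gradient descent. The hidden size, erase probability and learning rate are arbitrary choices, not values taken from the patent.

```python
# Minimal denoising autoencoder sketch (single hidden layer, tied weights).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DAE:
    def __init__(self, n_in, n_hidden, erase_prob=0.3, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 0.1, size=(n_hidden, n_in))  # encoder weight matrix w
        self.p = np.zeros(n_hidden)                            # hidden-layer bias p
        self.q = np.zeros(n_in)                                # output-layer bias q
        self.erase_prob, self.lr, self.rng = erase_prob, lr, rng

    def encode(self, x):
        return sigmoid(self.w @ x + self.p)                    # h = s_f(w x + p)

    def decode(self, h):
        return sigmoid(self.w.T @ h + self.q)                  # y = s_g(w^T h + q)

    def train_step(self, x):
        mask = self.rng.binomial(1, 1.0 - self.erase_prob, size=x.shape)
        x_tilde = x * mask                                     # selectively erase entries (set to 0)
        h = self.encode(x_tilde)
        y = self.decode(h)
        err = y - x                                            # gradient of sum((y - x)^2), up to a factor 2
        dy = err * y * (1.0 - y)                               # back through decoder sigmoid
        dh = (self.w @ dy) * h * (1.0 - h)                     # back through encoder sigmoid
        grad_w = np.outer(dh, x_tilde) + np.outer(dy, h).T     # encoder + tied-decoder contributions
        self.w -= self.lr * grad_w
        self.p -= self.lr * dh
        self.q -= self.lr * dy
        return float(np.sum(err ** 2))                         # reconstruction error for monitoring
```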
Step S4: learn the human gait features with a deep reinforcement learning method, and thereby plan the gait of the biped robot. Reinforcement learning is a main branch of machine learning: during the interaction between an agent and its environment it gradually improves the agent's action selection, finally enabling the agent to accomplish its goal. Reinforcement learning does not require an accurate model of the agent and is therefore well suited to controlling a biped robot. However, conventional reinforcement learning converges slowly, and although combining it with neural networks improves the learning speed, the samples collected during interaction are highly correlated in time and do not satisfy the independence requirement of neural network training, so the networks easily overfit. With the rapid development of deep learning, deep reinforcement learning has come into researchers' view. Deep reinforcement learning combines conventional reinforcement learning with deep learning; deep learning theory compensates for the deficiencies of reinforcement learning and improves it in every respect.
Because the walking motion of the biped robot is continuous and the action space of the hip-joint drive motors is continuous, this embodiment selects the deep deterministic policy gradient algorithm DDPG as the robot's learning algorithm. DDPG is an Actor-Critic algorithm improved from the deterministic policy gradient DPG; it replaces the policy function and the value function of conventional reinforcement learning with neural networks, referred to as the policy network μ and the Q network Q. The policy network receives the robot state and returns motor torques, while the Q network evaluates the choice made by the policy network from the robot state and the motor torques. The DDPG framework is shown in Fig. 6.
In step S4, the robot gait feature data H_R produced by the denoising autoencoder is used as the input data s_t of DDPG, the target gait feature data H_S is used as the basis for computing the reward r_t, and DDPG outputs the torques a_t executed by the motors. The robot collects data of different gaits during continuous walking and supplies them to the training of DDPG, so that DDPG finally acquires the ability to control the robot to reach the target gait.
To overcome the network oscillation and overfitting caused by the strong temporal correlation of the samples collected during interaction, this embodiment provides DDPG with a memory pool. For each step within a gait cycle, the memory pool stores the robot state s_t, the motor torques a_t chosen for execution, the obtained reward r_t and the robot state s_{t+1} after the motors act, as one experience tuple (s_t, a_t, r_t, s_{t+1}). When the neural networks need to be trained, n experience tuples are drawn at random from the memory pool as training data, where n is generally the minibatch size. This random sampling mechanism breaks the temporal correlation between samples, preventing network oscillation and overfitting, and allows the robot to learn from past experience as well as current experience.
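An illustrative sketch of such a memory pool (experience replay buffer) is shown below. The class and method names are not from the patent; the capacity E and the minibatch size come from the training hyperparameters described later.

```python
# Sketch of the memory pool used to break temporal correlation between samples.
import random
from collections import deque

class MemoryPool:
    def __init__(self, capacity_E: int):
        self.buffer = deque(maxlen=capacity_E)     # oldest experiences are discarded first

    def store(self, s_t, a_t, r_t, s_next):
        self.buffer.append((s_t, a_t, r_t, s_next))

    def sample(self, minibatch: int):
        # random sampling breaks the temporal correlation between consecutive steps
        return random.sample(self.buffer, minibatch)

    def __len__(self):
        return len(self.buffer)
```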
If only a single Q network were used in DDPG to evaluate and train the policy network, the learning process would be unstable, because the parameters of that single Q network would be updated frequently while at the same time being used to compute the gradients of both the Q network and the policy network. Therefore, in this embodiment the policy network μ and the Q network Q are duplicated after their parameters are initialised; the copies are called the offline policy network μ′ and the offline Q network Q′, while the original networks are called the online policy network μ and the online Q network Q. The online networks output the robot's actions and are used while the robot walks, whereas the main function of the offline networks is to provide data support for the training of the online networks, making the whole network more stable and faster to converge.
The online and offline networks have exactly the same structure; they differ only in how their parameters are updated. The parameters of the online networks are updated by stochastic gradient descent using the experience data drawn at random from the memory pool and the outputs of the offline networks. The parameters of the offline networks are updated by a soft update, which moves them toward the parameters of the online networks. Taking the online policy network and the offline policy network as an example, the soft update can be expressed as:
θ^{μ′} ← τθ^{μ} + (1 − τ)θ^{μ′}
wherein θ^μ and θ^{μ′} are the parameters of the online and offline policy networks respectively, and τ generally takes the value 0.001. The soft update between the online Q network and the offline Q network takes the same form.
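A minimal sketch of this soft update, written against PyTorch parameter lists, is shown below; the use of PyTorch and the function name are assumptions for illustration only.

```python
# Soft update θ' ← τ θ + (1 − τ) θ', applied to both the policy and Q network pairs.
import torch

def soft_update(online_net: torch.nn.Module, offline_net: torch.nn.Module, tau: float = 0.001):
    with torch.no_grad():
        for p_online, p_offline in zip(online_net.parameters(), offline_net.parameters()):
            p_offline.mul_(1.0 - tau).add_(tau * p_online)
```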
The training process in this example is shown in Fig. 7 and can be described as follows:
The robot's sampling frequency and the moment at which sampling of the swing leg begins are kept consistent with the target gait data. The robot state at time t is Θ_r; the robot gait feature data H_R obtained by DAE processing is used as s_t, and the target gait feature data H_S is used as the basis for r_t.
S401: the policy network of DDPG in this embodiment is a five-layer CNN. The first layer is the input layer, receiving s_t; the second and third layers are convolutional layers; the fourth layer is a fully connected layer; the fifth layer is the output layer, which bounds the maximum action and outputs the torques to be executed by the motors. The structure of the Q network is roughly the same as that of the policy network, except that the number of input units is increased to accommodate the motor torques a_t and the output layer has a single unit that returns only the evaluation.
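The sketch below illustrates, in PyTorch, one way such a five-layer policy network and the corresponding Q network could be laid out. The channel counts, kernel sizes, hidden width and the action bound max_torque are assumptions; only the layer sequence (input, two convolutional layers, fully connected layer, output) and the Q-network modifications follow the text.

```python
# Illustrative policy and Q network layouts for S401 (assumed sizes, not the patent's values).
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, state_len: int, action_dim: int, max_torque: float = 1.0):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * state_len, 64)
        self.out = nn.Linear(64, action_dim)
        self.max_torque = max_torque                      # action bound applied at the output layer

    def forward(self, s):                                 # s: (batch, state_len)
        x = self.conv(s.unsqueeze(1)).flatten(1)
        x = torch.relu(self.fc(x))
        return self.max_torque * torch.tanh(self.out(x))  # bounded motor torques a_t

class QNet(nn.Module):
    def __init__(self, state_len: int, action_dim: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * state_len + action_dim, 64)  # extra input units accommodate a_t
        self.out = nn.Linear(64, 1)                            # single evaluation value

    def forward(self, s, a):
        x = self.conv(s.unsqueeze(1)).flatten(1)
        x = torch.relu(self.fc(torch.cat([x, a], dim=1)))
        return self.out(x)
```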
The parameters of the online networks are randomly initialised and copied to the corresponding offline networks. The maximum number of experiences E that the memory pool can store is set, as are the training-set size minibatch, the number of training iterations T per network training, the learning rate l_policy of the online policy network, the learning rate l_Q of the online Q network, the soft-update rate τ and the maximum number of steps W of one interaction walk. When the robot falls or completes W steps, one complete interaction, counted by epi, is considered finished; the maximum number of interactions is EPI. Finally, the robot state is randomly initialised.
S402: at the moment the swing leg starts to swing, the robot state is s_t and the online policy network outputs a group of motor torques a_t according to the current network, which can be expressed as:
a_t = μ(s_t | θ^μ)
wherein each row vector of a_t contains the torques executed by the stance-leg motor and the swing-leg motor at the hip joint, and the number of columns of a_t is consistent with s_t.
S403: during the swing of the swing leg, the two motors at the robot's hip joint execute their respective torques; each row vector of a_t is executed for a duration equal to the sampling interval. The motors first execute the first row of a_t, switch to the next row once the robot state θ_r has been sampled, and proceed in this order. In this embodiment the control torques are square-wave torques, which effectively avoids jitter during control. When the swing leg collides with the ground, the step counter w is updated and all sampled θ_r are fed into DAE_R to obtain the new robot state s_{t+1}.
S404: the design of the reward function is a particularly important step in deep reinforcement learning, and a good reward design can markedly improve learning. This embodiment uses a reward design that guides the training toward the goal faster; r_t is as follows:
When the robot does not fall, r_t is larger the smaller the gap between s_{t+1} and H_S, and is always greater than 0. When the robot falls, r_t = −1. In this way the robot is guided to approach the target gait under the premise of not falling.
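The following sketch reproduces only the qualitative behaviour stated above: a positive reward that grows as the new state approaches the target gait feature, and −1 on a fall. The exponential form and the Euclidean distance are assumptions; the patent does not give the exact formula.

```python
# Illustrative reward consistent with S404 (functional form assumed).
import numpy as np

def reward(s_next: np.ndarray, h_s: np.ndarray, fallen: bool) -> float:
    if fallen:
        return -1.0
    gap = np.linalg.norm(s_next - h_s)     # distance to the target gait feature H_S
    return float(np.exp(-gap))             # always > 0, larger as the gap shrinks
```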
S405: store (s_t, a_t, r_t, s_{t+1}) in the memory pool as one experience tuple and update the experience counter exp. Depending on the situation, the counters trigger the following operations: 1) if the robot has fallen, reset the robot state, return to S402 and reset w; 2) if it has not fallen and w < W, take s_{t+1} as the new s_t and execute S402; 3) if w ≥ W and exp > E, execute S406 in sequence and reset w; 4) otherwise reset the robot state, return to S402 and reset w. epi is updated when S401, S403 and S404 are executed.
S406: randomly draw minibatch experience tuples from the memory pool as the training data set of the online networks.
S407: take the s_t and a_t in the training data and feed them into the online Q network for evaluation, obtaining Q(s_t, a_t | θ^Q). Feed the s_{t+1} in the data set into the offline policy network to obtain the motor torques a′_{t+1}, and evaluate s_{t+1} and a′_{t+1} with the offline Q network, obtaining Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′}). The loss function of the online Q network can then be expressed as the mean squared error L_Q = (1/N)·Σᵢ (yᵢ − Q(s_t, a_t | θ^Q))²,
wherein yᵢ = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′}). The online Q network is updated by stochastic gradient descent according to L_Q.
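A compact sketch of this critic update is given below in PyTorch. The optimiser, the discount factor γ = 0.99 and the batch layout (tensors with r_t shaped (batch, 1)) are assumptions for illustration.

```python
# Sketch of the online Q network update in S407.
import torch
import torch.nn.functional as F

def update_critic(q, q_offline, mu_offline, q_optim, batch, gamma=0.99):
    s_t, a_t, r_t, s_next = batch                            # stacked tensors from the sampled minibatch
    with torch.no_grad():
        a_next = mu_offline(s_next)                          # μ'(s_{t+1} | θ^{μ'})
        y = r_t + gamma * q_offline(s_next, a_next)          # y_i = r_t + γ Q'(s_{t+1}, ·)
    loss_q = F.mse_loss(q(s_t, a_t), y)                      # L_Q
    q_optim.zero_grad()
    loss_q.backward()
    q_optim.step()
    return loss_q.item()
```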
S408: compute the policy gradient of the policy network. The loss function of the online policy network is defined as:
L_μ = Q(s_t, μ(s_t | θ^μ) | θ^Q)
From this loss function the gradient of the online policy network with respect to θ^μ can be computed by the chain rule, ∇_{θ^μ}L_μ = ∇_a Q(s_t, a | θ^Q)|_{a=μ(s_t)}·∇_{θ^μ}μ(s_t | θ^μ).
The online policy network parameters are likewise updated by stochastic gradient descent.
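An illustrative actor update corresponding to S408 is sketched below. Minimising the negated Q value is the usual way to ascend L_μ with a gradient-descent optimiser; the function and variable names are not from the patent.

```python
# Sketch of the online policy network update in S408.
def update_actor(mu, q, mu_optim, s_t):
    loss_mu = -q(s_t, mu(s_t)).mean()    # maximise L_μ = Q(s_t, μ(s_t | θ^μ) | θ^Q)
    mu_optim.zero_grad()
    loss_mu.backward()
    mu_optim.step()
    return loss_mu.item()
```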
S409: after the parameters of the online policy network and the online Q network have been updated, update the offline policy network and the offline Q network by the soft update.
S410: update the training counter times; when times exceeds the single training count T, this round of network training ends and S411 is executed; otherwise return to S406 and continue network training.
S411: when epi > EPI, the DDPG computation ends and the online policy network is saved as the controller. If epi < EPI, reset the robot state and return to S402.
During the robot's walking, DDPG is used continuously for learning and training until the online policy network μ and the online Q network Q converge or the maximum number of interactions EPI is reached. Once the networks in DDPG have converged, the online policy network can control a robot starting from a random initial gait until it reaches the target gait. Likewise, if the robot is disturbed by an external force while walking, the first step after the disturbance is treated as an initial gait and is controlled effectively with DDPG; as long as the robot is not in a fallen state, DDPG can exert effective control. The use of the target gait feature data H_S provides the basis for computing r_t and enables r_t to describe the stability and the compliance of the robot gait at the same time. By combining deep reinforcement learning with human gait data, this embodiment finally gives the robot a gait as stable and compliant as a human's.
The specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the above specific embodiments; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. In the absence of conflict, the embodiments of the present application and the features in the embodiments can be combined with one another arbitrarily.

Claims (6)

1. A biped robot gait planning method based on deep reinforcement learning, characterised by comprising the following steps:
Step S1: establish a biped robot model and describe the robot walking process;
Step S2: acquire and process human gait data and target gait data;
Step S3: use denoising autoencoders to extract the hidden features of the biped robot gait data and the human gait data respectively;
Step S4: learn the human gait features with deep reinforcement learning, and thereby plan the gait of the biped robot.
2. The biped robot gait planning method based on deep reinforcement learning according to claim 1, characterised in that step S1 specifically includes the following steps:
Step S101: establish a four-link kneed biped robot model with arc feet; wherein the model comprises two thighs, two shanks and two arc feet, the legs are rigid rods connected by frictionless hinges, the arc feet are fixed to the shanks, the stance leg and the swing leg have identical mass and geometric parameters, the mass of each leg is uniformly distributed, a limit mechanism is placed at each knee joint of the model to imitate the function of the human knee, and two motors are placed at the hip joint to apply control torques to the stance leg and the swing leg respectively;
Step S102: take the right side of the direction of advance during walking as the viewpoint, analyse the gait process of the model, select dimensionless physical quantities that characterise the robot state in real time, and define the selected quantities as the robot walking state Θ_r, which consists of the following angles and angular velocities:
wherein counter-clockwise rotation is taken as positive, θ_r1 and θ̇_r1 are the angle and angular velocity of the swing-leg shank with respect to the vertical, θ_r2 and θ̇_r2 are the angle and angular velocity of the swing-leg thigh with respect to the vertical, and θ_r3 and θ̇_r3 are the angle and angular velocity of the stance-leg shank with respect to the vertical.
3. The biped robot gait planning method based on deep reinforcement learning according to claim 2, characterised in that step S2 specifically includes the following steps:
Step S201: define one gait cycle, for both the human and the robot, as the process from the start of the swing of the swing leg until the swing leg collides with the ground;
Step S202: choose a data set of normal human walking from the CMU human motion capture database, segment it by body parts and resolve it, obtaining a description of the human walking process;
Step S203: taking the robot model as reference, take the 2D sagittal plane of human walking and define the human walking state as Θ_m; express all data in the human walking process description as Θ_m, and combine the Θ_m row vectors to obtain the human gait data Θ_M;
Step S204: from the human gait data Θ_M, choose one gait cycle as the robot's learning object, extract the odd-numbered frames of the learning-object data to form a new data set, and define it as the target gait data Θ_S, wherein any row vector of Θ_S is an extracted Θ_m;
Step S205: sample the robot walking state Θ_r within one gait cycle at the sampling frequency of Θ_S to form the robot gait data Θ_R, wherein any row vector of Θ_R is a sampled Θ_r.
4. The biped robot gait planning method based on deep reinforcement learning according to claim 3, characterised in that step S3 specifically includes: according to the data structure of Θ_r and Θ_m, constructing two denoising autoencoders of identical structure and performing feature extraction on the robot gait data Θ_R and the target gait data Θ_S; feeding the row vectors of Θ_R and Θ_S into the denoising autoencoders one by one, arranging the obtained features in their original order to form the robot gait feature data H_R and the target gait feature data H_S, and normalising H_R and H_S uniformly for use in the deep reinforcement learning; wherein each denoising autoencoder operates through the following steps:
S301: take one row vector Θ of Θ_R or Θ_S and feed it into the denoising autoencoder; the denoising autoencoder uses a binomial distribution to selectively erase elements of the original gait data Θ, setting the erased entries to 0 and obtaining the corrupted gait data Θ̃; the encoding function f maps Θ̃ to the hidden layer, giving the hidden feature h = s_f(wΘ̃ + p),
wherein w is the weight matrix between the input layer and the hidden layer and s_f is the activation function of the encoding function f, taken as the sigmoid function;
S302: the decoding function g maps the hidden feature h to the output layer, giving the reconstruction output y = s_g(w̃h + q); the reconstruction output y retains the information of the original gait data, and the overall error is expressed by the overall loss function J_DAE of the denoising autoencoder on the given training set, J_DAE(θ_DAE) = Σ L(Θ, y),
wherein w̃ is the weight matrix between the hidden layer and the output layer and satisfies w̃ = wᵀ, and s_g is the activation function of the decoding function, likewise the sigmoid function;
wherein θ_DAE denotes the parameters of the denoising autoencoder, namely w, p and q, and L is the reconstruction error, which measures how close y is to Θ
over the n dimensions of the input and output layers;
S303: during training, the denoising autoencoder uses gradient descent to iteratively minimise J_DAE(θ), the update of θ_DAE being θ_DAE ← θ_DAE − α·∂J_DAE/∂θ_DAE,
wherein α is the learning rate, with a value in [0, 1].
5. The biped robot gait planning method based on deep reinforcement learning according to claim 3, characterised in that in step S4 the deep deterministic policy gradient algorithm DDPG is selected as the learning algorithm of the biped robot; the robot gait feature data H_R processed by the denoising autoencoder is used as the input data s_t of the deep deterministic policy gradient algorithm, the target gait feature data H_S is used as the basis for computing the reward r_t, and the deep deterministic policy gradient algorithm outputs the torques a_t executed by the motors; the robot collects data of different gaits during continuous walking and supplies them to the training of the deep deterministic policy gradient algorithm, which finally acquires the ability to control the robot to reach the target gait.
6. The biped robot gait planning method based on deep reinforcement learning according to claim 5, characterised in that the policy network of the deep deterministic policy gradient algorithm is a five-layer convolutional neural network comprising an input layer, two convolutional layers, a fully connected layer and an output layer, wherein the input layer receives s_t and the output layer outputs the torques a_t to be executed by the motors.
CN201810979187.2A 2018-08-27 2018-08-27 Biped robot gait planning method based on deep reinforcement learning Active CN108983804B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810979187.2A CN108983804B (en) 2018-08-27 2018-08-27 Biped robot gait planning method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810979187.2A CN108983804B (en) 2018-08-27 2018-08-27 Biped robot gait planning method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN108983804A true CN108983804A (en) 2018-12-11
CN108983804B CN108983804B (en) 2020-05-22

Family

ID=64547820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810979187.2A Active CN108983804B (en) 2018-08-27 2018-08-27 Biped robot gait planning method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN108983804B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751037A (en) * 2008-12-03 2010-06-23 上海电气集团股份有限公司 Dynamic walking control method for biped walking robot
CN104751172A (en) * 2015-03-12 2015-07-01 西安电子科技大学 Method for classifying polarized SAR (Synthetic Aperture Radar) images based on de-noising automatic coding
CN106127804A (en) * 2016-06-17 2016-11-16 淮阴工学院 The method for tracking target of RGB D data cross-module formula feature learning based on sparse depth denoising own coding device
CN106406162A (en) * 2016-08-12 2017-02-15 广东技术师范学院 Alternating current servo control system based on transfer neural network
US20180268262A1 (en) * 2017-03-15 2018-09-20 Fuji Xerox Co., Ltd. Information processing device and non-transitory computer readable medium
CN107506333A (en) * 2017-08-11 2017-12-22 深圳市唯特视科技有限公司 A kind of visual token algorithm based on ego-motion estimation
CN108241375A (en) * 2018-02-05 2018-07-03 景德镇陶瓷大学 A kind of application process of self-adaptive genetic operator in mobile robot path planning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KAI HENNING KOCH et al.: "Optimization-based walking generation for humanoid robot", 10th IFAC Symposium on Robot Control *
吴晓光 et al.: "A neural-network-based biped robot", China Mechanical Engineering *
胡运富 et al.: "Simulation and analysis of a simple biped passive walking model", Journal of Harbin Institute of Technology *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109760046A (en) * 2018-12-27 2019-05-17 西北工业大学 Space robot motion planning method for capturing a tumbling target based on reinforcement learning
CN110046457A (en) * 2019-04-26 2019-07-23 百度在线网络技术(北京)有限公司 Control method, device, electronic equipment and the storage medium of manikin
CN110046457B (en) * 2019-04-26 2021-02-05 百度在线网络技术(北京)有限公司 Human body model control method and device, electronic equipment and storage medium
CN112149835A (en) * 2019-06-28 2020-12-29 杭州海康威视数字技术股份有限公司 Network reconstruction method and device
CN112149835B (en) * 2019-06-28 2024-03-05 杭州海康威视数字技术股份有限公司 Network reconstruction method and device
CN110496377B (en) * 2019-08-19 2020-07-28 华南理工大学 Virtual table tennis player ball hitting training method based on reinforcement learning
CN110496377A (en) * 2019-08-19 2019-11-26 华南理工大学 A kind of virtual table tennis forehand hit training method based on intensified learning
CN110764415A (en) * 2019-10-31 2020-02-07 清华大学深圳国际研究生院 Gait planning method for leg movement of quadruped robot
CN110764415B (en) * 2019-10-31 2022-04-15 清华大学深圳国际研究生院 Gait planning method for leg movement of quadruped robot
CN112782973A (en) * 2019-11-07 2021-05-11 四川省桑瑞光辉标识***股份有限公司 Biped robot walking control method and system based on double-agent cooperative game
CN110711055A (en) * 2019-11-07 2020-01-21 江苏科技大学 Image sensor intelligence artificial limb leg system based on degree of depth learning
CN110861084A (en) * 2019-11-18 2020-03-06 东南大学 Four-legged robot falling self-resetting control method based on deep reinforcement learning
CN110861084B (en) * 2019-11-18 2022-04-05 东南大学 Four-legged robot falling self-resetting control method based on deep reinforcement learning
CN111625002A (en) * 2019-12-24 2020-09-04 杭州电子科技大学 Stair-climbing gait planning and control method of humanoid robot
CN111625002B (en) * 2019-12-24 2022-12-13 杭州电子科技大学 Stair-climbing gait planning and control method of humanoid robot
CN111142378A (en) * 2020-01-07 2020-05-12 四川省桑瑞光辉标识***股份有限公司 Neural network optimization method of biped robot neural network controller
CN111241700B (en) * 2020-01-19 2022-12-30 中国科学院光电技术研究所 Intelligent design method of microwave broadband super-surface absorber
CN111241700A (en) * 2020-01-19 2020-06-05 中国科学院光电技术研究所 Intelligent design method of microwave broadband super-surface absorber
CN111558937A (en) * 2020-04-07 2020-08-21 向仲宇 Robot motion control method based on deep learning
CN111487864A (en) * 2020-05-14 2020-08-04 山东师范大学 Robot path navigation method and system based on deep reinforcement learning
CN111814618A (en) * 2020-06-28 2020-10-23 浙江大华技术股份有限公司 Pedestrian re-identification method, gait identification network training method and related device
CN111814618B (en) * 2020-06-28 2023-09-01 浙江大华技术股份有限公司 Pedestrian re-recognition method, gait recognition network training method and related devices
CN112060075A (en) * 2020-07-21 2020-12-11 深圳先进技术研究院 Training method, training device and storage medium for gait generation network
CN112171660A (en) * 2020-08-18 2021-01-05 南京航空航天大学 Space double-arm system constrained motion planning method based on deep reinforcement learning
CN112256028A (en) * 2020-10-15 2021-01-22 华中科技大学 Method, system, equipment and medium for controlling compliant gait of biped robot
CN112232350A (en) * 2020-10-27 2021-01-15 广东技术师范大学 Paddy field robot mechanical leg length adjusting method and system based on reinforcement learning
CN112232350B (en) * 2020-10-27 2022-04-19 广东技术师范大学 Paddy field robot mechanical leg length adjusting method and system based on reinforcement learning
CN112666939A (en) * 2020-12-09 2021-04-16 深圳先进技术研究院 Robot path planning algorithm based on deep reinforcement learning
CN112666939B (en) * 2020-12-09 2021-09-10 深圳先进技术研究院 Robot path planning algorithm based on deep reinforcement learning
CN114684293A (en) * 2020-12-28 2022-07-01 成都启源西普科技有限公司 Robot walking simulation algorithm
CN114047697B (en) * 2021-11-05 2023-08-25 河南科技大学 Four-foot robot balance inverted pendulum control method based on deep reinforcement learning
CN114047697A (en) * 2021-11-05 2022-02-15 河南科技大学 Four-footed robot balance inverted pendulum control method based on deep reinforcement learning
CN115366099A (en) * 2022-08-18 2022-11-22 江苏科技大学 Mechanical arm depth certainty strategy gradient training method based on forward kinematics
CN115366099B (en) * 2022-08-18 2024-05-28 江苏科技大学 Mechanical arm depth deterministic strategy gradient training method based on forward kinematics
CN117572877A (en) * 2024-01-16 2024-02-20 科大讯飞股份有限公司 Biped robot gait control method, biped robot gait control device, storage medium and equipment
CN117572877B (en) * 2024-01-16 2024-05-31 科大讯飞股份有限公司 Biped robot gait control method, biped robot gait control device, storage medium and equipment

Also Published As

Publication number Publication date
CN108983804B (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN108983804A (en) A biped robot gait planning method based on deep reinforcement learning
Bruderlin et al. Goal-directed, dynamic animation of human walking
Bongard et al. Evolving complete agents using artificial ontogeny
CN107861508A (en) A kind of mobile robot local motion method and device for planning
Lewis et al. Genetic algorithms for gait synthesis in a hexapod robot
EP2360629A2 (en) Device for the autonomous bootstrapping of useful information
CN109483530A (en) A legged robot motion control method and system based on deep reinforcement learning
Xia et al. Relmogen: Leveraging motion generation in reinforcement learning for mobile manipulation
CN113156892B (en) Four-footed robot simulated motion control method based on deep reinforcement learning
CN110400345A (en) Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting
CN105965506A (en) Humanoid biped robot walking posture control method based on genetic algorithm
CN102157009A (en) Method for compiling three-dimensional human skeleton motion based on motion capture data
CN104921851B (en) The kneed forecast Control Algorithm of active above-knee prosthesis
CN110286592A (en) A kind of multi-modal movement technique of machine fish based on BP neural network and system
Fridman et al. Deeptraffic: Driving fast through dense traffic with deep reinforcement learning
Valsalam et al. Modular neuroevolution for multilegged locomotion
CN107481099A (en) Can 360 degree turn round real-time virtual fitting implementation method
CN106094817A (en) Reinforcement-learning humanoid robot gait planning method based on big data
CN110516736A (en) The visual multi-source heterogeneous data multilayer DRNN depth integration method of multidimensional
CN116977599A (en) Shield tunneling machine driving simulation method and system based on meta universe
CN113379027A (en) Method, system, storage medium and application for generating confrontation interactive simulation learning
Xu et al. Karting racing: A revisit to PPO and SAC algorithm
CN106910233B (en) Motion simulation method of virtual insect animation role
Conde et al. Learnable behavioural model for autonomous virtual agents: low-level learning
Revell et al. Sim2real: Issues in transferring autonomous driving model from simulation to real world

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant