CN106681149A - Bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning - Google Patents

Bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning

Info

Publication number
CN106681149A
Authority
CN
China
Prior art keywords
bumblebee
abdomen
pendulum
robot
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710018201.8A
Other languages
Chinese (zh)
Other versions
CN106681149B (en)
Inventor
郑能干
金梦洁
马倩
王雪菲
江毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201710018201.8A
Publication of CN106681149A
Application granted
Publication of CN106681149B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Toys (AREA)

Abstract

The invention discloses a bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning, and belongs to the technical field of insect robots. The control method comprises the following steps: (1) establishing a Q table according to a set target pattern of bumblebee abdomen swinging, wherein the Q table records the accumulated discounted reward obtained by each action under each class of abdomen swinging pattern; and (2) monitoring the abdomen swinging pattern of the bumblebee at the current time, looking it up in the Q table, and issuing actions according to the maximum-reward principle to control the abdomen swinging of the bumblebee. The interaction mechanism between the abdomen swinging behavior of the bumblebee and a visual stimulator is established by the algorithm, so a large number of biological experiments to obtain the relation between stimulation and behavioral response is avoided. The motor behavior of the bumblebee is induced by visual stimulation, and a reinforcement learning algorithm is used as the control algorithm, which effectively forms a closed-loop control system in which the abdomen swinging of the bumblebee can be controlled accurately.

Description

Bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning
Technical field
The present invention relates to the technical field of insect robots, and in particular to a bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning.
Background art
An insect robot is a new kind of animal robot system formed by combining a living insect, which serves as the carrier, with a control module. Such robots rely on the locomotion ability of the carrier itself, so no complicated mechanical or power system has to be designed, and they have broad application prospects in scientific research, national defense and security, industrial production, and other fields.
How to control insect flight is the primary problem to be solved in insect robot research. An insect robot generally comprises three parts: a behavior stimulation module, an insect behavior acquisition and analysis module, and a control instruction decision module.
At present, behavior stimulation of insect robots is mainly based on electrical stimulation, with the stimulation parameters for each motion state of the insect adjusted manually. In the electrical stimulation method, silicon or filament electrodes are surgically implanted into the muscles or nervous system of the insect, and square-wave pulses generated by a microcontroller are delivered to the stimulation site in order to control behaviors such as turning. Although electrically stimulated insect robots have made significant progress, several problems remain:
First, implanting the stimulation electrodes can damage the muscles or nervous system of the insect, which degrades subsequent control performance. Second, prolonged electrical stimulation fatigues the insect, and its sensitivity to the stimulation declines. Third, because insects are small (the muscles and brain of most insects have a volume of about 1 cubic millimeter), a single electrical stimulus may involve several sensory-motor loops, or even the whole muscle or brain, and so trigger unpredictable behavioral responses. Fourth, manual tuning of the stimulation parameters requires a large number of biological experiments in advance to obtain the stimulus-response relationship, which consumes a great deal of time; moreover, different individual insects still respond quite differently to the same stimulation parameters, so manual tuning is inefficient and imprecise.
Developing non-implanted control methods avoids the damage to the insect body described above. Numerous studies have shown that optic flow plays a key role in many important insect behaviors, such as flight speed control, altitude control, depth measurement, and flight distance measurement, which provides a theoretical basis for optic-flow-based insect robots.
The means used to provide visual stimuli in insect vision and electrophysiology research include striped printed paper, projectors, monitors, and LED matrices. The most commonly used visual stimulator at present is built from LED matrices. For example, Chinese patent application No. 201410422437.4 provides a bumblebee flight controller that both matches the characteristics of the bumblebee's compound eye and generates customizable optic flow stimuli. However, how to use optic flow stimulation to control bumblebee flight effectively within a reinforcement learning framework has not yet been reported.
Summary of the invention
The invention provides a bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning. A reinforcement learning algorithm drives the abdomen swinging of the bumblebee from an initial state to a target state, which overcomes two problems of the prior art: imprecise control caused by manual tuning of the stimulation parameters, and the need for a large number of biological experiments.
A bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning, comprising:
(1) according to a set target pattern of bumblebee abdomen swinging, establishing a Q table with a reinforcement learning algorithm, the Q table recording the accumulated discounted reward obtained by each action under each class of abdomen swinging pattern;
(2) monitoring the abdomen swinging pattern of the bumblebee at the current time, looking it up in the Q table, and issuing actions in real time according to the maximum-reward principle to control the abdomen swinging of the bumblebee;
wherein the action is a visual stimulus, with varying parameters, formed by light stripes flowing around the bumblebee.
The present invention models the insect robot control problem with a reinforcement learning algorithm. A virtual reality system delivers optic flow information directly to the compound eyes of the bumblebee and induces abdomen swinging behavior. A high-definition camera captures the abdomen motion in real time, and an image processing algorithm extracts the abdomen swinging pattern, which serves as the environment state. The algorithm changes the optic flow parameters, the LED screen changes its motion parameters accordingly, and the new stimulus is presented to the bumblebee, which forms a closed-loop control system.
The maximum-reward principle is a greedy policy.
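As a minimal sketch of the greedy lookup described above, assuming the Q table is stored as a NumPy array indexed by (discrete swing pattern, action): the table size, the values in it, and the function name `greedy_action` are illustrative assumptions, while the ten angular velocities are the candidate stimulus parameters this description gives for the visual stimulus.

```python
import numpy as np

# Candidate image rotation speeds in rad/s (the action set named in the description).
ANGULAR_VELOCITIES = [2.05, 2.89, 3.97, 4.88, 6.20, 7.21, 8.72, 11.02, 14.5, 22.16]

def greedy_action(q_table, state):
    """Return the index of the action with the largest Q value in this state."""
    return int(np.argmax(q_table[state]))

# Hypothetical Q table: 4 discrete swing patterns x 10 actions (sizes are assumed).
q_table = np.zeros((4, len(ANGULAR_VELOCITIES)))
q_table[2, 7] = 1.5  # pretend training found action 7 to be best in pattern 2

best = greedy_action(q_table, state=2)
print(best, ANGULAR_VELOCITIES[best])
```

During control, the selected index would be translated into the rotation speed sent to the LED display.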
The action is produced by a virtual reality system composed of several LED display screens, and the LED display screens are controlled by an LED control system based on a CAN bus.
Specifically, an LED display screen based on a CAN bus provides the optic flow input for the bumblebee. The LED display screen has a 32 × 8 structure: 32 pins of the microcontroller drive the LED columns, and the remaining 8 pins drive the LED rows. The system consists of a PC control terminal, a master node on the CAN bus, and several slave nodes, each containing an LED driver module. The PC control terminal includes an input unit and a first serial communication service unit, and sends data packets containing image information and control commands to the master node on the CAN bus. The master node includes a second serial communication service unit, which receives the data sent by the client, and a CAN communication service unit, which encodes the received data and uploads it to the CAN bus. Each slave node includes a second CAN communication service unit, which receives data from the CAN bus and passes it to the corresponding LED driver module. The microcontroller used in the LED driver module is an MSP430F149; the LED driver module drives the LEDs, which displays the image and controls its motion.
The present invention records the abdomen swinging pattern of the bumblebee as a reflection of its flight behavior. Numerous studies have found that the abdomen plays a key role in insect flight control, because the position of the abdomen determines the direction of the force generated by the flapping wings.
The acquisition and analysis of abdomen swinging behavior is performed by a high-definition camera and a particle filter algorithm, which together yield a time series of the abdomen deflection angle. The camera is an ANC HD1080P with an acquisition frame rate of 30 frames per second; it is connected to the computer by a USB cable and transfers images to the computer for subsequent behavior analysis. The particle filter is a widely used classic algorithm for object tracking. Before the experiment, a fixed point is chosen as the tracking reference point, and a rectangular region of the bumblebee abdomen in the image is chosen as the tracking target. The color histogram of this region is used as the feature to locate the target in the next frame, from which the abdomen deflection angle of the bumblebee is calculated. Once the angles are obtained, the abdomen swinging pattern is computed with a sliding window method.
Preferably, the abdomen swinging pattern is characterized by a 6-dimensional vector composed of the maximum, the minimum, the mean absolute value, the number of peaks and troughs, the number of points above the mean, and the variance of the abdomen deflection angle within a single sliding window.
Preferably, the width of the sliding window is 200~400 ms; more preferably, the width of the sliding window is 200 ms.
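The six window features can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: peaks and troughs are counted as sign changes of the first difference, consecutive windows advance by one frame, and the function names `swing_features` and `sliding_windows` are hypothetical. At 30 frames per second, a 200 ms window holds 6 angle samples.

```python
import numpy as np

def swing_features(angles):
    """6-dimensional swing pattern: max, min, mean absolute value,
    number of peaks and troughs, number of points above the mean, variance."""
    a = np.asarray(angles, dtype=float)
    slopes = np.sign(np.diff(a))
    slopes = slopes[slopes != 0]                      # ignore flat segments
    turning_points = int(np.sum(slopes[1:] != slopes[:-1]))
    above_mean = int(np.sum(a > a.mean()))
    return np.array([a.max(), a.min(), np.mean(np.abs(a)),
                     turning_points, above_mean, a.var()])

def sliding_windows(series, fps=30, width_ms=200):
    """Split an angle series into windows of width_ms, one frame apart (assumed step)."""
    n = int(round(fps * width_ms / 1000))             # 30 fps * 0.2 s = 6 samples
    return [series[i:i + n] for i in range(len(series) - n + 1)]

angles = [0.0, 2.0, 1.0, 3.0, 0.0, -1.0, 1.0, 2.0]   # made-up deflection angles
for window in sliding_windows(angles):
    print(swing_features(window))
```

The peak-and-trough count is the feature the embodiment later refers to as SSC.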
The present invention uses a reinforcement learning algorithm to establish the interaction mechanism between the bumblebee and the visual stimulator. Reinforcement learning is a process of continual interaction with the environment, through which an optimal policy mapping environment states to actions is obtained. Reinforcement learning involves three main parts: the environment, the actions, and the reward function.
The environment is the abdomen swinging pattern of the bumblebee, represented by the 6-dimensional vector described above;
The action is the adjustment of the external visual stimulus parameters that the reinforcement learning algorithm can make; preferably, the Q table contains 5~20 actions. The visual stimulus parameter is the angular velocity of image rotation; specifically, there are 10 selectable angular velocities: [2.05, 2.89, 3.97, 4.88, 6.20, 7.21, 8.72, 11.02, 14.5, 22.16] rad/s. In addition, the visual stimulus parameters may also include the brightness, direction of motion, and width of the image.
The reward function is determined mainly by the Euclidean distance between the current abdomen swinging pattern of the bumblebee and the target abdomen swinging pattern. Let the target state be the vector $x_g$ with corresponding discrete state $k_g$; let the abdomen swinging pattern at the current time $t$ be $x_t$ with corresponding discrete state $k_t$; and let the abdomen swinging pattern at the previous time be $x_{t-1}$. Let $dist_1$ be the distance between $x_t$ and $x_g$, and $dist_2$ the distance between $x_{t-1}$ and $x_g$. The reward function can then be written as:
$R = 0.2$, if ($dist_1 > dist_2$ and $k_t \neq k_g$);
$R = +1$, if ($dist_1 > dist_2$ and $k_t = k_g$);
$R = -1$, otherwise.
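The piecewise reward above can be written as a small function. This is a sketch only: the function name and argument order are assumptions, and the direction of the distance comparison is copied literally from the text, so a reader who interprets the reward as favoring movement toward the target would flip that one comparison.

```python
import numpy as np

def reward(x_t, x_prev, x_goal, k_t, k_goal):
    """Piecewise reward built from Euclidean distances to the target pattern."""
    dist1 = np.linalg.norm(np.asarray(x_t, float) - np.asarray(x_goal, float))
    dist2 = np.linalg.norm(np.asarray(x_prev, float) - np.asarray(x_goal, float))
    if dist1 > dist2:  # comparison direction taken as written in the text
        # +1 if the current discrete state equals the target state, else 0.2
        return 1.0 if k_t == k_goal else 0.2
    return -1.0

print(reward([3, 0], [1, 0], [0, 0], k_t=1, k_goal=2))
```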
The reward function evaluates how good an action produced by the reinforcement learner is, and the goal of reinforcement learning is to maximize the accumulated reward.
Preferably, the Q table is established as follows:
(1) initialize the Q table, the learning rate $\alpha$, the discount factor $\gamma$, and the cluster centers of the abdomen swinging patterns;
(2) obtain the abdomen swinging pattern $x_t$ of the bumblebee at time $t$;
(3) discretize the state with sequential K-means and adjust the cluster centers of the abdomen swinging patterns;
(4) select the action $a_t$ according to the epsilon-greedy rule;
(5) execute the action $a_t$, and obtain the reward $r_{t+1}$ and the abdomen swinging pattern $x_{t+1}$ of the bumblebee at the next time;
(6) repeat steps (3) and (4) to select the action $a_{t+1}$;
(7) update the Q table according to the following formula:
$Q(x_t, a_t) \leftarrow Q(x_t, a_t) + \alpha\,[\,r_{t+1} + \gamma\,Q(x_{t+1}, a_{t+1}) - Q(x_t, a_t)\,]$;
(8) loop again from step (2) until the Q table converges.
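Steps (4) and (7) can be sketched as two small functions. Because the update uses the value of the action $a_{t+1}$ that is actually selected next, it has the form of the on-policy rule usually called SARSA. The array layout, the function names, and the numbers in the usage lines are illustrative assumptions, not values from the patent.

```python
import random
import numpy as np

def epsilon_greedy(q, state, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(q.shape[1])
    return int(np.argmax(q[state]))

def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]."""
    td_target = r + gamma * q[s_next, a_next]
    q[s, a] += alpha * (td_target - q[s, a])
    return q[s, a]

# One hand-made transition on a hypothetical 3-state, 2-action table.
q = np.zeros((3, 2))
q[1, 0] = 2.0
a_next = epsilon_greedy(q, state=1, epsilon=0.0)   # epsilon = 0 gives the greedy action
print(sarsa_update(q, s=0, a=1, r=1.0, s_next=1, a_next=a_next, alpha=0.5, gamma=0.9))
```

In a full training loop, the state indices would come from the discretization in step (3) and the reward from the reward function, repeated until the table stops changing.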
Through the interaction between the reinforcement learning algorithm and the environment, an optimal policy mapping environment states to actions is obtained, so the parameters of the next visual stimulus can be determined from the current abdomen swinging pattern of the bumblebee, which forms a closed-loop control system. In actual control, the abdomen swinging pattern approaches the target state quickly and accurately under the control of the reinforcement learning algorithm.
Beneficial effects of the invention:
(1) The invention stimulates the visual pathway of the bumblebee with optic flow, which eliminates the injury to the bumblebee caused by implanted methods.
(2) The interaction mechanism between the abdomen swinging behavior of the bumblebee and the visual stimulator is established by a reinforcement learning algorithm, so no large number of biological experiments is needed to obtain the stimulus-response relationship.
(3) The invention induces the motor behavior of the bumblebee by visual stimulation and uses a reinforcement learning algorithm as the control algorithm, which effectively forms a closed-loop control system and allows accurate control of the abdomen swinging behavior of the bumblebee.
Brief description of the drawings
Fig. 1 is a control flow chart of the bumblebee robot closed-loop control system based on virtual reality and reinforcement learning according to the invention.
Fig. 2 is a structural schematic of the components of the bumblebee robot closed-loop control system based on virtual reality and reinforcement learning according to the invention.
Fig. 3 is a schematic of the LED display module based on the CAN bus.
Fig. 4 is a schematic of the behavior acquisition and analysis process, and its results, for the bumblebee.
Fig. 5 is a schematic of the reinforcement learning interaction process.
Fig. 6 shows the extraction of bumblebee abdomen swinging features based on a sliding window.
Fig. 7 shows the learning process of the reinforcement learning algorithm.
Fig. 8 shows the control curves of the SSC dimension for one bumblebee under reinforcement learning control and under random-policy control.
Fig. 9 shows the deviation curves between the bumblebee's SSC dimension and the target SSC dimension under reinforcement learning control and under random-policy control.
Fig. 10 shows the distribution of the Q table learned by the algorithm.
Detailed description of the embodiments
The invention is further described below with reference to an embodiment.
Embodiment 1
Fig. 1 (A) shows the control loop of the bumblebee robot closed-loop control system based on virtual reality and reinforcement learning according to the invention. In this loop, the compound eyes of the bumblebee first receive the external optic flow stimulus, and the bumblebee then produces a corresponding abdomen swinging response. To capture the abdomen swinging behavior, the invention tracks the tip of the bumblebee abdomen with a particle filter algorithm, calculates its swing angle, and then extracts features over a sliding window. The extracted abdomen swinging features serve as the environment state of the reinforcement learning algorithm, and the algorithm chooses the stimulation parameter for the next moment, which forms a closed learning loop. Fig. 1 (B) shows the physical composition of the same closed-loop control system. Corresponding to Fig. 1 (A), the system includes an LED display screen based on a CAN bus, which provides the optic flow stimulus; a high-definition camera, which records the abdomen swinging behavior of the bumblebee; and a computer, which analyzes and processes the images and runs the reinforcement learning algorithm.
As shown in Fig. 2, an experimental platform for the bumblebee robot control method based on optic flow stimulation and reinforcement learning includes: vibration isolators 1 and 2; an LED display screen 3 placed on the vibration isolators, which produces the specified visual stimulus; a pillar 4 that holds the micromanipulation fixing bracket; a micromanipulation fixing bracket 5 for adjusting the position at which the bumblebee is fixed; a high-definition camera 6 for recording the abdomen swinging behavior of the bumblebee; and a universal bracket 7. To allow for later extension of the system, the design further includes a high-speed camera 8 and a support 9 for fixing the high-speed camera. Except for the PC, the whole system is placed on the vibration isolators so that vibration of the tabletop does not affect the experiment.
As shown in Fig. 3, an LED display screen based on a CAN bus provides the optic flow input for the bumblebee. The LED display screen has a 32 × 8 structure: 32 pins of the microcontroller drive the LED columns, and the remaining 8 pins drive the LED rows. The system consists of a PC control terminal, a master node on the CAN bus, and several slave nodes, each containing an LED driver module. The PC control terminal includes an input unit and a first serial communication service unit, and sends data packets containing image information and control commands to the master node on the CAN bus. The master node includes a second serial communication service unit, which receives the data sent by the client, and a CAN communication service unit, which encodes the received data and uploads it to the CAN bus. Each slave node includes a second CAN communication service unit, which receives data from the CAN bus and passes it to the corresponding LED driver module. The microcontroller used in the LED driver module is an MSP430F149; the LED driver module drives the LEDs, which displays the image and controls its motion.
The present invention uses a single bright stripe as the visual input to the bumblebee robot in order to induce the motor behavior of the bumblebee.
Fig. 4 is a schematic of the acquisition and data analysis of bumblebee flight behavior in the present invention. The abdomen swinging behavior of the bumblebee is captured by a high-definition camera placed behind it; the camera model is ANC HD1080P. The abdomen swinging frequency of the bumblebee is at most about 7 Hz, so the frame rate of the camera is set to 30 frames per second. In the figure, $(x_0, y_0)$ is the point at which the bumblebee is fixed. The bumblebee is fixed by its back to a thin bent iron rod with a bend angle of 45 degrees and a diameter of 1 mm. The operator selects the abdominal region to be tracked; denoting the center of this region by $(x, y)$, the abdomen swinging angle of the bumblebee is obtained from the formula $\beta = \arctan[(x - x_0)/(y - y_0)]$.
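The angle formula can be checked with a short sketch. It uses `math.atan2` in place of a plain arctangent of the ratio, which gives the same value whenever $y - y_0 > 0$ and avoids a division by zero when $y = y_0$; that substitution, the assumption that the tracked region lies on the positive-$y$ side of the fixing point, and the function name are choices made here for illustration.

```python
import math

def deflection_angle(x, y, x0, y0):
    """Abdomen deflection angle in degrees: beta = arctan((x - x0) / (y - y0))."""
    return math.degrees(math.atan2(x - x0, y - y0))

# Made-up pixel coordinates: region center (130, 260), fixing point (100, 200).
print(deflection_angle(130, 260, 100, 200))
```

Applying this to the tracked region center in every frame yields the deflection angle time series that the sliding window operates on.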
Fig. 5 shows the interaction process of the reinforcement learning algorithm in the present invention. In general, reinforcement learning maintains an action-value function $Q(x_t, a_t)$ that maps environment states to actions and represents the accumulated reward obtainable by taking action $a_t$ in state $x_t$. Once this value function is available, at each step the action $a_t$ with the largest value in the corresponding state $x_t$ is usually selected. The value function is updated through the continual interaction between the learner and the environment.
At time $t$, the abdomen swinging pattern of the bumblebee is $x_t$. The state is discretized with the sequential K-means algorithm, and an action $a_t$ is then selected, that is, the current optic flow stimulation parameter is adjusted. After its compound eyes receive the optic flow stimulus, the bumblebee produces a corresponding abdomen swinging behavior, whose parameter is the deflection angle obtained by the particle filter algorithm. As shown in Fig. 6, a sliding window of 200 ms is used here: all deflection angle data inside the window are taken, and six features (maximum, minimum, mean absolute value, number of peaks and troughs, number of points above the mean, and variance) are computed as the abdomen swinging pattern of the bumblebee. This 6-dimensional feature is the state $x_{t+1}$ shown in the figure. At the same time, evaluating the reward function gives the immediate reward $r_{t+1}$. The update formula of the reinforcement learning algorithm is:
$Q(x_t, a_t) \leftarrow Q(x_t, a_t) + \alpha\,[\,r_{t+1} + \gamma\,Q(x_{t+1}, a_{t+1}) - Q(x_t, a_t)\,]$,
where $\alpha$ is the learning rate of the algorithm and $\gamma$ is the discount factor. The detailed algorithm flow is shown in Fig. 7.
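The state discretization mentioned above can be illustrated with a sequential (online) K-means sketch: each incoming 6-dimensional pattern is assigned to its nearest cluster center, and that center is then moved toward the pattern by a running-mean step. The count-based step size, the initial centers, and the class name are assumptions; the text does not specify how the centers are adjusted. Two-dimensional points are used only to keep the example short.

```python
import numpy as np

class SequentialKMeans:
    """Online K-means: assign a sample to the nearest center, then nudge that center."""

    def __init__(self, initial_centers):
        self.centers = np.array(initial_centers, dtype=float)
        # Each initial center is treated as one already-seen sample (assumption).
        self.counts = np.ones(len(self.centers), dtype=int)

    def assign_and_update(self, x):
        x = np.asarray(x, dtype=float)
        k = int(np.argmin(np.linalg.norm(self.centers - x, axis=1)))
        self.counts[k] += 1
        self.centers[k] += (x - self.centers[k]) / self.counts[k]  # running mean
        return k  # discrete state index used to address the Q table

km = SequentialKMeans([[0.0, 0.0], [10.0, 10.0]])
print(km.assign_and_update([2.0, 0.0]), km.centers[0])
```

The returned index plays the role of the discrete state $k_t$ that selects a row of the Q table.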
The reward function is determined mainly by the Euclidean distance between the current abdomen swinging pattern of the bumblebee and the target abdomen swinging pattern. Let the target state be the vector $x_g$ with corresponding discrete state $k_g$; let the abdomen swinging pattern at the current time $t$ be $x_t$ with corresponding discrete state $k_t$; and let the abdomen swinging pattern at the previous time be $x_{t-1}$. Let $dist_1$ be the distance between $x_t$ and $x_g$, and $dist_2$ the distance between $x_{t-1}$ and $x_g$. The reward function can then be written as:
$R = 0.2$, if ($dist_1 > dist_2$ and $k_t \neq k_g$);
$R = +1$, if ($dist_1 > dist_2$ and $k_t = k_g$);
$R = -1$, otherwise.
With a reinforcement learning algorithm, the correspondence between optic flow stimulation parameters and bumblebee abdomen swinging patterns need not be known in advance. Through continual interactive learning, the algorithm learns the mapping from environment state (abdomen swinging pattern) to action (optic flow stimulation parameter) and uses it as the subsequent closed-loop real-time control policy, which avoids manually running many experiments and analyzing the data.
To verify the effectiveness of the control mechanism based on reinforcement learning, we also compared it with an ordinary random algorithm. Figs. 8 to 10 show the collected bumblebee data and the results of the control experiments. Figs. 8 and 9 are control curves for the bumblebee; they show how the SSC dimension of the current state approaches the SSC dimension of the target state $x_g$ (SSC is the number of peaks and troughs within the sliding window; it reflects the frequency of the abdominal motion and is the most important feature describing bumblebee abdomen swinging).
The line with triangles is the target SSC value, the line with circles is the curve under reinforcement learning control, and the line with × marks is the curve under random algorithm control. It can be seen that:
1. Under reinforcement learning control, the SSC moves closer to the target state at every step; the deviation between the state at each step and the target state (corresponding to the Euclidean distance in Fig. 10) decreases monotonically, and the final deviation tends to 0. By contrast, the control curve of the random algorithm is random throughout the control process, its deviation is likewise random, and no clear control trend emerges.
2. Under reinforcement learning control, the bumblebee approaches the target value within at most 8 steps and then remains stable, whereas the random algorithm does not drive the control process toward the target value, and the states near the target value show no regularity.
3. The curve under reinforcement learning control is smoother, while the curve under random algorithm control fluctuates strongly.
The above results and analysis show that, with a reinforcement learning algorithm, optic flow stimulation can control the abdomen swinging of the bumblebee well, which lays a foundation for subsequent optic-flow-based bumblebee robots.

Claims (6)

1. A bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning, comprising:
(1) according to a set target pattern of bumblebee abdomen swinging, establishing a Q table with a reinforcement learning algorithm, the Q table recording the accumulated discounted reward obtained by each action under each class of abdomen swinging pattern;
(2) monitoring the abdomen swinging pattern of the bumblebee at the current time, looking it up in the Q table, and issuing actions in real time according to the maximum-reward principle to control the abdomen swinging of the bumblebee;
wherein the action is a visual stimulus, with varying parameters, formed by light stripes flowing around the bumblebee.
2. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the abdomen swinging pattern is characterized by a 6-dimensional vector composed of the maximum, the minimum, the mean absolute value, the number of peaks and troughs, the number of points above the mean, and the variance of the bumblebee abdomen deflection angle within a single sliding window.
3. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the width of the sliding window is 200~400 ms.
4. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the action is produced by a virtual reality system composed of several LED display screens, and the LED display screens are controlled by an LED control system based on a CAN bus.
5. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the Q table contains 5~20 actions.
6. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the Q table is established as follows:
(1) initialize the Q table, the learning rate $\alpha$, the discount factor $\gamma$, and the cluster centers of the abdomen swinging patterns;
(2) obtain the abdomen swinging pattern $x_t$ of the bumblebee at time $t$;
(3) cluster and discretize the state with sequential K-means, and adjust the cluster centers of the abdomen swinging patterns;
(4) select the action $a_t$ according to the epsilon-greedy rule;
(5) execute the action $a_t$, and obtain the reward $r_{t+1}$ and the abdomen swinging pattern $x_{t+1}$ of the bumblebee at the next time;
(6) repeat steps (3) and (4) to select the action $a_{t+1}$;
(7) update the Q table according to the following formula:
$Q(x_t, a_t) \leftarrow Q(x_t, a_t) + \alpha\,[\,r_{t+1} + \gamma\,Q(x_{t+1}, a_{t+1}) - Q(x_t, a_t)\,]$;
(8) loop again from step (2) until the Q table converges.
CN201710018201.8A 2017-01-11 2017-01-11 Bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning Active CN106681149B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710018201.8A CN106681149B (en) 2017-01-11 2017-01-11 Bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710018201.8A CN106681149B (en) 2017-01-11 2017-01-11 Bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning

Publications (2)

Publication Number Publication Date
CN106681149A (en) 2017-05-17
CN106681149B (en) 2019-11-19

Family

ID=58849400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710018201.8A Active CN106681149B (en) 2017-01-11 2017-01-11 Bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning

Country Status (1)

Country Link
CN (1) CN106681149B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000271350A (en) * 1999-03-26 2000-10-03 Bandai Co Ltd Insect robot
CN104216300A (en) * 2014-08-25 2014-12-17 Zhejiang University LED control system based on CAN bus and bumblebee flight controller
CN105438305A (en) * 2015-11-13 2016-03-30 Anhui University of Technology Six-limb insect motion mode determination method, bionic six-limb insect robot and use method thereof
CN105620568A (en) * 2016-03-17 2016-06-01 East China Jiaotong University Flexible worm-shaped robot and control method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
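巩凡 (Gong Fan): "Design of a simulated aircraft and its application in research on bumblebee yaw behavior", China Master's Theses Full-text Database (Electronic Journal) *
马奇 等 (Ma Qi et al.): "A reinforcement learning model simulating insect visual-behavioral decision making", Acta Biophysica Sinica (《生物物理学报》) *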

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112427843A (en) * 2020-11-09 2021-03-02 716th Research Institute of China Shipbuilding Industry Corporation Ship multi-mechanical-arm welding spot cooperative welding method based on QMIX reinforcement learning algorithm
WO2022095278A1 (en) * 2020-11-09 2022-05-12 716th Research Institute of China Shipbuilding Industry Corporation Qmix reinforcement learning algorithm-based ship welding spots collaborative welding method using multiple manipulators
CN112965595A (en) * 2021-02-26 2021-06-15 Tsinghua Shenzhen International Graduate School Modeling and predicting method for simulating touch by nerve electrical stimulation
CN112965595B (en) * 2021-02-26 2023-09-05 Tsinghua Shenzhen International Graduate School Modeling and predicting method for nerve electrical stimulation simulation touch sense
WO2022241556A1 (en) * 2021-05-20 2022-11-24 Ai Redefined Inc. Systems and methods for real-time reinforcement learning
CN114888829A (en) * 2022-04-29 2022-08-12 Zhejiang University Bumblebee robot flight deflection behavior control method and system based on lightweight brain-computer interface
CN114888829B (en) * 2022-04-29 2023-09-22 Zhejiang University Bumblebee robot flight deflection behavior control method and system based on lightweight brain-computer interface

Also Published As

Publication number Publication date
CN106681149B (en) 2019-11-19

Similar Documents

Publication Publication Date Title
CN106681149A (en) Bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning
Reichardt et al. Visual control of orientation behaviour in the fly: Part I. A quantitative analysis
CN108712921B (en) System for constructing neural stimulation patterns using neural network models
Sato et al. A cyborg beetle: insect flight control through an implantable, tetherless microsystem
US7623918B2 (en) Human-implantable-neurostimulator user interface having multiple levels of abstraction
US20170189689A1 (en) Method and apparatus for optimizing spatio-temporal patterns of neurostimulation for varying conditions
US20170189686A1 (en) Method and apparatus for guided optimization of spatio-temporal patterns of neurostimulation
CN106714897A (en) System for calibrating dorsal horn stimulation
CN108127662B (en) Method and research system for controlling crawling and deflecting behaviors of bumblebee robot
WO2017117444A1 (en) System for reducing spatial sensitivity in neurostimulation
CN109864749A (en) Training method, assessment method and device for attention control ability
Zheng et al. Abdominal-waving control of tethered bumblebees based on sarsa with transformed reward
CN108670251A (en) Shoulder rehabilitation system and method
CN107224273A (en) Central-peripheral nerve closed-loop rehabilitation training method and system based on optical brain imaging neurofeedback
CN105056500B (en) Situation simulation training/game system
CN109015635A (en) Service robot control method based on brain-computer interaction
CN101703007A (en) Positioning device for active/passive exercise training device for simulated weightless tail-suspended rat
US20180214700A1 (en) Stimulation field templates to be applied across patient populations
US10948341B2 (en) Optical information collection system
Nakada et al. Biomimetic eye modeling & deep neuromuscular oculomotor control.
de Azevedo Neto et al. Control of interceptive actions is based on expectancy of time to target arrival
CN113426007A (en) Closed-loop epidural electrical stimulation system for upper limb functional recovery
CN114888829B (en) Bumblebee robot flight deflection behavior control method and system based on lightweight brain-computer interface
CN110743101A (en) Animal robot movement behavior recording and analyzing method
Huston Neural basis of a visuo-motor transformation in the fly

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant