CN106681149A

CN106681149A - Bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning

Info

Publication number: CN106681149A
Application number: CN201710018201.8A
Authority: CN
Inventors: 郑能干; 金梦洁; 马倩; 王雪菲; 江毅
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2017-01-11
Filing date: 2017-01-11
Publication date: 2017-05-17
Anticipated expiration: 2037-01-11
Also published as: CN106681149B

Abstract

The invention discloses a method for controlling the abdomen swinging of a bumblebee robot based on virtual reality and reinforcement learning, and belongs to the technical field of insect robots. The control method comprises the following steps: (1) building a Q table with a reinforcement learning algorithm according to a set target pattern of bumblebee abdomen swinging, wherein the Q table records the accumulated discounted reward obtained by each action under each class of abdomen-swinging pattern; and (2) monitoring the bumblebee's abdomen-swinging pattern at the current time, looking it up in the Q table, and issuing actions in real time according to the maximum-reward principle to control the bumblebee's abdomen swinging. The reinforcement learning algorithm establishes the interaction mechanism between the bumblebee's abdomen-swinging behavior and the visual stimulator, which avoids the large number of biological experiments otherwise needed to obtain the stimulus-response relationship. Visual stimulation induces the bumblebee's motor behavior and the reinforcement learning algorithm serves as the control algorithm, so that a closed-loop control system is formed and the bumblebee's abdomen swinging can be controlled accurately.

Description

A bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning

Technical field

The present invention relates to the technical field of insect robots, and in particular to a bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning.

Background art

An insect robot is a new kind of animal robot system formed by combining a live insect, which serves as the carrier, with a control module. Because this type of special robot relies on the locomotion ability of the carrier itself, it needs no complex mechanical or power system, and it has broad application prospects in scientific research, national defense and security, industrial production, and other fields.

How to control insect flight is the primary problem to be solved in insect robot research. An insect robot generally comprises three parts: a behavior stimulation module, an insect behavior acquisition and analysis module, and a control instruction decision module.

At present, the behavior of insect robots is stimulated mainly by electrical stimulation, and the electrical stimulation parameters corresponding to the insect's different motion states are adjusted manually. In electrical stimulation, silicon or filament electrodes are surgically implanted into the insect's muscles or nervous system, and square-wave pulses generated by a microcontroller are delivered to the stimulation site to control behaviors such as the insect's steering. Although electrically stimulated insect robots have made significant progress, several problems remain:

First, implanting the stimulation electrodes can damage the insect's muscles or nervous system, which degrades subsequent control performance. Second, prolonged electrical stimulation fatigues the insect and lowers its sensitivity to the stimulation. Third, because insects are small (the muscles and brain of most insects have a volume of about 1 cubic millimeter), a single electrical stimulus may involve several sensorimotor circuits, or even an entire muscle or the whole brain, and thus trigger unpredictable behavioral responses. Fourth, manually tuning the stimulation parameters requires a large number of biological experiments beforehand to obtain the stimulus-response relationship, which consumes a great deal of time; moreover, different individual insects still respond quite differently to the same stimulation parameters, so manual tuning is inefficient and imprecise.

Developing a non-implanted control method avoids the above problem of damaging the insect's body. Numerous studies have shown that optic flow plays a key role in many important insect behaviors, such as flight speed control, altitude control, depth measurement, and flight distance measurement, which provides a theoretical basis for realizing optic-flow-based insect robots.

The means used to provide visual stimuli in insect vision and electrophysiology research include striped printed paper, projectors, monitors, and LED dot matrices. The most commonly used visual stimulator at present is built from LED dot matrices. For example, Chinese patent application No. 201410422437.4 provides a bumblebee flight controller that not only matches the characteristics of the bumblebee's compound eye but can also generate customizable optic flow stimuli. However, how to use optic flow stimulation within a reinforcement learning framework to control bumblebee flight effectively has not yet been reported.

Summary of the invention

The invention provides a bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning. A reinforcement learning algorithm drives the bumblebee's abdomen swinging from an initial state to a target state, overcoming the prior-art problems that manually tuned stimulation parameters give imprecise control and require a large number of biological experiments.

A bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning, comprising:

(1) building a Q table with a reinforcement learning algorithm according to a set target pattern of bumblebee abdomen swinging, the Q table recording the accumulated discounted reward obtained by each action under each class of abdomen-swinging pattern;

(2) monitoring the bumblebee's abdomen-swinging pattern at the current time, looking it up in the Q table, and issuing actions in real time according to the maximum-reward principle to control the bumblebee's abdomen swinging;

wherein the action is a visual stimulus with different parameters, formed by beams of light flowing around the bumblebee.

The present invention models the insect robot control problem with a reinforcement learning algorithm. The virtual reality system supplies optic flow information directly to the bumblebee's compound eyes and induces its abdomen-swinging behavior; a high-definition camera captures the abdominal motion in real time, and a computer image algorithm extracts the bumblebee's abdomen-swinging pattern, which serves as the environment state; the algorithm changes the parameters of the optic flow information, the LED screen changes its motion parameters accordingly, and the result is fed back to the bumblebee, thereby forming a closed-loop control system.

The maximum-reward principle is a greedy policy.
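The greedy lookup can be illustrated with a minimal sketch. The table values below are made up for illustration, and the function name `greedy_action` is hypothetical; the text only specifies that the action with the largest recorded reward is chosen for the current pattern class.

```python
def greedy_action(q_table, state):
    """Return the index of the action with the largest Q value in `state`."""
    row = q_table[state]
    # Ties resolve to the lowest action index.
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical table: 3 pattern classes (rows) x 4 actions (columns).
q_table = [
    [0.1, 0.7, 0.2, 0.0],
    [0.5, 0.4, 0.9, 0.3],
    [0.0, 0.0, 0.0, 0.6],
]
action = greedy_action(q_table, state=1)
```

In a real controller the row index would come from the discretized abdomen-swinging pattern and the column index would select one stimulus parameter.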

The action is produced by a virtual reality system composed of several LED display screens, and the LED display screens are controlled by an LED control system based on a CAN bus.

Specifically, a CAN-bus-based LED display provides the optic flow input to the bumblebee. Each LED display has a 32 × 8 structure: 32 pins of the single-chip microcontroller control the LED columns, and the remaining 8 pins control the LED rows. The system consists of a PC control end, a master node on the CAN bus, and several slave nodes that each contain an LED driver module. The PC control end includes an input unit and a first serial communication service unit, and sends data packets containing image information and control commands to the master node of the CAN bus. The master node includes a second serial communication service unit, which receives the data sent by the client, and a CAN communication service unit, which encodes the received data and places it on the CAN bus. Each slave node includes a second CAN communication service unit, which receives data from the CAN bus and passes it to the corresponding LED driver module. The LED driver module uses an MSP430F149 microcontroller and drives the LEDs, thereby displaying images and controlling image motion.
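As a rough illustration of how one 32 × 8 image could travel over the bus, the sketch below packs a frame into bytes and splits it into payloads. The text does not specify the packet layout, so the one-byte-per-column encoding and the names `stripe_frame` and `split_into_payloads` are assumptions; the only external fact used is that a classic CAN data frame carries at most 8 data bytes.

```python
COLS, ROWS = 32, 8   # panel geometry given in the text
CAN_PAYLOAD = 8      # classic CAN data frames carry at most 8 data bytes

def stripe_frame(column, width=1):
    """Assumed encoding: one byte per column, one bit per row.

    Columns inside the stripe have all 8 rows lit; all others are dark.
    """
    frame = bytearray(COLS)
    for c in range(column, column + width):
        frame[c % COLS] = 0xFF   # wrap around the panel edge
    return bytes(frame)

def split_into_payloads(frame):
    """Cut a 32-byte frame into CAN-sized chunks of 8 bytes each."""
    return [frame[i:i + CAN_PAYLOAD] for i in range(0, len(frame), CAN_PAYLOAD)]

payloads = split_into_payloads(stripe_frame(column=5))
```

Under these assumptions a full panel image fits into four payloads, so moving a stripe by one column costs four bus transmissions per slave node.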

The present invention records the bumblebee's abdomen-swinging pattern to reflect its flight behavior. Numerous studies have found that the abdomen plays a key role in insect flight control, because the position of the abdomen determines the direction of the force generated by the flapping wings.

The acquisition and analysis of the bumblebee's abdomen-swinging behavior is performed by a high-definition camera and a particle filter algorithm, which yield the time series of the abdomen's deflection angle. The high-definition camera is an ANC HD1080P with an acquisition frame rate of 30 frames/second; it is connected to the computer by a USB cable and transfers images to the computer for subsequent behavior analysis. The particle filter is a widely used classic algorithm for object tracking. Before the experiment, a fixed point is chosen as the tracking reference point and a rectangular region of the bumblebee's abdomen in the image is chosen as the tracking target; the color histogram of this region serves as the feature used to locate the target in the next frame, from which the abdomen deflection angle is calculated. Once the angles are obtained, the abdomen-swinging pattern is computed with a sliding-window method.

Preferably, the abdomen-swinging pattern is characterized by a 6-dimensional vector composed of the following features of the abdomen deflection angle within a single sliding window: the maximum, the minimum, the mean absolute value, the number of peaks and troughs, the number of points above the mean, and the variance.

Preferably, the width of the sliding window is 200~400 ms. More preferably, the width of the sliding window is 200 ms.
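The six window features can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: peaks and troughs are counted as sign changes of the slope between consecutive samples, the variance is the population variance, and the function names are hypothetical. At the stated 30 frames/second, a 200 ms window holds 6 samples.

```python
from statistics import mean, pvariance

FRAME_RATE = 30                                   # frames/second, from the text
WINDOW_MS = 200                                   # preferred window width
WINDOW_SIZE = round(FRAME_RATE * WINDOW_MS / 1000)  # 6 samples per window

def sliding_windows(angles, size=WINDOW_SIZE, step=1):
    """All consecutive windows of `size` samples, advancing by `step`."""
    return [angles[i:i + size] for i in range(0, len(angles) - size + 1, step)]

def window_features(angles):
    """6-D pattern vector for one window of deflection angles."""
    m = mean(angles)
    diffs = [b - a for a, b in zip(angles, angles[1:])]
    # A peak or trough is an interior sample where the slope changes sign.
    peaks_troughs = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return [
        max(angles),
        min(angles),
        mean([abs(a) for a in angles]),
        peaks_troughs,
        sum(1 for a in angles if a > m),
        pvariance(angles),
    ]

features = window_features([0.0, 2.0, -1.0, 3.0, 1.0, 1.0])
```

For the sample window, the slope changes sign three times, so the fourth feature is 3; two samples lie above the mean of 1.0.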

The present invention uses a reinforcement learning algorithm to establish the interaction mechanism between the bumblebee and the visual stimulator. Reinforcement learning is a process of continual interaction with the environment, through which an optimal policy mapping environment states to actions can be obtained. Reinforcement learning comprises three main parts: the environment, the actions, and the reward function.

The environment is the bumblebee's abdomen-swinging pattern, represented by the 6-dimensional vector described above;

The actions are the adjustments to the external visual stimulus parameters that the reinforcement learning algorithm can make. Preferably, the Q table contains 5~20 actions. The visual stimulus parameter is the angular velocity of image rotation; specifically, there are 10 selectable angular velocities: [2.05, 2.89, 3.97, 4.88, 6.20, 7.21, 8.72, 11.02, 14.5, 22.16] rad/s. In addition, the visual stimulus parameters may also include the brightness, direction of motion, width, etc. of the image.
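A minimal sketch of the action set and an empty Q table follows. The ten angular velocities are taken from the text; the number of pattern classes (8 here) and the helper name `make_q_table` are assumptions chosen only for illustration.

```python
# Selectable angular velocities of image rotation in rad/s, from the text.
ANGULAR_VELOCITIES = [2.05, 2.89, 3.97, 4.88, 6.20,
                      7.21, 8.72, 11.02, 14.5, 22.16]

def make_q_table(n_states, n_actions=len(ANGULAR_VELOCITIES), initial=0.0):
    """One row per discrete pattern class, one column per action."""
    return [[initial] * n_actions for _ in range(n_states)]

# Assumed: 8 pattern classes; the text does not state the number of clusters.
q_table = make_q_table(n_states=8)

# An action index selects the stimulus parameter sent to the display.
chosen_velocity = ANGULAR_VELOCITIES[4]
```

Each row is a separate list, so updating one pattern class does not alter the others.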

The reward function is determined mainly by the Euclidean distance between the bumblebee's current abdomen-swinging pattern and the target pattern. Denote the target state by the vector $x_g$, with corresponding discrete state $k_g$; the abdomen-swinging pattern at the current time $t$ by $x_t$, with corresponding discrete state $k_t$; and the pattern at the previous time step by $x_{t-1}$. Let $\mathrm{dist}_1$ be the distance between $x_t$ and $x_g$, and $\mathrm{dist}_2$ the distance between $x_{t-1}$ and $x_g$. The reward function can then be written as:

$R = 0.2$, if ($\mathrm{dist}_1 > \mathrm{dist}_2$ and $k_t \neq k_g$);

$R = +1$, if ($\mathrm{dist}_1 > \mathrm{dist}_2$ and $k_t = k_g$);

$R = -1$, otherwise.
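The three cases above can be transcribed directly into code. In this sketch the inequality is copied exactly as it appears in the text, the function name `reward` and the argument order are hypothetical, and `math.dist` (Python 3.8+) supplies the Euclidean distance.

```python
import math

def reward(x_t, x_prev, x_goal, k_t, k_goal):
    """Reward for the current step, transcribed from the three cases in the text."""
    dist1 = math.dist(x_t, x_goal)      # current pattern to target
    dist2 = math.dist(x_prev, x_goal)   # previous pattern to target
    # The comparison direction follows the text as written.
    if dist1 > dist2 and k_t != k_goal:
        return 0.2
    if dist1 > dist2 and k_t == k_goal:
        return 1.0
    return -1.0

# Two-dimensional toy vectors instead of the 6-D pattern, for readability.
r = reward(x_t=[2.0, 0.0], x_prev=[1.0, 0.0], x_goal=[0.0, 0.0], k_t=1, k_goal=3)
```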

The reward function evaluates how good or bad an action produced by reinforcement learning is; the goal of reinforcement learning is to maximize the accumulated reward.

Preferably, the Q table is built as follows:

(1) initialize the Q table, the learning rate $\alpha$, the discount factor $\gamma$, and the cluster centers of the abdomen-swinging patterns;

(2) obtain the bumblebee's abdomen-swinging pattern $x_t$ at time $t$;

(3) discretize the state with sequential K-means and adjust the cluster centers of the abdomen-swinging patterns;

(4) select an action $a_t$ according to the epsilon-greedy policy;

(5) execute action $a_t$ and obtain the reward $r_{t+1}$ and the bumblebee's abdomen-swinging pattern $x_{t+1}$ at the next time step;

(6) repeat steps (3) and (4) to select action $a_{t+1}$;

(7) update the Q table according to the following formula:

$Q(x_t, a_t) \leftarrow Q(x_t, a_t) + \alpha\,[\,r_{t+1} + \gamma\,Q(x_{t+1}, a_{t+1}) - Q(x_t, a_t)\,]$;

(8) loop again from step (2) until the Q table converges.
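Steps (1) to (8) can be sketched end to end as below. Several things here are assumptions rather than details from the text: the bumblebee is replaced by a toy environment (`toy_env_step`) with a synthetic reward and a one-dimensional pattern, sequential K-means is implemented as a running-mean update of the nearest center, the loop runs for a fixed number of steps instead of testing convergence, and the values of `alpha`, `gamma` and `epsilon` are arbitrary.

```python
import math
import random

def nearest(centers, x):
    """Index of the cluster center closest to feature vector x."""
    return min(range(len(centers)), key=lambda k: math.dist(centers[k], x))

def sequential_kmeans_update(centers, counts, x):
    """Step (3): assign x to its nearest center, then move that center toward x."""
    k = nearest(centers, x)
    counts[k] += 1
    centers[k] = [c + (xi - c) / counts[k] for c, xi in zip(centers[k], x)]
    return k

def epsilon_greedy(q_row, epsilon, rng):
    """Step (4): random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)

def train(env_step, x0, centers, n_actions, steps,
          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [1] * len(centers)
    q = [[0.0] * n_actions for _ in centers]             # step (1)
    s = sequential_kmeans_update(centers, counts, x0)    # steps (2)-(3)
    a = epsilon_greedy(q[s], epsilon, rng)               # step (4)
    for _ in range(steps):                               # step (8), fixed budget
        x_next, r = env_step(a, rng)                     # step (5)
        s_next = sequential_kmeans_update(centers, counts, x_next)  # step (6)
        a_next = epsilon_greedy(q[s_next], epsilon, rng)
        # step (7): move Q(s, a) toward r + gamma * Q(s_next, a_next)
        q[s][a] += alpha * (r + gamma * q[s_next][a_next] - q[s][a])
        s, a = s_next, a_next
    return q

TARGET_ACTION = 3  # hypothetical: the only rewarded action in the toy environment

def toy_env_step(action, rng):
    """Stand-in for the bumblebee: odd actions lead near 1.0, even ones near 0.0."""
    x_next = [float(action % 2) + rng.uniform(-0.1, 0.1)]
    reward = 1.0 if action == TARGET_ACTION else -1.0
    return x_next, reward

q_table = train(toy_env_step, x0=[0.0], centers=[[0.0], [1.0]],
                n_actions=10, steps=5000)
best_in_state_1 = max(range(10), key=q_table[1].__getitem__)
```

Because the update uses the action actually selected for the next step, this is the on-policy form of the rule in step (7). In the real system `env_step` would display the stimulus, wait one sliding window, and return the extracted 6-dimensional pattern together with the reward.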

Through the interaction between the reinforcement learning algorithm and the environment, an optimal policy mapping environment states to actions is obtained. The parameters of the next visual stimulus can then be determined from the bumblebee's current abdomen-swinging pattern, forming a closed-loop control system. In actual control, the abdomen-swinging pattern approaches the target state quickly and accurately under the control of the reinforcement learning algorithm.

Beneficial effects of the present invention:

(1) The present invention stimulates the bumblebee through its visual pathway using optic flow, which eliminates the injury to the bumblebee caused by implantation.

(2) The reinforcement learning algorithm establishes the interaction mechanism between the bumblebee's abdomen-swinging behavior and the visual stimulator, so a large number of biological experiments are not needed to obtain the stimulus-response relationship.

(3) In the present invention, visual stimulation induces the bumblebee's motor behavior and the reinforcement learning algorithm serves as the control algorithm, which effectively forms a closed-loop control system that can accurately control the bumblebee's abdomen-swinging behavior.

Brief description of the drawings

Fig. 1 is the control flow chart of the closed-loop control system of the bumblebee robot based on virtual reality and reinforcement learning according to the invention.

Fig. 2 is a structural schematic diagram of the closed-loop control components of the bumblebee robot based on virtual reality and reinforcement learning according to the invention.

Fig. 3 is a schematic diagram of the CAN-bus-based LED display module.

Fig. 4 is a schematic diagram of the acquisition and analysis process of the bumblebee's behavior and its results.

Fig. 5 is a schematic diagram of the interaction of the reinforcement learning algorithm.

Fig. 6 shows the process of extracting the bumblebee's abdomen-swinging features based on a sliding window.

Fig. 7 shows the learning process of the reinforcement learning algorithm.

Fig. 8 shows the control curves of the SSC dimension for one bumblebee under reinforcement learning control and under random-policy control, respectively.

Fig. 9 shows the deviation curves between the bumblebee's SSC dimension and the target SSC dimension under reinforcement learning control and under random-policy control, respectively.

Fig. 10 shows the distribution of the Q table learned by the algorithm.

Specific embodiments

The invention is further described below with reference to an embodiment.

Embodiment 1

As shown in Fig. 1, Fig. 1(A) is the control loop of the closed-loop control system of the bumblebee robot based on virtual reality and reinforcement learning. In the control loop, the bumblebee's compound eyes first receive the external optic flow stimulus, which then produces an abdomen-swinging response. To capture the abdomen-swinging behavior, the invention tracks the tip of the bumblebee's abdomen with a particle filter algorithm, calculates its swing angle, and then extracts features over a sliding window. The extracted abdomen-swinging features serve as the environment state of the reinforcement learning algorithm, which selects the stimulation parameter for the next time step, thus forming a closed-loop learning process. Fig. 1(B) shows the physical components of the closed-loop control system, corresponding to Fig. 1(A): a CAN-bus-based LED display that provides the optic flow stimulus, a high-definition camera that records the bumblebee's abdomen-swinging behavior, and a computer that analyzes and processes the images and runs the reinforcement learning algorithm.

As shown in Fig. 2, an experimental platform for the bumblebee robot control method based on optic flow stimulation and reinforcement learning includes: vibration isolation tables 1 and 2; an LED display 3 placed on the vibration isolation table, which produces the specified visual stimuli; a pillar 4 for fixing the micromanipulator mount; a micromanipulator mount 5 for adjusting the position at which the bumblebee is fixed; a high-definition camera 6 for filming the abdomen-swinging behavior; and a universal bracket 7 for later extension of the system. In addition, the design includes a high-speed camera 8 and a bracket 9 for fixing the high-speed camera. Except for the PC, the whole system is placed on the vibration isolation table so that shaking of the tabletop does not affect the experiment.

As shown in Fig. 3, a CAN-bus-based LED display provides the optic flow input to the bumblebee. Each LED display has a 32 × 8 structure: 32 pins of the single-chip microcontroller control the LED columns, and the remaining 8 pins control the LED rows. The system consists of a PC control end, a master node on the CAN bus, and several slave nodes that each contain an LED driver module. The PC control end includes an input unit and a first serial communication service unit, and sends data packets containing image information and control commands to the master node of the CAN bus. The master node includes a second serial communication service unit, which receives the data sent by the client, and a CAN communication service unit, which encodes the received data and places it on the CAN bus. Each slave node includes a second CAN communication service unit, which receives data from the CAN bus and passes it to the corresponding LED driver module. The LED driver module uses an MSP430F149 microcontroller and drives the LEDs, thereby displaying images and controlling image motion.

The present invention uses a single bright stripe as the visual input to the bumblebee robot, thereby inducing the bumblebee's motor behavior.

Fig. 4 illustrates the acquisition and data analysis of the bumblebee's flight behavior in the present invention. The abdomen-swinging behavior is captured by a high-definition camera behind the bumblebee; the camera is an ANC HD1080P. The bumblebee's abdomen-swinging frequency is at most about 7 Hz, so the camera frame rate is set to 30 frames/second. In the figure, $(x_0, y_0)$ is the point at which the bumblebee is fixed: the bumblebee is attached at its back to a thin bent iron rod with a bend angle of 45 degrees and a diameter of 1 mm. The operator selects the abdominal region to be tracked; denoting the center of this region by $(x, y)$, the abdomen swing angle is $\beta = \arctan[(x - x_0)/(y - y_0)]$.
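The angle formula can be evaluated as in the sketch below. It uses `math.atan2(x - x0, y - y0)`, which equals the arctangent of the ratio in the text whenever `y - y0` is positive and additionally avoids a division by zero; treating the tracked region as lying on the positive `y` side of the fixed point, and reporting the result in degrees, are assumptions. The function name `abdomen_angle` is hypothetical.

```python
import math

def abdomen_angle(x, y, x0, y0):
    """Swing angle in degrees of the tracked region center (x, y)
    about the fixed point (x0, y0)."""
    # Equals arctan((x - x0) / (y - y0)) when y - y0 > 0.
    return math.degrees(math.atan2(x - x0, y - y0))

# Pixel coordinates made up for illustration.
beta = abdomen_angle(x=110.0, y=210.0, x0=100.0, y0=200.0)
```

Applying this to the region center returned by the tracker in every frame gives the deflection-angle time series used by the sliding window.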

Fig. 5 illustrates the interaction of the reinforcement learning algorithm in the present invention. Generally, reinforcement learning maintains an action-value function $Q(x_t, a_t)$ that maps environment states to actions; it represents the accumulated reward obtainable by taking action $a_t$ in state $x_t$. Once this value function is available, the action $a_t$ with the largest value in the current state $x_t$ is normally selected. The value function is updated through the continual interaction between reinforcement learning and the environment.

At time $t$, the bumblebee's abdomen-swinging pattern is $x_t$, and the state is discretized with the sequential K-means algorithm. An action $a_t$ is then selected, that is, the current optic flow stimulation parameter is adjusted. After the bumblebee's compound eyes receive the optic flow stimulus, a corresponding abdomen-swinging behavior is produced; its behavioral parameter is the deflection angle obtained by the particle filter algorithm. As shown in Fig. 6, a sliding window of 200 ms is used here: all deflection angles within the window are taken, and six features are computed (maximum, minimum, mean absolute value, number of peaks and troughs, number of points above the mean, and variance) as the bumblebee's abdomen-swinging pattern. This 6-dimensional feature is the state $x_{t+1}$ shown in the figure. Meanwhile, the reward function yields the immediate reward $r_{t+1}$. The update formula of the reinforcement learning algorithm is:

$Q(x_t, a_t) \leftarrow Q(x_t, a_t) + \alpha\,[\,r_{t+1} + \gamma\,Q(x_{t+1}, a_{t+1}) - Q(x_t, a_t)\,]$.

Here $\alpha$ is the learning rate of the algorithm and $\gamma$ is the discount factor. The specific algorithm flow is shown in Fig. 7.
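As a worked illustration of one update, assume the values $\alpha = 0.1$, $\gamma = 0.9$, $Q(x_t, a_t) = 0.5$, $r_{t+1} = 0.2$ and $Q(x_{t+1}, a_{t+1}) = 1.0$; none of these numbers come from the text. Substituting them into the update formula gives:

$$
Q(x_t, a_t) \leftarrow 0.5 + 0.1\,[\,0.2 + 0.9 \times 1.0 - 0.5\,] = 0.5 + 0.1 \times 0.6 = 0.56
$$

The entry moves one tenth of the way from its old value 0.5 toward the target $r_{t+1} + \gamma\,Q(x_{t+1}, a_{t+1}) = 1.1$.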

The reward function is:

$R = 0.2$, if ($\mathrm{dist}_1 > \mathrm{dist}_2$ and $k_t \neq k_g$);

$R = +1$, if ($\mathrm{dist}_1 > \mathrm{dist}_2$ and $k_t = k_g$);

$R = -1$, otherwise.

With a reinforcement learning algorithm, the correspondence between optic flow stimulation parameters and abdomen-swinging patterns need not be known in advance. Through continual interactive learning, the algorithm learns the mapping from the environment state (the bumblebee's abdomen-swinging pattern) to the action (the optic flow stimulation parameter) and uses it as the subsequent closed-loop real-time control policy, which avoids the process of running many manual experiments and analyzing their data.

To verify the effectiveness of the control mechanism based on reinforcement learning, we also compared it with an ordinary random algorithm. Figs. 8-10 show the data we collected from the bumblebee and the results of the control experiments. Figs. 8 and 9 are the bumblebee's control curves, which show how the SSC dimension of the current state converges to the SSC dimension of the target state $x_g$ (SSC is the number of peaks and troughs within the sliding window; it reflects the frequency of the abdominal motion and is the most important feature describing the bumblebee's abdomen swinging).

The line with triangles is the target SSC value, the line with circles is the curve under reinforcement learning control, and the line with × marks is the curve under random-algorithm control. It can be seen that:

1. Under reinforcement learning control, the SSC moves closer to the target state at every step; the deviation between the state at each step and the target state (corresponding to the Euclidean distance in Fig. 10) decreases monotonically, and the final deviation tends to 0. In contrast, the control curve of the random algorithm is random throughout the control process, as is its deviation, and shows no clear control trend.

2. Under reinforcement learning control, the bumblebee reaches the target value within at most 8 steps and then remains stable, whereas the random algorithm does not drive the process toward the target value, and its states near the target value show no regularity.

3. The curve under reinforcement learning control is smoother, whereas the curve under random-algorithm control fluctuates strongly.

The above results and analysis show that, with a reinforcement learning algorithm, optic flow stimulation can control the bumblebee's abdomen swinging well, which lays a good foundation for subsequent optic-flow-based bumblebee robots.

Claims

1. A bumblebee robot abdomen swinging control method based on virtual reality and reinforcement learning, comprising:

(1) building a Q table with a reinforcement learning algorithm according to a set target pattern of bumblebee abdomen swinging, the Q table recording the accumulated discounted reward obtained by each action under each class of abdomen-swinging pattern;

(2) monitoring the bumblebee's abdomen-swinging pattern at the current time, looking it up in the Q table, and issuing actions in real time according to the maximum-reward principle to control the bumblebee's abdomen swinging;

2. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the abdomen-swinging pattern is characterized by a 6-dimensional vector composed of the maximum, the minimum, the mean absolute value, the number of peaks and troughs, the number of points above the mean, and the variance of the bumblebee's abdomen deflection angle within a single sliding window.

3. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the width of the sliding window is 200~400 ms.

4. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the action is produced by a virtual reality system composed of several LED display screens, and the LED display screens are controlled by an LED control system based on a CAN bus.

5. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the Q table contains 5~20 actions.

6. The bumblebee robot abdomen swinging control method according to claim 1, characterized in that the Q table is built as follows:

(2) obtain the bumblebee's abdomen-swinging pattern $x_t$ at time $t$;

(3) cluster and discretize the state with sequential K-means, and adjust the cluster centers of the abdomen-swinging patterns;

(4) select an action $a_t$ according to the epsilon-greedy policy;

(6) repeat steps (3) and (4) to select action $a_{t+1}$;

(7) update the Q table according to the following formula:

$Q(x_t, a_t) \leftarrow Q(x_t, a_t) + \alpha\,[\,r_{t+1} + \gamma\,Q(x_{t+1}, a_{t+1}) - Q(x_t, a_t)\,]$;