CN116151359A - Layered training method for a hexapod robot driver decision model based on a deep neural network

Info

Publication number: CN116151359A
Application number: CN202211507379.6A
Authority: CN
Inventors: 尤波; 陈潇磊; 李佳钰; 董正
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2023-05-23
Anticipated expiration: 2042-11-29
Also published as: CN116151359B

Abstract

The invention belongs to the technical field of legged robot driving operation. It discloses a layered training method for a hexapod robot driver decision model based on a deep neural network, which addresses the problem of quantitatively modeling the decision experience of a hexapod robot driver. The method adopts a deep neural network structure with clearly separated functional layers that is convenient to debug step by step, trains the network parameters layer by layer using gradient descent, and obtains the hexapod robot driver decision neural network model through layered, step-by-step backward optimization. The method can effectively improve both the convergence speed of a conventional neural network when training high-dimensional, nonlinear, multi-input/multi-output decision models and the interpretability of the model output, and the resulting driver decision model can greatly improve the autonomous decision-making level of the hexapod robot.

Description

Layered training method for a hexapod robot driver decision model based on a deep neural network

Technical Field

The invention belongs to the technical field of legged robot driving operation, and in particular relates to a deep neural network training method for reproducing and quantifying the driving decision experience of a hexapod robot driver.

Background

Compared with traditional wheeled, tracked, and other modes of locomotion, a legged mechanism makes discrete contact with the ground and can therefore adapt to terrain whose geometric and physical characteristics change abruptly; it represents the development trend for mobile mechanisms in complex environments. The hexapod robot offers good stability, high load capacity, and strong terrain adaptability, which makes it the best choice among legged mobile systems in complex environments. However, for tasks such as material transport and rescue and disaster relief in complex, changeable environments, the control process must still be carried out by the driver throughout. The driver has to make full use of the multi-dimensional mobility of the hexapod robot to ensure that it can traverse the terrain. The control process is extremely complex and tedious, which greatly increases the driver's workload and makes the driver prone to fatigue, leading to safety accidents. How to train the hexapod robot so that it possesses a degree of autonomous behavior decision-making intelligence has therefore become an important problem in this field.

The autonomous behavior decision-making of a large hexapod robot is a multivariable, strongly coupled, dynamic nonlinear mathematical problem with multiple solutions. Existing methods include rule-based decision-making and reinforcement-learning-based decision-making. The former is insufficiently flexible, while the latter has difficulty improving system performance at the logic level, above the level of parameter tuning, so such systems struggle to cope with complex and changing working conditions. How to quantitatively reproduce the driving decision experience of a hexapod robot driver and embed it in the decision layer of the robot system is therefore a difficult problem that must be solved to improve the autonomous decision-making capability of large hexapod robots.

Deep neural networks have certain advantages for modeling nonlinear dynamic systems. For complex multi-input/multi-output nonlinear decision problems, however, a neural network built with a conventional structure suffers a sharp increase in the number and dimension of its hidden layers, which greatly lengthens convergence time. Moreover, because such a network structure is tangled and hard to read, further iterative optimization of the network becomes more difficult. The invention therefore designs a driver decision neural network structure with clearly separated functional layers that is convenient to debug step by step, and provides a training method that obtains the driver decision model through layered, step-by-step optimization.

Disclosure of Invention

The invention aims to provide a layered training method for a hexapod robot driver decision model based on a deep neural network. The method addresses the problem of quantitatively modeling the driving decision experience of a hexapod robot driver, and the resulting model, once embedded in the decision layer of the robot system, can improve the autonomous decision-making capability of a large hexapod robot.

To solve the above problem, the invention adopts the following scheme: a layered training method for a hexapod robot driver decision model based on a deep neural network, implemented as follows:

Step one: generating a local terrain information matrix:

A local digital elevation map of the terrain facing the hexapod robot is obtained, and the local terrain is divided into unit cells, each the size of the enveloping square of a foot end of the hexapod robot. The center-point coordinates (X_i, Y_i) and the average height h_i of each unit cell are taken as a terrain environment information unit (X_i, Y_i, h_i), and all terrain environment information units together constitute the local terrain information matrix.

Step two: generating the training data set for each network layer of the decision model:

The terrain environment information units at the ground projection points of the robot's six foot ends and its centroid are defined as the terrain feature matrix of the robot's current position. Following the deep neural network structure divided into functional layers, while a driver makes driving decisions on a given training terrain, the driver's operation instructions for each functional layer are collected and recorded, then mapped to and matched with the terrain feature matrix of the robot's current position to form training samples. The training samples from the entire drive across the training terrain form the training data set. Each training sample contains two kinds of data: terrain coordinate information and the driver's decision instructions at those terrain coordinates.

Step three: obtaining the hexapod robot driver decision model through layered training and step-by-step backward optimization:

The loss function is established as a cross entropy. Using the training data set obtained in step two, the neural network parameters (weights and biases) are optimized backward, layer by layer, by gradient descent so that the loss functions of the three judgment layers and the three instruction layers are minimized, and the hexapod robot driver decision model is obtained through layered, step-by-step optimization.

Further, in the deep neural network structure, the hidden layer is divided, according to the driving decision characteristics of the hexapod robot, into two independent functional layers: a judgment layer and an instruction layer. The judgment layer comprises three sub-functional layers connected in parallel: a direction judgment layer, a distance judgment layer, and a speed judgment layer. The instruction layer likewise comprises three sub-functional layers connected in parallel: a gait instruction layer, a stride/height instruction layer, and a body pose instruction layer. The judgment layer and the instruction layer are connected in series. Environment information flows from the input layer into the judgment layer in the form of the local terrain information matrix, continues into the instruction layer, and finally a decision instruction is output from the output layer.
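The series/parallel arrangement described above can be sketched as a forward pass. This is only an illustrative sketch: the class name `DriverDecisionNet`, the single dense `tanh` sublayers, the output sizes, and the random initialization are all assumptions, since the text fixes only the layer topology.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected sublayer with tanh activation (activation is assumed)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init(n_out, n_in, rng):
    """Small random weights and zero biases for one sublayer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class DriverDecisionNet:
    JUDGMENT = ("direction", "distance", "speed")
    INSTRUCTION = ("gait", "stride", "body_pose")

    def __init__(self, n_inputs, n_judge=1, n_instr=3, seed=0):
        rng = random.Random(seed)
        # Three judgment sublayers in parallel: each sees the full terrain input.
        self.judge = {name: init(n_judge, n_inputs, rng) for name in self.JUDGMENT}
        # Three instruction sublayers in parallel: each sees all judgment outputs.
        n_mid = n_judge * len(self.JUDGMENT)
        self.instr = {name: init(n_instr, n_mid, rng) for name in self.INSTRUCTION}

    def forward(self, terrain):
        x = [v for row in terrain for v in row]  # flatten the terrain matrix
        judged = {name: dense(x, *self.judge[name]) for name in self.JUDGMENT}
        # Series connection: judgment outputs feed the instruction layer.
        mid = [v for name in self.JUDGMENT for v in judged[name]]
        commands = {name: dense(mid, *self.instr[name]) for name in self.INSTRUCTION}
        return judged, commands
```

Returning the intermediate judgment outputs alongside the commands keeps each functional layer inspectable, which is the property the layered structure is meant to provide.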

The beneficial effects of the invention are as follows:

First, the layered training method for a hexapod robot driver decision model based on a deep neural network achieves effective reproduction and quantification of driver decision experience. Second, the neural network structure designed by the invention conforms to the decision logic of a driver, and its output is interpretable and traceable, so the safety of the driver and the robot can be effectively ensured. Finally, the invention adopts a layered training method with backward optimization, which can effectively improve the convergence speed and accuracy of the neural network model and thus the autonomous decision-making capability of the hexapod robot.

Drawings

Fig. 1 is a diagram of a neural network.

Fig. 2 is a flow chart of neural network layered training.

Detailed Description

Embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples.

One embodiment of the invention is a layered training method for a hexapod robot driver decision model based on a deep neural network, comprising the following steps:

Step 1: Obtain the terrain feature matrix of the robot's current position:

Step 1.1: Acquire a digital elevation map of the local environment around the robot. The acquisition range is defined as follows: if the diameter of the robot body is D, a digital elevation map must be obtained for the terrain within a 3D × 3D rectangular region centered on the body centroid, where 3D denotes three times the body diameter. The digital elevation map must, within the limits of its resolution, provide the three-dimensional coordinates in the world coordinate system of any designated ground point, and its resolution must be no coarser than half the foot-end radius.
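The acquisition range and resolution requirement of step 1.1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the names `Window`, `acquisition_window`, and `resolution_ok` are hypothetical, and the resolution requirement is read as "grid spacing at most half the foot-end radius", which is an interpretation of the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    x_min: float
    x_max: float
    y_min: float
    y_max: float

def acquisition_window(cx, cy, body_diameter):
    """Region of side 3 * body_diameter centered on the body centroid (cx, cy)."""
    half_side = 1.5 * body_diameter
    return Window(cx - half_side, cx + half_side, cy - half_side, cy + half_side)

def resolution_ok(dem_spacing, foot_radius):
    """True if the DEM grid spacing is no coarser than half the foot-end radius."""
    return dem_spacing <= 0.5 * foot_radius
```

For a body diameter of 2 m, for instance, the window spans 3 m on each side of the centroid along both axes.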

Step 1.2: Divide the digital elevation map of the robot's surrounding local environment obtained in step 1.1 into unit cells, each the size of the enveloping square of a foot end of the robot. The center-point coordinates (X_i, Y_i) and the average height h_i of each unit cell are taken as a terrain environment information unit (X_i, Y_i, h_i), and all terrain environment information units together constitute the local terrain information matrix.
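The gridding in step 1.2 can be sketched as block averaging over the elevation samples. This is a minimal sketch under stated assumptions: the elevation map is a rectangular list of height rows, each unit cell spans a whole number of samples (`samples_per_cell`), and `(x0, y0)` is the corner of the first sample; none of these names come from the source.

```python
def local_terrain_matrix(dem, spacing, samples_per_cell, x0=0.0, y0=0.0):
    """Return a list of (X_i, Y_i, h_i) units, one per unit cell.

    dem[r][c] is the height of the sample covering
    [x0 + c*spacing, x0 + (c+1)*spacing] x [y0 + r*spacing, y0 + (r+1)*spacing].
    """
    rows, cols = len(dem), len(dem[0])
    k = samples_per_cell
    units = []
    for r in range(0, rows - k + 1, k):
        for c in range(0, cols - k + 1, k):
            block = [dem[i][j] for i in range(r, r + k) for j in range(c, c + k)]
            x_center = x0 + (c + k / 2) * spacing
            y_center = y0 + (r + k / 2) * spacing
            units.append((x_center, y_center, sum(block) / len(block)))
    return units
```

Samples that do not fill a complete cell at the right or bottom edge are dropped in this sketch; padding or cropping the map beforehand is an equally reasonable choice.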

Step 1.3: The terrain feature matrix of the robot's current position is defined as follows: the first row of the matrix is the terrain environment information unit of the unit cell containing the ground projection of the body centroid, and the second to seventh rows are the terrain environment information units of the unit cells containing the center points of the six foot ends.
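Assembling this matrix amounts to looking up one unit cell for the centroid and one for each of the six foot ends, giving seven rows of (X, Y, h). The sketch below assumes square cells of side `cell_size` and a list of units as (X_i, Y_i, h_i) tuples; the function names are hypothetical.

```python
def containing_unit(units, cell_size, x, y):
    """Return the (X_i, Y_i, h_i) unit whose cell contains the point (x, y)."""
    half = cell_size / 2
    for ux, uy, h in units:
        if abs(x - ux) <= half and abs(y - uy) <= half:
            return (ux, uy, h)
    raise ValueError(f"point ({x}, {y}) lies outside the local terrain matrix")

def terrain_feature_matrix(units, cell_size, centroid_xy, foot_xys):
    """Row 1: cell under the body centroid; rows 2-7: cells under the six foot ends."""
    if len(foot_xys) != 6:
        raise ValueError("a hexapod has exactly six foot ends")
    points = [centroid_xy, *foot_xys]
    return [list(containing_unit(units, cell_size, x, y)) for x, y in points]
```

A point lying exactly on a cell boundary is assigned to the first matching cell in list order, which is an arbitrary tie-breaking choice.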

Step 2: Acquire the decision instruction matrix issued by the driver when the robot is at the current position. The decision instruction matrix is defined as follows:

The elements of the first row are the driver's distance, direction, and speed judgment instructions. The elements of the second row are the driver's selection instructions for the two gait, three gait, and six gait. The elements of the third row are the driver's instructions to increase the single-step length, decrease the single-step length, and keep the single-step length unchanged. The elements of the fourth row are the driver's instructions to increase the single-step width, decrease the single-step width, and keep the single-step width unchanged. The elements of the fifth row are the driver's instructions to increase the body roll angle, decrease the body roll angle, and keep the body roll angle unchanged. The elements of the sixth row are the driver's instructions to increase the body pitch angle, decrease the body pitch angle, and keep the body pitch angle unchanged. The elements of the seventh row are the driver's instructions to increase the body yaw angle, decrease the body yaw angle, and keep the body yaw angle unchanged.
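One way to hold such a matrix in code is a 7 × 3 list whose first row carries the three judgment values and whose remaining rows mark the chosen option. The one-hot encoding of rows two to seven, the row keys, and the function name are assumptions made for illustration; the text does not state how the entries are encoded.

```python
# Row labels and their three options, in the row order described in the text.
ROW_OPTIONS = {
    "gait": ("two", "three", "six"),
    "step_length": ("increase", "decrease", "keep"),
    "step_width": ("increase", "decrease", "keep"),
    "roll": ("increase", "decrease", "keep"),
    "pitch": ("increase", "decrease", "keep"),
    "yaw": ("increase", "decrease", "keep"),
}

def instruction_matrix(judgment, choices):
    """Build a 7 x 3 decision instruction matrix.

    judgment: (distance, direction, speed) values for the first row.
    choices:  mapping from row label to the option the driver selected.
    """
    matrix = [[float(v) for v in judgment]]
    for row, options in ROW_OPTIONS.items():
        if choices[row] not in options:
            raise ValueError(f"unknown option {choices[row]!r} for row {row!r}")
        # One-hot encoding (assumed): 1.0 marks the selected option.
        matrix.append([1.0 if opt == choices[row] else 0.0 for opt in options])
    return matrix
```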

Step 3: Construct the training sample set of the driver decision neural network. Specifically:

Each training sample pairs a terrain feature matrix with the decision instruction matrix issued at that position. While the driver makes driving decisions on the training terrain, the system background records, in this sample form, the instruction matrix and the terrain feature matrix at the moment each instruction is issued. Sampling at the step frequency, it generates the training sample set covering the driver's entire traversal of the terrain.
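Collecting samples at the step frequency can be sketched as one record per robot step. The container `TrainingSample`, the function `collect_samples`, and the assumption that both matrices have seven rows of three entries are illustrative choices, not details from the source.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Matrix = List[List[float]]

@dataclass
class TrainingSample:
    terrain_features: Matrix  # terrain feature matrix at the moment of the instruction
    instructions: Matrix      # decision instruction matrix issued by the driver

def collect_samples(step_log: Iterable[Tuple[Matrix, Matrix]]) -> List[TrainingSample]:
    """Turn a per-step log of (terrain_features, instructions) into a sample set."""
    samples = []
    for terrain, instructions in step_log:
        for name, matrix in (("terrain", terrain), ("instruction", instructions)):
            if len(matrix) != 7 or any(len(row) != 3 for row in matrix):
                raise ValueError(f"{name} matrix must be 7 x 3")
        samples.append(TrainingSample(terrain, instructions))
    return samples
```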

Step 4: Use the training sample set to train the judgment layers and instruction layers of the neural network shown in Fig. 1 one by one. Taking one layer as an example: when the layer receives the local terrain feature matrix of a training sample, let its output value be d and let the corresponding value in the training sample be d'. The loss function of the layer is then established as the cross entropy between d and d'.
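The text identifies the loss only as a cross entropy between the layer output d and the sample value d'. One common form, given here as an assumption with the symbol L and the component index k introduced for illustration, is:

```latex
L(d, d') = -\sum_{k} d'_{k} \,\ln d_{k}
```

Here d'_k is the k-th component of the value recorded in the training sample and d_k is the corresponding component of the layer output, with both assumed to lie in (0, 1].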

To minimize the loss functions of the three judgment layers and the three instruction layers, gradient descent is used to optimize the neural network parameters (weights and biases) backward, layer by layer. Starting from the weight vector of the current neural network parameters, each gradient descent step produces the weight vector for the next iteration, and the neural network parameters are optimized by repeating this update.
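A standard gradient descent update consistent with this description is sketched below. The symbols are assumptions: w_t for the current weight vector, w_{t+1} for the weight vector at the next iteration, η for the learning rate, and L for the layer's loss function.

```latex
w_{t+1} = w_{t} - \eta \,\frac{\partial L}{\partial w}\bigg|_{w = w_{t}}
```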

Training of the layer ends when the difference between d and d' is less than 1% of d'. The other layers are trained in the same way.
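The training loop for one layer, including the 1% stopping rule, can be sketched with a deliberately small stand-in: a single sigmoid unit trained by gradient descent on a cross-entropy loss. The unit, the learning rate, the epoch limit, and the requirement that targets lie strictly between 0 and 1 are all assumptions for illustration, not the network of the invention.

```python
import math

def predict(w, b, x):
    """Output d of a single sigmoid unit (a stand-in for one sublayer)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(d, target):
    """Binary cross entropy between output d and sample value target, both in (0, 1)."""
    return -(target * math.log(d) + (1.0 - target) * math.log(1.0 - d))

def train_layer(samples, n_inputs, lr=0.5, tol=0.01, max_epochs=10000):
    """Gradient descent until |d - d'| < tol * |d'| holds for every sample."""
    w, b = [0.0] * n_inputs, 0.0
    for epoch in range(1, max_epochs + 1):
        for x, target in samples:
            # For sigmoid + cross entropy, dL/dz simplifies to (d - d').
            err = predict(w, b, x) - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
        # Stopping rule from the text: difference below 1% of the sample value.
        if all(abs(predict(w, b, x) - t) < tol * abs(t) for x, t in samples):
            return w, b, epoch
    raise RuntimeError("layer did not reach the 1% tolerance")
```

Checking the tolerance in a separate pass after each epoch guarantees that the returned parameters satisfy the stopping rule on every sample.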

Step 5: The sub-functional layers are trained in logical order, as shown in Fig. 2. After each sub-functional layer of the judgment layer receives the local terrain feature matrix data, its output is compared with the driver's judgment recorded in the training sample, and the neural network parameters of the judgment layer are trained step by step using the optimization method above. Once this training is complete, the judgment layer's output is used to train the instruction layer: after each sub-functional layer of the instruction layer receives the judgment layer data, its output is compared with the driver's decision instruction recorded in the sample, and the instruction layer parameters are optimized using the same training method. During this process, if the output of the instruction layer does not satisfy the condition that its difference from the sample value is less than 1% of the sample value, the error must be traced back layer by layer. This exploits the interpretability of the neural network: following the driver's decision logic, the judgment layer data are analyzed first to determine whether they are at fault, and the specific sub-functional layer responsible is then located. Through this layer-by-layer analysis and targeted optimization, the desired instruction output is finally obtained.
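The trace-back logic of step 5 can be sketched as a small diagnostic that checks the instruction sublayers against the 1% tolerance and, on failure, inspects the judgment sublayers first. Representing each sublayer's output as a single number, and the names `within_tolerance` and `diagnose`, are simplifying assumptions.

```python
JUDGMENT = ("direction", "distance", "speed")
INSTRUCTION = ("gait", "stride", "body_pose")

def within_tolerance(output, target, tol=0.01):
    """Stopping rule from the text: difference below 1% of the sample value."""
    return abs(output - target) < tol * abs(target)

def diagnose(outputs, targets, tol=0.01):
    """Return the sublayers to retrain, checking the judgment layer first.

    outputs / targets map sublayer names to scalar values (an assumption).
    An empty list means every instruction sublayer meets the tolerance.
    """
    bad_instruction = [n for n in INSTRUCTION
                       if not within_tolerance(outputs[n], targets[n], tol)]
    if not bad_instruction:
        return []
    # Trace back: faulty judgment data are the preferred explanation.
    bad_judgment = [n for n in JUDGMENT
                    if not within_tolerance(outputs[n], targets[n], tol)]
    return bad_judgment if bad_judgment else bad_instruction
```

Returning only the judgment sublayers when any of them is at fault reflects the stated priority: instruction errors caused by bad upstream data should disappear once the upstream sublayer is retrained.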

Claims

1. A layered training method for a hexapod robot driver decision model based on a deep neural network, characterized in that the method comprises the following steps:

Step 1: generating a local terrain information matrix: converting the local terrain facing the hexapod robot into a digital elevation map, dividing the map into unit cells each the size of the enveloping square of a foot end of the hexapod robot, taking the center-point coordinates and the average height of each unit cell as an environment information unit, and forming the local terrain information matrix from all environment information units;

Step 2: generating the training data set for each network layer of the decision model: defining the environment information units at the ground projection points of the robot's six foot ends and its centroid as the terrain feature matrix of the robot's current position; following the deep neural network structure divided into functional layers, while a driver makes driving decisions on a given training terrain, collecting and recording the driver's operation instructions for each functional layer, and mapping and matching them with the terrain feature matrix of the robot's current position to form training samples; the training samples from the entire drive across the training terrain form the training data set, and each training sample contains two kinds of data, namely terrain coordinate information and the driver's decision instructions at those terrain coordinates;

Step 3: obtaining the hexapod robot driver decision model through layered training and step-by-step backward optimization: establishing the loss function as a cross entropy, and, using the training data set obtained in step 2, optimizing the neural network parameters (weights and biases) backward, layer by layer, by gradient descent so that the loss functions of the three judgment layers and the three instruction layers are minimized, thereby obtaining the hexapod robot driver decision model through layered, step-by-step optimization.

2. The deep neural network structure of claim 1, wherein the hidden layer is divided, according to the driving decision characteristics of the hexapod robot, into two independent functional layers, namely a judgment layer and an instruction layer; the judgment layer comprises three sub-functional layers connected in parallel, namely a direction judgment layer, a distance judgment layer, and a speed judgment layer; the instruction layer comprises three sub-functional layers connected in parallel, namely a gait instruction layer, a stride/height instruction layer, and a body pose instruction layer; the judgment layer and the instruction layer are connected in series; and environment information flows from the input layer into the judgment layer in the form of the local terrain information matrix, continues into the instruction layer, and finally a decision instruction is output from the output layer.