CN113134187B

CN113134187B - Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning

Info

Publication number: CN113134187B
Application number: CN202110419574.2A
Authority: CN
Inventors: 陈刚; 刘智
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2021-04-19
Filing date: 2021-04-19
Publication date: 2022-04-29
Anticipated expiration: 2041-04-19
Also published as: CN113134187A

Abstract

The invention relates to a multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning, and belongs to the field of robots. The system comprises a hardware layer, an interaction layer, a sensing layer and a control layer; and the hardware layer adopts a DSP as a controller, and sends data acquired by the odometer and the gyroscope into the DSP for processing, and calculates the position of the robot in the routing inspection map in real time. Sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire-fighting inspection robot adopts crawler-type drive; when the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion track of a target point to which the mechanical arm moves, discretizes the planned motion track and sends the discretized motion track to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point.

Description

Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning

Technical Field

The invention belongs to the field of robots, and relates to a multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning.

Background

The main structure of the current common fire-fighting inspection robot is as follows: wheel type driving is adopted in the aspect of driving; installing flame detectors and temperature sensors around the robot to facilitate the detection of fire; a camera is arranged in front of the robot so as to transmit the inspection picture to the control room through a wireless module; a fire-fighting nozzle with a fixed but rotatable chassis is also arranged above the robot and is used for externally connecting a water pipe or a small water pump to extinguish a fire point; in the aspect of robot control, along with the development of multi-machine cooperation thought and theory, in order to complete the inspection of a large area, meanwhile, in order to improve the inspection efficiency and reduce the inspection difficulty, a plurality of fire inspection robots are usually matched with each other to complete the operation, a centralized control mode is adopted on the cooperative control of the plurality of fire inspection robots, the inspection task allocation and the work scheduling of all robots are completed through a main control program, the specific inspection implementation mode is that a map constructed by using a laser radar and a planned inspection route are respectively led into the interior of each robot after area division, each robot automatically inspects a key area marked on the map according to the acquired planned route after being started, in addition, when some specific fire extinguishing or inspection operation needs to be finished remotely, the fire fighter can remotely operate the fire fighting or inspection operation through the remote controller.

However, the above system has many disadvantages, that is, the wheel-type driving makes the robot have poor passing performance in dealing with stairs and rugged road, and the flexibility of steering and rotating is not high enough; the accuracy and timeliness of flame detection by using the flame detector and the temperature sensor cannot be well guaranteed, and the range of flame detection is small; secondly, after the flame is detected, only the alarm function can be realized, and the position of a fire point and the image of the fire situation obtained by the camera are transmitted to a fire control room, a few fire inspection robots can also be matched with a fire nozzle carried by the robots to extinguish the fire point under the remote control of fire fighters, but the flexibility and the initiative in the fire situation responding aspect are lacked on the whole; finally, in the aspect of multi-robot cooperative control, each individual robot does not have the ability of selecting actions and coordinating with each other in a centralized control mode, so that the inspection efficiency, robustness and expandability of the whole system are poor, and the optimal time and energy of each robot in the inspection process cannot be guaranteed, so that the integral cruising ability and the disturbance resistance to the outside are reduced, and the autonomous control aspect and the intelligent degree are improved.

Disclosure of Invention

In view of this, the present invention provides a cooperative robot system for multiple fire inspection patrolling based on integral reinforcement learning.

In order to achieve the purpose, the invention provides the following technical scheme:

the multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning comprises a hardware layer, an interaction layer, a sensing layer and a control layer;

and the hardware layer adopts a DSP as a controller, and sends data acquired by the odometer and the gyroscope into the DSP for processing, and calculates the position of the robot in the routing inspection map in real time. Sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire-fighting inspection robot adopts crawler-type drive; when the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion track of a target point to which the mechanical arm moves, discretizes the planned motion track and sends the discretized motion track to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point.

1. Track drive system

The track is in two sections, each section being driven by a separate servo motor. The front-section crawler belt is used for lifting a chassis of the robot to smoothly pass through when encountering a higher obstacle, and the height of the robot is adjusted by adjusting the front-section crawler belt, so that a larger operation radius is provided for the mechanical arm; the second half section of the crawler belt mainly plays a role in driving the robot, is coaxially driven by a servo motor, and can be used for decelerating and braking the crawler belt on one side when the robot turns. The rated voltage of the servo motor is 24V, the output power is 100W, and the speed information of x and y axes issued by the upper PC is converted into the rotating speed of the servo motor after being coded by the DSP so as to realize steering and driving.

2. Mechanical arm servo control

A four-axis mechanical arm is arranged above the robot, a rotatable claw-shaped clamping device is arranged at the front section of the mechanical arm, and a fire extinguishing device is arranged on the clamping device; after the fire extinguishing device is additionally arranged, the fire extinguishing device is matched with the mechanical arm to realize accurate extinguishing of a fire point; the four-axis mechanical arm is driven by four servo motors to move each axis, and the motion information of each axis is calculated by a moveit | in an upper computer ros system! And generating after path planning.

Firstly, the calibration of the mechanical arm under the condition that eyes are out of hand is completed

And the coordinate transformation of the target point under the world coordinate system to the coordinate relative to the coordinate system of the mechanical arm is finished through a calibration mode of 'eyes are out of hands'. For the calibration mode of 'eyes are out of hand', a transformation matrix Tgc from a manipulator base coordinate system Tg to a camera coordinate system Tc is constant, a transformation matrix Tbe from a calibration plate coordinate system Tb to a manipulator tail end coordinate system Te is constant, and the relation of coordinate transformation satisfies the following formula:

for the ith time: tbc_i＝Tbe*Teg_i*Tgc (1-1)

Time i + 1: tbc_i+1＝Tbe*Teg_i+1*Tgc (1-2)

Finishing to obtain: (Teg)_i)^-1*(Teg)_i+1＝Tgc*(Tbc)_i ^-1*(Tbc)_i+1*Tgc^-1 (1-3)

Then A ═ Teg_i)^-1*(Teg)_i+1Is the motion relationship of the object relative to the robot arm tip coordinate system Te.

② utilize moveit! Completing the planning of the motion trail of the mechanical arm

Using Moveit! And combining the independent functional components for controlling the mechanical arm, and then providing the combined functional components for a user to use in an action and service communication mode in the ROS. In moveit! In the method, a URDF model is created according to the real size and the number of axes of the mechanical arm, and after the URDF model is input, the model is used for calculating the weight of the mechanical arm according to the weight of the mechanical arm! The setup authority generates a corresponding configuration file according to the setting of the setup authority, and the configuration file comprises a collision matrix of the mechanical arm so as to avoid collision between the axes caused by the planned track, connection information of each joint, a defined initial position and the like. And then adding a control plug-in controller of the mechanical arm, wherein the control controller comprises a defined follow _ join _ project node and names of all axes, finally writing a program to realize that the PC is connected with the mechanical arm in a socket communication mode, and observing a real-time motion track of the mechanical arm in rviz by subscribing to a join _ state topic. The method comprises the steps of firstly, completing flame identification and detection through a fast convolution neural network, obtaining three-dimensional coordinates of an ignition point relative to a robot through point cloud data of a depth camera after successful identification, then obtaining the position of the tail end of a mechanical arm to be reached through TF coordinate change, and then completing the solution of a track through an algorithm integrated in the interior. The solved track information is composed of a large number of discrete points, and the track information includes angular velocity and angular acceleration of each axis to reach the points. When the solved points are enough, a very smooth motion track is fitted, and after the information of the discrete points is published and subscribed through topics, the mechanical arm moves to the target point smoothly according to the planned points.

The sensing layer is used for a laser radar for establishing a picture, an infrared sensor for avoiding obstacles, a flame detector for detecting flame, a temperature sensor, a realsense D435i depth camera, an odometer and a gyroscope.

Infrared sensor obstacle avoidance

The infrared sensor is used for detecting obstacles encountered by the inspection robot in the inspection process in real time, when the obstacle exists at the front side, the infrared sensor detects Euclidean distances between the robot and the obstacles, and the specific coordinates of the obstacle are calculated by the distances and odometer and gyroscope data obtained from the DSP. And after the coordinates are obtained, immediately designing an obstacle avoidance path by a control algorithm, wherein the obstacle avoidance path is arc-shaped and is required to keep a minimum distance from the obstacle in the whole process, and after the obstacle avoidance is finished, immediately returning to the previously planned optimal routing inspection path.

② flame identification based on fast convolution neural network

The flame characteristics are extracted and detected by adopting a fast convolutional neural network Faster R-CNN, and the method comprises the following steps:

② 1: inputting a shot flame picture;

2.: sending the picture into a Convolutional Neural Network (CNN) for feature extraction;

② -3: after feature extraction, performing feature mapping, wherein the feature mapping jointly acts on a subsequent full-connection layer and a subsequent region to generate a network RPN;

② 3.1: the feature mapping is entered into RPN, firstly, a series of region candidate suggestion frames are passed through, and then the suggestion frames are fed into two 1 × 1 convolution layers respectively, wherein the first convolution layer is used for region classification, namely, the intersection ratio IOU value of the suggestion frames is generated by calculation to distinguish positive and negative samples; and the other is subjected to non-maximization inhibition to generate a more accurate target detection frame due to the boundary box regression judgment.

② 3.2: the feature map is entered into the ROI pooling layer for subsequent network computations.

② 4: and after the pooled feature maps pass through the full-connection layer, classifying the suggestion frames by utilizing softmax again, identifying whether the detection frames are objects or not, and performing boundary frame regression judgment on the suggestion frames again.

The specific method for generating the detection frame by the RPN is to slide on the input feature mapping through a sliding frame to generate 9 suggestion frames on each pixel point, and the size of the suggestion frames is 128²、256²、512²Aspect ratio of 1: 1. 1: 2. 2: 1, distinguishing positive and negative samples by using intersection ratio of the detection frames and IOU intersection ratio, wherein the IOU value of the positive sample is more than 0.7, the IOU value of the negative sample is less than 0.3, and the proportion of the positive sample to the negative sample is set as 1: 1. aiming at different characteristics of flame in an image, a method of guiding anchoring is adopted to accelerate the detection speed of RPN, and the improved sparse anchoring strategy is as follows:

wherein x and y are coordinates of pixel points, F (x and y) represents the generated flame color mask, if the coordinates are 1, the pixel points generate a suggestion box, if the coordinates are 0, the pixel points do not generate, and m is_R(x，y)、m_G(x，y)、m_B(x, y) are the RGB channel values, T, of the image pixels, respectively_RIs a threshold value set in advance.

In addition, the principle of correcting the detection frame by utilizing boundary regression judgment is that an original suggestion frame A is mapped G to obtain a regression suggestion frame F which is closer to the real condition. This mapping G is obtained by translation and scaling:

firstly, translation: f_x＝A_w·d_x(A)+A_x (2-2)

F_y＝A_h·d_y(A)+A_y (2-3)

Rescaling: f_w＝A_w·exp(d_w(A)) (2-4)

F_h＝A_h·exp(d_h(A)) (2-5)

Wherein x, y, w, h respectively denote the suggestion boxesCenter coordinate, width, height, d_x、d_y、d_w、d_hRespectively, a transformation relationship, and when the difference between the original frame a and the real frame F is not large, the transformation is regarded as linear.

The output is the probability of being identified as a flame.

The interaction layer is as follows: in the inspection process, pictures captured by the camera in real time need to be sent to the control room and the mobile terminal through a wireless network, corresponding APP is developed in a matched mode, the inspection robot is controlled correspondingly at the remote terminal, and accordingly the inspection of an area needing to be inspected again by an operator is achieved. After the flame is detected, an alarm signal is sent to the control room immediately, and corresponding fire extinguishing measures can be automatically taken immediately. After the fire extinguishing measures are implemented, if the fire condition can not be restrained, the automatic mode is switched to the remote control mode, a professional in a control room takes over the control of the inspection robot comprehensively, the operation of the crawler belt and the action of the mechanical arm are controlled manually to realize the accurate extinguishing of a fire point, and whether the operations of cutting off a power supply, closing a gas valve and transferring inflammable matters need to be carried out or not is judged according to the fire condition. Each inspection machine can be connected with the whole fire fighting system in a grid mode, if the fire situation is still large after measures are taken, a request for taking over a fire fighting network is sent to a control room, the control room agrees or does not respond within one minute, a local spraying pipe network in a building is opened, meanwhile, a comprehensive fire fighting alarm is sent, and all fire fighting channels and emergency lighting facilities are opened. And an emergency stop button is arranged at the top end of the robot. And after the fire is extinguished, marking the ignition point on the patrol inspection map as a key patrol inspection area.

The control layer is as follows:

the whole fire-fighting inspection area is provided with N robots for cooperative inspection, and the N robots are respectively positioned at initial positions (x)_i0，y_i0) To reach respective destinations (x)_iD，y_iD) And i belongs to {1, 2.. eta., N }, and the position L of the ith fire-fighting inspection robot at the moment t is set_i(t)＝[L_ix(t)，L_iy(t)]^TVelocity V_i(t)＝[V_ix(t)，V_iy(t)]^TController input U_i(t)＝[u_ix(t)，u_iy(t)]^TControl input and unknown environmental disturbance W_i(t)＝[W_ix(t)，W_iy(t)]^TIn order to avoid saturation of the actuator, the input is constrained, and | U (t) | is required to be less than or equal to λ, wherein λ is a normal number. Set the distance r between two inspection robots_ij(t)＝||L_i(t)-L_j(t) | |, a safety distance r is set for avoiding collision of two inspection robots_sAnd r is required to be satisfied at any time in the inspection process_ij(t)≥r_sAnd when N robots reach the inspection destination, ensuring r_ij(t)＞＞r_sIn this case, i ≠ j.

Then consider the second order linear dynamics model of the ith fire inspection robot as:

wherein the system matrix is A, the input matrix is B, the output matrix is C, the interference matrix is D,

is the state of the robot at the time t,

to input, y_i(t) is the system unique output.

The global dynamics model is written as:

wherein

Is the product of Kronecker, X (t) ═ x₁(t)，x₂(t)，...，x_n(t)]^T，Y(t)＝[y₁(t)，y₂(t)，...，y_n(t)]^T，I_NIs an N-order identity matrix, and is set to L (t) [ < L >_1t，L_2t，...，L_Nt]^T，L_D＝[L_1D，L_2D，...，L_ND]^T，U₀＝[U₁，U₂，...，U_N]^TThe positions at time t, the target point positions and the control inputs of the N robots are respectively.

In order to achieve optimal control of the N fire inspection robots under unknown disturbances with minimum time and energy in continuous time, continuous state and control input space and to avoid collisions in the whole process, the following cost functions are considered:

wherein ζ>And 0 is used for representing the specific gravity of time in the routing inspection process, and R is a positive definite matrix. In order to solve the path planning problem that the minimum arrival time T of the robot is unknown, a hyperbolic tangent function is introduced to rewrite a cost function into an infinite integral form so as to solve the problem, in addition, in order to avoid the saturation of an actuator, the input is also required to be constrained, so that the common U (T)^TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t)) is used to approximate the minimum energy cost and capture input constraints, and to avoid collisions between two robots introducing artificial potential field functions, the cost function is approximately rewritten as:

V(X(t)，U(t))＝∫_t ^∞ζtanh(L(t)-L_D)^T(L(t)-L_D)+φ(U(t))+Λ_R(t)^Tf_R(r_ij(t))dt (4-4)

where ζ is a normal number, tanh is a hyperbolic tangent function, which is a monotonically increasing odd function and continuously differentiable, and the cost function is an IRL-solvable form. Rewriting ζ to ζ tanh (L (t) -L)_D)^T(L(t)-L_D) When the current position L (t) of the robot is away from the target point L_DTime ζ tanh (L (t) -L)_D)^T(L(t)-L_D) Approximately ζ, ξ tanh (L: (L) (L))t)-L_D)^T(L(t)-L_D) The T integral of unknown time is converted to an infinite integral independent of the arrival time T to achieve an optimal solution to the value function.

Will U (t)^TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t) is used to approximate the minimum energy cost and capture the input constraints:

wherein input constraint is | U (t) | is less than or equal to λ, λ and σ are both normal numbers, and R ═ diag (R)₁，r₂...r_m)＞0。

Adding an artificial potential field function f to avoid collision of any pair of inspection robots_R(r_ij(t)) the two robots are caused to emit repulsive potential fields to avoid each other, and in order to make V (x (t), U (t)) after the potential field function is added bounded, a weight matrix Λ is designed_R(t) for canceling the non-zero tail. Will reject function f_R(r_ij(t)) defines the form of a gaussian function, and this gaussian function is always greater than 0:

where the larger s the greater the steepness of the rejection function and the larger σ the larger the rejection range. To capture the repulsive distance r_ij(t) solving for s and σ in the rejection function, provided with:

f_R(r_s)＝K₀；f_R(rs+Δ)＝K₁ (4-7)

wherein 0 < K₁＜K₀Less than 1; Δ is a positive increment, and is substituted to obtain:

by a weight matrix Λ_R(t)＝[Λ₁₂(t)，Λ₁₃(t)，...，Λ_N-1，N(t)]^TSo that the value function after introducing the artificial potential field function is bounded and the weight matrix depends on the distance to the target point.

Λ_R(t)＝βtanh(||L_i(t)-L_iD||²+||L_j(t)-L_jD||²) (4-9)

Lambda when robot principle targets points_R(t) ═ β, Λ when the robot reaches the target point_RAnd (t) is 0, beta is a collision coefficient, and the size of beta is determined by the importance of avoiding collision in the inspection process.

Solving the optimal control input by using the cost function in (4-4), and carrying out derivation on t on two sides of the formula (4-4) and writing the Bellman equation as follows:

V(x(t))，U(t))＝-ζtanh(L(t)-L_D)^T(L(t)-L_D)-φ(U(t))-Λ_R(t)^Tf_R(r_ij(t)) (4-10)

let F_ζ(t)＝ζtanh(L(t)-L_D)^T(L(t)-L_D) Defining the optimum function as:

V*(X(t)，U(t))＝min∫_t ^∞F_ζ(t)+φ(U(t))+Λ_R(t)^Tf_R(r_ij(t))dt (4-11)

defining the HJB equation according to equation (4-10) as:

wherein

Under stable conditions have

(4-12) deriving U on both sides of the formula:

obtaining the optimal control input u after term shifting^*Comprises the following steps:

substituting (4-14) into (4-5) to obtain:

wherein l is a column vector of all one, and substituting (4-14) into (4-15) to obtain:

wherein

Substituting (4-16) into (4-12) to obtain:

and solving the HJB equation by utilizing a strategy iterative algorithm based on the integral reinforcement learning, wherein the integral reinforcement learning uses signals in (T, T + T) for learning, and a specific dynamic model of the system is not required to be known.

Firstly, the value function is rewritten into the form of integral difference value, and the following Bellman equation is obtained:

in order to solve (4-18) online in real time, an operator-critic neural network algorithm is introduced to realize real-time updating in the strategy iteration process. The value function V (X) is first approximated by a critic neural network, since

Wherein the first term is quadratic form easy to obtain, only the second term is approximated, and

using neural network pair V₀(X) approximating:

wherein w_cIs the weight, psi, of the critical neural network_c(X) is a basis function, ε_c(X) is the approximation error;

differentiating X on two sides of (4-20) to obtain:

substituting (4-20) into (4-18) results in a new bellman equation:

wherein epsilon_e(t)＝ε_c(X(t+T))-ε_c(X (t)) is the error of the Bellman equation,. DELTA.. psi_c(X(t)＝ψ_c(X(t+T)-ψ_c(X(t)。

To determine w_cThe (4-20) is rewritten as:

wherein

Is a V₀Approximation of (X)The value of the one or more of the one,

for ideal approximation coefficients, then (4-22) is:

order to

For Bellman tracking errors and constructing an objective function by making ε_e(t) minimizing to adjust the weighting coefficients of the critic neural network:

the two sides of the formula (4-25) are aligned

Derivation, obtained by the chain rule:

wherein beta is_c>0 is the learning rate, and 0 is the learning rate,

is delta psi_cAn approximation of (d).

Will E_eSubstituting into (4-26) to obtain weight coefficient of criticc neural network

Should be subject to:

the obtained ideal weight coefficient is substituted into (4-14) to obtain an optimal control strategy, however, the optimal strategy obtained through the value function of critic approximation cannot ensure the stability of a closed-loop system, and an actor neural network is introduced into an actuator to ensure convergence to an optimal solution and ensure the stability of the system:

is the optimal approximation coefficient of the actor neural network,

is determined by the following lyapunov function:

when w is_aWhen the following formula is satisfied, the approximated strategy makes the system consistent and finally bounded, and the system is finally bounded by

To obtain U^*(t)。

Wherein K₁，K₂In order to design a good normal number,

based on the expressions (4-19), (4-27), (4-28) and (4-30), the critic algorithm and the actor algorithm are respectively utilized to realize synchronous updating of the value function and the strategy function, and an online integral reinforcement learning algorithm based on strategy iteration is designed to solve the HJB equation so as to solve the optimal control input.

The algorithm is as follows: online IRL algorithm based on strategy iteration

Initialization: given a feasible actuator input

Step 1: policy evaluation, given initialization

Solving by

Step 2: improvement of the strategy is that

Substituting the following formula for updating

Step 3: order to

Go back to Step1 until

Converging to a minimum value.

The invention has the beneficial effects that:

1. the invention adopts a distributed control mode in the multi-fire-fighting inspection cooperative robot system, so that the autonomy, flexibility, reliability and response speed of each robot under the system are improved.

2. The four-axis mechanical arm is designed at the top of each fire inspection robot, the mechanical arm is matched with a specially-made fire extinguisher, so that an ignition point can be automatically and accurately put out after a fire is discovered, the mechanical arm can be remotely and manually controlled by a fireman to complete the operations of turning off a power switch, turning off a gas valve, removing combustible materials and the like, and the initiative and the operability after the fire is discovered are obviously improved.

3. The invention provides an improved fast convolutional neural network based on visual recognition to finish the recognition and detection of flame by matching with the picture acquired by a depth camera realsense D435i for more accurately recognizing flame and reducing the false alarm rate, and simultaneously introduces a method of guiding anchoring to improve the RPN detection speed in the fast convolutional neural network.

4. The approximation function designed in the controller algorithm can convert the finite integral of the minimum arrival time T unknown in the optimal path planning problem into an infinite integral form so as to be convenient for solving, and introduces a non-quadratic performance function for approximating the minimum energy cost and capturing input constraint.

5. The invention introduces an artificial potential field function to avoid collision among robots in the inspection process of the multi-fire inspection cooperative robot system, and designs a special weight coefficient matrix to offset the non-zero tail.

6. The invention uses the integral reinforcement learning algorithm in the multi-robot control algorithm to solve the problem that the matrix of the inspection robot system is unknown, and uses the critic and actor neural network algorithm to synchronously iterate and solve the Bellman equation on line in real time to obtain the optimal strategy, thereby obviously improving the inspection efficiency and the robustness of the multi-fire-fighting inspection robot system.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.

Drawings

For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a bottom level diagram of hardware;

FIG. 2 is a schematic diagram of coordinate transformation;

FIG. 3 is a flow chart of motion trajectory generation;

FIG. 4 is a flow chart of obstacle avoidance for the fire inspection robot;

FIG. 5 is a fast convolutional neural network training process;

FIG. 6 is a fire inspection robot interaction structure;

FIG. 7 is an overall structure diagram of the fire inspection robot;

FIG. 8 is a schematic diagram of a multi-fire inspection cooperative robot system inspection;

FIG. 9 is a flowchart of the operation of the robotic arm to extinguish a fire;

fig. 10 is a flow chart of the work of the fire inspection robot.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.

Wherein the showings are for the purpose of illustrating the invention only and not for the purpose of limiting the same, and in which there is shown by way of illustration only and not in the drawings in which there is no intention to limit the invention thereto; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.

Aiming at each single fire inspection tour robot, the fire inspection tour robot is provided with a depth camera realsense D435i for finding the fire quickly and accurately on the basis of matching with a flame detector and a temperature sensor, the depth camera can realize the identification of the fire at a longer distance by extracting the characteristics of a scene, and the identification accuracy and the rapidity are improved compared with the sensors. Meanwhile, the depth camera transmits the inspection image to the main control room and the mobile terminal in real time, so that the inspection image can be observed by a controller conveniently, and control instructions sent by the control room and the mobile terminal can be received at any time. The inspection robot sends an alarm signal to the main control room immediately after finding a fire, but the alarm signal is far from enough, so that in order to improve the processing capacity of the inspection robot after finding the fire, a four-shaft mechanical arm is also assembled above the robot, and the front end of the mechanical arm is provided with a clamping jaw which is arranged to be beneficial to adding subsequent equipment; after the fire is discovered, under the remote control of fire fighters, the work of cutting off the power supply, removing a gas valve and combustible materials and the like can be completed through the mechanical arm under the necessary condition. In addition, a specially-made fire extinguishing device (such as a specially-made small fire extinguisher) can be arranged at the position of the clamping jaw of the mechanical arm to match with the mechanical arm to realize accurate extinguishing of an ignition point, so that the fire is prevented from spreading to the greatest extent, and greater economic loss is caused.

On the cooperative control of the multiple fire-fighting inspection robots, the multiple robots are required to complete the optimal online path planning of the minimum arrival time T unknown under the conditions of collision avoidance, constraint on input of an actuator and unknown external disturbance in the inspection process, in addition, the inspection efficiency, robustness and expandability of the whole system are ensured, and the robots cannot collide in the whole inspection process.

In order to meet the requirements, the software and hardware design scheme of the invention is as follows:

the novel multi-fire-fighting inspection cooperative robot system designed by the invention adopts the idea of layered design and respectively comprises a hardware layer, an interaction layer, a sensing layer and a control layer, wherein the first part to the third part introduce the specific software and hardware structure of each robot under the whole multi-fire-fighting inspection cooperative robot system, and the fourth part introduces the specific control algorithm realization of the multi-fire-fighting inspection cooperative robot system.

Hardware layer design of first part fire-fighting inspection robot

The hardware layer is used as a controller by the DSP, data collected by the odometer and the gyroscope are sent into the DSP to be processed, and the position of the robot in the routing inspection map can be calculated in real time. Sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire inspection robot adopts a crawler-type drive, and aims to improve the passing capacity (such as stairs) and the steering flexibility of a complex road section of the fire inspection robot. When the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion track of a target point to which the mechanical arm moves, discretizes the planned motion track and sends the discretized motion track to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point.

The bottom design plan of the hardware layer is shown in fig. 1.

1. Track drive system

In order to adapt to various inspection environments and improve the flexibility and the trafficability characteristic in the inspection process, the inspection robot adopts crawler-type driving. The track structure is designed into two sections, and each section is driven by a separate servo motor. The front-section crawler belt is mainly used for lifting a chassis of the robot to smoothly pass through when encountering a higher obstacle, and can also be used for adjusting the height of the robot by adjusting the front-section crawler belt so as to provide a larger operating radius for the mechanical arm; the second half section of the crawler belt mainly plays a role in driving the robot, is coaxially driven by a servo motor, and can be used for decelerating and braking the crawler belt on one side when the robot turns. The rated voltage of the servo motor is 24V, the output power is 100W, and the speed information of x and y axes issued by the upper PC is converted into the rotating speed of the servo motor after being coded by the DSP so as to realize steering and driving.

2. Mechanical arm servo control

In order to improve the processing capacity of the inspection robot when finding a fire, a four-axis mechanical arm is arranged above the robot. A rotatable claw-shaped clamping device is installed at the front section of the mechanical arm, and a specially-made small fire extinguishing device (such as a fire extinguisher, a small water pump and the like) can be installed on the clamping device according to specific requirements. After the fire extinguishing device is additionally arranged, the fire extinguishing device can be matched with the mechanical arm to realize accurate extinguishing of a fire point; the fire extinguishing device is not additionally arranged, and whether a mechanical arm is manually controlled by a fire fighter to cut off a local power supply, close a gas valve, remove surrounding inflammable matters, close a fire door and the like is determined according to the fire degree when a fire is found, so that the fire spread is prevented to the maximum extent, and the economic loss is reduced. The four-axis mechanical arm is driven by four servo motors to move each axis, and the motion information of each axis is calculated by a moveit | in an upper computer ros system! And generating after path planning.

The coordinate transformation of the target point under the world coordinate system To the coordinate relative To the coordinate system of the mechanical arm is completed by a calibration form of Eye-To-Hand (Eye-Hand). For the calibration mode of 'eyes are out of hand', a transformation matrix Tgc from a manipulator base coordinate system Tg to a camera coordinate system Tc is constant, a transformation matrix Tbe from a calibration plate coordinate system Tb to a manipulator tail end coordinate system Te is constant, and the relation of coordinate transformation satisfies the following formula:

for the ith time: tbc_i+1＝Tbe*Teg_i+1*Tgc (1-1)

Time i + 1: tbc_i+1＝Tbe*Teg_i+1*Tgc (1-2)

A schematic diagram of the coordinate transformation is shown in fig. 2.

The ros (robot Operating system) is an Operating system specially used for realizing the control of the robot system, and can be developed in the Linux environment, and is especially suitable for a complex and multi-node control system such as a robot due to its simple operation mode, powerful function and strong expandability. In the control of the mechanical arm, a special integrated tool is arranged in an ROS system for completing the motion trail planning of the mechanical arm, which is moveit! . Moveit! May be considered an "integrator" by which the individual functional components controlling the robotic arm may be combined and then made available to the user through action and service communication in the ROS. In moveit! In the method, a model (URDF model) which is in accordance with the real size and the number of axes of the mechanical arm is created, and after the model is input, the moveit! The setup authority generates a corresponding configuration file according to the setting of the setup authority, and the configuration file comprises a collision matrix of the mechanical arm so as to avoid collision between the axes caused by the planned track, connection information of each joint, a defined initial position and the like. And then adding a control plug-in (controller) of the mechanical arm, wherein the controller mainly comprises a follow _ join _ project node and names for setting various axes, and finally writing a program to realize that the PC is connected with the mechanical arm in a socket communication mode, and observing a real-time motion track of the mechanical arm in rviz by subscribing to a join _ state topic. The method comprises the steps of firstly completing flame identification and detection through a fast convolution neural network, obtaining a three-dimensional coordinate of an ignition point relative to a robot through point cloud data of a depth camera after successful identification, obtaining a position where the tail end of a mechanical arm needs to reach through TF coordinate change, and then immediately completing track solving through an internally integrated algorithm (usually adopting cubic spline interpolation). The solved track information is composed of a large number of discrete points whose information includes the angular velocity and angular acceleration of each axis to which the point is to be reached. When the solved points are enough, a smooth motion track can be fitted, and the mechanical arm can smoothly move to a target point according to the planned points after the information of the points is published and subscribed through topics. Moveit! A flow chart for generating a motion trajectory is shown in fig. 3.

Second part fire-fighting inspection robot sensing layer design

The sensing layer design of the fire-fighting inspection robot mainly comprises a laser radar for drawing construction, an infrared sensor for obstacle avoidance, a flame detector for detecting flame, a temperature sensor, a realsense D435i depth camera, an odometer, a gyroscope and the like.

Infrared sensor obstacle avoidance

The infrared sensor is used for detecting the obstacles encountered by the inspection robot in the inspection process in real time, when the obstacle exists at the front side, the infrared sensor can detect the Euclidean distance between the robot and the obstacle, and the specific coordinates of the obstacle can be calculated by the distance and the odometer and gyroscope data obtained from the DSP. After the coordinates are obtained, an obstacle avoidance path can be designed immediately through a control algorithm, the obstacle avoidance path is usually arc-shaped, a minimum distance between the obstacle avoidance path and an obstacle is required to be kept in the whole process, and after obstacle avoidance is finished, the previously planned optimal routing inspection path needs to be returned immediately. The obstacle avoidance flow chart is shown in fig. 4.

② flame identification based on fast convolution neural network

In the inspection process, the detection of the flame is particularly critical, and along with the rapid development of computer technology, the flame is detected more rapidly and accurately by using vision than a fixed flame detector. However, since there are many objects similar to the flame in color in the inspection scene and the shape and texture of the flame are various, it is a difficult task to detect the position of the flame in the image. The invention adopts the fast convolutional neural network (Faster R-CNN) to extract and detect the flame characteristics, not only can accurately identify the flame, but also can accurately calculate the position where the flame is generated, and can reduce the false alarm rate of flame detection to the greatest extent.

The training steps of the fast convolutional neural network are as follows:

secondly-1, inputting the shot flame picture;

secondly, sending the picture into a Convolutional Neural Network (CNN) for feature extraction;

secondly-3, after feature extraction, feature maps (featuremaps) are used, and the feature maps jointly act on a subsequent full connection layer and an RPN (region generation network);

secondly-3.1 feature mapping into RPN, firstly, through a series of region candidate suggestion boxes, namely anchors (anchors), feeding the suggestion boxes into two 1 × 1 convolution layers respectively, wherein the first convolution layer is used for region classification, namely, positive and negative samples are distinguished by calculating IOU (intersection ratio) values for generating the suggestion boxes; and the other is subjected to non-maximization inhibition to generate a more accurate target detection frame due to the boundary box regression judgment.

And the-3.2 characteristic mapping enters the ROI pooling layer for subsequent network calculation.

And 4, after the pooled feature maps pass through a full connection layer, classifying the suggestion boxes by utilizing softmax again, namely identifying whether the detection boxes are objects or not, and performing boundary box regression judgment on the suggestion boxes again in order to further improve the accuracy of the target detection boxes.

A schematic diagram of the training process is shown in fig. 5.

The above steps using the RPN to generate the detection box (anchors) is the biggest advantage of FasterR-CNN compared with the traditional detection algorithm. The specific method for generating the detection frame by the RPN is to slide the input feature mapping through a sliding frame to generate 9 suggestion frames on each pixel pointThe size of these suggestion boxes may be 128²、256²、512²Aspect ratio of 1: 1. 1: 2. 2: 1, and distinguishing positive and negative samples by using an intersection ratio (IOU) of the detection boxes, wherein the IOU value of the positive sample is more than 0.7, the IOU value of the positive sample is less than the negative sample of 0.3, and the proportion of the positive sample to the negative sample is set as 1: 1. however, the number of the suggested frames drawn by such a method is still large, so that the method of the invention can adopt a guided anchoring method to accelerate the detection speed of the RPN according to different characteristics of flames in the image, and the improved sparse anchoring strategy is as follows:

In addition, the principle of correcting the detection frame by using boundary regression judgment (bounding box regression) is to map G the original suggestion frame A to obtain a regression suggestion frame F closer to the real situation. This mapping G can be obtained by translation and scaling:

firstly, translation: f_x＝A_w.d_x(A)+A_x (2-2)

F_y＝A_h.d_y(A)+A_y (2-3)

Rescaling: f_w＝A_w.exp(d_w(A)) (2-4)

F_h＝A_h.exp*d_h(A)) (2-5)

Wherein x, y, w, h respectively represent the center coordinates of the proposed box, width, height, d_x、d_y、d_w、d_hFor the transformation relations we are looking for, respectively, when the original frame a and the real frame F are not very different, the transformation can be generally considered to be linear.

The output is the probability of being identified as a flame.

Interaction layer design of third part fire-fighting inspection robot

In the inspection process, pictures captured by the camera in real time need to be sent to the control room and the mobile terminal through a wireless network, corresponding APPs are developed in a matched mode, inspection pictures and alarm signals can be received at terminals such as a PC (personal computer), a web, a mobile phone and a pad at any time and any place, and the inspection robot can be controlled correspondingly at a remote terminal, so that the inspection of an area needing to be inspected again by an operator is realized. After the flame is detected, an alarm signal should be sent to the control room immediately and corresponding fire extinguishing measures should be automatically taken immediately. After the fire extinguishing measures are implemented, if the fire condition is still not restrained, the automatic mode can be immediately switched to the remote control mode, a professional in a control room comprehensively takes over the control of the inspection robot, the operation of the crawler and the action of the mechanical arm are manually controlled to accurately extinguish a fire point, and whether the operations of cutting off a power supply, closing a gas valve, transferring inflammable matters and the like are needed or not is judged according to the fire condition. In addition, each inspection machine can be connected with the whole fire fighting system in a grid mode, if the fire condition is still large after measures are taken, a request for taking over a fire fighting network can be sent to the control room, the control room agrees or the fire fighting control room does not respond within one minute, a local spraying pipe network in the building can be opened, meanwhile, a comprehensive fire fighting alarm is sent, all fire fighting channels and emergency lighting facilities are opened, and therefore property loss and casualties are reduced to the maximum degree, and precious time is won for rescue. Meanwhile, in order to avoid sudden failures of the inspection robot in the inspection process, an emergency stop button is installed at the top end of the robot, and surrounding personnel are prevented from being injured. After the fire is extinguished, the ignition point is marked as a key inspection area on the inspection map so as to facilitate later inspection. The schematic diagram of the interactive structure of the fire inspection robot is shown in fig. 6.

Fourth part multi-fire-fighting inspection cooperative robot system control algorithm

Because the common fire-fighting inspection task is cooperatively completed by a plurality of robots, and the optimal path planning under the minimum arrival time in the inspection process is required to be realized in the whole multi-robot control process, the inspection range can be comprehensively covered, and the endurance time of the multi-robot inspection system can be ensured; and the interference usually existing to the inspection environment in the inspection process is unknown. In addition, in order to avoid saturation of the actuator, the input of the actuator is generally required to be constrained; meanwhile, for the sake of safety, the robots cannot collide with each other in the whole inspection process. Aiming at the control requirements of the multi-fire-fighting inspection cooperative robot system, a minimum arrival time T, unknown disturbance to the outside, unknown model of a system part and constraint on input are required to be designed, the robots are required to avoid collision, and in addition, accurate external information is difficult to be adopted under the actual condition, so that offline solution is changed into online solution, and the optimal controller based on integral reinforcement learning and an AC neural network algorithm is designed.

The whole fire-fighting inspection area is provided with N robots for cooperative inspection, and the N robots are respectively positioned at initial positions (x)_i0，y_i0) To reach respective destinations (x)_iD，y_iD) And i belongs to {1, 2.. eta., N }, and the position L of the ith fire-fighting inspection robot at the moment t is set_i(t)＝[L_ix(t)，L_iy(t)]^TVelocity V_i(t)＝[V_ix(t)，V_iy(t)]^TController input U_i(t)＝[u_ix(t)，u_iy(t)]^TControl input and unknown environmental disturbance W_i(t)＝[W_ix(t)，W_iy(t)]^TMeanwhile, in order to avoid saturation of the actuator, input is constrained, and | U (t) | is required to be less than or equal to lambda, wherein lambda is a normal number. Set the distance r between two inspection robots_ij(t)＝||L_i(t)-L_j(t) | |, a safety distance r is set for avoiding collision of two inspection robots_sAnd r is required to be satisfied at any time in the inspection process_ij(t)≥r_sAnd we assume that r is guaranteed after N robots reach the inspection destination_ij(t)＞＞r_sIn this case, i ≠ j.

is the state of the robot at the time t,

to input, y_i(t) is the system unique output.

The global dynamics model is written as:

wherein

In order for the N fire inspection robots to achieve optimal control of minimum time and energy in continuous time, continuous state and control input space under unknown disturbances and to avoid collisions throughout the process, the following cost functions are considered:

wherein ζ>And 0 is used for representing the specific gravity of time in the routing inspection process, and R is a positive definite matrix. In order to solve the path planning problem that the minimum arrival time T of the robot is unknown, a hyperbolic tangent function is introduced to rewrite a cost function into an infinite integral form so as to solve the problem, in addition, in order to avoid the saturation of an actuator, the input is also required to be constrained, so that the common U (T)^TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t)) is used to approximate the minimum energy cost and capture the input constraints, and to introduce an artificial potential field function in order to avoid collisions between two robots, the cost function is approximately rewritten as:

where ζ is a normal number and tanh is a hyperbolic tangent function, which is an odd function that monotonically increases and is continuously differentiable, so the cost function after overwriting remains in an IRL-resolvable form. Rewriting ζ to ζ tanh (L (t) -L)_D)^T(L(t)-L_D) When the current position L (t) of the robot is away from the target point L_DTime ζ tanh (L (t) -L)_D)^T(L(t)-L_D) Approximately ζ, ξ tanh (L (t) -L) upon arrival at the target point_D)^T(L(t)-L_D) This translates the T integral at unknown time to an infinite integral independent of the time of arrival T, to achieve an optimal solution to the value function.

And because the robot system usually has constraints on input, the common U (t)^TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t) is used to approximate the minimum energy cost and capture the input constraints:

In order to avoid collision of any pair of inspection robots, an artificial potential field function f is added_R(r_ij(t)) makes the two robots send out repulsive potential field to avoid each other, and a special weight matrix Lambda is designed for making V (x (t), U (t)) after adding the potential field function to be bounded_R(t) for canceling the non-zero tail. Will reject function f_R(r_ij(t)) defines the form of a gaussian function, and this gaussian function is always greater than 0:

where the larger s the greater the steepness of the rejection function and the larger σ the larger the rejection range. To capture the repulsive distance r_ij(t), solving for s and σ in the rejection function, assuming:

f_R(r_s)＝K₀；f_R(r_s+Δ)＝K₁ (4-7)

wherein 0 < K₁＜K₀Less than 1; Δ is a positive increment. Substituting the above formula into one can obtain:

Λ_R(t)＝βtanh(||L_i(t)-L_iD||²+||L_j(t)-L_jD||²) (4-9)

It can be seen that Λ is the robot principle target point_R(t) is beta whenLambda when the robot reaches the target point_RSince (t) is 0, β is a collision coefficient, and the magnitude of β is determined by the importance of avoiding collision in the polling process.

Following the solution of the optimal control input using the cost function in (4-4), it is clear that V can be minimized, and t is derived on both sides of the (4-4) equation, so the Bellman equation can be written as:

V(x(t)，U(t))＝-ζtanh(L(t)-L_D)T(L(t)-L_D)-φ(U(t))-Λ_R(t)^Tf_R(r_ij(t)) (4-10)

let F_ζ(t)＝ζtanh(L(t)-L_D)^T(L(t)-L_D) Defining the optimum function as:

V^*(X)t)，U(t))＝min∫_t ^∞F_ζ(t)+φ(U(t))+Λ_R(t)^Tf_R(r_ij(t))dt (4-11)

defining the HJB equation according to equation (4-10) as:

wherein

Under stable conditions have

(4-12) deriving U on both sides simultaneously:

the optimal control input u can be obtained after term shifting^*Comprises the following steps:

substituting (4-14) into (4-5) to obtain:

wherein

Substituting (4-16) into (4-12) to obtain:

however, in practical situations, the HJB equation is difficult to solve directly, and because the system model part is unknown, the HJB equation is obtained

The method can not be directly solved, so that the HJB equation can be solved by utilizing a strategy iterative algorithm based on the integral reinforcement learning, the signals in (T, T + T) are used for learning by the integral reinforcement learning, and a specific dynamic model of the system does not need to be known.

Firstly, the value function is rewritten into the form of integral difference value, and the following Bellman equation can be obtained:

in order to solve (4-18) online in real time, an operator-critic neural network algorithm is introduced to realize real-time update in the strategy iteration process. The value function V (X) is first approximated by a critic neural network, since

The first term is quadratic form which is easy to obtain, so that only the second term needs to be approximated and set

Using neural network pair V₀(X) approximating:

differentiating X on two sides of (4-20) to obtain:

substituting (4-20) into (4-18) can obtain a new bellman equation:

But due to the coefficient w of the critic neural network_cIs unknown and therefore (4-18) cannot be solved directly, in order to determine w_cTherefore we directly rewrite (4-20) to:

wherein

Is a V₀(ii) an approximation of (X),

for ideal approximation coefficients, then (4-22) is:

order to

the two sides of the formula (4-25) are aligned

Derivation, which can be obtained from the chain rule:

wherein beta is_c>0 is the learning rate, and 0 is the learning rate,

is delta psi_cAn approximation of (d).

Will E_eSubstituting into (4-26) to obtain the weight coefficient of critic neural network

Should be subject to:

the obtained ideal weight coefficient is substituted into (4-14) to obtain an optimal control strategy, however, the optimal strategy obtained through the value function of critic approximation cannot ensure the stability of a closed-loop system, so an actor neural network is introduced into the actuator to ensure convergence to an optimal solution and ensure the stability of the system:

is the optimal approximation coefficient of the actor neural network,

is determined by the following lyapunov function:

it can be shown that when w_aWhen the following formula is satisfied, the approximated strategy can make the system consistent and finally bounded, and the system can pass through

To obtain U^*(t)。

Wherein K₁、K₂In order to design a good normal number,

The algorithm is as follows: online IRL algorithm based on strategy iteration

Initialization: given a feasible actuator input

Step 1: policy evaluation, given initialization

Solving by

Step 2: improvement of the strategy is that

Substituting the following formula for updating

Step 3: order to

Go back to Step1 until

Converging to a minimum value.

The overall structure of the fire-fighting inspection robot is shown in fig. 7.

The inspection schematic diagram of the multi-fire inspection cooperative robot system is shown in fig. 8.

The whole square frame is an area to be inspected, the dotted line is an area dividing line, the light color pentagram represents a key inspection area, the dark color pentagram represents a fire finding point, and the bidirectional arrow represents that information interaction exists between the robots.

The workflow diagram for operating the robot to extinguish a fire is shown in fig. 9.

The workflow diagram of the fire inspection robot is shown in fig. 10.

Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims

1. Many fire control based on integral reinforcement learning patrols and examines cooperative robot system, its characterized in that: the system comprises a hardware layer, an interaction layer, a sensing layer and a control layer which are connected in sequence;

the hardware layer adopts a DSP as a controller, data acquired by the odometer and the gyroscope are sent into the DSP for processing, and the position of the robot in the routing inspection map is calculated in real time; sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire-fighting inspection robot adopts crawler-type drive; when the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion trail of a target point to which the mechanical arm moves, discretizes the planned motion trail and sends the discretized motion trail to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point;

the sensing layer comprises a laser radar for establishing a map, an obstacle-avoiding infrared sensor, a flame detector for detecting flame, a temperature sensor, a realsense D435i depth camera, an odometer and a gyroscope;

the control layer is as follows:

the whole fire-fighting inspection area is provided with N robots for cooperative inspection, and the N robots are respectively positioned at initial positions (x)_i0，y_i0) To reach respective destinations (x)_iD，y_iD) And i belongs to {1, 2, …, N }, and the position L of the ith fire inspection robot at the time t is set_i(t)＝[L_ix(t)，L_iy(t)]^TVelocity V_i(t)＝[V_ix(t)，V_iy(t)]^TController input U_i(t)＝[u_ix(t)，u_iy(t)]^TControl input and unknown environmental disturbance W_i(t)＝[W_ix(t)，W_iy(t)]^TIn order to avoid saturation of the actuator, input is constrained, and | U (t) | is required to be less than or equal to lambda, wherein lambda is a normal number; set the distance r between two inspection robots_ij(t)＝||L_i(t)-L_j(t) | |, a safety distance r is set for avoiding collision of two inspection robots_sAnd r is required to be satisfied at any time in the inspection process_ij(t)≥r_sAnd when N robots reach the inspection destination, ensuring r_ij(t)＞＞r_sWhen i is not equal to j;

is the state of the robot at the time t,

to input, y_i(t) is the system only output;

the global dynamics model is written as:

wherein

Is the product of Kronecker, X (t) ═ x₁(t)，x₂(t)，...，x_n(t)]^T，Y(t)＝[y₁(t)，y₂(t)，...，y_n(t)]^T，I_NIs an N-order identity matrix, and is set to L (t) [ < L >_1t，L_2t，...，L_Nt]^T，L_D＝[L_1D，L_2D，...，L_ND]^T，U₀＝[U₁，U₂，...，U_N]^TThe positions of the N robots at the time t, the positions of the target points and the control input are respectively input;

wherein ζ>0, used for representing the proportion of time in the polling process, and R is a positive definite matrix; in order to solve the path planning problem that the minimum arrival time T of the robot is unknown, a hyperbolic tangent function is introduced to rewrite a cost function into an infinite integral form so as to solve the problem, and in addition, in order to avoid saturation of an actuator, the input is required to be restrained, U (T)^TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t)) for approximating minimum energy cost and capturing input constraints, and introducing an artificial potential field function to avoid collisions between two robots, approximately rewriting the cost function as:

where ζ is a normal number, tanh is a hyperbolic tangent function, which is an odd function that monotonically increases and is continuously differentiable, and the cost function is an IRL-solvable form; rewriting ζ to ζ tanh (L (t) -L)_D)^T(L(t)-L_D) When the current position L (t) of the robot is away from the target point L_DTime ζ tanh (L (t) -L)_D)^T(L(t)-L_D) Approximately ζ, tanh (L (t) -L) when the target point is reached_D)^T(L(t)-L_D) Converting the T integral of unknown time into infinite integral irrelevant to the arrival time T to realize the optimal solution of the value function;

wherein input constraint is | U (t) | is less than or equal to λ, λ and σ are both normal numbers, and R ═ diag (R)₁，r₂...r_m)＞0；

Adding an artificial potential field function f to avoid collision of any pair of inspection robots_R(r_ij(t)) the two robots are caused to emit repulsive potential fields to avoid each other, and in order to make V (x (t), U (t)) after the potential field function is added bounded, a weight matrix Λ is designed_R(t) for canceling non-zero tails; will reject function f_R(r_ij(t)) defines the form of a gaussian function, and this gaussian function is always greater than 0:

wherein the larger s is, the greater the steepness of the repulsion function is, and the larger σ is, the larger the repulsion range is; to capture the repulsive distance r_ij(t)，Solving for s and σ in the rejection function, provided with:

f_R(r_s)＝K₀；f_R(r_s+Δ)＝K₁ (4-7)

by a weight matrix Λ_R(t)＝[Λ₁₂(t)，Λ₁₃(t)，...，Λ_N-1，N(t)]^TTo make the value function after introducing the artificial potential field function bounded and the weight matrix dependent on the distance to the target point;

Λ_R(t)＝βtanh(||L_i(t)-L_iD||²+||L_j(t)-L_jD||²) (4-9)

lambda when the robot is far away from the target point_R(t) ═ β, Λ when the robot reaches the target point_R(t) is 0, beta is a collision coefficient, and the size of beta is determined by the importance of avoiding collision in the routing inspection process;

let F_ζ(t)＝ζtanh(L(t)-L_D)^T(L(t)-L_D) Defining the optimum function as:

defining the HJB equation according to equation (4-10) as:

wherein

Under stable conditions have

(4-12) deriving U on both sides of the formula:

obtaining the optimal control input U after item shifting^*(t) is:

substituting (4-14) into (4-5) to obtain:

wherein

Substituting (4-16) into (4-12) to obtain:

solving an HJB equation by utilizing a strategy iterative algorithm based on integral reinforcement learning, wherein the integral reinforcement learning uses signals in (T, T + T) for learning without knowing a specific dynamic model of the system;

in order to solve (4-18) online in real time, an operator-critic neural network algorithm is introduced to realize real-time updating in the strategy iteration process; the value function V (X) is first approximated by a critic neural network, since

using neural network pair V₀(X) approximating:

differentiating X on two sides of (4-20) to obtain:

substituting (4-20) into (4-18) results in a new bellman equation:

wherein epsilon_e(t)＝ε_c(X(t+T))-ε_c(X (t)) is the error of the Bellman equation,. DELTA.. psi_c(X(t)＝ψ_c(X(t+T)-ψ_c(X(t)；

To determine w_cThe (4-20) is rewritten as:

wherein

Is a V₀(ii) an approximation of (X),

for ideal approximation coefficients, then (4-22) is:

order to

the two sides of the formula (4-25) are aligned

Derivation, obtained by the chain rule:

wherein beta is_c>0 is the learning rate, and 0 is the learning rate,

is delta psi_cAn approximation of (d);

Should be subject to:

and (4) substituting the obtained ideal weight coefficient into (4-14) to obtain an optimal control strategy, but the optimal strategy obtained by a value function of critic approximation cannot ensure the stability of a closed-loop system, and an actor neural network is introduced into an actuator to ensure convergence to an optimal solution and ensure the stability of the system:

is the optimal approximation coefficient of the actor neural network,

is determined by the following lyapunov function:

To obtain U^*(t)；

Wherein K₁、K₂In order to design a good normal number,

based on the formulas (4-19), (4-27), (4-28) and (4-30), respectively utilizing critic and operator algorithms to realize synchronous updating of a value function and a strategy function, and designing an online integral reinforcement learning algorithm based on strategy iteration to solve an HJB equation so as to solve optimal control input;

the algorithm is as follows: online IRL algorithm based on strategy iteration

Initialization: given a feasible actuator input

Step 1: policy evaluation, given initialization

Solving by

Step 2: improvement of the strategy is that

Substituting the following formula for updating

Step 3: order to

Go back to Step1 until

Converging to a minimum value.

2. The cooperative robot system for multiple fire inspection patrolling based on integral reinforcement learning as claimed in claim 1, wherein: the hardware layer comprises a crawler driving system and a mechanical arm servo control;

1. track drive system

The crawler belt is divided into two sections, and each section is driven by a separate servo motor; the front-section crawler belt is used for lifting a chassis of the robot to smoothly pass through the chassis when an obstacle exists, and the height of the robot is adjusted by adjusting the front-section crawler belt, so that a larger operation radius is provided for the mechanical arm; the second half section of the crawler belt mainly plays a role in driving the robot, is coaxially driven by a servo motor, and performs deceleration braking on the crawler belt on one side during steering; the rated voltage of the servo motor is 24V, the output power is 100W, and the speed information of x and y axes issued by the upper PC is converted into the rotating speed of the servo motor after being coded by the DSP so as to realize steering and driving;

2. mechanical arm servo control

A four-axis mechanical arm is arranged above the robot, a rotatable claw-shaped clamping device is arranged at the front section of the mechanical arm, and a fire extinguishing device is arranged on the clamping device; after the fire extinguishing device is additionally arranged, the fire extinguishing device is matched with the mechanical arm to realize accurate extinguishing of a fire point; the four-axis mechanical arm is driven by four servo motors to move each axis, and the motion information of each axis is calculated by a moveit | in an upper computer ros system! Generating after path planning;

The coordinate transformation of the target point under the world coordinate system to the coordinate relative to the mechanical arm coordinate system is completed through the calibration form of 'eyes are out of hand'; for the calibration mode of 'eyes are out of hand', a transformation matrix Tgc from a manipulator base coordinate system Tg to a camera coordinate system Tc is constant, a transformation matrix Tbe from a calibration plate coordinate system Tb to a manipulator tail end coordinate system Te is constant, and the relationship of coordinate transformation satisfies the following formula:

for the ith time: tbc_i＝Tbe*Teg_i*Tgc (1-1)

Time i + 1: tbc_i+1＝Tbe*Teg_i+1*Tgc (1-2)

Then A ═ Teg_i)^-1*(Teg)_i+1Namely the motion relation of the object relative to the tail end coordinate system Te of the mechanical arm;

Using Moveit! Combining all independent functional components of the control mechanical arm, and then providing the combination for a user to use through action and service communication in the ROS; in moveit! In the method, a URDF model is created according to the real size and the number of axes of the mechanical arm, and after the URDF model is input, the model is used for calculating the weight of the mechanical arm according to the weight of the mechanical arm! The setup assist generates a corresponding configuration file according to the setting of the setup assist, the content of the configuration file comprises a collision matrix of the mechanical arm so as to avoid collision between shafts caused by a planned track, connection information of each joint, a defined initial position and the like; then adding a control plug-in controller of the mechanical arm, wherein the controller comprises a defined follow _ join _ project node and names for setting each axis, finally writing a program to realize that the PC is connected with the mechanical arm in a socket communication mode, and observing a real-time motion track of the mechanical arm in rviz by subscribing a join _ state topic; the method comprises the steps that firstly, the flame is identified and detected through a fast convolution neural network, after identification is successful, the three-dimensional coordinate of an ignition point relative to a robot is obtained through point cloud data of a depth camera, then the position of the tail end of a mechanical arm, which needs to be reached, can be obtained through TF coordinate change, and then the track is solved through an internally integrated algorithm; the solved track information is composed of a large number of discrete points, and the track information comprises the angular velocity and the angular acceleration of each axis to reach the points; when the solved points are enough, a very smooth motion track is fitted, and after the information of the discrete points is published and subscribed through topics, the mechanical arm moves to the target point smoothly according to the planned points.

3. The cooperative robot system for multiple fire inspection patrolling based on integral reinforcement learning as claimed in claim 1, wherein: the sensing layer comprises infrared sensor obstacle avoidance and flame identification based on a fast convolutional neural network;

infrared sensor obstacle avoidance

Detecting obstacles encountered by the inspection robot in the inspection process in real time by using an infrared sensor, detecting Euclidean distances between the robot and the obstacles by using the infrared sensor when the obstacle exists at the front part, and calculating specific coordinates of the obstacles by using the distances and odometer and gyroscope data obtained from a DSP (digital signal processor); after the coordinates are obtained, an obstacle avoidance path is designed immediately through a control algorithm, the obstacle avoidance path is arc-shaped, a minimum distance between the obstacle avoidance path and an obstacle is required to be kept in the whole process, and after obstacle avoidance is finished, the obstacle avoidance path is required to return to the previously planned optimal routing inspection path immediately;

② flame identification based on fast convolution neural network

② 1: inputting a shot flame picture;

② 3.1: the feature mapping is entered into RPN, firstly, a series of region candidate suggestion frames are passed through, and then the suggestion frames are fed into two 1 × 1 convolution layers respectively, wherein the first convolution layer is used for region classification, namely, the intersection ratio IOU value of the suggestion frames is generated by calculation to distinguish positive and negative samples; the other is used for the regression judgment of the boundary box, and a more accurate target detection box is generated after non-maximization inhibition;

② 3.2: the feature mapping enters an ROI pooling layer for calculation of a subsequent network;

② 4: after the pooled feature maps pass through the full-connection layer, classifying the suggestion frames by utilizing softmax again, identifying whether the detection frames are objects or not, and performing boundary frame regression judgment on the suggestion frames again;

the specific method for generating the detection frame by the RPN is to slide on the input feature mapping through a sliding frame to generate 9 suggestion frames on each pixel point, and the size of the suggestion frames is 128²、256²、512²Aspect ratio of 1: 1. 1: 2. 2: 1, distinguishing positive and negative samples by utilizing the intersection ratio IOU values of the detection boxes, wherein the IOU value of the positive sample is more than 0.7, the IOU value of the negative sample is less than 0.3, and the proportion of the positive sample to the negative sample is set as 1: 1; aiming at different characteristics of flame in an image, a method of guiding anchoring is adopted to accelerate the detection speed of RPN, and the improved sparse anchoring strategy is as follows:

wherein x and y are coordinates of pixel points, F (x and y) represents the generated flame color mask, if the coordinates are 1, the pixel points generate a suggestion box, if the coordinates are 0, the pixel points do not generate, and m is_R(x，y)、m_G(x，y)、m_B(x, y) are image pixels, respectivelyRGB channel value, T, of a point_RIs a preset threshold value;

in addition, the principle of correcting the detection frame by utilizing boundary regression judgment is that an original suggestion frame A is mapped G to obtain a regression suggestion frame F which is closer to the real condition; this mapping G is obtained by translation and scaling:

firstly, translation: f_x＝A_w，d_x(A)+A_x (2-2)

F_y＝A_h.d_y(A)+A_y (2-3)

Rescaling: f_w＝A_w.exp(d_w(A)) (2-4)

F_h＝A_h.exp(d_h(A)) (2-5)

Wherein x, y, w, h respectively represent the center coordinates of the proposed box, width, height, d_x、d_y、d_w、d_hRespectively, the original frame A and the real frame F are in a transformation relation, and when the difference between the original frame A and the real frame F is not large, the transformation is regarded as linear;

the output is the probability of being identified as a flame.

4. The cooperative robot system for multiple fire inspection patrolling based on integral reinforcement learning as claimed in claim 1, wherein: the interaction layer is as follows: in the inspection process, pictures captured by the camera need to be sent to a control room and a mobile terminal in real time through a wireless network, corresponding APPs are developed in a matching manner, and the inspection robot is correspondingly controlled at a remote terminal so as to realize the inspection of an area to be inspected again by an operator; after the flame is detected, an alarm signal is immediately sent to a control room, and corresponding fire extinguishing measures can be immediately and automatically made; after fire extinguishing measures are implemented, if the fire condition is still not inhibited, the automatic mode is switched to the remote control mode, a professional in a control room takes over the control of the inspection robot comprehensively, the operation of the crawler belt and the action of the mechanical arm are controlled manually to realize the accurate extinguishing of a fire point, and whether the operations of cutting off a power supply, closing a gas valve and transferring inflammable matters are needed or not is judged according to the fire condition; each inspection robot can be connected with the whole fire fighting system in a grid mode, if the fire situation is still large after measures are taken, a request for taking over a fire fighting network is sent to a control room, a response is not made in one minute after the control room agrees or the fire fighting control room receives the response, a local spraying pipe network in a building is opened, meanwhile, a comprehensive fire fighting alarm is sent, and all fire fighting channels and emergency lighting facilities are opened; installing an emergency stop key at the top end of the robot; and after the fire is extinguished, marking the ignition point on the patrol inspection map as a key patrol inspection area.