CN113134187B - Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning - Google Patents

Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning Download PDF

Info

Publication number
CN113134187B
CN113134187B CN202110419574.2A CN202110419574A CN113134187B CN 113134187 B CN113134187 B CN 113134187B CN 202110419574 A CN202110419574 A CN 202110419574A CN 113134187 B CN113134187 B CN 113134187B
Authority
CN
China
Prior art keywords
robot
fire
inspection
mechanical arm
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202110419574.2A
Other languages
Chinese (zh)
Other versions
CN113134187A (en
Inventor
陈刚
刘智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202110419574.2A priority Critical patent/CN113134187B/en
Publication of CN113134187A publication Critical patent/CN113134187A/en
Application granted granted Critical
Publication of CN113134187B publication Critical patent/CN113134187B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • AHUMAN NECESSITIES
    • A62LIFE-SAVING; FIRE-FIGHTING
    • A62CFIRE-FIGHTING
    • A62C27/00Fire-fighting land vehicles
    • AHUMAN NECESSITIES
    • A62LIFE-SAVING; FIRE-FIGHTING
    • A62CFIRE-FIGHTING
    • A62C37/00Control of fire-fighting equipment
    • AHUMAN NECESSITIES
    • A62LIFE-SAVING; FIRE-FIGHTING
    • A62CFIRE-FIGHTING
    • A62C37/00Control of fire-fighting equipment
    • A62C37/50Testing or indicating devices for determining the state of readiness of the equipment

Landscapes

  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Manipulator (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention relates to a multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning, and belongs to the field of robots. The system comprises a hardware layer, an interaction layer, a sensing layer and a control layer; and the hardware layer adopts a DSP as a controller, and sends data acquired by the odometer and the gyroscope into the DSP for processing, and calculates the position of the robot in the routing inspection map in real time. Sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire-fighting inspection robot adopts crawler-type drive; when the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion track of a target point to which the mechanical arm moves, discretizes the planned motion track and sends the discretized motion track to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point.

Description

Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning
Technical Field
The invention belongs to the field of robots, and relates to a multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning.
Background
The main structure of the current common fire-fighting inspection robot is as follows: wheel type driving is adopted in the aspect of driving; installing flame detectors and temperature sensors around the robot to facilitate the detection of fire; a camera is arranged in front of the robot so as to transmit the inspection picture to the control room through a wireless module; a fire-fighting nozzle with a fixed but rotatable chassis is also arranged above the robot and is used for externally connecting a water pipe or a small water pump to extinguish a fire point; in the aspect of robot control, along with the development of multi-machine cooperation thought and theory, in order to complete the inspection of a large area, meanwhile, in order to improve the inspection efficiency and reduce the inspection difficulty, a plurality of fire inspection robots are usually matched with each other to complete the operation, a centralized control mode is adopted on the cooperative control of the plurality of fire inspection robots, the inspection task allocation and the work scheduling of all robots are completed through a main control program, the specific inspection implementation mode is that a map constructed by using a laser radar and a planned inspection route are respectively led into the interior of each robot after area division, each robot automatically inspects a key area marked on the map according to the acquired planned route after being started, in addition, when some specific fire extinguishing or inspection operation needs to be finished remotely, the fire fighter can remotely operate the fire fighting or inspection operation through the remote controller.
However, the above system has many disadvantages, that is, the wheel-type driving makes the robot have poor passing performance in dealing with stairs and rugged road, and the flexibility of steering and rotating is not high enough; the accuracy and timeliness of flame detection by using the flame detector and the temperature sensor cannot be well guaranteed, and the range of flame detection is small; secondly, after the flame is detected, only the alarm function can be realized, and the position of a fire point and the image of the fire situation obtained by the camera are transmitted to a fire control room, a few fire inspection robots can also be matched with a fire nozzle carried by the robots to extinguish the fire point under the remote control of fire fighters, but the flexibility and the initiative in the fire situation responding aspect are lacked on the whole; finally, in the aspect of multi-robot cooperative control, each individual robot does not have the ability of selecting actions and coordinating with each other in a centralized control mode, so that the inspection efficiency, robustness and expandability of the whole system are poor, and the optimal time and energy of each robot in the inspection process cannot be guaranteed, so that the integral cruising ability and the disturbance resistance to the outside are reduced, and the autonomous control aspect and the intelligent degree are improved.
Disclosure of Invention
In view of this, the present invention provides a cooperative robot system for multiple fire inspection patrolling based on integral reinforcement learning.
In order to achieve the purpose, the invention provides the following technical scheme:
the multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning comprises a hardware layer, an interaction layer, a sensing layer and a control layer;
and the hardware layer adopts a DSP as a controller, and sends data acquired by the odometer and the gyroscope into the DSP for processing, and calculates the position of the robot in the routing inspection map in real time. Sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire-fighting inspection robot adopts crawler-type drive; when the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion track of a target point to which the mechanical arm moves, discretizes the planned motion track and sends the discretized motion track to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point.
1. Track drive system
The track is in two sections, each section being driven by a separate servo motor. The front-section crawler belt is used for lifting a chassis of the robot to smoothly pass through when encountering a higher obstacle, and the height of the robot is adjusted by adjusting the front-section crawler belt, so that a larger operation radius is provided for the mechanical arm; the second half section of the crawler belt mainly plays a role in driving the robot, is coaxially driven by a servo motor, and can be used for decelerating and braking the crawler belt on one side when the robot turns. The rated voltage of the servo motor is 24V, the output power is 100W, and the speed information of x and y axes issued by the upper PC is converted into the rotating speed of the servo motor after being coded by the DSP so as to realize steering and driving.
2. Mechanical arm servo control
A four-axis mechanical arm is arranged above the robot, a rotatable claw-shaped clamping device is arranged at the front section of the mechanical arm, and a fire extinguishing device is arranged on the clamping device; after the fire extinguishing device is additionally arranged, the fire extinguishing device is matched with the mechanical arm to realize accurate extinguishing of a fire point; the four-axis mechanical arm is driven by four servo motors to move each axis, and the motion information of each axis is calculated by a moveit | in an upper computer ros system! And generating after path planning.
Firstly, the calibration of the mechanical arm under the condition that eyes are out of hand is completed
And the coordinate transformation of the target point under the world coordinate system to the coordinate relative to the coordinate system of the mechanical arm is finished through a calibration mode of 'eyes are out of hands'. For the calibration mode of 'eyes are out of hand', a transformation matrix Tgc from a manipulator base coordinate system Tg to a camera coordinate system Tc is constant, a transformation matrix Tbe from a calibration plate coordinate system Tb to a manipulator tail end coordinate system Te is constant, and the relation of coordinate transformation satisfies the following formula:
for the ith time: tbci=Tbe*Tegi*Tgc (1-1)
Time i + 1: tbci+1=Tbe*Tegi+1*Tgc (1-2)
Finishing to obtain: (Teg)i)-1*(Teg)i+1=Tgc*(Tbc)i -1*(Tbc)i+1*Tgc-1 (1-3)
Then A ═ Tegi)-1*(Teg)i+1Is the motion relationship of the object relative to the robot arm tip coordinate system Te.
② utilize moveit! Completing the planning of the motion trail of the mechanical arm
Using Moveit! And combining the independent functional components for controlling the mechanical arm, and then providing the combined functional components for a user to use in an action and service communication mode in the ROS. In moveit! In the method, a URDF model is created according to the real size and the number of axes of the mechanical arm, and after the URDF model is input, the model is used for calculating the weight of the mechanical arm according to the weight of the mechanical arm! The setup authority generates a corresponding configuration file according to the setting of the setup authority, and the configuration file comprises a collision matrix of the mechanical arm so as to avoid collision between the axes caused by the planned track, connection information of each joint, a defined initial position and the like. And then adding a control plug-in controller of the mechanical arm, wherein the control controller comprises a defined follow _ join _ project node and names of all axes, finally writing a program to realize that the PC is connected with the mechanical arm in a socket communication mode, and observing a real-time motion track of the mechanical arm in rviz by subscribing to a join _ state topic. The method comprises the steps of firstly, completing flame identification and detection through a fast convolution neural network, obtaining three-dimensional coordinates of an ignition point relative to a robot through point cloud data of a depth camera after successful identification, then obtaining the position of the tail end of a mechanical arm to be reached through TF coordinate change, and then completing the solution of a track through an algorithm integrated in the interior. The solved track information is composed of a large number of discrete points, and the track information includes angular velocity and angular acceleration of each axis to reach the points. When the solved points are enough, a very smooth motion track is fitted, and after the information of the discrete points is published and subscribed through topics, the mechanical arm moves to the target point smoothly according to the planned points.
The sensing layer is used for a laser radar for establishing a picture, an infrared sensor for avoiding obstacles, a flame detector for detecting flame, a temperature sensor, a realsense D435i depth camera, an odometer and a gyroscope.
Infrared sensor obstacle avoidance
The infrared sensor is used for detecting obstacles encountered by the inspection robot in the inspection process in real time, when the obstacle exists at the front side, the infrared sensor detects Euclidean distances between the robot and the obstacles, and the specific coordinates of the obstacle are calculated by the distances and odometer and gyroscope data obtained from the DSP. And after the coordinates are obtained, immediately designing an obstacle avoidance path by a control algorithm, wherein the obstacle avoidance path is arc-shaped and is required to keep a minimum distance from the obstacle in the whole process, and after the obstacle avoidance is finished, immediately returning to the previously planned optimal routing inspection path.
② flame identification based on fast convolution neural network
The flame characteristics are extracted and detected by adopting a fast convolutional neural network Faster R-CNN, and the method comprises the following steps:
② 1: inputting a shot flame picture;
2.: sending the picture into a Convolutional Neural Network (CNN) for feature extraction;
② -3: after feature extraction, performing feature mapping, wherein the feature mapping jointly acts on a subsequent full-connection layer and a subsequent region to generate a network RPN;
② 3.1: the feature mapping is entered into RPN, firstly, a series of region candidate suggestion frames are passed through, and then the suggestion frames are fed into two 1 × 1 convolution layers respectively, wherein the first convolution layer is used for region classification, namely, the intersection ratio IOU value of the suggestion frames is generated by calculation to distinguish positive and negative samples; and the other is subjected to non-maximization inhibition to generate a more accurate target detection frame due to the boundary box regression judgment.
② 3.2: the feature map is entered into the ROI pooling layer for subsequent network computations.
② 4: and after the pooled feature maps pass through the full-connection layer, classifying the suggestion frames by utilizing softmax again, identifying whether the detection frames are objects or not, and performing boundary frame regression judgment on the suggestion frames again.
The specific method for generating the detection frame by the RPN is to slide on the input feature mapping through a sliding frame to generate 9 suggestion frames on each pixel point, and the size of the suggestion frames is 1282、2562、5122Aspect ratio of 1: 1. 1: 2. 2: 1, distinguishing positive and negative samples by using intersection ratio of the detection frames and IOU intersection ratio, wherein the IOU value of the positive sample is more than 0.7, the IOU value of the negative sample is less than 0.3, and the proportion of the positive sample to the negative sample is set as 1: 1. aiming at different characteristics of flame in an image, a method of guiding anchoring is adopted to accelerate the detection speed of RPN, and the improved sparse anchoring strategy is as follows:
Figure BDA0003027378310000041
wherein x and y are coordinates of pixel points, F (x and y) represents the generated flame color mask, if the coordinates are 1, the pixel points generate a suggestion box, if the coordinates are 0, the pixel points do not generate, and m isR(x,y)、mG(x,y)、mB(x, y) are the RGB channel values, T, of the image pixels, respectivelyRIs a threshold value set in advance.
In addition, the principle of correcting the detection frame by utilizing boundary regression judgment is that an original suggestion frame A is mapped G to obtain a regression suggestion frame F which is closer to the real condition. This mapping G is obtained by translation and scaling:
firstly, translation: fx=Aw·dx(A)+Ax (2-2)
Fy=Ah·dy(A)+Ay (2-3)
Rescaling: fw=Aw·exp(dw(A)) (2-4)
Fh=Ah·exp(dh(A)) (2-5)
Wherein x, y, w, h respectively denote the suggestion boxesCenter coordinate, width, height, dx、dy、dw、dhRespectively, a transformation relationship, and when the difference between the original frame a and the real frame F is not large, the transformation is regarded as linear.
The output is the probability of being identified as a flame.
The interaction layer is as follows: in the inspection process, pictures captured by the camera in real time need to be sent to the control room and the mobile terminal through a wireless network, corresponding APP is developed in a matched mode, the inspection robot is controlled correspondingly at the remote terminal, and accordingly the inspection of an area needing to be inspected again by an operator is achieved. After the flame is detected, an alarm signal is sent to the control room immediately, and corresponding fire extinguishing measures can be automatically taken immediately. After the fire extinguishing measures are implemented, if the fire condition can not be restrained, the automatic mode is switched to the remote control mode, a professional in a control room takes over the control of the inspection robot comprehensively, the operation of the crawler belt and the action of the mechanical arm are controlled manually to realize the accurate extinguishing of a fire point, and whether the operations of cutting off a power supply, closing a gas valve and transferring inflammable matters need to be carried out or not is judged according to the fire condition. Each inspection machine can be connected with the whole fire fighting system in a grid mode, if the fire situation is still large after measures are taken, a request for taking over a fire fighting network is sent to a control room, the control room agrees or does not respond within one minute, a local spraying pipe network in a building is opened, meanwhile, a comprehensive fire fighting alarm is sent, and all fire fighting channels and emergency lighting facilities are opened. And an emergency stop button is arranged at the top end of the robot. And after the fire is extinguished, marking the ignition point on the patrol inspection map as a key patrol inspection area.
The control layer is as follows:
the whole fire-fighting inspection area is provided with N robots for cooperative inspection, and the N robots are respectively positioned at initial positions (x)i0,yi0) To reach respective destinations (x)iD,yiD) And i belongs to {1, 2.. eta., N }, and the position L of the ith fire-fighting inspection robot at the moment t is seti(t)=[Lix(t),Liy(t)]TVelocity Vi(t)=[Vix(t),Viy(t)]TController input Ui(t)=[uix(t),uiy(t)]TControl input and unknown environmental disturbance Wi(t)=[Wix(t),Wiy(t)]TIn order to avoid saturation of the actuator, the input is constrained, and | U (t) | is required to be less than or equal to λ, wherein λ is a normal number. Set the distance r between two inspection robotsij(t)=||Li(t)-Lj(t) | |, a safety distance r is set for avoiding collision of two inspection robotssAnd r is required to be satisfied at any time in the inspection processij(t)≥rsAnd when N robots reach the inspection destination, ensuring rij(t)>>rsIn this case, i ≠ j.
Then consider the second order linear dynamics model of the ith fire inspection robot as:
Figure BDA0003027378310000051
wherein the system matrix is A, the input matrix is B, the output matrix is C, the interference matrix is D,
Figure BDA0003027378310000052
is the state of the robot at the time t,
Figure BDA0003027378310000053
to input, yi(t) is the system unique output.
The global dynamics model is written as:
Figure BDA0003027378310000054
wherein
Figure BDA0003027378310000055
Is the product of Kronecker, X (t) ═ x1(t),x2(t),...,xn(t)]T,Y(t)=[y1(t),y2(t),...,yn(t)]T,INIs an N-order identity matrix, and is set to L (t) [ < L >1t,L2t,...,LNt]T,LD=[L1D,L2D,...,LND]T,U0=[U1,U2,...,UN]TThe positions at time t, the target point positions and the control inputs of the N robots are respectively.
In order to achieve optimal control of the N fire inspection robots under unknown disturbances with minimum time and energy in continuous time, continuous state and control input space and to avoid collisions in the whole process, the following cost functions are considered:
Figure BDA0003027378310000056
wherein ζ>And 0 is used for representing the specific gravity of time in the routing inspection process, and R is a positive definite matrix. In order to solve the path planning problem that the minimum arrival time T of the robot is unknown, a hyperbolic tangent function is introduced to rewrite a cost function into an infinite integral form so as to solve the problem, in addition, in order to avoid the saturation of an actuator, the input is also required to be constrained, so that the common U (T)TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t)) is used to approximate the minimum energy cost and capture input constraints, and to avoid collisions between two robots introducing artificial potential field functions, the cost function is approximately rewritten as:
V(X(t),U(t))=∫t ζtanh(L(t)-LD)T(L(t)-LD)+φ(U(t))+ΛR(t)TfR(rij(t))dt (4-4)
where ζ is a normal number, tanh is a hyperbolic tangent function, which is a monotonically increasing odd function and continuously differentiable, and the cost function is an IRL-solvable form. Rewriting ζ to ζ tanh (L (t) -L)D)T(L(t)-LD) When the current position L (t) of the robot is away from the target point LDTime ζ tanh (L (t) -L)D)T(L(t)-LD) Approximately ζ, ξ tanh (L: (L) (L))t)-LD)T(L(t)-LD) The T integral of unknown time is converted to an infinite integral independent of the arrival time T to achieve an optimal solution to the value function.
Will U (t)TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t) is used to approximate the minimum energy cost and capture the input constraints:
Figure BDA0003027378310000061
wherein input constraint is | U (t) | is less than or equal to λ, λ and σ are both normal numbers, and R ═ diag (R)1,r2...rm)>0。
Adding an artificial potential field function f to avoid collision of any pair of inspection robotsR(rij(t)) the two robots are caused to emit repulsive potential fields to avoid each other, and in order to make V (x (t), U (t)) after the potential field function is added bounded, a weight matrix Λ is designedR(t) for canceling the non-zero tail. Will reject function fR(rij(t)) defines the form of a gaussian function, and this gaussian function is always greater than 0:
Figure BDA0003027378310000062
where the larger s the greater the steepness of the rejection function and the larger σ the larger the rejection range. To capture the repulsive distance rij(t) solving for s and σ in the rejection function, provided with:
fR(rs)=K0;fR(rs+Δ)=K1 (4-7)
wherein 0 < K1<K0Less than 1; Δ is a positive increment, and is substituted to obtain:
Figure BDA0003027378310000063
by a weight matrix ΛR(t)=[Λ12(t),Λ13(t),...,ΛN-1,N(t)]TSo that the value function after introducing the artificial potential field function is bounded and the weight matrix depends on the distance to the target point.
ΛR(t)=βtanh(||Li(t)-LiD||2+||Lj(t)-LjD||2) (4-9)
Lambda when robot principle targets pointsR(t) ═ β, Λ when the robot reaches the target pointRAnd (t) is 0, beta is a collision coefficient, and the size of beta is determined by the importance of avoiding collision in the inspection process.
Solving the optimal control input by using the cost function in (4-4), and carrying out derivation on t on two sides of the formula (4-4) and writing the Bellman equation as follows:
V(x(t)),U(t))=-ζtanh(L(t)-LD)T(L(t)-LD)-φ(U(t))-ΛR(t)TfR(rij(t)) (4-10)
let Fζ(t)=ζtanh(L(t)-LD)T(L(t)-LD) Defining the optimum function as:
V*(X(t),U(t))=min∫t Fζ(t)+φ(U(t))+ΛR(t)TfR(rij(t))dt (4-11)
defining the HJB equation according to equation (4-10) as:
Figure BDA0003027378310000071
wherein
Figure BDA0003027378310000072
Under stable conditions have
Figure BDA0003027378310000073
(4-12) deriving U on both sides of the formula:
Figure BDA0003027378310000074
obtaining the optimal control input u after term shifting*Comprises the following steps:
Figure BDA0003027378310000075
substituting (4-14) into (4-5) to obtain:
Figure BDA0003027378310000076
wherein l is a column vector of all one, and substituting (4-14) into (4-15) to obtain:
Figure BDA0003027378310000077
wherein
Figure BDA0003027378310000078
Substituting (4-16) into (4-12) to obtain:
Figure BDA0003027378310000079
and solving the HJB equation by utilizing a strategy iterative algorithm based on the integral reinforcement learning, wherein the integral reinforcement learning uses signals in (T, T + T) for learning, and a specific dynamic model of the system is not required to be known.
Firstly, the value function is rewritten into the form of integral difference value, and the following Bellman equation is obtained:
Figure BDA00030273783100000710
in order to solve (4-18) online in real time, an operator-critic neural network algorithm is introduced to realize real-time updating in the strategy iteration process. The value function V (X) is first approximated by a critic neural network, since
Figure BDA00030273783100000711
Wherein the first term is quadratic form easy to obtain, only the second term is approximated, and
Figure BDA00030273783100000712
using neural network pair V0(X) approximating:
Figure BDA0003027378310000081
wherein wcIs the weight, psi, of the critical neural networkc(X) is a basis function, εc(X) is the approximation error;
differentiating X on two sides of (4-20) to obtain:
Figure BDA0003027378310000082
substituting (4-20) into (4-18) results in a new bellman equation:
Figure BDA0003027378310000083
wherein epsilone(t)=εc(X(t+T))-εc(X (t)) is the error of the Bellman equation,. DELTA.. psic(X(t)=ψc(X(t+T)-ψc(X(t)。
To determine wcThe (4-20) is rewritten as:
Figure BDA0003027378310000084
wherein
Figure BDA0003027378310000085
Is a V0Approximation of (X)The value of the one or more of the one,
Figure BDA0003027378310000086
for ideal approximation coefficients, then (4-22) is:
Figure BDA0003027378310000087
order to
Figure BDA0003027378310000088
For Bellman tracking errors and constructing an objective function by making εe(t) minimizing to adjust the weighting coefficients of the critic neural network:
Figure BDA0003027378310000089
the two sides of the formula (4-25) are aligned
Figure BDA00030273783100000810
Derivation, obtained by the chain rule:
Figure BDA00030273783100000811
wherein beta isc>0 is the learning rate, and 0 is the learning rate,
Figure BDA00030273783100000812
Figure BDA00030273783100000813
is delta psicAn approximation of (d).
Will EeSubstituting into (4-26) to obtain weight coefficient of criticc neural network
Figure BDA00030273783100000814
Should be subject to:
Figure BDA00030273783100000815
the obtained ideal weight coefficient is substituted into (4-14) to obtain an optimal control strategy, however, the optimal strategy obtained through the value function of critic approximation cannot ensure the stability of a closed-loop system, and an actor neural network is introduced into an actuator to ensure convergence to an optimal solution and ensure the stability of the system:
Figure BDA00030273783100000816
Figure BDA00030273783100000817
is the optimal approximation coefficient of the actor neural network,
Figure BDA00030273783100000818
is determined by the following lyapunov function:
Figure BDA00030273783100000819
when w isaWhen the following formula is satisfied, the approximated strategy makes the system consistent and finally bounded, and the system is finally bounded by
Figure BDA00030273783100000820
To obtain U*(t)。
Figure BDA00030273783100000821
Wherein K1,K2In order to design a good normal number,
Figure BDA00030273783100000822
based on the expressions (4-19), (4-27), (4-28) and (4-30), the critic algorithm and the actor algorithm are respectively utilized to realize synchronous updating of the value function and the strategy function, and an online integral reinforcement learning algorithm based on strategy iteration is designed to solve the HJB equation so as to solve the optimal control input.
The algorithm is as follows: online IRL algorithm based on strategy iteration
Initialization: given a feasible actuator input
Figure BDA0003027378310000091
Step 1: policy evaluation, given initialization
Figure BDA0003027378310000092
Solving by
Figure BDA0003027378310000093
Figure BDA0003027378310000094
Step 2: improvement of the strategy is that
Figure BDA0003027378310000095
Substituting the following formula for updating
Figure BDA0003027378310000096
Figure BDA0003027378310000097
Step 3: order to
Figure BDA0003027378310000098
Go back to Step1 until
Figure BDA0003027378310000099
Converging to a minimum value.
The invention has the beneficial effects that:
1. the invention adopts a distributed control mode in the multi-fire-fighting inspection cooperative robot system, so that the autonomy, flexibility, reliability and response speed of each robot under the system are improved.
2. The four-axis mechanical arm is designed at the top of each fire inspection robot, the mechanical arm is matched with a specially-made fire extinguisher, so that an ignition point can be automatically and accurately put out after a fire is discovered, the mechanical arm can be remotely and manually controlled by a fireman to complete the operations of turning off a power switch, turning off a gas valve, removing combustible materials and the like, and the initiative and the operability after the fire is discovered are obviously improved.
3. The invention provides an improved fast convolutional neural network based on visual recognition to finish the recognition and detection of flame by matching with the picture acquired by a depth camera realsense D435i for more accurately recognizing flame and reducing the false alarm rate, and simultaneously introduces a method of guiding anchoring to improve the RPN detection speed in the fast convolutional neural network.
4. The approximation function designed in the controller algorithm can convert the finite integral of the minimum arrival time T unknown in the optimal path planning problem into an infinite integral form so as to be convenient for solving, and introduces a non-quadratic performance function for approximating the minimum energy cost and capturing input constraint.
5. The invention introduces an artificial potential field function to avoid collision among robots in the inspection process of the multi-fire inspection cooperative robot system, and designs a special weight coefficient matrix to offset the non-zero tail.
6. The invention uses the integral reinforcement learning algorithm in the multi-robot control algorithm to solve the problem that the matrix of the inspection robot system is unknown, and uses the critic and actor neural network algorithm to synchronously iterate and solve the Bellman equation on line in real time to obtain the optimal strategy, thereby obviously improving the inspection efficiency and the robustness of the multi-fire-fighting inspection robot system.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a bottom level diagram of hardware;
FIG. 2 is a schematic diagram of coordinate transformation;
FIG. 3 is a flow chart of motion trajectory generation;
FIG. 4 is a flow chart of obstacle avoidance for the fire inspection robot;
FIG. 5 is a fast convolutional neural network training process;
FIG. 6 is a fire inspection robot interaction structure;
FIG. 7 is an overall structure diagram of the fire inspection robot;
FIG. 8 is a schematic diagram of a multi-fire inspection cooperative robot system inspection;
FIG. 9 is a flowchart of the operation of the robotic arm to extinguish a fire;
fig. 10 is a flow chart of the work of the fire inspection robot.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Wherein the showings are for the purpose of illustrating the invention only and not for the purpose of limiting the same, and in which there is shown by way of illustration only and not in the drawings in which there is no intention to limit the invention thereto; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Aiming at each single fire inspection tour robot, the fire inspection tour robot is provided with a depth camera realsense D435i for finding the fire quickly and accurately on the basis of matching with a flame detector and a temperature sensor, the depth camera can realize the identification of the fire at a longer distance by extracting the characteristics of a scene, and the identification accuracy and the rapidity are improved compared with the sensors. Meanwhile, the depth camera transmits the inspection image to the main control room and the mobile terminal in real time, so that the inspection image can be observed by a controller conveniently, and control instructions sent by the control room and the mobile terminal can be received at any time. The inspection robot sends an alarm signal to the main control room immediately after finding a fire, but the alarm signal is far from enough, so that in order to improve the processing capacity of the inspection robot after finding the fire, a four-shaft mechanical arm is also assembled above the robot, and the front end of the mechanical arm is provided with a clamping jaw which is arranged to be beneficial to adding subsequent equipment; after the fire is discovered, under the remote control of fire fighters, the work of cutting off the power supply, removing a gas valve and combustible materials and the like can be completed through the mechanical arm under the necessary condition. In addition, a specially-made fire extinguishing device (such as a specially-made small fire extinguisher) can be arranged at the position of the clamping jaw of the mechanical arm to match with the mechanical arm to realize accurate extinguishing of an ignition point, so that the fire is prevented from spreading to the greatest extent, and greater economic loss is caused.
On the cooperative control of the multiple fire-fighting inspection robots, the multiple robots are required to complete the optimal online path planning of the minimum arrival time T unknown under the conditions of collision avoidance, constraint on input of an actuator and unknown external disturbance in the inspection process, in addition, the inspection efficiency, robustness and expandability of the whole system are ensured, and the robots cannot collide in the whole inspection process.
In order to meet the requirements, the software and hardware design scheme of the invention is as follows:
the novel multi-fire-fighting inspection cooperative robot system designed by the invention adopts the idea of layered design and respectively comprises a hardware layer, an interaction layer, a sensing layer and a control layer, wherein the first part to the third part introduce the specific software and hardware structure of each robot under the whole multi-fire-fighting inspection cooperative robot system, and the fourth part introduces the specific control algorithm realization of the multi-fire-fighting inspection cooperative robot system.
Hardware layer design of first part fire-fighting inspection robot
The hardware layer is used as a controller by the DSP, data collected by the odometer and the gyroscope are sent into the DSP to be processed, and the position of the robot in the routing inspection map can be calculated in real time. Sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire inspection robot adopts a crawler-type drive, and aims to improve the passing capacity (such as stairs) and the steering flexibility of a complex road section of the fire inspection robot. When the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion track of a target point to which the mechanical arm moves, discretizes the planned motion track and sends the discretized motion track to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point.
The bottom design plan of the hardware layer is shown in fig. 1.
1. Track drive system
In order to adapt to various inspection environments and improve the flexibility and the trafficability characteristic in the inspection process, the inspection robot adopts crawler-type driving. The track structure is designed into two sections, and each section is driven by a separate servo motor. The front-section crawler belt is mainly used for lifting a chassis of the robot to smoothly pass through when encountering a higher obstacle, and can also be used for adjusting the height of the robot by adjusting the front-section crawler belt so as to provide a larger operating radius for the mechanical arm; the second half section of the crawler belt mainly plays a role in driving the robot, is coaxially driven by a servo motor, and can be used for decelerating and braking the crawler belt on one side when the robot turns. The rated voltage of the servo motor is 24V, the output power is 100W, and the speed information of x and y axes issued by the upper PC is converted into the rotating speed of the servo motor after being coded by the DSP so as to realize steering and driving.
2. Mechanical arm servo control
In order to improve the processing capacity of the inspection robot when finding a fire, a four-axis mechanical arm is arranged above the robot. A rotatable claw-shaped clamping device is installed at the front section of the mechanical arm, and a specially-made small fire extinguishing device (such as a fire extinguisher, a small water pump and the like) can be installed on the clamping device according to specific requirements. After the fire extinguishing device is additionally arranged, the fire extinguishing device can be matched with the mechanical arm to realize accurate extinguishing of a fire point; the fire extinguishing device is not additionally arranged, and whether a mechanical arm is manually controlled by a fire fighter to cut off a local power supply, close a gas valve, remove surrounding inflammable matters, close a fire door and the like is determined according to the fire degree when a fire is found, so that the fire spread is prevented to the maximum extent, and the economic loss is reduced. The four-axis mechanical arm is driven by four servo motors to move each axis, and the motion information of each axis is calculated by a moveit | in an upper computer ros system! And generating after path planning.
Firstly, the calibration of the mechanical arm under the condition that eyes are out of hand is completed
The coordinate transformation of the target point under the world coordinate system To the coordinate relative To the coordinate system of the mechanical arm is completed by a calibration form of Eye-To-Hand (Eye-Hand). For the calibration mode of 'eyes are out of hand', a transformation matrix Tgc from a manipulator base coordinate system Tg to a camera coordinate system Tc is constant, a transformation matrix Tbe from a calibration plate coordinate system Tb to a manipulator tail end coordinate system Te is constant, and the relation of coordinate transformation satisfies the following formula:
for the ith time: tbci+1=Tbe*Tegi+1*Tgc (1-1)
Time i + 1: tbci+1=Tbe*Tegi+1*Tgc (1-2)
Finishing to obtain: (Teg)i)-1*(Teg)i+1=Tgc*(Tbc)i -1*(Tbc)i+1*Tgc-1 (1-3)
Then A ═ Tegi)-1*(Teg)i+1Is the motion relationship of the object relative to the robot arm tip coordinate system Te.
A schematic diagram of the coordinate transformation is shown in fig. 2.
② utilize moveit! Completing the planning of the motion trail of the mechanical arm
The ros (robot Operating system) is an Operating system specially used for realizing the control of the robot system, and can be developed in the Linux environment, and is especially suitable for a complex and multi-node control system such as a robot due to its simple operation mode, powerful function and strong expandability. In the control of the mechanical arm, a special integrated tool is arranged in an ROS system for completing the motion trail planning of the mechanical arm, which is moveit! . Moveit! May be considered an "integrator" by which the individual functional components controlling the robotic arm may be combined and then made available to the user through action and service communication in the ROS. In moveit! In the method, a model (URDF model) which is in accordance with the real size and the number of axes of the mechanical arm is created, and after the model is input, the moveit! The setup authority generates a corresponding configuration file according to the setting of the setup authority, and the configuration file comprises a collision matrix of the mechanical arm so as to avoid collision between the axes caused by the planned track, connection information of each joint, a defined initial position and the like. And then adding a control plug-in (controller) of the mechanical arm, wherein the controller mainly comprises a follow _ join _ project node and names for setting various axes, and finally writing a program to realize that the PC is connected with the mechanical arm in a socket communication mode, and observing a real-time motion track of the mechanical arm in rviz by subscribing to a join _ state topic. The method comprises the steps of firstly completing flame identification and detection through a fast convolution neural network, obtaining a three-dimensional coordinate of an ignition point relative to a robot through point cloud data of a depth camera after successful identification, obtaining a position where the tail end of a mechanical arm needs to reach through TF coordinate change, and then immediately completing track solving through an internally integrated algorithm (usually adopting cubic spline interpolation). The solved track information is composed of a large number of discrete points whose information includes the angular velocity and angular acceleration of each axis to which the point is to be reached. When the solved points are enough, a smooth motion track can be fitted, and the mechanical arm can smoothly move to a target point according to the planned points after the information of the points is published and subscribed through topics. Moveit! A flow chart for generating a motion trajectory is shown in fig. 3.
Second part fire-fighting inspection robot sensing layer design
The sensing layer design of the fire-fighting inspection robot mainly comprises a laser radar for drawing construction, an infrared sensor for obstacle avoidance, a flame detector for detecting flame, a temperature sensor, a realsense D435i depth camera, an odometer, a gyroscope and the like.
Infrared sensor obstacle avoidance
The infrared sensor is used for detecting the obstacles encountered by the inspection robot in the inspection process in real time, when the obstacle exists at the front side, the infrared sensor can detect the Euclidean distance between the robot and the obstacle, and the specific coordinates of the obstacle can be calculated by the distance and the odometer and gyroscope data obtained from the DSP. After the coordinates are obtained, an obstacle avoidance path can be designed immediately through a control algorithm, the obstacle avoidance path is usually arc-shaped, a minimum distance between the obstacle avoidance path and an obstacle is required to be kept in the whole process, and after obstacle avoidance is finished, the previously planned optimal routing inspection path needs to be returned immediately. The obstacle avoidance flow chart is shown in fig. 4.
② flame identification based on fast convolution neural network
In the inspection process, the detection of the flame is particularly critical, and along with the rapid development of computer technology, the flame is detected more rapidly and accurately by using vision than a fixed flame detector. However, since there are many objects similar to the flame in color in the inspection scene and the shape and texture of the flame are various, it is a difficult task to detect the position of the flame in the image. The invention adopts the fast convolutional neural network (Faster R-CNN) to extract and detect the flame characteristics, not only can accurately identify the flame, but also can accurately calculate the position where the flame is generated, and can reduce the false alarm rate of flame detection to the greatest extent.
The training steps of the fast convolutional neural network are as follows:
secondly-1, inputting the shot flame picture;
secondly, sending the picture into a Convolutional Neural Network (CNN) for feature extraction;
secondly-3, after feature extraction, feature maps (featuremaps) are used, and the feature maps jointly act on a subsequent full connection layer and an RPN (region generation network);
secondly-3.1 feature mapping into RPN, firstly, through a series of region candidate suggestion boxes, namely anchors (anchors), feeding the suggestion boxes into two 1 × 1 convolution layers respectively, wherein the first convolution layer is used for region classification, namely, positive and negative samples are distinguished by calculating IOU (intersection ratio) values for generating the suggestion boxes; and the other is subjected to non-maximization inhibition to generate a more accurate target detection frame due to the boundary box regression judgment.
And the-3.2 characteristic mapping enters the ROI pooling layer for subsequent network calculation.
And 4, after the pooled feature maps pass through a full connection layer, classifying the suggestion boxes by utilizing softmax again, namely identifying whether the detection boxes are objects or not, and performing boundary box regression judgment on the suggestion boxes again in order to further improve the accuracy of the target detection boxes.
A schematic diagram of the training process is shown in fig. 5.
The above steps using the RPN to generate the detection box (anchors) is the biggest advantage of FasterR-CNN compared with the traditional detection algorithm. The specific method for generating the detection frame by the RPN is to slide the input feature mapping through a sliding frame to generate 9 suggestion frames on each pixel pointThe size of these suggestion boxes may be 1282、2562、5122Aspect ratio of 1: 1. 1: 2. 2: 1, and distinguishing positive and negative samples by using an intersection ratio (IOU) of the detection boxes, wherein the IOU value of the positive sample is more than 0.7, the IOU value of the positive sample is less than the negative sample of 0.3, and the proportion of the positive sample to the negative sample is set as 1: 1. however, the number of the suggested frames drawn by such a method is still large, so that the method of the invention can adopt a guided anchoring method to accelerate the detection speed of the RPN according to different characteristics of flames in the image, and the improved sparse anchoring strategy is as follows:
Figure BDA0003027378310000141
wherein x and y are coordinates of pixel points, F (x and y) represents the generated flame color mask, if the coordinates are 1, the pixel points generate a suggestion box, if the coordinates are 0, the pixel points do not generate, and m isR(x,y)、mG(x,y)、mB(x, y) are the RGB channel values, T, of the image pixels, respectivelyRIs a threshold value set in advance.
In addition, the principle of correcting the detection frame by using boundary regression judgment (bounding box regression) is to map G the original suggestion frame A to obtain a regression suggestion frame F closer to the real situation. This mapping G can be obtained by translation and scaling:
firstly, translation: fx=Aw.dx(A)+Ax (2-2)
Fy=Ah.dy(A)+Ay (2-3)
Rescaling: fw=Aw.exp(dw(A)) (2-4)
Fh=Ah.exp*dh(A)) (2-5)
Wherein x, y, w, h respectively represent the center coordinates of the proposed box, width, height, dx、dy、dw、dhFor the transformation relations we are looking for, respectively, when the original frame a and the real frame F are not very different, the transformation can be generally considered to be linear.
The output is the probability of being identified as a flame.
Interaction layer design of third part fire-fighting inspection robot
In the inspection process, pictures captured by the camera in real time need to be sent to the control room and the mobile terminal through a wireless network, corresponding APPs are developed in a matched mode, inspection pictures and alarm signals can be received at terminals such as a PC (personal computer), a web, a mobile phone and a pad at any time and any place, and the inspection robot can be controlled correspondingly at a remote terminal, so that the inspection of an area needing to be inspected again by an operator is realized. After the flame is detected, an alarm signal should be sent to the control room immediately and corresponding fire extinguishing measures should be automatically taken immediately. After the fire extinguishing measures are implemented, if the fire condition is still not restrained, the automatic mode can be immediately switched to the remote control mode, a professional in a control room comprehensively takes over the control of the inspection robot, the operation of the crawler and the action of the mechanical arm are manually controlled to accurately extinguish a fire point, and whether the operations of cutting off a power supply, closing a gas valve, transferring inflammable matters and the like are needed or not is judged according to the fire condition. In addition, each inspection machine can be connected with the whole fire fighting system in a grid mode, if the fire condition is still large after measures are taken, a request for taking over a fire fighting network can be sent to the control room, the control room agrees or the fire fighting control room does not respond within one minute, a local spraying pipe network in the building can be opened, meanwhile, a comprehensive fire fighting alarm is sent, all fire fighting channels and emergency lighting facilities are opened, and therefore property loss and casualties are reduced to the maximum degree, and precious time is won for rescue. Meanwhile, in order to avoid sudden failures of the inspection robot in the inspection process, an emergency stop button is installed at the top end of the robot, and surrounding personnel are prevented from being injured. After the fire is extinguished, the ignition point is marked as a key inspection area on the inspection map so as to facilitate later inspection. The schematic diagram of the interactive structure of the fire inspection robot is shown in fig. 6.
Fourth part multi-fire-fighting inspection cooperative robot system control algorithm
Because the common fire-fighting inspection task is cooperatively completed by a plurality of robots, and the optimal path planning under the minimum arrival time in the inspection process is required to be realized in the whole multi-robot control process, the inspection range can be comprehensively covered, and the endurance time of the multi-robot inspection system can be ensured; and the interference usually existing to the inspection environment in the inspection process is unknown. In addition, in order to avoid saturation of the actuator, the input of the actuator is generally required to be constrained; meanwhile, for the sake of safety, the robots cannot collide with each other in the whole inspection process. Aiming at the control requirements of the multi-fire-fighting inspection cooperative robot system, a minimum arrival time T, unknown disturbance to the outside, unknown model of a system part and constraint on input are required to be designed, the robots are required to avoid collision, and in addition, accurate external information is difficult to be adopted under the actual condition, so that offline solution is changed into online solution, and the optimal controller based on integral reinforcement learning and an AC neural network algorithm is designed.
The whole fire-fighting inspection area is provided with N robots for cooperative inspection, and the N robots are respectively positioned at initial positions (x)i0,yi0) To reach respective destinations (x)iD,yiD) And i belongs to {1, 2.. eta., N }, and the position L of the ith fire-fighting inspection robot at the moment t is seti(t)=[Lix(t),Liy(t)]TVelocity Vi(t)=[Vix(t),Viy(t)]TController input Ui(t)=[uix(t),uiy(t)]TControl input and unknown environmental disturbance Wi(t)=[Wix(t),Wiy(t)]TMeanwhile, in order to avoid saturation of the actuator, input is constrained, and | U (t) | is required to be less than or equal to lambda, wherein lambda is a normal number. Set the distance r between two inspection robotsij(t)=||Li(t)-Lj(t) | |, a safety distance r is set for avoiding collision of two inspection robotssAnd r is required to be satisfied at any time in the inspection processij(t)≥rsAnd we assume that r is guaranteed after N robots reach the inspection destinationij(t)>>rsIn this case, i ≠ j.
Then consider the second order linear dynamics model of the ith fire inspection robot as:
Figure BDA0003027378310000161
wherein the system matrix is A, the input matrix is B, the output matrix is C, the interference matrix is D,
Figure BDA0003027378310000162
is the state of the robot at the time t,
Figure BDA0003027378310000163
to input, yi(t) is the system unique output.
The global dynamics model is written as:
Figure BDA0003027378310000164
wherein
Figure BDA0003027378310000165
Is the product of Kronecker, X (t) ═ x1(t),x2(t),...,xn(t)]T,Y(t)=[y1(t),y2(t),...,yn(t)]T,INIs an N-order identity matrix, and is set to L (t) [ < L >1t,L2t,...,LNt]T,LD=[L1D,L2D,...,LND]T,U0=[U1,U2,...,UN]TThe positions at time t, the target point positions and the control inputs of the N robots are respectively.
In order for the N fire inspection robots to achieve optimal control of minimum time and energy in continuous time, continuous state and control input space under unknown disturbances and to avoid collisions throughout the process, the following cost functions are considered:
Figure BDA0003027378310000166
wherein ζ>And 0 is used for representing the specific gravity of time in the routing inspection process, and R is a positive definite matrix. In order to solve the path planning problem that the minimum arrival time T of the robot is unknown, a hyperbolic tangent function is introduced to rewrite a cost function into an infinite integral form so as to solve the problem, in addition, in order to avoid the saturation of an actuator, the input is also required to be constrained, so that the common U (T)TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t)) is used to approximate the minimum energy cost and capture the input constraints, and to introduce an artificial potential field function in order to avoid collisions between two robots, the cost function is approximately rewritten as:
V(X(t),U(t))=∫t ζtanh(L(t)-LD)T(L(t)-LD)+φ(U(t))+ΛR(t)TfR(rij(t))dt (4-4)
where ζ is a normal number and tanh is a hyperbolic tangent function, which is an odd function that monotonically increases and is continuously differentiable, so the cost function after overwriting remains in an IRL-resolvable form. Rewriting ζ to ζ tanh (L (t) -L)D)T(L(t)-LD) When the current position L (t) of the robot is away from the target point LDTime ζ tanh (L (t) -L)D)T(L(t)-LD) Approximately ζ, ξ tanh (L (t) -L) upon arrival at the target pointD)T(L(t)-LD) This translates the T integral at unknown time to an infinite integral independent of the time of arrival T, to achieve an optimal solution to the value function.
And because the robot system usually has constraints on input, the common U (t)TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t) is used to approximate the minimum energy cost and capture the input constraints:
Figure BDA0003027378310000171
wherein input constraint is | U (t) | is less than or equal to λ, λ and σ are both normal numbers, and R ═ diag (R)1,r2...rm)>0。
In order to avoid collision of any pair of inspection robots, an artificial potential field function f is addedR(rij(t)) makes the two robots send out repulsive potential field to avoid each other, and a special weight matrix Lambda is designed for making V (x (t), U (t)) after adding the potential field function to be boundedR(t) for canceling the non-zero tail. Will reject function fR(rij(t)) defines the form of a gaussian function, and this gaussian function is always greater than 0:
Figure BDA0003027378310000172
where the larger s the greater the steepness of the rejection function and the larger σ the larger the rejection range. To capture the repulsive distance rij(t), solving for s and σ in the rejection function, assuming:
fR(rs)=K0;fR(rs+Δ)=K1 (4-7)
wherein 0 < K1<K0Less than 1; Δ is a positive increment. Substituting the above formula into one can obtain:
Figure BDA0003027378310000173
by a weight matrix ΛR(t)=[Λ12(t),Λ13(t),...,ΛN-1,N(t)]TSo that the value function after introducing the artificial potential field function is bounded and the weight matrix depends on the distance to the target point.
ΛR(t)=βtanh(||Li(t)-LiD||2+||Lj(t)-LjD||2) (4-9)
It can be seen that Λ is the robot principle target pointR(t) is beta whenLambda when the robot reaches the target pointRSince (t) is 0, β is a collision coefficient, and the magnitude of β is determined by the importance of avoiding collision in the polling process.
Following the solution of the optimal control input using the cost function in (4-4), it is clear that V can be minimized, and t is derived on both sides of the (4-4) equation, so the Bellman equation can be written as:
V(x(t),U(t))=-ζtanh(L(t)-LD)T(L(t)-LD)-φ(U(t))-ΛR(t)TfR(rij(t)) (4-10)
let Fζ(t)=ζtanh(L(t)-LD)T(L(t)-LD) Defining the optimum function as:
V*(X)t),U(t))=min∫t Fζ(t)+φ(U(t))+ΛR(t)TfR(rij(t))dt (4-11)
defining the HJB equation according to equation (4-10) as:
Figure BDA0003027378310000181
wherein
Figure BDA0003027378310000182
Under stable conditions have
Figure BDA0003027378310000183
(4-12) deriving U on both sides simultaneously:
Figure BDA0003027378310000184
the optimal control input u can be obtained after term shifting*Comprises the following steps:
Figure BDA0003027378310000185
substituting (4-14) into (4-5) to obtain:
Figure BDA0003027378310000186
wherein l is a column vector of all one, and substituting (4-14) into (4-15) to obtain:
Figure BDA0003027378310000187
wherein
Figure BDA0003027378310000188
Substituting (4-16) into (4-12) to obtain:
Figure BDA0003027378310000189
however, in practical situations, the HJB equation is difficult to solve directly, and because the system model part is unknown, the HJB equation is obtained
Figure BDA00030273783100001810
The method can not be directly solved, so that the HJB equation can be solved by utilizing a strategy iterative algorithm based on the integral reinforcement learning, the signals in (T, T + T) are used for learning by the integral reinforcement learning, and a specific dynamic model of the system does not need to be known.
Firstly, the value function is rewritten into the form of integral difference value, and the following Bellman equation can be obtained:
Figure BDA00030273783100001811
in order to solve (4-18) online in real time, an operator-critic neural network algorithm is introduced to realize real-time update in the strategy iteration process. The value function V (X) is first approximated by a critic neural network, since
Figure BDA00030273783100001812
The first term is quadratic form which is easy to obtain, so that only the second term needs to be approximated and set
Figure BDA00030273783100001813
Using neural network pair V0(X) approximating:
Figure BDA00030273783100001814
wherein wcIs the weight, psi, of the critical neural networkc(X) is a basis function, εc(X) is the approximation error;
differentiating X on two sides of (4-20) to obtain:
Figure BDA0003027378310000191
substituting (4-20) into (4-18) can obtain a new bellman equation:
Figure BDA0003027378310000192
wherein epsilone(t)=εc(X(t+T))-εc(X (t)) is the error of the Bellman equation,. DELTA.. psic(X(t)=ψc(X(t+T)-ψc(X(t)。
But due to the coefficient w of the critic neural networkcIs unknown and therefore (4-18) cannot be solved directly, in order to determine wcTherefore we directly rewrite (4-20) to:
Figure BDA0003027378310000193
wherein
Figure BDA0003027378310000194
Is a V0(ii) an approximation of (X),
Figure BDA0003027378310000195
for ideal approximation coefficients, then (4-22) is:
Figure BDA0003027378310000196
order to
Figure BDA0003027378310000197
For Bellman tracking errors and constructing an objective function by making εe(t) minimizing to adjust the weighting coefficients of the critic neural network:
Figure BDA0003027378310000198
the two sides of the formula (4-25) are aligned
Figure BDA0003027378310000199
Derivation, which can be obtained from the chain rule:
Figure BDA00030273783100001910
wherein beta isc>0 is the learning rate, and 0 is the learning rate,
Figure BDA00030273783100001911
Figure BDA00030273783100001912
is delta psicAn approximation of (d).
Will EeSubstituting into (4-26) to obtain the weight coefficient of critic neural network
Figure BDA00030273783100001913
Should be subject to:
Figure BDA00030273783100001914
the obtained ideal weight coefficient is substituted into (4-14) to obtain an optimal control strategy, however, the optimal strategy obtained through the value function of critic approximation cannot ensure the stability of a closed-loop system, so an actor neural network is introduced into the actuator to ensure convergence to an optimal solution and ensure the stability of the system:
Figure BDA00030273783100001915
Figure BDA00030273783100001916
is the optimal approximation coefficient of the actor neural network,
Figure BDA00030273783100001917
is determined by the following lyapunov function:
Figure BDA00030273783100001918
it can be shown that when waWhen the following formula is satisfied, the approximated strategy can make the system consistent and finally bounded, and the system can pass through
Figure BDA00030273783100001919
To obtain U*(t)。
Figure BDA00030273783100001920
Wherein K1、K2In order to design a good normal number,
Figure BDA00030273783100001921
based on the expressions (4-19), (4-27), (4-28) and (4-30), the critic algorithm and the actor algorithm are respectively utilized to realize synchronous updating of the value function and the strategy function, and an online integral reinforcement learning algorithm based on strategy iteration is designed to solve the HJB equation so as to solve the optimal control input.
The algorithm is as follows: online IRL algorithm based on strategy iteration
Initialization: given a feasible actuator input
Figure BDA0003027378310000201
Step 1: policy evaluation, given initialization
Figure BDA0003027378310000202
Solving by
Figure BDA0003027378310000203
Figure BDA0003027378310000204
Step 2: improvement of the strategy is that
Figure BDA0003027378310000205
Substituting the following formula for updating
Figure BDA0003027378310000206
Figure BDA0003027378310000207
Step 3: order to
Figure BDA0003027378310000208
Go back to Step1 until
Figure BDA0003027378310000209
Converging to a minimum value.
The overall structure of the fire-fighting inspection robot is shown in fig. 7.
The inspection schematic diagram of the multi-fire inspection cooperative robot system is shown in fig. 8.
The whole square frame is an area to be inspected, the dotted line is an area dividing line, the light color pentagram represents a key inspection area, the dark color pentagram represents a fire finding point, and the bidirectional arrow represents that information interaction exists between the robots.
The workflow diagram for operating the robot to extinguish a fire is shown in fig. 9.
The workflow diagram of the fire inspection robot is shown in fig. 10.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (4)

1. Many fire control based on integral reinforcement learning patrols and examines cooperative robot system, its characterized in that: the system comprises a hardware layer, an interaction layer, a sensing layer and a control layer which are connected in sequence;
the hardware layer adopts a DSP as a controller, data acquired by the odometer and the gyroscope are sent into the DSP for processing, and the position of the robot in the routing inspection map is calculated in real time; sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire-fighting inspection robot adopts crawler-type drive; when the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion trail of a target point to which the mechanical arm moves, discretizes the planned motion trail and sends the discretized motion trail to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point;
the sensing layer comprises a laser radar for establishing a map, an obstacle-avoiding infrared sensor, a flame detector for detecting flame, a temperature sensor, a realsense D435i depth camera, an odometer and a gyroscope;
the control layer is as follows:
the whole fire-fighting inspection area is provided with N robots for cooperative inspection, and the N robots are respectively positioned at initial positions (x)i0,yi0) To reach respective destinations (x)iD,yiD) And i belongs to {1, 2, …, N }, and the position L of the ith fire inspection robot at the time t is seti(t)=[Lix(t),Liy(t)]TVelocity Vi(t)=[Vix(t),Viy(t)]TController input Ui(t)=[uix(t),uiy(t)]TControl input and unknown environmental disturbance Wi(t)=[Wix(t),Wiy(t)]TIn order to avoid saturation of the actuator, input is constrained, and | U (t) | is required to be less than or equal to lambda, wherein lambda is a normal number; set the distance r between two inspection robotsij(t)=||Li(t)-Lj(t) | |, a safety distance r is set for avoiding collision of two inspection robotssAnd r is required to be satisfied at any time in the inspection processij(t)≥rsAnd when N robots reach the inspection destination, ensuring rij(t)>>rsWhen i is not equal to j;
then consider the second order linear dynamics model of the ith fire inspection robot as:
Figure FDA0003536798470000011
wherein the system matrix is A, the input matrix is B, the output matrix is C, the interference matrix is D,
Figure FDA0003536798470000012
is the state of the robot at the time t,
Figure FDA0003536798470000013
to input, yi(t) is the system only output;
the global dynamics model is written as:
Figure FDA0003536798470000014
wherein
Figure FDA0003536798470000015
Is the product of Kronecker, X (t) ═ x1(t),x2(t),...,xn(t)]T,Y(t)=[y1(t),y2(t),...,yn(t)]T,INIs an N-order identity matrix, and is set to L (t) [ < L >1t,L2t,...,LNt]T,LD=[L1D,L2D,...,LND]T,U0=[U1,U2,...,UN]TThe positions of the N robots at the time t, the positions of the target points and the control input are respectively input;
in order to achieve optimal control of the N fire inspection robots under unknown disturbances with minimum time and energy in continuous time, continuous state and control input space and to avoid collisions in the whole process, the following cost functions are considered:
Figure FDA0003536798470000021
wherein ζ>0, used for representing the proportion of time in the polling process, and R is a positive definite matrix; in order to solve the path planning problem that the minimum arrival time T of the robot is unknown, a hyperbolic tangent function is introduced to rewrite a cost function into an infinite integral form so as to solve the problem, and in addition, in order to avoid saturation of an actuator, the input is required to be restrained, U (T)TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t)) for approximating minimum energy cost and capturing input constraints, and introducing an artificial potential field function to avoid collisions between two robots, approximately rewriting the cost function as:
Figure FDA0003536798470000022
where ζ is a normal number, tanh is a hyperbolic tangent function, which is an odd function that monotonically increases and is continuously differentiable, and the cost function is an IRL-solvable form; rewriting ζ to ζ tanh (L (t) -L)D)T(L(t)-LD) When the current position L (t) of the robot is away from the target point LDTime ζ tanh (L (t) -L)D)T(L(t)-LD) Approximately ζ, tanh (L (t) -L) when the target point is reachedD)T(L(t)-LD) Converting the T integral of unknown time into infinite integral irrelevant to the arrival time T to realize the optimal solution of the value function;
will U (t)TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t) is used to approximate the minimum energy cost and capture the input constraints:
Figure FDA0003536798470000023
wherein input constraint is | U (t) | is less than or equal to λ, λ and σ are both normal numbers, and R ═ diag (R)1,r2...rm)>0;
Adding an artificial potential field function f to avoid collision of any pair of inspection robotsR(rij(t)) the two robots are caused to emit repulsive potential fields to avoid each other, and in order to make V (x (t), U (t)) after the potential field function is added bounded, a weight matrix Λ is designedR(t) for canceling non-zero tails; will reject function fR(rij(t)) defines the form of a gaussian function, and this gaussian function is always greater than 0:
Figure FDA0003536798470000024
wherein the larger s is, the greater the steepness of the repulsion function is, and the larger σ is, the larger the repulsion range is; to capture the repulsive distance rij(t),Solving for s and σ in the rejection function, provided with:
fR(rs)=K0;fR(rs+Δ)=K1 (4-7)
wherein 0 < K1<K0Less than 1; Δ is a positive increment, and is substituted to obtain:
Figure FDA0003536798470000025
by a weight matrix ΛR(t)=[Λ12(t),Λ13(t),...,ΛN-1,N(t)]TTo make the value function after introducing the artificial potential field function bounded and the weight matrix dependent on the distance to the target point;
ΛR(t)=βtanh(||Li(t)-LiD||2+||Lj(t)-LjD||2) (4-9)
lambda when the robot is far away from the target pointR(t) ═ β, Λ when the robot reaches the target pointR(t) is 0, beta is a collision coefficient, and the size of beta is determined by the importance of avoiding collision in the routing inspection process;
solving the optimal control input by using the cost function in (4-4), and carrying out derivation on t on two sides of the formula (4-4) and writing the Bellman equation as follows:
Figure FDA0003536798470000031
let Fζ(t)=ζtanh(L(t)-LD)T(L(t)-LD) Defining the optimum function as:
Figure FDA0003536798470000032
defining the HJB equation according to equation (4-10) as:
Figure FDA0003536798470000033
wherein
Figure FDA0003536798470000034
Under stable conditions have
Figure FDA0003536798470000035
(4-12) deriving U on both sides of the formula:
Figure FDA0003536798470000036
obtaining the optimal control input U after item shifting*(t) is:
Figure FDA0003536798470000037
substituting (4-14) into (4-5) to obtain:
Figure FDA0003536798470000038
wherein l is a column vector of all one, and substituting (4-14) into (4-15) to obtain:
Figure FDA0003536798470000039
wherein
Figure FDA00035367984700000310
Substituting (4-16) into (4-12) to obtain:
Figure FDA0003536798470000041
solving an HJB equation by utilizing a strategy iterative algorithm based on integral reinforcement learning, wherein the integral reinforcement learning uses signals in (T, T + T) for learning without knowing a specific dynamic model of the system;
firstly, the value function is rewritten into the form of integral difference value, and the following Bellman equation is obtained:
Figure FDA0003536798470000042
in order to solve (4-18) online in real time, an operator-critic neural network algorithm is introduced to realize real-time updating in the strategy iteration process; the value function V (X) is first approximated by a critic neural network, since
Figure FDA0003536798470000043
Wherein the first term is quadratic form easy to obtain, only the second term is approximated, and
Figure FDA0003536798470000044
using neural network pair V0(X) approximating:
Figure FDA0003536798470000045
wherein wcIs the weight, psi, of the critical neural networkc(X) is a basis function, εc(X) is the approximation error;
differentiating X on two sides of (4-20) to obtain:
Figure FDA0003536798470000046
substituting (4-20) into (4-18) results in a new bellman equation:
Figure FDA0003536798470000047
wherein epsilone(t)=εc(X(t+T))-εc(X (t)) is the error of the Bellman equation,. DELTA.. psic(X(t)=ψc(X(t+T)-ψc(X(t);
To determine wcThe (4-20) is rewritten as:
Figure FDA0003536798470000048
wherein
Figure FDA0003536798470000049
Is a V0(ii) an approximation of (X),
Figure FDA00035367984700000410
for ideal approximation coefficients, then (4-22) is:
Figure FDA00035367984700000411
order to
Figure FDA00035367984700000412
For Bellman tracking errors and constructing an objective function by making εe(t) minimizing to adjust the weighting coefficients of the critic neural network:
Figure FDA00035367984700000413
the two sides of the formula (4-25) are aligned
Figure FDA00035367984700000414
Derivation, obtained by the chain rule:
Figure FDA00035367984700000415
wherein beta isc>0 is the learning rate, and 0 is the learning rate,
Figure FDA00035367984700000416
Figure FDA00035367984700000417
is delta psicAn approximation of (d);
will EeSubstituting into (4-26) to obtain weight coefficient of criticc neural network
Figure FDA00035367984700000418
Should be subject to:
Figure FDA0003536798470000051
and (4) substituting the obtained ideal weight coefficient into (4-14) to obtain an optimal control strategy, but the optimal strategy obtained by a value function of critic approximation cannot ensure the stability of a closed-loop system, and an actor neural network is introduced into an actuator to ensure convergence to an optimal solution and ensure the stability of the system:
Figure FDA0003536798470000052
Figure FDA0003536798470000053
is the optimal approximation coefficient of the actor neural network,
Figure FDA0003536798470000054
is determined by the following lyapunov function:
Figure FDA0003536798470000055
when w isaWhen the following formula is satisfied, the approximated strategy makes the system consistent and finally bounded, and the system is finally bounded by
Figure FDA0003536798470000056
To obtain U*(t);
Figure FDA0003536798470000057
Wherein K1、K2In order to design a good normal number,
Figure FDA0003536798470000058
based on the formulas (4-19), (4-27), (4-28) and (4-30), respectively utilizing critic and operator algorithms to realize synchronous updating of a value function and a strategy function, and designing an online integral reinforcement learning algorithm based on strategy iteration to solve an HJB equation so as to solve optimal control input;
the algorithm is as follows: online IRL algorithm based on strategy iteration
Initialization: given a feasible actuator input
Figure FDA0003536798470000059
Step 1: policy evaluation, given initialization
Figure FDA00035367984700000510
Solving by
Figure FDA00035367984700000511
Figure FDA00035367984700000512
Step 2: improvement of the strategy is that
Figure FDA00035367984700000513
Substituting the following formula for updating
Figure FDA00035367984700000514
Figure FDA00035367984700000515
Step 3: order to
Figure FDA00035367984700000516
Go back to Step1 until
Figure FDA00035367984700000517
Converging to a minimum value.
2. The cooperative robot system for multiple fire inspection patrolling based on integral reinforcement learning as claimed in claim 1, wherein: the hardware layer comprises a crawler driving system and a mechanical arm servo control;
1. track drive system
The crawler belt is divided into two sections, and each section is driven by a separate servo motor; the front-section crawler belt is used for lifting a chassis of the robot to smoothly pass through the chassis when an obstacle exists, and the height of the robot is adjusted by adjusting the front-section crawler belt, so that a larger operation radius is provided for the mechanical arm; the second half section of the crawler belt mainly plays a role in driving the robot, is coaxially driven by a servo motor, and performs deceleration braking on the crawler belt on one side during steering; the rated voltage of the servo motor is 24V, the output power is 100W, and the speed information of x and y axes issued by the upper PC is converted into the rotating speed of the servo motor after being coded by the DSP so as to realize steering and driving;
2. mechanical arm servo control
A four-axis mechanical arm is arranged above the robot, a rotatable claw-shaped clamping device is arranged at the front section of the mechanical arm, and a fire extinguishing device is arranged on the clamping device; after the fire extinguishing device is additionally arranged, the fire extinguishing device is matched with the mechanical arm to realize accurate extinguishing of a fire point; the four-axis mechanical arm is driven by four servo motors to move each axis, and the motion information of each axis is calculated by a moveit | in an upper computer ros system! Generating after path planning;
firstly, the calibration of the mechanical arm under the condition that eyes are out of hand is completed
The coordinate transformation of the target point under the world coordinate system to the coordinate relative to the mechanical arm coordinate system is completed through the calibration form of 'eyes are out of hand'; for the calibration mode of 'eyes are out of hand', a transformation matrix Tgc from a manipulator base coordinate system Tg to a camera coordinate system Tc is constant, a transformation matrix Tbe from a calibration plate coordinate system Tb to a manipulator tail end coordinate system Te is constant, and the relationship of coordinate transformation satisfies the following formula:
for the ith time: tbci=Tbe*Tegi*Tgc (1-1)
Time i + 1: tbci+1=Tbe*Tegi+1*Tgc (1-2)
Finishing to obtain: (Teg)i)-1*(Teg)i+1=Tgc*(Tbc)i -1*(Tbc)i+1*Tgc-1 (1-3)
Then A ═ Tegi)-1*(Teg)i+1Namely the motion relation of the object relative to the tail end coordinate system Te of the mechanical arm;
② utilize moveit! Completing the planning of the motion trail of the mechanical arm
Using Moveit! Combining all independent functional components of the control mechanical arm, and then providing the combination for a user to use through action and service communication in the ROS; in moveit! In the method, a URDF model is created according to the real size and the number of axes of the mechanical arm, and after the URDF model is input, the model is used for calculating the weight of the mechanical arm according to the weight of the mechanical arm! The setup assist generates a corresponding configuration file according to the setting of the setup assist, the content of the configuration file comprises a collision matrix of the mechanical arm so as to avoid collision between shafts caused by a planned track, connection information of each joint, a defined initial position and the like; then adding a control plug-in controller of the mechanical arm, wherein the controller comprises a defined follow _ join _ project node and names for setting each axis, finally writing a program to realize that the PC is connected with the mechanical arm in a socket communication mode, and observing a real-time motion track of the mechanical arm in rviz by subscribing a join _ state topic; the method comprises the steps that firstly, the flame is identified and detected through a fast convolution neural network, after identification is successful, the three-dimensional coordinate of an ignition point relative to a robot is obtained through point cloud data of a depth camera, then the position of the tail end of a mechanical arm, which needs to be reached, can be obtained through TF coordinate change, and then the track is solved through an internally integrated algorithm; the solved track information is composed of a large number of discrete points, and the track information comprises the angular velocity and the angular acceleration of each axis to reach the points; when the solved points are enough, a very smooth motion track is fitted, and after the information of the discrete points is published and subscribed through topics, the mechanical arm moves to the target point smoothly according to the planned points.
3. The cooperative robot system for multiple fire inspection patrolling based on integral reinforcement learning as claimed in claim 1, wherein: the sensing layer comprises infrared sensor obstacle avoidance and flame identification based on a fast convolutional neural network;
infrared sensor obstacle avoidance
Detecting obstacles encountered by the inspection robot in the inspection process in real time by using an infrared sensor, detecting Euclidean distances between the robot and the obstacles by using the infrared sensor when the obstacle exists at the front part, and calculating specific coordinates of the obstacles by using the distances and odometer and gyroscope data obtained from a DSP (digital signal processor); after the coordinates are obtained, an obstacle avoidance path is designed immediately through a control algorithm, the obstacle avoidance path is arc-shaped, a minimum distance between the obstacle avoidance path and an obstacle is required to be kept in the whole process, and after obstacle avoidance is finished, the obstacle avoidance path is required to return to the previously planned optimal routing inspection path immediately;
② flame identification based on fast convolution neural network
The flame characteristics are extracted and detected by adopting a fast convolutional neural network Faster R-CNN, and the method comprises the following steps:
② 1: inputting a shot flame picture;
2.: sending the picture into a Convolutional Neural Network (CNN) for feature extraction;
② -3: after feature extraction, performing feature mapping, wherein the feature mapping jointly acts on a subsequent full-connection layer and a subsequent region to generate a network RPN;
② 3.1: the feature mapping is entered into RPN, firstly, a series of region candidate suggestion frames are passed through, and then the suggestion frames are fed into two 1 × 1 convolution layers respectively, wherein the first convolution layer is used for region classification, namely, the intersection ratio IOU value of the suggestion frames is generated by calculation to distinguish positive and negative samples; the other is used for the regression judgment of the boundary box, and a more accurate target detection box is generated after non-maximization inhibition;
② 3.2: the feature mapping enters an ROI pooling layer for calculation of a subsequent network;
② 4: after the pooled feature maps pass through the full-connection layer, classifying the suggestion frames by utilizing softmax again, identifying whether the detection frames are objects or not, and performing boundary frame regression judgment on the suggestion frames again;
the specific method for generating the detection frame by the RPN is to slide on the input feature mapping through a sliding frame to generate 9 suggestion frames on each pixel point, and the size of the suggestion frames is 1282、2562、5122Aspect ratio of 1: 1. 1: 2. 2: 1, distinguishing positive and negative samples by utilizing the intersection ratio IOU values of the detection boxes, wherein the IOU value of the positive sample is more than 0.7, the IOU value of the negative sample is less than 0.3, and the proportion of the positive sample to the negative sample is set as 1: 1; aiming at different characteristics of flame in an image, a method of guiding anchoring is adopted to accelerate the detection speed of RPN, and the improved sparse anchoring strategy is as follows:
Figure FDA0003536798470000071
wherein x and y are coordinates of pixel points, F (x and y) represents the generated flame color mask, if the coordinates are 1, the pixel points generate a suggestion box, if the coordinates are 0, the pixel points do not generate, and m isR(x,y)、mG(x,y)、mB(x, y) are image pixels, respectivelyRGB channel value, T, of a pointRIs a preset threshold value;
in addition, the principle of correcting the detection frame by utilizing boundary regression judgment is that an original suggestion frame A is mapped G to obtain a regression suggestion frame F which is closer to the real condition; this mapping G is obtained by translation and scaling:
firstly, translation: fx=Aw,dx(A)+Ax (2-2)
Fy=Ah.dy(A)+Ay (2-3)
Rescaling: fw=Aw.exp(dw(A)) (2-4)
Fh=Ah.exp(dh(A)) (2-5)
Wherein x, y, w, h respectively represent the center coordinates of the proposed box, width, height, dx、dy、dw、dhRespectively, the original frame A and the real frame F are in a transformation relation, and when the difference between the original frame A and the real frame F is not large, the transformation is regarded as linear;
the output is the probability of being identified as a flame.
4. The cooperative robot system for multiple fire inspection patrolling based on integral reinforcement learning as claimed in claim 1, wherein: the interaction layer is as follows: in the inspection process, pictures captured by the camera need to be sent to a control room and a mobile terminal in real time through a wireless network, corresponding APPs are developed in a matching manner, and the inspection robot is correspondingly controlled at a remote terminal so as to realize the inspection of an area to be inspected again by an operator; after the flame is detected, an alarm signal is immediately sent to a control room, and corresponding fire extinguishing measures can be immediately and automatically made; after fire extinguishing measures are implemented, if the fire condition is still not inhibited, the automatic mode is switched to the remote control mode, a professional in a control room takes over the control of the inspection robot comprehensively, the operation of the crawler belt and the action of the mechanical arm are controlled manually to realize the accurate extinguishing of a fire point, and whether the operations of cutting off a power supply, closing a gas valve and transferring inflammable matters are needed or not is judged according to the fire condition; each inspection robot can be connected with the whole fire fighting system in a grid mode, if the fire situation is still large after measures are taken, a request for taking over a fire fighting network is sent to a control room, a response is not made in one minute after the control room agrees or the fire fighting control room receives the response, a local spraying pipe network in a building is opened, meanwhile, a comprehensive fire fighting alarm is sent, and all fire fighting channels and emergency lighting facilities are opened; installing an emergency stop key at the top end of the robot; and after the fire is extinguished, marking the ignition point on the patrol inspection map as a key patrol inspection area.
CN202110419574.2A 2021-04-19 2021-04-19 Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning Expired - Fee Related CN113134187B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110419574.2A CN113134187B (en) 2021-04-19 2021-04-19 Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110419574.2A CN113134187B (en) 2021-04-19 2021-04-19 Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning

Publications (2)

Publication Number Publication Date
CN113134187A CN113134187A (en) 2021-07-20
CN113134187B true CN113134187B (en) 2022-04-29

Family

ID=76812679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110419574.2A Expired - Fee Related CN113134187B (en) 2021-04-19 2021-04-19 Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning

Country Status (1)

Country Link
CN (1) CN113134187B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114442606B (en) * 2021-12-17 2024-04-05 北京未末卓然科技有限公司 Alert condition early warning robot and control method thereof
CN114639088A (en) * 2022-03-23 2022-06-17 姜妹英 Big data automatic navigation method
CN115167444A (en) * 2022-07-27 2022-10-11 成都群智微纳科技有限公司 ROS-based multi-agent autonomous inspection method and system
CN117944043B (en) * 2023-11-22 2024-07-16 广州深度医疗器械科技有限公司 Robot control method and robot thereof
CN117444985B (en) * 2023-12-20 2024-03-12 安徽大学 Mechanical arm trolley control method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109173125A (en) * 2018-09-29 2019-01-11 北京力升高科科技有限公司 A kind of collaboration working method and system of fire-fighting robot
CN109276833A (en) * 2018-08-01 2019-01-29 吉林大学珠海学院 A kind of robot patrol fire-fighting system and its control method based on ROS
CN109976161A (en) * 2019-04-23 2019-07-05 哈尔滨工业大学 A kind of finite time optimization tracking and controlling method of uncertain nonlinear system
CN110782011A (en) * 2019-10-21 2020-02-11 辽宁石油化工大学 Networked multi-agent system distributed optimization control method based on reinforcement learning
CN111408089A (en) * 2020-04-22 2020-07-14 北京新松融通机器人科技有限公司 Fire-fighting robot and fire-fighting robot fire extinguishing system
CN112130570A (en) * 2020-09-27 2020-12-25 重庆大学 Blind guiding robot of optimal output feedback controller based on reinforcement learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109276833A (en) * 2018-08-01 2019-01-29 吉林大学珠海学院 A kind of robot patrol fire-fighting system and its control method based on ROS
CN109173125A (en) * 2018-09-29 2019-01-11 北京力升高科科技有限公司 A kind of collaboration working method and system of fire-fighting robot
CN109976161A (en) * 2019-04-23 2019-07-05 哈尔滨工业大学 A kind of finite time optimization tracking and controlling method of uncertain nonlinear system
CN110782011A (en) * 2019-10-21 2020-02-11 辽宁石油化工大学 Networked multi-agent system distributed optimization control method based on reinforcement learning
CN111408089A (en) * 2020-04-22 2020-07-14 北京新松融通机器人科技有限公司 Fire-fighting robot and fire-fighting robot fire extinguishing system
CN112130570A (en) * 2020-09-27 2020-12-25 重庆大学 Blind guiding robot of optimal output feedback controller based on reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于卷积神经网络的火焰识别;段锁林 等;《计算机工程与设计》;20191130;第3288-3292段 *
基于多智能体强化学习的空间机械臂轨迹规划;赵毓;《航空学报》;20210125;全文 *

Also Published As

Publication number Publication date
CN113134187A (en) 2021-07-20

Similar Documents

Publication Publication Date Title
CN113134187B (en) Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning
Petrlík et al. A robust UAV system for operations in a constrained environment
WO2021196529A1 (en) Air-ground cooperative intelligent inspection robot and inspection method
US9563528B2 (en) Mobile apparatus and localization method thereof
CN107097228B (en) Autonomous traveling robot system
US7668621B2 (en) Robotic guarded motion system and method
Reid et al. Cooperative multi-robot navigation, exploration, mapping and object detection with ROS
US20080009969A1 (en) Multi-Robot Control Interface
WO2009148672A1 (en) System and method for seamless task-directed autonomy for robots
Xiao et al. Visual servoing for teleoperation using a tethered uav
Kalinov et al. Warevr: Virtual reality interface for supervision of autonomous robotic system aimed at warehouse stocktaking
Zhang et al. Design of intelligent fire-fighting robot based on multi-sensor fusion and experimental study on fire scene patrol
Lee et al. Artificial intelligence and Internet of Things for robotic disaster response
CN113759901A (en) Mobile robot autonomous obstacle avoidance method based on deep reinforcement learning
Uzakov et al. UAV vision-based nonlinear formation control applied to inspection of electrical power lines
Wang et al. Development of a search and rescue robot system for the underground building environment
Hager et al. Toward domain-independent navigation: Dynamic vision and control
CN114425133B (en) Indoor flame autonomous inspection and fire extinguishing method
Chuttur et al. Design and implementation of an autonomous wheeled robot using iot with human recognition capability
Dani et al. Position-based visual servo control of leader-follower formation using image-based relative pose and relative velocity estimation
Choi et al. Wearable gesture control of agile micro quadrotors
Chen et al. Multivr: Digital twin and virtual reality based system for multi-people remote control unmanned aerial vehicles
Saleh et al. Local motion planning for ground mobile robots via deep imitation learning
Guan et al. Intelligent obstacle avoidance algorithm for mobile robots in uncertain environment
Ortiz Visual servoing for an omnidirectional mobile robot using the neural network-Multilayer perceptron

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220429