CN115574826B - National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning - Google Patents


Info

Publication number
CN115574826B
Authority
CN
China
Prior art keywords
path
unmanned aerial
aerial vehicle
energy consumption
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211572414.2A
Other languages
Chinese (zh)
Other versions
CN115574826A (en)
Inventor
郭强辉
殷虹娇
张鹏
王永峰
宋尚源
刘兆泽
高琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Deepiot Technology Co ltd
Nankai University
Original Assignee
Beijing Deepiot Technology Co ltd
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Deepiot Technology Co ltd, Nankai University filed Critical Beijing Deepiot Technology Co ltd
Priority to CN202211572414.2A priority Critical patent/CN115574826B/en
Publication of CN115574826A publication Critical patent/CN115574826A/en
Application granted granted Critical
Publication of CN115574826B publication Critical patent/CN115574826B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047Optimisation of routes or paths, e.g. travelling salesman problem
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a reinforcement-learning-based patrol route optimization method for national park unmanned aerial vehicles (UAVs). The method first takes the flight path of the UAV as the optimization target, adds constraint conditions on the traversal of path points, the electric quantity limitation of the UAV and the energy consumption of executing tasks at path points, and establishes a UAV path planning model with a self-service charging function. It then maps the UAV, the path points, the charging base station, the energy, the battery capacity, the flight path energy consumption and the path point task energy consumption of the model to the corresponding elements of a CVRP (capacitated vehicle routing problem) model. Using a feedforward weighting method, the UAV patrol path planning problem, which originally had to consider both edge energy consumption constraints and point energy consumption constraints, is reduced to a CVRP that takes the path length as the optimization target and the customer demand and vehicle cargo capacity as constraints. Finally, the reduced CVRP is solved with a multi-decoder attention model.

Description

National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning
Technical Field
The invention belongs to the technical fields of intelligent computing and unmanned aerial vehicle flight control, and particularly relates to a reinforcement-learning-based patrol path optimization method for national park unmanned aerial vehicles.
Background
Field patrol monitoring is the most important means of ecological monitoring and daily supervision in national parks and nature reserves. Through patrol monitoring, rangers collect data on wild species populations, habitats, phenology and the like, can discover ecological problems and curb illegal activities in time, effectively protect national parks and nature reserves, and provide a decision basis for natural resource supervision. However, national parks and nature reserves cover large, wide-ranging areas with complex terrain; most regions are difficult for people and vehicles to reach, and the traditional manual patrol mode is inefficient, time-consuming and labor-intensive. Therefore, in recent years, unmanned aerial vehicles have been increasingly used in the patrol monitoring work of all kinds of nature reserves.
Unmanned aerial vehicle technology is a remote sensing technology that fuses aircraft technology, communication technology, GPS differential positioning technology and imaging technology; by carrying sensing equipment such as high-definition cameras and intelligent sensors and combining a wireless communication network, it realizes automatic acquisition and transmission of monitoring data. However, the UAVs currently used for patrol monitoring of national parks and nature reserves face challenges such as short endurance, high demands on flight control personnel, difficulty in aircraft storage and transportation, and high difficulty of application integration, and can hardly meet the application requirements of normalized monitoring.
An automatic UAV airport is a ground automation facility that assists the UAV through its whole operation flow and provides all-weather protection. Through structural designs for automatic opening and closing, lifting, and battery loading and unloading, take-off, landing, storage and battery management are all completed automatically, without human intervention. The UAV is stored in the automatic airport; when a flight demand arises, it takes off from the airport autonomously, lands back in the automatic airport after the task is finished, and is charged there in preparation for the next task, realizing fully automatic operation.
To realize the normalized use of UAVs in the ecological monitoring work of national parks and nature reserves, and to meet field patrol monitoring and management demands, this patent performs path planning, electric quantity state control and dispatch scheduling for the UAV based on the automatic UAV airport, greatly improving UAV patrol monitoring efficiency.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a national park unmanned aerial vehicle patrol route optimization method based on reinforcement learning.
The invention is realized by the following technical scheme:
A national park unmanned aerial vehicle patrol path optimization method based on reinforcement learning comprises the following steps:
Step 1: input three-dimensional terrain data to generate a bounded three-dimensional region, and set a path point set V = {v_1, v_2, ..., v_n} in the air above the region according to the performance of the unmanned aerial vehicle's onboard camera and the patrol requirements; the unmanned aerial vehicle is required to complete the visual coverage task after traversing all path points.
Step 2: taking the flight path of the unmanned aerial vehicle as the optimization target, add constraint conditions on the traversal of path points, the electric quantity limitation of the unmanned aerial vehicle and the energy consumption of task execution at the path points, and establish an unmanned aerial vehicle path planning model with a self-service charging function.
Step 3: map the unmanned aerial vehicle, the path points, the charging base station, the energy, the battery capacity, the flight path energy consumption and the path point task energy consumption in the established model to, respectively, the vehicle, the customers, the warehouse, the cargo, the maximum cargo capacity of the vehicle, the path length and the customer demands in the CVRP problem model; define a new path point task energy consumption using a feedforward weighting method, so that it comprises both the task energy consumption of a path point and the average edge energy consumption of reaching that path point; the new path point task energy consumption corresponds to the customer demand of the CVRP model, whereby the unmanned aerial vehicle patrol path planning problem is reduced to a CVRP that takes the path length as the optimization target and the customer demand and vehicle cargo capacity as constraints.
Step 4: solve the CVRP problem reduced in step 3 using a multi-decoder attention model.
In the above technical scheme, in step 2, the unmanned aerial vehicle path planning model with a self-service charging function is established as follows:
Step 2.1: define the flight path decision variable x_ij of the unmanned aerial vehicle:
x_ij = 1 means the unmanned aerial vehicle flies from path point i to path point j;
x_ij = 0 means the unmanned aerial vehicle does not fly from path point i to path point j.
Define the objective function:

$$\min \sum_{i}\sum_{j \neq i} c_{ij}\, x_{ij} \qquad (1)$$

wherein c_ij is the flight path energy consumption, representing the energy required for the unmanned aerial vehicle to fly between path point i and path point j.
The flight path decision variables must form a complete and feasible one-pass traversal path, with the constraints:

$$\sum_{i \neq j} x_{ij} = 1, \quad \forall j \in V \qquad (2)$$

$$\sum_{j \neq i} x_{ij} = 1, \quad \forall i \in V \qquad (3)$$

Step 2.2: for the self-service charging function of the unmanned aerial vehicle, the path planning is adjusted to include a charging base station. The energy consumption of the unmanned aerial vehicle is measured by the flight path; the maximum endurance of the unmanned aerial vehicle is recorded as Q, the energy variable E_ij is defined, and the charging base station, which is the starting point of the unmanned aerial vehicle, is recorded as v_0.
During the execution of the task, the remaining endurance of the unmanned aerial vehicle is non-negative and does not exceed the maximum endurance Q:

$$0 \le E_{ij} \le Q\, x_{ij} \qquad (4)$$

$$E_{ij} = \Big(\sum_{k \neq i} E_{ki} - c_{ij} - r_j\Big)\, x_{ij} \qquad (5)$$

wherein r_j is the task energy consumption of path point j, representing the energy required for the unmanned aerial vehicle to complete the patrol task of path point j; x_ki is the decision variable of the edge from a path point k (k ≠ i) to path point i; and E_ki represents the energy remaining after the unmanned aerial vehicle flies from path point k to path point i and executes the task.
When the unmanned aerial vehicle leaves the charging base station, its electric quantity is full:

$$E_{0i} = (Q - c_{0i} - r_i)\, x_{0i} \qquad (6)$$

wherein E_0i represents the energy remaining after the unmanned aerial vehicle leaves the charging base station and arrives at path point i, x_0i is the decision variable of the unmanned aerial vehicle flying from the charging base station to path point i, and r_i is the task energy consumption of path point i, representing the energy required for the unmanned aerial vehicle to complete the patrol task of path point i.
In the above technical scheme, in step 3, first, without considering the edge energy consumption constraints between path points, the CVRP problem corresponding to the unmanned aerial vehicle patrol path is solved independently multiple times using a deep reinforcement learning method; the number of solving times is recorded as N. At each solving, the neural network in the deep reinforcement learning model is retrained, and each trained network is used to predict the CVRP problem corresponding to the original unmanned aerial vehicle patrol problem; the N different solutions obtained from the N solvings form a solution set S_N, which contains N patrol path schemes.
On the basis of the known solution set, a new task point energy consumption variable r'_j is redefined:

$$r'_j = r_j + \frac{\sum_{i \neq j} N_{ij}\, c_{ij}}{\sum_{i \neq j} N_{ij}} \qquad (7)$$

wherein N_ij represents the number of occurrences of the edge from path point i to path point j in the solution set S_N; the added term is the weighted average of the edge energy consumption required to reach the path point, with the weights N_ij taken from the solution set S_N obtained by optimizing the total patrol task path length.
In the above technical solution, the solving process of step 4 comprises the following steps:
Step 4.1: first, according to the scale of the input information, several groups of data sets with the same number of path points are generated. Suppose there are K groups of data sets; the information in the i-th group includes a randomly generated starting point v_0, randomly generated path point positions {v_1, ..., v_n} and randomly generated path point task energy consumptions {r'_1, ..., r'_n}, wherein i = 1, 2, ..., K.
Step 4.2: the multi-decoder attention model is trained using the K generated groups of data sets, with the parameters of the encoder and decoders denoted θ. The model is trained by a policy gradient algorithm with baseline, and the parameters of the model are iteratively updated to obtain the trained multi-decoder attention model.
Step 4.3: after the training of the model parameters is finished, the data of the original unmanned aerial vehicle mission planning problem is input into the trained model as a reduced CVRP problem instance, and the output sequence of the model is taken as the path point visiting scheme of the unmanned aerial vehicle patrol problem.
In the above technical solution, in step 4.3, the data of the original unmanned aerial vehicle mission planning problem includes the starting point v_0, the n path points {v_1, ..., v_n} and the task energy consumption of each path point, wherein the path point task energy consumption refers to the new path point task energy consumption defined in step 3.
The invention has the advantages and beneficial effects that:
the base station is introduced to provide real-time charging service for the working unmanned aerial vehicle, and the unmanned aerial vehicle can access the base station to perform charging for multiple times when executing tasks. Under the system, a constraint formula is constructed by taking the optimized unmanned aerial vehicle task path length as a target, a multi-unmanned aerial vehicle path planning model is established, and the problem is converted into a combined optimization problem. A known combined optimization solver is utilized, a feedforward weighting method is designed to calculate the path energy consumption constraint, and the problem is further converted into a vehicle path problem (CVRP) with capacity limitation. In addition, the deep reinforcement learning method based on the multi-decoder attention model can stably output a high-quality solution of a visual coverage problem for a specific scene, has generalization capability on solving the reduced unmanned aerial vehicle path planning problem, has strong adaptability to a training data set, can guarantee an efficient training network for path planning under different scenes, and can obtain the high-quality solution. Based on a trained learning model, the result can be quickly obtained by only calling neural network prediction after the unmanned aerial vehicle path problem example is reduced, the solving speed is higher than the efficiency of the traditional search algorithm, and the decision requirement of the unmanned aerial vehicle quick scheduling planning can be met.
Drawings
FIG. 1 is a flow chart of the national park unmanned aerial vehicle patrol route optimization method based on reinforcement learning.
FIG. 2 is a flow diagram of solving a problem instance with the multi-decoder attention model.
For a person skilled in the art, without inventive effort, other relevant figures can be derived from the above figures.
Detailed Description
In order to make the technical solution of the present invention better understood, the technical solution of the present invention is further described below with reference to specific examples.
A national park unmanned aerial vehicle patrol path optimization method based on reinforcement learning is disclosed, referring to the attached figure 1, and comprises the following steps:
Step 1: input three-dimensional terrain data to generate a bounded three-dimensional region and, according to the performance of the unmanned aerial vehicle's onboard camera and the patrol requirements, set a path point set V = {v_1, v_2, ..., v_n} in the air above the region, obtaining the initial data of the problem; the unmanned aerial vehicle is required to complete the visual coverage task after traversing all path points.
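As a concrete illustration of step 1, the following is a minimal sketch (Python) of one way such a coverage path point set might be generated; the patent does not prescribe how V is constructed, so the regular grid, the fixed patrol altitude and the footprint/overlap parameters below are illustrative assumptions only.

```python
import numpy as np

def generate_waypoints(x_range, y_range, altitude, footprint, overlap=0.2):
    """Lay a regular coverage grid of waypoints over a rectangular region.

    The grid spacing is the camera footprint shrunk by the desired image
    overlap, so that pictures taken at neighbouring waypoints overlap and
    the traversal yields full visual coverage. Terrain following and
    no-fly zones are omitted in this sketch.
    """
    step = footprint * (1.0 - overlap)
    xs = np.arange(x_range[0], x_range[1] + step, step)
    ys = np.arange(y_range[0], y_range[1] + step, step)
    return [(float(x), float(y), altitude) for x in xs for y in ys]

V = generate_waypoints((0.0, 5000.0), (0.0, 3000.0),
                       altitude=120.0, footprint=400.0)
print(len(V), "waypoints, e.g.", V[0])
```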
Step 2: establish the constraint formulation. Taking the flight path of the unmanned aerial vehicle as the optimization target, add constraint conditions on the traversal of path points, the electric quantity limitation of the unmanned aerial vehicle and the energy consumption of task execution at the path points, and establish an unmanned aerial vehicle path planning model with a self-service charging function, leaving aside uncontrollable factors such as wind, visibility and faults of the unmanned aerial vehicle. The specific steps are as follows.
Step 2.1: define the flight path decision variable x_ij of the unmanned aerial vehicle:
x_ij = 1 means the unmanned aerial vehicle flies from path point i to path point j;
x_ij = 0 means the unmanned aerial vehicle does not fly from path point i to path point j.
Define the objective function:

$$\min \sum_{i}\sum_{j \neq i} c_{ij}\, x_{ij} \qquad (1)$$

wherein c_ij is the flight path energy consumption, representing the energy required between path point i and path point j of the unmanned aerial vehicle; the energy consumption between path points is proportional to the distance, and the aim of the task is to optimize the flight path of the unmanned aerial vehicle so as to minimize it on the premise of completing the task. Meanwhile, the flight path decision variables must form a complete and feasible one-pass traversal path, with the specific constraints:

$$\sum_{i \neq j} x_{ij} = 1, \quad \forall j \in V \qquad (2)$$

$$\sum_{j \neq i} x_{ij} = 1, \quad \forall i \in V \qquad (3)$$

Step 2.2: for the self-service charging function of the unmanned aerial vehicle, the path planning is adjusted to include a charging base station. The energy consumption of the unmanned aerial vehicle is measured by the flight path; the maximum endurance of the unmanned aerial vehicle is recorded as Q, the energy variable E_ij is defined, and the charging base station, i.e., the departure point of the unmanned aerial vehicle, is recorded as v_0.
First, the unmanned aerial vehicle consumes energy as it moves between path points, and its remaining endurance during the mission must be non-negative and must not exceed the maximum endurance Q:

$$0 \le E_{ij} \le Q\, x_{ij} \qquad (4)$$

$$E_{ij} = \Big(\sum_{k \neq i} E_{ki} - c_{ij} - r_j\Big)\, x_{ij} \qquad (5)$$

wherein r_j is the task energy consumption of path point j, representing the energy required for the unmanned aerial vehicle to complete the patrol task of path point j; x_ki is the decision variable of the edge from a path point k (k ≠ i) to path point i; and E_ki represents the energy (i.e., electric quantity) remaining after the unmanned aerial vehicle flies from path point k to path point i and executes the task.
Secondly, when the unmanned aerial vehicle leaves the charging base station, its electric quantity is full:

$$E_{0i} = (Q - c_{0i} - r_i)\, x_{0i} \qquad (6)$$

wherein E_0i represents the energy remaining after the unmanned aerial vehicle leaves the charging base station and arrives at path point i, x_0i is the decision variable of the unmanned aerial vehicle flying from the charging base station to path point i, and r_i is the task energy consumption of path point i, representing the energy required for the unmanned aerial vehicle to complete the patrol task of path point i.
In conclusion, an unmanned aerial vehicle path planning model with a self-service charging function is established, comprising the objective function (1) and the constraint formulas (2), (3), (4), (5) and (6). Solving the model is a combinatorial optimization problem, i.e., the unmanned aerial vehicle patrol path planning problem is converted into a combinatorial optimization problem.
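To make constraints (4)-(6) concrete, the sketch below checks whether a candidate traversal (a sequence of path point indices in which index 0 is the charging base station v_0) is energy-feasible. It is a minimal illustration assuming, as stated above, that the flight energy c_ij is proportional to the distance flown; the function names and data layout are not part of the patent.

```python
import math

def flight_energy(p, q, k=1.0):
    """c_ij: flight energy between two path points, proportional to distance."""
    return k * math.dist(p, q)

def route_is_feasible(route, pos, task_energy, Q):
    """Check the energy constraints (4)-(6) along one candidate tour.

    route       : path point indices, starting at the base station 0
    pos         : index -> coordinates of each path point
    task_energy : index -> r_i (task energy; 0 for the base station)
    Q           : maximum endurance (a full battery)
    """
    E = Q  # battery is full when leaving the charging base station, eq. (6)
    for i, j in zip(route, route[1:]):
        E -= flight_energy(pos[i], pos[j]) + task_energy[j]
        if E < 0:          # remaining energy must stay non-negative, eq. (4)
            return False
        if j == 0:         # back at the base station: recharge to full
            E = Q
    return True

pos = {0: (0, 0), 1: (3, 4), 2: (6, 8)}
r = {0: 0.0, 1: 2.0, 2: 1.5}
print(route_is_feasible([0, 1, 2, 0], pos, r, Q=30.0))  # True
```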
Step 3: referring to Table 1, the unmanned aerial vehicle, the path points, the charging base station, the energy (i.e., electric quantity), the battery capacity, the flight path energy consumption and the path point task energy consumption in the unmanned aerial vehicle path planning model with self-service charging function established above are mapped, respectively, to the vehicle, the customers, the warehouse, the cargo, the maximum cargo capacity of the vehicle, the path length and the customer demand in the CVRP (capacitated vehicle routing problem) model, and the unmanned aerial vehicle path planning model is thereby converted into a capacitated vehicle routing problem (CVRP).

Table 1: Correspondence between the unmanned aerial vehicle path planning model and the CVRP problem model

UAV path planning model              | CVRP problem model
unmanned aerial vehicle              | vehicle
path point                           | customer
charging base station                | warehouse
energy (electric quantity)           | cargo
battery capacity                     | maximum cargo capacity of the vehicle
flight path energy consumption       | path length
path point task energy consumption   | customer demand
The energy consumption of the unmanned aerial vehicle comprises the edge energy consumption of flying from path point to path point and the point energy consumption required to complete the patrol task at a path point. In the CVRP problem model, however, the edge cost serves only as the optimization target of the vehicle path, and only the point demand acts as a constraint condition on the vehicle route. The invention therefore uses a feedforward weighting method to let the point energy consumption stand in for "point plus edge energy consumption", folding the edge energy consumption into the constraint condition, so that the unmanned aerial vehicle patrol path planning problem, which originally had to consider both edge energy consumption constraints and point energy consumption constraints, is reduced to a CVRP problem that takes the path length as the optimization target and the customer demand and vehicle cargo capacity as constraints. The specific treatment method is as follows.
First, without considering the edge energy consumption constraint, the CVRP problem corresponding to the unmanned aerial vehicle patrol path is solved independently multiple times using a deep reinforcement learning method; the number of solving times is recorded as N. At each solving, the neural network in the deep reinforcement learning model is retrained (independently), and each trained neural network is used to predict the CVRP problem corresponding to the original unmanned aerial vehicle patrol problem. Because the generation and sampling of the training set are random, the N neural networks obtained in the N training runs differ, and so do their predictions of the problem; the N different solutions therefore form a solution set S_N containing N patrol path schemes.
On the basis of the known solution set, the new path point task energy consumption r'_j (i.e., the energy consumption attributed to path point j for completing its patrol task) is redefined:

$$r'_j = r_j + \frac{\sum_{i \neq j} N_{ij}\, c_{ij}}{\sum_{i \neq j} N_{ij}} \qquad (7)$$

wherein N_ij represents the number of occurrences of the edge from path point i to path point j in the solution set S_N; the added term is the weighted average of the edge energy consumption required to reach the path point, with the weights N_ij taken from the solution set S_N obtained by optimizing the total patrol task path length.
The new path point task energy consumption r'_j corresponds to the customer demand of the CVRP problem model. Since r'_j comprises both the task energy consumption of a path point and the average edge energy consumption of reaching it, the patrol path problem that originally had to consider both edge energy consumption constraints and point energy consumption constraints is reduced to a CVRP problem that takes the path length as the optimization target and the customer demand and vehicle cargo capacity as constraints.
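A minimal sketch of the feedforward weighting of formula (7) follows, assuming the solution set S_N is available as a list of tours (sequences of path point indices); the helper name and the data layout are illustrative, not part of the patent.

```python
from collections import Counter

def feedforward_weighted_demand(solutions, c, r):
    """Formula (7): fold the average incoming-edge energy into point energy.

    solutions : the solution set S_N, a list of tours (index sequences)
    c         : c[i][j], flight energy of the edge i -> j
    r         : {j: r_j}, task energy of each path point
    returns   : {j: r'_j} with r'_j = r_j + sum_i N_ij c_ij / sum_i N_ij
    """
    edge_count = Counter()                     # N_ij, occurrences of i -> j
    for tour in solutions:
        for i, j in zip(tour, tour[1:]):
            edge_count[(i, j)] += 1

    r_new = dict(r)
    for j in r:
        num = sum(n * c[i][jj] for (i, jj), n in edge_count.items() if jj == j)
        den = sum(n for (i, jj), n in edge_count.items() if jj == j)
        if den > 0:
            r_new[j] = r[j] + num / den        # weighted average edge energy
    return r_new

S_N = [[0, 1, 2, 0], [0, 2, 1, 0]]
c = [[0, 5, 10], [5, 0, 5], [10, 5, 0]]
print(feedforward_weighted_demand(S_N, c, {1: 2.0, 2: 1.5}))
# {1: 7.0, 2: 9.0}
```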
Step 4: the CVRP problem reduced in step 3 is solved using a multi-decoder attention model.
The data of the unmanned aerial vehicle path planning problem comprises the starting point v_0, the n path points {v_1, ..., v_n} and the task energy consumption of each path point (i.e., the new path point task energy consumption defined in step 3), which is reduced, following step 3, to the warehouse, customers and customer demands of the CVRP problem and serves as the input information of the model. The encoder of the model is based on the Transformer architecture; the decoder part uses several decoders with identical structure but independent parameters, and the diversity of the solutions constructed by the decoders is measured by the Kullback-Leibler divergence ("KL divergence") between the probability distributions computed by different decoders. In addition, each decoder masks visited nodes when computing attention weights, which enforces the task path constraints of the CVRP problem. The model is trained by a policy gradient algorithm with baseline on several randomly generated data sets of the same scale as the problem to be solved. Referring to fig. 2, the specific solving process is as follows.
Step 4.1: first, according to the scale of the input information, several groups of data sets with the same number of path points are generated. Suppose there are K groups in total; taking the i-th group of data sets as an example, the information in it includes a randomly generated starting point v_0, randomly generated path point positions {v_1, ..., v_n} and randomly generated path point task energy consumptions {r'_1, ..., r'_n}, wherein i = 1, 2, ..., K.
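A minimal sketch of the instance generation of step 4.1 is given below; the unit-square coordinates and the demand range are assumptions made for illustration, since the patent only requires randomly generated start points, path point positions and task energies of a fixed size n.

```python
import numpy as np

def make_dataset(K, n, demand_range=(1, 10), seed=0):
    """Generate K training instances that all contain n path points.

    Each instance holds a random start point v0, n random path point
    positions and n random task energies (the r' values that play the
    role of customer demands in the reduced CVRP).
    """
    rng = np.random.default_rng(seed)
    dataset = []
    for _ in range(K):
        dataset.append({
            "v0": rng.random(2),                      # charging base station
            "waypoints": rng.random((n, 2)),          # path point positions
            "demand": rng.integers(demand_range[0],   # task energies r'
                                   demand_range[1] + 1, size=n),
        })
    return dataset

train_set = make_dataset(K=512, n=50)
print(train_set[0]["waypoints"].shape)  # (50, 2)
```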
Step 4.2: the multi-decoder attention model is trained using the K generated groups of data sets, with the parameters of the encoder and decoders denoted θ. The model is trained by a policy gradient algorithm with baseline, and the parameters of the model are iteratively updated. The training target is to optimize the model parameters so that the path length of the customer visiting scheme is shortest while the decoders remain mutually diverse, the diversity being measured by the KL divergence between the decoder output distributions; denote by L(θ) the total task path length computed under the model parameters θ, and by D_KL(θ) the KL divergence of the decoder outputs under θ. Parameter training is performed by the following algorithm, yielding the trained multi-decoder attention model.
The reinforcement learning algorithm with baseline is as follows:
1. Input: the K-group data set, the significance level α, and the number of training epochs T.
2. Initialize the model parameters θ.
3. Record the baseline parameters θ_BL.
4. Set the current training epoch t = 1.
5. Combining the optimization targets, compute the task path length L(θ) and the KL divergence D_KL(θ) of the model output under the current K-group data set and parameters θ, and obtain the optimization direction ∇θ.
6. Update the parameters θ along ∇θ using the Adam optimizer.
7. Compare the parameters θ with the baseline parameters θ_BL using a t-test; if the significance is less than α, update θ_BL ← θ.
8. If t < T, set t ← t + 1 and return to step 5; otherwise, go to the next step.
9. Training ends; the result is the multi-decoder attention model with parameters θ.
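The sketch below illustrates the baseline algorithm above with a deliberately tiny policy network standing in for the multi-decoder attention model: it uses a single decoder, so the KL-divergence diversity term and the Transformer encoder of the real model are omitted for brevity. The greedy-rollout baseline replaced after a paired t-test follows steps 3-7; everything else (network shape, learning rate, data) is an assumption for illustration.

```python
import copy
import torch
import torch.nn as nn
from scipy.stats import ttest_rel

class TinyPolicy(nn.Module):
    """Stand-in for the multi-decoder attention model: it scores the
    unvisited path points and a tour is decoded node by node. The real
    model uses a Transformer encoder and several parallel decoders."""
    def __init__(self):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                                   nn.Linear(32, 1))

    def rollout(self, coords, sample=True):
        """Return (log-probability, length) of one tour starting at node 0."""
        n = coords.size(0)
        visited = torch.zeros(n, dtype=torch.bool)
        visited[0] = True
        cur, logp, length = 0, torch.zeros(()), torch.zeros(())
        for _ in range(n - 1):
            logits = self.score(coords).squeeze(-1)
            logits = logits.masked_fill(visited, float("-inf"))  # mask visited
            dist = torch.distributions.Categorical(logits=logits)
            nxt = dist.sample() if sample else logits.argmax()
            logp = logp + dist.log_prob(nxt)
            length = length + torch.linalg.norm(coords[nxt] - coords[cur])
            visited[nxt] = True
            cur = nxt
        return logp, length + torch.linalg.norm(coords[cur] - coords[0])

def train(dataset, epochs=2, alpha=0.05):
    model = TinyPolicy()
    baseline = copy.deepcopy(model)            # step 3: record the baseline
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # step 6: Adam
    for _ in range(epochs):                    # steps 4 and 8: epoch loop
        cur_lens, base_lens = [], []
        for coords in dataset:                 # step 5: path length objective
            logp, length = model.rollout(coords, sample=True)
            with torch.no_grad():              # greedy baseline rollout
                _, b_len = baseline.rollout(coords, sample=False)
            loss = (length.detach() - b_len) * logp  # REINFORCE with baseline
            opt.zero_grad(); loss.backward(); opt.step()
            cur_lens.append(float(length)); base_lens.append(float(b_len))
        t, p = ttest_rel(cur_lens, base_lens)  # step 7: paired t-test
        if t < 0 and p / 2 < alpha:            # significantly better model:
            baseline = copy.deepcopy(model)    # replace the baseline
    return model

trained = train([torch.rand(11, 2) for _ in range(8)])
```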
Step 4.3: after the training of the model parameters is finished, the data of the original unmanned aerial vehicle mission planning problem (including the starting point v_0, the n path points {v_1, ..., v_n} and the task energy consumption information of each path point) is input into the trained model as a reduced CVRP problem instance, and the output sequence of the model is taken as the path point visiting scheme of the unmanned aerial vehicle patrol problem.
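Finally, to illustrate the form of the resulting path point visiting scheme, the sketch below decodes a reduced CVRP instance into trips that return to the charging base station whenever the accumulated demand r' would exceed the capacity Q. A plain nearest-neighbour rule stands in here for the output sequence of the trained model; it is not the patent's solution method.

```python
import math

def greedy_patrol_plan(v0, waypoints, demand, Q):
    """Split the visit of all path points into battery-feasible trips.

    A trip returns to the charging base station v0 whenever the next
    path point's demand r' would exceed the remaining capacity Q; a
    nearest-neighbour rule stands in for the trained model's output.
    """
    todo = set(range(len(waypoints)))
    plan, trip, load, cur = [], [], 0.0, v0
    while todo:
        nxt = min(todo, key=lambda j: math.dist(cur, waypoints[j]))
        if load + demand[nxt] > Q and trip:    # battery would run out:
            plan.append(trip)                  # close the trip, recharge
            trip, load, cur = [], 0.0, v0
            continue
        trip.append(nxt)
        load += demand[nxt]
        cur = waypoints[nxt]
        todo.remove(nxt)
    if trip:
        plan.append(trip)
    return plan  # a list of trips, each starting and ending at the base

plan = greedy_patrol_plan((0.0, 0.0), [(1, 0), (2, 0), (0, 2)],
                          demand=[4, 4, 4], Q=10)
print(plan)  # [[0, 1], [2]]
```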
The invention has been described in an illustrative manner, and it is to be understood that any simple variations, modifications or other equivalent changes which can be made by one skilled in the art without departing from the spirit of the invention fall within the scope of the invention.

Claims (3)

1. A national park Unmanned Aerial Vehicle (UAV) patrol path optimization method based on reinforcement learning is characterized by comprising the following steps of:
step 1: inputting three-dimensional terrain data to generate a bounded three-dimensional region, and setting a path point set V = {v_1, v_2, ..., v_n} in the air above the region according to the performance and patrol requirements of the unmanned aerial vehicle's onboard camera, the unmanned aerial vehicle being required to complete the visual coverage task after traversing all the path points;
and 2, step: taking the flight path of the unmanned aerial vehicle as an optimization target, adding constraint conditions of traversal path points of the unmanned aerial vehicle, electric quantity limitation of the unmanned aerial vehicle and energy consumption of task execution of the path points, and establishing an unmanned aerial vehicle path planning model with a self-service charging function; the method comprises the following specific steps:
step 2.1: defining a flight path decision variable x_ij for the unmanned aerial vehicle:
x_ij = 1, indicating that the drone flies from path point i to path point j;
x_ij = 0, indicating that the drone does not fly from path point i to path point j;
defining an objective function:

$$\min \sum_{i}\sum_{j \neq i} c_{ij}\, x_{ij} \qquad (1)$$

wherein c_ij is the flight path energy consumption, representing the energy consumption required between path point i and path point j of the unmanned aerial vehicle;
the flight path decision variables are to form a complete and feasible one-pass traversal path, with the constraints:

$$\sum_{i \neq j} x_{ij} = 1, \quad \forall j \in V \qquad (2)$$

$$\sum_{j \neq i} x_{ij} = 1, \quad \forall i \in V \qquad (3)$$
step 2.2: aiming at the self-service charging function of the unmanned aerial vehicle, the path planning with the charging base station is adjusted; the energy consumption of the unmanned aerial vehicle is measured by the flight path, the maximum endurance of the unmanned aerial vehicle is recorded as Q, an energy variable E_ij is defined, and the charging base station, which is the starting point of the unmanned aerial vehicle, is recorded as v_0;
the remaining endurance of the drone during the mission is non-negative and does not exceed the maximum endurance Q:

$$0 \le E_{ij} \le Q\, x_{ij} \qquad (4)$$

$$E_{ij} = \Big(\sum_{k \neq i} E_{ki} - c_{ij} - r_j\Big)\, x_{ij} \qquad (5)$$

wherein r_j is the task energy consumption of path point j, which represents the energy consumption required for the unmanned aerial vehicle to finish the patrol task of path point j; x_ki is the decision variable of the edge from a path point k (k ≠ i) to path point i; and E_ki represents the energy left after the unmanned aerial vehicle flies from path point k to path point i and executes the task;
when unmanned aerial vehicle leaves charging base station, the electric quantity is full, and the formula is as follows:
$$E_{0i} = (Q - c_{0i} - r_i)\, x_{0i} \qquad (6)$$

wherein E_0i represents the remaining energy of the unmanned aerial vehicle after leaving the charging base station and arriving at path point i, x_0i is the decision variable representing the flight of the unmanned aerial vehicle from the charging base station to path point i, and r_i is the task energy consumption of path point i, representing the energy consumption required for the unmanned aerial vehicle to finish the patrol task of path point i;
step 3: mapping the unmanned aerial vehicle, the path points, the charging base station, the energy, the battery capacity, the flight path energy consumption and the path point task energy consumption in the established unmanned aerial vehicle path planning model with self-service charging function to, respectively, the vehicle, the customers, the warehouse, the cargo, the maximum cargo capacity of the vehicle, the path length and the customer demands in the CVRP problem model; defining a new path point task energy consumption by a feedforward weighting method, so that the new path point task energy consumption comprises both the task energy consumption of a path point and the average edge energy consumption of reaching that path point; the new path point task energy consumption corresponds to the customer demand of the CVRP problem model, whereby the unmanned aerial vehicle patrol path planning problem is reduced to a CVRP problem that takes the path length as the optimization target and the customer demand and vehicle cargo capacity as constraints;
in step 3, firstly, without considering the edge energy consumption constraint between path points, a deep reinforcement learning method is used to solve the CVRP problem corresponding to the unmanned aerial vehicle patrol path independently multiple times; the number of solving times is recorded as N, the neural network in the deep reinforcement learning model is retrained at each solving, each trained neural network is used to predict the CVRP problem corresponding to the original unmanned aerial vehicle patrol problem, and the N groups of different solutions obtained through the N solvings form a solution set S_N, the solution set S_N containing N patrol path schemes;
redefining, on the basis of the known solution set, a new task point energy consumption variable r'_j:

$$r'_j = r_j + \frac{\sum_{i \neq j} N_{ij}\, c_{ij}}{\sum_{i \neq j} N_{ij}} \qquad (7)$$

wherein N_ij represents the number of occurrences of the edge from path point i to path point j in the solution set S_N; the added term is the weighted average of the edge energy consumption required to reach the path point, with the weights N_ij taken from the solution set S_N obtained by optimizing the total patrol task path length;
step 4: the CVRP problem reduced in step 3 is solved using a multi-decoder attention model.
2. The reinforcement learning-based national park unmanned aerial vehicle patrol route optimization method according to claim 1, wherein: the solving process of the step 4 comprises the following steps:
step 4.1: firstly, according to the scale of the input information, several groups of data sets with the same number of path points are generated; supposing that there are K groups of data sets, the information in the i-th group of data sets includes a randomly generated starting point v_0, randomly generated path point positions {v_1, ..., v_n} and randomly generated path point task energy consumptions {r'_1, ..., r'_n}, wherein i = 1, 2, ..., K;
step 4.2: training a multi-decoder attention model using the K generated groups of data sets, wherein the parameters of the encoder and decoders in the model are θ; the model is trained by a policy gradient algorithm with baseline, and the parameters of the model are continuously and cyclically updated to obtain the trained multi-decoder attention model;
step 4.3: after the training of the model parameters is finished, inputting the data of the task planning problem of the original unmanned aerial vehicle as a reduced CVRP problem example into the trained model, and taking the output sequence of the model at the moment as a path point access scheme of the unmanned aerial vehicle patrol problem.
3. The reinforcement learning-based national park unmanned aerial vehicle patrol route optimization method according to claim 2, wherein: in step 4.3, the data of the original unmanned aerial vehicle mission planning problem comprises the starting point v_0, the n path points {v_1, ..., v_n} and the task energy consumption of each path point, and the path point task energy consumption refers to the new path point task energy consumption defined in step 3.
CN202211572414.2A 2022-12-08 2022-12-08 National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning Active CN115574826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211572414.2A CN115574826B (en) 2022-12-08 2022-12-08 National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211572414.2A CN115574826B (en) 2022-12-08 2022-12-08 National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN115574826A (en) 2023-01-06
CN115574826B (en) 2023-04-07

Family

ID=84590469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211572414.2A Active CN115574826B (en) 2022-12-08 2022-12-08 National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115574826B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894519B (en) * 2023-07-21 2024-06-28 江苏舟行时空智能科技股份有限公司 Position point optimization determination method meeting user service coverage requirement

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110470301A (en) * 2019-08-13 2019-11-19 上海交通大学 Unmanned plane paths planning method under more dynamic task target points

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11727812B2 (en) * 2017-07-27 2023-08-15 Beihang University Airplane flight path planning method and device based on the pigeon-inspired optimization
CN110263983B (en) * 2019-05-31 2021-09-07 中国人民解放军国防科技大学 Double-layer path planning method and system for logistics distribution of vehicles and unmanned aerial vehicles
CN110428111B (en) * 2019-08-08 2022-12-30 西安工业大学 UAV/UGV (unmanned aerial vehicle/user generated Union vector) cooperative long-time multitask operation trajectory planning method
CN111429052A (en) * 2020-03-16 2020-07-17 北京航空航天大学 Initial solution structure for vehicle path problem distributed by cooperating unmanned aerial vehicle
US10809080B2 (en) * 2020-03-23 2020-10-20 Alipay Labs (singapore) Pte. Ltd. System and method for determining routing by learned selective optimization
US20210325195A1 (en) * 2020-04-20 2021-10-21 Insurance Services Office, Inc. Systems and Methods for Automated Vehicle Routing Using Relaxed Dual Optimal Inequalities for Relaxed Columns
CN111536979B (en) * 2020-07-08 2020-10-30 浙江浙能天然气运行有限公司 Unmanned aerial vehicle routing inspection path planning method based on random optimization
CN112132312B (en) * 2020-08-14 2022-08-23 蓝海(福建)信息科技有限公司 Path planning method based on evolutionary multi-objective multi-task optimization
CN114422363B (en) * 2022-01-11 2023-04-21 北京科技大学 Capacity optimization method and device for unmanned aerial vehicle-mounted RIS auxiliary communication system
CN115065939A (en) * 2022-06-08 2022-09-16 电子科技大学长三角研究院(衢州) Auxiliary communication unmanned aerial vehicle trajectory planning and power control method capable of charging in flight
CN115185303B (en) * 2022-09-14 2023-03-24 南开大学 Unmanned aerial vehicle patrol path planning method for national parks and natural protected areas

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110470301A (en) * 2019-08-13 2019-11-19 上海交通大学 Unmanned plane paths planning method under more dynamic task target points

Also Published As

Publication number Publication date
CN115574826A (en) 2023-01-06

Similar Documents

Publication Publication Date Title
CN108229685B (en) Air-ground integrated unmanned intelligent decision-making method
CN113342046B (en) Power transmission line unmanned aerial vehicle routing inspection path optimization method based on ant colony algorithm
CN112016812A (en) Multi-unmanned aerial vehicle task scheduling method, system and storage medium
CN106959700B (en) A kind of unmanned aerial vehicle group collaboration patrol tracing path planing method based on upper limit confidence interval algorithm
CN110597286B (en) Method for realizing unmanned aerial vehicle autonomous inspection of power transmission line by using smart hangar
Liu et al. Application of unmanned aerial vehicle hangar in transmission tower inspection considering the risk probabilities of steel towers
CN114169066B (en) Space target characteristic measuring and reconnaissance method based on micro-nano constellation approaching reconnaissance
Xu et al. A brief review of the intelligent algorithm for traveling salesman problem in UAV route planning
CN115574826B (en) National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning
CN115185303B (en) Unmanned aerial vehicle patrol path planning method for national parks and natural protected areas
CN115840468B (en) Autonomous line inspection method of power distribution network unmanned aerial vehicle applied to complex electromagnetic environment
Masadeh et al. Reinforcement learning-based security/safety uav system for intrusion detection under dynamic and uncertain target movement
CN114638155A (en) Unmanned aerial vehicle task allocation and path planning method based on intelligent airport
CN117726153B (en) Real-time re-planning method suitable for unmanned aerial vehicle cluster operation tasks
Wang et al. A novel hybrid algorithm based on improved particle swarm optimization algorithm and genetic algorithm for multi-UAV path planning with time windows
Zheng et al. Robustness of the planning algorithm for ocean observation tasks
CN117744994A (en) Patrol unmanned aerial vehicle-aircraft nest distribution scheduling method based on goblet sea squirt algorithm
Gaowei et al. Using multi-layer coding genetic algorithm to solve time-critical task assignment of heterogeneous UAV teaming
Zhou et al. Route planning for unmanned aircraft based on ant colony optimization and voronoi diagram
CN116578120A (en) Unmanned aerial vehicle scheduling method and device, unmanned aerial vehicle system and computer equipment
CN114371728B (en) Unmanned aerial vehicle resource scheduling method based on multi-agent collaborative optimization
Dai et al. A genetic algorithm-based research on drone trajectory planning strategy of cooperative inspection of transmission lines, substations and distribution lines
Yin et al. Multi UAV cooperative task allocation method for intensive corridors of transmission lines inspection
Li et al. Intelligent Early Warning Method Based on Drone Inspection
CN117472083B (en) Multi-unmanned aerial vehicle collaborative marine search path planning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant