CN109655066A - Unmanned aerial vehicle path planning method based on the Q(λ) algorithm - Google Patents

Unmanned aerial vehicle path planning method based on the Q(λ) algorithm

Info

Publication number
CN109655066A
CN109655066A (application CN201910071929.6A)
Authority
CN
China
Prior art keywords
state
unmanned aerial vehicle
value
space
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910071929.6A
Other languages
Chinese (zh)
Other versions
CN109655066B (en)
Inventor
张迎周
竺殊荣
高扬
孙仪
张灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University filed Critical Nanjing Post and Telecommunication University
Priority to CN201910071929.6A priority Critical patent/CN109655066B/en
Publication of CN109655066A publication Critical patent/CN109655066A/en
Application granted granted Critical
Publication of CN109655066B publication Critical patent/CN109655066B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • G05D1/101 Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Catching Or Destruction (AREA)

Abstract

The present invention provides a UAV path planning method based on the Q(λ) algorithm, comprising an environment modeling step, a Markov decision process model initialization step, a Q(λ) algorithm iteration step, and a step of computing the optimal path from the state value function. The grid space is first initialized according to the UAV's minimum track segment length, grid space coordinates are mapped to waypoints, and circular and polygonal threat areas are represented. A Markov decision model is then established, including the representation of the UAV flight action space, the design of the state transition probabilities, and the construction of the reward function. The Q(λ) algorithm is then iterated on the constructed model, and the optimal path by which the UAV safely avoids the threat areas is computed from the finally converged state value function. The present invention combines the traditional Q-learning algorithm with eligibility traces, improving the convergence speed and accuracy of the value function and guiding the UAV to plan its path autonomously while avoiding threat areas.

Description

Unmanned aerial vehicle path planning method based on the Q(λ) algorithm
Technical field
The present invention relates to unmanned aerial vehicles (UAVs), and specifically to a UAV path planning method, belonging to the technical field of heuristic algorithms.
Background art
UAV path planning is an important component of UAV mission planning and a key stage in enabling a UAV to execute tasks autonomously. UAV path planning requires planning, in an environment whose information is fully known, partially known, or entirely unknown, a flight track from a start point to a target point that avoids threat areas and obstacles, is safe, reliable, and collision-free, and simultaneously satisfies various constraints. According to how much information about the surrounding battlefield environment is available to the UAV, path planning is divided into global path planning and local path planning.
In practical applications, if the UAV can obtain global environment knowledge, dynamic programming can be used to realize path planning. However, as battlefield environments become more complex and uncertain, the UAV has little prior knowledge of the environment, so in practice the UAV needs a strong ability to adapt to dynamic environments. In this case, techniques that rely on real-time sensor perception of threat-area information to perform local path planning show great advantages.
Current local path planning techniques suffer from problems such as easily falling into local minima or local oscillation, high algorithmic time cost, large computational storage requirements, and rules that are difficult to determine. Behavior-based UAV path planning methods have become a research hotspot; their essence is to map the environment states perceived by sensors to actuator actions. In behavior-based methods, however, designing state feature vectors and obtaining supervised samples are often extremely difficult in real complex environments. These problems therefore urgently need to be solved.
Summary of the invention
The object of the present invention is to provide a UAV path planning method based on the Q(λ) algorithm that combines Q-learning with eligibility traces. Quantized reward and penalty signals are assigned to the environment states perceived by the sensors, and through continuous interaction with the environment the UAV is guided to plan its path autonomously and to avoid threat areas safely, realizing a rapid response to changes in the external environment. The method has the advantages of speed and real-time operation and improves the UAV's adaptability in unknown or partially unknown environments.
The present invention provides a UAV path planning method based on the Q(λ) algorithm, characterized by comprising the following steps:
Step 1, environment modeling: using the environment information collected by sensors, identify the threat areas, model the UAV flight environment with a grid method, discretize the continuous space, generate a uniform grid map according to the set space size, and take the grid vertices as the discretized waypoints;
Step 2, initialize the Markov decision process model: initialize a Markov decision process model suitable for solving the UAV path planning problem. The Markov decision process model is expressed as a four-tuple <S, A, P, R>, where S is the state space of the UAV, A is the action space of the UAV, P is the state transition matrix, and R is the reward function. The initialization of the Markov decision process model includes the representation of the UAV flight action space, the design of the state transition probabilities, and the construction of the reward function;
Step 3, iterate with the Q(λ) algorithm on the established model: on the basis of the model established in steps 1 and 2, iterate using the Q(λ) algorithm, which combines the Q-learning algorithm with eligibility traces. A state-action value function Q(s, a) is introduced to characterize the value of the UAV taking action a in state s, and a Q table is built to store the value of each state-action pair <s, a>. An eligibility trace function E(s, a) is introduced to represent the causal relationship between the final state and the state-action pair <s, a>. The Q values and E values are first initialized; then, in each learning episode, the action a taken in state s is selected by the Boltzmann policy. After executing action a and transferring to the next state s', Q(s, a) is updated by the Q-value update formula and the E values of all state-action pairs are updated by the E-value update formula. When the final state is reached, the current learning episode ends; after the maximum number of learning episodes is reached, the Q(λ) iteration terminates;
Step 4, compute the optimal path from the state value function: after step 3, a converged state value function is obtained, so the action a* with the maximum Q value can be selected at state s. After taking action a*, this deterministic policy continues to be applied until the final state is reached; finally, the nodes in the grid are mapped to longitude and latitude to obtain the optimal path.
As a further refinement of the present invention, the specific steps of the environment modeling of step 1 are as follows:
Step 1.1: initialize the grid space according to the UAV's minimum track segment length;
The UAV flies in straight lines between waypoints and changes flight attitude at certain waypoints according to track requirements. The minimum track segment length is the shortest distance the UAV must fly straight before it can begin changing flight attitude. Setting the grid step to the UAV's minimum track segment length yields a discrete grid space that satisfies the UAV's own constraints;
Let the longitude-latitude coordinates of the UAV start position be S=(lonS, latS), those of the target point be T=(lonT, latT), and the UAV's minimum track segment length be dmin. The size of the grid space is m*n, with dmin as the grid step; m and n are then computed as:
m = ⌈|lonT - lonS| / dmin⌉, n = ⌈|latT - latS| / dmin⌉
where ⌈·⌉ denotes rounding up.
Step 1.2: map grid space coordinates to waypoints;
Taking the grid vertices as the discretized waypoints, a coordinate in the grid space is denoted (x, y). Let the longitude-latitude coordinates corresponding to the grid origin (0, 0) be (lono, lato); then the waypoint coordinates (lonxy, latxy) corresponding to (x, y) are computed as: lonxy = lono + dmin*x, latxy = lato + dmin*y.
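By way of illustration only, a minimal Python sketch of this mapping follows; the function name waypoint_of and the treatment of dmin as a grid step expressed in the same units as the longitude-latitude coordinates are assumptions of the example, not part of the claimed method.

```python
def waypoint_of(x, y, lon_o, lat_o, d_min):
    """Step 1.2: map grid coordinates (x, y) to a waypoint's lon/lat,
    using d_min as the grid step along both axes (assumed units)."""
    return lon_o + d_min * x, lat_o + d_min * y
```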
Step 1.3: representation of threat-area information;
The UAV must consider the spatial positions of threat sources during flight. According to the threat source type, threat areas are divided into circular areas and polygonal areas. In the grid space, a node containing a threat area is labeled 1, representing a no-fly region, and a node containing no threat area is labeled 0, representing a flyable region. For a circular threat area, let the center coordinates be (lonc, latc) and the threat radius be r (km). For each node (x, y) in the grid, the distance dxyo from the corresponding waypoint to the threat center is computed by the haversine formula, which computes the great-circle distance between two points on a sphere from their longitude-latitude coordinates;
If dxyo ≤ r, the node corresponding to (x, y) is labeled 1, and otherwise 0. For a polygonal threat area, a horizontal ray is cast from the waypoint (lonxy, latxy) to the right (or to the left), and the number of intersections between the ray and the polygon boundary is counted. If the number of intersections is odd, the waypoint lies inside the polygonal threat area and node (x, y) is labeled 1; if it is even, the waypoint lies outside the polygonal threat area and the node is labeled 0.
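The two geometric tests of step 1.3 can be sketched as follows, assuming circular threats are given as a center plus a radius and polygonal threats as a list of (lon, lat) vertices; the haversine formula is written in its standard form with an assumed Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance (km) between two lon/lat points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def in_polygon(lon, lat, vertices):
    """Ray casting for step 1.3: cast a horizontal ray to the right and
    count crossings with the polygon edges; an odd count means inside."""
    inside = False
    n = len(vertices)
    for i in range(n):
        lon1, lat1 = vertices[i]
        lon2, lat2 = vertices[(i + 1) % n]
        if (lat1 > lat) != (lat2 > lat):          # edge straddles the ray
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon_cross > lon:                   # crossing lies to the right
                inside = not inside
    return inside
```

A node (x, y) is then labeled 1 when haversine_km(...) ≤ r for some circular threat or in_polygon(...) returns true for some polygonal threat, and 0 otherwise.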
As a further refinement of the present invention, the specific steps of the Markov decision process model initialization of step 2 are as follows:
Step 2.1: represent the UAV flight action space
With the grid vertices as waypoints, each vertex has eight transfer directions to neighboring vertices (except boundary points). According to the UAV's own constraints and the distribution of threats in the space, the transfer directions are restricted, and the UAV's behavior is generalized into a discrete action space. The heading state is discretized at 45° intervals, yielding 8 discrete heading states. Based on these discretized heading states, 5 UAV flight actions are defined: flying straight is denoted 0, turning right 45° is denoted 1, turning left 45° is denoted 2, turning right 90° is denoted 3, and turning left 90° is denoted 4. The action space is then expressed as A = [0, 1, 2, 3, 4], with each number representing one action;
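As an illustration of step 2.1, the sketch below encodes the 8 discrete headings and the 5 relative turn actions; the direction vectors, the clockwise ordering of headings, and the sign convention for left and right turns are assumptions of the example.

```python
# 8 headings at 45° intervals as (dx, dy) grid steps
# (assumed order: index 0 = north, increasing clockwise)
HEADINGS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

# action code -> change of heading index, matching A = [0, 1, 2, 3, 4]:
# 0 straight, 1 right 45°, 2 left 45°, 3 right 90°, 4 left 90°
TURNS = {0: 0, 1: 1, 2: -1, 3: 2, 4: -2}

def apply_action(x, y, heading, action):
    """Turn according to the action, then advance one grid step."""
    new_heading = (heading + TURNS[action]) % 8
    dx, dy = HEADINGS[new_heading]
    return x + dx, y + dy, new_heading
```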
Step 2.2: design the state transition probabilities
The state transition probability is the conditional probability that the UAV, executing an action in one waypoint state, reaches another waypoint state. It is denoted P^a_{ss'} and represents the probability that the UAV, executing action a in state s, transfers to state s';
Since the UAV knows nothing of the environment in the early stage of learning, it easily enters threat areas; the UAV entering a threat area ends the current learning episode, so exploration of the environment would be confined to the neighborhood of the initial state. Therefore, when the action taken by the UAV would cause it to enter a threat area or to leave the state space, no state transition occurs, i.e., the UAV's state does not change; under all other conditions it transfers to the state the action points to with probability 100%. With S the state space of the UAV and O the threat-area space, P^a_{ss'} is computed as:
P^a_{ss'} = 1, if the state s' pointed to by action a satisfies s' ∈ S and s' ∉ O;
P^a_{ss} = 1 (the UAV remains in s), if the state pointed to by action a lies in O or outside S;
P^a_{ss'} = 0, otherwise.
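A sketch of this transition rule follows, reusing apply_action from the previous sketch; grid_labels is assumed to hold the 0/1 node labeling of step 1.3.

```python
def step(state, action, grid_labels, m, n):
    """Step 2.2: a move that would enter a threat cell (label 1) or leave
    the m*n grid produces no state transition; any other move succeeds
    with probability 1."""
    x, y, heading = state
    nx, ny, nh = apply_action(x, y, heading, action)
    if not (0 <= nx < m and 0 <= ny < n) or grid_labels[nx][ny] == 1:
        return state            # the UAV's state does not change
    return (nx, ny, nh)
```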
Step 2.3: construction of the reward function
The UAV obtains an immediate reward at each transfer from one waypoint state to the next. The learning objective of the Q(λ)-based algorithm is to maximize the accumulated immediate reward, so the construction of the reward function must take into account the various indices affecting track performance, including the distance to the target point, flight safety, and threat degree. Let R^a_{ss'} denote the immediate reward obtained when the UAV takes action a in state s and transfers to state s'; it is computed as
R^a_{ss'} = w1*fd - w2*fo - w3*fa
where w1, w2, w3 are weighting coefficients and fd, fo, fa are normalized track evaluation factors;
fd denotes the degree of approach to the target, taken as the inverse of the distance from state s' to the target point. With s' = (lons', lats') and target point T = (lonT, latT), fd is computed as fd = 1/d_{s'T}, where d_{s'T} is the haversine distance between s' and T;
fo denotes the threat degree of the threat areas to state s', fo = Σ_{oi∈Io} f_{oi}, where Io is the set of threat areas posing a threat to the UAV's current state transfer and f_{oi} is the threat degree of threat area oi to s'. With (lon_{oi}, lat_{oi}) the longitude-latitude coordinates of threat area oi, f_{oi} is computed from the haversine distance d_{s'oi} between s' and the center of oi, the threat degree growing as this distance shrinks, for example f_{oi} = 1/d_{s'oi};
fa denotes the penalty term on the UAV flight action; the flight action taken by the UAV is a key factor affecting flight safety. According to the UAV flight action space defined in step 2.1, fa is treated as a discrete function of the action, assigning no penalty to flying straight and progressively larger penalties to 45° and 90° turns.
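A sketch of the reward of step 2.3 is given below; the weight values, the 1/distance form of the per-threat degree, and the per-action penalty values are illustrative assumptions, since the patent leaves them as design parameters.

```python
def reward(lon, lat, target, threat_centers, action,
           w1=1.0, w2=1.0, w3=0.5):
    """Immediate reward R = w1*fd - w2*fo - w3*fa on arrival at (lon, lat)."""
    f_d = 1.0 / max(haversine_km(lon, lat, target[0], target[1]), 1e-6)
    f_o = sum(1.0 / max(haversine_km(lon, lat, c_lon, c_lat), 1e-6)
              for c_lon, c_lat in threat_centers)
    f_a = {0: 0.0, 1: 0.25, 2: 0.25, 3: 0.5, 4: 0.5}[action]  # turn penalty
    return w1 * f_d - w2 * f_o - w3 * f_a
```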
As a further refinement of the present invention, the specific steps of step 3, iterating with the Q(λ) algorithm on the established model, are as follows:
Step 3.1: initialize the Q table
Each state-action pair Q(s, a) in the Q table is initialized. Q(s, ~) denotes the initial value of all state-action pairs in state s, and sT denotes the final state; Q(s, a) is then computed as:
Q(s, ~) = 0 if s = sT, and Q(s, ~) = 1/d_{ssT} otherwise
If s is the final state, the initial Q value is 0; otherwise the Q value is set to the inverse of the distance between s and sT. With (x, y) the coordinate corresponding to state s and (xT, yT) that of sT, d_{ssT} is computed as:
d_{ssT} = sqrt((x - xT)^2 + (y - yT)^2)
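Step 3.1 can be sketched as follows, continuing the earlier sketches; the (x, y, heading) state representation is an assumption of the example.

```python
import math
from collections import defaultdict

def init_q(states, actions, terminal):
    """Step 3.1: Q = 0 at the final state, else the inverse of the grid
    distance to the final-state coordinates (xT, yT)."""
    q = defaultdict(float)
    xT, yT = terminal
    for (x, y, heading) in states:
        d = math.hypot(x - xT, y - yT)
        for a in actions:
            q[((x, y, heading), a)] = 0.0 if d == 0 else 1.0 / d
    return q
```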
Step 3.2: initialize the E values
At the start of each learning episode, the E value E(s, a) of every state-action pair <s, a> is initialized to 0;
Step 3.3: select actions using the Boltzmann distribution strategy
In each learning episode, the initial state is set first, and actions are then selected according to the Boltzmann distribution strategy to perform state transfers. The probability p(a|s) of taking action a in state s is computed as:
p(a|s) = exp(Q(s, a)/T) / Σ_{a'∈A} exp(Q(s, a')/T)
where T is the temperature coefficient, which controls the exploration intensity of the policy. A larger temperature coefficient can be used in the early stage of learning to guarantee strong exploration, and the temperature coefficient is gradually reduced thereafter. An action a is then selected according to p(a|s) by roulette-wheel selection, and E(s, a) is incremented by one;
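A sketch of the Boltzmann selection of step 3.3 follows, with the roulette-wheel sampling written out explicitly.

```python
import math
import random

def boltzmann_select(q, state, actions, temperature):
    """Step 3.3: p(a|s) = exp(Q(s,a)/T) / sum_a' exp(Q(s,a')/T),
    sampled by the roulette-wheel method."""
    prefs = [math.exp(q[(state, a)] / temperature) for a in actions]
    total = sum(prefs)
    pick, acc = random.uniform(0.0, total), 0.0
    for a, p in zip(actions, prefs):
        acc += p
        if pick <= acc:
            return a
    return actions[-1]          # guard against floating-point round-off
```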
Step 3.4: update the Q value
The UAV takes the action a selected in step 3.3 at state s, transfers to state s', and obtains the immediate reward r; the update formula for Q(s, a) is then:
Q(s, a) = Q(s, a) + α * (r + γ * max_a Q(s', a) - Q(s, a)) * E(s, a)
where α is the learning rate, γ is the discount factor expressing the degree of attention paid to future rewards, and max_a Q(s', a) is the maximum Q value in state s';
Step 3.5: update the E values
The update formula for the E value of every state-action pair is E(s, a) = λ * E(s, a), where λ is the trace-decay weight parameter. If state s' is the final state, this learning episode ends and the next learning episode begins; otherwise the UAV transfers to state s' and returns to step 3.3 to continue the learning process;
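Steps 3.4 and 3.5 together form the Q(λ) backup. The sketch below follows the usual Q(λ) convention of applying the TD error to every pair with a nonzero trace, weighted by E(s, a), and then decaying all traces by λ as specified above; the default hyperparameter values are assumptions.

```python
def q_lambda_update(q, e, s, a, r, s_next, actions,
                    alpha=0.1, gamma=0.9, lam=0.8):
    """One backup: Q(s,a) += alpha*(r + gamma*max_b Q(s',b) - Q(s,a))*E(s,a)
    for every traced pair, then E(s,a) = lam*E(s,a)."""
    td_error = r + gamma * max(q[(s_next, b)] for b in actions) - q[(s, a)]
    for key in list(e):
        q[key] += alpha * td_error * e[key]   # trace-weighted Q update
        e[key] *= lam                         # trace decay
    return q, e
```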
As a further refinement of the present invention, the specific steps of step 4, computing the optimal path from the state value function, are as follows:
Step 4.1: perform state transfers using the deterministic policy
After step 3, the state value Q has converged. The initial state s is set first, the action a* with the maximum Q value in state s is selected, and the state transfer is performed; the selection formula for a* is a* = argmax_{a∈A} Q(s, a). After taking action a and transferring to the next state s', the deterministic policy continues to be applied until the final state is reached;
Step 4.2: map the grid space to waypoint longitude-latitude coordinates
The optimal path coordinates in the grid obtained in step 4.1 are mapped to waypoint longitude-latitude coordinates according to the formula in step 1.2, thereby obtaining the UAV's optimal path.
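Step 4 reduces to a greedy rollout; a sketch follows, reusing step() from the step 2.2 sketch, with a step cap added purely as a safeguard of the example. The returned grid cells can then be mapped to longitude-latitude with waypoint_of from the step 1.2 sketch.

```python
def extract_path(q, start, terminal, actions, grid_labels, m, n,
                 max_steps=10000):
    """Step 4: follow a* = argmax_a Q(s, a) from the start state until the
    terminal grid cell is reached; returns the visited grid cells."""
    state, path = start, [start[:2]]
    for _ in range(max_steps):
        if state[:2] == terminal:
            break
        a = max(actions, key=lambda b: q[(state, b)])
        state = step(state, a, grid_labels, m, n)
        path.append(state[:2])
    return path
```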
Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:
1. Using the UAV's minimum track segment length as the discretization step takes the UAV's own constraints into account, remedies the lack of basis in the discretization stage of environment modeling, and yields a discrete planning space that gives full play to the UAV's flight capability;
2. In the design of the state transition probabilities, when the action taken by the UAV would cause it to enter a threat area, the UAV performs no state transition but keeps its current state and continues learning within the current episode; this remedies the shortcoming that, in the early stage of learning, the interaction between the UAV and the environment is confined to the neighborhood of the initial state, and improves the convergence speed of the algorithm;
3. The Q-learning algorithm requires no global environment knowledge; by a trial-and-error-like method it interacts with the environment continuously and approaches the optimal policy by optimizing the action value function, making it suitable for dynamic environments that are unknown or partially unknown to the UAV and guiding the UAV to plan its path autonomously;
4. During iteration, the traditional Q-learning algorithm looks only one step ahead from the current state; by introducing the eligibility trace function into Q-learning, the predictions over all step counts are considered comprehensively, making the computation of the value function more accurate. Moreover, effective online updating becomes possible: the Q-value update need not wait until a learning episode ends, earlier learning data can be discarded, and the convergence of the algorithm is accelerated.
Brief description of the drawings
Fig. 1: the discrete UAV flight actions and their transfer results in the grid space.
Fig. 2: flow chart of the algorithm iteration within each learning episode.
Specific embodiment
The present invention is further explained below with reference to the accompanying drawings.
For convenience of description, the main variables of the algorithm are briefly defined:
The longitude-latitude coordinates of the UAV start position are S=(lonS, latS), those of the target point are T=(lonT, latT), the size of the grid space is m*n, and a point coordinate in the grid space is (x, y). The Markov model is expressed as a four-tuple <S, A, P, R>, where S is the UAV state space, A is the UAV action space, R is the reward function, and P is the state transition probability matrix.
The present invention proposes a UAV path planning method based on the Q(λ) algorithm, comprising an environment modeling step, a Markov decision process model initialization step, a Q(λ) algorithm iteration step, and a step of computing the optimal path from the state value function;
The specific steps are as follows:
Step 1) environment modeling step
Step 1.1) set the step of the grid space to the UAV's minimum track segment length dmin;
Step 1.2) compute the grid space size according to the formulas m = ⌈|lonT - lonS| / dmin⌉, n = ⌈|latT - latS| / dmin⌉;
Step 1.3) map grid space coordinates to waypoint longitude-latitude coordinates according to the formulas lonxy = lono + dmin*x, latxy = lato + dmin*y, where (lono, lato) are the longitude-latitude coordinates corresponding to the grid origin (0, 0);
Step 1.4) in the grid space, label nodes containing a threat area 1, representing no-fly regions, and label nodes containing no threat area 0, representing flyable regions;
Step 2) Markov decision process model initialization
Step 2.1) according to the UAV transfer directions shown in Fig. 1, define 5 UAV flight actions: flying straight is denoted 0, turning right 45° is denoted 1, turning left 45° is denoted 2, turning right 90° is denoted 3, and turning left 90° is denoted 4; the UAV flight action space is expressed as A = [0, 1, 2, 3, 4], with each number representing one action;
Step 2.2) set the state transition probabilities so that, when the action taken by the UAV would cause it to enter a threat area or to leave the state space, no state transition occurs, i.e., the UAV's state does not change; under all other conditions it transfers to the state the action points to with probability 100%. The state transition probability is computed as:
P^a_{ss'} = 1, if the state s' pointed to by action a satisfies s' ∈ S and s' ∉ O;
P^a_{ss} = 1 (the UAV remains in s), if the state pointed to by action a lies in O or outside S;
P^a_{ss'} = 0, otherwise
where O is the threat-area space;
Step 2.3) the immediate reward R^a_{ss'} obtained when the UAV takes action a at state s and transfers to state s' is computed as R^a_{ss'} = w1*fd - w2*fo - w3*fa, where w1, w2, w3 are weighting coefficients and fd, fo, fa are normalized track evaluation factors;
Step 2.4) fd denotes the degree of approach to the target, taken as the inverse of the distance from state s' to the target point; with s' = (lons', lats') and target point T = (lonT, latT), fd is computed as fd = 1/d_{s'T}, where d_{s'T} is the haversine distance between s' and T;
Step 2.5) fo denotes the threat degree of the threat areas to state s', fo = Σ_{oi∈Io} f_{oi}, where Io is the set of threat areas posing a threat to the UAV's current state transfer and f_{oi} is the threat degree of threat area oi to s'; with (lon_{oi}, lat_{oi}) the longitude-latitude coordinates of threat area oi, f_{oi} is computed from the haversine distance d_{s'oi} between s' and the center of oi, for example f_{oi} = 1/d_{s'oi};
Step 2.6) fa denotes the penalty term on the UAV flight action, the flight action taken by the UAV being a key factor affecting flight safety; according to the UAV flight action space set in step 2.1, fa is treated as a discrete function of the action, assigning no penalty to flying straight and progressively larger penalties to 45° and 90° turns;
Step 3) iterate on the established model using the Q(λ) algorithm; the iteration flow of the algorithm within each learning episode is shown in Fig. 2;
Step 3.1) initialize the Q value of each state-action pair Q(s, a) in the Q table; Q(s, ~) denotes the initial value of all state-action pairs in state s, and sT denotes the final state; Q(s, a) is then computed as: Q(s, ~) = 0 if s = sT, and Q(s, ~) = 1/d_{ssT} otherwise, where d_{ssT} is the distance between the grid coordinates of s and sT;
Step 3.2) at the start of each learning episode, initialize the E value E(s, a) of every state-action pair <s, a> to 0;
Step 3.3) set the initial state;
Step 3.4) select an action according to the Boltzmann distribution strategy; the probability p(a|s) of taking action a in state s is computed as: p(a|s) = exp(Q(s, a)/T) / Σ_{a'∈A} exp(Q(s, a')/T);
Step 3.5) update Q(s, a) according to the formula:
Q(s, a) = Q(s, a) + α * (r + γ * max_a Q(s', a) - Q(s, a)) * E(s, a);
Step 3.6) update the E values according to the formula E(s, a) = λ * E(s, a);
Step 3.7) take action a and transfer to the next state s'; if s' is the final state, this learning episode ends and the procedure returns to step 3.2) to enter the next learning episode; otherwise it returns to step 3.4) to continue the iteration.
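For the flow of Fig. 2, the sketches given in the description above can be tied together as follows; the episode count, the temperature schedule, the reward_fn callable, and the per-episode step cap are assumptions of the example.

```python
from collections import defaultdict

def train(q, actions, start, terminal, grid_labels, m, n, reward_fn,
          episodes=500, temperature=5.0, cooling=0.99, max_steps=10000):
    """Steps 3.1)-3.7): one learning episode per outer iteration."""
    for _ in range(episodes):
        e = defaultdict(float)                  # step 3.2): reset all traces
        state = start                           # step 3.3): initial state
        for _ in range(max_steps):
            if state[:2] == terminal:           # final state ends the episode
                break
            a = boltzmann_select(q, state, actions, temperature)   # step 3.4)
            e[(state, a)] += 1.0                # mark the visited pair
            s_next = step(state, a, grid_labels, m, n)
            r = reward_fn(s_next, a)            # immediate reward of step 2.3)
            q, e = q_lambda_update(q, e, state, a, r, s_next, actions)  # 3.5)-3.6)
            state = s_next
        temperature *= cooling                  # gradually reduce exploration
    return q
```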
Step 4) compute the optimal path from the state value function:
Step 4.1) after step 3), the state value Q has converged; the initial state s is set first, the action a* with the maximum Q value in state s is selected, and the state transfer is performed; the selection formula for a* is a* = argmax_{a∈A} Q(s, a); after taking action a and transferring to the next state s', the deterministic policy continues to be applied until the final state is reached;
Step 4.2) map the optimal path coordinates in the grid obtained in step 4.1) to waypoint longitude-latitude coordinates according to the formula in step 1.3), thereby obtaining the UAV's optimal path.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any transformation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the scope of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection defined by the claims.

Claims (5)

1. An unmanned aerial vehicle path planning method based on the Q(λ) algorithm, characterized by comprising the following steps:
Step 1, environment modeling: collect environment information with sensors, identify the threat areas, model the UAV flight environment with a grid method, discretize the continuous space, generate a uniform grid map according to the set space size, and take the grid vertices as the discretized waypoints;
Step 2, initialize the Markov decision process model: initialize a Markov decision process model suitable for solving the UAV path planning problem, expressed as a four-tuple <S, A, P, R>, where S is the state space of the UAV, A is the action space of the UAV, P is the state transition matrix, and R is the reward function; the initialization of the Markov decision process model includes the representation of the UAV flight action space, the design of the state transition probabilities, and the construction of the reward function;
Step 3, iterate with the Q(λ) algorithm on the established model: on the basis of the model established in steps 1 and 2, iterate using the Q(λ) algorithm, which combines the Q-learning algorithm with eligibility traces; a state-action value function Q(s, a) is introduced to characterize the value of the UAV taking action a in state s, and a Q table is built to store the value of each state-action pair <s, a>; an eligibility trace function E(s, a) is introduced to represent the causal relationship between the final state and the state-action pair <s, a>; the Q values and E values are first initialized; then, in each learning episode, the action a taken in state s is selected by the Boltzmann policy; after executing action a and transferring to the next state s', Q(s, a) is updated by the Q-value update formula and the E values of all state-action pairs are updated by the E-value update formula; when the final state is reached, the current learning episode ends; after the maximum number of learning episodes is reached, the Q(λ) iteration terminates;
Step 4, compute the optimal path from the state value function: after step 3, a converged state value function is obtained, so the action a* with the maximum Q value can be selected at state s; after taking action a*, the deterministic policy continues to be applied until the final state is reached; finally, the nodes in the grid are mapped to longitude and latitude to obtain the optimal path.
2. The unmanned aerial vehicle path planning method based on the Q(λ) algorithm according to claim 1, characterized in that the specific steps of the environment modeling of step 1 are as follows:
Step 1.1: initialize the grid space according to the UAV's minimum track segment length;
The UAV flies in straight lines between waypoints and changes flight attitude at certain waypoints according to track requirements; the minimum track segment length is the shortest distance the UAV must fly straight before it can begin changing flight attitude; setting the grid step to the UAV's minimum track segment length yields a discrete grid space satisfying the UAV's own constraints;
Let the longitude-latitude coordinates of the UAV start position be S=(lonS, latS), those of the target point be T=(lonT, latT), and the UAV's minimum track segment length be dmin; the size of the grid space is m*n, with dmin as the grid step; m and n are then computed as:
m = ⌈|lonT - lonS| / dmin⌉, n = ⌈|latT - latS| / dmin⌉
Step 1.2: map grid space coordinates to waypoints;
Taking the grid vertices as the discretized waypoints, a coordinate in the grid space is denoted (x, y); with (lono, lato) the longitude-latitude coordinates corresponding to the grid origin (0, 0), the waypoint coordinates (lonxy, latxy) corresponding to (x, y) are computed as: lonxy = lono + dmin*x, latxy = lato + dmin*y;
Step 1.3: representation of threat-area information;
The UAV must consider the spatial positions of threat sources during flight; according to the threat source type, threat areas are divided into circular areas and polygonal areas; in the grid space, a node containing a threat area is labeled 1, representing a no-fly region, and a node containing no threat area is labeled 0, representing a flyable region; for a circular threat area, let the center coordinates be (lonc, latc) and the threat radius be r (km); for each node (x, y) in the grid, the distance dxyo from the corresponding waypoint to the threat center is computed by the haversine formula, which computes the great-circle distance between two points on a sphere from their longitude-latitude coordinates;
If dxyo ≤ r, the node corresponding to (x, y) is labeled 1, and otherwise 0; for a polygonal threat area, a horizontal ray is cast from the waypoint (lonxy, latxy) to the right (or to the left) and the number of intersections between the ray and the polygon boundary is counted; if the number of intersections is odd, the waypoint lies inside the polygonal threat area and node (x, y) is labeled 1; if it is even, the waypoint lies outside the polygonal threat area and the node is labeled 0.
3. The unmanned aerial vehicle path planning method based on the Q(λ) algorithm according to claim 2, characterized in that the specific steps of the Markov decision process model initialization of step 2 are as follows:
Step 2.1: represent the UAV flight action space
With the grid vertices as waypoints in the grid space, each vertex has eight transfer directions to neighboring vertices (except boundary points); the transfer directions are restricted according to the UAV's own constraints and the distribution of threats in the space, and the UAV's behavior is generalized into a discrete action space; the heading state is discretized at 45° intervals, yielding 8 discrete heading states; based on these, 5 UAV flight actions are defined: flying straight is denoted 0, turning right 45° is denoted 1, turning left 45° is denoted 2, turning right 90° is denoted 3, and turning left 90° is denoted 4; the action space is then expressed as A = [0, 1, 2, 3, 4], with each number representing one action;
Step 2.2: design the state transition probabilities
The state transition probability is the conditional probability that the UAV, executing an action in one waypoint state, reaches another waypoint state; it is denoted P^a_{ss'} and represents the probability that the UAV, executing action a in state s, transfers to state s';
Since the UAV knows nothing of the environment in the early stage of learning, it easily enters threat areas; entering a threat area ends the current learning episode, so exploration of the environment would be confined to the neighborhood of the initial state; therefore, when the action taken by the UAV would cause it to enter a threat area or to leave the state space, no state transition occurs, i.e., the UAV's state does not change, and under all other conditions it transfers to the state the action points to with probability 100%; with S the state space of the UAV and O the threat-area space, P^a_{ss'} is computed as:
P^a_{ss'} = 1, if the state s' pointed to by action a satisfies s' ∈ S and s' ∉ O;
P^a_{ss} = 1 (the UAV remains in s), if the state pointed to by action a lies in O or outside S;
P^a_{ss'} = 0, otherwise
Step 2.3: construction of the reward function
The UAV obtains an immediate reward at each transfer from one waypoint state to the next; the learning objective of the Q(λ)-based algorithm is to maximize the accumulated immediate reward, so the construction of the reward function must consider the various indices affecting track performance, including the distance to the target point, flight safety, and threat degree; the immediate reward R^a_{ss'} obtained when the UAV takes action a in state s and transfers to state s' is computed as R^a_{ss'} = w1*fd - w2*fo - w3*fa, where w1, w2, w3 are weighting coefficients and fd, fo, fa are normalized track evaluation factors;
fd denotes the degree of approach to the target, taken as the inverse of the distance from state s' to the target point; with s' = (lons', lats') and target point T = (lonT, latT), fd is computed as fd = 1/d_{s'T}, where d_{s'T} is the haversine distance between s' and T;
fo denotes the threat degree of the threat areas to state s', fo = Σ_{oi∈Io} f_{oi}, where Io is the set of threat areas posing a threat to the UAV's current state transfer and f_{oi} is the threat degree of threat area oi to s'; with (lon_{oi}, lat_{oi}) the longitude-latitude coordinates of threat area oi, f_{oi} is computed from the haversine distance d_{s'oi} between s' and the center of oi, for example f_{oi} = 1/d_{s'oi};
fa denotes the penalty term on the UAV flight action, the flight action taken by the UAV being a key factor affecting flight safety; according to the UAV flight action space defined in step 2.1, fa is treated as a discrete function of the action, assigning no penalty to flying straight and progressively larger penalties to 45° and 90° turns.
4. The unmanned aerial vehicle path planning method based on the Q(λ) algorithm according to claim 3, characterized in that the specific steps of step 3, iterating with the Q(λ) algorithm on the established model, are as follows:
Step 3.1: initialize the Q table
Each state-action pair Q(s, a) in the Q table is initialized; Q(s, ~) denotes the initial value of all state-action pairs in state s, and sT denotes the final state; Q(s, a) is then computed as:
Q(s, ~) = 0 if s = sT, and Q(s, ~) = 1/d_{ssT} otherwise
If s is the final state, the initial Q value is 0; otherwise the Q value is set to the inverse of the distance between s and sT; with (x, y) the coordinate corresponding to state s and (xT, yT) that of sT, d_{ssT} is computed as:
d_{ssT} = sqrt((x - xT)^2 + (y - yT)^2)
Step 3.2: initialize the E values
At the start of each learning episode, the E value E(s, a) of every state-action pair <s, a> is initialized to 0;
Step 3.3: select actions using the Boltzmann distribution strategy
In each learning episode, the initial state is set first, and actions are then selected according to the Boltzmann distribution strategy to perform state transfers; the probability p(a|s) of taking action a in state s is computed as:
p(a|s) = exp(Q(s, a)/T) / Σ_{a'∈A} exp(Q(s, a')/T)
where T is the temperature coefficient controlling the exploration intensity of the policy; a larger temperature coefficient can be used in the early stage of learning to guarantee strong exploration, and the temperature coefficient is gradually reduced thereafter; an action a is then selected according to p(a|s) by roulette-wheel selection, and E(s, a) is incremented by one;
Step 3.4: update the Q value
The UAV takes the action a selected in step 3.3 at state s, transfers to state s', and obtains the immediate reward r; the update formula for Q(s, a) is then:
Q(s, a) = Q(s, a) + α * (r + γ * max_a Q(s', a) - Q(s, a)) * E(s, a)
where α is the learning rate, γ is the discount factor expressing the degree of attention paid to future rewards, and max_a Q(s', a) is the maximum Q value in state s';
Step 3.5: update the E values
The update formula for the E value of every state-action pair is E(s, a) = λ * E(s, a), where λ is the trace-decay weight parameter; if state s' is the final state, this learning episode ends and the next learning episode begins; otherwise the UAV transfers to state s' and returns to step 3.3 to continue the learning process.
5. The unmanned aerial vehicle path planning method based on the Q(λ) algorithm according to claim 4, characterized in that the specific steps of step 4, computing the optimal path from the state value function, are as follows:
Step 4.1: perform state transfers using the deterministic policy
After step 3, the state value Q has converged; the initial state s is set first, the action a* with the maximum Q value in state s is selected, and the state transfer is performed; the selection formula for a* is a* = argmax_{a∈A} Q(s, a); after taking action a and transferring to the next state s', the deterministic policy continues to be applied until the final state is reached;
Step 4.2: map the grid space to waypoint longitude-latitude coordinates
The optimal path coordinates in the grid obtained in step 4.1 are mapped to waypoint longitude-latitude coordinates according to the formula in step 1.2, thereby obtaining the UAV's optimal path.
CN201910071929.6A 2019-01-25 2019-01-25 Unmanned aerial vehicle path planning method based on Q (lambda) algorithm Active CN109655066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910071929.6A CN109655066B (en) 2019-01-25 2019-01-25 Unmanned aerial vehicle path planning method based on Q (lambda) algorithm


Publications (2)

Publication Number Publication Date
CN109655066A 2019-04-19
CN109655066B CN109655066B (en) 2022-05-17

Family

ID=66121623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910071929.6A Active CN109655066B (en) 2019-01-25 2019-01-25 Unmanned aerial vehicle path planning method based on Q (lambda) algorithm

Country Status (1)

Country Link
CN (1) CN109655066B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180308371A1 (en) * 2017-04-19 2018-10-25 Beihang University Joint search method for uav multiobjective path planning in urban low altitude environment
CN108413959A (en) * 2017-12-13 2018-08-17 南京航空航天大学 Based on the Path Planning for UAV for improving Chaos Ant Colony Optimization
CN108171315A (en) * 2017-12-27 2018-06-15 南京邮电大学 Multiple no-manned plane method for allocating tasks based on SMC particle cluster algorithms
CN108170147A (en) * 2017-12-31 2018-06-15 南京邮电大学 A kind of unmanned plane mission planning method based on self organizing neural network
CN108319286A (en) * 2018-03-12 2018-07-24 西北工业大学 A kind of unmanned plane Air Combat Maneuvering Decision Method based on intensified learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YANG GAO et al.: "Multi-UAV Task Allocation Based on Improved Algorithm of Multi-objective Particle Swarm Optimization", 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC) *
HAO Chuanchuan et al.: "Three-dimensional track planning algorithm for UAVs based on Q-learning", Journal of Shanghai Jiao Tong University *
CHEN Xia et al.: "Three-dimensional UAV track planning using an improved neural network", Electronics Optics & Control *

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134140A (en) * 2019-05-23 2019-08-16 南京航空航天大学 A kind of unmanned plane paths planning method based on potential function award DQN under the unknown continuous state of environmental information
CN110134140B (en) * 2019-05-23 2022-01-11 南京航空航天大学 Unmanned aerial vehicle path planning method based on potential function reward DQN under continuous state of unknown environmental information
CN110320931A (en) * 2019-06-20 2019-10-11 西安爱生技术集团公司 Unmanned plane avoidance Route planner based on Heading control rule
CN110324805A (en) * 2019-07-03 2019-10-11 东南大学 A kind of radio sensor network data collection method of unmanned plane auxiliary
CN110324805B (en) * 2019-07-03 2022-03-08 东南大学 Unmanned aerial vehicle-assisted wireless sensor network data collection method
CN110428115A (en) * 2019-08-13 2019-11-08 南京理工大学 Maximization system benefit method under dynamic environment based on deeply study
CN111340324B (en) * 2019-09-25 2022-06-07 中国人民解放军国防科技大学 Multilayer multi-granularity cluster task planning method based on sequential distribution
CN111340324A (en) * 2019-09-25 2020-06-26 中国人民解放军国防科技大学 Multilayer multi-granularity cluster task planning method based on sequential distribution
CN110673637A (en) * 2019-10-08 2020-01-10 福建工程学院 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
CN110673637B (en) * 2019-10-08 2022-05-13 福建工程学院 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
CN110726416A (en) * 2019-10-23 2020-01-24 西安工程大学 Reinforced learning path planning method based on obstacle area expansion strategy
CN110879610A (en) * 2019-10-24 2020-03-13 北京航空航天大学 Reinforced learning method for autonomous optimizing track planning of solar unmanned aerial vehicle
CN111006693A (en) * 2019-12-12 2020-04-14 中国人民解放军陆军工程大学 Intelligent aircraft track planning system and method thereof
CN111006693B (en) * 2019-12-12 2021-12-21 中国人民解放军陆军工程大学 Intelligent aircraft track planning system and method thereof
CN111026157A (en) * 2019-12-18 2020-04-17 四川大学 Intelligent aircraft guiding method based on reward remodeling reinforcement learning
CN111123963A (en) * 2019-12-19 2020-05-08 南京航空航天大学 Unknown environment autonomous navigation system and method based on reinforcement learning
CN111160755A (en) * 2019-12-26 2020-05-15 西北工业大学 DQN-based real-time scheduling method for aircraft overhaul workshop
CN111160755B (en) * 2019-12-26 2023-08-18 西北工业大学 Real-time scheduling method for aircraft overhaul workshop based on DQN
CN111328023A (en) * 2020-01-18 2020-06-23 重庆邮电大学 Mobile equipment multitask competition unloading method based on prediction mechanism
CN111399541B (en) * 2020-03-30 2022-07-15 西北工业大学 Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network
CN111399541A (en) * 2020-03-30 2020-07-10 西北工业大学 Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network
CN111479216A (en) * 2020-04-10 2020-07-31 北京航空航天大学 Unmanned aerial vehicle cargo conveying method based on UWB positioning
CN111538059A (en) * 2020-05-11 2020-08-14 东华大学 Self-adaptive rapid dynamic positioning system and method based on improved Boltzmann machine
CN111612162A (en) * 2020-06-02 2020-09-01 中国人民解放军军事科学院国防科技创新研究院 Reinforced learning method and device, electronic equipment and storage medium
CN111736461B (en) * 2020-06-30 2021-05-04 西安电子科技大学 Unmanned aerial vehicle task collaborative allocation method based on Q learning
CN111736461A (en) * 2020-06-30 2020-10-02 西安电子科技大学 Unmanned aerial vehicle task collaborative allocation method based on Q learning
CN111880563A (en) * 2020-07-17 2020-11-03 西北工业大学 Multi-unmanned aerial vehicle task decision method based on MADDPG
CN111880563B (en) * 2020-07-17 2022-07-15 西北工业大学 Multi-unmanned aerial vehicle task decision method based on MADDPG
CN112130124B (en) * 2020-09-18 2023-11-24 郑州市混沌信息技术有限公司 Quick calibration and error processing method for unmanned aerial vehicle management and control equipment in civil aviation airport
CN112130124A (en) * 2020-09-18 2020-12-25 北京北斗天巡科技有限公司 Rapid calibration and error processing method for unmanned aerial vehicle management and control equipment in civil aviation airport
CN112356031A (en) * 2020-11-11 2021-02-12 福州大学 On-line planning method based on Kernel sampling strategy under uncertain environment
CN112356031B (en) * 2020-11-11 2022-04-01 福州大学 On-line planning method based on Kernel sampling strategy under uncertain environment
CN113033815A (en) * 2021-02-07 2021-06-25 广州杰赛科技股份有限公司 Intelligent valve cooperation control method, device, equipment and storage medium
CN112525213B (en) * 2021-02-10 2021-05-14 腾讯科技(深圳)有限公司 ETA prediction method, model training method, device and storage medium
CN112525213A (en) * 2021-02-10 2021-03-19 腾讯科技(深圳)有限公司 ETA prediction method, model training method, device and storage medium
CN113093803A (en) * 2021-04-03 2021-07-09 西北工业大学 Unmanned aerial vehicle air combat motion control method based on E-SAC algorithm
CN113176786A (en) * 2021-04-23 2021-07-27 成都凯天通导科技有限公司 Q-Learning-based hypersonic aircraft dynamic path planning method
CN114020009A (en) * 2021-10-20 2022-02-08 中国航空工业集团公司洛阳电光设备研究所 Terrain penetration planning method for small-sized fixed-wing unmanned aerial vehicle
CN114020009B (en) * 2021-10-20 2024-03-29 中国航空工业集团公司洛阳电光设备研究所 Small fixed-wing unmanned aerial vehicle terrain burst prevention planning method
CN114115340A (en) * 2021-11-15 2022-03-01 南京航空航天大学 Airspace cooperative control method based on reinforcement learning
CN114153213A (en) * 2021-12-01 2022-03-08 吉林大学 Deep reinforcement learning intelligent vehicle behavior decision method based on path planning
CN113867369A (en) * 2021-12-03 2021-12-31 中国人民解放军陆军装甲兵学院 Robot path planning method based on alternating current learning seagull algorithm
WO2024020923A1 (en) * 2022-07-27 2024-02-01 苏州泽达兴邦医药科技有限公司 Granulation process for traditional chinese medicine production, and process strategy calculation method
CN115562357A (en) * 2022-11-23 2023-01-03 南京邮电大学 Intelligent path planning method for unmanned aerial vehicle cluster
CN115562357B (en) * 2022-11-23 2023-03-14 南京邮电大学 Intelligent path planning method for unmanned aerial vehicle cluster

Also Published As

Publication number Publication date
CN109655066B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN109655066A (en) One kind being based on the unmanned plane paths planning method of Q (λ) algorithm
Kahn et al. Badgr: An autonomous self-supervised learning-based navigation system
Wang et al. Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach
Singla et al. Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge
Choi et al. Unmanned aerial vehicles using machine learning for autonomous flight; state-of-the-art
Sun et al. Motion planning for mobile robots—Focusing on deep reinforcement learning: A systematic review
CN106483852B (en) A kind of stratospheric airship control method based on Q-Learning algorithm and neural network
CN110362089A (en) A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
Dong et al. A review of mobile robot motion planning methods: from classical motion planning workflows to reinforcement learning-based architectures
Lin et al. A gated recurrent unit-based particle filter for unmanned underwater vehicle state estimation
Xie et al. Learning with stochastic guidance for robot navigation
CN109597425A (en) Navigation of Pilotless Aircraft and barrier-avoiding method based on intensified learning
CN113268074B (en) Unmanned aerial vehicle flight path planning method based on joint optimization
Alkowatly et al. Bioinspired autonomous visual vertical control of a quadrotor unmanned aerial vehicle
Stevšić et al. Sample efficient learning of path following and obstacle avoidance behavior for quadrotors
Li et al. A behavior-based mobile robot navigation method with deep reinforcement learning
Katyal et al. High-speed robot navigation using predicted occupancy maps
Xue et al. A uav navigation approach based on deep reinforcement learning in large cluttered 3d environments
Li et al. A warm-started trajectory planner for fixed-wing unmanned aerial vehicle formation
Olaz et al. Quadcopter neural controller for take-off and landing in windy environments
Heidari et al. Improved black hole algorithm for efficient low observable UCAV path planning in constrained aerospace
Zhang et al. A State-Decomposition DDPG Algorithm for UAV Autonomous Navigation in 3D Complex Environments
Wu et al. Multi-objective reinforcement learning for autonomous drone navigation in urban areas with wind zones
Hua et al. A Novel Learning-Based Trajectory Generation Strategy for a Quadrotor
Talha et al. Autonomous UAV Navigation in Wilderness Search-and-Rescue Operations Using Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant