CN113093124A - DQN algorithm-based real-time allocation method for radar interference resources - Google Patents
DQN algorithm-based real-time allocation method for radar interference resources
- Publication number: CN113093124A (application CN202110370353.0A)
- Authority: CN (China)
- Prior art keywords: unmanned aerial vehicle, radar, interference, jamming
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G01S7/38: Jamming means, e.g. producing false echoes
- G06F17/10: Complex mathematical operations
- G06N3/02: Neural networks; G06N3/08: Learning methods
- Y02T10/40: Engine management systems
Abstract
The invention belongs to the technical field of radar interference and specifically relates to a DQN-algorithm-based method for real-time allocation of radar interference resources. The invention introduces the DQN algorithm into the allocation of the interference pattern resources carried by unmanned aerial vehicles, overcomes the prior art's shortcomings in dynamic, real-time allocation, realizes real-time allocation of those resources from the start of a task to its completion, and can handle radars that switch among multiple working modes.
Description
Technical Field
The invention belongs to the technical field of radar interference, and particularly relates to a method for real-time allocation of radar interference resources based on a DQN algorithm.
Background
At present, more and more radars change their behavior automatically according to the surrounding environment, which places ever higher demands on the interference resource allocation strategy: the unmanned aerial vehicle must adaptively change its strategy in real time according to the radar parameters it obtains, so that the currently threatening radars are interfered with effectively, in real time, and quickly throughout the flight. Studying how the unmanned aerial vehicle's interference pattern resources can be allocated in real time as its flight distance grows is therefore of great significance.
Allocating interference pattern resources produces a large amount of accumulated data and computation, which places high demands on the unmanned aerial vehicle's ability to quickly allocate the interference pattern resources it carries. The existing algorithms applicable to this problem are traditional dynamic programming and population-based intelligent search. Both allocate the interference pattern resources carried by the unmanned aerial vehicle statically rather than dynamically: the allocation cannot change in real time with the vehicle's flight distance, in particular when the radar has multiple working modes (the multifunctional radar here is classified into three working modes: search, tracking, and guidance). To make up for these shortcomings, the invention introduces the DQN algorithm into the study of the unmanned aerial vehicle's interference pattern resource allocation, which compensates for both algorithms' deficits in dynamic, real-time allocation and handles radars that switch among multiple working modes.
Disclosure of Invention
The invention aims to provide a method for real-time allocation of radar interference resources based on a DQN algorithm.
The purpose of the invention is realized by the following technical scheme: the method comprises the following steps:
step 1: obtain the interference resource pool J = {j1, j2, ..., jx}, the radar resource pool P = {P1, P2, ..., Pm}, and the unmanned aerial vehicle group to be allocated jam = {jam1, jam2, ..., jamm}; obtain the required success rate SRmax for the task executed by the unmanned aerial vehicle group;
wherein x represents the number of interference patterns, and the number of unmanned aerial vehicles in the group equals the number of radars in the environment, namely m;
step 2: set the distance L from the unmanned aerial vehicle's starting point to the task point, the number of iteration steps num, and the maximum capacity Dmax of the experience replay pool; initialize t = 1 and the states S1 = {s11, s21, ..., sm1} of the m radars; initialize the experience replay pool D;
wherein sut represents the state of unmanned aerial vehicle jamu interfering with radar Pi at step t, comprising the accumulated flight distance of jamu at step t and fucli(t), the state of radar Pi at step t; u = 1, 2, ..., m; i = 1, 2, ..., m;
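The resource pools and per-pair states described in steps 1 and 2 can be sketched as plain data structures. This is an illustrative assumption, not the patent's own implementation; field names such as `flown` and `radar_mode` are invented for the sketch:

```python
from dataclasses import dataclass

@dataclass
class DroneRadarState:
    """State s_ut of drone jam_u jamming its paired radar P_i at step t."""
    flown: float = 0.0     # accumulated flight distance of the drone
    radar_mode: int = 0    # radar working mode: 0=search, 1=track, 2=guide

def make_initial_state(m: int) -> list:
    """S_1 = {s_11, ..., s_m1}: one state per drone/radar pair at t = 1."""
    return [DroneRadarState() for _ in range(m)]

states = make_initial_state(4)  # m = 4 pairs, as in the embodiment below
```

Because the pairing is one-to-one, there are exactly m such states, one per unmanned aerial vehicle.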
step 3: select the interference action At = {a1t, a2t, ..., amt} to be executed by the unmanned aerial vehicle group using a greedy strategy;
wherein aut = {Pi, jk} denotes that unmanned aerial vehicle jamu performs jamming action jk on radar Pi at step t, jk ∈ J;
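The greedy selection of step 3 can be sketched as an ε-greedy rule. Two assumptions here: ε is read, as in the later embodiment (ε = 0.9), as the probability of exploiting the best-valued action, and `q_values` stands in for the network's Q estimates:

```python
import random

def epsilon_greedy_action(q_values, n_actions, epsilon):
    """With probability epsilon pick the interference pattern with the
    highest Q value; otherwise explore a random pattern index."""
    if random.random() < epsilon:
        return max(range(n_actions), key=lambda a: q_values[a])
    return random.randrange(n_actions)

def select_joint_action(per_pair_q, n_actions, epsilon):
    """A_t = {a_1t, ..., a_mt}: one action index per drone/radar pair."""
    return [epsilon_greedy_action(q, n_actions, epsilon) for q in per_pair_q]
```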
step 4: after performing the interference action At, obtain the reward value Rt and the states St+1 of the m radars;
wherein, if radar Pi keeps its working mode unchanged, sit is unchanged; if radar Pi switches from search mode to tracking mode, or from tracking mode to guidance mode, sit increases; if radar Pi switches from guidance mode to tracking mode, from guidance mode to search mode, or from tracking mode to search mode, sit decreases;
step 5: store (St, At, Rt, St+1) in the experience replay pool D; if D has not reached its maximum capacity Dmax, set t = t + 1 and return to step 3; otherwise, execute step 6;
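The experience replay pool of step 5 behaves like a bounded FIFO buffer; a minimal sketch (class and method names are illustrative):

```python
from collections import deque
import random

class ReplayPool:
    """Experience replay pool D with maximum capacity D_max."""
    def __init__(self, d_max):
        self.d_max = d_max
        self.pool = deque(maxlen=d_max)  # oldest transitions drop out first

    def store(self, s_t, a_t, r_t, s_next):
        self.pool.append((s_t, a_t, r_t, s_next))

    def is_full(self):
        return len(self.pool) == self.d_max

    def sample(self, batch_size):
        """Random minibatch for the training step (step 6)."""
        return random.sample(self.pool, batch_size)
```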
step 6: initialize G1 = 0, G2 = 0; randomly sample a batch of samples from the experience pool D, input the combined state sit and action ait into the neural network for training, and use the DQN algorithm to correct the network's output for each state sit so that the network's output approaches the action ait;
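The correction applied in step 6 follows the standard DQN temporal-difference rule: the network's prediction Q(s_it, a_it) is regressed toward the target y = r + γ·max Q(s_{t+1}, a'). The gradient-free stand-in below (no actual neural network) only illustrates that rule:

```python
def dqn_target(reward, next_q_values, gamma, terminal=False):
    """TD target y = r + gamma * max_a' Q(s', a'); just r at episode end."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)

def dqn_correct(q_pred, next_q_values, reward, gamma, lr):
    """Move the predicted Q value a step of size lr toward the TD target."""
    y = dqn_target(reward, next_q_values, gamma)
    return q_pred + lr * (y - q_pred)
```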
step 7: use the trained neural network to predict the actions taken by the unmanned aerial vehicle group from step 1 to step num, and record whether the group successfully reaches the task point after num steps;
step 8: repeatedly execute step 7 and calculate the success rate sr of the task executed by the unmanned aerial vehicle group; if sr is greater than SRmax, end training and execute step 9; otherwise, return to step 2;
sr = G2/G1
wherein G1 is the total number of times step 7 is executed, i.e., the total number of flights of the unmanned aerial vehicle group, and G2 is the number of times the group completes the flight task;
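Steps 7 and 8 amount to a Monte-Carlo estimate of the task success rate; a sketch, with `fly_once` standing in (as an assumption) for one predicted num-step flight of the swarm:

```python
def evaluate_success_rate(fly_once, trials):
    """sr = G2 / G1: G1 counts every flight, G2 the successful ones."""
    g1 = g2 = 0
    for _ in range(trials):
        g1 += 1
        if fly_once():  # True if the swarm reached the task point
            g2 += 1
    return g2 / g1
```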
step 9: the neural network that satisfies the required task success rate is used for real-time allocation of the unmanned aerial vehicle group's radar interference resources: the states St of the m radars at a given moment are input into this network to obtain the interference action At taken by the unmanned aerial vehicle group, namely the real-time allocation result of the radar interference resources.
The invention has the beneficial effects that:
The invention introduces the DQN algorithm into the allocation of the interference pattern resources carried by unmanned aerial vehicles, overcomes the prior art's shortcomings in dynamic, real-time allocation, realizes real-time allocation of those resources from the start of a task to its completion, and can handle radars that switch among multiple working modes.
Drawings
Fig. 1 is a DQN learning diagram.
Fig. 2 is a flow chart of DQN algorithm training in conjunction with radar interference strategy assignment.
Fig. 3 is a conversion diagram of the operation mode of the multifunctional radar.
Fig. 4 shows the relationship between the radar and the radial position of the drone.
Fig. 5 is a TensorBoard visualization of the network graph.
Fig. 6 is the interference resource allocation diagram at t = 20 steps.
Fig. 7 is the interference resource allocation diagram at t = 40 steps.
Fig. 8 is the interference resource allocation diagram at t = 60 steps.
Fig. 9 is the interference resource allocation diagram at t = 80 steps.
FIG. 10 is a graph of error as a function of iteration number.
FIG. 11 is a graph of flight success rate as a function of iteration number.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention provides a DQN-based method suitable for dynamically allocating the interference pattern resources carried by an unmanned aerial vehicle; in particular, by handling radars with multiple working modes, it realizes real-time allocation of interference pattern resources from the start of the vehicle's task to its completion.
The invention uses the DQN algorithm as the solving tool; its network structure is shown in fig. 1. The network is applied to the allocation of interference resources against several multifunctional radars: on the basis of a complex electronic countermeasure environment, and with a one-to-one interference pairing, a dynamic allocation strategy that changes with the drone swarm's flight distance is studied. The flow of the whole scheme is shown in fig. 2 and comprises the following steps:
step 1: bringing the electronic countermeasure information into an interference resource pool J and a radar resource pool P ═ P1,P2,......,PmThe unmanned aerial vehicle group jam to be distributed is { jam ═ jam1,jam2,...,jamm}; wherein J ═ { J ═ J1,j2,......,jxX represents the number of interference patterns.
Ground radar resource pool P ═ { P ═ P1,P2,......,PmAnd m represents the number of radars in the environment. Pi={fucl,sys,pp,gr,qs},PiThe method comprises the steps of (1) representing an ith multifunctional radar, and fucl representing different working mode parameter sets of the multifunctional radar; qs represents a measure of the radar against interference; sys represents radar constitution, representing different radar types; pp denotes the peak power (KW) of the radar and gr denotes the radar antenna gain (dB).
Relevant parameters in fucl: fcl ═ pwj,bwj,prfj,rfjJ is 0-2, which represents three different working modes of the first multifunctional radar, wherein pwj,bwj,prfj,rfjThe radar signal pulse width, the receiver bandwidth, the pulse repetition frequency and the carrier frequency of the radar under different modes are respectively.
Unmanned aerial vehicle group jam ═ jam of interference resource to be distributed1,jam2,...,jammAnd m represents the number of drones. Wherein the ith unmanned plane is jami={pjam,gj,bwjam,J},pjamFor unmanned aerial vehicle power (W), gj for unmanned aerial vehicle antenna gain (dB), bwjamIs the drone bandwidth (MHz).
Step 2: setting of relevant parameters in DQN networks, Dnum(empirical playback set size), γ (reward discount factor), r (learning rate), ε (ε -greedy), C (number of network weight reset steps).
And step 3: training of the DQN network is started.The distance from a starting point to a task point of the unmanned aerial vehicle is L, the unmanned aerial vehicle is divided into num steps, t represents that the unmanned aerial vehicle flies t steps, the initialized t is 1, and the state S of m radars is detected1={s11,s21,...,sm1}; initializing an experience playback poolHaving a capacity of Dmax(ii) a Initializing a randomly generated weight θ1;
Wherein,express the u unmanned plane jamuThe state of the interference ith radar in the step t; jamuRepresents the accumulated flight distance of the unmanned plane during the step t, and
Step 4: select the interference action At = {a1t, a2t, ..., amt} to be executed by the drone swarm using a greedy strategy;
wherein ait = {Pi, jk} denotes that drone jami performs jamming action jk on radar Pi at step t, jk ∈ J;
Step 5: after performing the interference action At, obtain the states St+1 of the m radars and the reward value Rt, as given by formula (2);
wherein rt(i) denotes the reward obtained by interfering with the ith radar when the drone has flown t steps, and Rt denotes the total reward obtained by interfering with the m radars at step t.
When the ith radar goes from search mode to tracking mode and on to guidance mode, sit increases accordingly; in the reverse direction it decreases; if the radar keeps its working mode unchanged, sit is unchanged. The radar working-mode transitions are shown in fig. 3. The drone detects a radar's working-mode transition through changes in the fucl parameters defined in step 1.
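The sit update tied to the mode transitions of fig. 3 can be sketched as follows; encoding search/track/guide as 0/1/2 (so that an escalation raises the state value) and the unit step size are assumptions of this sketch:

```python
SEARCH, TRACK, GUIDE = 0, 1, 2  # assumed encoding of the working modes

def update_threat_state(s_it, old_mode, new_mode, step=1):
    """s_it rises on escalation (search->track, track->guide), falls on
    de-escalation, and is unchanged if the radar keeps its mode."""
    if new_mode > old_mode:
        return s_it + step
    if new_mode < old_mode:
        return s_it - step
    return s_it
```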
Step 6: store (St, At, Rt, St+1) in the experience pool D; if D does not yet contain enough samples, set t = t + 1 and return to step 4 until D is full. Otherwise, during training, a batch of samples is randomly drawn from the experience pool D every C steps to adjust the internal parameters of the training network.
Step 7: with the experience pool full, training starts, proceeding in order through steps 1 to num. Training the network means combining state sit and action ait and learning them in the neural network; exploiting the neural network's strengths and the DQN algorithm, the action ait taken in the current state sit is corrected at each step through the reward so that it gradually approaches the optimal action. If the drone's entire 1-to-num-step flight succeeds, G2 is incremented by 1; whether it fails or succeeds, G1 is incremented by 1.
Step 8: with G1 the total number of drone flights so far and G2 the number of flights that successfully completed the mission, the task success rate is given by formula (4); when sr exceeds the task requirement SRmax, training ends and step 9 follows; otherwise, continue from step 3.
sr = G2/G1 (4)
Step 9: at this point, training of the interference pattern resource allocation with the DQN algorithm is finished and the internal neural network parameters are trained. Inputting a state St into the trained DQN network now yields the corresponding optimal interference pattern resource allocation result.
Example 1:
The invention provides a DQN-based method suitable for dynamically allocating the interference pattern resources carried by an unmanned aerial vehicle; in particular, by handling radars with multiple working modes, it realizes real-time allocation of interference pattern resources from the start of the drone's task to its completion. To verify the method's effectiveness, it is applied as shown in fig. 2: the DQN algorithm allocates, in real time, the drone's interference resources as they change along the flight path.
The method comprises the following steps: obtaining an interference resource pool J and a radar resource pool P ═ { P ═ P1,P2,P3,P4Resource pool for confrontation environment E ═ E1,E2The unmanned aerial vehicle group jam to be distributed is { jam ═ jam1,jam2,jam3,jam4};
Wherein J ═ { J ═ J1,j2,j3,j4,j5,j6,j7},j1Representing noise frequency modulation suppressed interference, j2Representing noise frequency modulation suppressed interference, j3Representing smart noise convolution disturbances, j4Suppression of disturbances, j, representing dense decoys5Representing distance-trailing spoofing interference, j6Representing speed-pulling spoofing disturbances, j7Representing a combined range-velocity tow spoofing disturbance.
The ground radar resource pool is P = {P1, P2, P3, P4}; each established radar has the two basic anti-jamming capabilities of pulse compression and pulse accumulation. We denote a ranging radar by 0, a pulse-Doppler radar by 1, and an MTI (moving target indication) radar by 2.
wherein P1 = {fucl, 0, 320, 32, qs}, where qs adds the pulse leading-edge tracking anti-jamming measure; in the search state fucl = {32, 24, 0.3, 8.7}, and in the tracking state fucl = {15, 40, 1.2, 8.7}.
wherein P2 = {fucl, 1, 250, 33, qs}, where qs adds the clutter cancellation and pulse leading-edge tracking anti-jamming measures; in the search state fucl = {20, 24, 0.5, 10.3}, and in the tracking state fucl = {5, 60, 1.5, 11.1}.
wherein P3 = {fucl, 2, 180, 34, qs}, where qs adds the clutter cancellation and velocity discrimination anti-jamming measures; in the search state fucl = {15, 32, 0.8, 9.5}, and in the tracking state fucl = {8, 50, 1.8, 9.5}.
wherein P4 = {fucl, 1, 220, 33, qs}, where qs adds the clutter cancellation and velocity discrimination anti-jamming measures; in the search state fucl = {15, 32, 0.8, 11.8}, and in the tracking state fucl = {4, 60, 2.4, 11.8}.
The unmanned aerial vehicle group awaiting interference resource allocation is jam = {jam1, jam2, jam3, jam4}, where m = 4 is the number of drones. The ith drone is jami = {pjam, gj, bwjam, J}: pjam is the drone's power (W), gj its antenna gain (dB), and bwjam its bandwidth (MHz).
jam1 = {10, 9, 200, J}, jam2 = {10, 9, 200, J}, jam3 = {10, 9, 200, J}, jam4 = {10, 9, 200, J}, with J taken from j1 to j7.
rdm and jdm denote the position coordinates (km) of the mth radar and drone respectively. The specific coordinates are:
rd1 = [-30, 200], rd2 = [30, 120], rd3 = [-20, 40], rd4 = [20, 0]; jd1 = [0, 10], jd2 = [0, 10], jd3 = [0, 10], jd4 = [0, 10].
The position information of the drones and radars can thus be described in two-dimensional coordinates, and the distance between a drone and a radar can be computed from them; the variation of the radial distance between the drone and each radar is shown in fig. 4.
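With the coordinates above, the radial distance curves of fig. 4 reduce to plane Euclidean distances; a sketch using the embodiment's start positions:

```python
import math

# radar positions and the common drone start position, in km (from above)
rd = {1: (-30, 200), 2: (30, 120), 3: (-20, 40), 4: (20, 0)}
jd_start = (0, 10)

def radial_distance(drone_xy, radar_xy):
    """Euclidean distance between a drone and a radar in the 2-D plane."""
    return math.hypot(drone_xy[0] - radar_xy[0], drone_xy[1] - radar_xy[1])

d4 = radial_distance(jd_start, rd[4])  # distance to radar P4 at the start
```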
Step 2: set the relevant parameters of the DQN network: experience replay pool size D = 2000, reward discount factor γ = 0.9, learning rate r = 0.001, ε (ε-greedy) = 0.9, and network-weight reset interval C = 200 steps.
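Collected into one configuration object, the embodiment's hyperparameters read as follows; the field names are illustrative, the values are the ones set above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DQNConfig:
    replay_size: int = 2000    # D, experience replay pool size
    gamma: float = 0.9         # reward discount factor
    lr: float = 0.001          # learning rate r
    epsilon: float = 0.9       # epsilon-greedy parameter
    sync_every: int = 200      # C, network-weight reset interval (steps)
    num_steps: int = 100       # flight discretized into num steps
    distance_km: float = 300.0 # start-to-task distance L

cfg = DQNConfig()
```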
Step 3: begin training the DQN network. The distance from the drone's starting point to the task point is 300 km, divided into 100 steps; t denotes that the drone has flown t steps. Initialize t = 1 and the states S1 = {s11, s21, s31, s41} of the m radars; initialize the experience replay pool D with capacity 2000; initialize the randomly generated weights θ1;
wherein sut denotes the state of the uth drone jamu interfering with the ith radar at step t, comprising the drone's accumulated flight distance at step t and the radar's working-mode state.
Step 4: select the interference action At = {a1t, a2t, a3t, a4t} to be executed by the drone swarm using a greedy strategy; wherein ait = {Pi, jk} denotes that drone jami performs jamming action jk on radar Pi at step t, jk ∈ J;
Step 5: after performing the interference action At, obtain the states St+1 of the m radars and the reward value Rt, as given by formula (6);
wherein rt(i) denotes the reward obtained by interfering with the ith radar when the drone has flown t steps, and Rt denotes the total reward obtained by interfering with the 4 radars at step t.
When the ith radar goes from search mode to tracking mode and on to guidance mode, sit increases accordingly; in the reverse direction it decreases; if the radar keeps its working mode unchanged, sit is unchanged. The radar working-mode transitions are shown in fig. 3. The drone detects a radar's working-mode transition through changes in the fucl parameters defined in step 1.
Step 6: store (St, At, Rt, St+1) in the experience pool D; if D does not yet contain enough samples, set t = t + 1 and return to step 4 until D is full. Otherwise, during training, a batch of samples is randomly drawn from the experience pool D every 200 steps to adjust the internal parameters of the training network.
Step 7: when the experience pool is full, training starts, proceeding in order through steps 1 to 100. Training the network means combining state sit and action ait and learning them in the neural network; exploiting the neural network's strengths and the DQN algorithm, the action ait taken in the current state sit is corrected at each step through the reward so that it gradually approaches the optimal action. If the drone's entire 100-step flight succeeds, G2 is incremented by 1; whether it fails or succeeds, G1 is incremented by 1. Both G1 and G2 start at zero.
Step 8: with G1 the total number of drone flights so far and G2 the number of flights that successfully completed the mission, the task success rate is given by formula (4); when sr exceeds the task requirement SRmax, training ends and step 9 follows; otherwise, continue from step 3.
Step 9: at this point, training of the interference pattern resource allocation with the DQN algorithm is finished and the internal neural network parameters are trained. Inputting a state St into the trained DQN network now yields the corresponding optimal interference pattern resource allocation result.
The results of dynamic interference resource allocation through the DQN network are shown in figs. 6 to 9, where t is the number of flight steps of the drone. After training and learning with the DQN algorithm, the optimal interference pattern resource allocation result under the current environment is obtained, together with the change of the DQN error function over iterations shown in fig. 10. The TensorBoard visualization of the network graph is shown in fig. 5.
2. Analysis of simulation results
The results of interference resource allocation in the simulation environment are shown in figs. 6 to 9. Over the swarm's entire flight, the allocation of interference resources changes dynamically with flight distance: at different moments, different jammers apply different interference patterns against the different multifunctional radars, and the flight task is completed. The experiment comprises 1600 simulation runs. Although the DQN error function still fluctuates by about 0.2 between runs 1200 and 1600, it essentially converges to between 0.1 and 0.3, so the interference resource allocation essentially converges. As the final flight success rate in fig. 11 shows, the success rate of the interference effect under the DQN algorithm finally stabilizes above 70%, and the overall interference allocation result is good over the whole interference process. This meets the requirement of dynamic allocation of interference resources and further verifies the feasibility and effectiveness of the established method.
The above description is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (1)
1. A DQN algorithm-based real-time allocation method for radar interference resources is characterized by comprising the following steps:
step 1: obtain the interference resource pool J = {j1, j2, ..., jx}, the radar resource pool P = {P1, P2, ..., Pm}, and the unmanned aerial vehicle group to be allocated jam = {jam1, jam2, ..., jamm}; obtain the required success rate SRmax for the task executed by the unmanned aerial vehicle group;
wherein x represents the number of interference patterns, and the number of unmanned aerial vehicles in the group equals the number of radars in the environment, namely m;
step 2: set the distance L from the unmanned aerial vehicle's starting point to the task point, the number of iteration steps num, and the maximum capacity Dmax of the experience replay pool; initialize t = 1 and the states S1 = {s11, s21, ..., sm1} of the m radars; initialize the experience replay pool D;
wherein sut represents the state of unmanned aerial vehicle jamu interfering with radar Pi at step t, comprising the accumulated flight distance of jamu at step t and fucli(t), the state of radar Pi at step t; u = 1, 2, ..., m; i = 1, 2, ..., m;
step 3: select the interference action At = {a1t, a2t, ..., amt} to be executed by the unmanned aerial vehicle group using a greedy strategy;
wherein aut = {Pi, jk} denotes that unmanned aerial vehicle jamu performs jamming action jk on radar Pi at step t, jk ∈ J;
step 4: after performing the interference action At, obtain the reward value Rt and the states St+1 of the m radars;
wherein, if radar Pi keeps its working mode unchanged, sit is unchanged; if radar Pi switches from search mode to tracking mode, or from tracking mode to guidance mode, sit increases; if radar Pi switches from guidance mode to tracking mode, from guidance mode to search mode, or from tracking mode to search mode, sit decreases;
step 5: store (St, At, Rt, St+1) in the experience replay pool D; if D has not reached its maximum capacity Dmax, set t = t + 1 and return to step 3; otherwise, execute step 6;
step 6: initialize G1 = 0, G2 = 0; randomly sample a batch of samples from the experience pool D, input the combined state sit and action ait into the neural network for training, and use the DQN algorithm to correct the network's output for each state sit so that the network's output approaches the action ait;
step 7: predicting, with the trained neural network, the actions taken by the unmanned aerial vehicle group from step 1 to step num, and recording whether the unmanned aerial vehicle group successfully reaches the task point after num steps;
step 8: repeatedly executing step 7 and calculating the success rate sr of the unmanned aerial vehicle group in executing the task; if the success rate sr is larger than SR_max, ending the training and executing step 9; otherwise, returning to step 2;
sr = G_2 / G_1
wherein G_1 is the total number of times step 7 is executed, namely the total number of flights of the unmanned aerial vehicle group; G_2 is the number of times the unmanned aerial vehicle group completed the flight task;
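Steps 7-8 amount to a Monte Carlo estimate of the task success rate. A sketch, where `run_episode` is an assumed stand-in for one num-step flight that returns whether the swarm reached the task point:

```python
# Sketch of steps 7-8: fly the swarm G1 times with the trained policy,
# count the G2 successful flights, and return sr = G2 / G1.
def success_rate(run_episode, n_runs):
    g1 = g2 = 0
    for _ in range(n_runs):
        g1 += 1
        if run_episode():  # True if the swarm reached the task point
            g2 += 1
    return g2 / g1

outcomes = iter([True, False, True, True])
assert success_rate(lambda: next(outcomes), 4) == 0.75
```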
step 9: using the neural network that meets the required task success rate for the real-time allocation of the radar jamming resources of the unmanned aerial vehicle group: the state S_t of the m radars at a given moment is input into this neural network to obtain the jamming action A_t to be taken by the unmanned aerial vehicle group, namely the real-time allocation result of the radar jamming resources.
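At deployment (step 9), allocation reduces to a greedy argmax over the trained network's Q-values for the current radar state; `q_net` below is any callable returning Q(S_t, a) and is an illustrative stand-in for the trained network:

```python
# Step-9 deployment sketch: map the current radar state S_t to a jamming
# action A_t by greedy argmax over Q-values.
def allocate(q_net, state, n_actions):
    """Real-time allocation: A_t = argmax_a Q(S_t, a)."""
    qvals = [q_net(state, a) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: qvals[a])

# Toy Q-function for demonstration: prefers the action equal to state mod 3.
q_net = lambda s, a: 1.0 if a == s % 3 else 0.0
assert allocate(q_net, state=5, n_actions=3) == 2
```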
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110370353.0A CN113093124B (en) | 2021-04-07 | 2021-04-07 | DQN algorithm-based real-time allocation method for radar interference resources |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113093124A true CN113093124A (en) | 2021-07-09 |
CN113093124B CN113093124B (en) | 2022-09-02 |
Family
ID=76674257
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110370353.0A Active CN113093124B (en) | 2021-04-07 | 2021-04-07 | DQN algorithm-based real-time allocation method for radar interference resources |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113093124B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150260828A1 (en) * | 2012-10-27 | 2015-09-17 | Valeo Schalter Und Sensoren Gmbh | Method for suppressing interference in a received signal of a radar sensor of a motor vehicle and corresponding driver assistance device |
US9622133B1 (en) * | 2015-10-23 | 2017-04-11 | The Florida International University Board Of Trustees | Interference and mobility management in UAV-assisted wireless networks |
CN108710110A (en) * | 2018-04-11 | 2018-10-26 | 哈尔滨工程大学 | A kind of cognitive interference method based on Markov process decision |
CN108777872A (en) * | 2018-05-22 | 2018-11-09 | 中国人民解放军陆军工程大学 | Deep Q neural network anti-interference model and intelligent anti-interference algorithm |
CN109444832A (en) * | 2018-10-25 | 2019-03-08 | 哈尔滨工程大学 | Colony intelligence interfering well cluster method based on more jamming effectiveness values |
CN109862610A (en) * | 2019-01-08 | 2019-06-07 | 华中科技大学 | A kind of D2D subscriber resource distribution method based on deeply study DDPG algorithm |
CN109884599A (en) * | 2019-03-15 | 2019-06-14 | 西安电子科技大学 | A kind of radar chaff method, apparatus, computer equipment and storage medium |
CN110031807A (en) * | 2019-04-19 | 2019-07-19 | 电子科技大学 | A kind of multistage smart noise jamming realization method based on model-free intensified learning |
CN110515045A (en) * | 2019-08-30 | 2019-11-29 | 河海大学 | A kind of radar anti-interference method and system based on Q- study |
CN111199127A (en) * | 2020-01-13 | 2020-05-26 | 西安电子科技大学 | Radar interference decision method based on deep reinforcement learning |
CN111970072A (en) * | 2020-07-01 | 2020-11-20 | 中国人民解放军陆军工程大学 | Deep reinforcement learning-based broadband anti-interference system and anti-interference method |
CN112435275A (en) * | 2020-12-07 | 2021-03-02 | 中国电子科技集团公司第二十研究所 | Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm |
CN112543038A (en) * | 2020-11-02 | 2021-03-23 | 杭州电子科技大学 | Intelligent anti-interference decision method of frequency hopping system based on HAQL-PSO |
Non-Patent Citations (6)
Title |
---|
KOZY, M等: "Applying Deep-Q Networks to Target Tracking to Improve Cognitive Radar", 《2019 IEEE RADAR CONFERENCE (RADARCONF)》 * |
VAN HASSELT H等: "Deep reinforcement learning with double Q-Learning", 《NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE》 * |
张柏开 et al.: "Cognitive jamming decision method for multifunction radar based on Q-Learning", 《电讯技术》 * |
张柏开 et al.: "DQN cognitive jamming decision method for multifunction radar", 《***工程与电子技术》 * |
杨鸿杰 et al.: "Research on intelligent jamming algorithms based on reinforcement learning", 《电子测量技术》 * |
王帅康: "Research on autonomous landing methods for UAVs based on deep reinforcement learning", 《中国优秀博硕士学位论文全文数据库(硕士)工程科技Ⅱ辑》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114444398A (en) * | 2022-02-08 | 2022-05-06 | 扬州宇安电子科技有限公司 | Grey wolf algorithm-based networking radar cooperative interference resource allocation method |
CN114444398B (en) * | 2022-02-08 | 2022-11-01 | 扬州宇安电子科技有限公司 | Grey wolf algorithm-based networking radar cooperative interference resource allocation method |
CN114509732A (en) * | 2022-02-21 | 2022-05-17 | 四川大学 | Deep reinforcement learning anti-interference method of frequency agile radar |
CN114509732B (en) * | 2022-02-21 | 2023-05-09 | 四川大学 | Deep reinforcement learning anti-interference method of frequency agile radar |
Also Published As
Publication number | Publication date |
---|---|
CN113093124B (en) | 2022-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wicks | Spectrum crowding and cognitive radar | |
CN113093124B (en) | DQN algorithm-based real-time allocation method for radar interference resources | |
CN111090078B (en) | Networking radar residence time optimal control method based on radio frequency stealth | |
CN111812599B (en) | Networking radar optimal waveform design method based on low interception performance under game condition | |
CN113341383B (en) | Anti-interference intelligent decision method for radar based on DQN algorithm | |
CN107329136B (en) | MIMO radar multi-target self-adaptive tracking method based on variable analysis time | |
CN104007419B (en) | About residence time and the radar time resource combined distributing method of heavily visiting interval | |
CN111190176B (en) | Self-adaptive resource management method of co-location MIMO radar networking system | |
Yi et al. | Reinforcement learning-based joint adaptive frequency hopping and pulse-width allocation for radar anti-jamming | |
CN113406579B (en) | Camouflage interference waveform generation method based on deep reinforcement learning | |
CN116299408B (en) | Multi-radar autonomous cooperative detection system and detection method | |
CN113376607B (en) | Airborne distributed radar small sample space-time self-adaptive processing method | |
CN115567353B (en) | Interference multi-beam scheduling and interference power combined optimization method for radar networking system | |
CN115343680A (en) | Radar anti-interference decision method based on deep reinforcement learning and combined frequency hopping and pulse width distribution | |
CN113311857A (en) | Environment sensing and obstacle avoidance system and method based on unmanned aerial vehicle | |
Zhang et al. | Research on decision-making system of cognitive jamming against multifunctional radar | |
CN115236607A (en) | Radar anti-interference strategy optimization method based on double-layer Q learning | |
CN109633587B (en) | Adaptive adjustment method for networking radar signal bandwidth | |
Zhang et al. | Joint jamming beam and power scheduling for suppressing netted radar system | |
Zhang et al. | Performance analysis of deep reinforcement learning-based intelligent cooperative jamming method confronting multi-functional networked radar | |
CN112051552A (en) | Multi-station-based main lobe anti-interference method and device | |
CN109212494A (en) | A kind of stealthy interference waveform design method of radio frequency for radar network system | |
CN113114399B (en) | Three-dimensional spectrum situation complementing method and device based on generation countermeasure network | |
CN117709678A (en) | Multi-machine collaborative radar search resource optimization method based on multi-agent reinforcement learning | |
Bi et al. | Optimization method of passive omnidirectional buoy array in on-call anti-submarine search based on improved NSGA-II |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||