CN110031807A - Multi-stage smart noise jamming method based on model-free reinforcement learning - Google Patents


Info

Publication number
CN110031807A
Authority
CN
China
Legal status
Granted
Application number
CN201910321772.8A
Other languages
Chinese (zh)
Other versions
CN110031807B (en)
Inventor
张天贤
王远航
贾瑞
韩毅
孔令讲
杨晓波
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority to CN201910321772.8A
Publication of CN110031807A
Application granted
Publication of CN110031807B
Legal status: Active


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/38 Jamming means, e.g. producing false echoes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The present invention discloses a multi-stage smart noise jamming method based on model-free reinforcement learning, in the field of radar technology. It addresses optimal jamming power allocation when the jammer does not know the environment model. This includes the enemy fire control radar's jamming recognition method, its anti-jamming measures, and the rules by which it switches working modes. First, the multi-stage jamming power allocation problem is modeled as a Markov decision process with an unknown environment model. Second, to assess the performance of multi-stage noise jamming, the average search-to-lock time of the fire control radar is selected as the evaluation index. Next, the principle of noise jamming power allocation is analyzed. To address the challenge of the unknown environment model, a reinforcement learning framework for the multi-stage jamming power allocation problem is established. Finally, a multi-stage jamming power allocation method based on the Q-learning algorithm is proposed. The method effectively solves the optimal jamming power allocation problem in practical applications and improves the jamming success rate.

Description

Multi-stage smart noise jamming method based on model-free reinforcement learning
Technical field
The invention belongs to the field of radar technology and relates in particular to a smart noise jamming technique for radar.
Background art
Smart noise jamming means that the jammer emits a correlated noise-like signal that overlaps and masks the target echo in the time domain. This confuses the radar's target detection and tracking. Noise jamming plays a critical role in electronic countermeasures, because whether jamming is effective bears directly on the safety of friendly combat assets and personnel. For this reason, smart noise jamming has become a key research topic for experts in China and abroad.
Modern fire control radars (FCRs) have strong anti-jamming capability and multiple working modes. Against such radars the performance of traditional noise jamming keeps degrading, so better smart noise jamming measures are needed.

The document "A Comparison of DDS and DRFM Techniques in the Generation of Smart Noise Jamming Waveforms" (Naval Postgraduate School, Monterey, CA, 1996) studied the generation of smart noise jamming waveforms. The document "Research on the method for smart noise jamming on pulse radar" (2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC), IEEE, 2015) studied smart noise jamming based on convolution modulation. However, both deal only with waveform design and generation and ignore the allocation of jamming power.

Continuous, effective dynamic optimization of the power allocation can significantly raise the jamming success rate. Multi-stage power allocation for smart noise jamming has typically relied on expert experience. Expert experience is often inaccurate, and inaccurate power allocation degrades jamming performance:
1) too little jamming power cannot effectively degrade the FCR's performance;
2) too much jamming power increases the probability that the jamming is detected.
Smart noise jamming therefore needs an effective multi-stage jamming power allocation method.
Summary of the invention
To solve the above technical problems, the present invention proposes a multi-stage smart noise jamming method based on model-free reinforcement learning. The technical solution adopted by the present invention is a multi-stage smart noise jamming method based on model-free reinforcement learning comprising the following steps:
S1. Model the multi-stage jamming power allocation problem as a Markov decision process with an unknown environment model, as follows.

A target aircraft carrying a self-protection (defensive avionics) system and an aircraft carrying an FCR engage each other stage by stage. The engagement ends when the target aircraft is locked or successfully escapes. The self-protection jamming system comprises a jammer and a radar warning receiver (RWR).

In each stage, the jammer first receives the waveform emitted by the FCR, extracts its waveform features, and identifies the FCR's current working mode s_k ∈ S. Here S is the set of working modes available to the FCR and k is the stage index. The jammer then selects a jamming action a_k ∈ A, corresponding to a jamming power, and applies smart noise jamming, where A is the set of possible jamming actions. Finally, after the stage ends, the self-protection jamming system obtains the stage-k reward t_k, and the FCR switches to its next working mode s_{k+1}. The reward t_k is the time the FCR spends in stage k.
S2. Use the FCR's search-to-lock time as the performance index of smart noise jamming.
S3. Represent the reinforcement learning framework of the multi-stage jamming power allocation problem by the tuple <S, A, Θ, ψ, δ>, where:
- S is the finite state space, corresponding to the FCR working modes;
- A is the finite action space, corresponding to the jammer's feasible jamming actions;
- Θ is the state transition function, corresponding to switching between FCR working modes;
- ψ is the reward function, corresponding to the time consumed by the FCR in the current state;
- δ is the discount factor, corresponding to how the jammer weights future rewards.
S4. Solve for the optimal multi-stage jamming power allocation using the Q-learning algorithm.
Further, step S4 specifically comprises:
S41. Determine a learning-factor sequence {α_n ∈ (0,1)} and an exploration-factor sequence {ξ_n ∈ (0,1)}, where α_n is the learning factor and ξ_n is the exploration factor;
S42. Initialize the Q-table by setting Q(s, a) = 0, where s is the current FCR working mode and a is the current jamming action;
S43. While the target aircraft is not locked by the FCR, in state s either select the currently optimal jamming action according to the Q-table with probability ξ_n, or select a jamming action at random with probability 1 − ξ_n. Otherwise, end;
S44. Evaluate the current jamming action to obtain the FCR's search-to-lock time and the FCR's next working mode;
S45. Update the Q-table using the jamming action selected in step S43, together with the FCR time consumption and next working mode obtained in step S44;
S46. Switch the FCR to the next working mode obtained in step S44. If the target is not locked by the FCR, return to step S43; otherwise, end the current episode and go to step S47;
S47. If the preset number of episodes has been reached, end; otherwise, return to step S43.
Further, the FCR search-to-lock time in step S44 is specifically the time the FCR stays in its current working mode under the current jamming action.
Further, step S43 is as follows. While the target aircraft is not locked by the FCR, in state s a jamming action is selected at random with probability 1 − ξ_n instead of following the Q-table. Otherwise, the process ends.
Further, the Q-table in step S45 is updated according to the following expression:
where s′ denotes the FCR's next working mode and a′ denotes the next jamming action.
Further, a′ is determined by step S43.
Beneficial effects of the present invention: the multi-stage jamming power allocation problem is first abstracted as a Markov decision process with unknown state transition probabilities. An evaluation index for the jamming effect is then given, and the principle of the multi-stage smart noise jamming power allocation problem is analyzed. Finally, a reinforcement learning framework for the problem is proposed, and the Q-learning algorithm is used to solve the power allocation problem of multi-stage noise jamming.

Advantages of the invention:
- it allocates multi-stage jamming power sensibly when the environment model is unknown;
- its jamming power allocation does not rely on expert experience, which avoids the loss of jamming performance caused by inaccurate power allocation;
- the method can be applied in civilian, military, and other fields.
Brief description of the drawings
Fig. 1 shows the scenario considered by the present invention.
Fig. 2 is a schematic diagram of FCR working-mode switching under the simulation conditions of the present invention.
Fig. 3 shows the simulated performance of three methods when γ is the same in all modes: the method of the invention, random power selection, and the conventional upper-bound method with known γ.
In Fig. 3, panel (a) is the result for γ = 5.6, panel (b) for γ = 5.8, and panel (c) for γ = 6.
Fig. 4 shows the simulated performance of the different methods in the specific embodiment when γ is set as in case 1.
Fig. 5 shows the simulated performance of the different methods in the specific embodiment when γ is set as in case 2.
Specific embodiment
To make the content of the invention easier to describe, the following terms are first explained:
Term 1: fire control radar
A fire control radar is a radar that accurately tracks targets and provides target coordinate data to a weapon fire-control system; abbreviated FCR.
Term 2: radar warning receiver
A radar warning receiver is a device that intercepts, analyzes, and identifies enemy radar signals, judges their threat level in real time, and issues timely warnings; abbreviated RWR.
Term 3: track while scan
Track while scan is a working mode in which the radar scans its search space while tracking one or more targets; abbreviated TWS.
Term 4: track and search
Track and search is a mode in which the radar simultaneously searches and precisely tracks one or more targets; abbreviated TAS.
Term 5: single target track
Single target track is a working mode in which the radar precisely tracks a single target to obtain information such as its range, velocity, and angle; abbreviated STT.
To help those skilled in the art understand the technical content of the invention, the invention is further described below with reference to the accompanying drawings.
The multi-stage smart noise jamming method based on model-free reinforcement learning of the invention proceeds in three stages. It first abstracts the multi-stage jamming power allocation problem as a Markov decision process with unknown state transition probabilities. It then gives an evaluation index for the jamming effect and analyzes the principle of the multi-stage smart noise jamming power allocation problem. Finally, it proposes a reinforcement learning framework for the problem and uses the Q-learning algorithm to solve the power allocation problem of multi-stage noise jamming. The method comprises the following steps:
Step 1: As shown in Fig. 1, assume that a target aircraft carrying a self-protection (defensive avionics) system is pursued by an aircraft carrying an FCR. The target aircraft's airborne self-protection system comprises a jammer and an RWR. To escape, the target aircraft applies noise jamming against the FCR-carrying aircraft. The jamming power can be adapted to the FCR's signal strength and working mode, and the main problem solved by the invention is the allocation of this noise jamming power.

The attacking aircraft's airborne FCR has multiple working modes; typical airborne FCR modes include track while scan (TWS), track and search (TAS), and single target track (STT). During electronic countermeasures the FCR switches between working modes as needed:
- once a target is confirmed, the FCR switches to STT to obtain accurate target position parameters;
- after the target is lost through maneuvering or jamming, the FCR switches to TWS;
- when there is more than one potential threat and the FCR only needs rough position information for the detected targets, it usually operates in TAS.
Because the FCR switches among multiple working modes, the electronic countermeasure process divides into a multi-stage game, and the problem can therefore be modeled as a Markov decision process. More precisely, the target aircraft carrying the self-protection system and the aircraft carrying the FCR engage each other in stages k = 0, 1, 2, .... The engagement ends when the target aircraft is locked or successfully escapes.

In each stage k, the jammer first receives the waveform emitted by the FCR, extracts its waveform features, and identifies the FCR's current working mode s_k ∈ S, where S is the set of working modes available to the FCR. The jammer then selects a jamming action a_k ∈ A corresponding to a jamming power and applies smart noise jamming, where A is the set of possible jamming actions. Finally, after the stage ends, the self-protection jamming system obtains the stage-k reward t_k, and the FCR switches to its next working mode s_{k+1}. The reward t_k is the time the FCR spends in stage k.
Step 2: determine the evaluation index of the jamming effect
To evaluate the jamming effect in stage k, the invention uses the FCR's search-to-lock time as a criterion of jamming performance. The jammer's goal is to choose jamming powers that make this time as long as possible. Put simply, the jammer maximizes the FCR's expected search-to-lock time, which in mathematical terms is

$$\max_{\chi} E(T) \qquad (1)$$

where T is the FCR's search-to-lock time and E(·) denotes the expected value. The vector χ = [a_0, a_1, a_2, ...] is the sequence of jamming actions selected over the stages, called the jamming power allocation strategy χ.

T clearly depends on χ, but the relationship is affected by many factors, such as the FCR's jamming recognition, anti-jamming measures, measurement errors, and working modes. The jammer cannot know these factors, so the invention calls them environmental factors, denoted EF. Without loss of generality, the relationship between T and χ is expressed as
$$T = F(\chi, \mathrm{EF}) \qquad (2)$$
where F(·) denotes a functional relationship.
A larger T means better jamming. An optimal strategy χ* is therefore

$$\chi^{*} = \arg\max_{\chi} E\left[F(\chi, \mathrm{EF})\right] \qquad (3)$$
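A minimal sketch of the objective follows. It assumes that the search-to-lock time T of one engagement is the sum of the per-stage times t_k defined in Step 1, and contrasts T with the discounted sum of t_k that a learner with discount factor δ (the tuple of Step 4) maximizes instead. The function names and the stage times are illustrative, not taken from the patent.

```python
from typing import Sequence


def search_to_lock_time(stage_times: Sequence[float]) -> float:
    """T for one engagement, assuming T is the sum of the per-stage times t_k."""
    return float(sum(stage_times))


def discounted_return(stage_times: Sequence[float], delta: float) -> float:
    """Discounted sum of t_k with discount factor delta (delta = 1 gives T)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return sum((delta ** k) * t for k, t in enumerate(stage_times))


# Hypothetical engagement: the FCR spends 4.0 s, 2.5 s and 1.5 s in three stages.
times = [4.0, 2.5, 1.5]
print(search_to_lock_time(times))           # 8.0
print(discounted_return(times, delta=0.5))  # 4.0 + 1.25 + 0.375 = 5.625
```

With δ close to 1 the discounted sum approaches T. Discounting also keeps the return bounded when the number of stages is not known in advance.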
In general, the transition probabilities between FCR working modes are unknown, and the numerical evaluation of jamming performance in each stage is uncertain. The multi-stage jamming power allocation problem is therefore a Markov decision process with an unknown environment model. Solving it poses two difficulties:
1) the number of jamming stages is uncertain, so the objective function is hard to obtain;
2) the environment model is unknown.
For these two reasons, traditional optimization algorithms such as evolutionary algorithms and dynamic programming cannot solve the power allocation problem. Model-free reinforcement learning is therefore considered for this class of optimization problem, although its application to multi-stage jamming power allocation still needs to be explored.
Step 3: analyze the principle of multi-stage smart noise jamming
The proposed multi-stage smart noise jamming method is applicable to a variety of jamming signals and allocates noise jamming power flexibly according to the signal strength and working mode of the FCR. Without loss of generality, define the following quantities:
- P_tk is the FCR transmit power;
- G_tk is the antenna gain in stage k;
- A_r is the effective aperture of the FCR antenna;
- R is the FCR-to-target range.
The transmitted power density received by the RWR is then
If the jammer identifies the FCR working mode s_k after receiving the FCR transmission, and the average cross section of the target aircraft is σ_u, the stage-k smart noise jamming power is
In practice the target cross section σ is unknown; assume σ ~ U(σ_min, σ_max). Let the jamming power received by the FCR be P_rgk and the true target echo power be P_rsk. For each pulse, the total echo energy received by the FCR is
With smart noise jamming, the jamming completely covers the true target echo. The true echo and the jamming are therefore always mixed, and the FCR cannot separate them. In general, the FCR detects smart noise jamming from the total power of each pulse echo. Let R̂ be the estimate of R and γ_k the FCR threshold. The FCR can then estimate the total power of the pulse echo, defined as
The FCR judges whether jamming is present by comparing P_rk/P_erk with γ_k. The SINR of each pulse echo in stage k is
where

$$\mathrm{SINR}_{k2} = \lambda\,\mathrm{SINR}_{k1}, \quad \lambda > 1 \qquad (10)$$
The detection and tracking performance of the fire control radar is proportional to the SINR_k of each stage, and precise jamming degrades the radar's performance. The FCR search-to-lock time T is therefore inversely proportional to SINR_k, that is, T ∝ 1/SINR_k.
Too little jamming power cannot effectively reduce the SINR of the radar echo, so SINR_k stays large and T is small. Conversely, too much jamming power lets the FCR easily detect the jamming and take effective anti-jamming measures. Again the echo SINR is not reduced, SINR_k stays large, and T is small.

Choosing a reasonable jamming action a_k therefore improves jamming performance. However, the relationship between T and a_k is affected by many unknown environmental factors. This makes multi-stage jamming power allocation under an unknown environment model an important problem. To overcome the challenge of the unknown environment model, the invention adopts model-free reinforcement learning.
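The anti-jamming behaviour described above can be sketched frame by frame, combined with the frequency-agility rule of the simulation conditions (agility lasts 9 data frames, and jamming is ineffective during agility). Several details are assumptions made for illustration: the class name, counting the triggering frame as the first agility frame, and using one threshold per mode.

```python
from dataclasses import dataclass


@dataclass
class AgilityState:
    """Hypothetical per-frame model of the FCR jamming test and frequency agility."""
    gamma: float           # threshold gamma_k of the current working mode
    hold_frames: int = 9   # agility duration taken from the simulation conditions
    remaining: int = 0     # agility frames still to run

    def frame(self, p_r: float, p_er: float) -> bool:
        """Process one data frame; return True if jamming is effective in it."""
        if p_er <= 0.0:
            raise ValueError("estimated echo power must be positive")
        if self.remaining > 0:           # agility active: jamming has no effect
            self.remaining -= 1
            return False
        if p_r / p_er > self.gamma:      # FCR flags jamming and starts agility
            self.remaining = self.hold_frames - 1  # assumption: this frame counts
            return False
        return True                      # ratio below threshold: jamming unnoticed


fcr = AgilityState(gamma=6.0)
# Frame 1 has a power ratio above gamma; the next nine frames are below it.
effective = [fcr.frame(7.0, 1.0)] + [fcr.frame(1.0, 1.0) for _ in range(9)]
print(effective)  # nine False values (agility), then True
```

The sketch shows why excessive power is self-defeating: each threshold crossing costs the jammer a whole agility window.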
Step 4: reinforcement learning framework for the multi-stage jamming power allocation problem
The reinforcement learning framework of the multi-stage jamming power allocation problem can be represented by the tuple <S, A, Θ, ψ, δ>, where:
- S is the finite state space, corresponding to the FCR working modes;
- A is the finite action space, corresponding to the jammer's feasible jamming actions;
- Θ is the state transition function, corresponding to switching between FCR working modes;
- ψ is the reward function, corresponding to the time consumed by the FCR in the current state;
- δ is the discount factor, corresponding to how the jammer weights future rewards.
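One way to hold this tuple in code is sketched below. S and A are explicit, while Θ and ψ, which the jammer does not know, are represented only through a black-box `step` function that returns the stage time, the next mode and a lock flag. The class, the `step` signature and the placeholder dynamics are hypothetical; the four modes and the 13 actions follow the simulation conditions of the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# step(s, a) -> (t, s_next, locked): time spent in mode s, next mode, lock flag
StepFn = Callable[[str, float], Tuple[float, str, bool]]


@dataclass(frozen=True)
class JammingMDP:
    """Hypothetical container mirroring the tuple <S, A, Theta, psi, delta>."""
    states: Tuple[str, ...]     # S: FCR working modes
    actions: Tuple[float, ...]  # A: feasible jamming actions
    step: StepFn                # black-box sampler standing in for Theta and psi
    delta: float                # discount factor


def embodiment_actions() -> Tuple[float, ...]:
    """The 13 actions 5.8, 6.0, ..., 8.2 (step 0.2) of the simulation embodiment."""
    return tuple(round(5.8 + 0.2 * i, 1) for i in range(13))


def placeholder_step(s: str, a: float) -> Tuple[float, str, bool]:
    # Placeholder dynamics only: 1 s in the current mode, FCR stays put, no lock.
    return 1.0, s, False


mdp = JammingMDP(states=("TWS", "TAS1", "TAS2", "STT"),
                 actions=embodiment_actions(), step=placeholder_step, delta=0.9)
print(len(mdp.actions), mdp.actions[0], mdp.actions[-1])  # 13 5.8 8.2
```

Keeping Θ and ψ behind `step` reflects the model-free setting: the learner can sample transitions and rewards but never reads the transition probabilities directly.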
Step 5: multi-stage jamming power allocation method based on the Q-learning algorithm
Within this reinforcement learning framework, the Q-learning algorithm is used to allocate multi-stage jamming power optimally. The algorithm proceeds as follows:
(1) Determine the learning-factor sequence {α_n ∈ (0,1)} and the exploration-factor sequence {ξ_n ∈ (0,1)};
(2) Initialize the Q-table by setting Q(s, a) = 0;
(3) While the target is not locked by the FCR, in state s either select the currently optimal jamming action a according to the Q-table with probability ξ_n, or select an action at random with probability 1 − ξ_n;
(4) Evaluate the current action a to obtain the time t consumed by the FCR and the next working mode s′;
(5) Update the Q-table according to the following formula:
(6) Set s ← s′. If the target is not locked by the FCR, go to (3). Otherwise the current episode ends and the next episode begins from step (3). Repeat until the preset number of episodes is reached.
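Steps (1) to (6) can be sketched with tabular Q-learning on a deliberately tiny, hypothetical FCR simulator. The update formula of step (5) is not reproduced in this text, so the sketch uses the textbook one-step Q-learning update Q(s, a) ← Q(s, a) + α_n [t + δ max_{a′} Q(s′, a′) − Q(s, a)], with the stage time t as the reward. Note that claim 6 ties a′ to step S43, which may indicate a different variant. The simulator, the per-mode "best" action indices, the constant α and the ξ_n schedule are all invented for illustration.

```python
import random

MODES = ["TWS", "TAS1", "TAS2", "STT"]                   # S: FCR working modes
ACTIONS = [round(5.8 + 0.2 * i, 1) for i in range(13)]   # A: 5.8, 6.0, ..., 8.2
BEST = [3, 5, 7, 9]   # hypothetical best action index per mode (toy assumption)
LOCKED = len(MODES)   # terminal index meaning "target locked"


def toy_step(s, a):
    """Hypothetical FCR simulator returning (t, s_next).

    The best action keeps the FCR busy for 1.0 s and pushes it back one mode;
    any other action costs it only 0.5 s and lets it advance (STT -> locked).
    """
    if a == BEST[s]:
        return 1.0, max(s - 1, 0)
    return 0.5, s + 1


def greedy(q_row):
    """Index of the largest Q-value (first one on ties)."""
    return max(range(len(q_row)), key=lambda j: q_row[j])


def q_learning(episodes=4000, max_stages=50, alpha=0.1, delta=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in MODES]        # step (2): Q(s, a) = 0
    for n in range(episodes):
        xi = min(0.95, 0.5 + n / episodes)           # step (1): greedy prob xi_n
        s = 0                                        # every episode starts in TWS
        for _ in range(max_stages):                  # cap episodes that never lock
            # step (3): exploit with probability xi_n, explore with 1 - xi_n
            a = greedy(q[s]) if rng.random() < xi else rng.randrange(len(ACTIONS))
            t, s_next = toy_step(s, a)               # step (4): observe t and s'
            # step (5): one-step Q-learning target, no bootstrap once locked
            target = t if s_next == LOCKED else t + delta * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            if s_next == LOCKED:                     # step (6): episode over
                break
            s = s_next                               # step (6): s <- s'
    return q


q = q_learning()
print(dict(zip(MODES, (ACTIONS[greedy(row)] for row in q))))
```

The rising ξ_n schedule explores heavily early on and exploits later. In this toy, the learned greedy action per mode is expected to settle on the hypothetical `BEST` indices. Connecting the sketch to the method above would mean replacing `toy_step` with the measured FCR dwell time and the recognized next working mode.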
The effect of the invention is further illustrated by the following comparative simulations.
Simulation conditions: assume the FCR has four working modes (TWS, TAS1, TAS2, STT), with mode transitions as shown in Fig. 2. The FCR confirms a target, or declares it lost, based on the detection results of 16 consecutive data frames:
- if the target is not detected in any of these data frames, it is considered lost;
- if the target is detected in M of N data frames, it is confirmed, and the FCR moves from its current working mode to a mode with higher tracking accuracy.

The FCR's anti-jamming measure is frequency agility. When the ratio of P_rk to P_erk exceeds the threshold, frequency agility lasts for 9 data frames. The invention does not consider losses caused by frequency agility, and jamming is assumed to be ineffective during agility. Detection probability is computed with the Marcum Q function.

The FCR's coherent processing interval (CPI) is 1 data frame. The FCR transmit power is 10 kW and A_r = 1 m². The FCR parameters in each mode are listed in Table 1, where:
- L is the number of pulses per CPI;
- G_t is the main-beam gain;
- D_r is the tracking data rate;
- M/N is the target-confirmation criterion;
- Re is the range estimation error.

In this embodiment the jammer chooses among 13 jamming actions A = [5.8, 6, 6.2, ..., 8.2], with an action step of 0.2. The remaining parameters are σ ~ U(10 m², 30 m²), σ_u = 20 m², P_n = 1.56 × 10⁻¹³ W, and R ~ U(10 km, 30 km). Reinforcement learning training used 400,000 trials, and average performance was computed over groups of 4,000 trials. Each FCR starts in TWS.
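The detection model in these conditions can be sketched as follows. The patent states only that the Marcum Q function is used, so the sketch assumes a non-fluctuating target and the standard envelope-detector result P_d = Q_1(√(2·SNR), √(−2 ln P_fa)); the target model, P_fa and the function names are assumptions. Q_1 is evaluated with its Poisson-mixture series so that only the Python standard library is needed. The M/N confirmation rule and the 16-frame loss rule are applied to a list of per-frame hit flags.

```python
import math
from typing import Sequence


def marcum_q1(a: float, b: float, tol: float = 1e-15) -> float:
    """First-order Marcum Q function via the Poisson-mixture series
    Q_1(a, b) = sum_k Pois(k; a^2/2) * sum_{m<=k} Pois(m; b^2/2)."""
    x, y = a * a / 2.0, b * b / 2.0

    def pois(k: int, mean: float) -> float:
        # Poisson pmf computed in the log domain to avoid overflow for large k
        if mean == 0.0:
            return 1.0 if k == 0 else 0.0
        return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

    total, cdf_y, k = 0.0, 0.0, 0
    while True:
        w_x = pois(k, x)
        cdf_y += pois(k, y)        # P(Poisson(y) <= k)
        total += w_x * cdf_y
        k += 1
        if k > x and w_x < tol:    # past the mode and the weights are negligible
            return min(total, 1.0)


def detection_probability(snr: float, pfa: float) -> float:
    """Pd of a non-fluctuating target (assumed model), SNR as a linear ratio."""
    return marcum_q1(math.sqrt(2.0 * snr), math.sqrt(-2.0 * math.log(pfa)))


def m_of_n_confirmed(hits: Sequence[bool], m: int, n: int) -> bool:
    """M/N rule: at least m hits among the last n data frames."""
    window = list(hits)[-n:]
    return len(window) == n and sum(window) >= m


def target_lost(hits: Sequence[bool], n: int = 16) -> bool:
    """Loss rule: no hit in any of the last n data frames."""
    window = list(hits)[-n:]
    return len(window) == n and not any(window)


print(f"Pd = {detection_probability(snr=10.0, pfa=1e-6):.3f}")  # hypothetical SNR, Pfa
print(m_of_n_confirmed([True, False, True, True], m=3, n=4))    # True
```

In a full simulation, each data frame would draw a hit with probability P_d computed from that frame's SINR, and the two rules would decide mode upgrades and target loss.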
In practice, the finite action space A is configured for the specific application scenario.
Table 1. FCR parameters in different working modes
Conventional methods base multi-stage jamming power allocation mostly on expert experience and ignore the dynamics of multi-stage electronic countermeasures. When the frequency agility threshold γ is known, the conventional method's performance reaches its upper bound; in practice, however, γ is hard to obtain. To verify the effectiveness of the proposed algorithm, this embodiment uses two control groups: random power selection and the conventional upper-bound method with known γ.
The simulations are divided into two groups. In the first group, the frequency agility threshold is the same in all 4 modes: γ = 5.6 in Fig. 3(a), γ = 5.8 in Fig. 3(b), and γ = 6 in Fig. 3(c).
Figs. 3(a) to 3(c) show that the proposed algorithm performs far better than random power selection. It also beats the upper bound of the conventional method with known γ, although by a smaller margin. The advantage grows as γ increases. The reason is that the tracking data rate of TWS is much lower than that of the other modes, so the FCR spends most of its time in TWS. When the FCR spends little time in the other modes, the choice of the optimal jamming action in TWS dominates. As γ increases, the other modes become more important, so the advantage of the proposed algorithm becomes more pronounced. A γ that is too low is not usable in practice: the FCR would take anti-jamming measures all the time, causing unnecessary SNR loss and loss of target velocity information.
In the second group, the frequency agility thresholds of the 4 FCR modes differ. Two typical cases are considered:
- Case 1: γ_TWS = 5.6, γ_TAS1 = 5.8, γ_TAS2 = 5.8, γ_STT = 6;
- Case 2: γ_TWS = 6, γ_TAS1 = 5.8, γ_TAS2 = 5.8, γ_STT = 5.6.
Here γ_TWS denotes the frequency agility threshold in TWS, and the other symbols are defined analogously.
Fig. 4 shows that the proposed algorithm outperforms both random power selection and the upper bound of the conventional method with known γ, with a clear advantage when the thresholds are set as in case 1. A TWS threshold lower than those of the other modes reduces the relative importance of TWS and increases that of the other modes, which lowers the upper bound of the conventional method.

In contrast, when the thresholds are set as in case 2 (Fig. 5), the conventional method's attainable upper bound rises, because the TWS threshold is higher than those of the other modes. The proposed algorithm still performs well in this case.
In conclusion when threshold gamma is accordingly set, the multistage jamming power based on model-free intensified learning The performance of distribution method is better than control group.In fact, environmental model can be learnt by model-free intensified learning.Therefore, No matter how threshold gamma is arranged, and the mentioned method of the present invention can find preferable jamming power allocation strategy.
Specific embodiment can be seen that the present invention and can be very good to carry out the distribution of jamming power through the invention.
Those of ordinary skill in the art will understand that the embodiments described herein, which is to help reader, understands this hair Bright principle, it should be understood that protection scope of the present invention is not limited to such specific embodiments and embodiments.This field Those of ordinary skill disclosed the technical disclosures can make according to the present invention and various not depart from the other each of essence of the invention The specific variations and combinations of kind, these variations and combinations are still within the scope of the present invention.

Claims (6)

1. A multi-stage smart noise jamming method based on model-free reinforcement learning, characterized by comprising the following steps:
S1. Model the multi-stage jamming power allocation problem as a Markov decision process with an unknown environment model, as follows.

A target aircraft carrying a self-protection (defensive avionics) system and an aircraft carrying an FCR engage each other stage by stage. The engagement ends when the target aircraft is locked or successfully escapes. The self-protection jamming system comprises a jammer and an RWR.

In each stage, the jammer first receives the waveform emitted by the FCR, extracts its waveform features, and identifies the FCR's current working mode s_k ∈ S. Here S is the set of working modes available to the FCR and k is the stage index. The jammer then selects a jamming action a_k ∈ A, corresponding to a jamming power, and applies smart noise jamming, where A is the set of possible jamming actions. Finally, after the stage ends, the self-protection jamming system obtains the stage-k reward t_k, and the FCR switches to its next working mode s_{k+1}. The reward t_k is the time the FCR spends in stage k.
S2. Use the FCR's search-to-lock time as the performance index of smart noise jamming;
S3. Represent the reinforcement learning framework of the multi-stage jamming power allocation problem by the tuple <S, A, Θ, ψ, δ>, where:
- S is the finite state space, corresponding to the FCR working modes;
- A is the finite action space, corresponding to the jammer's feasible jamming actions;
- Θ is the state transition function, corresponding to switching between FCR working modes;
- ψ is the reward function, corresponding to the time consumed by the FCR in the current state;
- δ is the discount factor, corresponding to how the jammer weights future rewards;
S4. Solve for the optimal multi-stage jamming power allocation using the Q-learning algorithm.
2. The multi-stage smart noise jamming method based on model-free reinforcement learning according to claim 1, characterized in that step S4 specifically comprises:
S41. Determine a learning-factor sequence {α_n ∈ (0,1)} and an exploration-factor sequence {ξ_n ∈ (0,1)}, where α_n is the learning factor and ξ_n is the exploration factor;
S42. Initialize the Q-table by setting Q(s, a) = 0, where s is the current FCR working mode and a is the current jamming action;
S43. While the target aircraft is not locked by the FCR, in state s select the currently optimal jamming action according to the Q-table with probability ξ_n. Otherwise, end;
S44. Evaluate the current jamming action to obtain the FCR's search-to-lock time and the FCR's next working mode;
S45. Update the Q-table using the jamming action selected in step S43, together with the FCR time consumption and next working mode obtained in step S44;
S46, FCR is switched to the subsequent work mode that step S44 is obtained, if target is not locked by FCR, return step S3;Otherwise terminate current pass, execute step S47;
If S47, the rounds for reaching setting, terminate;Otherwise return step S3.
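Steps S41 to S47 can be sketched as a single training loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the FCR model in `TABLE` and `fcr_step` is hypothetical, the schedules for α_n and ξ_n are assumed (the claims only require values in (0,1)), a stage cap stands in for the target escaping, and the update in S45 uses the textbook Q-learning target max over a′ of Q(s′, a′) because the exact expression of claim 5 is not reproduced in this text.

```python
import random

# Hypothetical toy FCR model; every number below is an illustrative assumption.
# TABLE[mode][action] = (time t_k the FCR spends in the mode, probability it advances).
MODES = ("search", "acquire", "track")
TABLE = {
    "search":  [(2.0, 0.9), (3.0, 0.7), (4.0, 0.6)],
    "acquire": [(1.0, 0.9), (2.0, 0.5), (2.5, 0.4)],
    "track":   [(1.0, 0.9), (1.5, 0.6), (2.0, 0.3)],
}
N_ACTIONS = 3  # jamming power levels


def fcr_step(s, a, rng):
    """S44: return (t_k, next mode); a next mode of None means the target is locked."""
    t, p_advance = TABLE[s][a]
    if rng.random() < p_advance:
        i = MODES.index(s)
        return t, (MODES[i + 1] if i + 1 < len(MODES) else None)
    return t, "search"  # assumed: a failed advance sends the FCR back to search


def train(episodes=2000, delta=0.9, max_stages=200, seed=0):
    rng = random.Random(seed)
    q = {s: [0.0] * N_ACTIONS for s in MODES}          # S42: Q(s, a) = 0
    for n in range(episodes):                          # S47: run the set number of episodes
        alpha = 0.5 / (1 + n / 100)                    # S41: assumed learning-rate schedule
        xi = min(0.99, 0.5 + n / episodes)             # S41: assumed greedy-probability schedule
        s = "search"
        for _ in range(max_stages):                    # stage cap stands in for "target escapes"
            # S43: greedy action with probability xi, uniform random action otherwise
            if rng.random() < xi:
                a = max(range(N_ACTIONS), key=lambda b: q[s][b])
            else:
                a = rng.randrange(N_ACTIONS)
            t, s_next = fcr_step(s, a, rng)            # S44
            # S45: textbook Q-learning target; a locked (terminal) state contributes 0
            future = 0.0 if s_next is None else max(q[s_next])
            q[s][a] += alpha * (t + delta * future - q[s][a])
            if s_next is None:                         # S46: locked, so the episode ends
                break
            s = s_next                                 # S46: FCR switches to its next mode
    return q
```

After training, `{s: max(range(N_ACTIONS), key=q[s].__getitem__) for s in MODES}` reads off one jamming power level per FCR mode, which is the stage-wise power allocation that S4 asks for. Because the reward is FCR time, maximizing Q favors actions that delay lock-on.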
3. The multistage smart noise jamming implementation method based on model-free reinforcement learning according to claim 2, characterized in that the FCR search-to-lock time in step S44 is specifically the duration for which the FCR remains in its current operating mode under the current jamming action.
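The performance metric of S2 and the per-stage time of claim 3 relate to the discount factor δ of S3 as follows: the metric is the plain sum of stage times t_k, while the Q-values estimate the discounted sum. The helper names and the three stage times below are hypothetical, used only to show the arithmetic.

```python
def search_lock_time(stage_times):
    """S2 / claim 3: total FCR search-to-lock time, the sum of per-stage dwell times t_k."""
    return sum(stage_times)


def discounted_return(stage_times, delta):
    """Quantity the Q-values estimate under discount factor delta: sum over k of delta**k * t_k."""
    return sum(t * delta ** k for k, t in enumerate(stage_times))


# Hypothetical episode: dwell times in search, acquire and track before lock-on.
times = [4.0, 2.5, 2.0]
print(search_lock_time(times))                    # 8.5
print(round(discounted_return(times, 0.9), 2))    # 7.87
```

With δ = 1 the two quantities coincide for episodes that terminate; with δ < 1 the jammer weights early stages more heavily than the S2 metric does.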
4. The multistage smart noise jamming implementation method based on model-free reinforcement learning according to claim 2, characterized in that step S43 is: while the target aircraft has not been locked by the FCR, in state s, with reference to the Q-table, randomly select a jamming action with probability 1 − ξ_n; otherwise, terminate.
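Read together, step S43 of claim 2 and claim 4 describe a greedy-or-random selection rule. Note that ξ_n is the probability of exploiting here, the reverse of the usual ε-greedy convention where ε is the probability of exploring. The function name `select_action` and the random tie-breaking are assumptions of this sketch.

```python
import random


def select_action(q_row, xi, rng):
    """S43 with claim 4: exploit with probability xi, explore with probability 1 - xi.

    q_row holds Q(s, a) for every jamming action a in the current FCR mode s.
    """
    if rng.random() < xi:
        best = max(q_row)
        # Break ties at random so an all-zero (untrained) row is not biased toward action 0.
        return rng.choice([a for a, v in enumerate(q_row) if v == best])
    return rng.randrange(len(q_row))


rng = random.Random(42)
picks = [select_action([0.5, 3.0, 2.0], 0.9, rng) for _ in range(10_000)]
greedy_fraction = picks.count(1) / len(picks)
```

Because the random branch can also land on the greedy action, the greedy action is chosen with overall probability ξ_n + (1 − ξ_n)/|A|, which is 0.9 + 0.1/3 in this example, so `greedy_fraction` should be close to that value.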
5. The multistage smart noise jamming implementation method based on model-free reinforcement learning according to claim 2, characterized in that the Q-table in step S45 is updated by the following expression:
where s′ denotes the next operating mode of the FCR and a′ denotes the next jamming action.
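The update expression of claim 5 is not reproduced in this text. The sketch below assumes the common temporal-difference form Q(s, a) ← (1 − α_n)·Q(s, a) + α_n·[t_k + δ·Q(s′, a′)], which uses exactly the quantities defined in this claim and claim 6: the reward t_k, the next FCR mode s′, and a next action a′ chosen by step S43. Because a′ comes from the selection rule itself rather than from a maximum over a′, this form behaves as an on-policy (SARSA-style) update; textbook Q-learning would use max over a′ of Q(s′, a′) instead. The function name `update_q` and its arguments are hypothetical.

```python
def update_q(q, s, a, t, s_next, a_next, alpha, delta, locked=False):
    """Assumed S45 update: Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (t + delta * Q(s', a')).

    q maps each FCR mode to a list of Q-values indexed by jamming action; t is the
    FCR time consumed in the stage. A locked (terminal) next state contributes 0.
    """
    future = 0.0 if locked else q[s_next][a_next]
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (t + delta * future)
    return q[s][a]


q = {"search": [0.0, 0.0], "track": [0.0, 5.0]}
print(update_q(q, "search", 1, 3.0, "track", 1, alpha=0.5, delta=0.9))  # 3.75
```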
6. The multistage smart noise jamming implementation method based on model-free reinforcement learning according to claim 4 or 5, characterized in that a′ is obtained by step S43.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910321772.8A CN110031807B (en) 2019-04-19 2019-04-19 Multi-stage smart noise interference method based on model-free reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910321772.8A CN110031807B (en) 2019-04-19 2019-04-19 Multi-stage smart noise interference method based on model-free reinforcement learning

Publications (2)

Publication Number Publication Date
CN110031807A CN110031807A (en) 2019-07-19
CN110031807B CN110031807B (en) 2021-01-12

Family

ID=67239521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910321772.8A Active CN110031807B (en) 2019-04-19 2019-04-19 Multi-stage smart noise interference method based on model-free reinforcement learning

Country Status (1)

Country Link
CN (1) CN110031807B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6788243B2 (en) * 2001-09-06 2004-09-07 Minister Of National Defence Of Her Majesty's Canadian Government The Secretary Of State For Defence Hidden Markov modeling for radar electronic warfare
CN103954939A (en) * 2014-01-21 2014-07-30 中国人民解放军海军航空工程学院 Smart-noise-jamming resistant method based on radar networking
CN105388461A (en) * 2015-10-31 2016-03-09 电子科技大学 Radar adaptive behavior Q learning method
US20180012137A1 (en) * 2015-11-24 2018-01-11 The Research Foundation for the State University New York Approximate value iteration with complex returns by bounding
CN108712748A (en) * 2018-04-12 2018-10-26 天津大学 A reinforcement-learning-based intelligent anti-jamming decision method for cognitive radio

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xiang Gao et al.: "FUZZY Q LEARNING ALGORITHM FOR DUAL-AIRCRAFT PATH", 《JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS》 *
Wang Bin (王彬) et al.: "Adaptive waveform selection algorithm based on Q-learning in cognitive radar" (认知雷达中基于q学习的自适应波形选择算法), 《***工程与电子技术》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111314015A (en) * 2020-01-07 2020-06-19 中国人民解放军国防科技大学 Pulse interference decision method based on reinforcement learning
CN111314015B (en) * 2020-01-07 2022-08-05 中国人民解放军国防科技大学 Pulse interference decision method based on reinforcement learning
CN113093124A (en) * 2021-04-07 2021-07-09 哈尔滨工程大学 DQN algorithm-based real-time allocation method for radar interference resources

Also Published As

Publication number Publication date
CN110031807B (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN103728598B (en) Method for suppressing track deception jamming using an active and passive radar network deployed at separate sites
CN104267379A (en) Active and passive radar cooperative anti-interference method based on waveform design
CN108303684A (en) Ground surveillance radar multi-object tracking method based on radial velocity information
CN112904290A (en) Method for generating radar intelligent cognitive anti-interference strategy
CN110031807A (en) A multistage smart noise jamming implementation method based on model-free reinforcement learning
CN115575908B (en) Radar interference parameter optimization method and system based on pulse description words
Zhang et al. Research on decision-making system of cognitive jamming against multifunctional radar
Zhang et al. Performance analysis of deep reinforcement learning-based intelligent cooperative jamming method confronting multi-functional networked radar
CN115236607A (en) Radar anti-interference strategy optimization method based on double-layer Q learning
Zhang et al. Joint resource optimization for a distributed MIMO radar when tracking multiple targets in the presence of deception jamming
CN105891799A (en) Active jamming reconnaissance method suitable for mechanical scanning radars
Song et al. A POMDP approach for scheduling the usage of airborne electronic countermeasures in air operations
Yang et al. Consensus based target tracking against deception jamming in distributed radar networks
Arasaratnam et al. Tracking the mode of operation of multi-function radars
CN113608193A (en) Radar multi-target distance and speed estimation method based on UNet
CN116224248A (en) Interference intention reasoning method, storage medium and equipment
CN111198366A (en) Method for quickly selecting finite array elements under distributed MIMO radar multitasking
CN115436891A (en) MBSE-based model construction radar countermeasure evaluation method
CN113064132B (en) Robust radar target detection method based on continuous trust function
CN115267708A (en) Radar interference effect on-line evaluation method based on state change
Hui et al. Highly contaminated work mode identification of phased array radar using deep learning method
Gilliam et al. Scheduling of multistatic sonobuoy fields using multi-objective optimization
CN113126086A (en) Life detection radar weak target detection method based on state prediction accumulation
Beun Cognitive radar: Waveform design for target detection
CN114666219B (en) Multi-radar network power and bandwidth joint optimization allocation method and system under non-ideal detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant