CN117313972B - Attack method, system and device for unmanned ship cluster and storage medium - Google Patents

Publication number
CN117313972B
Authority
CN
China
Legal status
Active
Application number
CN202311271113.0A
Other languages
Chinese (zh)
Other versions
CN117313972A (en)
Inventor
王莹洁
金世龙
刘兆伟
段培永
刘志中
童向荣
宋永超
Current Assignee
Yantai University
Original Assignee
Yantai University
Application filed by Yantai University
Priority to CN202311271113.0A
Publication of CN117313972A
Application granted
Publication of CN117313972B

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B63 SHIPS OR OTHER WATERBORNE VESSELS; RELATED EQUIPMENT
    • B63B SHIPS OR OTHER WATERBORNE VESSELS; EQUIPMENT FOR SHIPPING
    • B63B35/00 Vessels or similar floating structures specially adapted for specific purposes and not otherwise provided for
    • B63B2035/006 Unmanned surface vessels, e.g. remotely controlled
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312 Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G06Q10/06313 Resource planning in a project environment

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Mechanical Engineering (AREA)
  • Ocean & Marine Engineering (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of unmanned ship control and relates in particular to an attack method, system, device and storage medium for an unmanned ship cluster. From the acquired information about the unmanned ships of both the friendly and enemy sides, the benefit each unmanned ship obtains by taking each action is computed. The benefits of the friendly unmanned ships undergo advantage processing and then benefit gradient processing; the next action is predicted after action loss processing, and policy loss processing of the predicted action yields the optimal next action, and hence the optimal attack route, of each friendly unmanned ship. Mixed-strategy and Nash equilibrium processing then yield the Nash equilibrium point of the mixed strategy and, from it, a cooperative attack strategy for the unmanned ship cluster. This strategy ensures that each unmanned ship completes its attack task accurately and efficiently while maximizing the cooperative attack benefit of the cluster.

Description

Attack method, system and device for unmanned ship cluster and storage medium
Technical Field
The invention belongs to the technical field of unmanned ship control and relates in particular to an attack method, an attack system, an attack device and a storage medium for an unmanned ship cluster.
Background
An unmanned ship cluster achieves all-round monitoring, detection and operation in the marine environment through cooperation among multiple unmanned ships, for applications such as marine rescue, maritime search and rescue, marine patrol and maritime strike. Such clusters are characterized by decentralization, group control and strong emergent behavior. Through cooperative combat, an unmanned ship cluster can carry out joint attacks by several unmanned ships, improving combat efficiency and precision; through task planning and action decision-making it can also execute multiple tasks simultaneously, improving task execution efficiency.
Despite the many advantages of unmanned ship cluster technology, challenges remain. Enabling unmanned ships to complete their tasks in complex environments while maximizing the cooperative attack benefit of the cluster is an open problem for unmanned ship cluster attacks.
Disclosure of Invention
The invention provides an attack method, an attack system, an attack device and a storage medium for an unmanned ship cluster.
The technical scheme of the invention is as follows:
The invention provides an attack method for an unmanned ship cluster, comprising the following steps:
S1: acquire information on the unmanned boats of both the friendly and enemy sides, including their attribute information, advancing speed and movement trajectory;
S2: from the acquired information, obtain the benefit each unmanned boat gains by taking each action; apply advantage processing and then benefit gradient processing to the benefits of each friendly unmanned boat; apply action loss processing to the benefits after benefit gradient processing and predict the next action; apply policy loss processing to the predicted next action to obtain the optimal next action of each friendly unmanned boat, and thereby its optimal attack route;
S3: based on the optimal attack route of each unmanned boat, obtain the mixed-strategy set of the unmanned boat cluster, set the equilibrium points of the mixed strategy, obtain the characteristic value curves corresponding to those points and from them the Nash equilibrium point of the mixed strategy, and obtain the cooperative attack strategy of the cluster from the Nash equilibrium point and the mixed-strategy set.
The mixed-strategy set in step S3 is specifically as follows.
Three strategies are distinguished according to the number of friendly unmanned boats relative to the enemy:
(1) when friendly boats outnumber the enemy, they adopt a scattered attack strategy;
(2) when friendly boats are fewer than the enemy, they adopt a centralized attack strategy;
(3) when the numbers are equal, they combine strategies (1) and (2) in a hybrid attack.
The hybrid attack of the invention is specifically as follows:
each unmanned boat selects, from strategies (1) and (2), the strategy that gives it the largest benefit, and a strategy that maximizes the benefit of the cluster is formulated from the strategies selected by the individual boats.
The Nash equilibrium point of the mixed strategy in S3 is obtained specifically as follows:
set the equilibrium points of the mixed strategy and obtain the characteristic value curve corresponding to each; let the friendly and enemy sides play the game based on the state information of the unmanned boat cluster and obtain the cluster's benefits; from these benefits, obtain how the characteristic value curve of each unmanned boat changes. The point through which all characteristic value curves pass and at which their slope no longer changes is the Nash equilibrium point of the mixed strategy.
The advantage processing in S2 is specifically as follows:
based on the benefit each unmanned boat obtains from its actions, combine the benefit of the next action with the benefit of transitioning from the current action to the next action, and screen out the actions of each unmanned boat whose benefit is above average.
The benefit gradient processing in S2 specifically comprises:
based on the benefits of each friendly unmanned boat after advantage processing, screening out the actions each friendly boat takes under the mixed strategy, together with their benefits, and the next position state of each friendly boat.
The action loss processing in S2 specifically comprises:
based on the benefits of each friendly unmanned boat after benefit gradient processing, obtaining the action loss value of the friendly unmanned boat cluster with an action loss function;
the policy loss processing in S2 specifically comprises:
based on the predicted next action, obtaining the policy loss value after each predicted action with a discriminant loss function.
The invention also provides an attack system for an unmanned ship cluster, comprising:
an acquisition module, configured to acquire information on the unmanned boats of both the friendly and enemy sides, including their position state information, advancing speed and movement trajectory;
a benefit processing module, configured to obtain, from the acquired information, the benefit each unmanned boat gains by taking each action; apply benefit gradient processing to the benefits of each friendly boat; apply action loss processing to the benefits after benefit gradient processing and predict the next action; and apply policy loss processing to the predicted next action to obtain the optimal next action, and thereby the optimal attack route, of each friendly unmanned boat;
an attack strategy generation module, configured to obtain, based on the optimal attack route of each unmanned boat, the mixed-strategy set of the unmanned boat cluster, set the equilibrium points of the mixed strategy, obtain the corresponding characteristic value curves and from them the Nash equilibrium point of the mixed strategy, and obtain the cooperative attack strategy of the cluster from the Nash equilibrium point and the mixed-strategy set.
The invention also provides an attack device for an unmanned ship cluster, comprising a processor and a memory; the processor implements the above attack method when executing the computer program stored in the memory.
The invention also provides a storage medium storing a computer program that implements the above attack method when executed by a processor.
Advantageous effects
The invention first obtains the optimal next action of each unmanned boat and from it the optimal attack route of each boat, then combines mixed-strategy and Nash equilibrium processing to obtain the Nash equilibrium point of the mixed strategy and the cooperative attack strategy of the unmanned boat cluster. This maximizes both the benefit of each individual boat and the cooperative attack benefit of the cluster;
based on the benefit each unmanned boat obtains from its actions, the invention applies advantage processing followed by benefit gradient processing, then action loss processing to predict the next action, and finally policy loss processing of the predicted action to obtain the optimal next action of each boat. This ensures that each unmanned boat completes its attack task accurately and efficiently and maximizes its benefit.
Drawings
Figure 1 is the evolutionary phase diagram when friendly unmanned boats outnumber the enemy;
Figure 2 is the evolutionary phase diagram when friendly unmanned boats are fewer than the enemy;
Figure 3 is the evolutionary phase diagram when friendly unmanned boats equal the enemy in number;
Figure 4 is the benefit curve when friendly unmanned boats outnumber the enemy;
Figure 5 is the benefit curve when friendly unmanned boats are fewer than the enemy;
Figure 6 is the benefit curve when friendly unmanned boats equal the enemy in number;
Figure 7 is a schematic diagram of the pursuit process of friendly unmanned boats using the hybrid attack strategy.
Detailed Description
The following examples illustrate the invention but do not limit it.
The invention provides an attack method for an unmanned ship cluster, comprising the following steps:
S1: acquire information on the unmanned boats of both the friendly and enemy sides, including their attribute information, advancing speed and movement trajectory;
S2: from the acquired information, obtain the benefit each unmanned boat gains by taking each action; apply advantage processing and then benefit gradient processing to the benefits of each friendly unmanned boat; apply action loss processing to the benefits after benefit gradient processing and predict the next action; apply policy loss processing to the predicted next action to obtain the optimal next action of each friendly unmanned boat, and thereby its optimal attack route;
S3: based on the optimal attack route of each unmanned boat, obtain the mixed-strategy set of the unmanned boat cluster, set the equilibrium points of the mixed strategy, obtain the characteristic value curves corresponding to those points and from them the Nash equilibrium point of the mixed strategy, and obtain the cooperative attack strategy of the cluster from the Nash equilibrium point and the mixed-strategy set.
Further, the attribute information of the unmanned boats of both sides in S1 includes:
for the friendly side: the number of unmanned boats, their survival states, the joint system state of the cluster, the actions of the boats and their state transitions;
for the enemy side: the number of enemy unmanned boats, their survival states, their actions and their state transitions.
Specifically, the attribute set of the friendly unmanned boats is $\langle N_p, S_p, C_p, \mathcal{A}_p, T_p, R_p, \gamma \rangle$, where $N_p$ is the number of friendly unmanned boats and $S_p$ is their survival state. $C_p$ is the joint system state of the friendly cluster: it records the proportion of boats that adopt the joint strategy and is used to predict the individual benefit of adopting the joint strategy at the next step; the more boats adopt the joint strategy, the closer $C_p$ is to 1, and the fewer, the closer it is to 0. $\mathcal{A}_p$ is the action set of the friendly cluster, and $T_p: S_p \times \mathcal{A}_p \to S'_p$ is the state transition function of the friendly boats, which gives the survival state $S'_p$ after the next action from the current survival state and action set of the cluster; the benefit of the next action is predicted from the benefit each friendly boat obtains by taking its action. $R_p(S_p, a, S'_p)$ is the reward obtained when a boat in survival state $S_p$ executes action $a$ and reaches survival state $S'_p$. The reward is positive each time a friendly unmanned boat destroys an enemy unmanned boat and negative when friendly boats move farther from the enemy's position. $\gamma \in [0, 1]$ is the discount factor; it changes as the friendly boats perceive the environment, increasing if the adopted strategy leads to a better attack result and decreasing if it leads to a worse one.
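Part of the friendly attribute set can be pictured as a small data structure. The sketch below is a hypothetical illustration, not the patent's implementation: it assumes per-boat boolean flags for survival and for adopting the joint strategy, computes $C_p$ as the fraction of boats adopting the joint strategy, and adapts $\gamma$ by a fixed step (`step`, an assumed parameter) while keeping it within $[0, 1]$.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FriendlyCluster:
    """Hypothetical container for part of the friendly attribute set."""
    survival: List[bool]      # S_p: survival flag of each friendly boat
    uses_joint: List[bool]    # whether each boat currently adopts the joint strategy
    gamma: float = 0.9        # discount factor, kept within [0, 1]

    @property
    def n_p(self) -> int:
        """N_p: number of friendly boats."""
        return len(self.survival)

    def joint_state(self) -> float:
        """C_p: share of boats adopting the joint strategy (1 = all, 0 = none)."""
        if not self.uses_joint:
            return 0.0
        return sum(self.uses_joint) / len(self.uses_joint)

    def adapt_gamma(self, result_change: float, step: float = 0.05) -> None:
        """Raise gamma after a better attack result, lower it after a worse one."""
        if result_change > 0:
            self.gamma = min(1.0, self.gamma + step)
        elif result_change < 0:
            self.gamma = max(0.0, self.gamma - step)
```

The clipping keeps $\gamma$ inside $[0, 1]$ however many consecutive improvements or setbacks are observed.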
The attribute set of the enemy unmanned boat cluster is $\langle N_e, S_e, \mathcal{A}_e, T_e \rangle$, where $N_e$ is the number of enemy unmanned boats, $S_e$ is their survival state, $\mathcal{A}_e$ is the action set of the enemy cluster and $T_e: S_e \times \mathcal{A}_e \to [0, 1]$ is the state transition function of the enemy boats, which gives, from the current survival state and action set of the enemy cluster, the probability distribution of the survival state $S'_e$ after the next action. The friendly cluster captures this distribution and combines it with its discount factor $\gamma$ and state transition function $T_p$ to determine its next action set and thereby plan its route.
Specifically, the combat sea area is modeled as a rectangular two-dimensional plane, and the unmanned boats of both sides start from randomly set departure points. The friendly boats track and strike within the designated sea area, while the enemy boats patrol the sea surface according to rules and perceive their surroundings to plan paths that let them escape the friendly boats' attack. The friendly and enemy boats are equipped identically, so their speeds do not differ significantly, but an enemy boat accelerates when it detects that a friendly boat has come close to striking range. The task of the friendly boats is to pursue and destroy all enemy boats within a limited time, using as little pursuit time as possible. Assuming that the speed and acceleration of the boats are constant, the friendly boats are trained with deep reinforcement learning and the enemy boats follow fixed rules, so rewards are designed only for the friendly boats' task: the reward is positive each time a friendly boat destroys an enemy boat; it is negative when the friendly boats move farther from the enemy's position; the maximum global positive reward is given when the friendly boats complete the combat task within the specified time; and the minimum global positive reward is given if the specified time is exceeded.
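The reward rules above can be sketched as two functions. All numeric constants (`kill_reward`, `distance_coeff`, `max_bonus`, `min_bonus`) are hypothetical, since no values are given, and "moving farther from the enemy" is interpreted here as an increase in distance to the enemy between two steps.

```python
def step_reward(kills: int, dist_before: float, dist_after: float,
                kill_reward: float = 10.0, distance_coeff: float = 0.1) -> float:
    """Per-step reward of one friendly boat (all constants hypothetical)."""
    reward = kill_reward * kills                 # positive for every enemy boat destroyed
    if dist_after > dist_before:                 # negative when moving away from the enemy
        reward -= distance_coeff * (dist_after - dist_before)
    return reward


def terminal_reward(all_destroyed: bool, elapsed: float, time_limit: float,
                    max_bonus: float = 100.0, min_bonus: float = 1.0) -> float:
    """Global reward: maximum if the task is finished in time, minimum positive if time runs out."""
    if all_destroyed and elapsed <= time_limit:
        return max_bonus
    if elapsed > time_limit:
        return min_bonus
    return 0.0                                   # episode still running
```

Keeping the late-completion reward positive but small mirrors the text: finishing late is still better than nothing, but far worse than finishing in time.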
The boats of each side plan their own action paths by perceiving the environment so as to maximize their own benefit. Besides perceiving the environment, the friendly boats can also perceive the communication conditions within their own cluster; through cooperation among boats they improve striking efficiency, maximize benefit, and pursue and destroy all enemy boats within the limited time.
Further, the advantage processing in S2 specifically comprises:
based on the benefit each unmanned boat obtains from its actions, combining the benefit of the next action with the benefit of transitioning from the current action to the next action, and screening out the actions of each unmanned boat whose benefit is above average.
The advantage function used is as follows:
$$A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s) = \gamma T_p + \epsilon V^{\pi}(s') + L - V^{\pi}(s)$$
where $A^{\pi}(s,a)$ is the advantage function, $Q^{\pi}(s,a)$ is the action-value function of the policy and $V^{\pi}$ is the state-value function of the policy. The weight $\epsilon$ controls how far ahead the boat looks: the closer $\epsilon$ is to 1, the more the boat weighs the benefit of subsequent actions, and the closer it is to 0, the more it considers only the benefit of the current action. $V^{\pi}(s')$ is the value of the survival state $s'$ reached at the next step; equivalently, the closer $\epsilon V^{\pi}(s')$ is to $V^{\pi}(s')$, the more the friendly boat favors subsequent actions, and the closer it is to 0, the more it favors the current action. $V^{\pi}(s)$ is the value of the current survival state $s$; $\gamma T_p$ is the historical benefit obtained when the current survival state transitions to the survival state of the next action; and $L$ is the benefit obtained from transitioning from the current action to the next action.
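A minimal sketch of the advantage processing, assuming discrete candidate actions whose terms $T_p$, $V^{\pi}(s')$ and $L$ are already known. It forms $Q^{\pi}(s,a) = \gamma T_p + \epsilon V^{\pi}(s') + L$ for each action, approximates $V^{\pi}(s)$ by the mean of these values (a uniform-policy assumption), and keeps the actions with positive advantage, i.e. those above the average benefit. The action names and numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def q_value(gamma: float, t_p: float, eps: float, v_next: float, gain: float) -> float:
    """Q(s, a) = gamma * T_p + eps * V(s') + L."""
    return gamma * t_p + eps * v_next + gain


def screen_actions(candidates: Dict[str, Tuple[float, float, float]],
                   gamma: float, eps: float) -> Tuple[List[str], Dict[str, float]]:
    """candidates maps an action to (T_p, V(s'), L); returns kept actions and all advantages."""
    q = {a: q_value(gamma, t_p, eps, v_next, gain)
         for a, (t_p, v_next, gain) in candidates.items()}
    v_s = mean(q.values())                       # V(s) approximated as the average benefit
    adv = {a: qa - v_s for a, qa in q.items()}   # A(s, a) = Q(s, a) - V(s)
    kept = [a for a, value in adv.items() if value > 0]
    return kept, adv


kept, adv = screen_actions(
    {"a0": (1.0, 2.0, 0.5), "a1": (0.0, 1.0, 0.0), "a2": (0.5, 1.5, 0.2)},
    gamma=0.9, eps=0.5)
print(kept)  # ['a0']
```

Because $V^{\pi}(s)$ is taken as the mean, the advantages sum to zero, so "positive advantage" and "above-average benefit" coincide.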
Further, the benefit gradient processing in S2 specifically comprises:
based on the benefits of each friendly unmanned boat after advantage processing, screening out the actions each friendly boat takes under the mixed strategy, together with their benefits, and the next position state of each friendly boat.
The benefit gradient function used is as follows:
where $J(\theta_i)$ is the expected reward of friendly unmanned boat $i$, to be maximized; $\nabla_{\theta_i} J(\theta_i)$ is its gradient; $E_s$ is the expectation over survival state $s$; $\theta = \{\theta_1, \dots, \theta_N\}$ are the policy parameters of the friendly cluster; $\pi = \{\pi_1, \dots, \pi_N\}$ is the policy set of the friendly cluster, where each boat's policy $\pi_i$ corresponds to the three strategies, so that the overall benefit of the cluster is obtained while the benefit of each friendly boat is guaranteed; and $Q^{\pi}(X, a_1, \dots, a_N)$ is the action-value function under the joint action of the friendly cluster, augmented with the state information $X = (o_1, \dots, o_N)$. $X$ is the observation of the friendly cluster, comprising the positions and survival states of the cluster and the next-step position state of each boat, which lets the boats' next actions be predicted more accurately.
Further, the action loss processing in S2 specifically comprises:
based on the benefits of each friendly unmanned boat after benefit gradient processing, obtaining the action loss value of the friendly unmanned boat cluster with the action loss function.
The action loss function is as follows:
where $L_{\pi}$ is the action loss value of the friendly unmanned boat cluster, $A^{\pi}(s,a)$ is the advantage function and $\pi_{\theta}(\rho, a)$ is the policy that selects action $a$ with probability $\rho$.
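The formula for $L_{\pi}$ is not written out here. Assuming the common policy-gradient form $L_{\pi} = -E[\log \rho \cdot A^{\pi}(s,a)]$, where $\rho$ is the probability $\pi_{\theta}$ assigned to the chosen action $a$, a minimal sketch is:

```python
import math
from typing import Sequence


def action_loss(probs_taken: Sequence[float], advantages: Sequence[float]) -> float:
    """Assumed L_pi = -mean(log rho * A): low when high-advantage actions are likely."""
    if not probs_taken or len(probs_taken) != len(advantages):
        raise ValueError("need non-empty inputs of equal length")
    total = sum(math.log(rho) * adv for rho, adv in zip(probs_taken, advantages))
    return -total / len(probs_taken)
```

Minimizing this loss raises the probability of actions with positive advantage and lowers it for actions with negative advantage.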
Further, the policy loss processing in S2 specifically comprises:
based on the predicted next action, obtaining the policy loss value after each predicted action with a discriminant loss function.
The discriminant loss function is as follows:
where $L_v$ is the policy loss value after each action of unmanned boat $i$.
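The discriminant loss is likewise not written out. If $L_v$ plays the role of a critic (value) loss, as its subscript suggests, a typical form is the mean squared temporal-difference error $L_v = E[(V(s) - (r + \gamma V(s')))^2]$. The sketch below assumes that form; `done` marks transitions after which no future value is counted.

```python
from typing import Sequence


def value_loss(v_pred: Sequence[float], rewards: Sequence[float],
               v_next: Sequence[float], gamma: float, done: Sequence[bool]) -> float:
    """Assumed L_v: mean of (V(s) - (r + gamma * V(s')))^2 over a batch of transitions."""
    if not v_pred or not (len(v_pred) == len(rewards) == len(v_next) == len(done)):
        raise ValueError("need non-empty inputs of equal length")
    total = 0.0
    for v, r, vn, d in zip(v_pred, rewards, v_next, done):
        target = r + (0.0 if d else gamma * vn)  # no bootstrapping after a terminal step
        total += (v - target) ** 2
    return total / len(v_pred)
```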
Based on the optimal action and optimal attack route obtained for each unmanned boat, the cooperative attack strategy of the cluster is then obtained. The mixed-strategy set in step S3 is specifically as follows.
Three strategies are distinguished according to the number of friendly unmanned boats relative to the enemy:
(1) when friendly boats outnumber the enemy, they adopt a scattered attack strategy;
(2) when friendly boats are fewer than the enemy, they adopt a centralized attack strategy;
(3) when the numbers are equal, they combine strategies (1) and (2) in a hybrid attack.
In the hybrid attack, each unmanned boat selects, from strategies (1) and (2), the strategy that gives it the largest benefit, and a strategy that maximizes the benefit of the cluster is formulated from the strategies selected by the individual boats.
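The count-based rule and the per-boat choice in the hybrid case can be sketched as follows, using the 0/1 coding applied later to equilibrium points (0 = scattered, 1 = centralized). The per-boat benefits are hypothetical inputs, for example estimates from $Q^{\pi}(s,a)$. The final step of combining the individual choices into one cluster strategy is not specified in the text and is not modeled here.

```python
from typing import List, Tuple

SCATTERED, CENTRALIZED = 0, 1  # same coding the text uses for equilibrium points


def choose_strategies(n_friendly: int, n_enemy: int,
                      boat_benefits: List[Tuple[float, float]]) -> List[int]:
    """boat_benefits[i] = (benefit if scattered, benefit if centralized) for friendly boat i."""
    if n_friendly > n_enemy:                     # strategy (1): outnumber the enemy
        return [SCATTERED] * len(boat_benefits)
    if n_friendly < n_enemy:                     # strategy (2): outnumbered
        return [CENTRALIZED] * len(boat_benefits)
    # strategy (3): equal numbers, each boat keeps the strategy with the larger benefit
    # (ties go to scattered, an arbitrary choice)
    return [SCATTERED if s >= c else CENTRALIZED for s, c in boat_benefits]
```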
Specifically, each friendly unmanned boat is trained by deep reinforcement learning: it obtains rewards through continual interaction with the environment and learns and updates survival-state values and action values. An experience replay mechanism stores the training experience of each boat in the environment, and experience is drawn from the pool by stratified sampling to improve learning efficiency. Training continues until the reward value converges, i.e. until no boat in the current state can obtain a larger reward by changing its action.
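A minimal sketch of an experience pool with stratified sampling. How the strata are defined is not stated; here each boat's experience forms one stratum (an assumption), each stratum keeps only its most recent transitions, and a batch draws about `batch_size // n` items (at least one) from each of the `n` non-empty strata, so boats with little experience are not drowned out by boats with a lot.

```python
import random
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Hashable, List, Optional


class StratifiedReplay:
    """Experience pool with one FIFO stratum per key (here: one per boat, an assumption)."""

    def __init__(self, capacity_per_stratum: int = 10_000, seed: Optional[int] = None):
        self.strata: Dict[Hashable, Deque[Any]] = defaultdict(
            lambda: deque(maxlen=capacity_per_stratum))
        self.rng = random.Random(seed)

    def add(self, key: Hashable, transition: Any) -> None:
        self.strata[key].append(transition)      # oldest entries drop out when a stratum is full

    def sample(self, batch_size: int) -> List[Any]:
        """Draw about batch_size // n items (at least one) from each of the n non-empty strata."""
        keys = [k for k, buf in self.strata.items() if buf]
        if not keys:
            return []
        per_stratum = max(1, batch_size // len(keys))
        batch: List[Any] = []
        for k in keys:
            buf = self.strata[k]
            batch.extend(self.rng.sample(list(buf), min(per_stratum, len(buf))))
        return batch
```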
When friendly boats outnumber the enemy, the friendly boats share the same advantage function and reward function and are in a fully cooperative relationship. To win in a shorter time, they adopt the scattered attack strategy with probability 1: scattered attack lets them strike the enemy boats comprehensively, saving time, improving attack efficiency and raising the overall benefit of the friendly side.
When friendly boats are fewer than the enemy, a boat fighting alone risks being hit by the enemy. Because the friendly boats can perceive the communication conditions among themselves, a boat can inform the others of its situation, enabling cooperative attack. To maximize benefit, the friendly cluster adopts the centralized attack strategy with probability 1: through centralized attack it carries out local many-to-one attacks on enemy boats, reducing cost and losses during the attack and raising the individual benefit of the friendly boats.
When the numbers are equal, in order to win in a shorter time while maximizing benefit, the friendly cluster adopts the scattered attack strategy with probability $\epsilon$ and the centralized attack strategy with probability $1 - \epsilon$. Each friendly boat selects, from strategies (1) and (2), the one with the largest action benefit according to the action-value function $Q^{\pi}(s,a)$, and the cluster formulates a strategy that maximizes its overall benefit from the strategies selected by the individual boats; for example, a hybrid attack in which the whole friendly cluster splits into several sub-clusters within a certain range, or one in which all friendly boats disperse and then concentrate their attack on the enemy boats.
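For the equal-numbers case, the cluster-level choice between the two strategies is a Bernoulli draw with parameter $\epsilon$; this $\epsilon$ is a mixing probability, distinct from the weight $\epsilon$ in the advantage function. A minimal sketch using Python's `random.Random`:

```python
import random

SCATTERED, CENTRALIZED = 0, 1


def sample_cluster_mode(eps: float, rng: random.Random) -> int:
    """Cluster-level draw: scattered with probability eps, centralized with probability 1 - eps."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    return SCATTERED if rng.random() < eps else CENTRALIZED
```

Passing a seeded `random.Random` instance makes training runs reproducible.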
Further, based on the optimal attack route of each unmanned boat, and in order to maximize the benefit of the cluster, the Nash equilibrium point of the mixed strategy is obtained in S3 as follows:
set the equilibrium points of the mixed strategy and obtain the characteristic value curve corresponding to each; let the two sides play the game based on the state information of the friendly cluster and obtain the cluster's benefits; from these benefits, obtain how the characteristic value curve of each unmanned boat changes. When all characteristic value curves pass through a certain point and their slope no longer changes, the curves have converged to that point, which is the Nash equilibrium point of the mixed strategy.
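Figures 1 to 3 are described as evolutionary phase diagrams, which suggests two-population replicator dynamics: $x$ is the share of individual friendly boats choosing the centralized strategy and $y$ the share of cluster-level decisions choosing it. The sketch below integrates such dynamics with the Euler method from several starting points and checks whether the trajectories end at a common point, a simple stand-in for "all curves pass through one point and their slope stops changing". The payoff matrices `P` and `Q` are hypothetical, since no payoff values are given: `P[i][j]` (`Q[i][j]`) is the payoff to an individual boat (to the cluster) when the individual plays `i` and the cluster plays `j`, with 0 = scattered and 1 = centralized.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def replicator_field(x: float, y: float, P: Matrix, Q: Matrix) -> Tuple[float, float]:
    """Growth rates of the centralized share in each population."""
    gain_x = (P[1][1] - P[0][1]) * y + (P[1][0] - P[0][0]) * (1 - y)
    gain_y = (Q[1][1] - Q[1][0]) * x + (Q[0][1] - Q[0][0]) * (1 - x)
    return x * (1 - x) * gain_x, y * (1 - y) * gain_y


def trajectory_end(x: float, y: float, P: Matrix, Q: Matrix,
                   dt: float = 0.01, steps: int = 5000) -> Tuple[float, float]:
    """Euler-integrate from (x, y) and return the final point, clipped to [0, 1]^2."""
    for _ in range(steps):
        fx, fy = replicator_field(x, y, P, Q)
        x = min(1.0, max(0.0, x + dt * fx))
        y = min(1.0, max(0.0, y + dt * fy))
    return x, y


def common_endpoint(starts: List[Tuple[float, float]], P: Matrix, Q: Matrix,
                    tol: float = 1e-3) -> Optional[Tuple[float, float]]:
    """Return the shared end point of all trajectories, or None if they do not agree."""
    ends = [trajectory_end(x, y, P, Q) for x, y in starts]
    x0, y0 = ends[0]
    if all(abs(x - x0) < tol and abs(y - y0) < tol for x, y in ends):
        return round(x0, 3), round(y0, 3)
    return None


# Hypothetical payoffs under which scattering pays off for both populations.
P_MORE = [[3, 2], [1, 0]]
Q_MORE = [[3, 1], [2, 0]]
print(common_endpoint([(0.9, 0.9), (0.5, 0.2), (0.1, 0.8)], P_MORE, Q_MORE))  # (0.0, 0.0)
```

With these payoffs every trajectory ends at (0, 0), the outcome reported for Figure 1; other payoff choices move the common end point to another corner.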
The state information includes the number of unmanned boats, the survival state of each unmanned boat, the joint system state of the unmanned boat cluster, the actions of the unmanned boats, and the state transitions of the unmanned boats.
Specifically, the Nash equilibrium point is written (a, b), where a is the dispersed or centralized attack strategy of each unmanned boat and b is that of the unmanned boat cluster; the dispersed attack strategy is encoded as "0" and the centralized attack strategy as "1". For example, the Nash equilibrium point (0, 1) means that each of our unmanned boats adopts the dispersed attack strategy while our cluster adopts the centralized attack strategy: within a certain range, our boats disperse and then concentrate their attack on the enemy unmanned boats, giving an attack strategy that is locally dispersed and globally concentrated.
Taking the proportion of our unmanned boats that select the centralized attack strategy as the abscissa and the proportion of our cluster that selects it as the ordinate, the hybrid strategy game and Nash equilibrium processing are performed, with the following results. When our side has more unmanned boats than the enemy (Fig. 1), the Nash equilibrium point is (0, 0), so each of our unmanned boats and our cluster all adopt the dispersed attack strategy. When our side has fewer unmanned boats than the enemy (Fig. 2), the Nash equilibrium point is (1, 1), so each of our unmanned boats and our cluster all adopt the centralized attack strategy. When the numbers are equal (Fig. 3), the Nash equilibrium point is (1, 0), so our side adopts the hybrid attack strategy: the whole cluster splits into several partial clusters, which then attack.
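The (a, b) encoding and the three reported equilibrium points can be captured in a small lookup, sketched below. Snapping a converged point to the nearest corner with a 0.5 cut-off is an assumption, as are the names `decode`, `LABELS` and `REPORTED`.

```python
from __future__ import annotations

LABELS = {0: "dispersed", 1: "centralized"}

# Nash equilibrium points (a, b) reported for each force ratio (Figs. 1 to 3)
REPORTED = {"more": (0, 0), "fewer": (1, 1), "equal": (1, 0)}


def decode(x: float, y: float) -> tuple[tuple[int, int], str]:
    """Snap a (boat share, cluster share) point to the 0/1 code and describe it."""
    a, b = int(x >= 0.5), int(y >= 0.5)
    return (a, b), f"each boat {LABELS[a]}, cluster {LABELS[b]}"


if __name__ == "__main__":
    for case, point in REPORTED.items():
        print(case, point, decode(*point)[1])
```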
Simulation experiment
The unmanned boat cluster attack method was tested in a simulated sea-area confrontation environment. The battle sea area is a rectangular two-dimensional plane, the departure points of both sides are set randomly, and the unmanned boats track and strike within the designated battle sea area. The enemy unmanned boats detect according to set rules and perceive their surroundings for path planning, with the aim of evading our boats' attacks. With no significant difference in speed or equipment between the two sides' unmanned boats, the experiment evaluates how the numbers of unmanned boats on each side affect the cluster attack method of the present application. MSDDPG denotes the unmanned boat cluster attack method of the present application, and DDPG denotes an unmanned boat cluster attack method without advantage processing and benefit gradient processing.
Fig. 4 plots the benefit when our side has more unmanned boats than the enemy. In this case the final benefit of MSDDPG, the unmanned boat cluster attack method of the present application, is higher than that of the unmodified DDPG method.
Fig. 5 plots the benefit when our side has fewer unmanned boats than the enemy. The unmodified DDPG method converges at about 6000 steps and MSDDPG at about 3000 steps, so MSDDPG converges faster than DDPG and also reaches a higher final benefit.
Fig. 6 plots the benefit when our side has the same number of unmanned boats as the enemy. The unmodified DDPG method converges at about 8000 steps and MSDDPG at about 5000 steps, so MSDDPG again converges faster than DDPG and reaches a stable benefit.
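The figures report approximate convergence steps (about 6000 versus 3000, and about 8000 versus 5000), but the description does not say how convergence was judged. The sketch below shows one hypothetical criterion: the first step after which the moving average of the benefit curve stays within a relative tolerance of its final value. The window size, tolerance and the synthetic ramp curve in the usage lines are assumptions for illustration only.

```python
def convergence_step(rewards, window=100, tol=0.01):
    """Hypothetical criterion: first step after which the moving average of the
    benefit stays within a relative band around its final value."""
    if len(rewards) < window:
        raise ValueError("need at least one full window of rewards")
    # ma[k] is the mean of rewards[k : k + window], i.e. the window ending at step k + window - 1
    ma = [sum(rewards[k:k + window]) / window for k in range(len(rewards) - window + 1)]
    final = ma[-1]
    band = tol * max(1.0, abs(final))
    last_outside = -1
    for k, value in enumerate(ma):
        if abs(value - final) > band:
            last_outside = k
    # the first window that never leaves the band again ends at this step
    return last_outside + 1 + window - 1


if __name__ == "__main__":
    # synthetic benefit curve: rises linearly for 1000 steps, then stays flat
    curve = [min(t / 1000, 1.0) for t in range(3000)]
    print(convergence_step(curve))
```

A trailing moving average smooths the episode-to-episode noise typical of reinforcement-learning benefit curves; comparing two methods under the same window and tolerance is what makes the step counts comparable.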
Fig. 7 is a schematic diagram of the pursuit process when our side adopts the hybrid attack strategy. Our unmanned boats D1, D2 and D3 are shown as white circles, the enemy unmanned boats E1, E2 and E3 as gray circles, and black circles likewise represent unmanned boats.
As shown in Fig. 7(1), while pursuing the enemy unmanned boats, our boats D1, D2 and D3 stay as close to one another as possible so as to enter a joint state, and the pursuit target of D1 and D3 is E3. In Fig. 7(2), D1 and D3 have completed the pursuit of their nearest target E3 and destroyed it; their pursuit target then switches from E3 to E1, now the nearest target, while D2 follows the trajectories of D1 and D3 to join the centralized attack. In Fig. 7(3), D1, D2 and D3 have joined up; because E2 is now closest to our cluster, the cluster's target changes from E1 to E2, and after E2 is destroyed the target changes to the last enemy boat, E1. In Fig. 7(4), D1, D2 and D3 enter the centralized attack state and engage E1. Under the hybrid strategy, our boats thus cooperate to pursue the enemy targets and complete the attack task against all enemy unmanned boats.
The invention also provides an attack system for an unmanned ship cluster, comprising:
an acquisition module, configured to acquire information on the unmanned boats of both our side and the enemy, the information comprising position state information, advancing speed and movement track of the unmanned boats of both sides;
a benefit processing module, configured to: obtain, from the acquired information on both sides' unmanned boats, the benefit each unmanned boat obtains by taking an action; perform benefit gradient processing on the benefits obtained by each of our unmanned boats; perform action loss processing on the benefits after benefit gradient processing and predict the next action; and perform policy loss processing on the predicted next action to obtain the optimal next action of each of our unmanned boats, thereby obtaining the optimal attack route of each of our unmanned boats;
an attack strategy generation module, configured to: obtain the hybrid strategy set of the unmanned boat cluster based on the optimal attack route of each unmanned boat; set equilibrium points of the hybrid strategy and obtain the corresponding characteristic value curves; obtain the Nash equilibrium point of the hybrid strategy; and obtain the cooperative attack strategy of the unmanned boat cluster according to the Nash equilibrium point and the hybrid strategy set.
The invention also provides an attack device for an unmanned ship cluster, comprising a processor and a memory, wherein the processor implements the attack method for the unmanned ship cluster when executing the computer program stored in the memory.
The invention also provides a storage medium storing a computer program which, when executed by a processor, implements the attack method for the unmanned ship cluster.

Claims (5)

1. An attack method for an unmanned ship cluster, characterized by comprising the following steps:
S1: acquiring information on the unmanned ships of both our side and the enemy, the information comprising attribute information, advancing speed and movement track of the unmanned ships of both sides;
S2: obtaining, from the acquired information on both sides' unmanned ships, the benefit each unmanned ship obtains by taking an action; performing advantage processing and then benefit gradient processing on the benefits obtained by each of our unmanned ships; performing action loss processing on the benefits after benefit gradient processing and predicting the next action; and performing policy loss processing on the predicted next action to obtain the optimal next action of each of our unmanned ships, thereby obtaining the optimal attack route of each of our unmanned ships;
the advantage processing in S2 specifically includes:
based on the benefit obtained by each of our unmanned ships by taking an action, combined with the benefit of the next action and the benefit obtained by transitioning from the current action to the next action, screening out the actions of each of our unmanned ships that attain the average benefit;
the benefit gradient processing in S2 specifically comprises:
based on the benefits of each of our unmanned ships after advantage processing, combined with the benefit of each of our unmanned ships under the hybrid strategy and its next position state, screening out the actions of each of our unmanned ships whose benefit exceeds a set threshold;
the action loss processing in S2 specifically comprises:
based on the benefits of each of our unmanned ships after benefit gradient processing, obtaining an action loss value of our unmanned ship cluster by using an action loss function;
the policy loss processing in S2 specifically comprises:
based on the predicted next action, obtaining a policy loss value after each predicted action by using a discriminant loss function;
S3: obtaining a hybrid strategy set of the unmanned ship cluster based on the obtained optimal attack route of each unmanned ship; setting equilibrium points of the hybrid strategy and obtaining the characteristic value curves corresponding to the equilibrium points; obtaining the Nash equilibrium point of the hybrid strategy; and obtaining a cooperative attack strategy of the unmanned ship cluster according to the Nash equilibrium point of the hybrid strategy and the hybrid strategy set;
the hybrid strategy set in S3 is specifically as follows:
three strategies are distinguished according to the numbers of unmanned ships on the two sides:
(1) when our side has more unmanned ships than the enemy, our unmanned ships adopt a dispersed attack strategy;
(2) when our side has fewer unmanned ships than the enemy, our unmanned ships adopt a centralized attack strategy;
(3) when our side has as many unmanned ships as the enemy, our unmanned ships combine strategies (1) and (2) in a hybrid attack;
the hybrid attack is specifically as follows:
each unmanned ship selects, from strategy (1) and strategy (2), the strategy that yields the largest benefit for its action, and a strategy that maximizes the benefit of the unmanned ship cluster is made according to the strategies selected by the individual unmanned ships.
2. The attack method for an unmanned ship cluster according to claim 1, wherein obtaining the Nash equilibrium point of the hybrid strategy in S3 specifically comprises:
setting an equilibrium point of the hybrid strategy and obtaining the characteristic value curve corresponding to the equilibrium point; playing the game between our side and the enemy based on the state information of the unmanned ship cluster to obtain the benefit of the unmanned ship cluster; obtaining the change in the characteristic value curve of each unmanned ship based on that benefit; and, when all characteristic value curves pass through a common point and their slopes no longer change, taking that point as the Nash equilibrium point of the hybrid strategy.
3. An attack system for an unmanned ship cluster, comprising:
an acquisition module, configured to acquire information on the unmanned ships of both our side and the enemy, the information comprising position state information, advancing speed and movement track of the unmanned ships of both sides;
a benefit processing module, configured to: obtain, from the acquired information on both sides' unmanned ships, the benefit each unmanned ship obtains by taking an action; perform benefit gradient processing on the benefits obtained by each of our unmanned ships; perform action loss processing on the benefits after benefit gradient processing and predict the next action; and perform policy loss processing on the predicted next action to obtain the optimal next action of each of our unmanned ships, thereby obtaining the optimal attack route of each of our unmanned ships;
the advantage processing specifically comprises:
based on the benefit obtained by each of our unmanned ships by taking an action, combined with the benefit of the next action and the benefit obtained by transitioning from the current action to the next action, screening out the actions of each of our unmanned ships that attain the average benefit;
the benefit gradient processing specifically comprises:
based on the benefits of each of our unmanned ships after advantage processing, combined with the benefit of each of our unmanned ships under the hybrid strategy and its next position state, screening out the actions of each of our unmanned ships whose benefit exceeds a set threshold;
the action loss processing specifically comprises:
based on the benefits of each of our unmanned ships after benefit gradient processing, obtaining an action loss value of our unmanned ship cluster by using an action loss function;
the policy loss processing specifically comprises:
based on the predicted next action, obtaining a policy loss value after each predicted action by using a discriminant loss function;
an attack strategy generation module, configured to: obtain a hybrid strategy set of the unmanned ship cluster based on the obtained optimal attack route of each unmanned ship; set equilibrium points of the hybrid strategy and obtain the characteristic value curves corresponding to the equilibrium points; obtain the Nash equilibrium point of the hybrid strategy; and obtain a cooperative attack strategy of the unmanned ship cluster according to the Nash equilibrium point of the hybrid strategy and the hybrid strategy set;
the hybrid strategy set is specifically as follows:
three strategies are distinguished according to the numbers of unmanned ships on the two sides:
(1) when our side has more unmanned ships than the enemy, our unmanned ships adopt a dispersed attack strategy;
(2) when our side has fewer unmanned ships than the enemy, our unmanned ships adopt a centralized attack strategy;
(3) when our side has as many unmanned ships as the enemy, our unmanned ships combine strategies (1) and (2) in a hybrid attack;
the hybrid attack is specifically as follows:
each unmanned ship selects, from strategy (1) and strategy (2), the strategy that yields the largest benefit for its action, and a strategy that maximizes the benefit of the unmanned ship cluster is made according to the strategies selected by the individual unmanned ships.
4. An attack apparatus for an unmanned ship cluster, comprising a processor and a memory, wherein the processor, when executing a computer program stored in the memory, implements the attack method for an unmanned ship cluster according to claim 1 or 2.
5. A storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the attack method for an unmanned ship cluster according to claim 1 or 2.
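Claim 1 names the advantage, benefit gradient, action loss and policy loss processing steps of S2 without giving formulas. The sketch below fills them in with standard actor-critic (DDPG-style) forms as stand-ins: the benefit of an action is its reward plus the discounted benefit of the next action, advantage is measured against the average benefit, the action loss is a mean squared error against those targets, and the policy loss is the negative mean action value. The discount factor, threshold, function names and these loss forms are all assumptions; in particular the claims' "discriminant loss function" is not specified and is replaced here by the usual DDPG actor loss, and the threshold is applied to the advantage score.

```python
from statistics import fmean


def td_targets(rewards, next_q, gamma=0.95):
    """Benefit of each action: immediate reward plus discounted benefit of the next action."""
    return [r + gamma * q for r, q in zip(rewards, next_q)]


def advantage_filter(actions, targets):
    """Advantage relative to the average benefit; keep actions at or above the average."""
    avg = fmean(targets)
    return [(a, t - avg) for a, t in zip(actions, targets) if t >= avg]


def benefit_gradient_filter(scored, threshold):
    """Keep (action, advantage) pairs whose advantage exceeds a set threshold (assumed form)."""
    return [(a, adv) for a, adv in scored if adv > threshold]


def action_loss(q_pred, targets):
    """Critic-style loss: mean squared error between predicted Q values and targets."""
    return fmean((q - y) ** 2 for q, y in zip(q_pred, targets))


def policy_loss(q_of_predicted_actions):
    """Actor-style loss: maximizing Q is written as minimizing its negative mean."""
    return -fmean(q_of_predicted_actions)


if __name__ == "__main__":
    actions = ["a0", "a1", "a2", "a3"]  # hypothetical candidate actions of one boat
    targets = td_targets([1.0, 0.0, 2.0, 0.5], [2.0, 1.0, 2.0, 0.0], gamma=0.5)
    kept = advantage_filter(actions, targets)
    print("above average:", kept)
    print("above threshold:", benefit_gradient_filter(kept, threshold=1.0))
    print("action loss:", action_loss([2.0, 1.0, 2.0, 0.5], targets))
    print("policy loss:", policy_loss([1.0, 3.0]))
```

In a full DDPG-style setup these quantities would be computed on mini-batches from a replay buffer and minimized by gradient descent on critic and actor networks; the plain-list version only shows how each claimed processing step transforms the benefits.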

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311271113.0A CN117313972B (en) 2023-09-28 2023-09-28 Attack method, system and device for unmanned ship cluster and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311271113.0A CN117313972B (en) 2023-09-28 2023-09-28 Attack method, system and device for unmanned ship cluster and storage medium

Publications (2)

Publication Number Publication Date
CN117313972A CN117313972A (en) 2023-12-29
CN117313972B CN117313972B (en) 2024-04-12

Family

ID=89259837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311271113.0A Active CN117313972B (en) 2023-09-28 2023-09-28 Attack method, system and device for unmanned ship cluster and storage medium

Country Status (1)

Country Link
CN (1) CN117313972B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105278542A (en) * 2015-09-23 2016-01-27 Shenyang Aerospace University Counter-attack countermeasure optimal strategy method for multi-unmanned plane cooperative strike task
CN108873894A (en) * 2018-06-11 2018-11-23 Shanghai University Target following cooperative control system and method based on multiple unmanned boats
JP2020027656A (en) * 2018-08-14 2020-02-20 Honda Motor Co., Ltd. Interaction recognition decision-making
CN111624996A (en) * 2020-05-12 2020-09-04 Harbin Engineering University Multi-unmanned-boat incomplete information trapping method based on game theory
CN112364972A (en) * 2020-07-23 2021-02-12 North Automatic Control Technology Institute Unmanned fighting vehicle team fire power distribution method based on deep reinforcement learning
CN113052289A (en) * 2021-03-16 2021-06-29 Southeast University Unmanned ship cluster striking position selection method based on game theory
CN114167899A (en) * 2021-12-27 2022-03-11 Beijing Union University Unmanned aerial vehicle swarm cooperative countermeasure decision-making method and system
CN115328189A (en) * 2022-07-04 2022-11-11 Hefei University of Technology Multi-unmanned aerial vehicle cooperative game decision method and system
CN115525058A (en) * 2022-10-24 2022-12-27 Harbin Engineering University Unmanned underwater vehicle cluster cooperative countermeasure method based on deep reinforcement learning
CN116009395A (en) * 2022-11-30 2023-04-25 Harbin Institute of Technology Fault-tolerant control method for multi-agent system in non-cooperative game
CN116050795A (en) * 2023-02-13 2023-05-02 Shanghai University Unmanned ship cluster task scheduling and collaborative countermeasure method based on MADDPG
CN116225049A (en) * 2022-12-21 2023-06-06 Shenyang Aircraft Design and Research Institute, Aviation Industry Corporation of China Multi-unmanned plane wolf-crowd collaborative combat attack and defense decision algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9931573B2 (en) * 2013-02-11 2018-04-03 University Of Southern California Optimal patrol strategy for protecting moving targets with multiple mobile resources
CN112766329B (en) * 2021-01-06 2022-03-22 Shanghai University Multi-unmanned-boat cooperative interception control method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
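Yao Zongxin; Li Ming; Chen Zongji. Decision-making method for multi-aircraft cooperative confrontation against multiple targets based on a game theory model. Aeronautical Computing Technique, 2007-05-15, No. 03, full text. *
Yao Peng; Qi Shengbo; Li Ming. Optimal dynamic coverage observation technique based on UAVs/unmanned surface vessels. Marine Sciences, 2018-01-15, No. 01, full text. *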

Also Published As

Publication number Publication date
CN117313972A (en) 2023-12-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant