CN114722998B - Method for constructing a CNN-PPO-based wargame deduction agent - Google Patents

Method for constructing a CNN-PPO-based wargame deduction agent

Info

Publication number
CN114722998B
CN114722998B (application CN202210232129.XA)
Authority
CN
China
Prior art keywords
network
actor
output
neural network
situation data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210232129.XA
Other languages
Chinese (zh)
Other versions
CN114722998A (en)
Inventor
张震
臧兆祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University (CTGU)
Priority to CN202210232129.XA
Publication of CN114722998A
Application granted
Publication of CN114722998B
Legal status: Active (current)
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for constructing a CNN-PPO-based wargame deduction agent, comprising the following steps: collecting initial situation data from a wargame deduction platform and preprocessing it to obtain target situation data; constructing an influence map module, inputting the target situation data into it, and outputting influence features; constructing a hybrid neural network model based on a convolutional neural network (CNN) and proximal policy optimization (PPO), concatenating the target situation data with the influence features, and inputting the result into the hybrid model for iterative training until the objective function is minimized and the network converges, thereby constructing the CNN-PPO agent. The invention deepens the agent's understanding of the battlefield situation and, to a certain extent, increases its playing strength.

Description

Method for constructing a CNN-PPO-based wargame deduction agent
Technical Field
The invention belongs to the technical field of computers, and in particular relates to a method for constructing a CNN-PPO-based wargame deduction agent.
Background
Wargame deduction applies the experience and rules distilled from combat practice to deductive analysis of the course of an engagement. With the rapid growth of computing power, new technologies have been brought into wargaming; computer wargaming has become a major branch of the field and is regarded in countries around the world as a means of improving military capability.
In a concrete wargame deduction, the problem is usually simplified as follows: under the constraints of given objective rules, achieve certain goals through the deployment, maneuver, attack, and other actions of one's forces, such as seizing control points or annihilating enemy forces. The purpose of constructing a wargame deduction agent is to obtain a commander that can autonomously make action decisions according to the current battlefield situation. Agents are classified as rule-based or learning-based according to whether they have learning ability. A rule-based agent is implemented by hard-coded programming: multiple branches and loops specify the action the agent takes at a given moment, and a commonly used technique is the behavior tree. A learning agent, typified by a machine learning model, has autonomous learning ability and can update its network parameters in the course of play, thereby obtaining a stronger model.
Existing agent construction methods divide mainly into rule-based models and neural network models. Because the state space of a wargame is enormous, rules formulated from expert experience can hardly cover every case and can only classify states coarsely, so a rule-based agent makes rigid decisions and cannot respond flexibly to unexpected situations. The main difficulties facing neural network models are that the sparse rewards given by the environment make it hard to update network parameters effectively, and that the state dimensionality explodes.
Disclosure of Invention
In order to solve the above problems, the invention provides the following scheme: a method for constructing a CNN-PPO-based wargame deduction agent, comprising the following steps:
collecting initial situation data from a wargame deduction platform and preprocessing the initial situation data to obtain target situation data;
constructing an influence map module, inputting the target situation data into the influence map module, and outputting the influence features;
constructing a hybrid neural network model based on a convolutional neural network and proximal policy optimization, concatenating the target situation data with the influence features, inputting the result into the hybrid neural network model for iterative training until the objective function is minimized and the network converges, and thereby constructing the CNN-PPO agent.
Preferably, preprocessing the initial situation data comprises screening the initial situation data and removing non-standard data to obtain the target situation data;
the initial situation data comprises attribute information of friendly combat entities, attribute information of enemy combat entities, map-view attribute information, and scoreboard information;
the non-standard data comprises redundant data, data with missing fields, null values, and erroneous information.
Preferably, the overall architecture of the hybrid neural network model is a CNN-PPO architecture comprising a convolutional neural network, an actor_new network, an actor_old network, and a Critic network;
the convolutional neural network mines latent relations in the target situation data and extracts hidden features;
the actor_new network, the actor_old network, and the Critic network are all three-layer fully connected neural networks.
Preferably, before the hybrid neural network model is trained iteratively, the method further comprises inputting the output of the convolutional neural network into the Actor network of the PPO architecture to obtain the output of the Actor network; and concatenating the output of the Actor network with the output of the convolutional neural network and inputting the result into the Critic network to obtain the output of the Critic network.
Preferably, inputting the output of the convolutional neural network into the Actor network of the PPO architecture to obtain the output of the Actor network comprises inputting the output of the convolutional neural network into the actor_new network to obtain the two parameter values μ and σ; establishing a normal distribution based on the two parameter values, where μ is the mean and σ is the standard deviation of the normal distribution; sampling an action from the normal distribution; and obtaining, through the interaction of the action with the environment, the reward value given by the environment and the state at the next time step.
Preferably, concatenating the output of the Actor network with the output of the convolutional neural network and inputting the result into the Critic network to obtain the output of the Critic network comprises inputting the situation data of the next time step into the Critic network to obtain the network output V_ and computing the discounted reward value; inputting the state values of T time steps into the Critic network to obtain T values of V_; computing the mean squared error between the discounted reward values R and V_; and updating the Critic network by back-propagation. Here V_ is the Critic's estimate of the return obtained by taking action a in state S.
Preferably, the iterative training of the hybrid neural network model consists of optimizing the network parameters N times with a mean squared error loss function and optimizing the Actor network and the convolutional neural network B times, until the objective function is minimized and the network converges.
Preferably, optimizing the network parameters N times with the mean squared error loss function and optimizing the Actor network and the convolutional neural network B times comprises inputting all state values in the experience pool into the actor_new network and the actor_old network respectively to obtain the action distributions N1 and N2; inputting all actions in the experience pool into N1 and N2 to obtain the probabilities p1 and p2, and computing the ratio p2/p1 from the probability values p1 and p2; and computing the error of the Actor network, updating the parameters by back-propagation, and training the model to convergence, thereby constructing the CNN-PPO agent.
The invention discloses the following technical effects:
the invention provides a method for constructing a soldier chess deduction intelligent body based on CNN-PPO, which is based on a convolutional neural network to perform potential association mining on initial situation data, obtain influence characteristic information, input the influence characteristic and the initial situation data into a PPO algorithm model together for learning, form a hybrid neural network model by adopting the Convolutional Neural Network (CNN) and a near-end strategy optimization (PPO), and artificially add characteristics formed by an influence map in terms of characteristic processing. This makes the convolutional neural network converge faster when processing the feature data, and the action choices given by the whole agent are also more careful. The understanding degree of the intelligent agent on the situation is increased, and the intensity of the intelligent agent fight is increased to a certain extent.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly described below. Evidently, the drawings described below show only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
In order that the above objects, features, and advantages of the invention may become more readily apparent, the invention is described in further detail below with reference to the accompanying drawings and the detailed description.
As shown in FIG. 1, the invention provides a method for constructing a CNN-PPO-based wargame deduction agent, comprising the following steps:
collecting initial situation data from a wargame deduction platform and preprocessing the initial situation data to obtain target situation data;
constructing an influence map module, inputting the target situation data into the influence map module, and outputting the influence features;
constructing a hybrid neural network model based on a convolutional neural network and proximal policy optimization, concatenating the target situation data with the influence features, inputting the result into the hybrid neural network model for iterative training until the objective function is minimized and the network converges, and thereby constructing the CNN-PPO agent.
The initial situation data is preprocessed by screening it and removing non-standard data to obtain the target situation data;
the initial situation data comprises attribute information of friendly combat entities, attribute information of enemy combat entities, map-view attribute information, and scoreboard information;
the non-standard data comprises redundant data, data with missing fields, null values, and erroneous information.
The hybrid neural network model has a CNN-PPO architecture comprising a convolutional neural network, an actor_new network, an actor_old network, and a Critic network;
the convolutional neural network mines latent relations in the target situation data and extracts hidden features;
the actor_new network, the actor_old network, and the Critic network are all three-layer fully connected neural networks.
Before the hybrid neural network model is trained iteratively, the output of the convolutional neural network is input into the Actor network of the PPO architecture to obtain the output of the Actor network; the output of the Actor network is then concatenated with the output of the convolutional neural network and input into the Critic network to obtain the output of the Critic network.
Inputting the output of the convolutional neural network into the Actor network of the PPO architecture to obtain the output of the Actor network comprises inputting the output of the convolutional neural network into the actor_new network to obtain the two parameter values μ and σ; establishing a normal distribution based on the two parameter values, where μ is the mean and σ is the standard deviation of the normal distribution; sampling an action from the normal distribution; and obtaining, through the interaction of the action with the environment, the reward value given by the environment and the state at the next time step.
Concatenating the output of the Actor network with the output of the convolutional neural network and inputting the result into the Critic network to obtain the output of the Critic network comprises inputting the situation data of the next time step into the Critic network to obtain the network output V_ and computing the discounted reward value; inputting the state values of T time steps into the Critic network to obtain T values of V_; computing the mean squared error between the discounted reward values R and V_; and updating the Critic network by back-propagation. Here V_ is the Critic's estimate of the return obtained by taking action a in state S.
Iterative training of the hybrid neural network model consists of optimizing the network parameters N times with a mean squared error loss function and optimizing the Actor network and the convolutional neural network B times, until the objective function is minimized and the network converges.
Optimizing the network parameters N times with the mean squared error loss function and optimizing the Actor network and the convolutional neural network B times comprises inputting all state values in the experience pool into the actor_new network and the actor_old network respectively to obtain the action distributions N1 and N2; inputting all actions in the experience pool into N1 and N2 to obtain the probabilities p1 and p2, and computing the ratio p2/p1 from the probability values p1 and p2; and computing the error of the Actor network, updating the parameters by back-propagation, and training the model to convergence, thereby constructing the CNN-PPO agent.
Example 1
As shown in FIG. 1, the method for constructing a CNN-PPO-based wargame deduction agent provided by the invention comprises the following steps:
Step 1: Run the wargame deduction platform, create a wargame scenario, and obtain the situation data returned by the platform. This situation data is generated by the randomly initialized actor_new network of the neural network model playing against robots built into the environment. Specifically:
1.1 A rule-based agent is built into the wargame deduction platform, so both human-versus-machine and machine-versus-machine training can be provided. The actor_new network plays against the built-in agent and generates situation data. The actor_new network is a three-layer fully connected neural network.
Step 2: Screen the situation data returned by the platform in step 1 and remove non-standard data. Non-standard data mainly means redundant data, data with missing fields, and the like; such data is removed. In the data generated against the built-in robot, a few rewards are positive and most are negative, so experiences with positive rewards are collected preferentially.
Step 2 specifically comprises the following steps:
2.1 The situation data mainly comprises the attributes of friendly entities, the attributes of enemy entities that have been discovered, map attributes, and scoreboard information.
2.2 Non-standard data mainly means null values, erroneous information, and the like.
The invention combines the ideas of reinforcement learning and the influence map. Reinforcement learning formulates the problem as a Markov decision process and solves it by iteration. The influence map divides situation features into primary and secondary features. The primary features comprise the attribute information of friendly combat entities and of enemy combat entities; the secondary features comprise map-view information, scoreboard information, and influence map information.
Step 3: Input the screened data into the influence map module. The input of the influence map module is situation information comprising friendly/enemy entity information and map information; the output is the influence feature of a given map point.
Step 3 specifically comprises the following steps:
3.1 Construct the influence map module, which further extracts features from the situation data. The influence within a certain range around a friendly entity is given by the following formula:
e = ine + high + da + di
where ine is the line-of-sight coefficient: line of sight is whether there is an obstruction between two coordinates; without obstruction a point is visible, with obstruction it is invisible. high is the elevation, i.e., the altitude in the everyday sense. da is a danger coefficient, and di is the distance to the contested control point.
3.2 The output map points are generally set to a certain area around a friendly entity. Taking hexagonal grids as an example, the module outputs the influence coefficient of every hex within n hexes of the friendly unit, as sketched below.
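The patent gives no reference code, so the following Python sketch only illustrates one plausible reading of the influence computation. The platform object and its helper calls (line_of_sight, elevation, danger, distance, hexes_within) are hypothetical names invented for the example, not interfaces of the source platform.

    def influence(platform, hex_pos, own_pos, control_point):
        # e = ine + high + da + di for a single hex (formula of step 3.1)
        ine = 1.0 if platform.line_of_sight(own_pos, hex_pos) else 0.0  # line-of-sight coefficient
        high = platform.elevation(hex_pos)                              # elevation of the hex
        da = platform.danger(hex_pos)                                   # danger coefficient
        di = platform.distance(hex_pos, control_point)                  # distance to the contested control point
        return ine + high + da + di

    def influence_features(platform, own_pos, control_point, n):
        # influence coefficients of all hexes within n hexes of the own unit (step 3.2)
        return [influence(platform, h, own_pos, control_point)
                for h in platform.hexes_within(own_pos, n)]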
When a friendly entity is in a negatively influenced area, the reward function returns a negative value as punishment for the agent; when it is in a positively influenced area, the reward function returns a positive value as a reward for the agent.
The reward function has the following form:
R = r_a + r_c + r_d + a
where r_a is the score of the currently surviving friendly units; r_c is the score of the occupied control points; r_d is the score of the annihilated enemy units; and a is the current situation score, i.e., the hit points lost to enemy fire at the previous moment, or the effective score for hits on the enemy.
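As an illustration, a minimal sketch of this reward, assuming the four terms are read from the platform's scoreboard; the dictionary keys are invented for the example and do not come from the source.

    def reward(scoreboard):
        # R = r_a + r_c + r_d + a (step 3 reward function; key names assumed)
        r_a = scoreboard["surviving_own_units_score"]   # score of surviving friendly units
        r_c = scoreboard["control_points_score"]        # score of occupied control points
        r_d = scoreboard["annihilated_enemy_score"]     # score of annihilated enemy units
        a = scoreboard["situation_score"]               # HP lost last step, or effective hit score
        return r_a + r_c + r_d + a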
Step 4: Construct the hybrid neural network, which is built on the proximal policy optimization (PPO) architecture.
Step 4 specifically comprises the following steps:
4.1 Construct a convolutional neural network to mine latent links in the situation data.
4.2 Construct the overall architecture of the hybrid neural network according to the PPO algorithm architecture. The overall architecture is a CNN-PPO architecture consisting of four neural networks: a convolutional neural network, an actor_new network, an actor_old network, and a Critic network.
The convolutional neural network extracts hidden features. The CNN uses 3 convolution kernels of different sizes, each attending to different latent features. The CNN model is computed as:
x_t = σ_cnn(w_cnn ⊙ x_t + b_cnn)
where x_t is the current state feature, w_cnn is the filter weight, b_cnn is the bias parameter, and σ_cnn is the activation function.
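For illustration, a PyTorch sketch of such a feature extractor over the 80-dimensional spliced vector of step 5, with three parallel one-dimensional convolutions standing in for the three kernels of different sizes. The kernel sizes (3, 5, 7), channel count, and ReLU activation are assumptions; only the 80-dimensional input and 42-dimensional output follow the text of step 5.

    import torch
    import torch.nn as nn

    class SituationCNN(nn.Module):
        # three parallel convolution kernels of different (assumed) sizes
        def __init__(self, in_dim=80, out_dim=42):
            super().__init__()
            self.branches = nn.ModuleList(
                [nn.Conv1d(1, 4, kernel_size=k, padding=k // 2) for k in (3, 5, 7)])
            self.act = nn.ReLU()
            self.proj = nn.Linear(3 * 4 * in_dim, out_dim)

        def forward(self, x):                       # x: (batch, 80)
            x = x.unsqueeze(1)                      # (batch, 1, 80) for Conv1d
            feats = [self.act(b(x)).flatten(1) for b in self.branches]
            return self.proj(torch.cat(feats, dim=1))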
The Actor network obtains the values μ and σ from the current state s_t, establishes a normal distribution N from the μ and σ values, samples an action a from the distribution N, obtains the reward value r given by the environment, and observes the next state s_{t+1} after the environment changes. The Actor network is then updated with the gradient of the PPO objective, in which P_θ(a_t|s_t) is the sampling policy and P_θ'(a_t|s_t) is the sampling policy after the parameter update; a sketch of this update follows below.
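The surrogate objective itself is not reproduced in the source text; the sketch below uses the standard PPO clipped surrogate as a stand-in. It matches the ratio p2/p1 described in step 7, but the clipping form and the tuple-returning actor interface are assumptions, not the patent's exact formula.

    import torch
    from torch.distributions import Normal

    def actor_loss(actor_new, actor_old, states, actions, advantages, clip_eps=0.2):
        # ratio = P_theta'(a_t|s_t) / P_theta(a_t|s_t), i.e. p2 / p1 of step 7
        mu_new, sigma_new = actor_new(states)   # actors assumed to return (mu, sigma)
        mu_old, sigma_old = actor_old(states)
        p2 = Normal(mu_new, sigma_new).log_prob(actions).sum(-1)  # new policy log-prob
        p1 = Normal(mu_old, sigma_old).log_prob(actions).sum(-1)  # old (sampling) policy log-prob
        ratio = torch.exp(p2 - p1)
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        # clipped surrogate (assumed form); minimize the negative objective
        return -torch.min(ratio * advantages, clipped * advantages).mean()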
The Critic network computes the action-value function Q(s_t, a_t) from the input state s_t and action a_t. The Critic network loss is computed as:
loss = (r + γ max Q(s', a') - Q(s, a))^2
where r is the reward value given by the environment, γ is the discount factor, and Q(s, a) is the action-value function, representing the return from taking action a in state s.
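A sketch of a Critic update under this loss; the Critic signature, the discount value, and the use of the next action in place of the maximization over a' are simplifying assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def critic_td_loss(critic, s, a, r, s_next, a_next, gamma=0.9):
        # loss = (r + gamma * max Q(s', a') - Q(s, a))^2, with a_next standing in for argmax a'
        q = critic(s, a)
        with torch.no_grad():
            target = r + gamma * critic(s_next, a_next)   # bootstrap target, no gradient
        return F.mse_loss(q, target)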
The actor_new network, actor_old network, and Critic network are constructed according to the PPO algorithm architecture. The actor_new network is a three-layer fully connected neural network with 42 neurons in the first layer, 128 in the second, and 15 in the third. The Critic network is a three-layer fully connected neural network with 57 neurons in the first layer, 64 in the second, and 1 in the third. The actor_old network has the same architecture as the actor_new network. After the model is built, the network parameters are initialized randomly.
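Sketching these layer widths in PyTorch: the Tanh activations and the initial copy of actor_old from actor_new are assumptions; only the widths 42-128-15 and 57-64-1 come from the source. Note that the 57-dimensional Critic input matches the concatenation of the 42-dimensional CNN output with the 15-dimensional Actor output described in step 7.

    import torch.nn as nn

    def mlp(widths):
        # three-layer fully connected network; Tanh activation assumed, not given in the source
        layers = []
        for i in range(len(widths) - 1):
            layers.append(nn.Linear(widths[i], widths[i + 1]))
            if i < len(widths) - 2:
                layers.append(nn.Tanh())
        return nn.Sequential(*layers)

    actor_new = mlp([42, 128, 15])   # widths per the embodiment
    actor_old = mlp([42, 128, 15])   # same architecture as actor_new
    critic = mlp([57, 64, 1])        # 57 = 42 (CNN output) + 15 (actor output)
    actor_old.load_state_dict(actor_new.state_dict())   # old net starts as a copy (assumed)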
Step 5: Concatenate the situation information with the influence features output by the influence module in step 3, and input the result into the convolutional neural network to obtain the output of the convolutional neural network. The input of the convolutional neural network is an 80-dimensional vector formed by concatenating the 26-dimensional initial situation with the 54-dimensional influence features; the output is a 42-dimensional vector.
Step 5 specifically comprises the following steps:
5.1 Concatenate the initial situation information with the feature information extracted by the influence map module and input the combined vector into the convolutional neural network. The combination is a direct concatenation of the two vectors.
5.2 The convolutional neural network uses several convolution kernels of different sizes to attend to different latent features.
Step 6: Input the output of the convolutional neural network into the Actor network of the PPO architecture and obtain the output of the Actor network.
Step 6 specifically comprises the following steps:
6.1 Build an experience pool and store each piece of experience information in it; a sketch follows below.
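The record format of the pool is not spelled out here; the sketch below assumes the conventional (s, a, r, s') transition tuple and implements the preference, noted in step 2, for collecting positive-reward experiences first.

    import random
    from collections import deque

    class ExperiencePool:
        # stores transitions; positive-reward experiences are drawn preferentially (per step 2)
        def __init__(self, capacity=10000):
            self.positive = deque(maxlen=capacity)
            self.other = deque(maxlen=capacity)

        def store(self, s, a, r, s_next):
            (self.positive if r > 0 else self.other).append((s, a, r, s_next))

        def sample(self, batch_size):
            # draw positive-reward experiences first, then fill the batch from the rest
            batch = random.sample(list(self.positive), min(batch_size, len(self.positive)))
            if len(batch) < batch_size:
                rest = min(batch_size - len(batch), len(self.other))
                batch += random.sample(list(self.other), rest)
            return batch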
Step 7: Concatenate the output of the Actor network with the output of the convolutional neural network and input the result into the Critic network to obtain the output of the Critic network. Optimize the network parameters N times with a mean squared error loss function, and optimize the Actor network and the convolutional neural network B times, until the objective function is minimized and the network converges.
The overall flow of data through the four networks is as follows: the initial situation data is input into the influence map module to obtain the secondary influence features; the initial situation data and the secondary influence features are concatenated and input into the convolutional neural network to obtain the output of the convolutional neural network; the output of the convolutional neural network is input into the actor_new network to obtain the two values μ and σ, from which a normal distribution representing the distribution of actions is established; an action is sampled from this normal distribution; the action interacts with the environment to yield the reward value given by the environment and the next state; the situation data of the next time step is input into the Critic network to obtain the network output V_, and the discounted reward value is computed. The state values of T time steps are input into the Critic network to obtain T values of V_; the mean squared error between the discounted reward value R and V_ is computed; the Critic network is then updated by back-propagation.
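A sketch of this T-step Critic update, assuming the usual backward recursion for the discounted reward R and bootstrapping from the Critic's V_ estimate of the next-moment situation; the discount factor value is assumed.

    import torch
    import torch.nn.functional as F

    def discounted_returns(rewards, v_last, gamma=0.9):
        # backward recursion R_t = r_t + gamma * R_{t+1}, seeded with the bootstrap value
        R, out = v_last, []
        for r in reversed(rewards):
            R = r + gamma * R
            out.insert(0, R)
        return torch.tensor(out)

    def critic_update(critic, optimizer, states, rewards, next_state):
        v_last = critic(next_state).detach().item()     # V_ of the next-moment situation data
        targets = discounted_returns(rewards, v_last)   # discounted reward values R
        v = critic(states).squeeze(-1)                  # T values of V_
        loss = F.mse_loss(v, targets)                   # mean squared error of R and V_
        optimizer.zero_grad()
        loss.backward()                                 # back-propagation mechanism
        optimizer.step()
        return loss.item()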
All state values in the experience pool are input into the actor_new network and the actor_old network respectively to obtain the action distributions N1 and N2; all action values a in the experience pool are input into N1 and N2 to obtain the probabilities p1 and p2, and the ratio p2/p1 is computed from p1 and p2; the error of the Actor network is then computed from this ratio, and the parameters are updated by back-propagation.
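Tying the pieces together, a sketch of one iteration of step 7 with N Critic optimizations and B Actor/CNN optimizations, reusing the hypothetical actor_loss and critic_update helpers sketched above; the loop structure, the default N and B, and the final synchronization of actor_old are assumptions consistent with the text.

    def train_iteration(cnn, actor_new, actor_old, critic, critic_opt, actor_opt,
                        batch, N=10, B=10):
        states, actions, rewards, next_state, advantages = batch
        for _ in range(N):                  # N optimizations with the MSE loss
            critic_update(critic, critic_opt, states, rewards, next_state)
        for _ in range(B):                  # B optimizations of the Actor and the CNN;
            # actor_opt is assumed to hold the parameters of both actor_new and the CNN
            loss = actor_loss(actor_new, actor_old, cnn(states), actions, advantages)
            actor_opt.zero_grad()
            loss.backward()
            actor_opt.step()
        actor_old.load_state_dict(actor_new.state_dict())  # sync old policy (assumed)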
The model is trained until convergence, at which point the construction of the CNN-PPO agent is complete.
The above embodiments merely describe preferred modes of the invention and do not limit its scope. Various modifications and improvements made to the technical solution of the invention by those skilled in the art without departing from the design spirit of the invention shall fall within the protection scope defined by the claims of the invention.

Claims (4)

1. A method for constructing a CNN-PPO-based wargame deduction agent, characterized by comprising the following steps:
collecting initial situation data from a wargame deduction platform and preprocessing the initial situation data to obtain target situation data;
constructing an influence map module, inputting the target situation data into the influence map module, and outputting the influence features;
constructing a hybrid neural network model based on a convolutional neural network and proximal policy optimization, concatenating the target situation data with the influence features, inputting the result into the hybrid neural network model for iterative training until the objective function is minimized and the network converges, and thereby constructing a CNN-PPO agent;
the hybrid neural network model is of a CNN-PPO architecture and comprises a convolutional neural network, an actor_new network, an actor_old network and a Critic network;
the convolutional neural network is used for mining potential relations between target situation data, and extracting hidden features;
the actor_new network, the actor_old network and the Critic network are all three-layer fully-connected neural networks;
before the mixed neural network model is input for model iterative training, the output of the convolutional neural network is input into an actor_new network in the PPO architecture, and the output of the actor_new network is obtained; splicing the output of the actor_new network and the output of the convolutional neural network, and inputting the spliced output into a Critic network to obtain the output of the Critic network;
the output of the convolutional neural network is input into an actor_new network in the PPO architecture, and the obtaining of the output of the actor_new network comprises the steps of inputting the output of the convolutional neural network into the actor_new network to obtain two parameter values of mu and sigma; establishing normal distribution based on the two parameter values, wherein mu is the mean value of the normal distribution, and sigma is an equation of the normal distribution; obtaining an action according to the normal distribution sampling, and obtaining a rewarding value given by the environment and a next time state through interaction of the action and the environment;
concatenating the output of the Actor network with the output of the convolutional neural network and inputting the result into the Critic network to obtain the output of the Critic network comprises inputting the situation data of the next time step into the Critic network to obtain the network output V_ and computing the discounted reward value; inputting the state values of T time steps into the Critic network to obtain T values of V_; computing the mean squared error between the discounted reward values R and V_, and updating the Critic network by back-propagation; where V_ is the estimated return obtained by taking action a in state S.
2. The method for constructing a CNN-PPO-based wargame deduction agent according to claim 1, wherein
preprocessing the initial situation data comprises screening the initial situation data and removing non-standard data to obtain the target situation data;
the initial situation data comprises attribute information of friendly combat entities, attribute information of enemy combat entities, map-view attribute information, and scoreboard information;
the non-standard data comprises redundant data, data with missing fields, null values, and erroneous information.
3. The method for constructing a CNN-PPO-based wargame deduction agent according to claim 1, wherein
the iterative training of the hybrid neural network model consists of optimizing the network parameters N times with a mean squared error loss function and optimizing the Actor network and the convolutional neural network B times, until the objective function is minimized and the network converges.
4. The method for constructing a CNN-PPO-based wargame deduction agent according to claim 3, wherein
optimizing the network parameters N times with the mean squared error loss function and optimizing the Actor network and the convolutional neural network B times comprises inputting all state values in the experience pool into the actor_new network and the actor_old network respectively to obtain the action distributions N1 and N2; inputting all actions in the experience pool into N1 and N2 to obtain the probabilities p1 and p2, and computing the ratio p2/p1 from the probability values p1 and p2; and computing the error of the Actor network, updating the parameters by back-propagation, and training the model to convergence, thereby constructing the CNN-PPO agent.
CN202210232129.XA 2022-03-09 2022-03-09 Method for constructing a CNN-PPO-based wargame deduction agent Active CN114722998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210232129.XA CN114722998B (en) Method for constructing a CNN-PPO-based wargame deduction agent

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210232129.XA CN114722998B (en) Method for constructing a CNN-PPO-based wargame deduction agent

Publications (2)

Publication Number Publication Date
CN114722998A (en) 2022-07-08
CN114722998B (en) 2024-02-02

Family

ID=82238024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210232129.XA Active CN114722998B (en) Method for constructing a CNN-PPO-based wargame deduction agent

Country Status (1)

Country Link
CN (1) CN114722998B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115829034B (en) * 2023-01-09 2023-05-30 白杨时代(北京)科技有限公司 Method and device for constructing knowledge rule execution framework

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171796A (en) * 2017-12-25 2018-06-15 燕山大学 A kind of inspection machine human visual system and control method based on three-dimensional point cloud
CN109948642A (en) * 2019-01-18 2019-06-28 中山大学 Multiple agent cross-module state depth deterministic policy gradient training method based on image input
CN111605565A (en) * 2020-05-08 2020-09-01 昆山小眼探索信息科技有限公司 Automatic driving behavior decision method based on deep reinforcement learning
CN112861442A (en) * 2021-03-10 2021-05-28 中国人民解放军国防科技大学 Multi-machine collaborative air combat planning method and system based on deep reinforcement learning
CN113222106A (en) * 2021-02-10 2021-08-06 西北工业大学 Intelligent military chess deduction method based on distributed reinforcement learning
CN113947022A (en) * 2021-10-20 2022-01-18 哈尔滨工业大学(深圳) Near-end strategy optimization method based on model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130325774A1 (en) * 2012-06-04 2013-12-05 Brain Corporation Learning stochastic apparatus and methods
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
MX2018000942A (en) * 2015-07-24 2018-08-09 Deepmind Tech Ltd Continuous control with deep reinforcement learning.

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171796A (en) * 2017-12-25 2018-06-15 燕山大学 A kind of inspection machine human visual system and control method based on three-dimensional point cloud
CN109948642A (en) * 2019-01-18 2019-06-28 中山大学 Multiple agent cross-module state depth deterministic policy gradient training method based on image input
CN111605565A (en) * 2020-05-08 2020-09-01 昆山小眼探索信息科技有限公司 Automatic driving behavior decision method based on deep reinforcement learning
CN113222106A (en) * 2021-02-10 2021-08-06 西北工业大学 Intelligent military chess deduction method based on distributed reinforcement learning
CN112861442A (en) * 2021-03-10 2021-05-28 中国人民解放军国防科技大学 Multi-machine collaborative air combat planning method and system based on deep reinforcement learning
CN113947022A (en) * 2021-10-20 2022-01-18 哈尔滨工业大学(深圳) Near-end strategy optimization method based on model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Actor-Critic Reinforcement Learning and Application in Developing Computer-Vision-Based Interface Tracking; Oguzhan Dogru et al.; Engineering; Vol. 7, No. 9; 1248-1261 *
Research and Implementation of Intelligent Confrontation in Wargaming Based on Reinforcement Learning; Xue Ao; China Masters' Theses Full-text Database, Social Sciences I (No. 01); 6, 44 *
A Decision-Making Method Framework for Wargaming Based on Deep Reinforcement Learning; Cui Wenhua; Li Dong; Tang Yubo; Liu Shaojun; National Defense Technology (No. 02); 118-126 *

Also Published As

Publication number Publication date
CN114722998A (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN112329348B (en) Intelligent decision-making method for military countermeasure game under incomplete information condition
CN110119773B (en) Global situation assessment method, system and device of strategic gaming system
CN113222106B (en) Intelligent soldier chess deduction method based on distributed reinforcement learning
CN114757351B (en) Defense method for resisting attack by deep reinforcement learning model
CN113723013B (en) Multi-agent decision-making method for continuous space soldier chess deduction
CN112364972A (en) Unmanned fighting vehicle team fire power distribution method based on deep reinforcement learning
CN114722998B (en) Construction method of soldier chess deduction intelligent body based on CNN-PPO
CN113625569B (en) Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model
CN116661503B (en) Cluster track automatic planning method based on multi-agent safety reinforcement learning
CN115328189B (en) Multi-unmanned plane cooperative game decision-making method and system
CN114638339A (en) Intelligent agent task allocation method based on deep reinforcement learning
CN113255893B (en) Self-evolution generation method of multi-agent action strategy
CN116596343A (en) Intelligent soldier chess deduction decision method based on deep reinforcement learning
Wu et al. Dynamic multitarget assignment based on deep reinforcement learning
CN113988301B (en) Tactical strategy generation method and device, electronic equipment and storage medium
CN115220458A (en) Distributed decision-making method for multi-robot multi-target enclosure based on reinforcement learning
Bian et al. Cooperative strike target assignment algorithm based on deep reinforcement learning
CN114611669B (en) Intelligent decision-making method for chess deduction based on double experience pool DDPG network
CN114662655B (en) Attention mechanism-based method and device for deriving AI layering decision by soldier chess
CN117252081A (en) Method for dynamically allocating air defense weapon-target to be driven
CN112926729B (en) Man-machine confrontation intelligent agent strategy making method
CN117973494A (en) Method, device and medium for enabling reinforcement learning oriented to multiple agents to be interpretable
CN114662655A (en) Attention mechanism-based weapon and chess deduction AI hierarchical decision method and device
CN117151224A (en) Strategy evolution training method, device, equipment and medium for strong random game of soldiers
CN117687436A (en) Unmanned aerial vehicle group attack and defense countermeasure game target distribution method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant