CN110717591A - Falling strategy and layout evaluation method suitable for various chess - Google Patents

Falling strategy and layout evaluation method suitable for various chess Download PDF

Info

Publication number
CN110717591A
CN110717591A CN201910929174.9A CN201910929174A CN110717591A CN 110717591 A CN110717591 A CN 110717591A CN 201910929174 A CN201910929174 A CN 201910929174A CN 110717591 A CN110717591 A CN 110717591A
Authority
CN
China
Prior art keywords
probability
neural network
falling
pos
chess
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910929174.9A
Other languages
Chinese (zh)
Other versions
CN110717591B (en
Inventor
路红
王琳
杨博弘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201910929174.9A priority Critical patent/CN110717591B/en
Publication of CN110717591A publication Critical patent/CN110717591A/en
Application granted granted Critical
Publication of CN110717591B publication Critical patent/CN110717591B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F3/00Board games; Raffle games
    • A63F3/02Chess; Similar board games
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of computer games, and particularly relates to a falling strategy and a layout evaluation method suitable for various chess games. The method comprises the following steps: predicting the falling probability and the falling estimation value through a neural network; generating training data by using an MCTS algorithm and an Update Board Value algorithm; iteratively training a neural network by a reinforcement learning method; finally, the MCTS algorithm is used for outputting the drop strategy and the local estimation. The invention provides a situation evaluation function and a falling strategy function which are friendly to human beings, do not need to know the advantage value of a prior party and are suitable for various chess types (such as go, black and white chess, Chinese chess and checkers).

Description

Falling strategy and layout evaluation method suitable for various chess
Technical Field
The invention belongs to the technical field of computer games, and particularly relates to a method for evaluating a plurality of chess falling strategies and positions.
Background
With the development of computer game technology, the chess world champions are defeated by the deep blue of the Alphabeta algorithm, and the Weiqi world champions are defeated by the AlphaGo Zero algorithm. However, the situation assessment function in computer gaming is still unsolved. Taking computer vision as an example, the situation assessment function of AlphaZero is equivalent to picture classification, i.e. to find the winning probability of the current situation. The human method is more like picture pixel level segmentation, namely, the occupation probability of each point in the current situation is obtained.
The difficulty of solving the form judgment function of the chess such as go, black and white chess and the like by utilizing human knowledge is higher. In addition, the relative value of each piece in chess and Chinese chess is difficult to evaluate. If the form judgment function can be solved, the analysis of the chess related to occupation areas, such as go and black and white chess, by human beings is greatly facilitated. If the relative value of chess pieces of chess with various chess pieces, such as chess and Chinese chess, can be solved, the method is greatly helpful for human understanding and analysis.
At present, chess related to region occupation is mostly judged in a form by adopting artificial knowledge matching or adopting a mode of training a neural network by supervised learning. The relative value of the chessmen is determined according to the experience of the chess related to various chessmen. In addition, all chess games have first-hand advantages and need first-hand and last-hand balance through human experience.
For chess related to regional occupation, because the size of the chessboard is variable and the advantages of hands are not obvious on different chessboard sizes, at present, human beings are only limited to a few common chessboard sizes (such as 19-way, 13-way and 9-way chessboard of go chess and 8-way chessboard of black and white chess), and the advantages of hands of the common chessboard are understood more deeply. For some unusual boards, only approximate guesses can be made about the value of the prior dominance. Meanwhile, because the value of the chess pieces is difficult to judge, the win and lose results of the chess such as the international chess, the Chinese chess and the like only have 3 types of win, lose and average. If the chess piece values of the chess pieces are quantified, the winning and losing results of the chess pieces can be converted into a real numerical value, so that the advantages and disadvantages of the chess pieces in different states can be better distinguished like go, black and white chess and the like.
The AlphaZero algorithm is intended to work properly and must know an approximate prior dominance assessment. If one wants to know the true value of the first-hand dominance, one has to have a high level on a board of a certain board size. This clearly violates the idea of the algorithm "learn from scratch, without human knowledge". Therefore, an algorithm is needed that does not require prior knowledge of the value of the prior dominance and that can gradually evaluate the value of the prior dominance.
Disclosure of Invention
The invention aims to provide a dropping strategy and a layout evaluation method which are friendly to human beings, do not need to know the advantage value of a prior party, are based on a deep neural network and reinforcement learning and are suitable for various chess.
The invention provides a falling strategy and a local evaluation method suitable for various chess games, which predict falling probability and falling evaluation through a neural network, generate training data by using an MCTS algorithm and an Update Board Value algorithm, and iteratively train the neural network through a reinforcement learning method. Finally, the MCTS algorithm is used for outputting the drop strategy and the local estimation. The invention is suitable for various chess (such as go, black and white chess, Chinese chess and Chinese checkers), and comprises the following steps:
(1) utilizing a residual error neural network [1] and a pixel level segmentation method [2] [3] to realize a local evaluation function and a drop strategy function;
(2) generating training data by using an MCTS algorithm [4], an Early Stop algorithm and an Update Board Value algorithm proposed by the patent;
(3) repeating the step (1) and the step (2), and performing iterative training to obtain a trained neural network;
(4) and (4) generating a final situation evaluation function and a final fall strategy function by using the neural network trained in the step (3) and the MCTS algorithm.
Wherein:
the implementation of the situation evaluation function and the drop strategy function in the step (1) specifically comprises the following steps:
(11) inputting 8-step history situations, wherein each situation comprises CKA plurality of channels constituting an input block;
(12) the input block is sequentially processed by a residual error tower, batch normalization and a ReLU activation function; wherein the residual tower contains K residual blocks, each residual block having C channels;
(13) adopting different structures for the falling chess (such as go and black and white chess) and the moving chess (such as chess, Chinese chess and Chinese checkers) to obtain a falling strategy function;
(14) sequentially performing channel number C on the output of the step (12) by using a pixel level segmentation methodKThe 1x1 convolution and the channel Softmax are obtained to obtain the situation evaluation function.
Generating training data in the step (2), wherein the specific process is as follows:
(21) using the number of searches as S1Generating a falling probability and a situation evaluation function of each step by the MCTS algorithm; selecting the next step according to the probability, and performing falling;
(22) continuously repeating the step (21) until the end; simultaneously, an Early Stop algorithm is used, and if the detected evaluation value in 2 continuous steps is stable or the advantage of one party is detected to be overlarge in 4 continuous steps, the opposite bureau is terminated in advance;
(23) for the resulting drop probability of length T and the local evaluation function, the Update Board Value algorithm was used to synthesize training data.
Performing iterative training in the step (3), wherein the specific process is as follows:
(31) inserting the training data generated in the step (2) into an experience pool with the number of hits being R, and if the number of hits in the experience pool is larger than R, eliminating the oldest data;
(32) after G game, randomly selecting data from the experience pool, and training a neural network by using the selected data;
(33) replacing the neural network used by the MCTS with the trained neural network parameters;
(34) and repeating the steps to carry out iterative training.
Generating the final situation evaluation function and the final fall strategy function in the step (4), wherein the specific process is as follows:
(41) using the neural network generated in the step (3), and using MCTS algorithm to perform the operation times of S in each step2Searching for (2);
(42) after searching, selecting the situation evaluation value of the root node of the search tree as a final situation evaluation function, and selecting the child node with the most search times for child falling, namely obtaining a child falling strategy function.
In step (11) of the present invention, the input block is composed as follows:
(111) inputting 8-step history situation, each situation CKA channel; if the number of the falling steps is less than 8, setting the values of all channels corresponding to the part with less than 8 steps as 0; in addition to the historical situation, there are "number of players" color channels. When in input, the value of the plane of the current falling party is assigned to be 1, and the values of the planes of the other channels are assigned to be 0;
(112) wherein, CKThe number of players is multiplied by the chess variety of each player; if the specific position on each channel has the chess pieces of the corresponding type corresponding to the player, the value of the corresponding position is 1, otherwise, the value is 0.
The input block in the step (12) is sequentially processed by a residual error tower, batch normalization and a ReLU activation function, and the specific process is as follows:
(122) sequentially carrying out batch normalization, 3x3 convolution, a ReLU activation function, batch normalization, 3x3 convolution and a ReLU activation function on the input of the residual block;
(123) summing the input of the residual block and the output of step (122), the result being the output of the residual block;
(124) obtaining the output of a residual error tower through K stacked residual error blocks;
(125) and carrying out batch normalization and a ReLU activation function on the output of the residual error tower.
In the step (13), the method for obtaining the drop strategy function specifically comprises the following steps:
(131) for the chess of the dropping type, carrying out 1x1 convolution with the channel number of 2 on the output of the residual error tower in the step (12); averaging the plane of one channel, and performing Softmax on the average value and all values of the other plane; outputting the falling probability of each point and the falling probability of the abandoning right;
(132) for the mobile chess, the output of the residual error tower in the step (12) is sequentially subjected to 1x1 convolution with the channel number of 2 and the output dimension is legal action number CAFull connection layer, Softmax processing, output legalProbability of action (including disclaimer).
In step (21) of the present invention, the MCTS algorithm specifically comprises the following steps:
(211) the MCTS is divided into 4 stages, namely selection, evaluation, backup and drop;
(212) in the selection phase, child nodes are selected using the following formula until a leaf node is reached:
bound=cr×W×H+csdσ(s)
Figure BDA0002219819920000041
πSC(s)=argmaxa(QSC(s,a)+U(s,a))
where s is the current state, QSC(s, a) is the mean of the actions of action a in the current state s, P (a | s) is the falling probability of the added noise, σ(s) is the standard deviation N (s, a) of the current state s is the number of visits of action a in the current state s, W is the width of the chessboard, H is the height of the chessboard, c is the height of the chessboardr、csd、cpuctIs a constant.
(213) In the evaluation phase, if the office is not finished, the neural network input is prepared according to the step (11), and the falling probability P (a | s; theta) and the office evaluation function BV(s) are obtained after forward propagationLPos; θ); and (3) summing the situation evaluation function according to the following formula to obtain a value function:
VSC(sL;θ)=∑posBV(sL,pos;θ)
wherein, VSC(sL(ii) a Theta) is in the leaf node state sLState value of the neural network output, BV(s)LPos; theta) is in the leaf node state sLThe probability of node occupancy for the point pos output by the neural network.
(214) In the evaluation phase, if the game is over, BV(s) is obtained by the rules for the drop-type chessLPos; θ); for the mobile chess, the value of the corresponding positions of all the chess pieces of the winning side is set to be 1, and BV(s) is obtained through phase changeL,pos;θ)。
(215) In the backup stage, for the chess type of the boy type, the following formula is used for updating the values on the nodes from the leaf nodes to the root nodes:
N(s,a)=N(s,a)+1
Figure BDA0002219819920000042
Figure BDA0002219819920000043
wherein N (s, a) is the number of node accesses, QSC(s, a) is the mean value of the node actions, VSC(sL(ii) a Theta) is in the leaf node state sLState value of neural network output, BVπ(s, a, pos) is the mean probability of occupation of node actions for point pos, BV(s)LPos; theta) is in the leaf node state sLThe node occupation probability of the point pos output by the neural network;
in the backup stage, for the mobile chess, the following formula is used for updating the values on the nodes from the leaf nodes to the root nodes:
N(s,a)=N(s,a)+1
Figure BDA0002219819920000051
Figure BDA0002219819920000052
wherein N (s, a) is the number of node accesses, QSC(s, a) is the mean value of the node actions, VSC(sL(ii) a Theta) is in the leaf node state sLState value of neural network output, BVπ(s, a, pos) is the mean probability of occupation of node actions for point pos, BV(s)LPos'; theta) is in the leaf node state sLThe node occupation probability of the point pos' output by the neural network;
unlike the point-to-point update of the falling chess, the mobile chess needs to update the survival probability of the corresponding chess piece of the child node to the position of the corresponding chess piece of the parent node in an accumulated manner. China (China)In the chess example, the white side moves f2 to f3 and reaches the leaf node state sL. BV(s) is obtained after neural network evaluationLPos; theta), then BV(s)LPos; theta) the values of the corresponding positions of f2 and f3 are interchanged, and then the above-mentioned updating formula is used. That is, pos is interchanged at certain positions according to the falling position to form pos'.
(216) In the child falling stage, the child falling probability of each child node is obtained according to the access times of each child node of the root node divided by the access times of the root node; selecting one of the child nodes according to the child falling probability of the child nodes, taking the selected child node as a new root node, and adding noise to the probability of the new root node according to the following formula:
Figure BDA0002219819920000053
P(a|s)=(1-∈)P(a|s;θ)+∈η
wherein, W is the width of the chessboard, H is the height of the chessboard, epsilon is 0.25, P (a | s; theta) is the falling probability of the output of the neural network, and P (a | s) is the falling probability used in the step (212) of adding noise.
In the step (22), the Early Stop algorithm comprises the following specific processes:
(221) and performing stability evaluation on the situation evaluation function after MCTS search. For the falling chess, if the probability value of each point is either greater than 0.95 or less than-0.95, the situation evaluation function is considered to be stable; for the mobile chess, if the probability value of the son on each chessboard is either greater than 0.95 or less than-0.95, the situation evaluation function is considered to be stable; if the situation evaluation function is stable in the continuous 2 steps, the office is ended;
(222) in the first step of the game, an analog quantity S is usedvtThe MCTS algorithm evaluates a situation evaluation function of an initial situation, and obtains a prior-hand advantage value K by using the situation evaluation functionvt(ii) a Then if there is a 95% probability in each step, its absolute value | MCTS's estimated prior-hand dominance value-KvtIf | is greater than 4, the advantage of one party is considered to be too great; if the advantages of one party are detected to be too great in 4 continuous steps, the method is used for solving the problems that the advantages of the other party are too greatAnd ending the game.
In step (23), the Update Board Value algorithm specifically comprises the following steps:
(231) defining a smoothing function using the following function
Figure BDA0002219819920000061
Figure BDA0002219819920000062
(232) The training data BV is synthesized using the following formulaπ(st,pos):
δt=BVπ(st+1,pos)-BVMC(st,pos)
Figure BDA0002219819920000063
Where T is the length of the match, T indicates the time, stIs the state at time t, pi (a)t|st) Is a state stNext, MCTS selected action atFalling probability, BVMC(stPos) MCTS search occupancy probability, BV, of point posπ(st+1Pos) is the synthetic occupancy probability of the point pos that will be used to train the neural network;
(233) the method directly uses MCTS algorithm to search the falling probability pi (a) without synthesist|st) As training data for the probability of a neural network falling.
In step (32), the neural network training method specifically comprises the following steps:
(321) inputting data randomly selected from the experience pool into a neural network for forward propagation;
(322) the neural network loss L is calculated using the following formulatotal(θ):
Figure BDA0002219819920000064
Figure BDA0002219819920000065
Lpolicy(θ)=-∑aπ(a|s)lnP(a|s;θ)
Ltotal(θ)=Lpolicy(θ)+c1LBV(θ)+c2LV(θ)+c3|θ|2
Wherein, CKThe number of game players is multiplied by the variety of each game player, N is the width of the chessboard multiplied by the height of the chessboard, s is the current state, pi (a | s) is the target probability of the action a in the current state s, P (a | s; theta) is the predicted probability of the action a in the current state s output by the neural network,
Figure BDA0002219819920000071
target occupation probability, BV, for the ith channel point pos of the current state si(s, pos; theta) is the predicted occupation probability of the ith channel point pos in the current state s output by the neural network, c1、c2、c3Is a constant related to the board size;
(323) and (4) utilizing the neural network loss to perform back propagation.
Compared with the prior art, the invention has the beneficial effects that:
1. compared with the prior output method, the method outputs the situation evaluation function, can judge the situation in a detailed form and is more meaningful for human;
2. the algorithm does not have the limiting condition, the neural network can be trained without knowing the first recruitment advantage of the first-handed party, and meanwhile, the algorithm can gradually learn the value of the first recruitment advantage;
3. the neural network of the previous algorithm is related to the checkerboard width height, while the present algorithm does not have this limitation. Therefore, in the chess with the changeable width and height of the chessboard (such as go, black and white chess and international checkers), the same neural network can be used for carrying out the strategy and the local evaluation of the chess falling on the chessboard with different widths and heights.
Drawings
Fig. 1 is a general flow chart of a multi-chess drop strategy and layout evaluation method according to the present invention.
Fig. 2 is a diagram illustrating the structure of the neural network in step (1) of fig. 1.
Fig. 3 is a supplementary explanatory view of fig. 2.
Fig. 4 is an explanatory diagram of the MCTS selection, backup, and update stages of step (2) in fig. 1.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to fig. 1, the method for evaluating the strategy and layout of falling chess of various chess comprises the following specific steps:
(1) the situation evaluation function and the drop strategy function are implemented, as shown in fig. 2, and are shown as follows:
(11) inputting 8-step history situations, wherein each situation comprises CKA plurality of channels constituting an input block;
(12) the input block then goes through the residual tower, batch normalization, ReLU activation function in sequence, as shown in fig. 2 (a). Wherein the residual tower contains K residual blocks, each residual block having C channels;
(13) adopting the structure shown in (b) in the figure 2 for the chess of the piece falling type (such as go, black and white chess), adopting the structure shown in (c) in the figure 2 for the chess of the moving type (such as chess, Chinese chess and Chinese checkers), and outputting a piece falling strategy;
(14) using the pixel level segmentation method, the output of step (12) is sequentially subjected to the number of channels C as shown in FIG. 2 (d)K1x1 convolution, channel Softmax. And obtaining a situation evaluation function.
Taking 6 paths of go distance, fig. 3 (a) is the situation to be evaluated, fig. 3 (b) is the falling strategy, fig. 3 (c) is the situation evaluation function of the black visual angle (1.0 is the point according to which the neural network predicts that the black chess completely occupies, and-1.0 is the point according to which the neural network predicts that the white chess completely occupies), and fig. 3 (d) is the situation evaluation function of the white chess visual angle (1.0 is the point according to which the neural network predicts that the black chess completely occupies, and 1.0 is the point according to which the neural network predicts that the white chess completely occupies).
(2) Generating training data by using an MCTS algorithm, an Early Stop algorithm and an Update Board Value algorithm;
MCTS is divided into 4 stages, selection, evaluation, backup, and drop, respectively. The selection phase as shown in fig. 4 (a), child nodes are selected step by step starting from the root node until a leaf node is reached. The evaluation stage is shown in fig. 4(b), in which the triangle is the local estimation value of the neural network output, the black node is the newly generated child node, and the prior probability of each child node is evaluated by the neural network. The backup phase updates the neural network estimate (triangle) output by the neural network of fig. 4(b) along the path to the root node, as shown in fig. 4 (c). Repeating the 3 stages of fig. 4 several times, the search tree grows gradually, and the evaluation value of the root node is more and more accurate. And the last child node is selected as a new root node on the root node according to the access probability of the child nodes of the root node.
(3) And (3) repeating the steps (1) and (2) to carry out iterative training.
(4) And (4) generating a final situation evaluation function and a final landing strategy by using the neural network trained in the step (3) and the MCTS algorithm.
Reference to the literature
[1]K.He,X.Zhang,S.Ren,and J.Sun."Deep residual learning for imagerecognition."Internaltional Conference on Computer Vision and PatternRecogintion,2016
[2]T.Wu,I.Wu,G.Chen,T.Wei,H.Wu,T.Lai,and L.Lan,"Multi-Labelled ValueNetworks for Computer Go."IEEE Transactions on Games,pp.1–1,2018.
[3]Jmgilmer,"GoDCNN."https://github.com/jmgilmer/GoCNN/,2016.
[4]C.Browne,E.J.Powley,D.Whitehouse,S.M.Lucas,P.I.Cowling,P.Rohlfshagen,S.Tavener,D.Perez,S.Samothrakis,and S.Colton,"A Survey of MonteCarlo Tree Search Methods."IEEE Transactions on Computational Intelligenceand AI in Games,vol.4,no.1,pp.1–43,2012.。

Claims (8)

1. The falling strategy and layout evaluation method suitable for various chess categories is characterized by comprising the following specific steps:
(1) a local evaluation function and a drop strategy function are realized by utilizing a residual error neural network and a pixel level segmentation method;
(2) generating training data by using an MCTS algorithm, an Early Stop algorithm and an Update Board Value algorithm;
(3) repeating the step (1) and the step (2), and performing iterative training to obtain a neural network;
(4) generating a final situation evaluation function and a final fall strategy function by using the neural network trained in the step (3) and an MCTS algorithm;
wherein:
the situation evaluation function and the drop strategy function realized in the step (1) specifically comprise the following processes:
(11) inputting 8-step history situations, wherein each situation comprises CKA plurality of channels constituting an input block;
(12) the input block is sequentially processed by a residual error tower, batch normalization and a ReLU activation function; wherein the residual tower contains K residual blocks, each residual block having C channels;
(13) adopting different structures for the falling chess and the moving chess, and outputting a falling strategy function;
(14) sequentially performing channel number C on the output of the step (12) by using a pixel level segmentation methodKObtaining a situation evaluation function by the convolution of 1x1 and a channel Softmax;
generating training data in the step (2), wherein the specific process is as follows:
(21) using the number of searches as S1Generating a falling probability and a situation evaluation function of each step by the MCTS algorithm; selecting the next step according to the probability, and performing falling;
(22) continuously repeating the step (21) until the end; simultaneously, an Early Stop algorithm is used, and if the detected evaluation value in 2 continuous steps is stable or the advantage of one party is detected to be overlarge in 4 continuous steps, the opposite bureau is terminated in advance;
(23) synthesizing training data by using an Update Board Value algorithm for the generated drop probability with the length of T and a local evaluation function;
performing iterative training in the step (3), wherein the specific process is as follows:
(31) inserting the training data generated in the step (2) into an experience pool with the number of hits being R, and if the number of hits in the experience pool is larger than R, eliminating the oldest data;
(32) after G game, randomly selecting data from the experience pool, and training a neural network by using the selected data;
(33) replacing the neural network used by the MCTS with the trained neural network parameters;
(34) repeating the steps and carrying out iterative training;
generating the final situation evaluation function and the final fall strategy function in the step (4), wherein the specific process is as follows:
(41) using the neural network generated in the step (3), and using MCTS algorithm to perform the operation times of S in each step2Searching for (2);
(42) after searching, selecting the situation evaluation value of the root node of the search tree as a final situation evaluation function, and selecting the child node with the most searching times for falling.
2. The method of claim 1, wherein the input block in step (11) is implemented as follows:
(111) inputting 8-step history situation, each situation CKA channel; if the number of the falling steps is less than 8, setting the values of all channels corresponding to the part with less than 8 steps as 0; besides the historical situation, the color channels of the number of the players are also provided; when in input, the value of the plane of the current falling party is assigned to be 1, and the values of the planes of the other channels are assigned to be 0;
(112) wherein, CKThe number of players is multiplied by the chess variety of each player; if the specific position on each channel has the chess pieces of the corresponding type corresponding to the player, the value of the corresponding position is 1, otherwise, the value is 0.
3. The method according to claim 2, wherein the input block is sequentially processed by the residual tower, batch normalization, and ReLU activation function in step (12) as follows:
(121) sequentially carrying out batch normalization, 3x3 convolution, a ReLU activation function, batch normalization, 3x3 convolution and a ReLU activation function on the input of the residual block;
(122) summing the input of the residual block and the output of step (121), the result being the output of the residual block;
(123) obtaining the output of a residual error tower through K stacked residual error blocks;
(124) and carrying out batch normalization and ReLU activation function processing on the output of the residual error tower.
4. The method according to claim 3, wherein the step (13) obtains the drop strategy function by the following specific process:
(131) for the chess of the dropping type, carrying out 1x1 convolution with the channel number of 2 on the output of the residual error tower in the step (12); averaging the plane of one channel, and performing Softmax on the average value and all values of the other plane; outputting the falling probability of each point and the falling probability of the abandoning right;
(132) for the mobile chess, 1x1 convolution with the channel number of 2 and the output dimension of legal action number C are sequentially carried out on the output of the residual error tower in the step (12)AThe full connection layer of (2), Softmax; and outputting the probability of legal action.
5. The method of claim 4, wherein the MCTS algorithm of step (21) is implemented as follows:
(211) the MCTS is divided into 4 stages, namely selection, evaluation, backup and drop;
(212) in the selection phase, selecting child nodes by using the following formula until leaf nodes are reached;
bound=cr×W×H+csdσ(s)
Figure FDA0002219819910000031
πSC(s)=argmaxa(QSC(s,a)+U(s,a))
wherein s is the current state, QSC(s, a) is in the current state sMean value of action a, P (as | s) is the falling probability of the added noise, σ(s) is the standard deviation N (s, a) of the current state s is the number of accesses of action a in the current state s, W is the width of the chessboard, H is the height of the chessboard, cr、csd、cpuctIs a constant;
(213) in the evaluation phase, if the office is not finished, the neural network input is prepared according to the step (11), and the falling probability P (a | s; theta) and the office evaluation function BV(s) are obtained after forward propagationLPos; θ); and (3) summing the situation evaluation function according to the following formula to obtain a value function:
VSC(sL;θ)=∑posBV(sL,pos;θ)
wherein, VSC(sL(ii) a Theta) is in the leaf node state sLState value of the neural network output, BV(s)LPos; theta) is in the leaf node state sLThe node occupation probability of the point pos output by the neural network;
(214) in the evaluation phase, if the office is finished; for the falling chess, BV(s) is obtained by the ruleLPos; θ); for the mobile chess, the value of the corresponding positions of all the chess pieces of the winning side is set to be 1, and then the phase is changed to obtain BV(s)L,pos;θ);
(215) In the backup stage, for the chess type of the boy type, the following formula is used for updating the values on the nodes from the leaf nodes to the root nodes:
N(s,a)=N(s,a)+1
Figure FDA0002219819910000032
Figure FDA0002219819910000033
wherein N (s, a) is the number of node accesses, QSC(s, a) is the mean value of the node actions, VSC(sL(ii) a Theta) is in the leaf node state sLState value of neural network output, BVπ(s, a, pos) is the mean probability of occupation of node actions for point pos, BV(s)LPos; theta) is atLeaf node status sLThe node occupation probability of the point pos output by the neural network;
in the backup stage, for the mobile chess, the following formula is used for updating the values on the nodes from the leaf nodes to the root nodes:
N(s,a)=N(s,a)+1
Figure FDA0002219819910000041
Figure FDA0002219819910000042
wherein N (s, a) is the number of node accesses, QSC(s, a) is the mean value of the node actions, VSC(sL(ii) a Theta) is in the leaf node state sLState value of neural network output, BVπ(s, a, pos) is the mean probability of occupation of node actions for point pos, BV(s)LPos'; theta) is in the leaf node state sLThe node occupation probability of the point pos' output by the neural network;
(216) in the child falling stage, the child falling probability of each child node is obtained according to the access times of each child node of the root node divided by the access times of the root node; selecting one of the child nodes according to the child falling probability of the child node, wherein the selected child node becomes a new root node, and meanwhile, adding noise to the probability of the new root node according to the following formula:
P(a|s)=(1-∈)P(a|s;θ)+∈η
wherein, W is the width of the chessboard, H is the height of the chessboard, epsilon is 0.25, P (a | s; theta) is the falling probability of the output of the neural network, and P (a | s) is the falling probability used in the step (212) of adding noise.
6. The method of claim 5, wherein the Early Stop algorithm of step (22) is as follows:
(221) performing stability evaluation on the situation evaluation function after MCTS search; for the falling chess, if the probability value of each point is greater than 0.95 or less than-0.95, the situation evaluation function is considered to be stable; for the mobile chess, if the probability value of the son on each chessboard is greater than 0.95 or less than-0.95, the situation evaluation function is considered to be stable; if the situation evaluation function is stable in the continuous 2 steps, the office is ended;
(223) in the first step of the game, an analog quantity S is usedvtThe MCTS algorithm evaluates a situation evaluation function of an initial situation, and obtains a prior-hand advantage value K by using the situation evaluation functionvt(ii) a Then if there is a 95% probability in each step, its absolute value | MCTS's estimated prior-hand dominance value-KvtIf | is greater than 4, the advantage of one party is considered to be too great; if the advantages of one party are detected to be too large in the 4 continuous steps, the game is ended.
7. The method according to claim 5, wherein the Update Board Value algorithm in step (23) is as follows:
(231) defining a smoothing function using the following function
Figure FDA0002219819910000051
Figure FDA0002219819910000052
(232) The training data BV is synthesized using the following formulaπ(st,pos):
δt=BVπ(st+1,pos)-BVMC(st,pos)
Figure FDA0002219819910000053
Where T is the length of the match, T indicates the time, stIs the state at time t, pi (a)t|st) Is a state stDown, MCTS selected action atFalling probability, BVMC(stPos) MCTS search occupancy probability, BV, of point posπ(st+1Pos) is the synthetic occupancy probability of the point pos that will be used to train the neural network;
(233) the probability of falling is searched by MCTS without synthesist|st) As training data for the probability of a neural network falling.
8. The method according to claim 7, wherein the neural network training method in step (32) comprises the following steps:
(321) inputting data randomly selected from the experience pool into a neural network for forward propagation;
(322) the neural network loss L is calculated using the following formulatotal(θ):
Figure FDA0002219819910000054
Figure FDA0002219819910000055
Lpolicy(θ)=-∑aπ(a|s)lnP(a|s;θ)
Ltotal(θ)=Lpolicy(θ)+c1LBV(θ)+c2LV(θ)+c3|θ|2
Wherein, CKThe number of game players is multiplied by the variety of each game player, N is the width of the chessboard multiplied by the height of the chessboard, s is the current state, pi (a | s) is the target probability of the action a in the current state s, P (a | s; theta) is the predicted probability of the action a in the current state s output by the neural network,target occupation probability, BV, for the ith channel point pos of the current state si(s, pos; theta) is the predicted occupation probability of the ith channel point pos in the current state s output by the neural network, c1、c2、c3Is a constant related to the board size;
(323) and (4) utilizing the neural network loss to perform back propagation.
CN201910929174.9A 2019-09-28 2019-09-28 Drop strategy and local assessment method applicable to various chess types Active CN110717591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910929174.9A CN110717591B (en) 2019-09-28 2019-09-28 Drop strategy and local assessment method applicable to various chess types

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910929174.9A CN110717591B (en) 2019-09-28 2019-09-28 Drop strategy and local assessment method applicable to various chess types

Publications (2)

Publication Number Publication Date
CN110717591A true CN110717591A (en) 2020-01-21
CN110717591B CN110717591B (en) 2023-05-02

Family

ID=69211053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910929174.9A Active CN110717591B (en) 2019-09-28 2019-09-28 Drop strategy and local assessment method applicable to various chess types

Country Status (1)

Country Link
CN (1) CN110717591B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667043A (en) * 2020-05-20 2020-09-15 季华实验室 Chess game playing method, system, terminal and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339582A (en) * 2016-08-19 2017-01-18 北京大学深圳研究生院 Method for automatically generating chess endgame based on machine game technology
CN107050839A (en) * 2017-04-14 2017-08-18 安徽大学 Amazon chess game playing by machine system based on UCT algorithms
US20180032863A1 (en) * 2016-07-27 2018-02-01 Google Inc. Training a policy neural network and a value neural network
CN107661622A (en) * 2017-09-18 2018-02-06 北京深度奇点科技有限公司 It is a kind of to generate method of the quintet game to office data
CN109032935A (en) * 2018-07-13 2018-12-18 东北大学 The prediction technique of non-perfect information game perfection software model based on phantom go
CN110119804A (en) * 2019-05-07 2019-08-13 安徽大学 A kind of Ai Ensitan chess game playing algorithm based on intensified learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032863A1 (en) * 2016-07-27 2018-02-01 Google Inc. Training a policy neural network and a value neural network
CN106339582A (en) * 2016-08-19 2017-01-18 北京大学深圳研究生院 Method for automatically generating chess endgame based on machine game technology
CN107050839A (en) * 2017-04-14 2017-08-18 安徽大学 Amazon chess game playing by machine system based on UCT algorithms
CN107661622A (en) * 2017-09-18 2018-02-06 北京深度奇点科技有限公司 It is a kind of to generate method of the quintet game to office data
CN109032935A (en) * 2018-07-13 2018-12-18 东北大学 The prediction technique of non-perfect information game perfection software model based on phantom go
CN110119804A (en) * 2019-05-07 2019-08-13 安徽大学 A kind of Ai Ensitan chess game playing algorithm based on intensified learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
苏攀 等: "基于不平衡学习的分类器博弈模型及其在中国象棋中的应用" *
高强;徐心和: "证据计数法在落子类机器博弈中的应用" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667043A (en) * 2020-05-20 2020-09-15 季华实验室 Chess game playing method, system, terminal and storage medium
CN111667043B (en) * 2020-05-20 2023-09-19 季华实验室 Chess game playing method, system, terminal and storage medium

Also Published As

Publication number Publication date
CN110717591B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
Schwartz Multi-agent machine learning: A reinforcement approach
Lucas et al. The n-tuple bandit evolutionary algorithm for game agent optimisation
Gaina et al. Population seeding techniques for rolling horizon evolution in general video game playing
Perick et al. Comparison of different selection strategies in monte-carlo tree search for the game of tron
Benbassat et al. EvoMCTS: Enhancing MCTS-based players through genetic programming
Barros et al. Balanced civilization map generation based on open data
CN111729300A (en) Monte Carlo tree search and convolutional neural network based bucket owner strategy research method
Fu et al. “Reverse” nested lottery contests
CN110852436A (en) Data processing method, device and storage medium for electronic poker game
CN110717591A (en) Falling strategy and layout evaluation method suitable for various chess
Londoño et al. Graph Grammars for Super Mario Bros Levels.
Sismanis How i won the" chess ratings-elo vs the rest of the world" competition
CN110727870A (en) Novel single-tree Monte Carlo search method for sequential synchronous game
Scott et al. How does AI play football? An analysis of RL and real-world football strategies
CN112685921B (en) Mahjong intelligent decision method, system and equipment for efficient and accurate search
Xu et al. Elastic monte carlo tree search with state abstraction for strategy game playing
Takada et al. Reinforcement learning for creating evaluation function using convolutional neural network in hex
Risi et al. Automatically categorizing procedurally generated content for collecting games
Hu et al. Reinforcement learning with dual-observation for general video game playing
Kim Intelligent maze generation
Purmonen Predicting game level difficulty using deep neural networks
CN114146401A (en) Mahjong intelligent decision method, device, storage medium and equipment
Yee et al. Pattern Recognition and Monte-CarloTree Search for Go Gaming Better Automation
CN109646946B (en) Chess and card game hosting method and device
CN113377779A (en) Strategy improvement method for searching game tree on go

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant