An automated endgame generation method for chess and card games based on machine game-playing technology
Technical field
The invention belongs to the technical field of artificial intelligence and machine game playing, and in particular relates to an automated endgame generation method based on machine game-playing technology. The method builds endgame databases that support the learning and training of human players of chess and card games, and can further provide large-scale data support for research on computer systems for such games. It is realized through a game-tree generation method for chess and card games, a position evaluation method, and the design of the human-computer interaction interface and interaction mode.
Background art
Artificial intelligence is an important branch of computer science; its central task is to study how to make computers do work that previously only human intelligence could accomplish. Machine game playing, as a research field of artificial intelligence, is one of the means of testing the level of development of artificial intelligence. For more than half a century, machine game playing has been a breeding ground for innovation in artificial intelligence, and its successes have been important milestones in the history of the field. From Deep Blue (chess) to Cepheus (Texas hold'em poker) to the recent AlphaGo (Go), machine game-playing systems have challenged the best human players in one field after another.
In chess and card games, the endgame is a distinctive mode of play: it refers to a specific position, formed by some of the remaining pieces, that is reached at a certain stage of the game. The most typical examples are the endgames of Chinese chess and chess, and joseki in Go. Endgames are an important part of how human players study and train for these games. Compared with a complete game, the type and difficulty of an endgame can be controlled and the exercise is more targeted, which helps train a specific key skill of the player.
Taking chess as an example, László Polgár's book "Problems, Combinations, and Games" is one of the classic textbooks for training chess players. Using endgame positions, the book summarizes 4462 checkmating techniques and 5330 playing techniques for other situations. Fig. 1 lists several of the "mate in two" endgame positions collected in the book for training players.
However, endgames for chess and card games have long come mainly from the historical accumulation of human players. Their number is limited, their forms are uniform, and their difficulty is hard to measure. As a result, the ways of playing are very restricted, players cannot train sufficiently with endgames, and the training and entertainment value of the endgame mode is limited. At the same time, the rise of machine game-playing research on chess and card games, and especially the application of deep learning, demands a substantial increase in the quality and quantity of endgames.
Summary of the invention
In view of the technical problems of the prior art, the purpose of the present invention is to propose an automatic endgame generation method for chess and card games based on machine game-playing technology. Current endgames of board games, such as Chinese chess endgames and Go joseki, all come from the accumulated historical experience of human players and suffer from two major problems: their number is limited, and their difficulty cannot be accurately quantified. With the method proposed by the invention, a large-scale endgame database with quantifiable difficulty can be established. The method can provide training institutions and individuals with a richer and more scientific endgame training mode, and it provides research on machine game playing for chess and card games with a basic method for building large-scale endgame databases.
The automated endgame generation method for chess and card games based on machine game-playing technology of the invention mainly comprises the following steps:
1) automatically generating a random endgame database;
2) traversing the endgame database and generating an endgame game tree for each endgame;
3) calculating the node valuations of the game tree using an evaluation function, producing a valuation-labelled game tree;
4) calculating the win/loss value of each node in the game tree, producing a win/loss-labelled game tree;
5) calculating the endgame difficulty from the game tree, then screening and recording the endgame.
The technical content of the invention is further described below:
1. Imperfect-information processing method
Chess and card games are full of unknown, ambiguous, and unreliable information, collectively referred to as imperfect information. Building an endgame database requires analyzing and handling imperfect information with the Monte Carlo sampling method, whose basic idea is as follows: when the problem to be solved is the probability of a random event or the expectation of a random variable, some sampling procedure is used to estimate the probability of the event from the frequency with which it occurs in the samples, or to obtain certain numerical characteristics of the random variable, and the result is taken as the solution of the problem.
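The sampling idea can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the procedure used by the invention: the card-game setting, the function names `estimate_probability`, `deal_hidden_hand`, and `holds_an_ace`, and the sample size are all hypothetical. The event probability is estimated by its observed frequency and compared with the exact combinatorial value.

```python
import random
from math import comb


def estimate_probability(event, sampler, n_samples, rng):
    """Estimate P(event) as the fraction of sampled outcomes in which it occurs."""
    hits = sum(1 for _ in range(n_samples) if event(sampler(rng)))
    return hits / n_samples


def deal_hidden_hand(rng, deck_size=52, hand_size=5):
    # Hypothetical hidden information: an opponent hand drawn uniformly at random.
    return rng.sample(range(deck_size), hand_size)


def holds_an_ace(hand):
    # Cards 0..3 stand in for the four aces in this toy encoding.
    return any(card < 4 for card in hand)


rng = random.Random(0)
estimate = estimate_probability(holds_an_ace, deal_hidden_hand, 20000, rng)
exact = 1 - comb(48, 5) / comb(52, 5)  # exact probability, for comparison only
print(f"estimate={estimate:.3f} exact={exact:.3f}")
```

The estimate approaches the exact value as the number of samples grows, which is what makes sampling usable when the exact value cannot be computed directly.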
2. Game-tree construction method based on the MCTS algorithm
One of the difficulties in building the game tree of a chess or card game is the need for a search algorithm that can find the optimal decision within a large amount of information; this is known as game-tree search. "Large" here can mean many orders of magnitude. For example, the size of an endgame game tree in Chinese chess can reach the order of 10^12.
The core idea of MCTS (Monte Carlo Tree Search) is to build and expand the game tree step by step through Monte Carlo sampling, according to the size of the current game tree. Each internal node of a game tree built this way holds the sampled evaluation information of all its child nodes and feeds it back to its own parent. The advantage is that limited time and system resources can be concentrated on the branches more likely to become the optimal line of play, while branches that lead to poor results are ignored more efficiently. The MCTS algorithm begins by initializing the game tree: usually the current game state is abstracted as a single node, which serves as the root of the whole tree. The subsequent search process, as shown in the figure, can be divided into four steps:
1) Selection of the node to expand: selecting the node to expand is a recursive process that starts at the root node and ends at a leaf node. The node selection function chooses the node layer by layer according to the concrete implementation of the selection strategy. The implementation of the node selection strategy is described in detail later.
2) Expansion: one or more child nodes are expanded below the leaf node finally selected in step 1. The original leaf node becomes their parent, and they themselves become the new leaf nodes.
3) Sampled evaluation: by sampling, a valuation is computed for all the nodes selected in step 1 and for the newly generated leaf nodes.
4) Valuation backtracking (backpropagation): starting from the leaf nodes, the new valuation results are passed back layer by layer to the respective parent nodes and finally to the root node.
In search problems over large-scale game trees, the MCTS algorithm performs better than traditional search algorithms (the minimax search algorithm is used here as the reference for comparison). The present invention uses it as the core search algorithm of the decision system.
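The four steps can be sketched in Python on a toy game. This is a minimal sketch under stated assumptions, not the implementation of the invention: the game (a Nim variant in which players alternately take 1 to 3 stones and the player taking the last stone wins), the UCB1 selection rule, the exploration constant, and all names (`Node`, `ucb1`, `rollout`, `mcts_best_move`) are hypothetical choices made for illustration. For brevity, only the newly expanded leaf is evaluated by a random playout, and its result is propagated back to the ancestors.

```python
import math
import random


class Node:
    """One game-tree node for a toy Nim game (take 1-3 stones, last stone wins)."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left for the player about to move
        self.parent = parent
        self.move = move          # number of stones taken to reach this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins credited to the player who moved INTO this node


def ucb1(child, parent_visits, c=1.4):
    # Assumed selection strategy: average reward plus an exploration bonus.
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(stones, rng):
    """Random playout; returns 1 if the player to move at `stones` wins, else 0."""
    mover_is_first = True
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return 1 if mover_is_first else 0
        mover_is_first = not mover_is_first
    return 0  # no stones left: the player to move has already lost


def mcts_best_move(stones, iterations, rng):
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1) Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2) Expansion: add one new child below the selected leaf.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3) Sampled evaluation: reward for the player who moved into `node`.
        reward = 1 - rollout(node.stones, rng)
        # 4) Backpropagation: flip the reward at every level on the way to the root.
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1 - reward
            node = node.parent
    # The most-visited root child is taken as the recommended move.
    return max(root.children, key=lambda ch: ch.visits).move


best = mcts_best_move(5, 5000, random.Random(0))
```

With 5 stones, the game-theoretically correct move is to take 1 stone, leaving the opponent a multiple of 4; on a tree this small the search is expected to concentrate its visits on that move.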
3. Endgame evaluation method for chess and card games
During the game-tree search for endgame generation, the evaluation method is responsible for assessing each sub-stage of the endgame's progress. If the search algorithm is the skeleton of game-tree search, the evaluation function is its brain. The evaluation function judges whether each position is favorable to one's own side, which positions will be favorable in the future and which will not, and so on, and it thus directly determines the playing strength of the agent. Based on evaluation-function design methods for different chess and card games, the invention uses reinforcement learning to train a static piece valuation matrix, a layout valuation matrix, and a position influence matrix, from which the valuation of any node of the game tree is calculated.
Taking military chess as an example (Chinese chess, chess, etc. are similar), a player's layout strategy refers to how the player's 25 pieces, of up to 12 types, are deployed on the 25 positions of the player's own side of the board.
In four-player military chess, a layout strategy can be understood as an arrangement of pieces over positions. Formula 1 gives the static valuation method for pieces. F is a 12*12 matrix obtained by multiplying two matrices. The first matrix represents the static payoff when one type of piece attacks another. For example, f_{1,2} means that when a piece of type 1 (commander) attacks an opposing piece of type 2 (army commander), the player holding the commander gains payoff f_{1,2} from that move. The second matrix records, from statistics, the probability that pieces meet each other in an attack. For example, p_{1,2} is the probability that an army commander meets a commander. The 12 values on the diagonal of matrix F therefore represent the static valuations of all 12 piece types, denoted here by F_s in formula 2.
F_s = [F_{1,1}, F_{2,2}, ..., F_{12,12}]    (2)
Based on the static piece valuation matrix F_s, formula 3 gives the calculation of the static valuation B_s of a player's layout strategy. In formula 3, B_i is a {0, 1} matrix that records the player's layout: when the player places piece i at position j, b_{ij} = 1 and all other entries of row i are 0. B_s is thus computed as a 1*25 matrix that records the static valuation of one layout strategy.
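The matrix computations described above can be sketched in plain Python. Formulas 1 and 3 are not reproduced in this text, so the sketch rests on assumptions that should be read as such: F is taken to be the ordinary matrix product of the payoff matrix and the encounter-probability matrix, F_s is its diagonal, and B_s is the row vector of per-piece values multiplied by the {0, 1} placement matrix. The function names and the tiny 2-type, 3-piece example are hypothetical; the game described in the text has 12 types and 25 pieces.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def static_piece_values(payoff, encounter):
    """F = payoff x encounter (assumed form of formula 1); F_s is the diagonal of F."""
    f = matmul(payoff, encounter)
    return [f[i][i] for i in range(len(f))]


def static_layout_valuation(piece_types, placement, type_values):
    """B_s as a 1 x n row vector (assumed form of formula 3).

    piece_types[i] is the type index of piece i; placement[i] is its position index.
    """
    n = len(placement)
    # {0, 1} placement matrix: row i has a single 1 in the column of piece i's position.
    b = [[1 if placement[i] == j else 0 for j in range(n)] for i in range(n)]
    piece_values = [[type_values[t] for t in piece_types]]
    return matmul(piece_values, b)[0]


# Toy data: 2 piece types, 3 pieces, 3 positions (all values are made up).
payoff = [[1.0, 3.0], [0.0, 1.0]]
encounter = [[0.5, 0.5], [0.5, 0.5]]
f_s = static_piece_values(payoff, encounter)
b_s = static_layout_valuation(piece_types=[0, 1, 1], placement=[2, 0, 1], type_values=f_s)
```

In this toy example the single type-0 piece sits at position 2, so the third entry of `b_s` carries the type-0 value and the other two entries carry the type-1 value.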
Next, the position influence matrix I_AD is applied to the static matrix B_s. As shown in formula 4, the position influence matrix I_AD records the influence factors of every position on the board for the two aspects of attack and defense, where A_1 to A_25 are the attack influence factors and D_1 to D_25 are the defense influence factors.
As formula 4 shows, the position influence matrix computes the attack and defense influence factors of different positions separately. In general, positions near the front row come into contact with the opponent more easily, so their attack influence factor is larger. Pieces in the back row are adjacent to more friendly pieces, so their defense influence factor is larger. Meanwhile, positions that act as "traffic hubs" on the board have large influence factors in both respects.
Based on the above process, the final valuation of a layout strategy can be expressed by the pair B_AD shown in formula 5. The first component of the pair is the attack valuation of the layout strategy and the second is its defense valuation. Both are obtained by summing over the corresponding entries of the matrices B_s and I_AD.
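As an illustration of the pair B_AD, the following Python sketch combines a static layout valuation with per-position attack and defense factors. Formulas 4 and 5 are not reproduced in this text, so the weighted-sum form used here (each position's static value multiplied by its influence factor, then summed) is an assumption, and the function name `layout_attack_defense` and all numbers are hypothetical.

```python
def layout_attack_defense(b_s, attack_factors, defense_factors):
    """Return (attack valuation, defense valuation) of a layout.

    Assumed form: sum over positions of static value times influence factor.
    """
    if not (len(b_s) == len(attack_factors) == len(defense_factors)):
        raise ValueError("all three vectors must cover the same positions")
    attack = sum(v * a for v, a in zip(b_s, attack_factors))
    defense = sum(v * d for v, d in zip(b_s, defense_factors))
    return attack, defense


# Toy data for 3 positions; the last position is treated as a front-row square.
b_ad = layout_attack_defense(
    b_s=[0.5, 0.5, 2.0],
    attack_factors=[1.0, 0.5, 2.0],
    defense_factors=[2.0, 1.0, 0.5],
)
```

Placing the most valuable piece on the square with the largest attack factor raises the first component of the pair, which matches the qualitative description of front-row positions given above.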
Compared with the prior art, the technical effects of the invention are as follows:
The method proposed by the invention, which builds chess and card game endgames from the valuations of expanded game-tree nodes, is original. The invention can be adapted to mainstream popular chess and card games (the types tested include Dou Dizhu, Chinese chess, chess, and military chess) and generates endgame databases on a cluster server. The average generation rate is 3.7 endgames per minute. Manual inspection shows that the accuracy of the generated endgames is above 98% and the accuracy of the endgame difficulty valuation is 91.5%.
Description of the drawings
Fig. 1 is the overall flow chart.
Fig. 2 is an example of a chess endgame.
Fig. 3 is a schematic diagram of the game-tree search process for endgame generation.
Fig. 4 is a schematic diagram of a valuation-labelled game tree.
Fig. 5 is a schematic diagram of a win/loss-labelled game tree.
Fig. 6 is a schematic diagram of the game-tree difficulty calculation.
Specific embodiments
The invention is described in further detail below through embodiments and the accompanying drawings.
With reference to Fig. 1, the flow of the invention is designed as follows:
1. Automatically generating a random endgame database:
In the application scenario of the invention, the basic parameters of the target endgame database are set first. They include basic information (the game type to adapt to and the number of endgame pieces or cards in hand) and endgame difficulty control information (search depth, solution count, and maximum tolerance amplitude). Here:
Search depth is the depth of the endgame game tree from the root node to the leaf node of the final solution; it affects how deep a human player must think to solve the endgame.
Solution count is the number of first-level nodes expanded from the root of the endgame game tree at which the player wins. When the solution count is 0, the player cannot win whatever move is made, so the current position is not an endgame. When the solution count is 1, there is exactly one way to solve the current endgame; the position is an endgame and its difficulty is high. As the solution count increases, the endgame difficulty gradually decreases.
The optimal path is the sequence of nodes from the root node to the leaf node with the optimal result that is formed during game-tree search. The maximum tolerance amplitude is the largest change in the evaluation function's value between adjacent nodes on the optimal path. In its calculation formula, V_t denotes a node on the optimal path (t = 0 denotes the root node) and V_{t+1} denotes the child of V_t on the path.
The maximum tolerance amplitude represents how hard a human player must think to solve the endgame. The larger it is, the smaller the probability that a human player adopts the required strategy as the optimal one, and the harder the endgame is to solve. For example, when solving a chess endgame, the correct solution often includes a sacrifice that on the surface looks like a heavy loss; when solving a Dou Dizhu endgame, the cards in hand may have to be split up. Both are examples of solving endgames with a large maximum tolerance amplitude. In actual tests of the invention, once the maximum tolerance amplitude rose above 50%, the generated endgames were already very difficult.
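A small Python sketch may help make the maximum tolerance amplitude concrete. The formula itself is not reproduced in this text; because the amplitude is quoted as a percentage, the sketch assumes a relative change |V_{t+1} - V_t| / |V_t| between adjacent nodes and takes the maximum over the optimal path. That form, the function name `max_tolerance_amplitude`, and the sample valuations are assumptions for illustration only.

```python
def max_tolerance_amplitude(path_values):
    """Largest relative valuation change between adjacent nodes on a path.

    path_values[0] is the root valuation V_0; path_values[t + 1] is the
    valuation of the child of node t on the optimal path.
    """
    largest = 0.0
    for v_t, v_next in zip(path_values, path_values[1:]):
        if v_t == 0:
            raise ValueError("relative change is undefined for a zero valuation")
        largest = max(largest, abs(v_next - v_t) / abs(v_t))
    return largest


# Hypothetical optimal path: a sacrifice drops the valuation before it recovers.
amplitude = max_tolerance_amplitude([10.0, 4.0, 5.0, 9.0])
```

Here the relative changes are 0.6, 0.25, and 0.8, so the amplitude is 0.8, which under the 50% observation above would indicate a hard endgame.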
With the above parameters set, the computer starts generating random endgames, and each generated random endgame goes on to the next step of analysis. Fig. 2 shows an example of a chess endgame.
2. Generating the endgame game tree with the MCTS method:
Starting from the root node, the game tree of the endgame is generated; the MCTS method is used to improve the efficiency of game-tree generation. The steps, shown in Fig. 3, are:
1) Selection of the node to expand: selecting the node to expand is a recursive process that starts at the root node and ends at a leaf node. The node selection function chooses the node layer by layer according to the concrete implementation of the selection strategy. The implementation of the node selection strategy is described in detail later.
2) Expansion: one or more child nodes are expanded below the leaf node finally selected in step 1. The original leaf node becomes their parent, and they themselves become the new leaf nodes.
3) Sampled evaluation: by sampling, a valuation is computed for all the nodes selected in step 1 and for the newly generated leaf nodes.
4) Valuation backtracking (backpropagation): starting from the leaf nodes, the new valuation results are passed back layer by layer to the respective parent nodes and finally to the root node.
In search problems over large-scale game trees, the MCTS algorithm performs better than traditional search algorithms (the minimax search algorithm is used here as the reference for comparison). The present invention uses it as the core search algorithm of the decision system.
3. Computing node valuations with the evaluation function and generating the valuation-labelled game tree:
The valuations of the leaf nodes of the current endgame game tree are computed by the evaluation function of the particular game, and the valuations of the nodes at every level of the tree are then computed by backtracking, producing the valuation-labelled game tree. Fig. 4 shows a simplified valuation-labelled game tree with branching factor 3 and depth 2; in practical applications, the number of nodes in a game tree can exceed the order of 10^10.
4. Computing the win/loss value of game-tree nodes and generating the win/loss-labelled game tree:
This step computes the win/loss status of the leaf nodes according to the rules of the specific game and then, following the minimax expansion principle of the game tree, computes the win/loss values of all nodes by backtracking, finally producing the win/loss-labelled game tree.
Specifically: starting from the leaf nodes, the win/loss value of each node in the game tree is obtained by the minimax calculation. The nodes are labelled, for which the values 0 and 1 can be used, to produce the win/loss-labelled game tree. Fig. 5 shows a simplified win/loss-labelled game tree with branching factor 3 and depth 2; in practical applications, the number of nodes in a game tree can exceed the order of 10^10.
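The minimax back-up of win/loss values, and the solution count of the root node that follows from it, can be sketched in Python. The nested-list tree encoding, the convention that a leaf value of 1 means a win for the player to move at the root, and the function names `win_value` and `solution_count` are assumptions made for this illustration.

```python
def win_value(node, root_player_to_move=True):
    """Minimax win/loss value: 1 if the root player can force a win, else 0.

    A node is either a leaf (0 or 1, from the root player's point of view)
    or a list of child nodes; the side to move alternates at every level.
    """
    if isinstance(node, int):
        return node
    values = [win_value(child, not root_player_to_move) for child in node]
    return max(values) if root_player_to_move else min(values)


def solution_count(root):
    """Number of first-level moves after which the root player still wins."""
    return sum(win_value(child, root_player_to_move=False) for child in root)


# A simplified tree with branching factor 3 and depth 2, in the spirit of Fig. 5.
tree = [[1, 1, 1], [1, 0, 1], [0, 0, 1]]
root_value = win_value(tree)
solutions = solution_count(tree)
```

Only the first move wins against every reply in this example, so the root is labelled 1 and its solution count is 1.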
5. Computing the game-tree endgame difficulty, then screening and recording endgames:
Fig. 6 is a schematic diagram of the game-tree difficulty calculation. The game-tree endgame difficulty proposed by the invention is measured in the following respects.
a) Based on the win/loss-labelled game tree, if the solution count of the root node is 1, the endgame difficulty is set to a baseline value; otherwise the current game tree is judged not to be an endgame. That is, the solution count at the root of the game tree is computed and game trees whose solution count is not 1 are excluded, which narrows the range of valid endgames to be screened.
b) Based on the win/loss-labelled game tree, count the subtrees in the current game tree whose root node has solution count 1, and denote this number T. The game-tree endgame difficulty is proportional to it. That is, all subtrees rooted at a node with solution count 1 are found in the current game tree, and the number of such subtrees on the critical path is one of the factors used to measure endgame difficulty.
c) Based on the valuation-labelled game tree, compute the maximum tolerance amplitude of the optimal move sequence in the current game tree. The game-tree endgame difficulty is proportional to it. That is, the valuations of adjacent nodes on the critical path are computed to obtain the maximum tolerance amplitude of the path, which is another of the factors used to measure endgame difficulty.
d) Compute the final game-tree difficulty, classify the endgame according to preset thresholds, and store it in the database. If the difficulty meets the set threshold, the endgame is saved to the endgame database; otherwise the current endgame is discarded.
In the endgame difficulty calculation formula, α and β are the weight-adjustment parameters for the two factors: the number T of subtrees whose root node has solution count 1, and the maximum tolerance amplitude M. k denotes the number of subtrees in the current game tree whose root node has solution count 1, and M_i denotes the maximum tolerance amplitude of subtree i. The larger α is, the more the system tends to extract endgames with a unique solution or very few solutions; the larger β is, the more the system tends to extract endgames whose position fluctuates strongly during play.
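The screening in step d) can be sketched in Python. The difficulty formula is not reproduced in this text, so the linear form used below, α times the subtree count k plus β times the sum of the amplitudes M_i, is purely an assumption chosen to be consistent with the roles of α and β described above. The function names, candidate names, and all numbers are hypothetical.

```python
def endgame_difficulty(subtree_amplitudes, alpha, beta):
    """Assumed difficulty score: alpha * k + beta * sum(M_i).

    subtree_amplitudes holds M_i for each of the k subtrees whose root node
    has solution count 1.
    """
    k = len(subtree_amplitudes)
    return alpha * k + beta * sum(subtree_amplitudes)


def screen_endgames(candidates, alpha, beta, threshold):
    """Keep (name, difficulty) for candidates whose difficulty meets the threshold."""
    kept = []
    for name, amplitudes in candidates:
        difficulty = endgame_difficulty(amplitudes, alpha, beta)
        if difficulty >= threshold:
            kept.append((name, difficulty))
    return kept


# Hypothetical candidates that already passed the root solution-count check.
candidates = [("endgame_a", [0.5, 0.25]), ("endgame_b", [0.25])]
saved = screen_endgames(candidates, alpha=1.0, beta=2.0, threshold=2.0)
```

Raising `alpha` favors candidates with many forced-move subtrees, while raising `beta` favors candidates with large valuation swings, which mirrors the tendencies attributed to α and β in the text.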
The above embodiments merely illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention is defined by the claims.