CN106339582B

CN106339582B - An automated endgame generation method for chess and card games based on computer game-playing technology

Info

Publication number: CN106339582B
Application number: CN201610697369.1A
Authority: CN
Inventors: 张加佳; 刘宏
Original assignee: Shenzhen Yun'an Sheng Technology Co Ltd
Current assignee: Xiamen Kexin Network Technology Co.,Ltd.
Priority date: 2016-08-19
Filing date: 2016-08-19
Publication date: 2019-02-01
Anticipated expiration: 2036-08-19
Also published as: CN106339582A

Abstract

The invention discloses an automated endgame generation method for chess and card games based on computer game-playing technology. The method comprises the steps of: automatically generating a library of random endgames; constructing, for each endgame in the generated library, a game tree of subsequent positions, in which all possible subsequent positions are generated from the legal move sequences of both sides; analyzing all leaf nodes produced by expanding the game tree to generate an evaluation-labeled game tree and a win/loss-labeled game tree; and computing the root-node solution count and the subtree solution count, computing the maximum tolerance amplitude along the critical path, obtaining the final endgame difficulty assessment, and screening and recording endgames according to a set threshold. The invention adapts to the characteristics of mainstream chess and card games and builds an endgame database of controllable difficulty. The accuracy of the endgame library finally built reaches 95% or more.

Description

An automated endgame generation method for chess and card games based on computer game-playing technology

Technical field

The invention belongs to the technical field of artificial intelligence and computer game playing, and specifically relates to an automated endgame generation method based on computer game-playing technology. The method provides endgame database support for human players learning, training in, and playing chess and card games, and can further provide large-scale data support for research on computer systems for chess and card games. It is realized through a game tree generation method for chess and card games, a position evaluation method, and the design of a human-computer interaction interface and interaction mode.

Background art

Artificial intelligence is an important branch of computer science. Its central task is to study how to make computers do work that previously could be done only with human intelligence. Computer game playing, as a research field of artificial intelligence, is one of the means of gauging the level of artificial intelligence development. For more than half a century, computer game playing has been a breeding ground for innovation in artificial intelligence, and its successes are important milestones in the history of the field. From Deep Blue (chess) to Cepheus (Texas hold'em poker) to the recent AlphaGo (Go), computer game-playing systems have challenged the best human players in one field after another.

In chess and card games, endgame play is a distinctive game mode. An endgame is a specific position, made up of a subset of the remaining pieces, that arises when a game has reached a certain stage. The most typical examples are endgames in Chinese chess and chess, and joseki in Go. Endgame practice is an important part of how human players learn and train in these games. This is because, compared with a complete game, the type and difficulty of an endgame can be controlled and its focus is sharper, which helps a player train a particular key skill.

Taking chess as an example, László Polgár's book "Problems, Combinations, and Games" is one of the classic training manuals for chess players. In endgame form, the book summarizes 4462 checkmate techniques and 5330 techniques for other situations. Fig. 1 lists several of the "mate in two" training positions summarized in the book.

However, endgames for chess and card games currently derive mainly from the long-term historical accumulation of human players. Their number is limited, their patterns are uniform, and their difficulty is hard to measure. As a result, the ways endgames can be played are severely restricted, players cannot train sufficiently with them, and the training and entertainment value of the endgame mode is limited. At the same time, the rise of computer game-playing research on chess and card games, and especially the application of deep learning, calls for a substantial increase in both the quality and the quantity of endgames.

Summary of the invention

In view of the technical problems of the prior art, the purpose of the invention is to propose an automatic endgame generation method for chess and card games based on computer game-playing technology. Current board-game endgames, such as Chinese chess endgames and Go joseki, all come from the accumulated historical experience of human players, and they suffer from two major problems: their number is limited, and their difficulty cannot be quantified precisely. With the method proposed by the invention, a large-scale endgame database with quantifiable difficulty can be built for chess and card games. The method can provide training institutions and individuals with a richer and more scientific endgame training mode, and it gives research on computer game playing for chess and card games a basic method for building large-scale endgame databases.

The automated endgame generation method for chess and card games based on computer game-playing technology according to the invention mainly comprises the following steps:

1) automatically generate a random endgame database;

2) traverse the endgame database and generate an endgame game tree for each endgame;

3) compute the node evaluations of the game tree with an evaluation function, generating an evaluation-labeled game tree;

4) compute the win/loss value of each node in the game tree, generating a win/loss-labeled game tree;

5) compute the endgame difficulty of the game tree, then screen and record the endgame.

The technical content of the invention is further described below:

1. Incomplete-information processing method

Chess and card games are full of unknown, ambiguous, and untrustworthy information, collectively referred to as incomplete information. Building an endgame database requires analyzing and handling incomplete information with the Monte Carlo sampling method. The basic idea is as follows: when the problem to be solved is the probability of some random event or the expected value of some random variable, a sampling procedure is used to estimate the true probability of the event from the frequency with which it occurs during sampling, or to obtain certain numerical characteristics of the random variable, and the result is taken as the solution to the problem.
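The sampling idea can be illustrated with a minimal sketch. The card setting here (20 unseen cards, an opponent holding 5 of them, and the two "key" cards 0 and 1) is a hypothetical example, and the function names are illustrative rather than taken from the patent.

```python
import random

def estimate_probability(event, sample, n_samples=20000, seed=0):
    """Estimate P(event) as the fraction of sampled hidden states where it holds."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if event(sample(rng)))
    return hits / n_samples

# Hypothetical hidden information: 20 unseen cards, the opponent holds 5 of them.
UNSEEN = list(range(20))

def sample_opponent_hand(rng):
    # Draw one possible opponent hand uniformly at random.
    return set(rng.sample(UNSEEN, 5))

def holds_key_card(hand):
    # Event of interest: the opponent holds card 0 or card 1.
    return 0 in hand or 1 in hand

p = estimate_probability(holds_key_card, sample_opponent_hand)
```

For this toy setting the exact probability is 1 - C(18,5)/C(20,5), roughly 0.447, so the estimate can be checked against a known value.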

2. Game tree construction method based on the MCTS algorithm

One of the difficulties in building a game tree for a chess or card game is that it requires a search algorithm capable of finding the optimal decision among a large amount of information; this is known as game-tree search. "Large" here can mean many orders of magnitude. For example, the endgame game tree of Chinese chess can reach a scale on the order of $10^{12}$.

The core idea of MCTS (Monte Carlo Tree Search) is to build and extend the game tree step by step through Monte Carlo sampling, according to the size of the current game tree. Each intermediate node of a game tree built this way holds the sampled evaluation information of all its child nodes and feeds it back to its own parent node. The advantage is that limited time and system resources can be concentrated on the branches most likely to contain the best move, while branches that lead to poor outcomes are ignored more efficiently. The first step of the MCTS algorithm is to initialize the game tree, usually by abstracting the current game state into a single node that serves as the root of the whole tree. The search process that follows, as shown in the figure, can be divided into 4 steps:

1) Expansion node selection: selecting the node to expand is a recursive process that starts at the root node and ends at a leaf node. The node selection function picks the node to expand level by level, according to the concrete implementation of the selection strategy. The implementation of the node selection strategy is described in detail later.

2) Expansion: one or more child nodes are expanded below the leaf node finally selected in step 1. The original leaf node then becomes their parent, and they themselves become the new leaf nodes.

3) Sampling evaluation: using the sampling method, evaluations are computed for all nodes selected in step 1 and for the newly generated leaf nodes.

4) Evaluation backup (backpropagation): starting from the leaf nodes, the new evaluation results are passed back level by level to the respective parent nodes and finally to the root node.

For search problems on large-scale game trees, the MCTS algorithm performs better than traditional search algorithms (the minimax search algorithm is used here as the reference for comparison). The invention uses this algorithm as the core search algorithm of the decision system.
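The four steps can be sketched on a toy game. The code below is a minimal illustration, not the patented implementation: the game (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins), the UCB1 selection rule, the exploration constant `c`, and all names are assumptions chosen for the example, since the text does not fix a selection strategy at this point.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left in the pile at this node
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

def select(node, c=1.4):
    # Step 1: descend while the node is fully expanded and not terminal (UCB1, assumed).
    while not node.untried and node.children:
        node = max(node.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(node.visits) / ch.visits))
    return node

def expand(node, rng):
    # Step 2: add one child for a move not tried yet; terminal nodes are returned as-is.
    if node.untried:
        move = node.untried.pop(rng.randrange(len(node.untried)))
        child = Node(node.stones - move, parent=node, move=move)
        node.children.append(child)
        return child
    return node

def simulate(stones, rng):
    # Step 3: random playout; returns 1.0 if the player who moved into this
    # position ends up taking the last stone, else 0.0.
    if stones == 0:
        return 1.0
    mover_turn = False            # the opponent of that player moves first
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return 1.0 if mover_turn else 0.0
        mover_turn = not mover_turn

def backpropagate(node, result):
    # Step 4: pass the result up to the root, flipping perspective at each level.
    while node is not None:
        node.visits += 1
        node.wins += result
        result = 1.0 - result
        node = node.parent

def mcts_best_move(stones, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        leaf = expand(select(root), rng)
        backpropagate(leaf, simulate(leaf.stones, rng))
    # Recommend the most visited first-level move.
    return max(root.children, key=lambda ch: ch.visits).move
```

In this toy game, leaving the opponent a multiple of 4 stones is the known winning reply, so from 5 stones the search is expected to concentrate its visits on taking 1 stone.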

3. Endgame evaluation method for chess and card games

During the game-tree search used for endgame generation in the invention, the evaluation method is responsible for assessing each sub-stage of the endgame's progress. If the search algorithm is the skeleton of game-tree search, the evaluation function is its brain. The evaluation function judges whether each position is favorable to one's own side, which positions will be favorable in the future, which are unfavorable, and so on, and it directly determines the playing strength of the agent. Based on evaluation function design methods for different chess and card games, the invention trains a piece static evaluation matrix, a layout evaluation matrix, and a position influence matrix by reinforcement learning, and from these computes the evaluation of a given game tree node.

Taking military chess (Junqi) as an example (Chinese chess, chess, and other games are similar), a player's layout strategy refers to how the player's 25 pieces, of up to 12 types, are deployed on the 25 positions of the player's own side of the board.

In four-player military chess (Siguo Junqi), a layout strategy can be understood as a permutation and combination of pieces and positions. Formula 1 gives the static evaluation method for pieces. $F$ is a 12×12 matrix obtained by multiplying two matrices. The first matrix represents the static gain when one piece type attacks another. For example, $f_{1,2}$ means that when a piece of type 1 (the Field Marshal) attacks an opposing piece of type 2 (the General), the player holding the Field Marshal gains $f_{1,2}$ from this move. The second matrix records, by statistics, the probability that pieces meet each other in an attack. For example, $p_{1,2}$ represents the probability that the General meets the Field Marshal. The 12 values on the diagonal of $F$ therefore represent the static evaluations of all 12 piece types, denoted here by $F_s$ in Formula 2.

$$F_s = [F_{1,1}, F_{2,2}, \ldots, F_{12,12}] \tag{2}$$
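A minimal sketch of the piece static evaluation, assuming that the product described in the text is an ordinary matrix product of the gain matrix and the encounter-probability matrix, and using 3 hypothetical piece types with made-up numbers instead of 12. The names `matmul` and `static_piece_values` are illustrative only.

```python
def matmul(a, b):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def static_piece_values(f, p):
    """Return the diagonal of F = f x p, one static value per piece type."""
    F = matmul(f, p)
    return [F[i][i] for i in range(len(F))]

# Hypothetical gains: f[i][j] is the gain when piece type i attacks type j.
f = [[0.0, 2.0, 1.0],
     [-3.0, 0.0, 1.0],
     [-3.0, -2.0, 0.0]]
# Hypothetical encounter probabilities: p[k][i] is the probability that
# piece type i meets piece type k in an attack (each column sums to 1).
p = [[0.2, 0.5, 0.3],
     [0.3, 0.2, 0.4],
     [0.5, 0.3, 0.3]]

Fs = static_piece_values(f, p)
```

Under this assumption, each diagonal entry is the expected gain of one piece type over all the opponents it may meet, which matches the indexing of the $f_{1,2}$ and $p_{1,2}$ examples in the text.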

Based on the piece static evaluation matrix $F_s$, Formula 3 gives the method for computing the static evaluation $B_s$ of a player's layout strategy.

In Formula 3, $B_i$ is a {0, 1} matrix that records the player's layout. When the player places piece $i$ at position $j$, $b_{ij} = 1$ and all other entries in row $i$ are 0. In this way, $B_s$ is computed as a 1×25 matrix that records the static evaluation of a layout strategy.

Next, the position influence matrix $I_{AD}$ is used to apply a bonus to the static matrix $B_s$. As shown in Formula 4, the position influence matrix $I_{AD}$ records the influence factors of every position on the board for both attack and defense, where $A_1$ to $A_{25}$ denote the attack influence factors and $D_1$ to $D_{25}$ denote the defense influence factors.

As shown in Formula 4, the position influence matrix computes the attack and defense influence factors of each position separately. In general, positions near the front row come into contact with the opponent more easily, so their attack influence factor is larger. Back-row pieces are adjacent to more friendly pieces, so their defense influence factor is larger. Positions that act as "traffic hubs" on the board have large influence factors in both respects.

Based on the above, the final evaluation of a layout strategy can be expressed as the two-tuple $B_{AD}$ shown in Formula 5. The first element of the tuple is the attack evaluation of the layout strategy, and the second is its defense evaluation. Both are obtained by summing over the corresponding entries of the matrices $B_S$ and $I_{AD}$.
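The layout evaluation can be sketched as follows. This is an illustration under stated assumptions: the static value of each position is taken to be the value of the piece placed there, and the attack and defense evaluations are taken to be sums of position values weighted by the influence factors. The 3-piece, 3-position numbers and the function names are hypothetical.

```python
def layout_static_values(piece_values, layout):
    """Bs: static value found at each position.

    layout[i][j] is 1 if piece i is placed at position j, else 0.
    """
    n_positions = len(layout[0])
    return [sum(piece_values[i] * layout[i][j] for i in range(len(layout)))
            for j in range(n_positions)]

def layout_attack_defense(bs, attack, defense):
    """B_AD: (attack evaluation, defense evaluation) of the layout.

    Assumes each position value is weighted by that position's influence factor.
    """
    return (sum(v * a for v, a in zip(bs, attack)),
            sum(v * d for v, d in zip(bs, defense)))

piece_values = [5.0, 3.0, 1.0]     # hypothetical static value of each piece
layout = [[0, 0, 1],               # piece 0 at position 2 (back row)
          [1, 0, 0],               # piece 1 at position 0 (front row)
          [0, 1, 0]]               # piece 2 at position 1
attack = [1.5, 1.0, 0.5]           # hypothetical attack factors, front row highest
defense = [0.5, 1.0, 1.5]          # hypothetical defense factors, back row highest

bs = layout_static_values(piece_values, layout)
b_ad = layout_attack_defense(bs, attack, defense)
```

Keeping attack and defense as separate components lets two layouts with the same total piece value be distinguished by where the strong pieces stand.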

Compared with the prior art, the technical effects of the invention are as follows:

The invention is the first to propose a method of building chess and card game endgames based on evaluating the nodes of the expanded game tree. The invention can be adapted to mainstream popular chess and card games (the types actually tested include Dou Dizhu, Chinese chess, chess, and military chess) and generates endgame databases on a cluster server. The average generation rate is 3.7 endgames per minute. By manual inspection, the accuracy of the generated endgames is 98% or more, and the accuracy of the endgame difficulty evaluation is 91.5%.

Brief description of the drawings

Fig. 1 is the overall flow chart.

Fig. 2 is an example of a chess endgame.

Fig. 3 is a schematic diagram of the game-tree search process for endgame generation.

Fig. 4 is a schematic diagram of an evaluation-labeled game tree.

Fig. 5 is a schematic diagram of a win/loss-labeled game tree.

Fig. 6 is a schematic diagram of the game tree difficulty calculation.

Detailed description of the embodiments

The invention is described in further detail below through embodiments and the accompanying drawings.

With reference to Fig. 1, the flow of the invention is designed as follows:

1. Automatically generate a random endgame database:

In the application scenario of the invention, the basic parameters of the target endgame database are set first. They include basic information (the game type to adapt to and the number of endgame pieces or cards in hand) and endgame difficulty control information (search depth, solution count, and maximum tolerance amplitude), where:

Search depth refers to the depth of the search in the endgame game tree from the root node to the final solution leaf node. It determines how deeply a human player has to think to solve the endgame.

Solution count refers to the number of nodes, among the first-level nodes expanded from the root of the endgame game tree, at which the player wins. When the solution count is 0, the player cannot win no matter which move is played, so the current position is not an endgame. When the solution count is 1, exactly one move solves the current endgame; the current position is an endgame and its difficulty is high. As the solution count increases, the endgame difficulty gradually decreases.

The optimal move path refers to the node sequence formed, during game-tree search, from the root node to the leaf node with the optimal result. The maximum tolerance amplitude refers to the maximum change in the evaluation function between adjacent nodes on the optimal move path. Its calculation formula is as follows, where $V_t$ denotes a node on the optimal move path ($t = 0$ denotes the root node) and $V_{t+1}$ denotes the child node of $V_t$ on the path:

The maximum tolerance amplitude represents how hard it is for a human player to think through the solution of the endgame. The larger the maximum tolerance amplitude, the less likely a human player is to choose the optimal strategy, and the harder the endgame is to solve. For example, when solving Chinese chess endgames, the correct solution often includes a sacrifice that looks very costly on the surface; when solving Dou Dizhu endgames, the cards in hand have to be broken up. These are all examples of solving endgames with a large maximum tolerance amplitude. In actual tests of the invention, once the maximum tolerance amplitude rises to 50% or more, the generated endgames are already very difficult.
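A minimal sketch of the maximum tolerance amplitude. The exact formula is not reproduced in this text, so the code assumes a relative change $|V_{t+1} - V_t| / |V_t|$ between adjacent nodes, which is consistent with the amplitude being quoted as a percentage. The function name and the sample values are hypothetical.

```python
def max_tolerance_amplitude(path_values):
    """Largest relative evaluation change between adjacent nodes on the path.

    path_values holds V_0, V_1, ... along the optimal move path, root first.
    Assumed definition: |V_{t+1} - V_t| / |V_t|.
    """
    amplitudes = []
    for v_t, v_next in zip(path_values, path_values[1:]):
        if v_t == 0:
            continue  # relative change is undefined here; skipped in this sketch
        amplitudes.append(abs(v_next - v_t) / abs(v_t))
    return max(amplitudes, default=0.0)

# Hypothetical path: a sacrifice drops the evaluation sharply before it recovers.
m = max_tolerance_amplitude([100.0, 40.0, 60.0, 90.0])
```

In the sample path the first move costs 60% of the evaluation, which is the kind of apparent loss the text associates with hard endgames.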

Once the above parameters are set, the computer starts generating random endgames, and each generated endgame goes on to the next step for analysis. Fig. 2 shows an example of a chess endgame.
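The parameter set and the random generation step might be organized as in the sketch below. The field names, the `random_position` helper, and the piece letters are hypothetical, and the sketch does not check that a position is legal under the rules of the game, which a real generator would have to do.

```python
import random
from dataclasses import dataclass

@dataclass
class EndgameParams:
    # Basic information
    game_type: str
    piece_count: int
    # Endgame difficulty control information
    search_depth: int
    solution_count: int
    max_tolerance_amplitude: float

def random_position(params, piece_types, board_squares, rng):
    """Place piece_count randomly chosen pieces on distinct squares."""
    squares = rng.sample(range(board_squares), params.piece_count)
    return {square: rng.choice(piece_types) for square in squares}

params = EndgameParams(game_type="chess", piece_count=6, search_depth=4,
                       solution_count=1, max_tolerance_amplitude=0.5)
position = random_position(params, ["K", "Q", "R", "B", "N", "P"], 64,
                           random.Random(0))
```

The difficulty fields are not used during generation in this sketch; they would serve as targets in the later screening step.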

2. Generate the endgame game tree using the MCTS method:

Starting from the root node, the game tree of the endgame is generated, and the MCTS method is used to improve the efficiency of game tree generation. As shown in Fig. 3, the steps are:

1) Expansion node selection: selecting the node to expand is a recursive process that starts at the root node and ends at a leaf node. The node selection function picks the node to expand level by level, according to the concrete implementation of the selection strategy. The implementation of the node selection strategy is described in detail later.

2) Expansion: one or more child nodes are expanded below the leaf node finally selected in step 1. The original leaf node then becomes their parent, and they themselves become the new leaf nodes.

3) Sampling evaluation: using the sampling method, evaluations are computed for all nodes selected in step 1 and for the newly generated leaf nodes.

4) Evaluation backup (backpropagation): starting from the leaf nodes, the new evaluation results are passed back level by level to the respective parent nodes and finally to the root node.

For search problems on large-scale game trees, the MCTS algorithm performs better than traditional search algorithms (the minimax search algorithm is used here as the reference for comparison). The invention uses this algorithm as the core search algorithm of the decision system.

3. Compute node evaluations with the evaluation function and generate the evaluation-labeled game tree:

The leaf-node evaluations of the current endgame game tree are computed with the evaluation function of the specific game, and the evaluations of the nodes at each level of the tree are then computed by backing up, generating the evaluation-labeled game tree. Fig. 4 shows a simplified evaluation-labeled game tree with a branching factor of 3 and a depth of 2. In practical applications, the number of nodes in the game tree can exceed the order of $10^{10}$.
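A minimal sketch of producing an evaluation-labeled tree with branching factor 3 and depth 2. It assumes that the backup rule is minimax (the solver maximizes at the root, the opponent minimizes one level down); the nested-list tree encoding, the leaf values, and the function name are illustrative.

```python
def backup_values(tree, maximizing=True):
    """Label every node with an evaluation backed up from the leaves.

    A leaf is a number (its evaluation); an internal node is a list of children.
    """
    if not isinstance(tree, list):
        return {"value": tree, "children": []}
    children = [backup_values(child, not maximizing) for child in tree]
    pick = max if maximizing else min
    return {"value": pick(c["value"] for c in children), "children": children}

# Hypothetical leaf evaluations for a tree with branching factor 3 and depth 2.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
labeled = backup_values(tree)
first_level = [child["value"] for child in labeled["children"]]
```

The opponent's best replies give the first-level values, and the root takes the best of those, so every node of the labeled tree carries an evaluation.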

4. Compute the win/loss values of the nodes in the game tree and generate the win/loss-labeled game tree:

This step computes the win/loss status of the leaf nodes according to the rules of the specific game and then, based on the minimax expansion principle of game trees, backs up the win/loss values of all nodes, finally generating the win/loss-labeled game tree.

Specifically, starting from the leaf nodes, the win/loss value of each node in the game tree is obtained by the minimax calculation method. The nodes are labeled to generate the win/loss-labeled game tree; the labels can be 0 and 1. Fig. 5 shows a simplified win/loss-labeled game tree with a branching factor of 3 and a depth of 2. In practical applications, the number of nodes in the game tree can exceed the order of $10^{10}$.
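The win/loss labeling and the root-node solution count can be sketched as follows. The encoding is an assumption made for the example: leaves are 1 when the solving player wins and 0 otherwise, the solving player moves at the root, and the two sides alternate by level.

```python
def label_win_loss(tree, solver_to_move=True):
    """Back up 0/1 win labels from the leaves by minimax.

    A leaf is 1 (the solver wins) or 0 (the solver loses); an internal node
    is a list of children. The solver moves at the root.
    """
    if not isinstance(tree, list):
        return {"win": tree, "children": []}
    children = [label_win_loss(child, not solver_to_move) for child in tree]
    pick = max if solver_to_move else min
    return {"win": pick(c["win"] for c in children), "children": children}

def solution_count(node):
    """Number of first-level children at which the solver wins."""
    return sum(child["win"] for child in node["children"])

# Hypothetical leaf outcomes for a tree with branching factor 3 and depth 2.
tree = [[1, 1, 1], [1, 0, 1], [0, 0, 1]]
root = label_win_loss(tree)
count = solution_count(root)
```

Only the first move wins against every reply in this example, so the root has a solution count of 1, which is the case the difficulty screening keeps.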

5. Compute the endgame difficulty of the game tree, then screen and record the endgame:

Fig. 6 is a schematic diagram of the game tree difficulty calculation. The endgame difficulty of a game tree proposed by the invention is measured in the following respects.

a) Based on the win/loss-labeled game tree: if the root-node solution count is 1, the endgame difficulty is set to a baseline value; otherwise the current game tree is judged not to be an endgame. In other words, the solution count of the root node is computed and game trees whose solution count is not 1 are excluded, which narrows the range of valid endgames to be screened.

b) Based on the win/loss-labeled game tree: count the subtrees in the current game tree whose root-node solution count is 1, denoted $T$. The endgame difficulty is proportional to this number. In other words, all subtrees rooted at a node with a solution count of 1 are found, and the number of such subtrees on the critical path is used as one of the components for measuring endgame difficulty.

c) Based on the evaluation-labeled game tree: compute the maximum tolerance amplitude of the optimal move sequence in the current game tree. The endgame difficulty is proportional to this value. In other words, the evaluations of adjacent nodes on the critical path are computed to obtain the maximum tolerance amplitude of that path, which is used as one of the components for measuring endgame difficulty.

d) Compute the final game tree difficulty, classify the endgame according to the preset thresholds, and store it in the database. If the difficulty meets the set threshold, the endgame is saved to the endgame database; otherwise the current endgame is discarded.

The endgame difficulty calculation formula is:

Here, α and β are the weight adjustment parameters for the two components, namely the number $T$ of subtrees whose root-node solution count is 1 and the maximum tolerance amplitude $M$. $k$ denotes the number of subtrees in the current game tree whose root-node solution count is 1, and $M_i$ denotes the maximum tolerance amplitude of subtree $i$. The larger α is, the more the system tends to extract endgames with a unique solution or very few solutions; the larger β is, the more the system tends to extract endgames whose position fluctuates widely during play.
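How the two components and the threshold might be combined is sketched below. The formula itself is not reproduced in this text, so the weighted form used here (α times the subtree count plus β times the mean of the subtree amplitudes) is an assumption, as are the function names and the reading of "meets the threshold" as "greater than or equal to".

```python
def endgame_difficulty(subtree_amplitudes, alpha=1.0, beta=1.0):
    """Combine the subtree count and the tolerance amplitudes into one score.

    subtree_amplitudes holds M_i for each of the k subtrees whose root has a
    solution count of 1. Assumed form: alpha * k + beta * mean(M_i).
    """
    k = len(subtree_amplitudes)
    if k == 0:
        return 0.0
    return alpha * k + beta * sum(subtree_amplitudes) / k

def keep_endgame(root_solution_count, difficulty, threshold):
    """Keep a position only if it has one solution and is hard enough."""
    return root_solution_count == 1 and difficulty >= threshold

# Hypothetical values: two qualifying subtrees with amplitudes 60% and 20%.
d = endgame_difficulty([0.6, 0.2], alpha=0.5, beta=2.0)
```

Raising `alpha` rewards positions with many forced, single-solution steps, while raising `beta` rewards positions whose evaluation swings sharply along the solution, which mirrors the roles the text gives to α and β.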

The above embodiments merely illustrate the technical solution of the invention and do not limit it. Those of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from the spirit and scope of the invention. The scope of protection of the invention shall be as set out in the claims.

Claims

1. An automated endgame generation method for chess and card games based on computer game-playing technology, the steps of which comprise:

1) automatically generating a random endgame database: the basic parameters of the target endgame database are set first, and generation of random endgames then begins; the basic parameters include basic information and endgame difficulty control information; the basic information includes the game type to adapt to and the number of endgame pieces; the endgame difficulty control information includes search depth, solution count, and maximum tolerance amplitude; the maximum tolerance amplitude refers to the maximum change in the evaluation function between adjacent nodes on the optimal move path; the solution count refers to the number of nodes, among the first-level nodes expanded from the root of the endgame game tree, at which the player wins; when the solution count is 0, the player cannot win no matter which move is played, so the current position is not an endgame; when the solution count is 1, exactly one move solves the current endgame, and the current position is an endgame with high difficulty; as the solution count increases, the endgame difficulty gradually decreases;

2. The method according to claim 1, characterized in that step 2) is based on the MCTS method and uses four steps (selection, expansion, sampling calculation, and evaluation backup) to improve the efficiency of game tree generation and to generate the game tree of each endgame.

3. The method according to claim 2, characterized in that step 3) is based on evaluation function design methods for different chess and card games, and trains a piece static evaluation matrix, a layout evaluation matrix, and multiple position influence matrices by reinforcement learning, from which the evaluation of a given game tree node is computed.

4. The method according to claim 3, characterized in that step 4) computes the win/loss status of the leaf nodes according to the rules of the specific game and then, based on the minimax expansion principle of game trees, backs up the win/loss values of all nodes, finally generating the win/loss-labeled game tree; that is, starting from the leaf nodes, the win/loss value of each node in the game tree is obtained by the minimax calculation method, and the nodes are labeled to generate the win/loss-labeled game tree.

5. The method according to claim 4, characterized in that step 5) computes the solution count of the root node of the game tree and excludes game trees whose solution count is not 1, so as to narrow the range of valid endgames to be screened.

6. The method according to claim 4, characterized in that step 5) finds, in the current game tree, all subtrees rooted at a node whose solution count is 1, and uses the number of such subtrees on the optimal move path as one of the components for measuring endgame difficulty.

7. The method according to claim 4, characterized in that step 5) computes the evaluations of adjacent nodes on the optimal move path to obtain the maximum tolerance amplitude of that path, which is used as one of the components for measuring endgame difficulty; the maximum tolerance amplitude refers to the maximum change in the evaluation function between adjacent nodes on the optimal move path, and its calculation formula is:

where $V_t$ denotes the result computed by the evaluation function for a node on the optimal move path, and $V_{t+1}$ denotes the result computed by the evaluation function for the child node of $V_t$ on the path.

8. The method according to claim 4, characterized in that step 5) computes the endgame difficulty as follows:

where α and β are the weight adjustment parameters for the two components, namely the number $T$ of subtrees whose root-node solution count is 1 and the maximum tolerance amplitude $M$; $k$ denotes the number of subtrees in the current game tree whose root-node solution count is 1, and $M_i$ denotes the maximum tolerance amplitude of subtree $i$; the larger α is, the more the method tends to extract endgames with a unique solution or very few solutions; the larger β is, the more the method tends to extract endgames whose position fluctuates widely during play.