CN113975783A

CN113975783A - Capture-game review learning method and apparatus, electronic device, and storage medium

Info

Publication number: CN113975783A
Application number: CN202111268992.2A
Authority: CN
Inventors: 王玉龙; 高圣州; 李蒙; 孙艳庆; 林秀桃; 段亦涛; 陈虎; 倪洪生
Original assignee: Netease Youdao Information Technology Jiangsu Co ltd
Current assignee: Netease Youdao Information Technology Jiangsu Co ltd
Priority date: 2021-10-28
Filing date: 2021-10-28
Publication date: 2022-01-28

Abstract

The present disclosure provides a capture-game review learning method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining game data after a user has played a game through human-computer interaction, and determining, from the game data, a plurality of user drop points of the user in the game; for each user drop point, determining at least one recommended drop point corresponding to the user drop point; determining a recommended value of the user drop point according to the user drop point and its corresponding recommended drop point; determining, as bad-hand points, the user drop points whose recommended values meet a preset condition; for each bad-hand point, determining a target drop point corresponding to the bad-hand point; and generating review demonstration data according to the target drop point and displaying the review demonstration data to the user. The method removes the dependence on teacher guidance during game-review learning and makes the user more active in autonomous learning.

Description

Capture-game review learning method and apparatus, electronic device, and storage medium

Technical Field

The present application relates to the field of game-review learning technologies, and in particular to a capture-game review learning method and apparatus, an electronic device, and a storage medium.

Background

At present, there are many capture-type Go (weiqi) products on the market. Their main form is that, on a 9-line or 13-line board, a user and an artificial intelligence (AI) place stones alternately, and the side that first captures a given number of the opponent's stones wins; there are also many similar games and variants. A user improves at Go by playing and reviewing games, but during review the user often has questions such as "Which moves were wrong?", "Where should the wrongly placed stones have been played?", and "How should the game have continued?". Effective review learning therefore has to take place under a teacher's guidance, which limits the learner's autonomy.

Disclosure of Invention

In view of the above, an object of the present application is to provide a capture-game review learning method and apparatus, an electronic device, and a storage medium to solve the above technical problems.

An exemplary embodiment of the present disclosure provides a capture-game review learning method, including:

obtaining game data after a user has played a game through human-computer interaction, and determining, from the game data, a plurality of user drop points of the user in the game;

for each user drop point, determining at least one recommended drop point corresponding to the user drop point; determining a recommended value of the user drop point according to the user drop point and the recommended drop point corresponding to the user drop point;

determining, as bad-hand points, the user drop points whose recommended values meet a preset condition;

for each bad-hand point, determining the win-rate value of each recommended drop point corresponding to the bad-hand point, and determining the recommended drop point with the highest win-rate value as the target drop point; and generating review demonstration data according to the target drop point, and displaying the review demonstration data to the user.

In some exemplary embodiments, determining at least one recommended drop point corresponding to the user drop point specifically includes:

determining the board layout data before the user drop point was played;

inputting the board layout data into a pre-trained policy network model to obtain at least one recommended drop point output by the policy network model;

wherein one of the at least one recommended drop point is the same as the user drop point.

In some exemplary embodiments, the output of the policy network model further includes a recommendation degree for each recommended drop point; the recommendation degree of the user drop point equals the recommendation degree of the recommended drop point that is the same as the user drop point;

the recommended value is calculated as follows:

recommended value = (recommendation degree of the user drop point) / (sum of the recommendation degrees of the recommended drop points whose recommendation degree is higher than that of the user drop point).
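The calculation above can be sketched in a few lines. This is only an illustrative sketch: the function name, the dictionary representation, and the handling of the case in which no recommended drop point ranks above the user's move (the text does not define it) are all assumptions.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]  # (row, column) on the board; representation is an assumption


def recommended_value(user_point: Point, degrees: Dict[Point, float]) -> float:
    """Ratio of the user's recommendation degree to the summed degrees of
    all recommended drop points that rank above the user's drop point."""
    user_degree = degrees[user_point]
    higher_sum = sum(d for d in degrees.values() if d > user_degree)
    if higher_sum == 0.0:
        # Assumption: the user's move is already the top recommendation,
        # so it is treated as maximally good instead of dividing by zero.
        return float("inf")
    return user_degree / higher_sum


# Hypothetical policy output for three recommended drop points.
degrees = {(2, 2): 0.5, (3, 4): 0.3, (4, 4): 0.1}
print(recommended_value((4, 4), degrees))  # 0.1 / (0.5 + 0.3)
```

A low result means that much better-rated alternatives existed, which is what makes the move a candidate bad-hand point.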

In some exemplary embodiments, for each bad-hand point, determining the win-rate value of each recommended drop point corresponding to the bad-hand point, and determining the recommended drop point with the highest win-rate value as the target drop point, includes:

using a pre-trained value network model and a fast rollout network model, searching for the win-rate value of each recommended drop point in a fast search mode;

and sorting the recommended drop points by win-rate value, and determining the recommended drop point with the highest win-rate value as the target drop point.

In some exemplary embodiments, generating the review demonstration data according to the target drop point and displaying the review demonstration data to the user includes:

determining the board layout data of the target drop point;

inputting the board layout data of the target drop point into a pre-trained policy network model to obtain at least one next recommended drop point output by the policy network model;

using a pre-trained value network model and a fast rollout network model, searching for the win-rate value of each next recommended drop point in a fast search mode;

sorting the next recommended drop points by win-rate value, and determining the next recommended drop point with the highest win-rate value as the next target drop point;

re-determining the board layout data based on the next target drop point;

and, in response to a move-ending condition being met, generating the review demonstration data and displaying the review demonstration data to the user.
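The loop described by these steps can be sketched as follows. The policy and win-rate models are passed in as plain callables, and the toy stand-ins, the 3x3 board, the move-prefix board representation, and the fixed step count used as the ending condition are all assumptions made for illustration; they are not the trained models of the disclosure.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]
Board = Tuple[Point, ...]  # assumption: a board is just the sequence of stones played


def generate_review_line(
    board_before: Board,
    target: Point,
    recommend: Callable[[Board], List[Point]],
    win_rate: Callable[[Board, Point], float],
    steps: int = 5,
) -> List[Point]:
    """Play the target drop point, then keep choosing the recommended
    drop point with the highest win-rate value until `steps` moves exist."""
    line: List[Point] = [target]
    board = board_before + (target,)
    while len(line) < steps:
        candidates = recommend(board)
        if not candidates:
            break  # no legal continuation left
        best = max(candidates, key=lambda p: win_rate(board, p))
        line.append(best)
        board = board + (best,)  # re-determine the layout after the move
    return line


def toy_recommend(board: Board) -> List[Point]:
    """Stand-in for the policy network: every empty point of a 3x3 board."""
    occupied = set(board)
    return [(r, c) for r in range(3) for c in range(3) if (r, c) not in occupied]


def toy_win_rate(board: Board, point: Point) -> float:
    """Stand-in for the win-rate search: points nearer the centre score higher."""
    return 1.0 - (abs(point[0] - 1) + abs(point[1] - 1)) / 4.0


print(generate_review_line((), (1, 1), toy_recommend, toy_win_rate))
```

Passing the models in as callables keeps the loop independent of how the recommendation and win-rate estimates are actually produced.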

In some exemplary embodiments, the fast search mode is an MCTS search.

In some exemplary embodiments, the end condition of the MCTS search includes:

the search time exceeds 5 s;

or,

the highest win-rate value is 50% higher than the next-highest win-rate value;

wherein the next-highest win-rate value is the win-rate value closest to the highest win-rate value.
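A minimal sketch of such an end-condition check is given below. The text does not say whether "50% higher" is a relative lead or a lead in percentage points; the sketch assumes a relative lead, and the function and constant names are hypothetical.

```python
from typing import Sequence

MAX_SEARCH_SECONDS = 5.0  # from the text: stop once the search exceeds 5 s
RELATIVE_LEAD = 0.50      # assumption: "50% higher" read as a relative lead


def search_should_stop(elapsed_seconds: float, win_rates: Sequence[float]) -> bool:
    """Return True when either end condition of the search is met."""
    if elapsed_seconds > MAX_SEARCH_SECONDS:
        return True
    if len(win_rates) < 2:
        return False  # no next-highest value to compare against
    ranked = sorted(win_rates, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best >= runner_up * (1.0 + RELATIVE_LEAD)


print(search_should_stop(1.0, [0.9, 0.5, 0.2]))  # large lead over the runner-up
print(search_should_stop(1.0, [0.6, 0.5]))       # lead too small, time not up
```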

Based on the same inventive concept, an exemplary embodiment of the present disclosure further provides a capture-game review learning apparatus, including:

an acquisition module, configured to obtain game data after a user has played a game through human-computer interaction, and to determine, from the game data, a plurality of user drop points of the user in the game;

a calculation module, configured to determine, for each user drop point, at least one recommended drop point corresponding to the user drop point, and to determine a recommended value of the user drop point according to the user drop point and the recommended drop point corresponding to the user drop point;

a bad-hand point module, configured to determine, as bad-hand points, the user drop points whose recommended values meet a preset condition;

a display module, configured to determine, for each bad-hand point, the win-rate value of each recommended drop point corresponding to the bad-hand point, and to determine the recommended drop point with the highest win-rate value as the target drop point; and to generate review demonstration data according to the target drop point and display the review demonstration data to the user.

Based on the same inventive concept, the exemplary embodiments of the present disclosure also provide an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method as described in any one of the above items when executing the program.

Based on the same inventive concept, the exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described in any one of the above.

As can be seen from the foregoing, the present application provides a capture-game review learning method and apparatus, an electronic device, and a storage medium, which first screen out the bad-hand points among the user drop points of a human-computer game, then recommend the high-win-rate target drop point corresponding to each bad-hand point, and generate review demonstration data based on the target drop points for the user to study. This removes the dependence on teacher guidance during game-review learning and makes the user more active in autonomous learning.

Drawings

To illustrate the technical solutions of the present application or the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. The drawings described below are merely embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an application scenario of an exemplary embodiment of the present disclosure;

FIG. 2 is a schematic flowchart of a capture-game review learning method according to an exemplary embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of a process for selecting recommended drop points according to an exemplary embodiment of the present disclosure;

FIG. 4 is another schematic flowchart of a process for selecting recommended drop points according to an exemplary embodiment of the present disclosure;

FIG. 5 is a schematic flowchart of a process for selecting a target drop point according to an exemplary embodiment of the present disclosure;

FIG. 6 is a schematic flowchart of a process for generating review demonstration data according to an exemplary embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a capture-game review learning apparatus according to an exemplary embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

The principles and spirit of the present application will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are presented only to enable those skilled in the art to better understand and to implement the present disclosure, and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

According to the embodiments of the present disclosure, a capture-game review learning method and apparatus, an electronic device, and a storage medium are provided.

In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.

For convenience of understanding, terms referred to in the embodiments of the present disclosure are explained below:

Artificial neural networks (ANNs): a practical artificial neural network model is built according to the principles of biological neural networks and the requirements of the application, a corresponding learning algorithm is designed to simulate certain intelligent activities of the human brain, and the model is then implemented technically to solve practical problems.

Policy network model (PolicyNet, policy network): a neural network model obtained by training (learning) a neural network, which can screen and determine the next drop point based on the current board.

Value network model (ValueNet, value network): a neural network model obtained by training (learning) a neural network on a large number of samples, which can predict the win-rate value of each possible next move based on the current state of the board. It has the characteristics of a neural network and a certain self-learning capability.

Fast rollout network model (FastNet, fast rollout network): a neural network model obtained by training (learning) a neural network, which can quickly play out subsequent alternating moves based on the current state of the board.

Monte Carlo Tree Search (MCTS): also called MCTS search, a heuristic search algorithm that is based on a tree data structure and remains effective when the search space is large.

Game review: also called "replaying the game"; after a game ends, the game record is replayed to examine which moves were good or bad and where the game was won or lost. It is generally used for self-study, or a stronger player is asked to give analysis and guidance. Replaying a game from a game record in this way is also called "playing through the record" or "laying out the record".

Bad-hand point: a drop point on the board at which a stone was played in error.

User drop point: a position at which the user places a stone during the human-computer game.

AI drop point: a position at which the AI side places a stone during the human-computer game.

The principles and spirit of the present application are explained in detail below with reference to several representative embodiments of the present disclosure.

Overview of the Invention

Game review in Go means that, after each game ends, the two players replay the game. This effectively deepens their impression of the game and exposes the weaknesses of both sides, so that they can improve their playing strength. However, when reviewing, a user often does not know "Which moves were wrong?", "Where should the wrongly placed stones have been played?", or "How should the game have continued?"; that is, the user does not know which moves are bad-hand points, which high-win-rate target drop point corresponds to each bad-hand point, or how the game would develop from that target drop point. Conventional AI does not indicate the bad-hand points of a reviewed game, nor does it demonstrate how the game develops over the several moves that follow the target drop point corresponding to a bad-hand point, so a teacher is needed to give that guidance. In other words, review learning requires a teacher's assistance; beginners in particular depend heavily on a teacher during review, which naturally limits their autonomy.

To solve the problems in the prior art, the present disclosure provides a capture-game review learning method and apparatus, an electronic device, and a storage medium. The method first screens the plurality of user drop points in the game data of a finished human-computer game for bad-hand points; it then obtains, through calculation and analysis, the target drop point corresponding to each bad-hand point, and generates review demonstration data that shows how the game would develop from the target drop point, for the user to study. For the user's drop points in a human-computer game, the corresponding apparatus can indicate the bad-hand points, indicate the high-win-rate target drop point corresponding to each bad-hand point, and display the subsequent development of the game from the target drop point. This removes the dependence on teacher guidance and makes the user more active in autonomous learning.

Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure are described in detail below.

Application Scenario Overview

Referring to FIG. 1, it is a schematic diagram of an application scenario of the capture-game review learning method according to an embodiment of the present disclosure. The application scenario includes a terminal device 101, a server 102, and a data storage system 103. The terminal device 101, the server 102, and the data storage system 103 may be connected through a wired or wireless communication network. The terminal device 101 includes, but is not limited to, a desktop computer, a mobile phone, a mobile computer, a tablet computer, a media player, a smart wearable device, a personal digital assistant (PDA), or another electronic device capable of implementing the above functions. The server 102 and the data storage system 103 may each be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

The server 102 is configured to provide the capture-game review learning service to a user of the terminal device 101. A client that communicates with the server 102 is installed on the terminal device 101.

First, the server 102 sends the game data obtained from the human-computer game to the client of the terminal device 101 through the communication network, and the game data is displayed on the display interface of the client. Meanwhile, the server 102 determines, based on the game data, a plurality of user drop points of the user in the game, and determines at least one recommended drop point corresponding to each user drop point. It then calculates a recommended value for each user drop point, defines the user drop points whose recommended values meet the preset condition as bad-hand points, and sends the bad-hand points to the client of the terminal device 101 for display on the display interface of the client.

Then, the server 102 takes the recommended drop point that corresponds to the bad-hand point and has the highest win-rate value as the target drop point, generates review demonstration data based on the target drop point, stores the review demonstration data in the data storage system 103, and sends it through the communication network to the client of the terminal device 101, where it is displayed on the display interface.

This process is repeated to demonstrate the review for the target drop points corresponding to all bad-hand points, thereby completing the user's game-review learning task.

The capture-game review learning method and apparatus, electronic device, and storage medium according to the exemplary embodiments of the present disclosure are described below with reference to the application scenario of FIG. 1. It should be noted that the above application scenario is shown only to facilitate understanding of the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in this respect. Rather, the embodiments of the present disclosure may be applied to any applicable scenario.

Exemplary method

Some embodiments of the present application provide a capture-game review learning method, as shown in FIG. 2, comprising:

S201, obtaining game data after a user has played a game through human-computer interaction, and determining, from the game data, a plurality of user drop points of the user in the game;

S202, for each user drop point, determining at least one recommended drop point corresponding to the user drop point; determining a recommended value of the user drop point according to the user drop point and the recommended drop point corresponding to the user drop point;

S203, determining, as bad-hand points, the user drop points whose recommended values meet a preset condition;

S204, for each bad-hand point, determining the win-rate value of each recommended drop point corresponding to the bad-hand point, and determining the recommended drop point with the highest win-rate value as the target drop point; and generating review demonstration data according to the target drop point, and displaying the review demonstration data to the user.

Step S204 is repeated to generate review demonstration data for the target drop points corresponding to all bad-hand points, for the user to study.

The game data is the complete data of the finished human-computer game and includes both AI drop points and user drop points.

Steps S201 to S203 screen the user drop points in the game data and find the bad-hand points, which can be regarded as a shallow analysis of the game data. Step S204 finds the target drop point corresponding to each bad-hand point and generates the review demonstration data based on it, which can be regarded as a deep analysis of the game data. Both the shallow analysis and the deep analysis are performed by means of pre-trained neural network models.

The review demonstration data lets the user study and improve their play. It may be a full-game demonstration that plays the game out to the end from the target drop point, or a demonstration of a number of subsequent moves from the target drop point. Either form is acceptable as long as it shows how the game develops afterwards and is convenient for the user to study; no limitation is imposed here. When the review demonstration data covers a number of subsequent moves from the target drop point, it generally covers at least 5 moves including the target drop point; that is, the AI uses the neural network model to play at least 5 moves automatically to show how the game develops.

In a specific implementation, after the game data of the human-computer game is obtained in this step, the game data may be displayed on the display interface of the client, and the determined user drop points may be highlighted. Then, following the order of play, the AI inputs the board data before each of the user's moves into a pre-trained neural network model for calculation. The neural network model outputs at least one recommended drop point according to the board data, and the recommendation degree of the user drop point is compared with the recommendation degrees of the recommended drop points to obtain a recommended value. User drop points with low recommended values may be defined as bad-hand points, and the bad-hand points may be determined in either of the following two ways: (1) all user drop points whose recommended value is below a preset recommended value are defined as bad-hand points; (2) the user drop points are sorted by recommended value in descending or ascending order, and the last several or first several in the ranking, respectively, are defined as bad-hand points. For example, the recommended values are sorted in ascending order, and the first five in the ranking are the bad-hand points.
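The two ways of determining bad-hand points can be sketched as below. The sketch assumes the recommended values have already been computed and are keyed by the move number of each user drop point; the function names and the sample values are hypothetical.

```python
from typing import Dict, List


def bad_hands_by_threshold(values: Dict[int, float], threshold: float) -> List[int]:
    """Mode (1): every user drop point whose recommended value is below
    a preset value is a bad-hand point. Returns move numbers in game order."""
    return sorted(label for label, value in values.items() if value < threshold)


def bad_hands_by_rank(values: Dict[int, float], count: int = 5) -> List[int]:
    """Mode (2): sort by recommended value in ascending order and take
    the first `count` entries as bad-hand points."""
    ranked = sorted(values, key=lambda label: values[label])
    return ranked[:count]


# Hypothetical recommended values for five user drop points.
values = {1: 0.90, 2: 0.05, 3: 0.40, 4: 0.02, 5: 0.70}
print(bad_hands_by_threshold(values, threshold=0.10))
print(bad_hands_by_rank(values, count=2))
```

Mode (1) yields a variable number of bad-hand points per game, while mode (2) always yields a fixed number, which may suit a fixed-length review session better.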

Then, the neural network model estimates the win rate of each recommended drop point corresponding to each bad-hand point, and determines the recommended drop point with the highest win-rate value as the target drop point. The neural network model generates review demonstration data based on the target drop point, and the review demonstration data is displayed to the user for study.
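Once win-rate values are available, choosing the target drop point is a simple maximum selection. The sketch below assumes the estimated win rates are supplied as a dictionary; the function name and sample values are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]


def target_drop_point(win_rates: Dict[Point, float]) -> Point:
    """Return the recommended drop point with the highest win-rate value."""
    return max(win_rates, key=lambda point: win_rates[point])


# Hypothetical win-rate estimates for the recommended drop points of one bad-hand point.
win_rates = {(2, 2): 0.41, (3, 4): 0.62, (4, 4): 0.55}
print(target_drop_point(win_rates))
```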

Of course, in a specific implementation, the neural network model may generate review demonstration data showing the subsequent development of the game for the target drop point corresponding to every bad-hand point, or only for the target drop points corresponding to one or more bad-hand points selected by the user. When review demonstration data is generated for the target drop points of multiple bad-hand points and the user has not specified an order, the demonstrations are displayed one by one in the order in which the bad-hand points occurred in the game; if the user has specified an order, they are displayed in that order. No limitation is imposed here.

In the capture-game review learning method of this embodiment, the bad-hand points among the user's drop points are screened out first, the high-win-rate target drop point corresponding to each bad-hand point is recommended, and review demonstration data based on the target drop point is generated for the user to study. This removes the dependence on teacher guidance during game-review learning and makes the learner more active in autonomous learning.

In some exemplary embodiments, as shown in FIG. 3, determining, for each user drop point, at least one recommended drop point corresponding to the user drop point includes:

S301, determining the board layout data before the user drop point was played;

S302, inputting the board layout data into a pre-trained policy network model to obtain at least one recommended drop point output by the policy network model.

The output of the policy network model further includes the recommendation degree of the user drop point and the recommendation degree of each recommended drop point. The recommendation degree of the user drop point is compared with those of the corresponding recommended drop points to judge whether the user drop point is a bad-hand point.

The game data is the complete data of the finished human-computer game, and the user reviews on the basis of it. The board layout data is partial game data, namely the state of the game before each of the user's moves; it does not include that move itself. Thus, for one human-computer game under review, there is one set of game data but multiple sets of board layout data. Specifically, when the user moves first, there is no board layout data before the user's first move, so the number of board layouts may be one fewer than the number of user drop points; when the AI moves first, the number of board layouts may equal the number of user drop points.
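The relationship between game data and board layout data can be illustrated with a short sketch. It assumes the game data is an ordered list of (player, point) pairs and represents each board layout simply as the list of moves played before the user's move; captures are not modelled, and all names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Move = Tuple[str, Point]  # (player, point); player is "user" or "ai"


def layouts_before_user_moves(moves: List[Move]) -> List[List[Move]]:
    """Collect the board layout (moves played so far) before each user move."""
    layouts: List[List[Move]] = []
    for index, (player, _point) in enumerate(moves):
        if player != "user":
            continue
        prior = moves[:index]
        if prior:
            # Per the text, the empty board before a first move by the
            # user does not count as board layout data.
            layouts.append(prior)
    return layouts


user_first = [("user", (2, 2)), ("ai", (2, 3)), ("user", (3, 3)), ("ai", (4, 4))]
ai_first = [("ai", (2, 2)), ("user", (2, 3)), ("ai", (3, 3)), ("user", (4, 4))]
print(len(layouts_before_user_moves(user_first)))  # one fewer than the 2 user moves
print(len(layouts_before_user_moves(ai_first)))    # equal to the 2 user moves
```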

In a specific implementation, the process of selecting recommended drop points includes:

Based on the derived board layout data before each user drop point was played, a Monte Carlo tree search (MCTS) is performed with a predetermined search breadth using the pre-trained policy network model. The search ends once the MCTS end condition is reached (for example, the search time exceeds 5 s). The output of the policy network model is a matrix of the same size as the board; each value in the matrix represents the recommendation degree for the next stone being placed at the corresponding position on the board, and all recommendation degrees in the matrix sum to 1. The policy network model ranks the recommendation degrees of the positions and takes at least one top-ranked position as a recommended drop point. The predetermined search breadth may be set as needed, for example through a parameter such as analysisWideRootNoise taking a value in [0, 1]: the closer the value is to 0, the narrower the search; the closer the value is to 1, the wider the search. Increasing the search breadth enlarges the search range: it increases the weight that the upper confidence bound (UCB) value gives to each node's visit count during the search, which gives rarely visited nodes in the Monte Carlo tree more chances to be visited.
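Reading the recommended drop points out of such a matrix can be sketched as follows. The 3x3 matrix stands in for a real 9-line or 13-line board, and the function name and the choice of k are assumptions; the values are made up so that they sum to 1.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def top_recommended(matrix: List[List[float]], k: int = 3) -> List[Tuple[Point, float]]:
    """Rank every board position by recommendation degree and keep the top k."""
    cells = [
        ((row, col), degree)
        for row, values in enumerate(matrix)
        for col, degree in enumerate(values)
    ]
    cells.sort(key=lambda item: item[1], reverse=True)
    return cells[:k]


# Hypothetical policy output on a toy 3x3 board; all entries sum to 1.
policy = [
    [0.05, 0.10, 0.05],
    [0.10, 0.40, 0.10],
    [0.05, 0.10, 0.05],
]
for point, degree in top_recommended(policy, k=2):
    print(point, degree)
```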

In this embodiment, the board layout data corresponding to each user drop point in the game data is analyzed to obtain at least one recommended drop point corresponding to the user drop point, in preparation for the subsequent bad-hand point analysis.

In some exemplary embodiments, as shown in FIG. 4, determining, for each user drop point, at least one recommended drop point corresponding to the user drop point specifically includes:

S401, labeling each user drop point in the game data in the order of play;

S402, determining the board layout data before each user drop point within a preset label range was played;

S403, inputting the board layout data into a pre-trained policy network model to obtain at least one recommended drop point output by the policy network model.

The preset label range may be input by the user or set by the AI according to the user's level, so as to meet the needs of users at different levels. If the user is a beginner, whose level is limited and who may make mistakes in the first few or last few moves, the label range can be set wider; if the user is an advanced learner whose first and last few moves are largely free of errors, the label range can be set narrower. For example, one set of game data contains 38 user drop points, which are labeled 1 to 38 in the order of play. If the user inputs a label range of 10 to 30, the AI analyzes only the board layout data corresponding to the user drop points labeled 10 to 30, and the subsequent screening for bad-hand points is performed on the user drop points within that preset label range.
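The labeling and range restriction in the example above can be sketched as follows, assuming the user drop points are held in a list in the order of play. The function name and the generated coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def drop_points_in_range(
    user_points: List[Point], first_label: int, last_label: int
) -> List[Tuple[int, Point]]:
    """Label user drop points 1..n in order of play and keep only those
    whose label lies within the preset range (inclusive)."""
    labelled = enumerate(user_points, start=1)
    return [(label, p) for label, p in labelled if first_label <= label <= last_label]


# 38 made-up user drop points, as in the example in the text.
points = [(i % 9, i // 9) for i in range(38)]
selected = drop_points_in_range(points, 10, 30)
print(len(selected), selected[0][0], selected[-1][0])
```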

Based on the chessboard layout data derived for the position before each user drop point within the preset label range is played, a Monte Carlo tree search (MCTS) is performed with a pre-trained strategy network model according to a preset search breadth. The search ends once the MCTS end condition is reached (for example, when the search time exceeds 5 s). The output of the strategy network model is a matrix the size of the chessboard; each value in the matrix represents the recommendation degree of playing the next stone at the corresponding position on the chessboard, and all recommendation degrees in the matrix sum to 1. The strategy network model ranks the recommendation degrees of the candidate positions and outputs at least one top-ranked position as a recommended drop point. The preset search breadth may be set according to actual needs, for example as a value (analysiswideorotnoise) in [0, 1]: the closer the value is to 0, the narrower the search breadth; the closer it is to 1, the wider the search breadth. Increasing the preset search breadth enlarges the search range: it increases the weight that the UCB (upper confidence bound) value gives to each node's visit count during the search, so that rarely visited nodes of the Monte Carlo tree get more chances to be visited.
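The effect of the search breadth on node selection can be illustrated with a small sketch. The text does not give the UCB formula, so the PUCT-style score below (as used in AlphaGo-type programs) and the linear mapping from breadth to exploration weight are assumptions made for illustration only; `ucb_score` and `c_base` are hypothetical names.

```python
import math


def ucb_score(mean_value: float, prior: float, parent_visits: int,
              child_visits: int, breadth: float, c_base: float = 1.0) -> float:
    """PUCT-style selection score; a larger breadth weights the visit-count term more."""
    if not 0.0 <= breadth <= 1.0:
        raise ValueError("breadth must lie in [0, 1]")
    # Assumption: the exploration weight grows linearly with the search breadth.
    exploration_weight = c_base * (1.0 + breadth)
    exploration = exploration_weight * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return mean_value + exploration


# Advantage of a rarely visited node (2 visits) over a heavily visited one (50 visits).
narrow_gap = ucb_score(0.5, 0.1, 100, 2, breadth=0.0) - ucb_score(0.5, 0.1, 100, 50, breadth=0.0)
wide_gap = ucb_score(0.5, 0.1, 100, 2, breadth=1.0) - ucb_score(0.5, 0.1, 100, 50, breadth=1.0)
```

With the wider breadth, the rarely visited node gains a larger advantage, which matches the qualitative behavior described above.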

In this embodiment, the user drop points in the game data are labeled according to the playing sequence, and the analysis can be restricted to the chessboard layout data corresponding to the user drop points within the preset label range, so as to obtain the bad hand points within that range. With the label range set, the neural network in the AI calculates recommended drop points only for the user drop points within the range rather than for all user drop points, which reduces the number of neural network evaluations, makes the copy learning more targeted, and thus improves its efficiency.
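The labeling and range filtering of steps S401 and S402 can be sketched as below, using the 38-move example with the label range 10-30. The function name `points_in_label_range` and the placeholder move names are hypothetical.

```python
from typing import List, Sequence, Tuple


def points_in_label_range(user_points: Sequence[str], first_label: int,
                          last_label: int) -> List[Tuple[int, str]]:
    """Label user drop points 1..n in playing order and keep those inside the range."""
    labelled = enumerate(user_points, start=1)
    return [(label, point) for label, point in labelled
            if first_label <= label <= last_label]


# 38 placeholder user drop points, labeled 1-38 by playing sequence.
moves = [f"move{i}" for i in range(1, 39)]
selected = points_in_label_range(moves, 10, 30)
```

Only the 21 selected drop points would then be passed on to the strategy network model, instead of all 38.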

In some exemplary embodiments, the calculation of the recommended value may be performed in two ways:

The first method is as follows:

In specific implementation, based on the chessboard layout data before a user drop point is played, the output of the strategy network model is a matrix the size of the chessboard. Each value in the matrix represents the recommendation degree of playing the next stone at the corresponding position on the chessboard, and all recommendation degrees in the matrix sum to 1. The user drop point necessarily lies in the matrix as well, so its recommendation degree is also given by the matrix. The strategy network model ranks the recommendation degrees of the candidate positions and outputs several top-ranked positions as recommended drop points. The number of recommended drop points corresponding to a user drop point is at least 5. Take as an example the case where the strategy network model outputs, for each user drop point, the 5 recommended drop points with the highest recommendation degrees. The user drop point is denoted T1 and the 5 recommended drop points T2, T3, T4, T5 and T6, with recommendation degrees T1 = 5%, T2 = 5%, T3 = 5%, T4 = 8%, T5 = 7% and T6 = 10%.

The calculation process of the recommended value of the user drop point is as follows:

Recommended value = 5% / (8% + 7% + 10%) = 0.2, where the denominator is the sum of the recommendation degrees of T4, T5 and T6.

This calculation compares the user drop point with several recommended drop points; compared with using only the best recommended drop point, it is more accurate and more convincing.
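The worked example for the first method can be reproduced with a short sketch. The general formula is not stated in the text, so the rule used here, dividing by the sum of the recommended degrees that are strictly higher than the user's, is an assumption chosen because it matches the example (T4, T5 and T6); another reading, such as summing the top three, fits the example equally well. The function name is hypothetical.

```python
from typing import Sequence


def recommended_value_multi(user_degree: float, recommended_degrees: Sequence[float]) -> float:
    """Ratio of the user's degree to the summed degrees of the better recommended points."""
    # Assumption: only recommended points rated strictly higher than the user's move count.
    better = [d for d in recommended_degrees if d > user_degree]
    if not better:
        raise ValueError("no recommended point exceeds the user's degree; ratio undefined in this sketch")
    return user_degree / sum(better)


# Degrees in percent: T1 = 5 (user), T2..T6 = 5, 5, 8, 7, 10.
value = recommended_value_multi(5, [5, 5, 8, 7, 10])
```

Working in percent keeps the arithmetic exact for this example: 5 / (8 + 7 + 10) = 0.2.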

The second method is as follows:

Recommended value = recommendation degree of the user drop point / recommendation degree of the drop point with the highest recommendation degree.

In specific implementation, based on the chessboard layout data before a user drop point is played, the output of the strategy network model is a matrix the size of the chessboard; each value in the matrix represents the recommendation degree of playing the next stone at the corresponding position on the chessboard, and all recommendation degrees in the matrix sum to 1. The strategy network model ranks the recommendation degrees of the candidate positions and outputs the position with the highest recommendation degree as the recommended drop point. The recommendation degree of the user drop point is then compared with that of the drop point with the highest recommendation degree. For example, the user drop point is denoted T1 and the drop point with the highest recommendation degree T2, with recommendation degrees T1 = 5% and T2 = 10%.

Recommended value = 5% / 10% = 0.5.

This calculation compares the user drop point with only the single drop point that has the highest recommendation degree; it is less accurate than the comparison with several recommended drop points, but faster to compute.
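The second method reduces to a single division over the output matrix. A minimal sketch, assuming the matrix is given as a nested list and the user's drop point as a (row, column) pair; the function name and the toy matrix are hypothetical.

```python
from typing import List, Tuple


def recommended_value_single(policy: List[List[float]], user_point: Tuple[int, int]) -> float:
    """User's recommendation degree divided by the highest degree in the matrix."""
    best = max(max(row) for row in policy)
    if best <= 0:
        raise ValueError("matrix contains no positive recommendation degree")
    row, col = user_point
    return policy[row][col] / best


# Degrees in percent; the user played at (0, 0) with 5, the best point has 10.
toy_policy = [
    [5, 10],
    [40, 45],
]
toy_policy[1] = [3, 2]  # keep the maximum at 10 to mirror the T1 = 5%, T2 = 10% example
value = recommended_value_single(toy_policy, (0, 0))
```

If the user's drop point is itself the best-rated position, the value is 1, the largest value this ratio can take.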

In some exemplary embodiments, as shown in Fig. 5, for each bad hand point, determining a win rate value of each recommended drop point corresponding to the bad hand point, and determining the recommended drop point corresponding to the highest win rate value as a target drop point, includes:

S501, searching for the win rate value of each recommended drop point in a fast search mode by using a pre-trained valuation network model and a fast rollout network model;

S502, sorting the recommended drop points by win rate value, determining the highest of the win rate values as the highest win rate value, and determining the recommended drop point corresponding to the highest win rate value as the target drop point.

The fast search mode is an MCTS search.

In specific implementation, the process of selecting the target landing point includes:

Based on the chessboard layout data derived for each recommended drop point, the valuation network model initializes a weight value according to the recommendation degree that the strategy network model gives to the recommended drop point, and an MCTS search is carried out with the search breadth corresponding to that weight value. The fast rollout network model plays the game out to the end by self-play, with black and white alternating, and the weight value is updated according to the self-play result. The search ends once the end condition of the MCTS search is reached, and the win rate value of the recommended drop point is determined from the latest weight value. The above process is repeated to obtain the win rate values of all recommended drop points. The recommended drop points are then sorted by win rate value, and the recommended drop point with the highest win rate value is determined as the target drop point.

Take 5 recommended drop points corresponding to one bad hand point as an example. The valuation network model initializes 5 weight values, denoted v1, v2, v3, v4 and v5, according to the recommendation degrees that the strategy network model gives to the 5 recommended drop points: the higher the recommendation degree, the higher the weight value, and each weight value corresponds to the search breadth of its recommended drop point. An MCTS search is then carried out with the set search breadth, using both the valuation network model and the fast rollout network model. Specifically, starting from each recommended drop point, the fast rollout network model automatically plays out one game by quickly alternating black and white moves (taking black as the user side and white as the AI side as an example). If black, representing the user side, loses the game, the weight value of the recommended drop point is reduced; if black wins, the weight value is increased. That is, after one round of search, the weight values of the 5 recommended drop points are updated to v1', v2', v3', v4' and v5', and the search breadth of the corresponding search branches is updated accordingly. This process is repeated until the end condition of the MCTS search is reached; the search then ends, and the win rate value of each recommended drop point is determined from the latest 5 weight values. The recommended drop points are then sorted by win rate value, and the one with the highest win rate value is determined as the target drop point.
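The weight update loop can be sketched in a heavily simplified form. The sketch omits the search tree and the valuation network entirely: `rollout` is a hypothetical callback standing in for one self-play game by the fast rollout network, and the fixed `step` size and fixed number of rounds are assumptions, since the text does not say how much a weight changes per game.

```python
from typing import Callable, Dict, Tuple


def pick_target_point(priors: Dict[str, float], rollout: Callable[[str], bool],
                      rounds: int, step: float = 0.01) -> Tuple[str, Dict[str, float]]:
    """Start weights at the recommendation degrees, adjust them by rollout results,
    and return the point with the highest final weight together with all weights."""
    weights = dict(priors)  # v1..vn initialised from the recommendation degrees
    for _ in range(rounds):
        for point in weights:
            user_wins = rollout(point)  # True if black (the user side) wins the playout
            weights[point] += step if user_wins else -step
    target = max(weights, key=weights.get)
    return target, weights


# Deterministic toy rollout: only point "B" leads to a win for the user side.
target, final_weights = pick_target_point(
    {"A": 0.10, "B": 0.08, "C": 0.07},
    rollout=lambda point: point == "B",
    rounds=10,
)
```

In this toy run, point B overtakes point A even though A started with the higher recommendation degree, which is the behavior the update rule is meant to produce.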

Wherein the MCTS search termination condition comprises:

i) the search time exceeds 5 s; or ii) the highest win rate value is 50% higher than the next highest win rate value, where the next highest win rate value is the win rate value closest to the highest win rate value.

In particular implementations, the MCTS search ends as long as one of the end conditions is met.
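The two end conditions can be checked with a small helper. This is a sketch under one stated assumption: "50% higher" is read as a relative lead (highest >= 1.5 x next highest); an absolute lead of 0.5 would be another possible reading. The function and parameter names are hypothetical.

```python
import time
from typing import Optional, Sequence


def mcts_should_stop(start_time: float, win_rates: Sequence[float],
                     now: Optional[float] = None,
                     time_limit_s: float = 5.0, lead_ratio: float = 0.5) -> bool:
    """Return True as soon as either end condition of the MCTS search is met."""
    now = time.monotonic() if now is None else now
    # Condition i): the search time exceeds the limit.
    if now - start_time > time_limit_s:
        return True
    # Condition ii): the best win rate leads the runner-up by the given ratio
    # (assumption: relative lead, i.e. best >= runner_up * 1.5 for lead_ratio = 0.5).
    if len(win_rates) >= 2:
        best, runner_up = sorted(win_rates, reverse=True)[:2]
        if best > runner_up and best >= runner_up * (1.0 + lead_ratio):
            return True
    return False
```

Passing `now` explicitly makes the helper easy to test; in a search loop it would be called without `now` after each round of weight updates.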

In some exemplary embodiments, as shown in Fig. 6, generating the copy demonstration data according to the target drop point and displaying the copy demonstration data to the user includes:

S601, determining the chessboard layout data of the target drop point;

S602, inputting the chessboard layout data of the target drop point into a pre-trained strategy network model to obtain at least one next recommended drop point output by the strategy network model;

S603, searching for the win rate value of each next recommended drop point in a fast search mode by using a pre-trained valuation network model and a fast rollout network model;

S604, sorting the next recommended drop points by win rate value, determining the highest of the win rate values as the highest win rate value, and determining the recommended drop point corresponding to the highest win rate value as the next target drop point;

S605, re-determining the chessboard layout data based on the next target drop point;

S606, in response to the drop end condition being met, generating the copy demonstration data and displaying the copy demonstration data to the user.

Steps S601-S605 are performed in a loop, that is, the AI carries out the alternating moves by self-play; once the drop end condition is met, step S606 is entered, and the copy demonstration data is generated and displayed to the user.

The drop end condition is a preset number of steps for the copy demonstration data. For example, let the chessboard layout data be PATH0 and the target drop point be B1, so that the current game record is PATH0 + B1, and let the number of steps of the copy demonstration data be 5. The next target drop point W2 is obtained by joint analysis with PolicyNet, ValueNet and FastNet, and the current game record is updated to PATH0 + B1 + W2. These steps are repeated with black and white playing alternately, finally forming a game record of the form PATH0 + B1 + W2 + B3 + W4 + B5. B1 + W2 + B3 + W4 + B5 is the 5-step continuation for the target drop point B1, and PATH0 + B1 + W2 + B3 + W4 + B5 is the copy demonstration data for the target drop point B1. In the same way, corresponding copy demonstration data can be generated for the target drop points of the other bad hand points, so that the user can study them independently.
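The loop over steps S601-S605 and the PATH0 + B1 + W2 + ... example can be sketched as follows. The callback `next_target` is a hypothetical stand-in for the joint analysis by the three networks; here it is replaced by a toy function that only alternates colors, so the sketch shows the bookkeeping, not the move selection.

```python
from typing import Callable, List


def build_demo_path(path0: List[str], target_point: str, total_steps: int,
                    next_target: Callable[[List[str]], str]) -> List[str]:
    """Extend the game record from the target drop point until the preset step count."""
    demo = [target_point]  # step 1 is the target drop point itself (B1)
    while len(demo) < total_steps:  # drop end condition: preset number of steps
        demo.append(next_target(path0 + demo))
    return path0 + demo


path0 = ["p1", "p2"]  # placeholder for the game record before the bad hand point


def toy_next_target(path: List[str]) -> str:
    """Toy replacement for the network analysis: alternate W and B, numbered by step."""
    step = len(path) - len(path0) + 1
    return ("W" if step % 2 == 0 else "B") + str(step)


demo_record = build_demo_path(path0, "B1", 5, toy_next_target)
```

The last five entries of `demo_record` correspond to the 5-step continuation B1 + W2 + B3 + W4 + B5, and the whole list corresponds to the copy demonstration data for B1.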

Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present application further provides a copy learning device.

Referring to Fig. 7, the copy learning device includes:

an acquisition module 701, configured to acquire game data of a user after the user plays a game through human-computer interaction, and to determine a plurality of user drop points of the user in the game according to the game data;

a calculating module 702, configured to determine, for each user drop point, at least one recommended drop point corresponding to the user drop point, and to determine a recommended value of the user drop point according to the user drop point and the recommended drop point corresponding to the user drop point;

a bad hand point module 703, configured to determine, as bad hand points, the user drop points whose recommended values satisfy a predetermined condition;

a display module 704, configured to determine, for each bad hand point, a win rate value of each recommended drop point corresponding to the bad hand point, and to determine the recommended drop point with the highest win rate value as a target drop point; and to generate copy demonstration data according to the target drop point and display the copy demonstration data to the user.

In some alternative embodiments, the calculation module is specifically configured to determine the chessboard layout data before the user's landing point; inputting the chessboard layout data into a pre-trained strategy network model to obtain at least one recommended drop point output by the strategy network model;

The output of the strategy network model further comprises the recommendation degree corresponding to each recommendation drop point; the recommendation degree of the user drop point is equal to the recommendation degree of the recommended drop point which is the same as the user drop point;

the calculation process of the recommended value is as follows:

In some optional embodiments, the display module is configured to search for the win rate value of each recommended drop point in a fast search mode by using a pre-trained valuation network model and a fast rollout network model; and to sort the recommended drop points by win rate value, determine the highest of the win rate values as the highest win rate value, and determine the recommended drop point corresponding to the highest win rate value as the target drop point.

In some optional embodiments, the display module is further configured to determine the chessboard layout data of the target drop point;

Wherein, the fast searching mode is MCTS searching.

Further, the end condition of the MCTS search includes:

the search time exceeds 5 s;

or, the highest win rate value is 50% higher than the next highest win rate value.

Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the copy learning method of any of the above embodiments is implemented.

Fig. 8 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment. The electronic device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via the bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.

The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.

The input/output interface 1030 is used to connect an input/output module for inputting and outputting information. The input/output module may be configured as a component within the device (not shown) or may be external to the device to provide the corresponding functions. The input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors and the like, and the output device may include a display, a loudspeaker, a vibrator, an indicator light and the like.

The communication interface 1040 is used to connect a communication module (not shown in the drawings) to enable communication between this device and other devices. The communication module may communicate in a wired manner (such as USB or network cable) or in a wireless manner (such as a mobile network, WIFI or Bluetooth).

Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.

It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.

The electronic device of the above embodiment is used to implement the corresponding copy learning method of any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiment, which are not repeated here.

Exemplary program product

Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the copy learning method according to any of the above embodiments.

The non-transitory computer readable storage medium may be any available medium or data storage device that can be accessed by a computer, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.

The computer instructions stored in the storage medium of the above embodiment are used to cause a computer to execute the copy learning method according to any of the above exemplary method embodiments and have the beneficial effects of the corresponding method embodiments, which are not repeated here.

As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, method or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.) or a combination of hardware and software, generally referred to herein as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied in the medium.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium may include, for example: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

Use of the verbs "comprise", "include" and their conjugations in this application does not exclude the presence of elements or steps other than those stated in this application. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects, which is for convenience of presentation only, mean that the features in these aspects cannot be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.

In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the application are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that the embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.

While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.

The present embodiments are intended to embrace all such alternatives, modifications and variations which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present application are intended to be included within the scope of the present application.

Claims

1. A copy learning method, comprising:

2. The method according to claim 1, wherein the determining at least one recommended drop point corresponding to the user drop point specifically includes:

determining the chessboard layout data before the user falling point;

inputting the chessboard layout data into a pre-trained strategy network model to obtain at least one recommended drop point output by the strategy network model.

3. The method of claim 1 or 2, wherein the output of the strategy network model further comprises a recommendation degree corresponding to each of the recommended drop points;

the calculation process of the recommended value is as follows:

4. The method of claim 1, wherein, for each of the bad hand points, determining a win rate value of each of the recommended drop points corresponding to the bad hand point, and determining the recommended drop point corresponding to the highest win rate value as a target drop point comprises:

5. The method of claim 1, wherein generating the copy demonstration data according to the target drop point and displaying the copy demonstration data to the user comprises:

determining the chessboard layout data of the target falling point;

6. The method of claim 4 or 5, wherein the fast search mode is an MCTS search.

7. The method of claim 6, wherein the MCTS search termination condition comprises:

the search time exceeds 5 s;

or, the highest win rate value is 50% higher than the next highest win rate value.

8. A copy learning device, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the program.

10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.