CN114768254A - Virtual scene path finding method and device, electronic device and storage medium


Info

Publication number
CN114768254A
Authority
CN
China
Prior art keywords
virtual object
path
target position
data
prediction model
Prior art date
Legal status
Pending
Application number
CN202210412745.3A
Other languages
Chinese (zh)
Inventor
胡玥
王蒙
陈赢峰
范长杰
胡志鹏
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Publication of CN114768254A


Classifications

    • A: HUMAN NECESSITIES
      • A63: SPORTS; GAMES; AMUSEMENTS
        • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
          • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
            • A63F13/50: Controlling the output signals based on the game progress
              • A63F13/52: Controlling the output signals based on the game progress involving aspects of the displayed game scene
              • A63F13/53: Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game
                • A63F13/537: Controlling the output signals based on the game progress involving additional visual information provided to the game scene using indicators, e.g. showing the condition of a game character on screen
                  • A63F13/5372: Controlling the output signals based on the game progress using indicators for tagging characters, objects or locations in the game scene, e.g. displaying a circle under the character controlled by the player
            • A63F13/55: Controlling game characters or game objects based on the game progress
              • A63F13/56: Computing the motion of game characters with respect to other game characters, game objects or elements of the game scene, e.g. for simulating the behaviour of a group of virtual soldiers or for path finding
            • A63F13/70: Game security or game management aspects
              • A63F13/79: Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00: Pattern recognition
            • G06F18/20: Analysing
              • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Optics & Photonics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a path-finding method for a virtual scene and related devices. The method includes: acquiring the current position and the destination position of a virtual object; inputting the current position and the destination position into a path intermediate point prediction model to calculate an intermediate node position, the obtained intermediate node position serving as a target in the reinforcement learning process of the virtual object; and taking the intermediate node position as the next path-finding target position of the virtual object, and controlling the virtual object to move to the path-finding target position. The intermediate point prediction model learns to imitate the real movement routes and movement operations of users in the virtual scene, long-distance path-finding is split into multiple short-distance segments by setting intermediate nodes, and route changes during path-finding are reinforced, so that automatic path-finding in the virtual scene more closely resembles user-operated movement. This provides a more natural and anthropomorphic automatic path-finding scheme, improving path-finding efficiency and path-finding success rate while improving user experience.

Description

Virtual scene path-finding method and device, electronic device and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a path-finding method and apparatus for a virtual scene, an electronic device, and a storage medium.
Background
More and more game applications and interactive applications need to move virtual objects, controlled either by users or by artificial intelligence, in diverse ways that depend on the actual scene, for example through an automatic path-finding scheme for virtual objects in a virtual scene.
However, existing automatic path-finding schemes generally control a virtual object with a script written according to fixed rules. Such rules are difficult to write, and in some scenes the automatic path-finding movement differs noticeably from the movement of a virtual object controlled by a normal user, producing strange behavior.
Disclosure of Invention
In view of this, the present application provides a path-finding method and apparatus for a virtual scene, an electronic device and a storage medium, so as to provide a more natural and anthropomorphic automatic path-finding scheme.
Based on the above purpose, the present application provides a path-finding method for a virtual scene, which includes:
acquiring the current position and the destination position of a virtual object, wherein the virtual object is an agent obtained through reinforcement learning;
inputting the current position and the destination position into a path intermediate point prediction model for calculation to obtain an intermediate node position, and taking the obtained intermediate node position as a target in the reinforcement learning process of the virtual object;
and taking the intermediate node position as the next path-finding target position of the virtual object, and controlling the virtual object to move to the path-finding target position.
In some embodiments, the method further comprises:
determining the virtual scene where the virtual object is currently located, and acquiring at least one piece of historical movement data of the virtual object controlled by a user in the virtual scene;
generating a training set according to the historical movement data, wherein each piece of data of the training set comprises a starting point and an end point corresponding to each movement step, and a label formed by the next point after the starting point, the next point being a point on the path from the starting point to the end point;
and training the path intermediate point prediction model through the training set.
In some embodiments, the generating a training set from the historical movement data comprises:
generating a visualization path graph according to the historical movement data;
cleaning the historical movement data according to the visualization path graph;
and generating the training set according to the cleaned historical movement data.
In some embodiments, the generating a visualization path graph from the historical movement data comprises:
displaying the moving paths of different players in different styles, the moving paths forming the visualization path graph;
the cleaning the historical movement data according to the visualization path graph comprises the following steps: if two moving paths of different styles in the visualization path graph aggregate, deleting from the historical movement data the data corresponding to the aggregation area of the two moving paths; or/and,
if the displacement corresponding to a single moving path in the visualization path graph is smaller than a preset distance, deleting the single moving path from the historical movement data.
In some embodiments, the training the path intermediate point prediction model through the training set comprises:
inputting the first piece of data in the training set into the path intermediate point prediction model, and outputting the next point predicted in this round through the path intermediate point prediction model;
comparing the predicted next point with the label in the first piece of data, determining whether a preset convergence condition is met according to the comparison result, and if not, adjusting the parameters of the path intermediate point prediction model based on the comparison result;
and performing the next round of training on the parameter-adjusted path intermediate point prediction model based on the training set until the preset convergence condition is met, obtaining the trained path intermediate point prediction model.
In some embodiments, the method further comprises:
acquiring current state data of the virtual object and action data that the virtual object can execute;
and performing reinforcement learning on the virtual object based on the state data, the action data and the next node predicted by the path intermediate point prediction model, and performing the next round of reinforcement learning on the virtual object according to the reward obtained in this round, until a preset convergence condition is met and the trained virtual object is obtained.
In some embodiments, after the controlling the virtual object to move to the path-finding target position, the method further includes:
in response to the virtual object reaching the path-finding target position within a set number of steps, taking the path-finding target position as the new current position of the virtual object, re-determining the path-finding target position based on the new current position, and controlling the virtual object to move to the re-determined path-finding target position.
In some embodiments, after the controlling the virtual object to move to the path-finding target position, the method further includes:
in response to the virtual object not reaching the path-finding target position within the set number of steps, acquiring the current position of the virtual object again, re-determining the path-finding target position based on the acquired current position, and controlling the virtual object to move to the re-determined path-finding target position.
In some embodiments, the path intermediate point prediction model is a fully connected neural network model.
Based on the same concept, the present application also provides a path-finding apparatus for a virtual scene, which comprises:
an acquisition module, configured to acquire the current position and the destination position of a virtual object, the virtual object being an agent obtained through reinforcement learning;
a determining module, configured to input the current position and the destination position into a path intermediate point prediction model to calculate an intermediate node position, the obtained intermediate node position serving as a target in the reinforcement learning process of the virtual object;
and a control module, configured to take the intermediate node position as the next path-finding target position of the virtual object and control the virtual object to move to the path-finding target position.
Based on the same concept, the present application also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method as described in any one of the above is implemented.
Based on the same concept, the present application also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to implement the method of any one of the above.
As can be seen from the foregoing, the path-finding method, apparatus, electronic device and storage medium for a virtual scene provided in the present application include: acquiring the current position and the destination position of a virtual object, wherein the virtual object is an agent obtained through reinforcement learning; inputting the current position and the destination position into a path intermediate point prediction model for calculation to obtain an intermediate node position, and taking the obtained intermediate node position as a target in the reinforcement learning process of the virtual object; and taking the intermediate node position as the next path-finding target position of the virtual object, and controlling the virtual object to move to the path-finding target position. The intermediate point prediction model learns to imitate the real movement routes and movement operations of users in the virtual scene, long-distance path-finding is split into multiple short-distance segments by setting intermediate nodes, and route changes during path-finding are reinforced, so that automatic path-finding in the virtual scene more closely resembles user-operated movement. This provides a more natural and anthropomorphic automatic path-finding scheme, improving path-finding efficiency and path-finding success rate while improving user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a path-finding method for a virtual scene according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a path-finding apparatus for a virtual scene according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present specification more apparent, the present specification is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present application have the ordinary meaning understood by those of ordinary skill in the art to which the present application belongs. "First", "second" and similar terms used in the embodiments of the present application do not denote any order, quantity or importance, but only distinguish different components. "Comprising", "comprises" and similar words mean that the element or article preceding the word covers the elements, articles or method steps listed after the word and their equivalents, without excluding other elements, articles or method steps. "Connected", "coupled" and similar terms are not restricted to physical or mechanical connections, and may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right" and the like indicate only relative positional relationships, which may change accordingly when the absolute position of the described object changes.
As described in the background section, taking a game application as a concrete scenario, many current path-finding robots in games are scripted, i.e. written according to fixed rules. Such rules are difficult to write, and many of the resulting behavior patterns look abnormal and undesirable (i.e. unlike human manipulation). For example, if the current goal of the path-finding robot is the player closest to it, the robot will run toward that player even when the player is outside the area allowed by the rules, which looks strange, since a virtual object manipulated by a normal user would not actively run out of the allowed area. As another example, a game scene may contain many houses with supplies inside; a virtual object operated by a normal user leaves through the door or jumps out of a window after collecting the supplies, while the path-finding robot may get stuck in a corner of a room, repeatedly trying actions such as jumping forward in an eagerness to get out, yet remaining stuck. And when encountering a high wall with an eave, a virtual object operated by a normal user climbs over it with a grappling hook while still far from the wall, whereas the path-finding robot walks up to the wall and tries to jump straight over, only to be blocked by the eave.
In some existing improved schemes, a reinforcement learning model is used directly to train the path-finding robot: the game scene serves as the environment, the observed game scene and status serve as the state, a predicted series of actions serves as the actions, and reaching a predicted target point serves as the training goal. A robot trained this way can reach the target point in most cases and to a certain extent stays away from the boundary of the rule-defined area, but it still gets stuck. The main reason is that the path by which the robot reaches the destination is one the robot explored by itself; unlike human behavior, such paths easily lead to stuck states. For example, in a hall whose doors are on the first floor and whose second-floor windows are closed, normal players leave the room from the first floor, but the path-finding robot may try to jump to the second floor and exit through a window, and become stuck. That is, the path-finding methods of existing schemes are poorly anthropomorphic and poor at escaping stuck states, which ultimately results in low path-finding efficiency and a low path-finding success rate.
In view of the actual situation, an embodiment of the present application provides a path-finding scheme for a virtual scene. The intermediate point prediction model learns to imitate the real movement routes and movement operations of users in the virtual scene, long-distance path-finding is split into multiple short-distance segments by setting intermediate nodes, and route changes during path-finding are reinforced, so that automatic path-finding in the virtual scene more closely resembles user-operated movement. This provides a more natural and anthropomorphic automatic path-finding scheme, improving path-finding efficiency and path-finding success rate while improving user experience.
As shown in fig. 1, which is a schematic flowchart of the path-finding method for a virtual scene provided by the present application, the method specifically includes:
Step 101, acquiring the current position and the destination position of a virtual object, wherein the virtual object is an agent obtained through reinforcement learning.
In this step, the virtual object is a virtual character object manipulated by a user or controlled by an artificial intelligence in the virtual scene; the current position information of the virtual object is acquired and its destination position information is determined. The current position information and the destination position information may be coordinates in a three-dimensional coordinate system established for the virtual scene, or other position-identifying information determined according to the specific application scenario. The destination position may be specified by the user while manipulating the virtual object, or set for the virtual object by artificial intelligence according to specific rules; the way the destination position is determined may differ with the specific application scenario.
In a specific embodiment, the current position and the destination position are the starting position and the ending position of the path-finding, and the goal of the path-finding is to move the virtual object from the starting position to the ending position.
Step 102, inputting the current position and the destination position into a path intermediate point prediction model for calculation to obtain an intermediate node position, and taking the obtained intermediate node position as a target in the reinforcement learning process of the virtual object.
In this step, an intermediate node position is first determined by the trained path intermediate point prediction model; by reaching the intermediate node positions one by one, the virtual object finally reaches the destination position.
The path intermediate point prediction model may be a relatively conventional neural network model, such as a fully connected neural network model or a convolutional neural network model. Its input dimension is two game coordinates (the current position and the destination position) and its output dimension is one game coordinate (the intermediate node position); the number and size of the intermediate neural network layers can be designed for the specific application scenario. The loss function can be the L2 loss (Mean Squared Error, MSE) or the like, and the training mode is ordinary supervised learning.
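As a concrete illustration only, the following is a minimal sketch of such a midpoint predictor in PyTorch. The patent fixes only the interface (two position inputs, one position output, MSE loss); the three-dimensional coordinates, layer widths and activation choices here are assumptions.

    import torch
    import torch.nn as nn

    class MidpointPredictor(nn.Module):
        """Predicts the next intermediate node from (current, destination).

        Coordinates are assumed 3-D here; layer widths are illustrative.
        """
        def __init__(self, coord_dim: int = 3, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * coord_dim, hidden),  # current + destination
                nn.ReLU(),
                nn.Linear(hidden, hidden),
                nn.ReLU(),
                nn.Linear(hidden, coord_dim),      # predicted intermediate node
            )

        def forward(self, current: torch.Tensor, destination: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([current, destination], dim=-1))

    model = MidpointPredictor()
    loss_fn = nn.MSELoss()  # the L2 loss named in the text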
The training set of the model may be obtained from history records of users' real movement operations in the current virtual scene. The start point and end point of each path in the real records are taken as an input of the training set, i.e. the current position and the destination position; the path is then segmented at a preset length, equal to the distance from the current position to the intermediate node position. The first segmentation point is the output for the input formed by the start point and the end point, the second segmentation point is the output for the input formed by the first segmentation point and the end point, and so on. After a large number of movement-operation history records are collected, they form the training set. In a specific application scenario, the training set may also be obtained in other ways, for example by having an operator specify or create a fixed training set, or by directly setting the parameters and loss function of the intermediate point prediction model. In some embodiments, the users' real movement history records may also be optimized and filtered so that they better meet the requirements of path-finding. Finally, the collected real path data of real users is used as the training set, so that the intermediate point prediction model can plan a path from the start point to the end point. Since the dataset consists of real user data, the planned path resembles a path a real user would take.
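As an illustration of the segmentation just described, the sketch below cuts one recorded trajectory into (start point, end point) → next-point samples spaced a fixed distance apart; the array format and the exact sliding of the start point are assumptions consistent with the embodiments below.

    import numpy as np

    def make_samples(trajectory: np.ndarray, seg_len: float):
        """Cut one recorded trajectory (an N x 3 array of positions) into
        (start, end, next_point) samples; next_point lies ~seg_len past start."""
        step = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
        dist = np.concatenate([[0.0], np.cumsum(step)])  # arc length to each point
        end = trajectory[-1]
        samples, start_idx = [], 0
        for i in range(1, len(trajectory)):
            if dist[i] - dist[start_idx] >= seg_len:
                # (start, end) are the features; the segmentation point is the label
                samples.append((trajectory[start_idx], end, trajectory[i]))
                start_idx = i  # the intermediate point becomes the new start
        return samples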
In a specific embodiment, the intermediate point prediction model predicts the next coordinate point, i.e. the intermediate node position, based on the current position and the destination position of the robot, so that the virtual object can move toward the predicted coordinate point as its current target point.
Step 103, taking the intermediate node position as the next path-finding target position of the virtual object, and controlling the virtual object to move to the path-finding target position.
In this step, after the intermediate node position is determined, the virtual object takes it as the current path-finding target position, i.e. the target point to which the virtual object is to move next, which is one segment of the route planned from the current position to the destination position. In a specific embodiment, after the virtual object moves to the path-finding target position, the next path-finding target position can be calculated again according to this scheme.
Then, the virtual object is controlled to move to the path-finding target position. The behavioral actions used for the movement can be determined by a neural network model, for example one obtained by training a reinforcement learning model using the scene information of the current virtual scene (which may include road, mountain and house information in the virtual scene) and the action information available to the virtual object: in some scenes a virtual character can run, jump, vault and use specific props, while in other virtual scenes a virtual character can only walk, crouch and so on. The model outputs a series of behavioral actions, and the virtual object moves to the path-finding target position through these actions.
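The following is a minimal sketch of one control tick under this design, assuming a discrete action set and a trained policy network; the action names and the interface are purely illustrative and do not appear in the patent.

    import torch

    ACTIONS = ["walk_forward", "turn_left", "turn_right", "jump", "vault"]  # assumed set

    @torch.no_grad()
    def control_step(policy: torch.nn.Module, state: torch.Tensor) -> str:
        """One control tick: the RL policy maps the observed state
        (positions, surroundings, current waypoint) to the next action."""
        logits = policy(state)
        return ACTIONS[int(torch.argmax(logits))]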
In addition, an output operation may be performed on the determined path-finding target position, for storing, displaying, using or reprocessing it. For example, the current path-finding target position may be output and displayed so that the user can visually observe the movement target of the virtual object; or it may be output to a downstream processing unit for reprocessing, for example to generate an indication arrow that is attached to the virtual object and displayed, so that the user can visually observe the current movement direction of the virtual object. That is, the specific output mode of the path-finding target position can be flexibly selected according to different application scenarios and implementation requirements.
For example, in an application scenario where the method of this embodiment is executed on a single device, the path-finding target position may be output directly on a display component of the current device (a display, a projector, etc.), so that the operator of the current device can see it directly.
For another example, in an application scenario where the method of this embodiment is executed by a system composed of multiple devices, the path-finding target position may be sent, through any data communication mode (wired connection, NFC, Bluetooth, WiFi, cellular mobile network, etc.), to another preset device in the system acting as the receiver, i.e. a synchronization terminal, so that the synchronization terminal can perform subsequent processing on it. Optionally, the synchronization terminal may be a preset server, generally deployed in the cloud as a data processing and storage center, which can store and distribute the path-finding target position; the recipients of the distribution are terminal devices, whose holders or operators may be the current user, supervisors of game quality, supervisors of user behavior (performing cheating reviews, etc.), and so on.
For another example, in an application scenario where the method of this embodiment is executed on a system composed of multiple devices, the path-finding target position may be sent directly, through any data communication mode, to a preset terminal device, which may be any one or more of those described in the preceding paragraph.
As can be seen from the foregoing, the path-finding method for a virtual scene in the embodiment of the present application includes: acquiring the current position and the destination position of a virtual object, wherein the virtual object is an agent obtained through reinforcement learning; inputting the current position and the destination position into a path intermediate point prediction model for calculation to obtain an intermediate node position, and taking the obtained intermediate node position as a target in the reinforcement learning process of the virtual object; and taking the intermediate node position as the next path-finding target position of the virtual object, and controlling the virtual object to move to the path-finding target position. The intermediate point prediction model learns to imitate the real movement routes and movement operations of users in the virtual scene, long-distance path-finding is split into multiple short-distance segments by setting intermediate nodes, and route changes during path-finding are reinforced, so that automatic path-finding in the virtual scene more closely resembles user-operated movement. This provides a more natural and anthropomorphic automatic path-finding scheme, improving path-finding efficiency and path-finding success rate while improving user experience.
It should be noted that the method of the embodiment of the present application may be executed by a single device, such as a computer or a server. The method of the embodiment of the application can also be applied to a distributed scene and is completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the multiple devices may only perform one or more steps of the method of the embodiment, and the multiple devices interact with each other to complete the method.
It should be noted that the above-mentioned description describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In an optional exemplary embodiment, before the step of inputting the current position and the destination position into the path intermediate point prediction model to calculate an intermediate node position, the method further includes: determining the virtual scene where the virtual object is currently located; acquiring at least one piece of historical movement data of the virtual object controlled by a user in the virtual scene; and generating a training set according to the historical movement data, wherein each piece of data of the training set comprises a starting point and an end point corresponding to each movement step, and a label formed by the next point after the starting point, the next point being a point on the path from the starting point to the end point.
In this embodiment, the training set of the path intermediate point prediction model is generated by collecting the movement operation data of real users in the virtual scene and preprocessing it. Taking a game as an example, interactive and confrontational games generally provide multiple map virtual scenes; when a match starts, one map is chosen randomly or as designated by a user, and several users interact or fight in it. After the game goes online, records of many users' matches can therefore be collected for one virtual scene, from which mainly the users' coordinate information (i.e. the historical movement data) is extracted. The coordinate information reveals the users' motion trajectories, and automatic path-finding can imitate these trajectories to achieve more anthropomorphic movement. However, a user's motion trajectory during a match is generally not all path-finding: the user may interact or fight with other users, or stay stationary, move back and forth, or circle in place within a small area in some spots. Such data has no reference value for path-finding and needs to be removed, so the data requires screening, which may take the form of data cleaning. When preprocessing the historical movement data, a targeted preprocessing procedure can be executed for the specific application scenario.
In an alternative exemplary embodiment, the generating a training set from the historical movement data includes: generating a visualization path graph according to the historical movement data; cleaning the historical movement data according to the visualization path graph; and generating the training set according to the cleaned historical movement data. In another optional exemplary embodiment, the generating a visualization path graph from the historical movement data includes: displaying the moving paths of different players in different styles, the moving paths forming the visualization path graph. In another alternative exemplary embodiment, the cleaning of the historical movement data includes: if two moving paths of different styles in the visualization path graph aggregate, deleting from the historical movement data the data corresponding to the aggregation area of the two moving paths (such aggregation usually records combat or interaction rather than path-finding); or/and, if the displacement corresponding to a single moving path in the visualization path graph is smaller than a preset distance, deleting that moving path from the historical movement data.
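A minimal sketch of the two cleaning rules follows, under the assumptions that trajectories are roughly time-aligned position arrays and that the thresholds are tuned per scene; none of these values come from the patent.

    import numpy as np

    def clean_paths(paths, min_disp=5.0, agg_radius=3.0):
        """paths: dict of player_id -> (N, 3) position array.
        Returns cleaned paths; thresholds are illustrative assumptions."""
        # Rule 1: drop whole paths whose net displacement is too small.
        kept = {pid: p for pid, p in paths.items()
                if np.linalg.norm(p[-1] - p[0]) >= min_disp}

        # Rule 2: drop points where two players' paths aggregate
        # (a close approach suggests combat/interaction, not path-finding).
        ids = list(kept)
        masks = {pid: np.ones(len(kept[pid]), dtype=bool) for pid in ids}
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                pa, pb = kept[a], kept[b]
                n = min(len(pa), len(pb))  # compare roughly time-aligned points
                close = np.linalg.norm(pa[:n] - pb[:n], axis=1) < agg_radius
                masks[a][:n] &= ~close
                masks[b][:n] &= ~close
        return {pid: kept[pid][masks[pid]] for pid in ids}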
In an optional exemplary embodiment, the training the path intermediate point prediction model through the training set specifically includes: inputting a piece of data from the training set into the path intermediate point prediction model, and outputting the next point predicted in this round; comparing the predicted next point with the label in that piece of data, determining from the comparison result whether a preset convergence condition is met, and if not, adjusting the parameters of the path intermediate point prediction model based on the comparison result; and performing the next round of training on the parameter-adjusted model based on the training set until the preset convergence condition is met, obtaining the trained path intermediate point prediction model. In these embodiments, the operation duration corresponds to how long the user operated the virtual character. In some application scenarios a user may operate the virtual character only briefly, for example being defeated or "dying" and exiting the virtual scene shortly after entering it; records with such short operation times have little reference value for path-finding, and the corresponding historical movement data can be deleted. The moving distance corresponds to how far the user-operated virtual character moved in the virtual scene; in some application scenarios a user may move the virtual character only a short distance or not at all, and records with too short a moving distance likewise have little reference value and can be deleted. The movement mode corresponds to how the user moves the virtual character in the virtual scene; some users move only within a small range or circle in place after entering the scene, for example moving around inside a house or spinning on the spot. Records of such operation, which run contrary to the purpose of path-finding, also have little reference value and can be deleted. The above cases perform the deletion operation, i.e. the filtering work, on entire pieces of historical movement data. Afterwards, each remaining piece of historical movement data is cleaned so that the data reflects actual movement or path-finding.
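Continuing the hypothetical MidpointPredictor sketch above (reusing its model and loss_fn), a minimal version of the described training loop might look as follows; the choice of optimizer and the loss-based convergence test are assumptions.

    import torch

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer

    def train_epoch(batches, eps=1e-3):
        """One pass over the training set; returns True once converged."""
        for current, destination, label in batches:   # tensors of shape (B, 3)
            pred = model(current, destination)        # predicted next point
            loss = loss_fn(pred, label)               # compare with the label
            if loss.item() < eps:                     # preset convergence condition
                return True
            optimizer.zero_grad()
            loss.backward()                           # adjust model parameters
            optimizer.step()
        return False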
In the above embodiments, after entering the virtual scene the user does not only move or seek paths; more often, the user operates the virtual object to fight or interact with virtual objects manipulated by other users in the scene, and the data of such combat or interaction has little reference value for path-finding, so it can be removed from the historical movement data. Whether combat or interaction likely occurred can be judged from whether the distance between one user's historical movement data and another user's historical movement data in the same scene falls within a certain threshold range, and the corresponding movement data is then deleted from the whole. Such a judgment may mistakenly delete some non-combat data, but since the amount of data is sufficient, deleting part of the non-combat data does not noticeably affect the training set. In addition, while active in the virtual scene a user will, according to personal operating habits, stop, hide, or make unnecessary movements (such as spinning in place). Historical movement data of a set duration is therefore intercepted at random with a time window, and if the user's operation (moving distance and/or movement mode) in the intercepted data matches a preset movement rule, the corresponding movement data is deleted from the historical movement data. The preset movement rule is a predefined criterion for judging staying, hiding and the like, and can be set for the specific application scenario.
In an optional exemplary embodiment, the generating the training set includes: selecting a continuous motion trajectory from the historical movement data, and determining the starting point and end point of the trajectory; starting from the starting point, selecting an intermediate point on the trajectory at a set distance; and taking the starting point and the end point as the input of the path intermediate point prediction model and the intermediate point as its output, the inputs and outputs together forming the training set. In another alternative exemplary embodiment, the input of the path intermediate point prediction model is the feature, and the output of the path intermediate point prediction model is the label.
In this embodiment, to generate the training set accurately, the input and output of each piece of data required by the training set can be obtained after the screening of the historical movement data is finished (the screening may be the cleaning of historical movement data). The input is the starting point and end point of each motion trajectory in the historical movement data; because cleaning may split an originally complete, continuous trajectory into several sections, one piece of historical movement data can contain multiple motion trajectories. The output is the intermediate point at a set distance from the starting point on the trajectory, where the set distance matches the distance from the current position to the intermediate point that the model is ultimately expected to produce in the specific application scenario. To make full use of the data, after the first intermediate point is selected, that intermediate point can serve as a new starting point for the next input, and another intermediate point is determined as the corresponding output, thereby generating multiple training samples from one motion trajectory. That is, in an alternative exemplary embodiment, after an intermediate point is selected on the trajectory at the set distance, the method further includes: deleting the section from the starting point to the intermediate point and adding the resulting new trajectory to the historical movement data. In a specific embodiment, the starting point and end point of a single user's trajectory are recorded from the filtered historical movement data; a point at a certain distance from the starting point is taken as the next point after the starting point, then a point at a certain distance from that point is taken as its next point, and so on, yielding a series of coordinate points. The earlier point together with the end point is the input, i.e. the feature, and the later point is the output, i.e. the label, from which the training set is constructed. In an optional exemplary embodiment, the training of the path intermediate point prediction model through the training set further includes: acquiring current state data of the virtual object and action data that the virtual object can execute; and performing reinforcement learning on the virtual object based on the state data, the action data and the next node predicted by the path intermediate point prediction model, and performing the next round of reinforcement learning according to the reward obtained in this round, until a preset convergence condition is met and the trained virtual object is obtained.
In this embodiment, the current state data of the virtual object and the action data it can execute are acquired, and the actions the virtual object needs to execute are output by the action reinforcement learning model. The state data may include environment information and the current position information of the virtual object; the environment information may further include road, mountain and house information in the virtual scene. The action data covers all actions the virtual character can execute: in some scenes the character can run, jump, vault and use specific props, while in other virtual scenes it can only walk, crouch and so on. The action reinforcement learning model is obtained by suitably training a general reinforcement learning model. In a specific application scenario, this embodiment mainly trains a basic reinforcement learning robot (the robot is the agent in the reinforcement learning concept, a basic concept of the reinforcement learning model), whose construction must be designed for the specific scene task. For example, in a tactical competition sandbox game, to train a robot with a path-finding function, the map scene of the real game serves as the agent's environment, game information relevant to path-finding serves as the state (e.g. current position information, surrounding environment information, target point information), and preset actions (actions the robot is expected to perform, such as jumping, hooking and running) serve as the actions the robot can execute. Since the objective is to reach the target as reliably as possible, the reward function (a basic concept of the reinforcement learning model) can be designed around reaching the target, for example giving a reward for reaching the target within a certain number of steps and no reward otherwise. Other rewards may be added, such as a penalty function (also a basic concept of the reinforcement learning model) for entering a restricted rule area, so that the robot learns not to enter such areas. The aim of this step is to train a robot that, given a target point, can reach it as reliably as possible through a series of actions, without the action pattern looking too mechanical. That is, in some alternative exemplary embodiments, performing the calculation of the action reinforcement learning model includes: establishing the reward function and/or penalty function of the action reinforcement learning model according to the scene data, the action data and the path-finding target position, so as to calculate the action reinforcement learning model.
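As an illustration of the reward design described above, a minimal sketch; the numeric values and the step budget are assumptions, not values from the patent.

    def reward(reached: bool, steps: int, in_restricted_area: bool,
               max_steps: int = 200) -> float:
        """Illustrative reward shaping for the path-finding agent:
        reward for reaching the waypoint within the step budget,
        penalty for entering a restricted area."""
        r = 0.0
        if reached and steps <= max_steps:
            r += 1.0            # reached the target within the step budget
        if in_restricted_area:
            r -= 1.0            # penalty: learn not to enter restricted areas
        return r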
In an optional exemplary embodiment, after the controlling the virtual object to move to the path-finding target position, the method further includes: in response to the virtual object reaching the path-finding target position within a set number of steps, taking the path-finding target position as the new current position of the virtual object, re-determining the path-finding target position based on the new current position, and controlling the virtual object to move to the re-determined path-finding target position.
Or, in an optional exemplary embodiment, after the controlling the virtual object to move to the path-finding target position, the method further includes: in response to the virtual object not reaching the path-finding target position within the set number of steps, acquiring the current position of the virtual object again, re-determining the path-finding target position based on the acquired current position, and controlling the virtual object to move to the re-determined path-finding target position.
In the above two embodiments, when the virtual object moves there are two general cases: it either reaches or does not reach the path-finding target position within a set number of steps. The set number of steps may be fixed in advance or calculated from the specific path-finding length, etc. When the path-finding target position is reached, the scheme continues with the current position as the new starting point and the next path-finding target position is calculated. When the path-finding target position is not reached, the current movement is stopped in time, the next path-finding target position is recalculated, and the route is re-planned, making the movement more anthropomorphic. Eventually the virtual object reaches the final destination position.
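Putting the pieces together, here is a minimal sketch of the outer path-finding loop described in these embodiments, reusing the hypothetical control_step from earlier; the agent interface, the model.predict wrapper and the thresholds are all assumptions.

    import math

    def distance(a, b):
        return math.dist(a, b)

    def find_path(agent, model, destination, reach_tol=1.0, step_budget=200):
        """Drive the agent waypoint by waypoint until the destination is reached.
        `agent` is assumed to expose position(), observe(), act() and policy."""
        while distance(agent.position(), destination) > reach_tol:
            current = agent.position()
            # predict the next intermediate node from (current, destination)
            waypoint = model.predict(current, destination)
            for _ in range(step_budget):  # the set number of steps
                action = control_step(agent.policy, agent.observe(waypoint))
                agent.act(action)
                if distance(agent.position(), waypoint) <= reach_tol:
                    break  # waypoint reached within budget: re-plan from it
            # if the budget ran out, the loop re-acquires the current
            # position and re-determines the path-finding target position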
In an alternative exemplary embodiment, the path intermediate point prediction model is a fully connected neural network model.
In a fully connected neural network, every neuron in one layer is connected to every neuron in the next layer. Such a model is structurally simple and fast to evaluate, which suits low-dimensional inputs such as the coordinate pairs used here.
Based on the same concept, corresponding to the method of any of the above embodiments, the present application also provides a path-finding apparatus for a virtual scene.
Referring to fig. 2, the path-finding apparatus for a virtual scene includes:
an obtaining module 210, configured to obtain the current position and the destination position of a virtual object, where the virtual object is an agent obtained through reinforcement learning;
a determining module 220, configured to input the current position and the destination position into a path intermediate point prediction model to calculate an intermediate node position, the obtained intermediate node position serving as a target in the reinforcement learning process of the virtual object;
and a control module 230, configured to take the intermediate node position as the next path-finding target position of the virtual object and control the virtual object to move to the path-finding target position.
For convenience of description, the above apparatus is described as being divided into modules by function. Of course, when implementing the embodiments of the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
The apparatus of the foregoing embodiment is used to implement the corresponding path-finding method for a virtual scene in the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
In an optional exemplary embodiment, the determining module 220 is further configured to:
determining the virtual scene where the virtual object is currently located, and acquiring at least one piece of historical movement data of the virtual object controlled by a user in the virtual scene;
generating a training set according to the historical movement data, wherein each piece of data of the training set comprises a starting point and an end point corresponding to each movement step, and a label formed by the next point after the starting point, the next point being a point on the path from the starting point to the end point;
and training the path intermediate point prediction model through the training set.
In an optional exemplary embodiment, the determining module 220 is further configured to:
the generating a training set from the historical movement data comprises:
generating a visualization path graph according to the historical movement data;
cleaning the historical movement data according to the visualization path graph;
and generating the training set according to the cleaned historical movement data.
In an optional exemplary embodiment, the determining module 220 is further configured to:
displaying the moving paths of different players in different styles, the moving paths forming the visualization path graph;
if two moving paths of different styles in the visualization path graph aggregate, deleting from the historical movement data the data corresponding to the aggregation area of the two moving paths; or/and,
if the displacement corresponding to a single moving path in the visualization path graph is smaller than a preset distance, deleting the single moving path from the historical movement data.
In an optional exemplary embodiment, the determining module 220 is further configured to:
inputting the first piece of data in the training set into the path intermediate point prediction model, and outputting the next point predicted in this round through the path intermediate point prediction model;
comparing the predicted next point with the label in the first piece of data, determining whether a preset convergence condition is met according to the comparison result, and if not, adjusting the parameters of the path intermediate point prediction model based on the comparison result;
and performing the next round of training on the parameter-adjusted path intermediate point prediction model based on the training set until the preset convergence condition is met, obtaining the trained path intermediate point prediction model.
In an alternative exemplary embodiment, the determining module 220 is further configured to:
acquiring current state data of the virtual object and action data which can be executed by the virtual object;
and performing reinforcement learning on the virtual object based on the state data, the action data and the next node predicted by the path intermediate point prediction model, and performing the next round of reinforcement learning on the virtual object according to the reward obtained in this round, until a preset convergence condition is met and the trained virtual object is obtained.
In an alternative exemplary embodiment, the control module 230 is further configured to:
and in response to the virtual object reaching the path-finding target position within a set number of steps, taking the path-finding target position as the new current position of the virtual object, re-determining the path-finding target position based on the new current position, and controlling the virtual object to move to the re-determined path-finding target position.
In an alternative exemplary embodiment, the control module 230 is further configured to:
in response to the virtual object not reaching the path-finding target position within the set number of steps, acquiring the current position of the virtual object again, re-determining the path-finding target position based on the acquired current position, and controlling the virtual object to move to the re-determined path-finding target position.
In an alternative exemplary embodiment, the path intermediate point prediction model is a fully connected neural network model.
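As one non-limiting realization of such a fully connected network (the layer sizes and the four-dimensional (start, end) input encoding are illustrative assumptions, sketched with PyTorch):

    import torch.nn as nn

    class MidpointNet(nn.Module):
        """Fully connected model mapping (start, end) to the predicted next point."""
        def __init__(self, hidden=64):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(4, hidden),    # start (x, y) concatenated with end (x, y)
                nn.ReLU(),
                nn.Linear(hidden, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 2),    # predicted next point (x, y)
            )

        def forward(self, x):
            return self.layers(x)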
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the path-finding method for a virtual scene according to any of the above embodiments.
Fig. 3 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050, with the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 communicatively connected to one another within the device via the bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided by the embodiments of the present specification.
The memory 1020 may be implemented in the form of a ROM (read-only memory), a RAM (random access memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and called by the processor 1010 for execution.
The input/output interface 1030 is used for connecting an input/output module for information input and output. The input/output module may be configured within the device as a component (not shown in the figure) or may be external to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, an indicator light, and the like.
The communication interface 1040 is used for connecting a communication module (not shown in the figure) to implement communication interaction between this device and other devices. The communication module may communicate in a wired manner (e.g., USB or network cable) or in a wireless manner (e.g., mobile network, Wi-Fi, or Bluetooth).
The bus 1050 includes a path to transfer information between various components of the device, such as the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above device shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in specific implementations the device may further include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above device may also include only the components necessary to implement the embodiments of the present specification, and need not include all of the components shown in the figure.
The electronic device of the foregoing embodiment is used to implement the path-finding method for a virtual scene in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, and corresponding to any of the above embodiments, the present application further provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the path-finding method for a virtual scene according to any of the above embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to cause the computer to execute the path-finding method for a virtual scene according to any of the foregoing embodiments, and they have the beneficial effects of the corresponding method embodiments, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the present disclosure, including the claims, is limited to these examples; within the concept of the present application, features of the above embodiments or of different embodiments may be combined, steps may be implemented in any order, and many other variations of the different aspects of the embodiments of the present application exist as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided figures, for simplicity of illustration and discussion and so as not to obscure the embodiments of the present application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the embodiments, and this also takes into account the fact that specifics of the implementation of such block diagram devices are highly dependent upon the platform on which the embodiments are to be implemented (i.e., such specifics should be well within the purview of one skilled in the art). Where specific details (e.g., circuits) are set forth to describe exemplary embodiments of the present application, it should be apparent to one skilled in the art that the embodiments can be practiced without these specific details or with variations of them. Accordingly, the description is to be regarded as illustrative rather than restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The embodiments of the present application are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the embodiments of the present application are intended to be included within the scope of the claims.

Claims (12)

1. A path-finding method for a virtual scene, characterized by comprising the following steps:
acquiring the current position and the target position of a virtual object, wherein the virtual object is an agent obtained through reinforcement learning;
inputting the current position and the target position into a path intermediate point prediction model for calculation to obtain an intermediate node position, and taking the obtained intermediate node position as a goal in the reinforcement learning process of the virtual object;
and taking the intermediate node position as the next path-finding target position of the virtual object, and controlling the virtual object to move to the path-finding target position.
2. The method of claim 1, further comprising:
determining the virtual scene in which the virtual object is currently located, and acquiring at least one piece of historical movement data of a user-controlled virtual object in the virtual scene;
generating a training set according to the historical movement data, wherein each piece of data in the training set comprises a starting point and an end point corresponding to a movement step, and a label formed by the next point after the starting point, the next point being a point on the path from the starting point to the end point;
and training the path intermediate point prediction model through the training set.
3. The method of claim 2, wherein generating a training set from the historical movement data comprises:
generating a visualization path diagram according to the historical movement data;
cleaning the historical movement data according to the visualization path diagram;
and generating the training set according to the cleaned historical movement data.
4. The method of claim 3, wherein generating the visualization path diagram from the historical movement data comprises:
displaying the moving paths of different players in different styles, the displayed moving paths forming the visualization path diagram;
and wherein cleaning the historical movement data according to the visualization path diagram comprises:
if moving paths of two different styles in the visualization path diagram exhibit aggregation, deleting, from the historical movement data, the historical movement data corresponding to the aggregation area of the two moving paths; and/or
if the displacement corresponding to a single moving path in the visualization path diagram is smaller than a preset distance, deleting the single moving path from the historical movement data.
5. The method of claim 2, wherein training the path intermediate point prediction model through the training set comprises:
inputting a first piece of data in the training set into the path intermediate point prediction model, and outputting a predicted next point generated by the path intermediate point prediction model;
comparing the predicted next point with the label in the first piece of data, determining whether a preset convergence condition is met according to the comparison result, and if not, adjusting the parameters of the path intermediate point prediction model based on the comparison result;
and performing a next round of training on the parameter-adjusted path intermediate point prediction model based on the training set until the preset convergence condition is met, to obtain the trained path intermediate point prediction model.
6. The method of claim 5, further comprising:
acquiring current state data of the virtual object and action data executable by the virtual object;
and performing reinforcement learning on the virtual object based on the state data, the action data, and the next node predicted by the path intermediate point prediction model, and performing a next round of reinforcement learning on the virtual object according to the reward obtained from the reinforcement learning, until a preset convergence condition is met, to obtain the trained virtual object.
7. The method of claim 1, wherein after controlling the virtual object to move to the path-finding target position, the method further comprises:
in response to the virtual object reaching the path-finding target position within a set number of steps, taking the path-finding target position as the new current position of the virtual object, re-determining the path-finding target position on the basis of the new current position, and controlling the virtual object to move to the re-determined path-finding target position.
8. The method of claim 1, wherein after controlling the virtual object to move to the path-finding target position, the method further comprises:
in response to the virtual object not reaching the path-finding target position within the set number of steps, re-acquiring the current position of the virtual object, re-determining the path-finding target position on the basis of the re-acquired current position, and controlling the virtual object to move to the re-determined path-finding target position.
9. The method of claim 1, wherein the path intermediate point prediction model is a fully connected neural network model.
10. A path-finding device for a virtual scene, characterized by comprising:
an acquisition module, configured to acquire the current position and the target position of a virtual object, wherein the virtual object is an agent obtained through reinforcement learning;
a determining module, configured to input the current position and the target position into a path intermediate point prediction model for calculation to obtain an intermediate node position, and to take the obtained intermediate node position as a goal in the reinforcement learning process of the virtual object;
and a control module, configured to take the intermediate node position as the next path-finding target position of the virtual object and to control the virtual object to move to the path-finding target position.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 9 when executing the program.
12. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to implement the method of any one of claims 1 to 9.
CN202210412745.3A 2022-04-08 2022-04-19 Virtual scene path finding method and device, electronic device and storage medium Pending CN114768254A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210380804 2022-04-08
CN2022103808043 2022-04-08

Publications (1)

Publication Number Publication Date
CN114768254A true CN114768254A (en) 2022-07-22

Family

ID=82431281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210412745.3A Pending CN114768254A (en) 2022-04-08 2022-04-19 Virtual scene path finding method and device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114768254A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination