CN111028317B

CN111028317B - Animation generation method, device and equipment for virtual object and storage medium

Info

Publication number: CN111028317B
Application number: CN201911132565.4A
Authority: CN
Inventors: 周志强; 曾子骄
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-11-14
Filing date: 2019-11-14
Publication date: 2021-01-01
Anticipated expiration: 2039-11-14
Also published as: CN111028317A

Abstract

The embodiment of the application provides an animation generation method, device and equipment of a virtual object and a storage medium, and relates to the field of animation production. The method comprises the following steps: acquiring a reference action of a reference virtual object corresponding to a target virtual object; generating an action sequence of the target virtual object according to the reference action and the environment information corresponding to the target virtual object; the environment information is used for representing the virtual environment where the target virtual object is located; and generating the motion animation of the target virtual object according to the motion sequence of the target virtual object. According to the technical scheme provided by the embodiment of the application, the reference action of the reference virtual object is redirected to the target virtual object, the technical scheme for automatically generating the action animation of the target virtual object is provided, and the animation generation efficiency is improved.

Description

Animation generation method, device and equipment for virtual object and storage medium

Technical Field

The embodiment of the application relates to the field of animation production, in particular to an animation generation method, device, equipment and storage medium for a virtual object.

Background

In game development, three-dimensional virtual objects are often involved, and sometimes three-dimensional animation is required for the virtual objects.

In the related art, a virtual object has bones with joints between adjacent bones, and the animation of the virtual object is actually an animation of the bones. In the process of generating an animation of a virtual object, an animator needs to manually drag a skeleton of the virtual object and slightly correct the motion of the skeleton, thereby generating the animation of the virtual object.

In the above technology, the skeleton of the virtual object needs to be manually dragged, and the motion of the skeleton is modified little by little to generate the animation, so the process is complicated and time-consuming.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment and a storage medium for generating an animation of a virtual object, which can be used for solving the technical problem that the generation of the animation of the virtual object is tedious and time-consuming in the related technology.

The technical scheme is as follows:

in one aspect, an embodiment of the present application provides an animation generation method for a virtual object, where the method includes:

acquiring a reference action of a reference virtual object corresponding to a target virtual object;

generating an action sequence of the target virtual object according to the reference action and the environment information corresponding to the target virtual object; wherein the environment information is used for representing the virtual environment in which the target virtual object is located;

and generating the motion animation of the target virtual object according to the motion sequence of the target virtual object.

In another aspect, an embodiment of the present application provides an animation display method for a virtual object, where the method includes:

displaying a user interface;

displaying a virtual scene and a target virtual object located in the virtual scene in the user interface;

and displaying the action animation of the target virtual object, wherein the action animation of the target virtual object is generated according to the reference action of the reference virtual object and the environment information corresponding to the target virtual object, and the environment information is used for representing the virtual environment where the target virtual object is located.

In another aspect, an embodiment of the present application provides an animation generation apparatus for a virtual object, where the apparatus includes:

the action acquisition module is used for acquiring a reference action of a reference virtual object corresponding to the target virtual object;

the action generation module is used for generating an action sequence of the target virtual object according to the reference action and the environment information corresponding to the target virtual object; wherein the environment information is used for representing the virtual environment in which the target virtual object is located;

and the animation generation module is used for generating the action animation of the target virtual object according to the action sequence of the target virtual object.

In another aspect, an embodiment of the present application provides an apparatus for displaying an animation of a virtual object, where the apparatus includes:

the interface display module is used for displaying a user interface;

the scene display module is used for displaying a virtual scene and a target virtual object in the virtual scene in the user interface;

and the animation display module is used for displaying the action animation of the target virtual object, and the action animation of the target virtual object is generated according to the reference action of the reference virtual object and the environment information corresponding to the target virtual object, wherein the environment information is used for representing the virtual environment where the target virtual object is located.

In a further aspect, the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the animation generation method for a virtual object described above.

In a further aspect, an embodiment of the present application provides a terminal, where the terminal includes a processor and a memory, where the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the animation display method for the virtual object.

In a further aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the storage medium, and the computer program is loaded and executed by a processor to implement the animation generation method for a virtual object described above.

In a further aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the storage medium, and the computer program is loaded and executed by a processor to implement the animation display method for a virtual object described above.

In a further aspect, the present application provides a computer program product, which is used to implement the animation generation method for the virtual object when the computer program product is executed by a processor of a computer device.

In a further aspect, the present application provides a computer program product, which is used to implement the animation display method for the virtual object when the computer program product is executed by a processor of a terminal.

The technical scheme provided by the embodiment of the application can have the following beneficial effects:

the method comprises the steps of obtaining a reference action of a reference virtual object, generating an action sequence of the target virtual object according to the reference action and the environment information of the target virtual object, and generating an action animation of the target virtual object according to the action sequence of the target virtual object, so that the reference action of the reference virtual object is redirected to the target virtual object, the technical scheme for automatically generating the action animation of the target virtual object is provided, and the animation generation efficiency is improved.

In addition, the action sequence is generated according to the environment information, so that the target virtual object can interact with the virtual environment in the generated animation, and the reality of the animation is enhanced.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 illustrates a schematic diagram of a reinforcement learning model;

FIG. 2 is a diagram illustrating an application scenario of a method for animation display of a virtual object according to an embodiment of the present application;

FIG. 3 is a diagram illustrating an application scenario of a method for animation display of a virtual object according to another embodiment of the present application;

FIG. 4 is a flow diagram illustrating a method for animation generation of a virtual object according to an embodiment of the present application;

FIG. 5 is a flow chart illustrating a method for animation generation of a virtual object according to another embodiment of the present application;

FIG. 6 illustrates a schematic structural diagram of a policy network provided by an embodiment of the present application;

FIG. 7 is a schematic diagram illustrating the architecture of a judgment network provided by one embodiment of the present application;

FIG. 8 is a diagram illustrating a method for animation generation of a virtual object according to an embodiment of the present application;

FIG. 9 is a block diagram of an animation generation apparatus for a virtual object according to an embodiment of the present application;

FIG. 10 is a block diagram of an animation generation apparatus for a virtual object according to another embodiment of the present application;

FIG. 11 is a block diagram illustrating a computer device provided by an embodiment of the present application;

fig. 12 shows a block diagram of a terminal according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods consistent with aspects of the present application, as detailed in the appended claims.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

Machine Learning (ML) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.

With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.

The scheme provided by the embodiment of the application relates to the machine learning technology of artificial intelligence, and is specifically explained by the following embodiment.

First, terms related to embodiments of the present application will be briefly described.

1. Reinforced learning

Reinforcement learning, also known as refit learning, evaluation learning or reinforcement learning, is one of the paradigms and methodologies of machine learning, and is used to describe and solve the problem that an agent achieves maximum return or achieves a specific goal through learning strategies in the process of interacting with the environment.

Referring to FIG. 1, a diagram of a reinforcement learning model is illustrated. As shown in FIG. 1, in the reinforcement learning model 100, the agent 110 generates an action to interact with the environment 120, calculates the reward of the interaction and the state of the agent 110 after the interaction is completed, inputs the reward and the state into the agent 110, and the agent 110 generates an action according to the state and interacts with the environment 120. Looping through the above described manner results in multiple states of the agent 110 and corresponding rewards, and the agent 110 tends to generate actions with higher rewards, so that the agent 110 can learn a better action strategy through the reinforcement learning model 100.

A policy network is a network that receives a specific input and gives a certain output by learning.

The evaluation network, also called an evaluator or a numerical network, is used to calculate an expected value of the cumulative score of the current state, and the calculated expected value is input into the policy network.

The fully connected layer is a neural network layer in which each node is connected with all nodes of the previous layer, and is used for integrating the extracted features of the previous layer.

2. Skeleton animation

Skeletal animation is one of model animation, and there are currently two ways of model animation: vertex animation and skeleton animation. In skeletal animation, a model has a skeletal structure of interconnected "bones," and animation is generated for the model by changing the orientation and position of the bones.

3. Physical engine

The physical engine is an engine for simulating physical laws by a computer program, and can be used for predicting the effects under different conditions by using variables such as mass, speed, friction force, air resistance and the like. It is mainly used in computing physics, electronic games and computer animation.

4. Quaternion

Quaternions are simple supercomplexes. Complex numbers consist of real numbers plus an imaginary unit i, where i is-1. Similarly, quaternions are all made up of real numbers plus three imaginary units i, j, k, and they have the following relationship: i.e. i²＝j²＝k²＝-1，i⁰＝j⁰＝k⁰Each quaternion is a linear combination of 1, i, j and k, i.e., a quaternion can be generally expressed as a + bi + cj + dk, where a, b, c, d are real numbers. The geometric meaning of i, j, k itself is understood to be aAnd rotation, wherein i rotation represents the rotation of the X axis in the X axis and Y axis intersection plane in the positive direction of the X axis to the positive direction of the Y axis, j rotation represents the rotation of the Z axis in the positive direction of the X axis in the Z axis and X axis intersection plane, k rotation represents the rotation of the Y axis in the positive direction of the Z axis in the Y axis and Z axis intersection plane, and-i, -j and-k represent the reverse rotation of i, j and k rotation respectively.

5. Kinematics

Kinematics is a branch of mechanics, and describes the motion of an object specifically, that is, the position of the object in space changes with time, and does not consider the factors of acting force or quality and the like influencing the motion at all.

6. Dynamics of

Dynamics is a branch of theoretical mechanics, which mainly studies the relationship between forces acting on an object and the motion of the object.

7. AR (Augmented Reality)

The AR technology is a technology for skillfully fusing virtual information and a real world, and a plurality of technical means such as multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction, sensing and the like are widely applied, and virtual information such as characters, images, three-dimensional models, music, videos and the like generated by a computer is applied to the real world after analog simulation, and the two kinds of information complement each other, so that the 'enhancement' of the real world is realized.

8. MR (Mixed Reality )

MR refers to the creation of new environments and visualizations in conjunction with real and virtual worlds, where physical entities and digital objects coexist and interact in real time to simulate real objects. MR technology mixes reality, augmented virtual and virtual reality technologies.

9. PD control

PD (proportional and Derivative) control is one of the simplest control methods. The input signal controlled by P (Proportion) proportionally reflects the output signal, and is used for adjusting the open loop gain of the system, improving the steady-state precision of the system, reducing the inertia of the system and accelerating the response speed. The D (Derivative) control is used to reflect the variation trend of the input signal and generate an effective early correction signal to increase the damping degree of the system, thereby improving the stability of the system.

In the method provided by the embodiment of the present application, the execution subject of each step may be a Computer device, where the Computer device refers to an electronic device with data calculation, processing, and storage capabilities, and the Computer device may be a terminal such as a PC (Personal Computer), a mobile phone, an intelligent robot, a tablet Computer, a multimedia player, or a server.

Referring to fig. 2, a schematic diagram of an application scenario of the animation display method for a virtual object according to an embodiment of the present application is shown. As shown in fig. 2, in a game interface of a certain LBS (Location Based Service) game, video information of a real environment is collected, and by applying the animation generation method for a virtual object provided in the embodiment of the present application, an animation of the virtual object 210 is generated in combination with a desktop condition of the desk 220, so as to display an AR animation of the virtual object 210 on the desk 220.

Referring to fig. 3, a schematic diagram of an application scenario of a method for displaying an animation of a virtual object according to another embodiment of the present application is shown. As shown in fig. 3, when a video is relayed, a video is captured based on a real scene 310, and then an animation of a virtual object 320 generated by the method provided by the embodiment of the present application is added to the video captured based on the real scene 310. Thus, in the relayed video, an animation of the virtual object 320 may be displayed in the real scene 310.

In some possible implementation manners, the technical scheme provided by the application can also be applied to training of an intelligent robot in reality.

In some possible implementations, the technical solutions provided in the present application can also be applied to MR technology.

The technical solution of the present application will be described below by means of several embodiments.

Referring to fig. 4, a flowchart of an animation generation method for a virtual object according to an embodiment of the present application is shown. In the present embodiment, the method is mainly exemplified by being applied to the computer device described above. The method may include the steps of:

step 401, obtaining a reference action of a reference virtual object corresponding to the target virtual object.

The virtual object can be a character in a game, wherein the game can run in electronic equipment such as a mobile phone, a tablet computer, a game host, an electronic reader, a multimedia player, wearable equipment and a PC; the virtual object may also be a character in a video such as a cartoon, movie, or television show. The target virtual object is a virtual object to be animated.

The virtual object may be in the form of a character, an animal, a cartoon, or other forms, which are not limited in this application. The virtual object may be displayed in a three-dimensional form or a two-dimensional form, which is not limited in the embodiment of the present application.

In the embodiment of the present application, the virtual object has a skeleton, which is a structure composed of interconnected bones. The motion of the virtual object may be generated by changing the position and posture of the skeleton, and the animation of the virtual object may be generated by a series of motions of the virtual object being consecutive.

The reference virtual object refers to a virtual object that has been animated, and the motion of the reference virtual object is referred to as a reference motion, which may be represented by kinematics. Optionally, the target virtual object and the reference virtual object are two virtual objects with the same or similar skeletons.

The skeletons are the same, and the sizes, the shapes and the connection relations among all skeletons in the skeletons of the target virtual object and the reference virtual object can be expressed to be completely the same; the skeleton is similar, including but not limited to at least one of the following: the bones in the skeletons of the target virtual object and the reference virtual object are similar in size, shape and connection relation.

Step 402, generating an action sequence of the target virtual object according to the reference action and the environment information corresponding to the target virtual object.

The environment information is used for representing the virtual environment where the target virtual object is located. The virtual environment may be a physical environment simulated by a physics engine in which the virtual objects follow dynamics such that the motion of the virtual objects approximates real-world conditions.

The virtual environment may be a scene displayed (or provided) by the physical engine when running in the terminal, and the virtual environment refers to a scene created for the virtual object to perform an activity (such as a game competition). The virtual environment may be a simulation environment of a real world, a semi-simulation semi-fictional environment, or a pure fictional environment. The virtual environment may be a two-dimensional virtual environment, a 2.5-dimensional virtual environment, or a three-dimensional virtual environment, which is not limited in this embodiment of the present application.

Optionally, when the virtual environment is a three-dimensional virtual environment, the virtual object is a three-dimensional stereo model created based on skeletal animation techniques. Each virtual object has its own shape and volume in the three-dimensional virtual environment, occupying a portion of the space in the three-dimensional virtual environment.

In some possible embodiments, the action information of the target virtual object is generated according to the state information of the target virtual object at the first time stamp; then, driving the skeleton of the target virtual object according to the motion information to make a motion; according to the state information of the virtual object at the first time stamp, the action completion condition and the environment information, the state information of the target virtual object at the second time stamp can be obtained through calculation; regenerating the next action information; wherein the action information is generated from the state information and the reference action. The change in state of the target virtual object relative to the previous timestamp represents an action of the target virtual object. By performing the above-described cycle for a plurality of times, a series of continuous motions of the target virtual object can be obtained, thereby generating a motion sequence of the target virtual object.

The state information is used for representing the physical state of the target virtual object, and the action information is used for representing the action executed by the target virtual object.

Step 403, generating a motion animation of the target virtual object according to the motion sequence of the target virtual object.

Optionally, the motion sequence of the target virtual object is used for representing the continuous motion of the target virtual object. The motion animation of the target virtual object can be obtained by connecting the motions of the target virtual object.

Alternatively, if the target virtual object is a three-dimensional virtual object, the generated animation is a three-dimensional animation.

In some possible embodiments, step 403 may include the following sub-steps:

1. obtaining an animation frame of the target virtual object according to the action sequence of the target virtual object;

2. and generating motion animation of the target virtual object based on each animation frame.

In some possible embodiments, the animation is composed of taking and playing pictures frame by frame, each frame is a static picture, and a series of continuous animation frames can be generated according to a series of continuous action sequences, so as to generate the action animation of the target virtual object.

In summary, in the technical solution provided in the embodiment of the present application, by obtaining the reference motion of the reference virtual object, generating the motion sequence of the target virtual object according to the reference motion and the environment information of the target virtual object, and then generating the motion animation of the target virtual object according to the motion sequence of the target virtual object, the reference motion of the reference virtual object is redirected to the target virtual object, a technical solution for automatically generating the motion animation of the target virtual object is provided, and the efficiency of generating the animation is improved.

Referring to fig. 5, a flowchart of an animation generation method for a virtual object according to another embodiment of the present application is shown. In the present embodiment, the method is mainly exemplified by being applied to the computer device described above. The method may include the steps of:

step 501, obtaining a reference action of a reference virtual object corresponding to a target virtual object.

This step is the same as or similar to the step 401 in the embodiment of fig. 4, and is not described here again.

Step 502, training a reinforcement learning model by using a reference action.

Optionally, the reinforcement learning model includes a policy network.

The strategy network in the reinforcement learning model can generate action information of a target virtual object, the reference action is an action in the manufactured animation, the reinforcement learning model is trained by comparing the action of the target virtual object with the reference action, the target virtual object learns the reference action, and the reinforcement learning model tends to output the action information which is more similar to the reference action.

Please refer to fig. 6, which illustrates a schematic structural diagram of a policy network according to an embodiment of the present application. As shown in fig. 6, the input of the policy network is the state information of the target virtual object, the output is the action information of the target virtual object, and a hidden layer is further included between the input layer and the output layer of the policy network.

Optionally, the hidden layer may include a layer of neural network, or may include multiple layers of neural networks, which are specifically set by a relevant technician according to an actual situation, and this is not limited in the embodiment of the present application.

Alternatively, the neural network layer in the hidden layer may be a fully connected layer.

Illustratively, in connection with FIG. 6, the policy network includes an input layer, a hidden layer, and an output layer. Two layers of neural networks may be included in the hidden layer, wherein a first neural network layer may include 1024 neurons and a second neural network layer may include 512 neurons.

Alternatively, when the neural network includes two or more neural network layers, the activation function between the neural network layers is a ReLU (Rectified Linear Unit) function. The output of a neuron using the ReLU function can be expressed by the following equation one:

the formula I is as follows:

f(x)＝max(0，W^Tx+b)

wherein x represents the vector of the input neuron of the upper neural network layer, W^TRepresenting a constant transformation matrix, b representing a constant vector, f (x) representing the output of the neuron.

In some possible embodiments, the reinforcement learning model further includes a judgment network, where the judgment network is used to generate a return value corresponding to the state information of the target virtual object, and the judgment network is obtained through training. Please refer to fig. 7, which illustrates a schematic structural diagram of a judgment network according to an embodiment of the present application. As shown in fig. 7, the input of the evaluation network is the state information of the target virtual object, and the return value corresponding to the state information of the target virtual object is output.

The network structure of the evaluation network may be the same as or different from the policy network, and this is not limited in this application embodiment.

In some possible embodiments, step 502 may include the following sub-steps:

1. calculating a corresponding return value according to the state information of the target virtual object at the first time stamp through a judging network;

2. adjusting parameters of the policy network according to the return value;

3. generating action information of the target virtual object at a second time stamp according to the state information of the target virtual object at the first time stamp through a policy network, wherein the second time stamp is behind the first time stamp;

4. controlling interaction between the target virtual object and the virtual environment according to the action information of the target virtual object at the second time stamp, and determining the state information of the target virtual object at the second time stamp;

5. calculating an award value according to the state information of the target virtual object at the second time stamp and the state information of the reference virtual object at the second time stamp;

6. and adjusting the parameters of the judging network according to the reward value.

Wherein, the return value represents the total quality degree of the action information generated by the network; the bonus value represents how similar the state information of the target virtual object is to the reference virtual object. The reward value for the first time stamp may be the sum of all reward values from the first time stamp until the end of the animation.

In some possible embodiments, the trained judgment network is obtained by inputting the state of the target virtual object and the corresponding return value into the judgment network, and continuously adjusting the parameters of the judgment network according to the state information of the target virtual object and the corresponding return value. The trained evaluation network can calculate an expected value of a corresponding return value according to the state information of the target virtual object, and the expected value can be used for adjusting parameters of the strategy network.

In some possible embodiments, the policy network compares a return value corresponding to the state information of the target virtual object with an expected value, and if the return value is greater than the expected value, it indicates that the generated action information is overall better; if the return value is smaller than the expected value, the generated action information is poor in total; if the reported value is equal to the expected value, it indicates that the generated action information is general. According to the magnitude relation between the return value and the expected value, the parameters of the policy network can be adjusted, so that the policy network tends to generate action information with the return value larger than the expected value.

In some possible embodiments, the reward value is subtracted from its corresponding expected value to yield a relative reward value. Obviously, if the return value is greater than the expected value, the relative return value is a positive number; if the return value is smaller than the expected value, the relative expected value is a negative number; if the return value is equal to the expected value, the relative expected value is 0; the relative return value of the action information that has not been generated can be set to 0. And adjusting parameters of the policy network according to the size of the relative return value, so that the policy network tends to generate action information with larger relative return value.

Optionally, the reward value is calculated once for each time stamp, or may be calculated once for a plurality of time stamps, and the calculation is specifically set by a related technician according to an actual situation, which is not limited in the embodiment of the present application.

In step 503, the motion information of the target virtual object is generated by the reinforcement learning model.

The trained reinforcement learning model can output corresponding action information according to the state information of the target virtual object.

In some possible embodiments, the action information of the target virtual object at the (i + 1) th timestamp is generated by the policy network according to the state information of the target virtual object at the (i) th timestamp, where i is a positive integer.

Optionally, the state information comprises a phase parameter, an attitude parameter and a velocity parameter. The phase parameter is used for representing the action progress of the target virtual object, the posture parameter is used for representing the posture form of the target virtual object, and the speed parameter is used for representing the speed state of the target virtual object.

In some possible embodiments, the phase parameter of the target virtual object may be represented by a phase of the reference motion. For example, if the reference motion is a total duration of 2s (second), and the current reference motion has been played for 0.2s, then the phase parameter of the target virtual object is (0.2s)/(2s) 0.1. Wherein the phase parameter of the target virtual object can also be expressed in percentage.

In some possible embodiments, the pose parameters may be represented by rotation information in a world coordinate system for each joint of the target virtual object's skeleton.

Wherein the rotation information may be represented by a quaternion. If the target virtual object includes N joints, the input dimension of the pose parameter may be (Nx 4), where N is an integer greater than or equal to 1.

Optionally, the world coordinate system is a three-dimensional coordinate system, the world coordinate system comprising mutually perpendicular X, Y and Z axes, the world coordinate system no longer changing with respect to the virtual environment since the start of the determination.

In some possible embodiments, the velocity parameters may be represented by linear velocity information and angular velocity information of the respective joints of the target virtual object in a world coordinate system. For each joint, the linear velocity has a dimension of 3 and the angular velocity has a dimension of 3, which may be 3+ 3-6. If the target virtual object includes N joints, then the velocity dimension of the target virtual object may be (N × 6).

Optionally, the action information may include: velocity of each joint of the target virtual object; alternatively, the position of each joint of the target virtual object.

In some possible embodiments, the velocity of each joint of the target virtual object may be represented by its angular velocity in the local coordinate system. Since the dimension of the angular velocity of each joint is 3, if the target virtual object includes N joints, the dimension of the velocity in the motion information is (N × 3).

Alternatively, the local coordinate system of the mth joint may be a coordinate system with the geometric center or centroid of the mth joint as the origin; the angular velocity of the mth joint may be a point on a rigid body connected to the mth joint, the angular velocity in the local coordinate system of the mth joint, and m is an integer greater than or equal to 1. The rigid body connected to the mth joint may be a bone connected to the mth joint.

In some possible embodiments, the motion information is represented by the position of each joint of the target virtual object. The motion information of the target virtual object may be represented by a change in the position of each joint of the target virtual object in the world coordinate system.

Step 504, generating an action sequence of the target virtual object according to the action information and the environment information.

Optionally, the sequence of actions of the target virtual object is composed of a series of consecutive actions of the target virtual object in the virtual environment. In the virtual environment, the target virtual object is driven to move according to the movement information, and the state of the target virtual object after interaction with the virtual environment can be simulated through physical engine calculation according to the state information, the movement and the environment information of the target virtual object. And circulating the steps, and interacting the target virtual object with the virtual environment to obtain a plurality of continuous actions.

In some possible embodiments, step 504 may include the following sub-steps:

1. calculating the moment corresponding to each joint of the target virtual object according to the state information of the target virtual object at the ith time stamp and the action information of the target virtual object at the (i + 1) th time stamp;

2. controlling the target virtual object to move in a virtual environment corresponding to the environment information according to the moment corresponding to each joint of the target virtual object, and generating the action of the target virtual object from the ith time stamp to the (i + 1) th time stamp;

3. and obtaining the action sequence of the target virtual object according to the action of the target virtual object among the time stamps.

The controller for calculating the moment corresponding to each joint of the target virtual object may be a PD controller, a PID (process-Integral-Derivative) controller, or a PI (process-Integral) controller, which is specifically set by a related technician according to an actual situation, and the embodiment of the present application does not limit this.

Optionally, through multiple iterations, the action of the (i + 1) th timestamp is made to approach the action information of the target virtual object at the (i + 1) th timestamp obtained through calculation.

In some possible embodiments, calculating the moments corresponding to the respective joints of the target virtual object may include the sub-steps of:

1.1, determining the speed of each joint of the target virtual object at the ith time stamp according to the state information of the target virtual object at the ith time stamp;

1.2, determining the target speed of each joint of the target virtual object at the (i + 1) th timestamp according to the action information of the target virtual object at the (i + 1) th timestamp;

1.3 calculating the difference between the target speed of each joint at the (i + 1) th time stamp and the speed at the (i) th time stamp;

and 1.4 calculating the corresponding moment of each joint according to the difference.

Optionally, after determining the velocity of each joint of the target virtual object at the ith timestamp, the controller performs multiple iterations, so that the velocity of each joint of the target virtual object approaches the target velocity.

Taking the target speed as an angular speed as an example, if the target speed of the jth joint is determined to be 10rad (radians)/s and the speed of the jth joint at the ith time stamp is 5rad/s, subtracting the speed of the jth joint at the ith time stamp from the target speed to obtain a first speed difference of 5 rad/s; the controller calculates to obtain a first torque according to the first speed difference; applying a first torque to the j-th joint, the j-th joint velocity becoming 6 rad/s; calculating the second speed difference to be 4 rad/s; the controller calculates to obtain a second moment according to the second speed difference; applying a second moment to the j-th joint, the velocity of the j-th joint becoming 6.5 rad/s; then calculating a third speed difference; and by analogy, carrying out multiple iterative calculations, so that the j-th joint speed approaches to the target speed.

In some possible embodiments, the way to bring the jth joint velocity closer to the target velocity may be: and setting the iterative computation times of the controller between two adjacent time stamps to be preset values. The preset value may be 10, 15, or 20, and is specifically set by a related technician according to an actual situation, which is not limited in the embodiment of the present application.

In some possible embodiments, the approach of the jth joint velocity to the target velocity may also be: the controller continuously performs iterative computation between two adjacent timestamps until the next timestamp stops; alternatively, the controller stops the iterative calculation when the absolute value of the difference between the jth joint velocity and the target velocity is less than or equal to the velocity threshold. The speed threshold may be 0.1rad/s, may be 0.15rad/s, and may be 0.2rad/s, and the specific value is set by a relevant technician according to an actual situation, which is not limited in the embodiment of the present application.

For example, when it is the PD controller that calculates the moment, the controller may calculate the moment corresponding to each joint according to the following formula two:

the formula II is as follows:

wherein the content of the first and second substances,

representing the difference between the velocity of the jth joint of the target virtual object and the target velocity at time t; k_pIs a proportionality coefficient; k_dIs a differential coefficient;

and the moment of the j-th joint of the target virtual object in the local coordinate system at the time t is shown.

Step 505, generating the motion animation of the target virtual object according to the motion sequence of the target virtual object.

This step is the same as or similar to the step 403 in the embodiment of fig. 4, and is not described here again.

In summary, in the technical solution provided in the embodiment of the present application, the reinforcement learning model is trained according to the return value, so that the reinforcement learning model tends to output the action information with a larger return value, and the action of the generated target virtual object is similar to the reference action as much as possible, thereby achieving the purpose of better redirecting the animation of the reference virtual object to the target virtual object.

In the embodiment of the application, in the process of repeatedly training the reinforcement learning model, the state of the target virtual object after interaction with the virtual environment is input into the reinforcement learning model, so that a more intelligent reinforcement learning model is obtained. Furthermore, the action information output by the trained reinforcement learning model can make the target virtual object automatically adapt to the change of the physical environment.

By way of example, when the animation of the reference virtual object is a continuous motion of walking on a flat ground, by applying the technical solution provided by the embodiment of the present application, the animation of the reference virtual object may be automatically migrated to the target virtual object, and if there is a small protrusion on the ground in the direction in which the target virtual object travels, since the trained reinforcement learning model may make the motion of the target virtual object approach the reference motion, after encountering the small protrusion, the target virtual object may automatically adjust the motion pose of each joint, and after walking around the curve, may continue to advance without stumbling, thereby improving the ability of the target virtual object to adapt to the environment.

In an exemplary embodiment, the reward value includes at least one of:

the reward value of the posture similarity is used for representing the similarity of the postures of the target virtual object and the reference virtual object;

the reward value of the speed similarity is used for representing the similarity degree of the speeds of the target virtual object and the reference virtual object;

the reward value of the skeleton tail end position similarity is used for representing the similarity of the skeleton tail end positions of the target virtual object and the reference virtual object;

and the reward value of the gravity center position similarity is used for representing the similarity degree of the gravity center positions of the target virtual object and the reference virtual object.

Wherein the reward value is used to characterize how similar the action of the target virtual object and the reference action are at the same phase.

Wherein, the end of the skeleton can be one end which is connected with only one joint and is not connected with any joint.

Optionally, the actions of the target virtual object and the reference virtual object at the same phase are compared and calculated in the reward function, so as to obtain the reward value of the target virtual object.

In some possible embodiments, the reward function may include the following formulas three through six:

the formula III is as follows:

wherein the content of the first and second substances,

representing the rotation information of the jth joint of the reference virtual object in the local coordinate system at the time t, and representing the rotation information by quaternion;

indicating the rotation information of the jth joint of the target virtual object in the local coordinate system at the time of tExpressed in quaternion;

representing the total pose similarity reward at time t.

The formula four is as follows:

wherein the content of the first and second substances,

representing the angular velocity information of the jth joint of the reference virtual object in the local coordinate system at the time t, and representing the angular velocity information by quaternion;

representing the angular velocity information of the jth joint of the target virtual object in the local coordinate system at the time t by using quaternion;

representing the total speed similarity reward at time t.

The formula five is as follows:

wherein the content of the first and second substances,

representing the position of the e-th end joint of the reference virtual object in the world coordinate system at the time t and representing the position by a three-dimensional vector;

representing the position of the e-th end joint of the target virtual object in the world coordinate system at the time t, and representing the position by using a three-dimensional vector;

representing the total end position similarity reward at time t.

Formula six:

wherein the content of the first and second substances,

representing the position of the gravity center of the reference virtual object in the world coordinate system at the time t by using a three-dimensional vector;

representing the position of the gravity center of the target virtual object in the world coordinate system at the time t, and representing the position by using a three-dimensional vector;

representing the total barycentric location similarity reward at time t.

In some possible embodiments, when multiple prize values are included in the prize value, the multiple prize values may be weighted and summed to calculate the prize value.

Illustratively, the reward value includes a reward value of posture similarity, a reward value of speed similarity, a reward value of skeleton end position similarity and a reward value of gravity center position similarity, and the reward value can be calculated by referring to the following formula seven:

the formula seven:

wherein k is_pWeight coefficient, k, representing attitude similarity reward_vWeight coefficient, k, representing velocity similarity reward_eWeight coefficient, k, representing the tip location similarity reward_cAnd a weight coefficient representing the barycentric location similarity reward. Alternatively, k_p+k_v+k_e+k_c＝1。

In some possible embodiments, when the bonus value includes only one of the bonus value of the gesture similarity, the bonus value of the velocity similarity, the bonus value of the skeleton end position similarity, and the bonus value of the gravity center position similarity, the bonus value may be represented by only one of the bonus values described above.

In the implementation manner, the reward value comprises a reward value of posture similarity, a reward value of speed similarity, a reward value of skeleton tail end position similarity and a reward value of gravity center position similarity, so that the condition that the reward value is too single in composition is avoided, the calculation result of the reward value is more referential, and the quality of the generated animation is further improved.

Referring to fig. 8, a schematic diagram of an animation generation method for a virtual object according to an embodiment of the present application is shown. The method may be applied to the computer device described above. As shown in fig. 8, the method may include the steps of:

step 801, the policy network outputs action information according to the input state of the target virtual object.

In step 802, the controller calculates the moments acting on the joints of the target virtual object according to the physical state and the motion information of the target virtual object.

And 803, changing the physical state of each joint of the target virtual object under the action of moment, and interacting with the virtual environment simulated in the physical engine to obtain the state of the next frame of the target virtual object.

At step 804, the physics engine outputs the reward value and the current state of the target virtual object.

Wherein the reward value is calculated by the physical engine according to the action of the target virtual object and the reference action; the current state of the target virtual object is calculated by the physical engine according to the physical state of the last step of the target virtual object, the moment received by each joint and the virtual environment.

Step 805, the evaluation network calculates a corresponding return value according to the state of the target virtual object, and inputs the return value into the policy network.

The steps 801 to 805 are repeated to obtain a plurality of animation frames of the target virtual object, and the animation of the target virtual object can be generated by connecting the plurality of animation frames of the target virtual object.

The embodiment of the application also provides an animation display method of the virtual object. The method may be applied in the computer device described above, and may comprise the steps of:

1. displaying a user interface;

2. displaying a virtual scene and a target virtual object located in the virtual scene in a user interface;

3. and displaying the action animation of the target virtual object, wherein the action animation of the target virtual object is generated according to the reference action of the reference virtual object and the environment information corresponding to the target virtual object, and the environment information is used for representing the virtual environment where the target virtual object is located.

The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.

Referring to fig. 9, a block diagram of an animation generation apparatus for a virtual object according to an embodiment of the present application is shown. The device has a function of implementing an example of the animation generation method of the virtual object, and the function may be implemented by hardware or by hardware executing corresponding software. The apparatus may be the computer device described above, or may be provided in a computer device. As shown in fig. 9, the apparatus 9000 may comprise: an action obtaining module 9100, an action generating module 9200 and an animation generating module 9300.

The action obtaining module 9100 is configured to obtain a reference action of a reference virtual object corresponding to the target virtual object.

The action generating module 9200 is configured to generate an action sequence of the target virtual object according to the reference action and the environment information corresponding to the target virtual object; wherein the environment information is used for characterizing a virtual environment in which the target virtual object is located.

The animation generation module 9300 is configured to generate a motion animation of the target virtual object according to the motion sequence of the target virtual object.

In an exemplary embodiment, as shown in fig. 10, the action generating module 9200 further includes: a model training sub-module 9210, an information generation sub-module 9220, and an action generation sub-module 9230.

The model training submodule 9210 is configured to train a reinforcement learning model by using the reference motion.

The information generating sub-module 9220 is configured to generate, through the reinforcement learning model, action information of the target virtual object, where the action information is used to represent an action performed by the target virtual object.

The action generating sub-module 9230 is configured to generate an action sequence of the target virtual object according to the action information and the environment information.

In an exemplary embodiment, as shown in fig. 10, the reinforcement learning model includes a policy network, and the information generating sub-module 9220 is configured to generate, through the policy network, action information of the target virtual object at an i +1 th timestamp according to state information of the target virtual object at the i th timestamp, where i is a positive integer. Wherein the state information is used to characterize a physical state of the target virtual object.

In an exemplary embodiment, the state information includes a phase parameter, an attitude parameter, and a velocity parameter.

The phase parameter is used for representing the action progress of the target virtual object, the posture parameter is used for representing the posture form of the target virtual object, and the speed parameter is used for representing the speed state of the target virtual object.

In an exemplary embodiment, the action information includes: the velocity of each joint of the target virtual object; alternatively, the position of each joint of the target virtual object.

In an exemplary embodiment, as shown in FIG. 10, the reinforcement learning model further comprises a judgment network; the model training submodule 9210 is configured to:

calculating a corresponding return value according to the state information of the target virtual object at the first time stamp through a judging network;

adjusting parameters of the policy network according to the return value;

generating, by the policy network, action information of the target virtual object at a second timestamp according to the state information of the target virtual object at the first timestamp, the second timestamp being subsequent to the first timestamp;

controlling interaction between the target virtual object and the virtual environment according to the action information of the target virtual object at the second timestamp, and determining the state information of the target virtual object at the second timestamp;

calculating an award value according to the state information of the target virtual object at the second time stamp and the state information of the reference virtual object at the second time stamp;

and adjusting the parameters of the judging network according to the reward value.

In an exemplary embodiment, the reward value comprises at least one of:

a reward value of pose similarity for characterizing the degree of similarity of the poses of the target virtual object and the reference virtual object;

a reward value of speed similarity for characterizing the degree of similarity of the speeds of the target virtual object and the reference virtual object;

In an exemplary embodiment, as shown in fig. 10, the action generation sub-module 9230 includes: moment calculation unit 9231, motion control unit 9232 and action generation unit 9233.

The moment calculation unit 9231 is configured to calculate a moment corresponding to each joint of the target virtual object according to the state information of the target virtual object at the ith time stamp and the action information of the target virtual object at the (i + 1) th time stamp;

the motion control unit 9232 is configured to control the target virtual object to move in the virtual environment corresponding to the environment information according to a moment corresponding to each joint of the target virtual object, and generate an action from the ith time stamp to the (i + 1) th time stamp of the target virtual object;

the action generating unit 9233 is configured to obtain an action sequence of the target virtual object according to the action of the target virtual object between the timestamps.

In an exemplary embodiment, as shown in FIG. 10, the animation generation module 930 is configured to:

obtaining an animation frame of the target virtual object according to the action sequence of the target virtual object;

and generating motion animation of the target virtual object based on each animation frame.

In an exemplary embodiment, the target virtual object and the reference virtual object are two virtual objects having the same or similar skeletons.

The embodiment of the application provides an animation display device of a virtual object, which has the functions of realizing the example of the animation display method of the virtual object, wherein the functions can be realized by hardware, and can also be realized by hardware executing corresponding software. The apparatus may be the terminal described above, or may be provided in the terminal. The device comprises: the device comprises an interface display module, a scene display module and an animation display module.

The interface display module is used for displaying a user interface;

the animation display module is used for displaying the action animation of the target virtual object, the action animation of the target virtual object is generated according to the reference action of the reference virtual object and the environment information corresponding to the target virtual object, and the environment information is used for representing the virtual environment where the target virtual object is located.

It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.

Referring to fig. 11, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device is used for implementing the animation generation method of the virtual object provided in the above embodiment. Specifically, the method comprises the following steps:

the computer apparatus 1100 includes a CPU (Central Processing Unit) 1101, a system Memory 1104 including a RAM (Random Access Memory) 1102 and a ROM (Read-Only Memory) 1103, and a system bus 1105 connecting the system Memory 1104 and the Central Processing Unit 1101. The computer device 1100 also includes a basic I/O (Input/Output) system 1106, which facilitates transfer of information between devices within the computer, and a mass storage device 1107 for storing an operating system 1113, application programs 1114, and other program modules 1112.

The basic input/output system 1106 includes a display 1108 for displaying information and an input device 1109 such as a mouse, keyboard, etc. for user input of information. Wherein the display 1108 and input device 1109 are connected to the central processing unit 1101 through an input output controller 1110 connected to the system bus 1105. The basic input/output system 1106 may also include an input/output controller 1110 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 1110 also provides output to a display screen, a printer, or other type of output device.

The mass storage device 1107 is connected to the central processing unit 1101 through a mass storage controller (not shown) that is connected to the system bus 1105. The mass storage device 1107 and its associated computer-readable media provide non-volatile storage for the computer device 1100. That is, the mass storage device 1107 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact disk Read-Only Memory) drive.

Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash Memory or other solid state Memory technology, CD-ROM, DVD (Digital Video Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 1104 and mass storage device 1107 described above may be collectively referred to as memory.

According to various embodiments of the present application, the computer device 1100 may also operate as a remote computer connected to a network via a network, such as the Internet. That is, the computer device 1100 may connect to the network 1112 through the network interface unit 1111 that is coupled to the system bus 1105, or may connect to other types of networks or remote computer systems (not shown) using the network interface unit 1111.

Referring to fig. 12, a block diagram of a terminal according to an embodiment of the present application is shown. The terminal 1200 may be an electronic device such as a mobile phone, a tablet computer, a game console, an electronic book reader, a multimedia player, a wearable device, a PC, and the like. The terminal is used for implementing the animation generation method of the virtual object or the animation display method of the virtual object provided in the above embodiments. Specifically, the method comprises the following steps:

in general, terminal 1200 includes: a processor 1201 and a memory 1202.

The processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1201 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field Programmable Gate Array), and a PLA (Programmable logic Array). The processor 1201 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing content that the display screen needs to display. In some embodiments, the processor 1201 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.

Memory 1202 may include one or more computer-readable storage media, which may be non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 1202 is used to store a computer program and is configured to be executed by one or more processors to implement the animation generation method of a virtual object or the animation display method of a virtual object described above.

In some embodiments, the terminal 1200 may further optionally include: a peripheral interface 1203 and at least one peripheral. The processor 1201, memory 1202, and peripheral interface 1203 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1203 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1204, touch display 1205, camera 1206, audio circuitry 1207, pointing component 1208, and power source 1209.

Those skilled in the art will appreciate that the configuration shown in fig. 12 is not intended to be limiting of terminal 1200 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.

In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor of a computer device, implements the animation generation method of the above-described virtual object.

In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor of a terminal, implements the animation display method of the above-described virtual object.

In an exemplary embodiment, a computer program product for implementing the above-described animation generation method of a virtual object when executed by a processor is also provided.

In an exemplary embodiment, a computer program product for implementing the above-described animation display method of a virtual object when executed by a processor is also provided.

It should be understood that reference to "a plurality" herein means two or more. In addition, the step numbers described herein only exemplarily show one possible execution sequence among the steps, and in some other embodiments, the steps may also be executed out of the numbering sequence, for example, two steps with different numbers are executed simultaneously, or two steps with different numbers are executed in a reverse order to the order shown in the figure, which is not limited by the embodiment of the present application.

The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method for animation generation of a virtual object, the method comprising:

training a reinforcement learning model by adopting the reference action, wherein the reinforcement learning model comprises a strategy network and a judgment network;

generating action information of the target virtual object through the policy network, wherein the action information is used for representing an action executed by the target virtual object;

generating an action sequence of the target virtual object according to the action information and the environment information corresponding to the target virtual object; wherein the environment information is used for representing the virtual environment in which the target virtual object is located;

generating motion animation of the target virtual object according to the motion sequence of the target virtual object;

wherein the training of the reinforcement learning model by using the reference action comprises:

calculating a corresponding return value according to the state information of the target virtual object at the first time stamp through the evaluation network;

adjusting parameters of the policy network according to the return value;

2. The method of claim 1, wherein the reinforcement learning model comprises a policy network;

the generating, by the policy network, action information of the target virtual object includes:

generating action information of the target virtual object at an (i + 1) th timestamp through the policy network according to the state information of the target virtual object at the ith timestamp, wherein i is a positive integer;

wherein the state information is used to characterize a physical state of the target virtual object.

3. The method of claim 2, wherein the state information includes a phase parameter, an attitude parameter, and a velocity parameter;

4. The method of claim 2, wherein the action information comprises:

the velocity of each joint of the target virtual object;

alternatively, the first and second electrodes may be,

the position of each joint of the target virtual object.

5. The method of claim 1, wherein the reward value comprises at least one of:

6. The method of claim 1, wherein generating the sequence of actions for the target virtual object based on the action information and the environment information comprises:

calculating the moment corresponding to each joint of the target virtual object according to the state information of the target virtual object at the ith time stamp and the action information of the target virtual object at the (i + 1) th time stamp;

controlling the target virtual object to move in the virtual environment corresponding to the environment information according to the moment corresponding to each joint of the target virtual object, and generating an action of the target virtual object from the ith time stamp to the (i + 1) th time stamp;

and obtaining an action sequence of the target virtual object according to the action of the target virtual object among the time stamps.

7. The method according to claim 6, wherein the calculating the moment corresponding to each joint of the target virtual object according to the status information of the target virtual object at the ith time stamp and the motion information of the target virtual object at the (i + 1) th time stamp comprises:

determining the speed of each joint of the target virtual object at the ith time stamp according to the state information of the target virtual object at the ith time stamp;

determining target speeds of joints of the target virtual object at the (i + 1) th time stamp according to the action information of the target virtual object at the (i + 1) th time stamp;

calculating a difference between a target velocity of the respective joint at the (i + 1) th timestamp and a velocity at the (i) th timestamp;

and calculating the corresponding moment of each joint according to the difference.

8. The method according to any one of claims 1 to 7, wherein the generating of the motion animation of the target virtual object according to the motion sequence of the target virtual object comprises:

9. The method of any one of claims 1 to 7, wherein the target virtual object and the reference virtual object are two virtual objects having the same or similar skeletons.

10. A method for animated display of a virtual object, the method comprising:

displaying a user interface;

displaying the motion animation of the target virtual object, wherein the motion animation of the target virtual object is generated in a mode that: acquiring a reference action of a reference virtual object corresponding to the target virtual object; training a reinforcement learning model by adopting the reference action, wherein the reinforcement learning model comprises a strategy network and a judgment network; generating action information of the target virtual object through the policy network, wherein the action information is used for representing an action executed by the target virtual object; generating an action sequence of the target virtual object according to the action information and the environment information corresponding to the target virtual object; generating motion animation of the target virtual object according to the motion sequence of the target virtual object; wherein the environment information is used for representing the virtual environment in which the target virtual object is located;

adjusting parameters of the policy network according to the return value;

11. An apparatus for animation generation of a virtual object, the apparatus comprising:

the animation generation module is used for generating the action animation of the target virtual object according to the action sequence of the target virtual object;

wherein the action generation module comprises:

the model training submodule is used for training a reinforcement learning model by adopting the reference action, and the reinforcement learning model comprises a strategy network and a judgment network;

the information generation submodule is used for generating action information of the target virtual object through the policy network, and the action information is used for representing an action executed by the target virtual object;

the action generation submodule is used for generating an action sequence of the target virtual object according to the action information and the environment information;

wherein the model training submodule is configured to:

adjusting parameters of the policy network according to the return value;

12. A computer device, characterized in that it comprises a processor and a memory in which is stored a computer program that is loaded and executed by the processor to implement the animation generation method of a virtual object according to any one of claims 1 to 9.

13. A computer-readable storage medium, in which a computer program is stored, which is loaded and executed by a processor to implement the animation generation method of a virtual object according to any one of claims 1 to 9.