CN114998491B - Digital human driving method, device, equipment and storage medium

Info

Publication number: CN114998491B
Application number: CN202210917824.XA
Authority: China (CN)
Prior art keywords: motion, information, module, control instruction, digital person
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN114998491A
Inventors: 崔雨豪; 蒲黎明; 史运洲; 丁浩生; 赵中州; 周伟; 肖志勇; 陈海青
Current Assignee: Alibaba China Co Ltd
Original Assignee: Alibaba China Co Ltd
Application CN202210917824.XA filed by Alibaba China Co Ltd
Publication of CN114998491A (application); application granted; publication of CN114998491B (grant)
Related international application: PCT/CN2023/110343 (WO2024027661A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00: Animation
    • G06T 13/20: 3D [Three Dimensional] animation
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning


Abstract

The present disclosure relates to a digital human driving method, apparatus, device, and storage medium. The method determines, according to a control instruction for driving a digital person, a target module for executing the control instruction from a motion matching module and a motion control module. The motion matching module can determine, according to the control instruction, a target animation segment matching the control instruction from a plurality of preset animation segments. The motion control module can input the control instruction, the historical motion skeleton information, and the historical motion trajectory of the digital person into a pre-trained machine learning model, which generates the skeleton motion information for driving the digital person. The embodiment can therefore switch freely between these two ways of determining skeleton motion information, so that control instructions in different scenes generate the skeleton motion information for driving the digital person in different ways, and no state transition diagram needs to be constructed, which saves labor cost.

Description

Digital human driving method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of information technology, and in particular, to a digital human driving method, apparatus, device, and storage medium.
Background
With the continuous development of science and technology, driving digital humans to move has become a key technology in constructing virtual worlds, for example in building the metaverse or virtual anchors; a digital human can be understood as a character in the virtual world. For example, digital humans need to move freely in the virtual world, interact with the surrounding environment, and so on.
However, in the prior art, a digital human is driven to move through a state transition diagram: each node in the state transition diagram is an animation segment, and each edge is a state transition condition. If the state transition condition on an edge is satisfied, the animation segment driving the digital human changes from the animation segment at one end of that edge to the animation segment at its other end, so that the driving animation transitions between different animation segments. This approach requires different state transition conditions and corresponding animation segments to be constructed manually in advance, resulting in high labor cost.
Disclosure of Invention
In order to solve the above technical problem, or at least partially solve it, the present disclosure provides a digital human driving method, apparatus, device, and storage medium. This embodiment can switch freely between two ways of determining bone motion information, so that control instructions in different scenes generate the bone motion information for driving the digital person in different ways, and the digital person can be driven without constructing a state transition diagram or pre-constructing different state transition conditions and corresponding animation segments, thereby saving labor cost.
In a first aspect, an embodiment of the present disclosure provides a digital human driving method, including:
acquiring a control instruction for driving a digital person;
according to the control instruction, determining a target module for executing the control instruction from a motion matching module and a motion control module;
if the target module is the motion matching module, determining a target animation segment matched with the control instruction from a plurality of preset animation segments according to the control instruction, and taking the bone motion information in the target animation segment as the bone motion information for driving the digital person;
if the target module is the motion control module, inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital person into a pre-trained machine learning model, and generating skeleton motion information for driving the digital person through the machine learning model;
and driving the digital human to move according to the bone motion information.
In a second aspect, embodiments of the present disclosure provide a digital human driving apparatus, including:
the acquisition module is used for acquiring a control instruction for driving the digital person;
the first determining module is used for determining a target module for executing the control instruction from the motion matching module and the motion control module according to the control instruction;
a second determining module, configured to determine, according to the control instruction, a target animation segment that matches the control instruction from multiple preset animation segments if the target module is the motion matching module, and use bone motion information in the target animation segment as bone motion information for driving the digital person;
the generating module is used for inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model if the target module is the motion control module, and generating skeleton motion information for driving the digital human through the machine learning model;
and the driving module is used for driving the digital human to move according to the bone motion information.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the first aspect.
According to the digital human driving method, apparatus, device, and storage medium provided by the present disclosure, a target module for executing a control instruction for driving a digital person is determined, according to that control instruction, from a motion matching module and a motion control module; when the control instructions differ, the selected target module may differ, so flexible switching between the motion matching module and the motion control module is achieved. The motion matching module can determine, according to the control instruction, a target animation segment matching the control instruction from a plurality of preset animation segments, and use the bone motion information in the target animation segment as the bone motion information for driving the digital person. The motion control module can input the control instruction, the historical motion skeleton information, and the historical motion trajectory of the digital person into a pre-trained machine learning model, and generate the bone motion information for driving the digital person through the machine learning model. The motion matching module and the motion control module thus determine bone motion information in different ways, and this embodiment can switch freely between the two ways, so that control instructions in different scenes generate the bone motion information for driving the digital person in different ways, and the digital person can be driven without constructing a state transition diagram or pre-constructing different state transition conditions and corresponding animation segments, thereby saving labor cost.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below; it is obvious that other drawings can be obtained from these drawings by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a digital human driving method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
Fig. 3 is a flowchart of a digital human driving method provided by another embodiment of the present disclosure;
Fig. 4 is a flowchart of a digital human driving method provided by another embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a digital human driving apparatus provided by an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
In general, a digital person may be driven to move through a state transition diagram: each node in the state transition diagram is an animation segment, and each edge is a state transition condition. If the state transition condition on an edge is satisfied, the animation segment driving the digital person changes from the animation segment at one end of that edge to the animation segment at its other end, so that the driving animation transitions between different animation segments. However, this approach requires different state transition conditions and corresponding animation segments to be constructed manually in advance, resulting in high labor cost. To address this problem, embodiments of the present disclosure provide a digital human driving method, which is described below with reference to specific embodiments.
Fig. 1 is a flowchart of a digital human driving method according to an embodiment of the disclosure. The method can be executed by a digital human driving apparatus, which can be implemented by software and/or hardware and can be configured in an electronic device, such as a server or a terminal; the terminal may specifically be a mobile phone, a computer, or a tablet computer. In addition, the digital human driving method described in this embodiment may be applied to the application scenario shown in fig. 2. As shown in fig. 2, the application scenario includes a terminal 21 and a server 22. The server 22 may drive the digital person by using the method described in the embodiment of the present disclosure and send a video file or video stream of the digital person's actions to the terminal 21, so that the terminal 21 can play the picture of the digital person performing the actions. Alternatively, the terminal 21 may itself drive the digital person by using the method described in the embodiment of the present disclosure and play the picture of the digital person performing the actions. The method is described in detail below with reference to fig. 2; as shown in fig. 1, the method includes the following steps:
and S101, acquiring a control instruction for driving the digital human.
Taking the case where the server 22 drives the digital human as an example, the server 22 may obtain a control instruction for driving the digital human. The control instruction may come from the terminal 21, for example a control instruction issued by a user of the terminal 21; alternatively, the control instruction may be generated by the server 22.
S102, according to the control instruction, determining a target module for executing the control instruction from the motion matching module and the motion control module.
For example, the server 22 may include a motion matching module and a motion control module, each of which may be implemented in software and/or hardware. Both modules can determine the bone motion information for driving the digital person, but the principle and specific process by which they do so differ. For example, the motion matching module selects, from a plurality of existing animation segments, the animation segment that best matches the control instruction and uses it as the bone motion information for driving the digital person, whereas the motion control module directly generates the bone motion information for driving the digital person through a pre-trained machine learning model. Therefore, when the server 22 obtains the control instruction, it needs to determine, from the motion matching module and the motion control module, a target module for executing the control instruction; that is, the server 22 determines one of the two modules as the target module, and the target module determines the bone motion information for driving the digital person. In this embodiment, the server 22 may store in advance a plurality of preset control instructions and the identifier of the module that executes each preset control instruction, i.e., a correspondence between preset control instructions and module identifiers. When the server 22 obtains a control instruction, it may query this correspondence for the preset control instruction that best matches the obtained control instruction, and take the module identified by the corresponding module identifier as the module for executing the control instruction. In other words, this embodiment may determine in advance which control instructions are executed by the motion matching module and which by the motion control module. In this embodiment, the motion matching module processes the control instruction with a motion matching algorithm, and the motion control module processes the control instruction with a motion control algorithm. The motion matching algorithm may specifically be an algorithm that determines, from a plurality of preset animation segments, a target animation segment matching the control instruction. The motion control algorithm may be the algorithm adopted by a machine learning model, which can generate the bone motion information of the digital person at the next moment or next frame according to the control instruction, the historical motion skeleton information, and the historical motion trajectory of the digital person.
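Purely as an illustrative sketch and not as part of the original disclosure, the correspondence between preset control instructions and module identifiers described above can be pictured as a simple lookup; the names (e.g., MODULE_BY_INSTRUCTION, match_score) and the word-overlap matching rule below are assumptions made for the example only.

```python
# Hypothetical sketch of the "preset control instruction -> module identifier" lookup
# described above. Names and the matching rule are illustrative assumptions.

MOTION_MATCHING = "motion_matching_module"
MOTION_CONTROL = "motion_control_module"

# Preset correspondence stored in advance (preset instruction -> module identifier).
MODULE_BY_INSTRUCTION = {
    "walk to the chair": MOTION_MATCHING,   # flat-ground displacement
    "sit on the chair": MOTION_CONTROL,     # scene interaction
    "climb the stairs": MOTION_CONTROL,     # uneven-terrain displacement
}

def match_score(instruction: str, preset: str) -> float:
    """Toy similarity: fraction of shared words. A real system would use
    its own matching rule (exact ids, embeddings, etc.)."""
    a, b = set(instruction.lower().split()), set(preset.lower().split())
    return len(a & b) / max(len(a | b), 1)

def select_target_module(instruction: str) -> str:
    """Return the identifier of the module that should execute the instruction,
    i.e. the module mapped to the best-matching preset instruction."""
    best_preset = max(MODULE_BY_INSTRUCTION, key=lambda p: match_score(instruction, p))
    return MODULE_BY_INSTRUCTION[best_preset]

if __name__ == "__main__":
    print(select_target_module("walk to the front chair"))  # -> motion_matching_module
    print(select_target_module("sit down on the chair"))    # -> motion_control_module
```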
S103, if the target module is the motion matching module, determining a target animation segment matched with the control instruction from a plurality of preset animation segments according to the control instruction, and taking the bone motion information in the target animation segment as the bone motion information for driving the digital person.
For example, if the control command is "go to the front chair", the server 22 may use the motion matching module as a target module for executing the control command, that is, the server 22 may give the control command to the motion matching module for execution. When the motion matching module executes the control instruction, a target animation segment matched with the control instruction can be determined from a plurality of preset animation segments stored in a database according to the control instruction, and the bone motion information in the target animation segment is used as the bone motion information for driving the digital person.
S104, if the target module is the motion control module, inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generating skeleton motion information for driving the digital human through the machine learning model.
Optionally, the historical motion skeleton information of the digital person includes at least one of: the position information, the displacement information, and the rotation information of each skeleton point of the digital person at each track point in the historical motion trajectory; and the state information of the digital person at each track point in the historical motion trajectory.
For example, if the control instruction is "sit on the chair", the server 22 may take the motion control module as the target module for executing the control instruction, i.e., the server 22 may hand the control instruction to the motion control module for execution. When the motion control module executes the control instruction, it can input the historical motion skeleton information and the historical motion trajectory of the digital person into a pre-trained machine learning model, so that the machine learning model can generate the bone motion information of the digital person at the next moment or in the next frame according to the input information. The digital person includes a plurality of skeleton points, and the historical motion trajectory includes a plurality of historical track points; the historical motion skeleton information of the digital person may be the skeleton motion information of the digital person while moving along the historical motion trajectory. For example, the historical motion skeleton information of the digital person includes the skeleton posture information of the digital person at each historical track point and the state information of the digital person at each historical track point, where the skeleton posture information at each historical track point includes the position information of each skeleton point of the digital person at each historical track point or each historical moment, and the displacement information and rotation information of each skeleton point between two adjacent historical track points or two adjacent historical moments. The historical track points and the historical moments may or may not correspond one to one. The state information of the digital person at each historical track point includes states such as walking, running, squatting, and standing.
In addition, the bone motion information of the digital person at the next time or the next frame includes bone posture information of the digital person at the next time or the next frame, for example, the bone posture information of the digital person at the next time or the next frame includes position information of each bone point of the digital person at the next time or the next frame, respectively, and displacement information and rotation information of each bone point of the next time relative to the current time or the next frame relative to the current frame, respectively. It is understood that one of the plurality of bone points included in the digital person is a root node, or a root node can be determined according to the plurality of bone points, and a projection point of the root node on the ground is marked as a track point.
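For illustration only, the historical motion skeleton information and per-frame bone motion information described above might be organized roughly as in the following sketch; the field names (position, displacement, rotation, state) mirror the items listed in this embodiment but are otherwise assumptions, not a format prescribed by the disclosure.

```python
# Illustrative data layout for the information listed above; field names are
# assumptions, and quaternions are just one possible rotation encoding.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z)

@dataclass
class BonePointFrame:
    position: Vec3       # position of the bone point at this track point / moment
    displacement: Vec3   # displacement relative to the previous track point / moment
    rotation: Quat       # rotation relative to the previous track point / moment

@dataclass
class TrackPointRecord:
    track_point: Vec3                       # projection of the root node on the ground
    state: str                              # e.g. "walking", "running", "squatting", "standing"
    bones: Dict[str, BonePointFrame] = field(default_factory=dict)

@dataclass
class HistoricalMotion:
    records: List[TrackPointRecord] = field(default_factory=list)  # one per historical track point

    def trajectory(self) -> List[Vec3]:
        """The historical motion trajectory is the sequence of track points."""
        return [r.track_point for r in self.records]
```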
S105, driving the digital person to move according to the bone motion information.
For example, after the server 22 determines the bone motion information for driving the digital person, the digital person may be driven to move by means of retargeting: the rotation information and displacement information of each bone point included in the bone motion information are bound to the skeleton of the digital person, so that the skeleton of the digital person performs movements similar to those in the bone motion information.
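A minimal sketch of the retargeting idea described above, assuming a shared bone naming between the source bone motion information and the digital person's skeleton; the Joint structure and the name-based binding are illustrative assumptions rather than the disclosure's actual implementation.

```python
# Hedged sketch: copy per-bone rotation and root displacement from generated
# bone motion information onto a digital person's skeleton with matching bone names.
from dataclasses import dataclass
from typing import Dict, Tuple

Quat = Tuple[float, float, float, float]
Vec3 = Tuple[float, float, float]

@dataclass
class Joint:
    rotation: Quat = (1.0, 0.0, 0.0, 0.0)

def retarget_frame(
    skeleton: Dict[str, Joint],            # the digital person's skeleton, keyed by bone name
    root_position: Vec3,
    frame_rotations: Dict[str, Quat],      # per-bone rotation from the bone motion information
    frame_root_displacement: Vec3,
) -> Vec3:
    """Bind one frame of bone motion information to the digital person's skeleton.
    Returns the updated root position; unknown bones are simply skipped."""
    for bone_name, rotation in frame_rotations.items():
        joint = skeleton.get(bone_name)
        if joint is not None:
            joint.rotation = rotation
    return tuple(p + d for p, d in zip(root_position, frame_root_displacement))
```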
In the embodiment of the present disclosure, a target module for executing a control instruction for driving the digital person is determined, according to that control instruction, from the motion matching module and the motion control module; when the control instructions differ, the selected target module may differ, so flexible switching between the motion matching module and the motion control module is achieved. The motion matching module can determine, according to the control instruction, a target animation segment matching the control instruction from a plurality of preset animation segments, and use the bone motion information in the target animation segment as the bone motion information for driving the digital person. The motion control module can input the control instruction, the historical motion skeleton information, and the historical motion trajectory of the digital person into a pre-trained machine learning model, and generate the bone motion information for driving the digital person through the machine learning model. The two modules thus determine bone motion information in different ways, and this embodiment can switch freely between the two ways, so that control instructions in different scenes generate the bone motion information for driving the digital person in different ways, and the digital person can be driven without constructing a state transition diagram or pre-constructing different state transition conditions and corresponding animation segments, thereby saving labor cost.
Fig. 3 is a flowchart of a digital human driving method according to another embodiment of the disclosure. In this embodiment, the method specifically includes the following steps:
S301, acquiring at least one control signal for driving the digital human.
As shown in fig. 4, the server 22 includes an instruction parsing module, a dynamic state machine, a pre-processing module, a motion matching module, a motion control module, and a post-processing module. The server 22 can realize a driving scheme for the digital human through the modules, and through the driving scheme, full-terrain displacement animation, scene interaction animation, long-time-sequence action animation and the like corresponding to various instructions can be generated. For example, the instruction parsing module may receive a brain wave, an audio signal, a visual signal, a voice signal, a text signal, a path planning signal, etc. as shown in fig. 4, for driving a digital human, and these control signals may be generated by the terminal 21 and then transmitted to the server 22, or may be generated by the server 22.
S302, parsing each control signal into at least one control instruction.
For example, the brain wave may be a signal sensed by an electroencephalogram (EEG) sensor. The EEG sensor may be arranged in a wearable device, the wearable device may be the terminal 21 and may be worn on the head of a real person, and when the real person thinks of different control instructions, the EEG sensor senses different signals. For example, when the real person thinks "walk forward", the signal sensed by the EEG sensor is 0, and when the real person thinks "walk backward", the signal sensed by the EEG sensor is 1. Therefore, when the instruction parsing module receives a brain wave representing signal 0, it parses the brain wave into the control instruction "walk forward"; when it receives a brain wave representing signal 1, it parses the brain wave into the control instruction "walk backward". It can be understood that this is only illustrative, and in different scenarios the signals 0 and 1 sensed by the EEG sensor represent different meanings; for example, in a turning scenario of the digital person, the instruction parsing module may parse signal 0 as "turn right" and signal 1 as "turn left".
The audio signal shown in fig. 4 may be a piece of music or audio with a rhythm, and the instruction parsing module may parse from the audio signal control instructions for controlling the behavior of the digital person, for example, a control instruction controlling the magnitude of the digital person's actions according to the volume of the music, and a control instruction controlling the digital person's footsteps according to the beat of the audio signal. The visual signal may be a video shot in the real world, and the instruction parsing module may analyze the human motion in the visual signal and convert it into corresponding bone motion information. Alternatively, the visual signal may be a virtual visual signal, i.e., a visual signal simulated in a virtual environment, such as a simulated 16-line or 64-line vision system. The voice signal may be speech for controlling the digital person sent by a user of the terminal 21; the instruction parsing module may convert the speech into text information through Automatic Speech Recognition (ASR) and then parse the text information into at least one control instruction. In addition, the text signal shown in fig. 4 may be text information, and the instruction parsing module may parse the text information into successive independent control instructions. For example, if the text information is "go to the front chair and sit down", the instruction parsing module may decompose it into two control instructions, one being "go to the front chair" and the other being "sit down on the chair". The path planning signal shown in fig. 4 may include a destination; the instruction parsing module may perform automatic path planning according to the destination and select an optimal path as the obstacle-avoiding path of the digital person. Further, the instruction parsing module may parse the optimal path into a plurality of control instructions, each of which may include the position information of a track point on the optimal path, so that the digital person is controlled to move along the optimal path through the plurality of control instructions. It can be understood that, in other embodiments, the control instructions issued by the instruction parsing module may be replaced by control instructions issued by another control module, the instruction parsing module may be replaced by any module capable of issuing control instructions, or the control instructions received by the dynamic state machine may be any manually input control instructions. In addition, the parsing method adopted by the instruction parsing module is not limited to those described above; for example, the instruction parsing module may also parse the received control signal through a machine learning model, so as to directly parse the received control signal, for example a text signal or a voice signal, into a corresponding control instruction.
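Purely as an illustration of the text-signal case above, a text signal such as "go to the front chair and sit down" could be decomposed into successive independent control instructions roughly as follows; the keyword-based splitting rule is a hypothetical stand-in for however the instruction parsing module actually decomposes text.

```python
# Toy decomposition of a text signal into successive control instructions,
# mirroring the "go to the front chair and sit down" example above.
# The splitting rule is an illustrative assumption.
from typing import List

def parse_text_signal(text: str) -> List[str]:
    """Split a compound text command on simple conjunctions and normalize
    fragments into stand-alone control instructions."""
    fragments = [f.strip() for f in text.replace(",", " and ").split(" and ") if f.strip()]
    instructions = []
    for fragment in fragments:
        if fragment.startswith("sit down"):
            instructions.append("sit down on the chair")
        else:
            instructions.append(fragment)
    return instructions

if __name__ == "__main__":
    print(parse_text_signal("go to the front chair and sit down"))
    # -> ['go to the front chair', 'sit down on the chair']
```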
S303, sorting the at least one control instruction corresponding to each of the at least one control signal, to obtain a sorting result.
As shown in fig. 4, over a period of time the instruction parsing module may receive at least one control signal among brain waves, audio signals, visual signals, voice signals, text signals, and path planning signals, and may parse each control signal into at least one control instruction. The instruction parsing module may therefore parse out a plurality of control instructions within that period of time, and may issue the parsed control instructions to the dynamic state machine. The dynamic state machine may then sort the plurality of control instructions, for example according to their execution order, to obtain a sorting result. For example, within a period of time the instruction parsing module parses out 3 control instructions, denoted control instruction A, control instruction B, and control instruction C, and the dynamic state machine sorts them into the sorting result: control instruction B, control instruction A, control instruction C.
S304, obtaining the current first unexecuted control instruction from the sorting result.
For example, after the dynamic state machine obtains the sorting result, the current first unexecuted control instruction is obtained from the sorting result, for example, the control instruction B is the current first unexecuted control instruction.
S305, according to the control instruction, determining a target module for executing the control instruction from the motion matching module and the motion control module.
For example, according to control instruction B, the dynamic state machine determines a target module for executing control instruction B from the motion matching module and the motion control module. In this embodiment, the motion matching module may include a plurality of sub-modules. For example, the motion matching module includes 3 sub-modules, denoted a displacement sub-module, an interaction sub-module, and an action sub-module, where the displacement sub-module processes control instructions related to digital person displacement, the interaction sub-module processes control instructions related to digital person interaction, and the action sub-module processes control instructions related to digital person actions. Digital person displacement includes the displacement changes generated when the digital person walks, goes up and down stairs, climbs mountains, and so on. Digital person interaction includes static interaction between the digital person and static objects (e.g., sofas, chairs) in the virtual environment, as well as dynamic interaction between the digital person and dynamic objects (e.g., other digital persons) in the virtual environment. Digital person actions include in-place posture changes such as dancing and practicing martial arts in place. It can be understood that in other embodiments a sub-module is not limited to processing control instructions related to only one of displacement, interaction, or actions; for example, taking digital person displacement as an example, a control instruction related to displacement may be processed jointly by a plurality of displacement sub-modules, or each of a plurality of displacement sub-modules may independently process control instructions related to digital person displacement.
Similarly, the motion control module may also include 3 sub-modules, for example a displacement sub-module, an interaction sub-module, and an action sub-module, whose functions are as described above and are not repeated here. However, in this embodiment the motion matching module and the motion control module are applicable to different scenes and/or control instructions, so for a sub-module of the same type, for example the displacement sub-module, the displacement sub-module in the motion matching module and the displacement sub-module in the motion control module are likewise applicable to different scenes and/or control instructions. Therefore, when the dynamic state machine determines, according to control instruction B, the target module for executing control instruction B from the motion matching module and the motion control module, it may specifically determine one sub-module as the target module from among the 3 sub-modules included in the motion matching module and the 3 sub-modules included in the motion control module.
For example, the control instruction B is a control instruction regarding digital human displacement, and the digital human displacement is displacement on a flat ground, the dynamic state machine may select a displacement sub-module in the motion matching module as the target module. If the control instruction B is a control instruction about digital human displacement, and the digital human displacement is displacement generated in the scenes of going upstairs and downstairs, climbing mountains and the like, the dynamic state machine may select a displacement sub-module in the motion control module as a target module.
For example, after the target module has processed the control instruction B, the target module may send a completion signal to the dynamic state machine. At this time, the current first unexecuted control instruction in the sorting result is changed into a control instruction a, further, the dynamic state machine may determine a target module for the control instruction a, and the determination process is similar to the process of determining the target module for the control instruction B, and is not described herein again. After the control instruction a is executed, the control instruction C becomes the current first unexecuted control instruction in the sequencing result, and further, the dynamic state machine may determine a target module for the control instruction C, and the target module processes the control instruction C. That is to say, the dynamic state machine may serially connect (e.g., sort) a plurality of control instructions issued by the instruction parsing module, and sequentially distribute the plurality of control instructions to different sub-modules in the motion matching module or different sub-modules in the motion control module for processing according to a sorting result.
For example, the current state of the digital person is an idle state (e.g., a standing rest state), and the instruction parsing module issues two control instructions to the dynamic state machine, namely "walk to the side of the chair" and "sit on the chair". The dynamic state machine sorts the two control instructions, and the sorting result is "walk to the side of the chair" first and "sit on the chair" second. The dynamic state machine then determines, from the 3 sub-modules included in the motion matching module and the 3 sub-modules included in the motion control module, a sub-module suitable for completing "walk to the side of the chair", for example the displacement sub-module in the motion matching module, and distributes the instruction to that displacement sub-module. After finishing, the displacement sub-module returns a completion signal to the dynamic state machine, and the digital person returns to the idle state to wait for the next call. Next, the dynamic state machine determines, from the same 6 sub-modules, a sub-module suitable for "sit on the chair", for example the interaction sub-module in the motion control module, and distributes "sit on the chair" to that interaction sub-module; after the interaction sub-module has processed it, the digital person returns to the idle state again. It can be understood that the idle state to which the digital person returns after performing the actions of different control instructions may differ; for example, the idle state after completing the action of a control instruction may be the state at the end of the last action in the series of consecutive actions corresponding to that control instruction.
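The serial behaviour described above (sort the instructions, hand the first unexecuted one to a chosen sub-module, wait for its completion signal, then move on while the digital person idles) can be pictured with the hedged sketch below; choose_submodule, the toy sorting rule, and the callable sub-modules are hypothetical placeholders, not the disclosure's concrete interfaces.

```python
# Hedged sketch of the dynamic state machine's serial dispatch loop.
# Sub-modules are modelled as plain callables; completion is modelled by returning.
from typing import Callable, Dict, List

SubModule = Callable[[str], None]

def choose_submodule(instruction: str, submodules: Dict[str, SubModule]) -> SubModule:
    """Placeholder for the selection among the sub-modules of the motion matching
    module and the motion control module (see the flat-ground vs. stairs example)."""
    if "sit" in instruction:
        return submodules["motion_control.interaction"]
    return submodules["motion_matching.displacement"]

def run_dynamic_state_machine(instructions: List[str], submodules: Dict[str, SubModule]) -> None:
    ordered = sorted(instructions, key=lambda s: 0 if "walk" in s or "go" in s else 1)  # toy sorting rule
    for instruction in ordered:                       # first unexecuted instruction each time
        target = choose_submodule(instruction, submodules)
        target(instruction)                           # returning plays the role of the "finished" signal
        # the digital person returns to an idle state here and waits for the next call

if __name__ == "__main__":
    mods = {
        "motion_matching.displacement": lambda i: print(f"[matching/displacement] {i}"),
        "motion_control.interaction": lambda i: print(f"[control/interaction] {i}"),
    }
    run_dynamic_state_machine(["sit on the chair", "walk to the side of the chair"], mods)
```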
S306, if the target module is the motion matching module, determining a target animation segment matched with the control instruction from a plurality of preset animation segments according to the control instruction, and taking the bone motion information in the target animation segment as the bone motion information for driving the digital person.
For example, as shown in fig. 4, the motion matching module and the motion control module each correspond to a preprocessing module; the preprocessing modules corresponding to the motion matching module and the motion control module may be the same module or different modules. If they are the same module, the preprocessing performed by that preprocessing module for the motion matching module differs from the preprocessing performed for the motion control module. In this embodiment, the preprocessing module corresponding to the motion matching module mainly refines the plurality of preset animation segments stored in the database.
For example, when the target module selected by the dynamic state machine for a certain control instruction is the motion matching module or a certain sub-module in the motion matching module, the sub-module in the motion matching module or the motion matching module may determine, according to the control instruction, a target animation segment matched with the control instruction from a plurality of preset animation segments stored in the database, and use bone motion information in the target animation segment as bone motion information for driving the digital person. Taking the example that the motion matching module executes the control instruction, when the input of the motion matching module is the control instruction, the motion matching module may determine a target animation segment matched with the control instruction from a plurality of preset animation segments, and output the target animation segment.
In a possible implementation manner, determining, according to the control instruction, a target animation segment that matches the control instruction from a plurality of preset animation segments includes: and according to at least one historical animation segment driving the digital human to move and the control instruction, determining a target animation segment which is matched with the control instruction and is connected with the at least one historical animation segment from a plurality of preset animation segments.
Taking the motion matching module executing the control instruction as an example, the input of the motion matching module includes not only the control instruction but also, for example, the first n animation segments, where the first n animation segments are the historical animation segments that have driven the digital person to move, the number of historical animation segments is n, and n is greater than or equal to 1. That is, the input of the motion matching module may include the first n animation segments and the control instruction. In this case, the motion matching module needs to determine, from the plurality of preset animation segments, a target animation segment that matches the control instruction while ensuring that the degree of connection between the determined target animation segment and the first n animation segments is greater than or equal to a preset degree of connection; in other words, the motion matching module needs to output a target animation segment that both matches the control instruction and connects smoothly with the first n animation segments.
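A hedged sketch of the matching described above: score each preset animation segment by how well it matches the control instruction and how smoothly it connects to the previous segments, and pick the lowest-cost segment whose connection degree meets a preset threshold. The feature vectors, the L2 distances, and the threshold scale are assumptions made only for the example.

```python
# Illustrative motion-matching cost: instruction match + continuity with the
# previous segments. The feature representation is an assumption for the sketch.
from typing import List, Sequence

def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_target_segment(
    instruction_feature: Sequence[float],        # e.g. desired direction / speed / action type
    last_pose_feature: Sequence[float],          # end pose of the n previous animation segments
    segments: List[dict],                        # each: {"name", "instr_feat", "start_pose_feat"}
    min_engagement: float = 0.5,                 # preset connection threshold (assumed scale)
) -> dict:
    best, best_cost = None, float("inf")
    for seg in segments:
        instr_cost = l2(instruction_feature, seg["instr_feat"])
        engagement = 1.0 / (1.0 + l2(last_pose_feature, seg["start_pose_feat"]))
        if engagement < min_engagement:          # must connect smoothly to the previous segments
            continue
        cost = instr_cost + (1.0 - engagement)
        if cost < best_cost:
            best, best_cost = seg, cost
    return best if best is not None else segments[0]   # fallback if nothing meets the threshold

if __name__ == "__main__":
    segs = [
        {"name": "walk_forward", "instr_feat": [1.0, 0.0], "start_pose_feat": [0.1, 0.0]},
        {"name": "turn_left", "instr_feat": [0.0, 1.0], "start_pose_feat": [0.9, 0.3]},
    ]
    print(select_target_segment([1.0, 0.1], [0.0, 0.0], segs)["name"])  # -> walk_forward
```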
In another possible implementation manner, determining a target animation segment matched with the control instruction from a plurality of preset animation segments according to the control instruction includes: and determining a target animation segment which is matched with the control instruction and is connected with the at least one historical animation segment from a plurality of preset animation segments according to the historical motion track of the digital person, the at least one historical animation segment for driving the digital person to move and the control instruction.
Taking the example of the motion matching module executing the control command, the input of the motion matching module not only includes the first n animation segments and the control command, but also includes the historical motion trajectory of the digital person, for example. The historical movement track can be a movement track of the digital person in a certain historical time period, namely a track line through which the digital person walks. In this case, the motion matching module may output a target animation segment that can be matched with the control command and can be connected with the first n animation segments.
In this embodiment, the target animation segment determined by the motion matching module may include an initial or reference posture of a skeleton and the bone motion information based on that initial or reference posture. Specifically, this embodiment may use the bone motion information in the target animation segment as the bone motion information for driving the digital person. The skeleton included in the target animation segment may or may not be the skeleton of the digital person; if it is not, the bone motion information in the target animation segment may be retargeted to the skeleton of the digital person.
S307, if the target module is the motion control module, inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generating skeleton motion information for driving the digital human through the machine learning model.
For example, in this embodiment the preprocessing module corresponding to the motion control module mainly normalizes the input (i.e., the samples) of the machine learning model during training. For example, in the training phase the input of the machine learning model includes bone motion information, and the normalization may consist of retargeting the bone motion information to a uniform standard skeleton pose, e.g., a T-pose, thereby improving the accuracy of the trained machine learning model.
For example, when the target module selected by the dynamic state machine for a certain control instruction is the motion control module or a sub-module in the motion control module, the motion control module or its sub-module may input the control instruction, the historical motion skeleton information, and the historical motion trajectory of the digital person into a pre-trained machine learning model, so that the machine learning model can generate the bone motion information of the digital person at the next moment or in the next frame according to the input information. Taking the motion control module executing the control instruction as an example, the information input to the machine learning model by the motion control module includes the control instruction, the historical motion skeleton information of the digital person, and the historical motion trajectory, where the historical motion trajectory may be the motion trajectory of the digital person within a certain historical time period. The historical motion skeleton information here has the meaning described in the above embodiments, and is not described again.
Optionally, the inputting the control instruction, the historical movement skeletal information of the digital person, and the historical movement trajectory into a machine learning model trained in advance, and generating skeletal movement information for driving the digital person through the machine learning model, includes: and inputting the control command, the environmental information around the digital person, the historical movement skeleton information and the historical movement track of the digital person into a machine learning model which is trained in advance, and generating the skeleton movement information for driving the digital person at the next moment through the machine learning model.
For example, taking the motion control module executing the control instruction as an example, the information the motion control module inputs to the machine learning model is not limited to the control instruction, the historical motion skeleton information, and the historical motion trajectory of the digital person; it may also include, for example, the environmental information around the digital person. Likewise, the output of the machine learning model is not limited to the bone motion information of the digital person at the next moment or frame; it may also include, for example, the motion trajectory of the digital person predicted by the machine learning model over a short subsequent period. It can be understood that the output of the machine learning model at the current moment can be used as input to the machine learning model at the next moment, so that the calculation iterates continuously; for example, the motion trajectory output by the machine learning model at the current moment can serve as the historical motion trajectory required as input at the next moment. That is, the output of the machine learning model is real-time. In addition, the digital person can be driven once for each piece of bone motion information at the next moment or frame output by the machine learning model, so that the digital person is driven to move in real time while the machine learning model outputs in real time.
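The real-time, iterative use of the machine learning model described above can be sketched as an autoregressive loop in which each output frame is fed back as part of the next input; model, apply_frame, and the dictionary-based interface below are hypothetical placeholders rather than the disclosure's actual network or API.

```python
# Hedged sketch of the motion control module's autoregressive loop: the model's
# output at the current moment becomes part of its input at the next moment.
from typing import Callable, Dict, List

ModelFn = Callable[[Dict], Dict]   # input features -> {"bone_motion": ..., "trajectory": [...]}

def drive_with_motion_control(
    model: ModelFn,
    control_instruction: str,
    history_skeleton: List[Dict],
    history_trajectory: List[tuple],
    environment: Dict,
    num_frames: int,
    apply_frame: Callable[[Dict], None],        # e.g. the retargeting step in post-processing
) -> None:
    for _ in range(num_frames):
        output = model({
            "instruction": control_instruction,
            "history_skeleton": history_skeleton,
            "history_trajectory": history_trajectory,
            "environment": environment,
        })
        apply_frame(output["bone_motion"])                # drive the digital person once per frame
        history_skeleton.append(output["bone_motion"])    # feed the new frame back as history
        history_trajectory.extend(output["trajectory"])   # predicted trajectory becomes history
```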
Optionally, the environmental information around the digital person includes at least one of: the height information of each track point on the historical motion track of the preset length passed by the digital person; voxel information of virtual objects within a preset range around the digital person; trajectory information of dynamic objects around the digital person; and the contact information of the dynamic objects around the digital person and the digital person.
In this embodiment, the environmental information around the digital person may specifically be information of a virtual environment in which the digital person is located. For example, the environment information may include height information of each track point on the historical movement track of the first 2 meters passed by the digital person, and the height information may be height information with respect to a reference horizon in the virtual environment. The historical motion trajectory of the first 2 meters that the digital person has traveled may be the historical motion trajectory of the first 2 meters relative to the current location of the digital person. In addition, the environmental information may also include voxelized information for all objects within 2 meters of the digital person's surroundings. It is understood that the example of 2 meters is used for illustrative purposes, and in other embodiments, the specific numerical value is not limited. In addition, the environment information may further include trajectory information of dynamic objects around the digital person, such as other digital persons, and contact information between the digital person and the other digital persons.
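For illustration, the environmental information items listed above could be packed into one feature structure before being fed to the model; the 2-metre ranges follow the example in the text, while every field name and shape is an assumption.

```python
# Illustrative assembly of the surrounding-environment information listed above.
# All names, shapes and ranges are assumptions for the sketch.
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

def build_environment_features(
    trajectory_heights: List[float],              # heights of track points on the last ~2 m of trajectory
    nearby_voxels: List[Tuple[Vec3, bool]],       # voxelized occupancy within ~2 m of the digital person
    dynamic_trajectories: Dict[str, List[Vec3]],  # e.g. trajectories of other digital people
    contacts: List[Tuple[str, str]],              # (dynamic object id, contacted body part)
) -> Dict:
    return {
        "heightmap": trajectory_heights,
        "voxels": [(pos, occupied) for pos, occupied in nearby_voxels],
        "dynamic_objects": dynamic_trajectories,
        "contacts": contacts,
    }
```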
In addition, as shown in fig. 4, the motion matching module and the motion control module each correspond to a post-processing module. The post-processing module corresponding to the motion matching module can retarget the bone motion information in the target animation segment determined by the motion matching module (or one of its sub-modules) onto the skeleton of the digital person, so that the skeleton of the digital person can complete the action corresponding to the target animation segment; the target animation segment may be a standard skeleton animation segment. The post-processing module corresponding to the motion control module can retarget the standard bone motion information generated by the machine learning model onto the skeleton of the digital person, so that the skeleton of the digital person can complete the action corresponding to that standard bone motion information.
In addition, the post-processing module can also process the target animation segment determined by the motion matching module, or the standard bone motion information generated by the machine learning model, with a foot inverse kinematics algorithm (Foot IK), so that the feet of the digital person stay fixed on the ground while the digital person moves, preventing the feet from sliding relative to the ground as the digital person walks.
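Foot sliding is commonly handled by locking the planted foot's target position and letting an inverse kinematics solver adjust the leg; the sketch below shows only the locking of the target position, leaves the IK solve itself as a placeholder, and is an assumption about one common way to do this rather than the disclosure's concrete Foot IK algorithm.

```python
# Hedged sketch of foot locking for Foot IK: while a foot is in contact, keep its
# IK target at the position where contact started, so it does not slide.
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]

class FootLocker:
    def __init__(self) -> None:
        self._locked: Dict[str, Optional[Vec3]] = {"left_foot": None, "right_foot": None}

    def target(self, foot: str, animated_position: Vec3, in_contact: bool) -> Vec3:
        """Return the IK target for this foot on the current frame."""
        if in_contact:
            if self._locked[foot] is None:          # contact just started: remember the position
                self._locked[foot] = animated_position
            return self._locked[foot]               # stay planted while in contact
        self._locked[foot] = None                   # foot lifted: follow the animation again
        return animated_position

# The returned target would then be passed to a (two-bone) leg IK solver,
# which is outside the scope of this sketch.
```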
In addition, the post-processing module can optimize the bone motion information generated by the machine learning model. For example, the bone motion information generated by the machine learning model may be expressed in an absolute world coordinate system, and the post-processing module can convert it into bone motion information in a coordinate system relative to the digital person, so that the retargeting described above can be completed better on the basis of the bone motion information in the digital person's local coordinate system. For another example, the post-processing module may determine whether the bone motion information generated by the machine learning model involves joint rotation directions that may be incorrect, and if so, constrain the joint rotation directions in advance, thereby adding prior constraint conditions to different joints. For example, since the knee joint can only rotate in the front-rear direction during walking and rarely rotates in other directions, a constraint restricting the rotation direction to the front-rear direction can be added to the knee joint.
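As one hedged illustration of the two post-processing steps just mentioned, the sketch below converts a world-space position into a frame attached to the digital person (simplified to yaw only) and clamps a hinge joint such as the knee to a prior rotation range; both simplifications are assumptions made for the example, not the disclosure's exact method.

```python
# Illustrative post-processing helpers: world -> character-local coordinates
# (assuming only yaw matters for the character frame) and a simple hinge constraint.
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def world_to_local(point_world: Vec3, root_world: Vec3, root_yaw: float) -> Vec3:
    """Express a world-space point in a frame attached to the digital person's root."""
    dx, dy, dz = (p - r for p, r in zip(point_world, root_world))
    cos_y, sin_y = math.cos(-root_yaw), math.sin(-root_yaw)
    return (cos_y * dx - sin_y * dz, dy, sin_y * dx + cos_y * dz)

def constrain_knee(flexion_angle: float, max_flexion: float = math.radians(150.0)) -> float:
    """Constrain a knee-like hinge joint: rotation only about its flexion axis,
    clamped to a prior range (no hyperextension)."""
    return min(max(flexion_angle, 0.0), max_flexion)
```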
It is understood that in the present embodiment, the pre-processing module or the post-processing module may be eliminated, or may be replaced by another module. For example, the functionality of the pre-processing module or the post-processing module may be replaced with any other rule or with a machine learning model. In addition, in other embodiments, the function of the dynamic state machine can be replaced by defining the gait information, or the gait information input by the model can be replaced by the dynamic state machine.
S308, driving the digital human to move according to the bone motion information.
Specifically, the implementation manner and specific principle of S308 are the same as those of S105, and are not described herein again.
Optionally, the machine learning model is obtained by training on preset bone motion information and a plurality of pieces of differentiated environment information adapted to the preset bone motion information.
For example, when training the machine learning model corresponding to the motion control module, a large amount of differentiated environment information adapted to the bone motion information may be input, and this environment information may be environment information from the real world. For example, when training the sitting action of the digital person, the bone motion information of sitting down is fixed, but in different scenes the chair on which the digital person sits may differ; therefore, by adding chairs of different types, differentiated environment information can be constructed and the training data of the machine learning model can be diversified. Through this improvement in environment adaptation capability, the generalization of the machine learning model across different environments can be greatly improved, which solves the poor generalization of the traditional state machine and action-library matching scheme, in which corresponding action segments have to be rebuilt for each new scene. In addition, although constructing the differentiated environment information has a certain cost, an effective environment-adaptive algorithm can greatly reduce the labor cost, and the cost of constructing environment information is much lower than the cost of collecting adapted bone motion information in different environments.
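As a hedged illustration of the training-data construction described above, one fixed sitting motion could be paired with many differently shaped chairs to produce differentiated environment information; the chair parameters and the pairing logic below are assumptions made for the sketch only.

```python
# Illustrative construction of differentiated training data: one fixed sitting
# motion paired with many chair variants. Parameter names are assumptions.
import random
from typing import Dict, List

def make_chair_variants(n: int, seed: int = 0) -> List[Dict]:
    rng = random.Random(seed)
    return [
        {
            "seat_height": rng.uniform(0.35, 0.55),   # metres
            "seat_depth": rng.uniform(0.35, 0.50),
            "has_armrests": rng.random() < 0.5,
        }
        for _ in range(n)
    ]

def build_training_samples(sit_motion: Dict, chairs: List[Dict]) -> List[Dict]:
    """Pair the same preset bone motion information with differentiated environments."""
    return [{"bone_motion": sit_motion, "environment": chair} for chair in chairs]

if __name__ == "__main__":
    samples = build_training_samples({"clip": "sit_down"}, make_chair_variants(100))
    print(len(samples))  # -> 100
```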
In addition, since this embodiment introduces the motion control module, the algorithm adopted by the motion control module is a generative algorithm; for example, the machine learning model corresponding to the motion control module can generate bone motion information in real time. Compared with the traditional state machine and action-library matching algorithms, the machine learning model can directly output bone motion information instead of extracting a certain animation segment from existing animation segments. During training, the training data input to the machine learning model include bone motion information; some training data may be free of frame skipping or model clipping (interpenetration), while other training data may exhibit such problems. In the training process, however, the machine learning model fits a large amount of training data and learns automatically, so its parameters are influenced both by good training data (for example, data without frame skipping or clipping) and by bad training data (for example, data with frame skipping or clipping). Therefore, after the parameters of the machine learning model become stable, the model may automatically perform coarse correction on unrefined bone motion information, or automatically perform fine correction on bone motion information that has already been coarsely corrected. This saves the huge cost of manually refining every action segment in the action library.
In addition, a traditional state machine requires manually constructing a large number of state transition conditions for different states, which is costly. By introducing the motion matching algorithm, the disclosed embodiment can automatically match the next adapted animation segment from the action library (for example, a database); this matching is learned automatically and introduces no additional labor cost. Since the motion matching algorithm extracts skeletal motion information from existing animation segments, when an animation segment is found not to meet the standard, that animation segment, or the relevant portion of it, can be modified in the action library.
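One common way to realize such matching, shown below as a hedged sketch rather than the patented implementation, is a nearest-neighbor search over feature vectors describing each preset animation segment (for example, current pose, recent trajectory and an instruction tag); the feature layout is an assumption.

    # Illustrative motion-matching selection: pick the library segment whose
    # feature vector is closest to the query feature.
    import numpy as np

    def match_segment(query_feature: np.ndarray,
                      library_features: np.ndarray,
                      library_segments: list):
        """library_features: (N, D) array, one row per preset animation segment."""
        distances = np.linalg.norm(library_features - query_feature, axis=1)
        best = int(np.argmin(distances))
        return library_segments[best]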
In addition, suppose a digital person is required to complete a plurality of actions, for example 10 actions, for one control instruction, and the control instruction is executed by the motion control module. In the test stage, the motion control module may drive the digital person to complete the 10 actions while monitoring whether the effect of each action reaches the standard; suppose the digital person is found not to reach the standard when performing the 3rd and 4th actions. Then, in the release stage, the motion matching module can drive the digital person to complete the 3rd and 4th actions, while the motion control module drives the digital person to complete the other actions. In this way, the motion matching module and the motion control module can be combined and switched flexibly, improving the display effect of the digital human drive, and the controllability and fine action generation of the motion matching module remedy the scenes in which the data generated by the motion control module does not reach the standard.
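The test-then-release routing described above can be illustrated with a small, assumed helper that pins the failed actions to the motion matching module and leaves the rest to the motion control module; the function name and data shapes are hypothetical.

    # Hedged sketch: per-action routing decided from test-stage results.
    def plan_module_per_action(num_actions: int, failed_actions: set) -> dict:
        return {
            i: "motion_matching" if i in failed_actions else "motion_control"
            for i in range(1, num_actions + 1)
        }

    routing = plan_module_per_action(10, failed_actions={3, 4})
    # -> actions 3 and 4 are driven by the motion matching module,
    #    the other eight by the motion control module.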
It can be understood that the method described in this embodiment can be applied to many scenarios, for example, the metaverse, virtual anchors, and so on.
Fig. 5 is a schematic structural diagram of a digital human driving apparatus provided in an embodiment of the present disclosure. The digital human driving apparatus provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiment of the digital human driving method, as shown in fig. 5, the digital human driving apparatus 50 includes:
an obtaining module 51, configured to obtain a control instruction for driving a digital person;
a first determining module 52, configured to determine, according to the control instruction, a target module for executing the control instruction from the motion matching module and the motion control module;
a second determining module 53, configured to, if the target module is the motion matching module, determine, according to the control instruction, a target animation segment that matches the control instruction from multiple preset animation segments, and use bone motion information in the target animation segment as bone motion information for driving the digital person;
a generating module 54, configured to, if the target module is the motion control module, input the control instruction, historical motion skeleton information and historical motion trajectory of the digital person into a machine learning model that is trained in advance, and generate skeleton motion information for driving the digital person through the machine learning model;
and the driving module 55 is used for driving the digital person to move according to the bone motion information.
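As a structural illustration only, the five modules above might be wired together as follows; the class and method names are assumptions introduced for the sketch and are not part of the disclosed apparatus.

    # Hedged structural sketch of the apparatus of Fig. 5; internals are placeholders.
    class DigitalHumanDrivingApparatus:
        def __init__(self, motion_matching, motion_control, renderer):
            self.motion_matching = motion_matching  # backend used by module 53
            self.motion_control = motion_control    # backend used by module 54
            self.renderer = renderer                # used by driving module 55

        def select_target_module(self, control_instruction) -> str:
            # Assumption: the instruction carries a tag indicating which module suits it.
            return control_instruction.get("target_module", "motion_control")

        def drive(self, control_instruction):
            target = self.select_target_module(control_instruction)   # module 52
            if target == "motion_matching":
                bone_motion = self.motion_matching.match(control_instruction)   # module 53
            else:
                bone_motion = self.motion_control.generate(control_instruction) # module 54
            self.renderer.apply(bone_motion)                                     # module 55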
Optionally, the obtaining module 51 is further configured to obtain at least one control signal for driving the digital person before obtaining the control instruction for driving the digital person; the digital human drive device 50 further comprises: the device comprises an analysis module 56 and a sequencing module 57, wherein the analysis module 56 is used for respectively analyzing each control signal into at least one control instruction, and the sequencing module 57 is used for sequencing the at least one control instruction corresponding to the at least one control signal to obtain a sequencing result; when the obtaining module 51 obtains a control instruction for driving the digital person, it is specifically configured to: and acquiring the current first unexecuted control instruction from the sequencing result.
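A minimal sketch of this parse-and-sort flow follows; the signal format, the ordering key and the queue structure are assumptions made for illustration, not the actual behavior of modules 56 and 57.

    # Assumed flow: parse each control signal into instructions, order them,
    # and always take the current first unexecuted instruction.
    from collections import deque

    def parse_signal(signal: dict) -> list:
        # Assumption: a signal already carries its instruction payloads.
        return signal.get("instructions", [])

    def build_instruction_queue(signals: list) -> deque:
        instructions = [instr for s in signals for instr in parse_signal(s)]
        # Assumption: the ordering key is a priority carried by each instruction.
        instructions.sort(key=lambda instr: instr.get("priority", 0))
        return deque(instructions)

    queue = build_instruction_queue([
        {"instructions": [{"priority": 2, "action": "wave"}]},
        {"instructions": [{"priority": 1, "action": "sit"}]},
    ])
    next_instruction = queue.popleft()  # the current first unexecuted instruction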
Optionally, when the second determining module 53 determines, according to the control instruction, a target animation segment matched with the control instruction from a plurality of preset animation segments, the second determining module 53 is specifically configured to:
according to at least one historical animation segment driving the digital human to move and the control instruction, determining a target animation segment which is matched with the control instruction and is connected with the at least one historical animation segment from a plurality of preset animation segments; or
And determining a target animation segment which is matched with the control instruction and is connected with the at least one historical animation segment from a plurality of preset animation segments according to the historical motion track of the digital person, the at least one historical animation segment for driving the digital person to move and the control instruction.
Optionally, when the generating module 54 inputs the control instruction, the historical movement skeletal information and the historical movement trajectory of the digital person into a machine learning model trained in advance and generates, through the machine learning model, skeletal movement information for driving the digital person, the generating module 54 is specifically configured to:
and inputting the control command, the environmental information around the digital person, the historical movement skeleton information and the historical movement track of the digital person into a machine learning model which is trained in advance, and generating the skeleton movement information for driving the digital person at the next moment through the machine learning model.
Optionally, the machine learning model is obtained by training according to preset bone motion information and a plurality of pieces of differentiated environment information adapted to the preset bone motion information.
Optionally, the environmental information around the digital person includes at least one of:
the height information of each track point on the historical motion track of the preset length passed by the digital person;
voxel information of virtual objects within a preset range around the digital person;
trajectory information of dynamic objects around the digital person;
and contact information of the dynamic objects around the digital person and the digital person.
Optionally, the historical kinematic skeleton information of the digital person includes at least one of:
the position information, the displacement information and the rotation information of each skeleton point of the digital person on each track point in the historical motion track;
and the state information of the digital person on each track point in the historical motion track.
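Taken together, the environment information and the historical skeletal information enumerated in the two lists above form the input to the machine learning model. The following sketch shows, under assumed feature shapes, one way these pieces might be flattened and concatenated before being fed to the model; the layout is illustrative only.

    # Assumed feature assembly for the model input described above.
    import numpy as np

    def build_model_input(control_instr: np.ndarray,
                          track_heights: np.ndarray,   # heights of recent track points
                          voxel_grid: np.ndarray,      # occupancy of nearby virtual objects
                          dynamic_traj: np.ndarray,    # trajectories of nearby dynamic objects
                          contact_flags: np.ndarray,   # contact info with dynamic objects
                          joint_history: np.ndarray,   # per-joint position/displacement/rotation
                          state_history: np.ndarray):  # per-track-point state of the digital person
        features = [control_instr, track_heights, voxel_grid, dynamic_traj,
                    contact_flags, joint_history, state_history]
        # Flatten every feature and concatenate into one input vector.
        return np.concatenate([np.asarray(f, dtype=np.float32).ravel() for f in features])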
The digital human driving apparatus of the embodiment shown in fig. 5 can be used to implement the technical solutions of the above method embodiments; the implementation principles and technical effects are similar and are not described herein again.
The internal functions and structure of the digital human driving apparatus are described above; the apparatus can be implemented as an electronic device. Fig. 6 is a schematic structural diagram of an embodiment of an electronic device provided in the embodiment of the present disclosure. As shown in fig. 6, the electronic device includes a memory 61 and a processor 62.
The memory 61 is used to store programs. In addition to the above-described programs, the memory 61 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like.
The memory 61 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 62 is coupled to the memory 61 and executes programs stored in the memory 61 for:
acquiring a control instruction for driving a digital person;
according to the control instruction, determining a target module for executing the control instruction from a motion matching module and a motion control module;
if the target module is the motion matching module, determining a target animation segment matched with the control instruction from a plurality of preset animation segments according to the control instruction, and taking the bone motion information in the target animation segment as the bone motion information for driving the digital person;
if the target module is the motion control module, inputting the control instruction, historical motion skeleton information and historical motion trail of the digital person into a machine learning model which is trained in advance, and generating skeleton motion information for driving the digital person through the machine learning model;
and driving the digital human to move according to the bone movement information.
Further, as shown in fig. 6, the electronic device may further include: communication components 63, power components 64, audio components 65, display 66, and the like. Only some of the components are schematically shown in fig. 6, and the electronic device is not meant to include only the components shown in fig. 6.
The communication component 63 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 63 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 63 further comprises a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
A power supply component 64 provides power to the various components of the electronic device. The power components 64 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic devices.
The audio component 65 is configured to output and/or input an audio signal. For example, the audio component 65 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 61 or transmitted via the communication component 63. In some embodiments, the audio component 65 also includes a speaker for outputting audio signals.
The display 66 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
In addition, the disclosed embodiments also provide a computer readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the digital human drive method described in the above embodiments.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The previous description is only for the purpose of describing particular embodiments of the present disclosure, so as to enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A digital human driven method, wherein the method comprises:
acquiring a control instruction for driving a digital person;
according to the control instruction, determining a target module for executing the control instruction from a motion matching module and a motion control module;
if the target module is the motion matching module, determining a target animation segment matched with the control instruction from a plurality of preset animation segments according to the control instruction, and taking the bone motion information in the target animation segment as the bone motion information for driving the digital person;
if the target module is the motion control module, inputting the control instruction, historical motion skeleton information and historical motion trail of the digital person into a machine learning model which is trained in advance, and generating skeleton motion information for driving the digital person through the machine learning model;
driving the digital human to move according to the bone movement information;
the historical kinematic skeleton information of the digital person comprises at least one of the following:
the position information, the displacement information and the rotation information of each skeleton point of the digital person on each track point in the historical motion trail;
and the state information of the digital person on each track point in the historical motion track.
2. The method of claim 1, wherein prior to obtaining the control instructions for driving the digital person, the method further comprises:
acquiring at least one control signal for driving a digital person;
respectively analyzing each control signal into at least one control instruction;
sequencing at least one control instruction corresponding to the at least one control signal respectively to obtain a sequencing result;
accordingly, acquiring a control instruction for driving a digital person, includes:
and acquiring the current first unexecuted control instruction from the sequencing result.
3. The method of claim 1, wherein determining a target animation segment matching the control instruction from a plurality of preset animation segments according to the control instruction comprises:
according to at least one historical animation segment driving the digital human to move and the control instruction, determining a target animation segment which is matched with the control instruction and is connected with the at least one historical animation segment from a plurality of preset animation segments; or
And determining a target animation segment which is matched with the control instruction and is connected with the at least one historical animation segment from a plurality of preset animation segments according to the historical motion track of the digital person, the at least one historical animation segment for driving the digital person to move and the control instruction.
4. The method of claim 1, wherein inputting the control instruction, historical movement skeletal information and historical movement trajectory of the digital person into a pre-trained machine learning model, generating skeletal movement information for driving the digital person through the machine learning model, comprises:
and inputting the control command, the environmental information around the digital person, the historical movement skeletal information of the digital person and the historical movement track into a machine learning model which is trained in advance, and generating skeletal movement information for driving the digital person at the next moment through the machine learning model.
5. The method according to claim 4, wherein the machine learning model is trained from preset bone motion information and a plurality of differentiated environment information adapted to the preset bone motion information.
6. The method of claim 4, wherein the environmental information surrounding the digital person comprises at least one of:
the height information of each track point on the historical motion track of the preset length passed by the digital person;
voxel information of virtual objects within a preset range around the digital person;
trajectory information of dynamic objects around the digital person;
and the contact information of the dynamic objects around the digital person and the digital person.
7. A digital human actuation device, comprising:
the acquisition module is used for acquiring a control instruction for driving the digital person;
the first determining module is used for determining a target module for executing the control instruction from the motion matching module and the motion control module according to the control instruction;
a second determining module, configured to determine, according to the control instruction, a target animation segment that matches the control instruction from a plurality of preset animation segments if the target module is the motion matching module, and use bone motion information in the target animation segment as bone motion information for driving the digital person;
the generating module is used for inputting the control instruction, the historical movement skeleton information and the historical movement track of the digital human into a machine learning model which is trained in advance if the target module is the movement control module, and generating skeleton movement information for driving the digital human through the machine learning model;
the driving module is used for driving the digital person to move according to the bone movement information;
the historical kinematic skeleton information of the digital person comprises at least one of the following:
the position information, the displacement information and the rotation information of each skeleton point of the digital person on each track point in the historical motion trail;
and the state information of the digital person on each track point in the historical motion track.
8. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN202210917824.XA 2022-08-01 2022-08-01 Digital human driving method, device, equipment and storage medium Active CN114998491B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210917824.XA CN114998491B (en) 2022-08-01 2022-08-01 Digital human driving method, device, equipment and storage medium
PCT/CN2023/110343 WO2024027661A1 (en) 2022-08-01 2023-07-31 Digital human driving method and apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210917824.XA CN114998491B (en) 2022-08-01 2022-08-01 Digital human driving method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114998491A CN114998491A (en) 2022-09-02
CN114998491B true CN114998491B (en) 2022-11-18

Family

ID=83022540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210917824.XA Active CN114998491B (en) 2022-08-01 2022-08-01 Digital human driving method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114998491B (en)
WO (1) WO2024027661A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998491B (en) * 2022-08-01 2022-11-18 阿里巴巴(中国)有限公司 Digital human driving method, device, equipment and storage medium
CN115331265A (en) * 2022-10-17 2022-11-11 广州趣丸网络科技有限公司 Training method of posture detection model and driving method and device of digital person
CN115779436B (en) * 2023-02-09 2023-05-05 腾讯科技(深圳)有限公司 Animation switching method, device, equipment and computer readable storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI448111B (en) * 2008-03-18 2014-08-01 Icm Inc Automobile detection and control integration device and method thereof
JP6793235B1 (en) * 2019-10-02 2020-12-02 株式会社Cygames Information processing system, information processing method, and information processing program
CN111292401B (en) * 2020-01-15 2022-05-03 腾讯科技(深圳)有限公司 Animation processing method and device, computer storage medium and electronic equipment
CN111260762B (en) * 2020-01-19 2023-03-28 腾讯科技(深圳)有限公司 Animation implementation method and device, electronic equipment and storage medium
CN111968204B (en) * 2020-07-28 2024-03-22 完美世界(北京)软件科技发展有限公司 Motion display method and device for bone model
CN113538645A (en) * 2021-07-19 2021-10-22 北京顺天立安科技有限公司 Method and device for matching body movement and language factor of virtual image
CN114155322A (en) * 2021-12-01 2022-03-08 北京字跳网络技术有限公司 Scene picture display control method and device and computer storage medium
CN114998491B (en) * 2022-08-01 2022-11-18 阿里巴巴(中国)有限公司 Digital human driving method, device, equipment and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102810210A (en) * 2012-05-30 2012-12-05 天津大学 Three-dimensional skeletal animation control system and method realized by utilizing flash script
CN106447748A (en) * 2016-09-14 2017-02-22 厦门幻世网络科技有限公司 Method and device for generating animation data
CN107423809A (en) * 2017-07-07 2017-12-01 北京光年无限科技有限公司 The multi-modal exchange method of virtual robot and system applied to net cast platform
CN110618995A (en) * 2018-12-25 2019-12-27 北京时光荏苒科技有限公司 Behavior track generation method and device, server and readable medium
CN110263720A (en) * 2019-06-21 2019-09-20 中国民航大学 Action identification method based on depth image and bone information
CN110781820A (en) * 2019-10-25 2020-02-11 网易(杭州)网络有限公司 Game character action generating method, game character action generating device, computer device and storage medium
CN113570690A (en) * 2021-08-02 2021-10-29 北京慧夜科技有限公司 Interactive animation generation model training method, interactive animation generation method and system
CN113706666A (en) * 2021-08-11 2021-11-26 网易(杭州)网络有限公司 Animation data processing method, non-volatile storage medium, and electronic device
CN114063624A (en) * 2021-10-22 2022-02-18 中国船舶重工集团公司第七一九研究所 Multi-mode planning motion controller of crawling unmanned submersible and control method thereof
CN113822972A (en) * 2021-11-19 2021-12-21 阿里巴巴达摩院(杭州)科技有限公司 Video-based processing method, device and readable medium
CN114596391A (en) * 2022-01-19 2022-06-07 阿里巴巴(中国)有限公司 Virtual character control method, device, equipment and storage medium
CN114820888A (en) * 2022-04-24 2022-07-29 广州虎牙科技有限公司 Animation generation method and system and computer equipment

Also Published As

Publication number Publication date
CN114998491A (en) 2022-09-02
WO2024027661A1 (en) 2024-02-08

Similar Documents

Publication Publication Date Title
CN114998491B (en) Digital human driving method, device, equipment and storage medium
JP6902683B2 (en) Virtual robot interaction methods, devices, storage media and electronic devices
WO2021169431A1 (en) Interaction method and apparatus, and electronic device and storage medium
CN109120985B (en) Image display method and device in live broadcast and storage medium
CN111191599B (en) Gesture recognition method, device, equipment and storage medium
CN103076877B (en) Posture is used to interact with the mobile device in vehicle
CN111556278A (en) Video processing method, video display device and storage medium
CN110751050A (en) Motion teaching system based on AI visual perception technology
CN109635644A (en) A kind of evaluation method of user action, device and readable medium
CN104281265A (en) Application program control method, application program control device and electronic equipment
CN113199472B (en) Robot control method, device, storage medium, electronic device, and robot
JP7278307B2 (en) Computer program, server device, terminal device and display method
JP2009262279A (en) Robot, robot program sharing system, robot program sharing method, and program
KR100486382B1 (en) Method and system for developing intelligence of robot, method and system for educating robot thereby
CN110660391A (en) Method and system for customizing voice control of large-screen terminal based on RPA (resilient packet Access) interface
CN109739353A (en) A kind of virtual reality interactive system identified based on gesture, voice, Eye-controlling focus
CN112528936A (en) Video sequence arranging method and device, electronic equipment and storage medium
CN113592895A (en) Motion information determination method and device and computer readable storage medium
CN111741321A (en) Live broadcast control method, device, equipment and computer storage medium
CN114513694A (en) Scoring determination method and device, electronic equipment and storage medium
CN113596574A (en) Video processing method, video processing apparatus, electronic device, and readable storage medium
CN117519825A (en) Digital personal separation interaction method and device, electronic equipment and storage medium
Basten et al. Motion transplantation techniques: A survey
CN114245193A (en) Display control method and device and electronic equipment
CN114125149A (en) Video playing method, device, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant