CN113539218B - Real-time interaction method and terminal for virtual images - Google Patents


Info

Publication number
CN113539218B
CN113539218B
Authority
CN
China
Prior art keywords
action
determining
rhythm
time
data
Prior art date
Legal status
Active
Application number
CN202010298575.1A
Other languages
Chinese (zh)
Other versions
CN113539218A (en)
Inventor
陈节省
李中冬
许荣峰
郭天祈
陈江煌
林剑宇
Current Assignee
Fujian Kaimi Network Science & Technology Co ltd
Original Assignee
Fujian Kaimi Network Science & Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Fujian Kaimi Network Science & Technology Co ltd
Priority to CN202010298575.1A
Publication of CN113539218A
Application granted
Publication of CN113539218B


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/36Accompaniment arrangements
    • G10H1/361Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/368Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems displaying animated or moving pictures synchronized with the music or audio part
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Auxiliary Devices For Music (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a real-time interaction method and terminal for an avatar. A scene mode of an object to be identified is determined according to accompaniment data, audio data of the object to be identified and action data, all acquired in real time, and the avatar is controlled to play the action corresponding to that scene mode in real time. The avatar thus plays different actions dynamically as the scene mode changes. Because the scene mode is determined by jointly considering the acquired accompaniment data, audio data and action data, the avatar plays a corresponding action even when the object to be identified performs no action at all, which enhances interactivity, guides the user to participate when interaction is lacking, and improves the user experience.

Description

Real-time interaction method and terminal for virtual images
Technical Field
The invention relates to the field of interactive entertainment, in particular to a real-time virtual image interaction method and a terminal.
Background
In existing entertainment venues such as KTV, an avatar is generally provided to interact with the user in order to improve the user experience. However, conventional avatar interaction usually plays a fixed preset action, which is monotonous and offers little interactivity with the user. To improve interactivity, the avatar is often made to perform based on recognized user audio; but once no user audio is recognized, the avatar stops acting and appears dull, so interactivity remains poor and the user experience suffers.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a terminal for real-time interaction of an avatar that enhance interactivity and improve the user experience.
In order to solve the technical problems, the invention adopts a technical scheme that:
a method for real-time interaction of an avatar, comprising the steps of:
s1, acquiring accompaniment data, audio data of an object to be identified and action data in real time;
s2, determining a scene mode of the object to be identified according to the accompaniment data, the audio data and the action data;
and S3, controlling the virtual image to play the action corresponding to the scene mode in real time.
In order to solve the technical problems, the invention adopts another technical scheme that:
a terminal for real-time interaction of avatars, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
s1, acquiring accompaniment data, audio data of an object to be identified and action data in real time;
s2, determining a scene mode of the object to be identified according to the accompaniment data, the audio data and the action data;
and S3, controlling the virtual image to play the action corresponding to the scene mode in real time.
The invention has the following beneficial effects: a scene mode of the object to be identified is determined according to the accompaniment data, the audio data of the object to be identified and the action data, all acquired in real time, and the virtual image is controlled to play the action corresponding to that scene mode in real time. The virtual image thus plays different actions dynamically as the scene mode changes. Because the scene mode is determined by jointly considering the acquired accompaniment data, audio data and action data, the virtual image plays a corresponding action even when the object to be identified performs no action, which enhances interactivity, guides the user to participate when interaction is lacking, and improves the user experience.
Drawings
FIG. 1 is a flowchart illustrating steps of a method for real-time interaction of an avatar according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a real-time interactive terminal for an avatar according to an embodiment of the present invention;
description of the reference numerals:
1. a real-time interactive terminal of virtual image; 2. a memory; 3. a processor.
Detailed Description
In order to describe the technical contents, the achieved objects and effects of the present invention in detail, the following description will be made with reference to the embodiments in conjunction with the accompanying drawings.
Referring to fig. 1, a method for real-time interaction of an avatar includes the steps of:
s1, acquiring accompaniment data, audio data of an object to be identified and action data in real time;
s2, determining a scene mode of the object to be identified according to the accompaniment data, the audio data and the action data;
and S3, controlling the virtual image to play the action corresponding to the scene mode in real time.
From the above description, the beneficial effects of the invention are as follows: a scene mode of the object to be identified is determined according to the accompaniment data, the audio data of the object to be identified and the action data, all acquired in real time, and the virtual image is controlled to play the action corresponding to that scene mode in real time. The virtual image thus plays different actions dynamically as the scene mode changes. Because the scene mode is determined by jointly considering the acquired accompaniment data, audio data and action data, the virtual image plays a corresponding action even when the object to be identified performs no action, which enhances interactivity, guides the user to participate when interaction is lacking, and improves the user experience.
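As a concrete illustration only, the following is a minimal sketch of the S1-S3 loop; the function parameters are illustrative assumptions, not the patented implementation.

```python
# Illustrative sketch of the S1-S3 loop: acquire data in real time, determine
# the scene mode, play the corresponding avatar action. All callables are
# assumed placeholders supplied by the surrounding system.
from typing import Callable, Tuple

def interaction_loop(
    acquire_frame: Callable[[], Tuple[bytes, bytes, list]],
    determine_scene_mode: Callable[[bytes, bytes, list], str],
    play_action: Callable[[str], None],
    is_running: Callable[[], bool],
) -> None:
    while is_running():
        accompaniment, audio, actions = acquire_frame()                   # S1
        scene_mode = determine_scene_mode(accompaniment, audio, actions)  # S2
        play_action(scene_mode)                                           # S3
```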
Further, the step S2 includes:
determining a first matching degree of the audio of the object to be identified and accompaniment according to the accompaniment data and the audio data;
determining standard rhythm and standard rhythm strength according to the accompaniment data;
respectively determining a second matching degree of the object to be identified and the standard rhythm and a third matching degree of the object to be identified and the standard rhythm strength according to the action data;
and determining a scene mode of the object to be identified according to the first matching degree, the second matching degree and the third matching degree.
From the above description, the scene mode of the object to be identified is determined by jointly considering the matching degree between the audio of the object to be identified and the accompaniment, and the matching degrees between the action data of the object to be identified and the standard rhythm and the standard rhythm strength. This not only determines the scene mode relatively accurately, but also covers a wide variety of scene modes, so the method has strong universality, is suitable for many application scenarios, and enriches the interaction types of the virtual image.
Further, the determining, according to the first matching degree, the second matching degree, and the third matching degree, a scene mode in which the object to be identified is located includes:
determining a matching result according to the first matching degree, the second matching degree and the third matching degree;
determining a scene mode of the object to be identified according to the matching result;
the step S3 comprises the following steps:
if the matching result is smaller than a first preset value, the virtual image is controlled to play a first preset action;
if the matching result is smaller than a second preset value, controlling the virtual image to play a second preset action;
if the matching result is smaller than a third preset value, the virtual image is controlled to play a third preset action;
if the matching result is greater than or equal to the third preset value, controlling the virtual image to play a fourth preset action;
the first preset value is smaller than a second preset value, and the second preset value is smaller than a third preset value.
According to the above description, the virtual image is controlled to play different preset actions in different scene modes, which increases the diversity of interaction: the user sees the virtual image interact differently in different scenes, receives feedback, and can be guided accordingly, improving the user experience.
Further, determining the second matching degree of the object to be identified and the standard rhythm according to the action data includes:
determining the time point and the time length of each rhythm in the standard rhythms as a first rhythm data set;
determining the time point and the time length of each rhythm in the shaking rhythms of the objects to be identified according to the action data, and taking the time point and the time length as a second rhythm data set;
and determining a second matching degree of the object to be identified and the standard rhythm according to the matching degree between the data corresponding to each of the first rhythm data set and the second rhythm data set.
As can be seen from the above description, the time point and duration of each rhythm in the action data of the object to be identified are compared in time with the corresponding rhythm points in the standard rhythm, so the matching degree between the object to be identified and the standard rhythm can be determined accurately, which helps to determine the scene mode of the object to be identified accurately.
Further, determining the third matching degree between the object to be identified and the standard rhythm intensity according to the action data includes:
determining each rhythm point and corresponding intensity in the standard rhythm as a first rhythm intensity data set;
determining each rhythm point and corresponding action amplitude of the object to be identified according to the action data to serve as a second rhythm intensity data set;
and determining a third matching degree of the object to be identified and the standard rhythm intensity according to the matching degree between the data corresponding to each of the first rhythm intensity data set and the second rhythm intensity data set.
As can be seen from the above description, comparing each rhythm point and the corresponding action amplitude in the action data of the object to be identified with the corresponding rhythm point and intensity in the standard rhythm allows the matching degree between the object to be identified and the standard rhythm strength to be determined accurately, which helps to determine the scene mode of the object to be identified accurately.
Further, the step S3 further includes:
and switching actions corresponding to different scene modes is realized in a smooth transition mode.
As can be seen from the above description, the actions corresponding to different scene modes differ, and switching between scene modes is performed with a smooth transition. This avoids the unnatural effect of the new action and the current action cutting into each other abruptly and improves the user experience.
Further, the switching of the actions corresponding to the different scene modes through the smooth transition mode includes:
judging whether the virtual image plays a new action or not, if so, stopping the current action;
determining the amplitude difference of the motion according to the state of the current motion and the state of the first frame of the new motion;
calculating a frame supplementing time according to the amplitude difference;
determining a frame supplementing action according to the amplitude difference and the frame supplementing time;
and controlling the virtual image to play the frame supplementing action, and controlling the virtual image to play the new action after the frame supplementing action is played.
As can be seen from the above description, the frame-supplementing time is calculated from the amplitude difference between the new action and the current action, the frame-supplementing action is determined from the amplitude difference and the frame-supplementing time, and the frame-supplementing action is played automatically while the current action is switched to the new action. This frame-supplementing approach achieves a smooth transition between the current action and the new action.
Further, the determining the amplitude difference of the motion according to the state of the current motion and the state of the first frame of the new motion, and calculating the frame compensating time according to the amplitude difference includes:
presetting action sampling points on the virtual image;
calculating a distance from a state of a current motion to a state of a first frame of the new motion for each motion sampling point;
determining a distance difference average value according to the distance corresponding to each action sampling point, and taking the distance difference average value as the amplitude difference of the action;
and calculating the frame supplementing time according to the distance difference average value, the maximum distance difference value and the time corresponding to the maximum distance difference value.
As can be seen from the above description, a plurality of action sampling points are preset, and the average of the distance differences at these sampling points between the current action and the new action is taken as the amplitude difference, from which the frame-supplementing time is calculated. This improves the accuracy of both the amplitude difference and the frame-supplementing time, and thus the smoothness of the transition from the current action to the new action.
Further, the method further comprises the following steps:
and in the process of controlling the virtual image to play the frame supplementing action, playing the corresponding switching special effect according to the scene mode to be switched.
As can be seen from the above description, the corresponding switching special effect is played according to the scene mode while the frame-supplementing action is played. The special effect attracts the user's attention and masks any part of the action that looks unnatural, further improving the smoothness of the transition.
Referring to fig. 2, a real-time interactive terminal for an avatar includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the real-time interactive method for an avatar when executing the computer program.
Example 1
Referring to fig. 1, a method for real-time interaction of an avatar includes the steps of:
s1, acquiring accompaniment data, audio data of an object to be identified and action data in real time;
the method comprises the steps that the behavior of a user can be collected in real time through equipment such as a camera, a handheld controller and a microphone, so that audio data and action data of the user can be obtained in real time;
s2, determining a scene mode of the object to be identified according to the accompaniment data, the audio data and the action data;
specifically, a first matching degree of the audio of the object to be identified and accompaniment is determined according to the accompaniment data and the audio data;
the singing can be scored in real time, and the real-time singing score, namely a first score, is used as the first matching degree;
determining standard rhythm and standard rhythm strength according to the accompaniment data;
respectively determining a second matching degree of the object to be identified and the standard rhythm and a third matching degree of the object to be identified and the standard rhythm strength according to the action data;
and corresponding scores, namely a second score and a third score, are obtained according to the second matching degree and the third matching degree respectively;
determining a scene mode of the object to be identified according to the first matching degree, the second matching degree and the third matching degree;
the determining the scene mode of the object to be identified according to the first matching degree, the second matching degree and the third matching degree comprises the following steps:
determining a matching result according to the first matching degree, the second matching degree and the third matching degree;
determining a scene mode of the object to be identified according to the matching result;
specifically, determining a final score according to the first matching degree, the second matching degree, the third matching degree and the weights of the first matching degree, the second matching degree, the third matching degree and the third matching degree, and taking the final score as a final matching result;
the second matching degree can be determined through a sensor of the mobile phone, by matching the rhythm at which the user shakes the phone against the standard rhythm;
the third matching degree can be determined from the pictures captured by the camera, by matching the user's limb position changes against the rhythm;
the weights may be set dynamically according to the specific application scenario. For a scenario focusing on singing, the real-time singing score can be given the highest weight, for example: final score = first score × 0.7 + second score (shaking the mobile phone) × 0.2 + third score (camera recognition) × 0.1. For a scenario focusing on entertainment and rhythm, the weights of shaking the mobile phone and of the limb actions can be set higher, for example: final score = second score (shaking the mobile phone) × 0.4 + third score (camera recognition) × 0.5 + first score × 0.1;
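To illustrate, the following is a minimal sketch (not the patented implementation) of this weighted combination; the scenario names and weight values simply mirror the examples above, and the function name is an illustrative assumption.

```python
# Illustrative sketch: combine the three matching scores (0-100 each) with
# scenario-dependent weights. Scenario names and weights mirror the examples
# in the text; they are assumptions, not fixed values of the invention.
SCENARIO_WEIGHTS = {
    "singing_focused": (0.7, 0.2, 0.1),  # emphasize the real-time singing score
    "rhythm_focused": (0.1, 0.4, 0.5),   # emphasize phone shaking and body motion
}

def final_score(first: float, second: float, third: float,
                scenario: str = "singing_focused") -> float:
    """Weighted combination of the first, second and third scores."""
    w1, w2, w3 = SCENARIO_WEIGHTS[scenario]
    return first * w1 + second * w2 + third * w3

# Example: singing-focused scenario
print(final_score(80, 60, 66))  # 80*0.7 + 60*0.2 + 66*0.1, about 74.6
```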
and S3, controlling the virtual image to play the action corresponding to the scene mode in real time.
The step S3 comprises the following steps:
if the matching result is smaller than a first preset value, the virtual image is controlled to play a first preset action;
if the matching result is smaller than a second preset value, controlling the virtual image to play a second preset action;
if the matching result is smaller than a third preset value, the virtual image is controlled to play a third preset action;
if the matching result is greater than or equal to the third preset value, controlling the virtual image to play a fourth preset action;
the first preset value is smaller than a second preset value, and the second preset value is smaller than a third preset value;
Specifically, the preset action played by the virtual image in the corresponding scene is controlled according to the scoring result, for example:
when the final score is smaller than 10, the user is regarded as no operation at all, and the virtual image plays the preset action in the idle state;
when the score is smaller than 40, the virtual image plays a preset action of negative feedback;
when the score is less than 70, the virtual image plays a preset action of encouragement ("keep going") feedback;
when the score is 70 or higher, the virtual image plays a preset action of motivational feedback.
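The following is a short illustrative sketch of this threshold mapping, using the example thresholds above (10, 40 and 70); the action names are assumptions for illustration only.

```python
# Illustrative sketch: map the final matching result to a preset avatar action.
# Thresholds follow the example above; action names are assumed placeholders.
def select_action(score: float) -> str:
    if score < 10:
        return "idle"                  # treated as no user operation at all
    if score < 40:
        return "negative_feedback"
    if score < 70:
        return "encouragement"         # "keep going" feedback
    return "motivational_feedback"

print(select_action(74.6))  # -> motivational_feedback
```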
Example 2
This embodiment differs from the first embodiment in further defining that determining the second matching degree between the object to be identified and the standard rhythm according to the action data includes:
determining the time point and the time length of each rhythm in the standard rhythms as a first rhythm data set;
determining the time point and the time length of each rhythm in the shaking rhythms of the objects to be identified according to the action data, and taking the time point and the time length as a second rhythm data set;
determining a second matching degree of the object to be identified and the standard rhythm according to the matching degree between the data corresponding to each of the first rhythm data set and the second rhythm data set;
taking shaking the phone according to the rhythm as an example:
second score = 100 × (number of shakes whose error from a standard rhythm point is less than a preset value, e.g. 100 ms) / (total number of shakes);
for example:
the standard rhythm is: 1000 ms, 1500 ms, 2000 ms and 3000 ms, 4 rhythm points in total, each with a corresponding time point;
the shaking rhythm of the user, determined from the corresponding time points, is: 950 ms, 1480 ms, 1550 ms, 1700 ms and 3050 ms, 5 shakes in total;
the user shakes with an error of less than 100 milliseconds are: 950 ms, 1480 ms and 3050 ms, 3 in total; 1480 ms and 1550 ms are both close to the 1500 ms rhythm point, but 1480 ms is closer, so that rhythm point is counted only once;
second score = 100 × 3/5 = 60;
evidently, shaking faster or more often does not yield a high score; the user needs to shake as close as possible to the correct rhythm points and keep the durations as consistent as possible;
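The following is a minimal sketch, under the assumptions stated in the comments, of this second-score computation; it reproduces the worked example above (3 of 5 shakes within 100 ms gives 60 points).

```python
# Illustrative sketch: ratio of user shakes landing within a preset error
# (e.g. 100 ms) of a standard rhythm point to the total number of shakes.
# Assumption: each standard rhythm point is matched to at most one shake,
# the closest one, as in the worked example.
def rhythm_score(standard_ms: list[int], shakes_ms: list[int],
                 max_error_ms: int = 100) -> float:
    matched = 0
    used = set()
    for point in standard_ms:
        best = None
        for i, shake in enumerate(shakes_ms):
            if i in used:
                continue
            err = abs(shake - point)
            if err < max_error_ms and (best is None or err < abs(shakes_ms[best] - point)):
                best = i
        if best is not None:
            used.add(best)
            matched += 1
    return 100.0 * matched / len(shakes_ms) if shakes_ms else 0.0

# Worked example from the text: 3 of 5 shakes match the standard rhythm
print(rhythm_score([1000, 1500, 2000, 3000], [950, 1480, 1550, 1700, 3050]))  # 60.0
```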
determining a third matching degree of the object to be identified and the standard rhythm intensity according to the action data comprises:
determining each rhythm point and corresponding intensity in the standard rhythm as a first rhythm intensity data set;
determining each rhythm point and corresponding action amplitude of the object to be identified according to the action data to serve as a second rhythm intensity data set;
determining a third matching degree of the object to be identified and the standard rhythm intensity according to the matching degree between the data corresponding to each of the first rhythm intensity data set and the second rhythm intensity data set;
Taking as an example image recognition on pictures captured by the camera, used to judge how well the user's limb position changes match the rhythm:
third score = 100 × (number of rhythm points with motion amplitude greater than a preset value, e.g. 0.1) / (total number of rhythm points);
for example:
motion amplitude = (sum of the movement distances of specific body parts) / body height;
where the specific body parts include: head, hands and feet;
there are 3 rhythm points in total, the motion amplitudes recognized and calculated from the camera are 0.08, 0.2 and 0.15 respectively, and 2 rhythm points have an amplitude greater than 0.1, so the score = 100 × 2/3 ≈ 66 points.
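The following is a minimal sketch, with illustrative names, of this third-score computation; it reproduces the worked example above (2 of 3 rhythm points above 0.1 gives about 66 points).

```python
# Illustrative sketch: motion amplitude at each rhythm point is the summed
# displacement of specific body parts (head, hands, feet) divided by body
# height; the score is the fraction of rhythm points whose amplitude exceeds
# a preset threshold (e.g. 0.1). Names are assumptions for illustration.
def motion_amplitude(part_distances: list[float], body_height: float) -> float:
    """Sum of body-part movement distances normalized by body height."""
    return sum(part_distances) / body_height

def intensity_score(amplitudes: list[float], threshold: float = 0.1) -> float:
    strong = sum(1 for a in amplitudes if a > threshold)
    return 100.0 * strong / len(amplitudes) if amplitudes else 0.0

# Worked example from the text: amplitudes 0.08, 0.2, 0.15 at 3 rhythm points
print(int(intensity_score([0.08, 0.2, 0.15])))  # 66
```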
Example 3
The difference between this embodiment and the first or second embodiment is that the S3 further includes:
switching actions corresponding to different scene modes is realized in a smooth transition mode;
the switching of actions corresponding to different scene modes through a smooth transition mode comprises the following steps:
judging whether the virtual image plays a new action or not, if so, stopping the current action;
determining the amplitude difference of the motion according to the state of the current motion and the state of the first frame of the new motion;
calculating a frame supplementing time according to the amplitude difference;
determining a frame supplementing action according to the amplitude difference and the frame supplementing time;
controlling the virtual image to play the frame supplementing action, and controlling the virtual image to play the new action after the frame supplementing action is played;
wherein, determining the amplitude difference of the motion according to the state of the current motion and the state of the first frame of the new motion, and calculating the frame compensating time according to the amplitude difference includes:
presetting action sampling points on the virtual image;
calculating a distance from a state of a current motion to a state of a first frame of the new motion for each motion sampling point;
determining a distance difference average value according to the distance corresponding to each action sampling point, and taking the distance difference average value as the amplitude difference of the action;
calculating the frame supplementing time according to the distance difference average value, the maximum distance difference value and the time corresponding to the maximum distance difference value;
in another alternative embodiment, the method further comprises:
in the process of controlling the virtual image to play the frame supplementing action, playing a corresponding switching special effect according to a scene mode to be switched;
for example, the switching from the first scene mode to the second scene mode has a corresponding switching effect, and the switching from the second scene mode to the third scene mode also has a corresponding switching effect;
specifically, when a new action is generated, the current action is immediately paused;
calculating the amplitude difference of the motion according to the state of the current motion and the state of the first frame of the new motion;
the method for calculating the amplitude difference comprises the following steps:
for each bone point of the avatar, calculating the distance between the current state and the state of the first frame of the new motion, and averaging the distance differences of all bone points;
calculating the frame supplementing time according to the action amplitude difference; the calculation formula is as follows:
frame-supplementing time = (500 ms) × (average skeletal-point distance difference) / (maximum distance difference value);
wherein 500 milliseconds is the time corresponding to the maximum distance difference value;
automatically calculating frame-supplementing action data according to the action amplitude difference and the frame-supplementing time;
for example, when the wrist position in the current state is A and the wrist position in the new motion is B, key frame motion data of the wrist is inserted between the A position and the B position according to the frame supplementing time;
playing the frame supplementing action data;
while the frame-supplementing action is being played, the corresponding switching special effect is played according to the scene mode:
for example: when the idle state is changed into the interactive state, special effects of star explosion are played, and the switching special effects can be used for attracting the attention of a user and shielding the phenomenon that part of actions are unnatural;
after the frame supplementing action is finished, continuing to play the new action;
because the frame-supplementing action is generally short, feedback on the new action can still be given in real time.
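The following is a minimal sketch, not the patented implementation, of the transition step described above: it computes the frame-supplementing time from the average skeletal-point distance difference and then inserts interpolated key frames between the current pose and the first frame of the new action. The sampled point names, the reference maximum distance difference and the use of simple linear interpolation are assumptions for illustration.

```python
# Illustrative sketch of the smooth-transition step. Assumptions: poses are
# dicts mapping sampling-point names to 3D coordinates; max_diff_value is the
# reference maximum distance difference to which 500 ms corresponds; key
# frames are produced by simple linear interpolation.
import math

def supplement_time_ms(current_pose: dict, new_first_frame: dict,
                       max_diff_time_ms: float = 500.0,
                       max_diff_value: float = 1.0) -> float:
    """frame-supplementing time = 500 ms x (average distance diff) / (max distance diff)."""
    dists = [math.dist(current_pose[p], new_first_frame[p]) for p in current_pose]
    avg_diff = sum(dists) / len(dists)
    return max_diff_time_ms * avg_diff / max_diff_value

def supplement_frames(current_pose: dict, new_first_frame: dict,
                      duration_ms: float, frame_ms: float = 33.0):
    """Yield interpolated key frames between the current pose and the new action's first frame."""
    n = max(1, int(duration_ms / frame_ms))
    for i in range(1, n + 1):
        t = i / n
        yield {p: tuple(a + (b - a) * t
                        for a, b in zip(current_pose[p], new_first_frame[p]))
               for p in current_pose}

# Example with two sampled points (coordinates are made up for illustration)
pose_a = {"wrist": (0.0, 0.0, 0.0), "head": (0.0, 1.6, 0.0)}
pose_b = {"wrist": (0.3, 0.4, 0.0), "head": (0.1, 1.6, 0.0)}
t_ms = supplement_time_ms(pose_a, pose_b)            # average diff 0.3 -> 150 ms
frames = list(supplement_frames(pose_a, pose_b, t_ms))
```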
Example 4
Referring to fig. 2, a terminal 1 for real-time interaction of avatars includes a memory 2, a processor 3 and a computer program stored in the memory 2 and executable on the processor 3, wherein the processor 3 implements the steps of any of the first to third embodiments when executing the computer program.
In summary, the method and terminal for real-time interaction of an avatar provided by the invention determine the scene mode of the object to be identified according to the accompaniment data, the audio data of the object to be identified and the action data, all acquired in real time with existing equipment, and control the avatar to play the action corresponding to that scene mode in real time. The avatar plays different actions dynamically as the scene mode changes, and because the scene mode is determined by jointly considering the acquired accompaniment data, audio data and action data, the avatar plays a corresponding action even when the object to be identified performs no action. During switching between scene modes, a smooth transition is achieved by automatically adding a frame-supplementing action. Since only existing equipment is used, no additional hardware cost is incurred, which suits venues with limited input devices such as KTV rooms; feedback is generated in real time according to the user's actions, increasing the fun of singing and enhancing interactivity, and when the user does not interact, the avatar can still play preset actions, improving the user experience. When scene modes are switched, the frame supplementing achieves a smooth transition instead of waiting for the current action to finish playing, so feedback is given in real time, further improving the interaction effect and the user experience.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent changes made by the specification and drawings of the present invention, or direct or indirect application in the relevant art, are included in the scope of the present invention.

Claims (7)

1. A method for real-time interaction of an avatar, comprising the steps of:
s1, acquiring accompaniment data, audio data of an object to be identified and action data in real time;
s2, determining a scene mode of the object to be identified according to the accompaniment data, the audio data and the action data;
s3, controlling the virtual image to play the action corresponding to the scene mode in real time;
the step S3 further includes:
switching actions corresponding to different scene modes is realized in a smooth transition mode;
the switching of actions corresponding to different scene modes through a smooth transition mode comprises the following steps:
judging whether the virtual image plays a new action or not, if so, stopping the current action;
determining the amplitude difference of the motion according to the state of the current motion and the state of the first frame of the new motion;
calculating a frame supplementing time according to the amplitude difference;
determining a frame supplementing action according to the amplitude difference and the frame supplementing time;
controlling the virtual image to play the frame supplementing action, and controlling the virtual image to play the new action after the frame supplementing action is played;
determining the amplitude difference of the motion according to the state of the current motion and the state of the first frame of the new motion, and calculating the frame supplementing time according to the amplitude difference comprises:
presetting action sampling points on the virtual image;
calculating a distance from a state of a current motion to a state of a first frame of the new motion for each motion sampling point;
determining a distance difference average value according to the distance corresponding to each action sampling point, and taking the distance difference average value as the amplitude difference of the action;
calculating the frame supplementing time according to the distance difference average value, the maximum distance difference value and the time corresponding to the maximum distance difference value;
frame supplementing time = (time corresponding to the maximum distance difference value) × (distance difference average) / (maximum distance difference value).
2. The method for real-time interaction of an avatar according to claim 1, wherein the S2 comprises:
determining a first matching degree of the audio of the object to be identified and accompaniment according to the accompaniment data and the audio data;
determining standard rhythm and standard rhythm strength according to the accompaniment data;
respectively determining a second matching degree of the object to be identified and the standard rhythm and a third matching degree of the object to be identified and the standard rhythm strength according to the action data;
and determining a scene mode of the object to be identified according to the first matching degree, the second matching degree and the third matching degree.
3. The method for real-time interaction of an avatar according to claim 2, wherein the determining a scene mode in which the object to be recognized is located according to the first matching degree, the second matching degree and the third matching degree comprises:
determining a matching result according to the first matching degree, the second matching degree and the third matching degree;
determining a scene mode of the object to be identified according to the matching result;
the step S3 comprises the following steps:
if the matching result is smaller than a first preset value, the virtual image is controlled to play a first preset action;
if the matching result is smaller than a second preset value, controlling the virtual image to play a second preset action;
if the matching result is smaller than a third preset value, the virtual image is controlled to play a third preset action;
if the matching result is greater than or equal to the third preset value, controlling the virtual image to play a fourth preset action;
the first preset value is smaller than a second preset value, and the second preset value is smaller than a third preset value.
4. The method for real-time interaction of an avatar according to claim 2, wherein determining the second degree of matching of the object to be recognized with the standard cadence according to the motion data comprises:
determining the time point and the time length of each rhythm in the standard rhythms as a first rhythm data set;
determining the time point and the time length of each rhythm in the shaking rhythms of the objects to be identified according to the action data, and taking the time point and the time length as a second rhythm data set;
and determining a second matching degree of the object to be identified and the standard rhythm according to the matching degree between the data corresponding to each of the first rhythm data set and the second rhythm data set.
5. The method for real-time interaction of an avatar according to claim 2, wherein determining a third degree of matching of the object to be recognized with a standard cadence intensity according to the motion data comprises:
determining each rhythm point and corresponding intensity in the standard rhythm as a first rhythm intensity data set;
determining each rhythm point and corresponding action amplitude of the object to be identified according to the action data to serve as a second rhythm intensity data set;
and determining a third matching degree of the object to be identified and the standard rhythm intensity according to the matching degree between the data corresponding to each of the first rhythm intensity data set and the second rhythm intensity data set.
6. The method for real-time interaction of an avatar according to claim 1, further comprising:
and in the process of controlling the virtual image to play the frame supplementing action, playing the corresponding switching special effect according to the scene mode to be switched.
7. A terminal for real-time avatar interaction comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method for real-time avatar interaction as claimed in any one of claims 1 to 6 when executing the computer program.
CN202010298575.1A 2020-04-16 2020-04-16 Real-time interaction method and terminal for virtual images Active CN113539218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010298575.1A CN113539218B (en) 2020-04-16 2020-04-16 Real-time interaction method and terminal for virtual images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010298575.1A CN113539218B (en) 2020-04-16 2020-04-16 Real-time interaction method and terminal for virtual images

Publications (2)

Publication Number Publication Date
CN113539218A CN113539218A (en) 2021-10-22
CN113539218B true CN113539218B (en) 2023-11-17

Family

ID=78088357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010298575.1A Active CN113539218B (en) 2020-04-16 2020-04-16 Real-time interaction method and terminal for virtual images

Country Status (1)

Country Link
CN (1) CN113539218B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20020003105A (en) * 2001-07-21 2002-01-10 김길호 method of accompaniment for multiuser
CN104768044A (en) * 2015-03-10 2015-07-08 苏州乐聚一堂电子科技有限公司 Music beat interaction device
CN104867171A (en) * 2015-05-05 2015-08-26 中国科学院自动化研究所 Transition animation generating method for three-dimensional roles
CN108829254A (en) * 2018-06-21 2018-11-16 广东小天才科技有限公司 Method, system and related equipment for realizing interaction between microphone and user terminal
CN109936774A (en) * 2019-03-29 2019-06-25 广州虎牙信息科技有限公司 Virtual image control method, device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7806759B2 (en) * 2004-05-14 2010-10-05 Konami Digital Entertainment, Inc. In-game interface with performance feedback


Also Published As

Publication number Publication date
CN113539218A (en) 2021-10-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant