CN112527105B - Man-machine interaction method and device, electronic equipment and storage medium - Google Patents

Man-machine interaction method and device, electronic equipment and storage medium

Info

Publication number
CN112527105B
Authority
CN
China
Prior art keywords
interaction
interactive
user
animation
action parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011364599.9A
Other languages
Chinese (zh)
Other versions
CN112527105A (en)
Inventor
崔璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011364599.9A priority Critical patent/CN112527105B/en
Publication of CN112527105A publication Critical patent/CN112527105A/en
Application granted granted Critical
Publication of CN112527105B publication Critical patent/CN112527105B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Signal Processing (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a man-machine interaction method, a man-machine interaction device, electronic equipment and a storage medium, and relates to the technical field of computers, in particular to the technical field of artificial intelligence such as voice recognition, natural language processing and deep learning. The specific implementation scheme is as follows: acquiring voice data input by a user; determining the interaction intention of the user according to the voice data; acquiring interaction action parameters corresponding to the interaction intention; and generating an interactive animation according to the interactive action parameters and the virtual character model, and displaying the interactive animation to the user, so that the real-person interactive scene can be effectively simulated, the application scene of the virtual character model is expanded, the interactive efficiency is improved, and the interactive experience of the user is improved.

Description

Man-machine interaction method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technology, in particular to the field of artificial intelligence technologies such as voice recognition, natural language processing and deep learning, and more particularly to a man-machine interaction method and apparatus, an electronic device and a storage medium.
Background
Artificial intelligence is the discipline that studies how to make a computer mimic certain human mental processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning, etc.); it involves both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing and the like; artificial intelligence software technologies mainly include computer vision, voice recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies and the like.
In an application scenario of a virtual character model, for example, training the virtual character model with recordings of a real person may be supported, ultimately enabling a real-person image to be synthesized from real-person audio.
Disclosure of Invention
A human-machine interaction method, apparatus, electronic device, storage medium and computer program product are provided.
According to a first aspect, a man-machine interaction method is provided, comprising: acquiring voice data input by a user; determining the interaction intention of the user according to the voice data; acquiring interaction action parameters corresponding to the interaction intention; and generating an interactive animation according to the interactive action parameters and the virtual character model, and displaying the interactive animation to the user.
According to a second aspect, there is provided a human-machine interaction device comprising: the first acquisition module is used for acquiring voice data input by a user; the determining module is used for determining the interaction intention of the user according to the voice data; the second acquisition module is used for acquiring interaction action parameters corresponding to the interaction intention; and the generation module is used for generating an interactive animation according to the interactive action parameters and the virtual character model and displaying the interactive animation to the user.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the man-machine interaction method of the embodiment of the application.
According to a fourth aspect, a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the human-machine interaction method disclosed in the embodiments of the present application is provided.
According to a fifth aspect, a computer program product is proposed, comprising a computer program, which, when executed by a processor, implements the man-machine interaction method disclosed in the embodiments of the present application.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic diagram according to a third embodiment of the present application;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present application;
fig. 5 is a block diagram of an electronic device for implementing a man-machine interaction method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present application.
It should be noted that, the execution body of the man-machine interaction method in this embodiment is a man-machine interaction device, and the device may be implemented in a software and/or hardware manner, and the device may be configured in an electronic device, where the electronic device may include, but is not limited to, a terminal, a server, and the like.
The embodiment of the application relates to the technical field of artificial intelligence such as voice recognition, natural language processing, deep learning and the like.
Artificial intelligence (abbreviated AI) is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
Deep learning learns the inherent regularities and representation levels of sample data, and the information obtained during such learning is helpful in interpreting data such as text, images and sounds. The ultimate goal of deep learning is to enable a machine to analyze and learn like a person, and to recognize data such as text, images and sounds.
Natural language processing studies theories and methods that enable effective communication between humans and computers using natural language.
Voice recognition is a technology that enables a machine to convert a voice signal into corresponding text or commands through a process of recognition and understanding; it mainly involves three aspects: feature extraction, pattern matching criteria and model training.
As shown in fig. 1, the man-machine interaction method includes:
s101: and acquiring voice data input by a user.
The voice data input by the user may be, for example, "How is the weather today?".
When acquiring voice data input by a user, the voice data input by the user can be received directly through a microphone of the electronic device, or a voice file (such as a file in WAV format) input by the user can be acquired, and analysis processing is performed on the voice file, so that corresponding voice data is obtained.
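As a minimal sketch of the second acquisition path described above (parsing a user-supplied WAV file), assuming Python and its standard wave module; the file name and the helper are illustrative and not part of the disclosed method:

```python
import wave

def load_voice_data(wav_path: str) -> tuple[bytes, int]:
    """Read raw PCM frames and the sample rate from a user-supplied WAV file."""
    with wave.open(wav_path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        pcm_frames = wav_file.readframes(wav_file.getnframes())
    return pcm_frames, sample_rate

# The file name is illustrative; microphone capture could also be buffered to WAV first.
pcm, rate = load_voice_data("user_query.wav")
```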
S102: and determining the interaction intention of the user according to the voice data.
After the voice data input by the user is obtained, the interaction intention of the user can be determined from the voice data in real time. For example, when the user inputs the voice data "How is the weather today?", the interaction intention of the user can be determined to be a query intention; when the user inputs a piece of voice such as "help me look up a route", the interaction intention of the user can be determined to be a service intention, which is not limited here.
In the embodiment of the application, the electronic device can be combined with a semantic analysis model in some artificial intelligence to determine the interaction intention of the user according to the voice data.
Optionally, in some embodiments, voice recognition features in the voice data may be obtained, and the interaction intention of the user may be determined according to the voice recognition features, so that the interaction intention of the user may be quickly and accurately determined.
The speech recognition features may be, for example, semantics corresponding to the speech data, or may be, for example, pitch corresponding to the speech data, or the like.
In an embodiment of the present application, the speech recognition features include semantic text and emotional features. The semantic text is text formed according to the semantics corresponding to the voice data (text formed from the semantics may be called semantic text); the emotional features are, for example, happiness, sadness and the like. When identifying the emotional features, the corresponding emotional features may be identified by combining the semantics with the tone and the like, which is not limited here.
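The semantic text and the emotional features can be carried together as a small feature structure; the following is only an illustrative Python representation, and the class and field names are assumptions rather than part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class SpeechRecognitionFeatures:
    semantic_text: str  # text formed from the semantics of the voice data
    emotion: str        # e.g. "happy" or "sad", inferred from semantics and tone

# Illustrative instance for the earlier weather example
features = SpeechRecognitionFeatures(
    semantic_text="How is the weather today?",
    emotion="neutral",
)
```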
In this embodiment of the present application, after the semantic text and the emotional characteristics in the voice data are obtained, the semantic text and the emotional characteristics may be used to assist in determining the interactive intention of the user, which may be described in the following.
Optionally, in some embodiments, in using semantic text and emotional characteristics to assist in determining the user's interactive intent, it may be performed: if the semantic text contains the interaction instruction word, comparing the interaction instruction word with a preset interaction keyword; if the interaction instruction word is matched with the interaction keyword, the interaction intention corresponding to the semantic text and the emotion feature is obtained by adopting a preconfigured intention recognition rule.
For example, the interaction instruction word is used to trigger the function of controlling the interaction between the virtual character model and the user, with interaction instruction words such as "interaction", "service", "chat" and the like. Semantic analysis is first performed on the voice data; when "interaction", "service" or "chat" is identified, the identified content can be compared with a preconfigured interaction keyword (for example, the electronic device may use "chat" as the default preconfigured interaction keyword). When the identified "chat" matches the preconfigured interaction keyword "chat", it can be determined that the user has a requirement to interact with the virtual character model, and this process can be regarded as waking up the virtual character model.
When it is determined that the user has a need to interact with the virtual character model, the interaction intention corresponding to the semantic text and the emotion feature can be further acquired by adopting a preconfigured intention recognition rule.
That is, the corresponding interaction intention is determined only when the user is determined to have the requirement of interacting with the virtual character model, which saves the hardware or software resources the electronic device consumes in identifying the interaction intention and improves the efficiency of acquiring the interaction intention. Moreover, when the user is determined to have the requirement of interacting with the virtual character model, the corresponding interaction intention, such as a service intention, a query intention or a chit-chat intention, can be determined by combining the semantic text and the emotional features, which can effectively improve the accuracy of identifying the interaction intention.
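A hedged Python sketch of this wake-up-then-recognize flow is given below; the keyword set, the rule table and the intent labels are illustrative assumptions standing in for the preconfigured interaction keywords and intention recognition rules:

```python
INTERACTION_KEYWORDS = {"interaction", "service", "chat"}  # preconfigured interaction keywords

# Preconfigured intent recognition rules: (topic word, emotion) -> interaction intention
INTENT_RULES = {
    ("weather", "neutral"): "query_intent",
    ("route", "neutral"): "service_intent",
    ("noodles", "happy"): "chat_intent",
}

def determine_interaction_intent(semantic_text: str, emotion: str) -> str | None:
    """Return an interaction intention only when an instruction word wakes the model."""
    text = semantic_text.lower()
    if not any(keyword in text for keyword in INTERACTION_KEYWORDS):
        return None  # no interaction requirement: caller falls back to a default animation
    for (topic, rule_emotion), intent in INTENT_RULES.items():
        if topic in text and rule_emotion == emotion:
            return intent
    return "chat_intent"  # fallback when no specific rule matches
```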
In other embodiments, default motion parameters may also be obtained and a default animation generated based on the default motion parameters and the virtual character model if it is determined that the user does not have a need to interact with the virtual character model.
Alternatively, the default action parameters may be read randomly from a pre-configured default action library, so that the default action parameters can be determined quickly.
The default actions are, for example, some movements that a real person naturally makes, such as raising the head, blinking and the like; the default action parameters are, for example, some control parameters that control the virtual character model to make these default actions, such as raising the head, blinking and the like.
In this embodiment of the present application, when the virtual character model has not been triggered to interact, or after an interaction is completed, default action parameters may be obtained, a default animation (that is, an animation that controls the virtual character model to perform default actions such as raising the head and blinking, which may be referred to as a default animation) may be generated according to the default action parameters and the virtual character model, and the default animation may be displayed, so that the anthropomorphic effect of the virtual character model is more realistic and the application scenes of the virtual character model are enriched while the interaction effect is ensured.
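The random read from a preconfigured default action library might look like the following Python sketch; the library contents and parameter names are invented purely for illustration:

```python
import random

# Hypothetical default action library: control parameters for natural idle motions
DEFAULT_ACTION_LIBRARY = [
    {"action": "raise_head", "joints": {"neck_pitch_deg": -8.0}, "duration_s": 1.2},
    {"action": "blink", "blendshapes": {"eye_close": 1.0}, "duration_s": 0.3},
]

def get_default_action_parameters() -> dict:
    """Randomly read one set of default action parameters from the preconfigured library."""
    return random.choice(DEFAULT_ACTION_LIBRARY)
```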
S103: and acquiring interaction action parameters corresponding to the interaction intention.
The interaction action can be some action which is made when the virtual character model interacts with the user, such as some actions of lifting hands, nodding heads, smiling and the like when the virtual character model provides services for the user.
The interactive motion parameters are some control parameters for controlling the virtual character model to make interactive motions such as lifting hands, nodding heads, smiling and the like.
For example, the interactive intention may be input into the interactive parameter acquisition model, and the interactive action parameter corresponding to the interactive intention may be obtained according to the output of the interactive parameter acquisition model, which is not limited.
The interactive parameter acquisition model may be trained in advance and pre-stored in a storage space of the electronic device, so as to facilitate the retrieval application, and the storage space is not limited to a storage space based on an entity, for example, a hard disk, but also may be a storage space (cloud storage space) of a network hard disk connected with the electronic device.
In the embodiment of the present application, when the step of obtaining the interaction action parameter corresponding to the interaction intention is performed, the interaction semantic corresponding to the interaction instruction word may be identified from the semantic text, and the interaction action parameter corresponding to the interaction semantic and the interaction intention may be generated.
For example, the interaction instruction word is used to trigger the function of controlling the interaction between the virtual character model and the user, with interaction instruction words such as "interaction", "service", "chat" and the like. Semantic analysis is first performed on the voice data; when "interaction", "service" or "chat" is identified and the virtual character model is awakened, the interaction semantics corresponding to the interaction instruction word "chat" may be identified, where the interaction semantics include the specific semantic content corresponding to "chat" in the semantic text. For example, when the semantic text is "Chat, do you like to eat noodles?" and the interaction instruction word "chat" is recognized, the specific semantic content that can be parsed out for "chat" (referred to as the interaction semantics) is "Do you like to eat noodles?", which is not limiting.
When the step of acquiring the interaction action parameters corresponding to the interaction intention is executed, the interaction semantics corresponding to the interaction instruction words can be identified from the semantic text, so that the interaction action parameters corresponding to the interaction semantics and the interaction intention are generated in an auxiliary mode, the matching degree between the generated interaction action parameters and the interaction semantics and the interaction intention can be effectively improved, and the interaction control effect of the virtual character model is improved.
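A rough Python sketch of this step is shown below, under the assumption that the interaction parameter acquisition model exposes a predict-style interface; both helpers and the model API are hypothetical, not specified in the disclosure:

```python
def extract_interaction_semantics(semantic_text: str, instruction_word: str) -> str:
    """Return the semantic content following the instruction word, e.g.
    'Chat, do you like to eat noodles?' -> 'do you like to eat noodles?'."""
    lowered = semantic_text.lower()
    index = lowered.find(instruction_word.lower())
    if index == -1:
        return semantic_text
    return semantic_text[index + len(instruction_word):].lstrip(" ,，")

def acquire_interaction_action_parameters(intent: str, interaction_semantics: str, model) -> dict:
    """Query a pre-trained parameter acquisition model; `model.predict` is a hypothetical interface."""
    return model.predict(intent=intent, semantics=interaction_semantics)
```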
S104: and generating an interactive animation according to the interactive action parameters and the virtual character model, and displaying the interactive animation to a user.
Specifically, after the electronic device obtains the interaction action parameters corresponding to the interaction intention, the interaction action parameters may be substituted into a function formula corresponding to the virtual character model to perform calculation, so as to generate an interaction animation, and the interaction animation is displayed to the user.
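One way such a calculation could be organized, assuming a virtual character model object with hypothetical interpolate_pose and render methods (the disclosure does not fix this interface), is sketched below:

```python
def generate_interactive_animation(action_parameters: dict, character_model,
                                   fps: int = 25, duration_s: float = 2.0) -> list:
    """Drive the virtual character model with the interaction action parameters and
    collect rendered frames; interpolate_pose and render are hypothetical model APIs."""
    total_frames = max(int(fps * duration_s), 1)
    frames = []
    for i in range(total_frames):
        t = i / max(total_frames - 1, 1)  # normalized time in [0, 1]
        pose = character_model.interpolate_pose(action_parameters, t)
        frames.append(character_model.render(pose))
    return frames
```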
In this embodiment, by acquiring voice data input by a user, determining an interaction intention of the user according to the voice data, acquiring an interaction action parameter corresponding to the interaction intention, generating an interaction animation according to the interaction action parameter and the virtual character model, and displaying the interaction animation to the user, the interaction action parameter of the virtual character model is determined by adopting the interaction intention corresponding to the voice data input by the user, and the interaction animation is generated according to the interaction action parameter and the virtual character model, so as to interact with the user, thereby effectively simulating a real person interaction scene, expanding an application scene of the virtual character model, improving interaction efficiency, and improving interaction experience of the user.
Fig. 2 is a schematic diagram according to a second embodiment of the present application.
As shown in fig. 2, the man-machine interaction method includes:
s201: and acquiring voice data input by a user.
S202: a speech recognition feature in the speech data is obtained.
S203: and determining the interaction intention of the user according to the voice recognition characteristics.
S204: and identifying the interaction semantics corresponding to the interaction instruction words from the semantic text.
The descriptions of steps S201 to S204 may be specifically referred to the above embodiments, and are not repeated herein.
S205: and obtaining candidate interaction action parameters corresponding to the interaction intention.
That is, the embodiment of the present application supports preconfiguring a library of interaction action parameters, where the library may include multiple interaction intentions and some optional interaction action parameters corresponding to each interaction intention.
After the interactive semantics corresponding to the interactive instruction words are identified from the semantic text, some optional interactive action parameters corresponding to the interactive intention (some optional interactive action parameters corresponding to the interactive intention may be referred to as candidate interactive action parameters) may be obtained first, and then the corresponding interactive action parameters are determined from the candidate interactive action parameters, so that diversity and richness of the interactive action parameters can be ensured, flexibility of presentation of the interactive action parameters is improved, and suitability of the interactive action parameters and an interactive scene is improved.
S206: and determining voice interaction data corresponding to the interaction semantics.
In connection with the above example, when the interaction semantics are "Do you like to eat noodles?", the reply voice data generated for the interaction semantics may be referred to as voice interaction data, for example "I like to eat them!", which is not limiting.
That is, in the embodiment of the application, the interactive animation is synthesized by simultaneously adopting the interactive action parameters and the voice interactive data, so that the virtual character model has a more real interactive effect, and the interactive experience of the user is improved.
S207: and selecting candidate interaction action parameters matched with the voice interaction data from the candidate interaction action parameters and taking the candidate interaction action parameters as corresponding interaction action parameters.
After the candidate interaction action parameters corresponding to the interaction intention are obtained and the voice interaction data corresponding to the interaction semantics are determined, the candidate interaction action parameters matching the voice interaction data can be selected from the candidates and used as the corresponding interaction action parameters. For example, for voice interaction data such as "I like to eat them!", the matching candidate interaction action parameters may be the action parameters for high-frequency nodding; for voice interaction data such as "I don't like to eat them!", the matching candidate interaction action parameters may be the action parameters for shaking the head, which is not limiting.
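The matching described here could be implemented, for example, as a trigger-word lookup over the candidate library; the following Python sketch uses invented intent labels, action names and trigger words purely for illustration:

```python
# Hypothetical candidate library keyed by interaction intention
CANDIDATE_ACTION_PARAMETERS = {
    "chat_intent": [
        {"action": "shake_head", "triggers": ["don't like", "dislike"]},
        {"action": "nod_fast", "triggers": ["like", "yes"]},
        {"action": "smile", "triggers": ["happy", "great"]},
    ],
}

def select_interaction_action_parameters(intent: str, voice_interaction_text: str) -> dict:
    """Pick the candidate whose trigger words appear in the reply text,
    e.g. 'I like to eat them!' -> the high-frequency nodding parameters."""
    candidates = CANDIDATE_ACTION_PARAMETERS.get(intent, [])
    lowered = voice_interaction_text.lower()
    for candidate in candidates:
        if any(trigger in lowered for trigger in candidate["triggers"]):
            return candidate
    return candidates[0] if candidates else {"action": "idle"}
```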
S208: and generating an initial interactive animation according to the interactive action parameters and the virtual character model.
That is, after candidate interactive motion parameters matching with the voice interactive data are selected and used as corresponding interactive motion parameters, the interactive motion parameters and the virtual character model may be first used to generate an initial interactive animation, and then a subsequent step is triggered.
S209: and synthesizing the voice interaction data and the initial interaction animation to obtain a target interaction animation, and displaying the target interaction animation to a user.
After the initial interactive animation is generated by adopting the interactive action parameters and the virtual character model, further, the voice interactive data and the initial interactive animation can be synthesized, so that the synthesized target interactive animation can output the voice interactive data while corresponding interactive actions are made, and the real human interactive scene can be effectively simulated.
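A simple way to bundle the voice interaction data with the initial interaction animation, assuming raw 16-bit mono PCM audio and a list of rendered frames (both assumptions made only for this sketch), is shown below:

```python
from dataclasses import dataclass

@dataclass
class TargetInteractionAnimation:
    frames: list      # rendered frames of the initial interaction animation
    audio_pcm: bytes  # synthesized reply speech (the voice interaction data)
    sample_rate: int
    fps: int

def synthesize_target_animation(frames: list, audio_pcm: bytes,
                                sample_rate: int, fps: int = 25) -> TargetInteractionAnimation:
    """Bundle reply audio with the animation so speech plays while the avatar moves;
    the frame count is padded or trimmed so video length matches audio length."""
    if not frames:
        raise ValueError("initial interaction animation has no frames")
    audio_seconds = len(audio_pcm) / (2 * sample_rate)  # 16-bit mono PCM assumed
    needed_frames = max(int(audio_seconds * fps), 1)
    if len(frames) < needed_frames:
        frames = frames + [frames[-1]] * (needed_frames - len(frames))  # hold the last frame
    else:
        frames = frames[:needed_frames]
    return TargetInteractionAnimation(frames, audio_pcm, sample_rate, fps)
```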
In this embodiment, by acquiring voice data input by a user, determining an interaction intention of the user according to the voice data, acquiring an interaction action parameter corresponding to the interaction intention, generating an interaction animation according to the interaction action parameter and the virtual character model, and displaying the interaction animation to the user, the interaction action parameter of the virtual character model is determined by adopting the interaction intention corresponding to the voice data input by the user, and the interaction animation is generated according to the interaction action parameter and the virtual character model, so as to interact with the user, thereby expanding an application scene of the virtual character model, improving interaction efficiency, and improving interaction experience of the user. The interactive action parameters matched with the voice interactive data are determined from the candidate interactive action parameters corresponding to the interactive intention, so that the diversity and the richness of the interactive action parameters can be ensured, the flexibility of the presentation of the interactive action parameters is improved, and the suitability of the interactive action parameters and the interactive scene is improved. The method supports the simultaneous adoption of the interaction action parameters and the voice interaction data to synthesize the interaction animation, so that the virtual character model has a more real interaction effect, and the interaction experience of a user is improved. After the initial interactive animation is generated by adopting the interactive action parameters and the virtual character model, further, the voice interactive data and the initial interactive animation can be synthesized, so that the synthesized target interactive animation can output the voice interactive data while corresponding interactive actions are made, and the real human interactive scene can be effectively simulated.
Fig. 3 is a schematic diagram according to a third embodiment of the present application.
As shown in fig. 3, the man-machine interaction device 30 includes:
the first obtaining module 301 is configured to obtain voice data input by a user.
A determining module 302, configured to determine an interaction intention of the user according to the voice data.
The second obtaining module 303 is configured to obtain an interaction action parameter corresponding to the interaction intention.
The generating module 304 is configured to generate an interactive animation according to the interactive motion parameters and the virtual character model, and display the interactive animation to the user.
In some embodiments of the present application, the determining module 302 is specifically configured to:
acquiring voice recognition characteristics in voice data;
and determining the interaction intention of the user according to the voice recognition characteristics.
In some embodiments of the present application, the speech recognition features include: semantic text and emotional characteristics, wherein the determining module 302 is further configured to:
if the semantic text contains the interaction instruction word, comparing the interaction instruction word with a preset interaction keyword;
if the interaction instruction word is matched with the interaction keyword, the interaction intention corresponding to the semantic text and the emotion feature is obtained by adopting a preconfigured intention recognition rule.
In some embodiments of the present application, the second obtaining module 303 is specifically configured to:
identifying interaction semantics corresponding to the interaction instruction words from the semantic text;
and generating interaction action parameters corresponding to the interaction semantics and the interaction intention.
In some embodiments of the present application, the second obtaining module 303 is further configured to:
acquiring candidate interaction action parameters corresponding to the interaction intention;
determining voice interaction data corresponding to interaction semantics;
and selecting candidate interaction action parameters matched with the voice interaction data from the candidate interaction action parameters and taking the candidate interaction action parameters as corresponding interaction action parameters.
In some embodiments of the present application, the generating module 304 is specifically configured to:
generating an initial interactive animation according to the interactive action parameters and the virtual character model;
and synthesizing the voice interaction data and the initial interaction animation to obtain a target interaction animation, and displaying the target interaction animation to a user.
In some embodiments of the present application, referring to fig. 4, fig. 4 is a schematic diagram of a human-machine interaction device 40 according to a fourth embodiment of the present application, including: the first obtaining module 401, the determining module 402, the second obtaining module 403, and the generating module 404, the man-machine interaction device 40 further includes:
a third obtaining module 405, configured to obtain a default action parameter;
the generating module 404 is further configured to generate a default animation according to the default action parameters and the virtual character model.
In some embodiments of the present application, the third obtaining module 405 is specifically configured to:
and randomly reading default action parameters from a default action library which is preconfigured.
It can be understood that the human-computer interaction device 40 in fig. 4 of the present embodiment and the human-computer interaction device 30 in the foregoing embodiment, the first obtaining module 401 and the first obtaining module 301 in the foregoing embodiment, the determining module 402 and the determining module 302 in the foregoing embodiment, the second obtaining module 403 and the second obtaining module 303 in the foregoing embodiment, and the generating module 404 and the generating module 304 in the foregoing embodiment may have the same functions and structures.
It should be noted that the explanation of the man-machine interaction method is also applicable to the man-machine interaction device of the present embodiment, and will not be repeated here.
In this embodiment, by acquiring voice data input by a user, determining an interaction intention of the user according to the voice data, acquiring an interaction action parameter corresponding to the interaction intention, generating an interaction animation according to the interaction action parameter and the virtual character model, and displaying the interaction animation to the user, the interaction action parameter of the virtual character model is determined by adopting the interaction intention corresponding to the voice data input by the user, and the interaction animation is generated according to the interaction action parameter and the virtual character model, so as to interact with the user, thereby effectively simulating a real person interaction scene, expanding an application scene of the virtual character model, improving interaction efficiency, and improving interaction experience of the user.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device for implementing the man-machine interaction method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the human-machine interaction method provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the human-machine interaction methods provided herein.
The memory 502 is used as a non-transitory computer readable storage medium, and may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules (e.g., the first acquisition module 301, the determination module 302, the second acquisition module 303, and the generation module 304 shown in fig. 3) corresponding to the human-computer interaction method in the embodiments of the present application. The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 502, i.e., implements the man-machine interaction method in the above-described method embodiments.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device performing the man-machine interaction method, etc. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory located remotely from processor 501, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, such as a touch screen, keypad, mouse, trackpad, touchpad, pointer stick, one or more mouse buttons, trackball, joystick, and like input devices. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service ("Virtual Private Server" or simply "VPS") are overcome. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (14)

1. A human-machine interaction method, comprising:
acquiring voice data input by a user;
determining the interaction intention of the user according to the voice data;
identifying interaction semantics corresponding to the interaction instruction words from semantic texts, wherein the semantic texts are texts formed according to the semantics corresponding to the voice data, and the semantic texts comprise: the interactive instruction word;
acquiring candidate interaction action parameters corresponding to the interaction intention;
determining voice interaction data corresponding to the interaction semantics;
selecting candidate interaction action parameters matched with the voice interaction data from the candidate interaction action parameters and taking the candidate interaction action parameters as the corresponding interaction action parameters; and
and generating an interactive animation according to the interactive action parameters and the virtual character model, and displaying the interactive animation to the user.
2. The method of claim 1, wherein the determining the user's intent to interact from the speech data comprises:
acquiring voice recognition characteristics in the voice data;
and determining the interaction intention of the user according to the voice recognition characteristics.
3. The method of claim 2, the speech recognition feature comprising: semantic text and emotional characteristics, wherein the determining the interactive intention of the user according to the voice recognition characteristics comprises the following steps:
if the semantic text contains the interaction instruction word, comparing the interaction instruction word with a preset interaction keyword;
and if the interaction instruction word is matched with the interaction keyword, acquiring the interaction intention corresponding to the semantic text and the emotion feature by adopting a preconfigured intention recognition rule.
4. The method of claim 1, wherein the generating an interactive animation from the interactive motion parameters and virtual character model and presenting the interactive animation to the user comprises:
generating an initial interaction animation according to the interaction action parameters and the virtual character model;
and synthesizing the voice interaction data and the initial interaction animation to obtain a target interaction animation, and displaying the target interaction animation to the user.
5. The method of claim 1, further comprising, prior to the acquiring the user-input voice data:
acquiring default action parameters;
and generating a default animation according to the default action parameters and the virtual character model.
6. The method of claim 5, wherein the obtaining default action parameters comprises:
and randomly reading the default action parameters from a preset default action library.
7. A human-machine interaction device, comprising:
the first acquisition module is used for acquiring voice data input by a user;
the determining module is used for determining the interaction intention of the user according to the voice data;
the second acquisition module is used for identifying interaction semantics corresponding to the interaction instruction words from semantic texts, wherein the semantic texts are texts formed according to the semantics corresponding to the voice data, and the semantic texts comprise: the interactive instruction word;
acquiring candidate interaction action parameters corresponding to the interaction intention;
determining voice interaction data corresponding to the interaction semantics;
selecting candidate interaction action parameters matched with the voice interaction data from the candidate interaction action parameters and taking the candidate interaction action parameters as the corresponding interaction action parameters; and
and the generation module is used for generating an interactive animation according to the interactive action parameters and the virtual character model and displaying the interactive animation to the user.
8. The apparatus of claim 7, wherein the determining module is specifically configured to:
acquiring voice recognition characteristics in the voice data;
and determining the interaction intention of the user according to the voice recognition characteristics.
9. The device of claim 8, the speech recognition feature comprising: semantic text and emotional characteristics, wherein the determining module is further configured to:
if the semantic text contains the interaction instruction word, comparing the interaction instruction word with a preset interaction keyword;
and if the interaction instruction word is matched with the interaction keyword, acquiring the interaction intention corresponding to the semantic text and the emotion feature by adopting a preconfigured intention recognition rule.
10. The apparatus of claim 7, wherein the generating module is specifically configured to:
generating an initial interaction animation according to the interaction action parameters and the virtual character model;
and synthesizing the voice interaction data and the initial interaction animation to obtain a target interaction animation, and displaying the target interaction animation to the user.
11. The apparatus of claim 7, further comprising:
the third acquisition module is used for acquiring default action parameters;
the generating module is further configured to generate a default animation according to the default action parameter and the virtual character model.
12. The apparatus of claim 11, wherein the third acquisition module is specifically configured to:
and randomly reading the default action parameters from a preset default action library.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN202011364599.9A 2020-11-27 2020-11-27 Man-machine interaction method and device, electronic equipment and storage medium Active CN112527105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011364599.9A CN112527105B (en) 2020-11-27 2020-11-27 Man-machine interaction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011364599.9A CN112527105B (en) 2020-11-27 2020-11-27 Man-machine interaction method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112527105A CN112527105A (en) 2021-03-19
CN112527105B true CN112527105B (en) 2023-07-21

Family

ID=74994694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011364599.9A Active CN112527105B (en) 2020-11-27 2020-11-27 Man-machine interaction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112527105B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114415997B (en) * 2021-12-20 2022-11-15 北京百度网讯科技有限公司 Display parameter setting method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107329990A (en) * 2017-06-06 2017-11-07 北京光年无限科技有限公司 A kind of mood output intent and dialogue interactive system for virtual robot
CN107765852A (en) * 2017-10-11 2018-03-06 北京光年无限科技有限公司 Multi-modal interaction processing method and system based on visual human
CN110647636A (en) * 2019-09-05 2020-01-03 深圳追一科技有限公司 Interaction method, interaction device, terminal equipment and storage medium
CN111667811A (en) * 2020-06-15 2020-09-15 北京百度网讯科技有限公司 Speech synthesis method, apparatus, device and medium
CN111833418A (en) * 2020-07-14 2020-10-27 北京百度网讯科技有限公司 Animation interaction method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9092415B2 (en) * 2012-09-25 2015-07-28 Rovi Guides, Inc. Systems and methods for automatic program recommendations based on user interactions

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107329990A (en) * 2017-06-06 2017-11-07 北京光年无限科技有限公司 A kind of mood output intent and dialogue interactive system for virtual robot
CN107765852A (en) * 2017-10-11 2018-03-06 北京光年无限科技有限公司 Multi-modal interaction processing method and system based on visual human
CN110647636A (en) * 2019-09-05 2020-01-03 深圳追一科技有限公司 Interaction method, interaction device, terminal equipment and storage medium
CN111667811A (en) * 2020-06-15 2020-09-15 北京百度网讯科技有限公司 Speech synthesis method, apparatus, device and medium
CN111833418A (en) * 2020-07-14 2020-10-27 北京百度网讯科技有限公司 Animation interaction method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A review of speech recognition technology based on human-computer interaction; Li Xuelin; Exploration and Observation; 105 *
Experience reshaping by anthropomorphism theory in voice interaction; Yang Tong; Youth Journalist (Issue 18); full text *

Also Published As

Publication number Publication date
CN112527105A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
US11587300B2 (en) Method and apparatus for generating three-dimensional virtual image, and storage medium
US10586369B1 (en) Using dialog and contextual data of a virtual reality environment to create metadata to drive avatar animation
US10521946B1 (en) Processing speech to drive animations on avatars
US10732708B1 (en) Disambiguation of virtual reality information using multi-modal data including speech
JP7170082B2 (en) Method and device for generating information, electronic device, storage medium and computer program
CN112509552B (en) Speech synthesis method, device, electronic equipment and storage medium
KR102565673B1 (en) Method and apparatus for generating semantic representation model,and storage medium
CN111241245B (en) Human-computer interaction processing method and device and electronic equipment
JP7395445B2 (en) Methods, devices and electronic devices for human-computer interactive interaction based on search data
CN115082602B (en) Method for generating digital person, training method, training device, training equipment and training medium for model
US11232645B1 (en) Virtual spaces as a platform
KR20210040329A (en) Method for generating tag of video, electronic device, and storage medium
CN104866275B (en) Method and device for acquiring image information
CN111354370B (en) Lip shape feature prediction method and device and electronic equipment
CN111177339B (en) Dialogue generation method and device, electronic equipment and storage medium
JP2021192289A (en) Method, apparatus, electronic device and medium for adversarial training of machine learning model
JP2021114284A (en) Method and apparatus for predicting punctuation mark
CN111966212A (en) Multi-mode-based interaction method and device, storage medium and smart screen device
CN111709252A (en) Model improvement method and device based on pre-trained semantic model
CN111782181A (en) Code generation method and device, electronic equipment and storage medium
KR20210127613A (en) Method and apparatus for generating conversation, electronic device and storage medium
CN112527105B (en) Man-machine interaction method and device, electronic equipment and storage medium
JP2022028889A (en) Method for generating dialogue, apparatus, electronic device, and storage medium
CN112559715B (en) Attitude identification method, device, equipment and storage medium
CN112652304B (en) Voice interaction method and device of intelligent equipment and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant