CN112669846A - Interactive system, method, device, electronic equipment and storage medium


Info

Publication number: CN112669846A
Application number: CN202110278539.3A (filed by Shenzhen Zhuiyi Technology Co Ltd)
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 刘致远, 常向月, 王鑫宇, 陈泷翔
Current Assignee: Shenzhen Zhuiyi Technology Co Ltd
Legal status: Pending
Prior art keywords: information, user, digital person, digital, interactive

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the application disclose an interaction system, method, apparatus, electronic device and storage medium. The interaction system is adapted to interact with a user by means of a digital person and includes a central control module, an information processing module and an image generation module. The central control module is used for receiving user information sent by an intelligent terminal. The information processing module is used for generating driving information according to the user information, the driving information being used to drive the digital person. The image generation module is used for driving the digital person based on the driving information, obtaining a digital person image including the digital person whose form corresponds to the driving information, and sending the digital person image to the central control module as an interaction feedback result, so that the central control module outputs the interaction feedback result. In this way, a video call can be formed between the central control module and the intelligent terminal and the digital person displayed on the intelligent terminal, which effectively increases the user's sense of closeness; manual service can also be replaced, so that users' consultation demands can be answered quickly during peak consultation periods.

Description

Interactive system, method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of human-computer interaction technologies, and in particular, to an interaction system, a method, an apparatus, an electronic device, and a storage medium.
Background
In recent years, with the continuous development and application of network information technology, the traditional PSTN (Public Switched Telephone Network) communication mode has gradually become unable to meet users' requirements and is gradually being replaced by high-definition communication.
Disclosure of Invention
In view of the foregoing, embodiments of the present application provide an interactive system, method, apparatus, electronic device, and storage medium to solve the above problems.
In a first aspect, an embodiment of the present application provides an interactive system, which is adapted to interact with a user by using a digital person, and includes a central control module, an information processing module, and an image generation module. The central control module is used for receiving the user information sent by the intelligent terminal. The information processing module is used for generating driving information according to the user information, and the driving information is used for driving the digital person. The image generation module is used for driving the digital person based on the driving information, acquiring a digital person image comprising the digital person, enabling the form of the digital person in the digital person image to correspond to the driving information, and sending the digital person image as an interactive feedback result to the central control module so that the central control module outputs the interactive feedback result.
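For orientation, the three-module pipeline described above can be pictured as follows. This is a minimal sketch, not the patent's implementation; all class and method names are hypothetical.

```python
# Minimal sketch of the three-module interactive system described above.
# All class and method names are hypothetical, for illustration only.

class InformationProcessingModule:
    def generate_driving_info(self, user_info: dict) -> dict:
        """Generate driving information (e.g. reply text, feedback voice)
        from the user information."""
        raise NotImplementedError

class ImageGenerationModule:
    def drive_digital_person(self, driving_info: dict) -> list:
        """Drive the digital person and return digital person images whose
        form corresponds to the driving information."""
        raise NotImplementedError

class CentralControlModule:
    def __init__(self, processor: InformationProcessingModule,
                 generator: ImageGenerationModule):
        self.processor = processor
        self.generator = generator

    def handle(self, user_info: dict) -> list:
        """Receive user information from the intelligent terminal and output
        the interaction feedback result (the digital person images)."""
        driving_info = self.processor.generate_driving_info(user_info)
        return self.generator.drive_digital_person(driving_info)
```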
Optionally, the driving information comprises text output information; the information processing module comprises a voice recognition unit and a semantic unit. The voice recognition unit is used for acquiring user voice information in the user information and recognizing the user voice information to obtain text input information. The semantic unit is used for determining the user intention according to the text input information and acquiring text output information fed back to the user.
Optionally, the driving information comprises feedback voice information; the information processing module includes a speech recognition unit and a speech synthesis unit. The voice recognition unit is used for acquiring user voice information in the user information and recognizing the user voice information to obtain text input information. The speech synthesis unit is used for generating feedback speech information according to the user intention represented by the text input information.
Optionally, the driving information further includes text output information; the information processing module also comprises a semantic unit, wherein the semantic unit is used for determining user intention according to the text input information and acquiring text output information for feeding back to the user based on the user intention; the speech synthesis unit is used for generating feedback speech information according to the text output information.
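As a rough sketch of how the voice recognition, semantic, and speech synthesis units could chain together (the patent names the units but not any concrete ASR, NLU or TTS engine, so the function bodies are left open and the names are hypothetical):

```python
# Hypothetical chaining of the information processing module's units.

def voice_recognition_unit(user_voice: bytes) -> str:
    """Recognize the user voice information to obtain text input information."""
    raise NotImplementedError  # any ASR engine

def semantic_unit(text_input: str) -> str:
    """Determine the user intention and return text output information."""
    raise NotImplementedError  # e.g. intent recognition + response lookup

def speech_synthesis_unit(text_output: str) -> bytes:
    """Generate feedback voice information from the text output information."""
    raise NotImplementedError  # any TTS engine

def generate_driving_info(user_voice: bytes) -> dict:
    text_input = voice_recognition_unit(user_voice)
    text_output = semantic_unit(text_input)
    return {
        "text_output": text_output,  # drives e.g. mouth shapes and expressions
        "feedback_voice": speech_synthesis_unit(text_output),
    }
```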
Optionally, the image generation module comprises an electronic card generation unit. The electronic card generating unit is used for acquiring prompt contents presented at the intelligent terminal according to the driving information, generating an electronic card comprising the prompt contents, and sending the electronic card to the central control module, so that the central control module sends the electronic card to the intelligent terminal as an interactive feedback result.
In a second aspect, an embodiment of the present application provides an interaction method, where the interaction method includes: receiving user information; generating driving information according to the user information, wherein the driving information is used for driving the digital person; driving the digital person based on the driving information, acquiring a digital person image comprising the digital person, and taking the digital person image as an interactive feedback result, wherein the form of the digital person in the digital person image corresponds to the driving information; and outputting an interactive feedback result.
Optionally, the driving information comprises text output information; generating the driving information according to the user information includes: determining user intention according to the user information; text output information for feedback to a user is obtained based on the user's intent.
Optionally, the driving information comprises feedback voice information; the interaction method further comprises the following steps: determining user intention according to the user information; acquiring text output information for feedback to a user based on the user intention; and generating feedback voice information according to the text output information.
Optionally, the interaction method further includes: taking the feedback voice information as an interactive feedback result; and outputting an interactive feedback result.
Optionally, the outputting the interactive feedback result comprises: obtaining privacy information in an interactive feedback result; updating the interactive feedback result according to a preset information protection mode, so that the privacy information in the updated interactive feedback result is hidden; and outputting the updated interactive feedback result.
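One way such a preset information protection mode could hide privacy information, shown purely as an illustration; the masking rules and regular expressions below are assumptions, not prescribed by the patent.

```python
import re

# Hypothetical masking rules; the patent does not prescribe specific patterns.
PRIVACY_PATTERNS = [
    (re.compile(r"\b(\d{3})\d{4}(\d{4})\b"), r"\1****\2"),          # phone numbers
    (re.compile(r"\b(\d{4})\d{10}(\d{4})\b"), r"\1**********\2"),   # ID numbers
]

def hide_privacy(feedback_text: str) -> str:
    """Update the interactive feedback result so privacy information is hidden."""
    for pattern, replacement in PRIVACY_PATTERNS:
        feedback_text = pattern.sub(replacement, feedback_text)
    return feedback_text

print(hide_privacy("Your number 13812345678 is confirmed."))
# -> "Your number 138****5678 is confirmed."
```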
Optionally, the interactive feedback result includes a plurality of continuous frame digital human images and feedback voice information, and the plurality of continuous frame digital human images correspond to the feedback voice information according to a time sequence.
Optionally, before outputting the interactive feedback result, the interactive method includes: acquiring prompt content for presenting at the intelligent terminal according to the driving information; and generating an electronic card comprising the prompt content, and taking the electronic card as an interactive feedback result.
Optionally, the number of digital human images is multiple; outputting the interactive feedback result comprises: acquiring a card type of the electronic card, wherein the card type comprises a dynamic type; if the card type of the electronic card is a dynamic type, acquiring the time sequence of a plurality of digital human images; acquiring card pictures of the electronic card at different times according to the time sequence of the plurality of digital human images; and outputting the card picture and the plurality of digital human images.
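One way the time-sequenced pairing of card pictures and digital human images could look, purely as an illustration (the data layout is an assumption):

```python
# Hypothetical pairing of dynamic card pictures with digital person images.

def output_dynamic_card(frames, card_picture_at):
    """frames: [(timestamp, digital_person_image), ...] in time order.
    card_picture_at(t): returns the electronic card's picture at time t."""
    for timestamp, image in frames:
        card_picture = card_picture_at(timestamp)
        yield timestamp, image, card_picture  # rendered together per moment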
Optionally, the user information comprises user voice information; determining the user intent from the user information includes: acquiring user voice information in user information; recognizing the voice information of the user to obtain text input information; and performing semantic recognition on the text input information to determine the user intention.
Optionally, recognizing the user voice information to obtain the text input information comprises: acquiring audio features of user voice information; acquiring target audio data from the user voice information, wherein the target audio data is associated with voice features conforming to preset audio features; and identifying the target audio data to obtain the text input information.
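The patent does not specify which audio features are used; as one plausible reading, a simple short-time-energy filter could select the target audio data, as sketched below. The threshold and frame length are illustrative assumptions.

```python
import numpy as np

ENERGY_THRESHOLD = 0.01  # assumed "preset audio feature"; tune per deployment

def extract_target_audio(samples: np.ndarray, frame_len: int = 400) -> np.ndarray:
    """Keep only frames whose short-time energy meets the preset feature."""
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if np.mean(frame ** 2) >= ENERGY_THRESHOLD:  # roughly: voiced speech
            kept.append(frame)
    return np.concatenate(kept) if kept else np.array([])
```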
Optionally, before receiving the user information, the interaction method includes: acquiring a terminal identifier of an intelligent terminal; sending a call request to the intelligent terminal through the terminal identifier; and if the intelligent terminal is determined to allow the call request, controlling the server to establish a data channel with the intelligent terminal.
Optionally, the interaction method further includes: acquiring a customer service type of a data channel established between a server and an intelligent terminal; acquiring an original digital human image and original voice information according to the type of the customer service; and sending the original digital human image and the original voice information to the intelligent terminal through a data channel.
In a third aspect, an embodiment of the present application provides an interactive device, where the interactive device includes a user information receiving module, a digital human driving module, a digital human image obtaining module, and an interactive feedback result sending module. The user information receiving module is used for receiving user information. The digital person driving module is used for generating driving information according to the user information, and the driving information is used for driving the digital person. The digital person image acquisition module is used for driving the digital person based on the driving information, acquiring a digital person image comprising the digital person, and taking the digital person image as an interactive feedback result, wherein the form of the digital person in the digital person image corresponds to the driving information. And the interactive feedback result sending module is used for outputting an interactive feedback result.
Optionally, the driving information comprises text output information; the digital human driver module includes a user intention determining unit and a text output information acquiring unit. The user intention determining unit is used for determining the user intention according to the user information. The text output information acquisition unit is used for acquiring text output information used for feeding back to a user based on the user intention.
Optionally, the driving information comprises feedback voice information; the interactive device also comprises a user intention determining unit, a text output information acquiring unit and a feedback voice information generating unit. The user intention determining unit is used for determining the user intention according to the user information. The text output information acquisition unit is used for acquiring text output information used for feeding back to a user based on the user intention. The feedback voice information generating unit is used for generating feedback voice information according to the text output information.
Optionally, the interactive apparatus further includes an interactive feedback result obtaining module. The interactive feedback result acquisition module is used for taking the feedback voice information as an interactive feedback result. And the interactive feedback result sending module is also used for outputting an interactive feedback result.
Optionally, the interaction feedback result sending module may include a privacy information obtaining unit, an interaction feedback result updating unit, and an interaction feedback result output unit. The privacy information acquisition unit is used for acquiring privacy information in the interaction feedback result. The interaction feedback result updating unit is used for updating the interaction feedback result according to a preset information protection mode, so that the privacy information in the updated interaction feedback result is hidden. And the interactive feedback result output unit is used for outputting the updated interactive feedback result.
Optionally, the interactive feedback result includes a plurality of continuous frame digital human images and feedback voice information, and the plurality of continuous frame digital human images correspond to the feedback voice information according to a time sequence.
Optionally, the interactive device further comprises a prompt content acquisition module and an electronic card generation module. The prompting content acquisition module is used for acquiring prompting contents for being presented at the intelligent terminal according to the driving information. The electronic card generating module is used for generating an electronic card comprising prompt contents and using the electronic card as an interactive feedback result.
Optionally, the number of digital human images is multiple; the interactive feedback result sending module further comprises a card type acquisition unit, a time sequence acquisition unit, a card picture acquisition unit and a digital human image output unit. The card type obtaining unit is used for obtaining the card type of the electronic card, and the card type comprises a dynamic type. The time sequence acquisition unit is used for acquiring the time sequences of a plurality of digital human images if the card type of the electronic card is a dynamic type. The card picture acquiring unit is used for acquiring the card pictures of the electronic card at different moments according to the time sequence of the images of the plurality of digital persons. The digital human image output unit is used for outputting the card picture and the plurality of digital human images.
Optionally, the user information comprises user voice information; the user intention determining unit includes a user voice information acquiring subunit, a text input information acquiring subunit, and a user intention determining subunit. The user voice information acquisition subunit is used for acquiring the user voice information in the user information. The text input information acquisition subunit is used for identifying the user voice information to obtain text input information. The user intention determining subunit is used for performing semantic recognition on the text input information and determining the user intention.
Optionally, the text input information acquiring subunit includes an audio feature acquiring component, a speech feature associating component, and a text input information acquiring component. The audio characteristic acquisition component is used for acquiring the audio characteristics of the voice information of the user. The voice feature association component is used for acquiring target audio data from the voice information of the user, and the target audio data is associated with the voice features which accord with preset audio features. The text input information acquisition component is used for identifying the target audio data to obtain text input information.
Optionally, the interactive apparatus includes a terminal identifier obtaining module, a call request issuing module, and a data channel establishing module. The terminal identification obtaining module is used for obtaining the terminal identification of the intelligent terminal. The calling request sending module is used for sending a calling request to the intelligent terminal through the terminal identifier. And the data channel establishing module is used for controlling the server side to establish a data channel with the intelligent terminal if the intelligent terminal is determined to allow the call request.
Optionally, the interactive device includes a customer service type obtaining module, an original voice information obtaining module, and an information sending unit. The customer service type acquisition module is used for acquiring the customer service type of a data channel established between the server and the intelligent terminal. The original voice information acquisition module is used for acquiring original digital human images and original voice information according to the customer service type. The information sending unit is used for sending the original digital human image and the original voice information to the intelligent terminal through the data channel.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the steps of the interaction method provided by the second aspect.
In a fifth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program is executed by a processor to perform the steps of the interaction method provided in the second aspect.
Compared with the prior art, the interactive system, method, apparatus, electronic device and storage medium provided by the embodiments of the present application can form a video call with the intelligent terminal through the central control module and display the digital person on the intelligent terminal. When the interactive system is applied to customer service, it can effectively increase the user's sense of closeness and can replace manual service, responding quickly to users' consultation demands during peak consultation periods. At the same time, the digital person can be driven based on the user information, so that the digital person presented on the intelligent terminal represents the information fed back to the user, simulating a scene of face-to-face communication between the user and the simulated digital person, which makes the user feel closer and improves user experience.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can also be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram illustrating an application environment of an interaction method according to an embodiment of the present application.
Fig. 2 shows a module schematic diagram of an interactive system provided in an embodiment of the present application.
Fig. 3 shows another module schematic diagram of an interactive system provided in an embodiment of the present application.
Fig. 4 shows a schematic block diagram of an interactive system provided in an embodiment of the present application.
Fig. 5 is a schematic diagram illustrating an effect of an interactive system provided by an embodiment of the present application.
Fig. 6 is a schematic diagram illustrating another effect of an interactive system provided by an embodiment of the present application.
Fig. 7 shows a flowchart of an interaction method provided by an embodiment of the present application.
Fig. 8 is a flowchart illustrating a process for obtaining text output information in the method of Fig. 7.
Fig. 9 is a flowchart illustrating a process of acquiring feedback voice information in the method of Fig. 7.
Fig. 10 is a flowchart illustrating a process of determining user intent in the method of Fig. 9.
Fig. 11 is a flowchart illustrating a process for obtaining text input information based on audio in the method of Fig. 10.
Fig. 12 is a flowchart illustrating a process of obtaining interactive feedback results based on speech in the method of Fig. 2.
Fig. 13 is a flowchart illustrating a privacy-based output of interactive feedback results in the method of Fig. 12.
Fig. 14 is a flowchart illustrating a process for obtaining interactive feedback based on prompt content in the method of Fig. 2.
Fig. 15 shows a flowchart of outputting the card and the digital human images in the method of Fig. 2.
Fig. 16 is a flowchart illustrating the process of establishing a data channel in the method of Fig. 2.
Fig. 17 is a flowchart illustrating a process of transmitting information based on a data channel in the method of Fig. 2.
Fig. 18 shows a functional block diagram of an interaction apparatus according to an embodiment of the present application.
Fig. 19 shows a functional block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
With the continuous development and application of network information technology, the traditional PSTN (Public Switched Telephone Network) communication mode has gradually become unable to meet users' requirements and is gradually being replaced by high-definition communication. Video interaction has therefore appeared in the market, in which customer service personnel serve users. This video interaction mode makes users feel closer, but during peak consultation periods the number of customer service personnel is often insufficient, so users have to wait a long time and user experience is poor.
In recent years, with the continuous development of communication technology, the traditional PSTN call mode has gradually been replaced by high-definition call modes such as VoLTE (Voice over Long-Term Evolution). Compared with the PSTN call mode, voice and video calls in a high-definition call mode are clearer and drop less often, traffic data services can be used simultaneously during the call, and video calls can be made without installing any program or software on the intelligent terminal, so both parties enjoy a better call experience. On this basis, customer service staff and users in the prior art communicate in the high-definition call mode: the intelligent terminal held by the customer service staff sends the staff member's audio and video to the intelligent terminal held by the user, presenting a live view of the customer service staff to the user. This mode makes the user feel closer and improves the user's consultation experience. However, although it effectively improves user experience, the number of customer service staff is often insufficient during peak consultation periods, so users have to wait a long time and user experience suffers.
In order to solve the above problems, the inventors of the present application carried out research and development. To improve user experience, the inventors found that a video can be recorded in advance; during a peak consultation period, once the user's need is acquired, the pre-recorded video is sent to the intelligent terminal held by the user according to that need.
On this basis, the present inventors continued research and development. To further improve user experience and enable users to see smooth video of customer service, the inventors propose the interactive system, method, apparatus, electronic device, and storage medium of the embodiments of the present application. The interactive system is adapted to interact with a user by means of a digital person and includes a central control module, an information processing module and an image generation module. The central control module is used for receiving user information sent by the intelligent terminal. The information processing module is used for generating driving information according to the user information, the driving information being used to drive the digital person. The image generation module is used for driving the digital person based on the driving information, obtaining a digital person image including the digital person whose form corresponds to the driving information, and sending the digital person image to the central control module as an interaction feedback result so that the central control module outputs it. Through the embodiments of the present application, a video call can be formed between the central control module and the intelligent terminal and the digital person displayed on the intelligent terminal. When this interactive system is applied to customer service, it can effectively increase the user's sense of closeness, replace manual service, and respond quickly to users' consultation demands during peak consultation periods. At the same time, the digital person can be driven based on the user information, so that the digital person presented on the intelligent terminal represents the information fed back to the user, simulating face-to-face communication between the user and the simulated digital person, making the user feel closer and improving user experience. In addition, the interactive system is particularly suitable for VoLTE calls, achieving interaction between the user and the digital person without installing an APP on the intelligent terminal.
In order to better understand an interactive system, a method, an apparatus, an electronic device, and a storage medium provided in the embodiments of the present application, an application environment suitable for the embodiments of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram of an application environment suitable for the interactive system according to an embodiment of the present application. The interactive system, method, apparatus, electronic device and storage medium provided by the embodiments of the present application can be applied to the video system 300 shown in fig. 1. The video system 300 comprises an intelligent terminal 301 and a server 302, the server 302 being in communication connection with the intelligent terminal 301. The server 302 may be an independent server or a server cluster composed of a plurality of servers. In addition, the server may be a cloud server or a traditional machine-room server, which is not specifically limited here.
In some embodiments, the smart terminal 301 may be any of various electronic devices having a display screen and supporting data input, including but not limited to smartphones, tablets, laptop computers, desktop computers, wearable electronic devices, and the like. Specifically, data input may be voice input through a voice module provided on the intelligent terminal 301, character input through a character input module, image input through an image input module, or video input through a video input module; it may also be based on a gesture recognition module provided on the intelligent terminal 301, enabling interaction modes such as gesture input.
The intelligent terminal 301 may have a client application installed, and the user may communicate with the server 302 through the client application (e.g. an APP such as WeChat or a WeChat applet). Specifically, the server 302 runs a corresponding server-side application; the user may register a user account with the server 302 through the client application and communicate with the server 302 based on that account. For example, a user logs into the user account in the client application and, based on that account, inputs text information, voice information, image information, video information and the like through the client application. After the client application receives the input, it sends the information to the server 302, which receives, processes and stores it; the server 302 may also return corresponding output information to the intelligent terminal 301 according to the received information.
In some embodiments, the server 302 may be configured to receive the information input by the user, generate a picture of the simulated digital person from that information, and send the picture to the intelligent terminal 301, so as to provide customer service to the intelligent terminal 301 and carry out customer service communication with the user. Specifically, the intelligent terminal 301 may receive the information input by the user and present the picture of the simulated digital person sent to it by the server 302. The simulated digital person is a software program based on visual graphics which, when executed, presents to the user a virtual figure that simulates biological behaviors or thoughts. The simulated digital person may imitate a real person, such as a simulated digital person built to the likeness of the user or another natural person, or may have an animated style, such as a simulated digital person in the shape of an animal or cartoon character.
In some embodiments, as shown in fig. 1, after acquiring the reply information corresponding to the information input by the user, the intelligent terminal 301 may display the simulated digital person image corresponding to the reply information on its display screen or on another image output device connected to it. As one implementation, while the simulated digital person image is played, the audio corresponding to the image may be played through a speaker of the intelligent terminal 301 or another connected audio output device, and the text or graphics corresponding to the reply information may be displayed on the display screen of the intelligent terminal 301, thereby realizing multimodal interaction with the user in image, voice, text and other aspects.
The above application environments are only examples for facilitating understanding, and it is to be understood that the embodiments of the present application are not limited to the above application environments.
The interactive system, method, apparatus, electronic device and storage medium provided in the embodiments of the present application are described in detail below with specific embodiments.
Referring to fig. 2, an interactive system 100 is provided, the interactive system 100 is adapted to interact with a user by using a digital person, and the interactive system 100 includes a central control module 110, an information processing module 120, and an image generation module 130. The central control module 110 is configured to receive user information sent by the intelligent terminal 301. The information processing module 120 is configured to generate driving information according to the user information, and the driving information is used to drive the digital person. The image generation module 130 is configured to drive the digital person based on the driving information, obtain a digital person image including the digital person, where the form of the digital person in the digital person image corresponds to the driving information, and send the digital person image to the central control module 110 as an interaction feedback result, so that the central control module 110 outputs the interaction feedback result.
In this embodiment, the central control module 110 may receive the user information transmitted by the smart terminal 301. Wherein the user information may comprise information related to the user. For example, the user information may be information entered by the user at the smart terminal 301, identity information of the user, voice of the user, an image of the user, and the like, and the type of the user information is not particularly limited herein.
Illustratively, when a user holds the intelligent terminal 301 and the intelligent terminal 301 establishes a communication connection with the central control module 110, the user may type information into the intelligent terminal 301; for example, the user may input numbers, symbols, letters and the like on a virtual keyboard or physical keyboard provided on the intelligent terminal 301, and the intelligent terminal 301 sends that input to the central control module 110 as user information. The central control module 110 may also obtain the SIM card number communicating with it and, based on that number, query the server 302 for the number's package, the remaining quota of the package, the services available to the number, the owner of the number and the like, and use this as user information. A microphone provided in the intelligent terminal 301 may collect the voice of the user using the intelligent terminal 301 and send it to the central control module 110 as user information. A camera provided in the intelligent terminal 301 may capture a picture of the user using the intelligent terminal 301 and send it to the central control module 110 as user information. The manner of obtaining the user information is not specifically limited here.
In this embodiment, the central control module 110 may directly establish a communication connection with the intelligent terminal 301, or may indirectly establish a communication connection with the intelligent terminal 301, where a manner of establishing a communication connection between the central control module 110 and the intelligent terminal 301 is not particularly limited.
In some examples, when the central control module 110 directly establishes a communication connection with the intelligent terminal 301, the central control module 110 may have a calling function and/or an answering function. Specifically, the central control module 110 may send a call request to the intelligent terminal 301 through the SIM card number; after the intelligent terminal 301 side accepts the call request, the central control module 110 establishes a high-definition call connection with the intelligent terminal 301, obtains user information from it, and sends digital person images to it. Conversely, the intelligent terminal 301 may send a call request to the central control module 110; after the central control module 110 side accepts the call request, a high-definition call connection is likewise established, and the central control module 110 may obtain user information from the intelligent terminal 301 and send digital person images to it. In addition, if the feedback result also includes voice, the voice and the digital person images may be sent to the intelligent terminal 301 synchronously.
In some examples, when the central control module 110 indirectly establishes a communication connection with the intelligent terminal 301, as shown in fig. 4, a call platform 200 and an SIP (Session Initiation Protocol) server are arranged on the communication link between the central control module 110 and the intelligent terminal 301. Specifically, the call platform 200 may issue a call request to the intelligent terminal 301 through the SIM card number; when the intelligent terminal 301 accepts the call request, the call platform 200 establishes a high-definition call connection with it. Meanwhile, the call platform 200 may establish a communication connection with the SIP server over RTP (Real-time Transport Protocol), and the SIP server establishes a data connection with the central control module 110. The call platform 200 acquires the user information of the intelligent terminal 301 and transmits it to the SIP server, which forwards it to the central control module 110. The central control module 110 thus receives the user information through the SIP server and sends the digital person images back to the SIP server; the SIP server packages the digital person images into H264 packets over RTP and transmits them to the call platform 200, which sends them to the intelligent terminal 301. In addition, if the feedback result also includes voice, the central control module 110 may send the voice and the digital person images to the SIP server separately; the SIP server packages the voice into PCMU and/or PCMA packets and the digital person images into H264 packets over RTP and delivers them to the call platform 200, which sends the packets to the intelligent terminal 301.
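For orientation only: the RTP packaging mentioned above follows RFC 3550, with PCMU and PCMA carried under the static payload types of RFC 3551 and H264 under a dynamic payload type. The sketch below builds a bare 12-byte RTP header around a media payload; the SSRC and the dynamic H264 payload type (96) are arbitrary example values, not taken from the patent.

```python
import struct

# Static RTP payload types per RFC 3551; H264 uses a dynamic type (assumed 96).
PT_PCMU, PT_PCMA, PT_H264 = 0, 8, 96

def rtp_packet(payload: bytes, pt: int, seq: int, timestamp: int,
               ssrc: int = 0x12345678, marker: bool = False) -> bytes:
    """Wrap a media payload in a minimal 12-byte RTP header (RFC 3550)."""
    byte0 = 2 << 6                    # version=2, no padding/extension/CSRC
    byte1 = (int(marker) << 7) | pt   # marker bit + payload type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

# e.g. one 20 ms PCMU voice frame and one H264 NAL-unit fragment:
voice_pkt = rtp_packet(b"\xff" * 160, PT_PCMU, seq=1, timestamp=160)
video_pkt = rtp_packet(b"\x67" * 32, PT_H264, seq=1, timestamp=3000, marker=True)
```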
In this embodiment, the user intention can be obtained from the user information, the information to be fed back to the user can be determined from that intention, and this information can be taken as the driving information; the specific way of obtaining the driving information depends on the type of the user information.
When acquiring the driving information, the user intention is determined based on the user information, and the driving information is generated according to the user intention. The user information may be at least one of voice, text and image; the driving information may be text, voice, semantics and the like.
For example, if the user information is speech, speech recognition may be performed on it to obtain the corresponding text, the text may be recognized with an intention recognition model to obtain the user intention, the information fed back to the user may be determined based on that intention, and the driving information may be generated from the information. For example, speech recognition yields the text "please report the weather in Beijing five hours from now"; intention recognition on this text gives the user intention, the feedback information "the weather in Beijing at 15:20 is heavy rain" is determined, and driving information for driving the digital person is generated from it. It should be noted that if the speech is obtained through the intelligent terminal 301, a microphone for collecting speech may be provided on the intelligent terminal 301, and the intelligent terminal 301 may also obtain the speech by receiving information sent by other devices; the manner of obtaining the speech is not specifically limited here.
For example, if the user information is text, the text may be recognized with an intention recognition model to obtain the user intention, the information fed back to the user may be determined based on that intention, and the driving information may be generated from the information. For example, for the text "Which province of China does Xi'an belong to?", intention recognition is performed to obtain the user intention, the feedback information "Shaanxi Province" is determined, and driving information for driving the digital person is generated according to that information. It should be noted that if the text is obtained through the intelligent terminal 301, a keyboard for collecting typed text may be provided on the intelligent terminal 301, and the intelligent terminal 301 may also obtain the text by receiving information sent by other devices; the manner of obtaining the text is not specifically limited here.
For example, if the user information is an image, image recognition may be performed on the image to obtain the user intention it represents, the information fed back to the user may be determined based on that intention, and the driving information may be generated from the information. For example, if an image contains an "OK" gesture, image recognition on the image yields the user intention "confirm", the information fed back to the user is determined, and the driving information for driving the digital person is generated based on that information. It should be noted that if the image is acquired through the intelligent terminal 301, an image acquisition device may be provided on the intelligent terminal 301, and the intelligent terminal 301 may also obtain the image by receiving information sent by other devices; the manner of acquiring the image is not specifically limited here.
For example, an intention recognition model may be used to recognize the intention in the user information. The intention recognition model may be a machine learning model such as an RNN (Recurrent Neural Network) model, a CNN (Convolutional Neural Network) model, a VAE (Variational Auto-Encoder) model, BERT (Bidirectional Encoder Representations from Transformers), or a Support Vector Machine (SVM), which is not limited here; it may also be a variant or combination of the above machine learning models. In addition, the intention recognition model may be deployed on the intelligent terminal 301 or on the server 302; the deployment carrier of the intention recognition model is not specifically limited and is determined by the actual scenario in which the user intention is obtained in this embodiment.
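Purely as an illustration of intent recognition with one of the model families mentioned above: the sketch below uses an off-the-shelf zero-shot classifier from the Hugging Face transformers library. The checkpoint name and the intent label set are assumptions; the patent does not mandate any specific model.

```python
# Hypothetical intent recognizer; checkpoint and labels are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

INTENTS = ["query weather", "query account balance", "confirm", "other"]

def recognize_intent(text_input: str) -> str:
    """Return the highest-scoring candidate intent for the text input."""
    result = classifier(text_input, candidate_labels=INTENTS)
    return result["labels"][0]

print(recognize_intent("please report the weather in Beijing"))
```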
In addition, in the present embodiment, the driving information may be text, voice, semantic, and the like, and the type of the driving information is not particularly limited.
In this embodiment, the form of the digital person may include the action state of the digital person, for example the digital person's eyeballs, mouth shape, expression, arms, and so on. It should be noted that the driving information may be a continuous parameter with a time attribute that controls the digital person's motion in time sequence, and the driving information may be associated with the action state of the digital person. For example, the driving information may include controlling the 3D digital person to tilt its head 15 degrees at time t1, controlling the 3D digital person to swing left and right at time t2, controlling the 3D digital person to smile at time t3, and so on.
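Such time-keyed driving information might be represented as a simple ordered command list. This is a sketch under assumed field names; the patent does not define a concrete data format.

```python
from dataclasses import dataclass

@dataclass
class DriveCommand:
    timestamp_ms: int    # time attribute of the continuous parameter
    part: str            # body part to drive: "head", "face", "body", ...
    action: str          # action state, e.g. "tilt", "smile", "swing"
    amount: float = 0.0  # e.g. degrees of rotation

# e.g. the t1/t2/t3 sequence described above, with illustrative values:
driving_info = [
    DriveCommand(0,   "head", "tilt", 15.0),
    DriveCommand(400, "body", "swing"),
    DriveCommand(900, "face", "smile"),
]

for cmd in sorted(driving_info, key=lambda c: c.timestamp_ms):
    ...  # hand each command to the image generation module in time order
```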
It should be noted that a digital person image can be understood as an image obtained by photographing, from some angle, a digital person whose action state corresponds to the driving information.
Further, the digital person in this embodiment may be obtained by 3D modeling, or each frame may be generated by a deep learning model as a realistic image whose quality resembles camera footage, so that the digital person looks like a real person filmed by a camera. A digital person video may then be generated from the sequence of coherent realistic images.
Specifically, the method for obtaining a digital person through a 3D modeling manner may include: acquiring a plurality of sample images comprising a target model; acquiring the form of a target model according to the multiple sample images; acquiring an original digital person and modeling information, wherein the modeling information comprises an original key point of the original digital person; and acquiring target key points of the target model according to the form, and corresponding the original key points to the target key points to generate the digital person.
In this embodiment, the target model may be a model associated with the digital person. For example, when the digital person is a certain broadcaster C, the target model may be broadcaster C, a person whose face, skeleton and stature are similar to broadcaster C's, or a dummy (e.g. a wax figure) whose face, skeleton and stature are similar to broadcaster C's.
In this embodiment, the sample image may include images of multiple target models at different angles. Specifically, the sample image may be an image of the target model at various angles in the case of various motions, sounds, expressions, and the like.
In some examples, cameras for capturing images of the target model may be arranged in a ring around the target model, and cameras with different focal lengths may be provided for the same orientation of the target model. When the target model makes a sound, changes an action, changes a facial expression and so on, all cameras can acquire images of the target model simultaneously, thereby obtaining the sample images.
In this embodiment, the form of the target model may include information related to changes in the target model's body. For example, the form may be a drooping mouth corner, eyes turning to the right, a raised head, a raised right hand, and the like.
In some examples, each part of the target model may be obtained from the sample images through a target detection algorithm, and the form of each part determined based on the change states of the same part across multiple consecutive sample images, thereby obtaining the form of each part of the target model. For example, the target detection algorithm may be sliding-window target detection, a two-stage target detection algorithm, a one-stage target detection algorithm, or the like.
In this embodiment, the original digital person may comprise a model of an already constructed digital person. For example, the original digital person may be an average human face model of a certain region, or may be a 3D animation model in industrial animation, where the type of the original digital person is not particularly limited. In addition, the modeling information may include parameter information for constructing an original digital person, by which the original digital person may be restored so that the original digital person can be presented.
In this embodiment, the morphology of the target model may be combined with the modeling information such that the morphological features of the target model are added to the original digital person, resulting in a digital person that is substantially the same as the morphology of the target model.
In this embodiment, the original key points of the original digital person may include positions for identifying, locating and controlling various parts of the original digital person. For example, the original keypoints can be the left eye corner position, right eye corner position, mouth corner position, face contour position, eyebrow position, nose wing position, thumb position, shoulder position, etc. of the original digital person. It should be noted that, the more dense the original key points are at each part of the original digital person in the modeling information, the more accurate the digital person is finally constructed.
In this embodiment, the target key points may include positions for identifying and locating various parts of the target model. For example, the target keypoints may be the left eye corner position, the right eye corner position, the mouth corner position, the face contour position, the eyebrow position, the nose wing position, the thumb position, the shoulder position, and the like of the target model. It should be noted that the more dense the target key points are at each part of the target model, the more accurate the digital person is finally constructed.
In this embodiment, the original key points of each part of the original digital person may correspond one-to-one with the target key points of the same part of the target model. For example, if the original key points include the positions of the face contour of the original digital person and the target key points include the positions of the face contour of the target model, then the positions on the upper part of the original digital person's face correspond one-to-one with the positions on the upper part of the target model's face, the positions on the lower part of the original digital person's face correspond one-to-one with the positions on the lower part of the target model's face, and the remaining original key points of the original digital person's face correspond one-to-one with the remaining target key points of the target model's face, which will not be repeated here.
In some examples, target key points in a dynamic state may be made to correspond to the original key points. Specifically, the target key points of the target model can be obtained, the face of the target model in consecutive sample images marked, and the same target key point of the face associated across the time sequence of the consecutive sample images, so as to obtain a dynamic change track for each target key point of the target model's face. Each target key point's dynamic change track is then mapped to the original key points of the original digital person's face, so as to obtain a dynamic change track for each original key point. The amplitude of the dynamic change track of an original key point is compared with that of the corresponding target key point at the same moment; if the amplitude difference is greater than a preset amplitude threshold, it is determined that the original key point of the original digital person needs correction.
Illustratively, the target key points of the target model can be obtained, the face of the target model in the consecutive sample images marked, and the same target key points associated across the time sequence of the consecutive sample images, so as to obtain the dynamic change tracks of all target key points of the face. These tracks are mapped to the original key points of the original digital person's face to obtain the dynamic change tracks of the original key points. The face change amplitude of the original digital person at each moment is obtained from the tracks of the original key points, and the face change amplitude of the target model at each moment from the tracks of the target key points, and the two are compared moment by moment. If the difference between the face change amplitude of the original digital person and that of the target model at the same moment is greater than a preset amplitude threshold, it is determined that the original key points of the original digital person need correction; if the difference is less than or equal to the preset amplitude threshold, it is determined that the face construction of the current digital person meets expectations.
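A minimal sketch of this amplitude comparison, assuming key-point tracks are stored as per-frame (x, y) positions already mapped to a common scale; the threshold value is illustrative.

```python
import numpy as np

AMPLITUDE_THRESHOLD = 0.05  # preset amplitude threshold (illustrative value)

def needs_correction(orig_track: np.ndarray, target_track: np.ndarray) -> bool:
    """orig_track/target_track: (T, 2) arrays of one key point's (x, y)
    position over T consecutive sample images."""
    # change amplitude per moment: displacement from the previous frame
    orig_amp = np.linalg.norm(np.diff(orig_track, axis=0), axis=1)
    target_amp = np.linalg.norm(np.diff(target_track, axis=0), axis=1)
    # correction needed if the amplitudes diverge beyond the threshold
    return bool(np.any(np.abs(orig_amp - target_amp) > AMPLITUDE_THRESHOLD))
```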
Specifically, a simulated digital person may also be obtained by pre-constructing a simulated digital person generation model that produces realistic images in which each frame has a quality similar to camera footage. The construction may include: acquiring a plurality of sample images including the target model and the camera parameters corresponding to each sample image; acquiring the sample image configuration parameters corresponding to the camera parameters; acquiring the angle information of the target model from the sample images and associating the angle information with the sample image configuration parameters; and constructing the simulated digital person model from the sample image configuration parameters and the angle information to obtain a preset simulated digital person model.
In the present embodiment, the camera parameters may include the parameters employed when the photographing device that captures the sample images photographs the target model, for example the focal length, the aperture size, and the like. The sample image configuration parameters may include parameters of the sample image generated when that photographing device photographs the target model, for example the pixel size, the image exposure, the proportion of the image occupied by the target model, the position where the target model contacts the ground, and the like. The angle information may include the angle at which the target model is presented in the sample image; for example, when the angle between the face orientation of the target model in the sample image and a preset axis direction is 15 degrees, 15 degrees may be taken as the angle information.
In some examples, the sample images may be analyzed to obtain the angles of the target model. Specifically, each part of the target model can be extracted from the sample images by a target detection algorithm, and the angle of each part can be determined from the change of that same part across multiple consecutive sample images; the resulting per-part angles are used as the angle information.
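As one hypothetical way to derive an angle from detected facial points (the three-point geometry and the small-angle heuristic below are assumptions, not the patent's method), consider this sketch:

```python
import math

def estimate_yaw_angle(left_eye, right_eye, nose_tip):
    """Rough yaw estimate for one sample image from three 2D facial points.

    A minimal geometric heuristic: when the face turns, the nose tip shifts
    horizontally relative to the midpoint of the eyes. Inputs are (x, y)
    pixel coordinates; the output is an angle in degrees relative to the
    preset axis direction mentioned in the text.
    """
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_span = abs(right_eye[0] - left_eye[0]) or 1.0  # avoid divide-by-zero
    offset = (nose_tip[0] - mid_x) / eye_span          # normalized lateral shift
    return math.degrees(math.asin(max(-1.0, min(1.0, offset))))

print(estimate_yaw_angle((100, 120), (160, 120), (135, 150)))  # ~4.8 degrees
```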
In this embodiment, a sample image may be regarded as being composed of a plurality of regions and a plurality of points. The states of these regions and point locations of the target model at each angle are obtained based on the sample image configuration parameters and the angle information, and the regions and point locations at each angle are combined to construct the simulation digital human model, so that the model can output images of the target model at different angles.
It should be noted that, compared with acquiring a digital person through 3D modeling, constructing the simulation digital human generation model requires no 3D modeling; the resulting simulation digital person is closer to the real person model and more lifelike, which suits practical applications in which different real person models must be captured to obtain simulation digital persons.
In this embodiment, the central control module 110 may send the interaction feedback result to the intelligent terminal 301 through the communication link between them, and the intelligent terminal 301 may display the digital person image in the interaction feedback result. In particular, when the digital person images are continuous, the effect of a digital person video can be presented on the intelligent terminal 301.
In this embodiment, with the interactive system 100 comprising the central control module 110, the information processing module 120, and the image generation module 130, a video call can be formed with the intelligent terminal 301 through the central control module 110 and a digital person displayed on the terminal. When the interactive system 100 is applied to customer service, this effectively improves the user's sense of affinity and can replace manual service, so that user consultations can be answered quickly during peak periods. Meanwhile, the digital person can be driven by the user information, so that the digital person presented on the intelligent terminal embodies the information fed back to the user, simulating a face-to-face conversation between the user and the simulation digital person; the user feels more at ease, and the user experience is improved.
Further, as an implementation of this embodiment, as shown in fig. 3, a digital person may be driven by text; the driving information includes text output information; the information processing module 120 may include a voice recognition unit 122 and a semantic unit 121. The voice recognition unit 122 is configured to obtain user voice information in the user information, and recognize the user voice information to obtain text input information. The semantic unit 121 is configured to determine a user intention according to the text input information and acquire text output information fed back to the user.
In this embodiment, the voice recognition unit 122 and the semantic unit 121 may each form a communication connection with the central control module 110. The voice recognition unit 122 may convert speech into text input information through ASR (Automatic Speech Recognition) technology and send the text input information to the central control module 110. The central control module 110 may receive the text input information and forward it to the semantic unit 121, and the semantic unit 121 determines the user intention from the text input information and acquires the text output information fed back to the user.
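A minimal Python sketch of this routing, mirroring the module layout above; the class names, the canned transcript, and the lookup reply are placeholders for a real ASR engine and semantic unit, not the patented implementation:

```python
class VoiceRecognitionUnit:
    """Stub ASR stage; a real deployment would call an ASR engine here."""
    def recognize(self, audio_bytes: bytes) -> str:
        return "query my data plan"  # canned transcript for illustration

class SemanticUnit:
    """Stub NLU stage mapping recognized text to reply text via a tiny lookup."""
    REPLIES = {"query my data plan": "You are on package A with 10 GB remaining."}
    def reply_for(self, text_input: str) -> str:
        return self.REPLIES.get(text_input, "Sorry, could you rephrase that?")

class CentralControl:
    """Routes messages between units, as the central control module 110 does."""
    def __init__(self, asr, nlu):
        self.asr, self.nlu = asr, nlu
    def handle_user_voice(self, audio_bytes: bytes) -> str:
        text_input = self.asr.recognize(audio_bytes)  # unit 122: speech -> text
        return self.nlu.reply_for(text_input)         # unit 121: text -> feedback text

print(CentralControl(VoiceRecognitionUnit(), SemanticUnit()).handle_user_voice(b""))
```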
Further, the speech recognition unit 122 may recognize the user speech information through approaches based on linguistics and acoustics, stochastic models, artificial neural networks, probabilistic grammar analysis, and the like; the manner of recognizing the user speech information is not particularly limited here.
Further, an intention recognition model and a feedback platform may be provided in the semantic unit 121. The intention recognition model performs intention recognition on the text input information to obtain the user intention, and the user intention is passed to the feedback platform to obtain the text output information fed back to the user.
For example, the intention recognition model may be used to recognize the intention of the user information. The intention recognition model may be a machine learning model such as an RNN model, a CNN model, a VAE model, BERT, a support vector machine, or a variant or combination of these models, which is not limited here. In addition, the intention recognition model may be deployed in the intelligent terminal 301 or in the server 302; its deployment carrier is not specifically limited and may be determined according to the actual scenario in which the user intention is obtained. The feedback platform may be a question-and-answer library, a customer service script library, or the like constructed according to the customer service requirements, and its construction is not particularly limited here.
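For illustration only, a toy keyword matcher can stand in for the RNN/CNN/BERT-style models named above; every intent label, keyword, and reply below is invented, and a production system would use a trained classifier plus the feedback platform:

```python
INTENT_KEYWORDS = {
    "package_inquiry": ["package", "plan", "tariff"],
    "balance_inquiry": ["balance", "remaining"],
    "human_agent":     ["agent", "human", "person"],
}

FEEDBACK_PLATFORM = {  # stand-in for the question-and-answer library
    "package_inquiry": "Available packages: A, B and C.",
    "balance_inquiry": "Your current balance will be shown on the card.",
    "human_agent":     "Transferring you to a human agent.",
}

def recognize_intent(text_input: str) -> str:
    """Toy intent recognizer: picks the intent whose keywords match the text."""
    lowered = text_input.lower()
    for intent, words in INTENT_KEYWORDS.items():
        if any(w in lowered for w in words):
            return intent
    return "fallback"

def text_output_for(text_input: str) -> str:
    intent = recognize_intent(text_input)
    return FEEDBACK_PLATFORM.get(intent, "Sorry, I did not understand that.")

print(text_output_for("What packages can I switch to?"))  # -> package list reply
```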
It should be noted that the forms of the digital person may correspond one-to-one to texts; when the text output information is acquired, the digital person may be driven by the text in the text output information, so that the form of the digital person in the output digital person image corresponds to the text output information.
In this embodiment, by providing the semantic unit 121 and the voice recognition unit 122 in the information processing module 120, the user's speech can be recognized to obtain text input information, text output information for feedback to the user can be obtained from it, and the digital person can be driven by the text output information. The content fed back to the user is thus presented in the form of a digital person, simulating a face-to-face conversation between the user and customer service staff and improving the user experience.
Further, as an implementation of this embodiment, as shown in fig. 3, a digital person may be driven by voice; the driving information comprises feedback voice information; the information processing module 120 may include a voice recognition unit 122 and a voice synthesis unit 123. The voice recognition unit 122 is configured to obtain user voice information in the user information, and recognize the user voice information to obtain text input information. The speech synthesis unit 123 is configured to generate feedback speech information according to the user intention represented by the text input information.
In this embodiment, the speech synthesis unit 123 may be in communication connection with the central control module 110. The speech recognition unit 122 may convert speech into text input information through ASR technology and send it to the central control module 110, which forwards it to the semantic unit 121. The semantic unit 121 determines the user intention from the text input information, obtains the text output information fed back to the user, and returns it to the central control module 110, which sends the text output information to the speech synthesis unit 123. The speech synthesis unit 123 then generates feedback speech information from the text output information through TTS (Text To Speech) technology.
Further, the speech synthesis unit 123 may synthesize the text output information by a parameter synthesis method, a waveform synthesis method, a rule synthesis method, or the like, to obtain feedback speech information.
It should be noted that the forms of the digital person may correspond one-to-one to speech; when the feedback voice information is acquired, the digital person may be driven by the speech in the feedback voice information, so that the form of the digital person in the output digital person image corresponds to the feedback voice information.
In addition, when the digital person is driven by speech, the mouth shape parameters of the digital person can be controlled by the speech. For example, the changes of the mouth key points of the digital person when uttering the sound corresponding to the feedback voice information may be obtained, yielding mouth shape parameters that characterize those key point changes. The mouth key points may include locations used to identify, locate, and control the various parts of the digital person's mouth; for example, the left corner of the mouth, the right corner of the mouth, the genioglossus sulcus, the bottom of the nose, and so forth. Meanwhile, the speech in the feedback voice information can be continuous and carry a time attribute, and the mouth shape of the digital person can be controlled at each time node of the speech, so that the changing process of the digital person's mouth shape is accurately presented on the intelligent terminal 301.
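A crude sketch of speech-driven mouth control, assuming a single "openness" parameter per time node and per-frame audio energy as the driver; this is a stand-in for the keypoint-change lookup described above, not the patented mapping:

```python
from dataclasses import dataclass

@dataclass
class MouthShape:
    """Assumed parameterization: one openness value per time node."""
    time_s: float
    openness: float  # 0.0 = closed, 1.0 = fully open

def mouth_shapes_from_energy(frame_energies, frame_rate_hz=25.0):
    """Map per-frame audio energy of the feedback voice to mouth openness.

    frame_energies: non-negative floats, one per video frame time node.
    Louder frames open the mouth wider; silence closes it.
    """
    peak = max(frame_energies) or 1.0
    return [MouthShape(time_s=i / frame_rate_hz, openness=e / peak)
            for i, e in enumerate(frame_energies)]

for shape in mouth_shapes_from_energy([0.0, 0.4, 0.9, 0.2]):
    print(f"t={shape.time_s:.2f}s openness={shape.openness:.2f}")
```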
In this embodiment, by providing the speech synthesis unit 123 in the information processing module 120, the user's speech can be recognized to obtain text input information, text output information for feedback to the user can be obtained from it and converted into feedback voice information, and the digital person can be driven by the feedback voice information, simulating a face-to-face conversation between the user and customer service staff and improving the user experience.
Further, as an implementation manner of this embodiment, as shown in fig. 3, the digital person may be driven by voice, and the driving information further includes text output information; the information processing module 120 further includes a semantic unit 121, where the semantic unit 121 is configured to determine a user intention according to the text input information, and acquire text output information for feedback to the user based on the user intention; the speech synthesis unit 123 is configured to generate feedback speech information according to the text output information.
Since the process of driving the digital person by voice in this embodiment is similar to the method of driving the digital person by voice in the previous embodiment, the description is omitted here.
Further, in order to better prompt the user, an electronic card may be generated at the same time as the digital human image is generated, as shown in fig. 3, and the image generation module 130 may include an electronic card generation unit 131. The electronic card generating unit 131 is configured to obtain prompt content for being presented at the intelligent terminal 301 according to the driving information, generate an electronic card including the prompt content, and send the electronic card to the central control module 110, so that the central control module 110 sends the electronic card to the intelligent terminal 301 as an interaction feedback result.
In the present embodiment, as shown in fig. 5, the electronic card may be used to display prompting content (such as service package, package fee, etc.). For example, when the set of packages that the user can handle includes package a, package B, and package C, the prompt content in the electronic card may be package a, package B, and package C arranged according to a preset format, and the prompt content is presented on the electronic card.
In the present embodiment, the card types of the electronic card may include a static type and a dynamic type. When the electronic card is of the static type, the electronic card generating unit 131 may generate one electronic card for the intelligent terminal 301 to present, giving a static presentation effect. When the electronic card is of the dynamic type, the electronic card generating unit 131 may generate a plurality of consecutive electronic cards; when the intelligent terminal 301 presents them in their time sequence, the effect of a changing electronic card video can be presented.
In this embodiment, the electronic card 700 may be combined with the digital human image 600 and then sent to the intelligent terminal 301 as an interactive feedback result, or may be independently sent to the intelligent terminal 301 as a feedback result.
For example, as shown in fig. 6, when the electronic card 700 is sent to the intelligent terminal 301 as a feedback result, the electronic card 700 and the digital person image 600 may each be transmitted to the intelligent terminal 301 in the form of H264 data packets. Upon receiving them, the intelligent terminal 301 may treat the electronic card 700 as a first layer and the digital person image 600 as a second layer; when both layers need to be displayed simultaneously, the first layer may be placed on top and the second layer at the bottom.
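The two-layer display can be illustrated with simple alpha compositing; this sketch assumes both layers are already decoded from the H264 stream and size-matched on the terminal, and the pixel values are invented:

```python
import numpy as np

def composite_layers(bottom_rgb, top_rgba):
    """Place the card layer (with alpha) on top of the digital person frame.

    bottom_rgb: (H, W, 3) uint8 frame from the digital person stream.
    top_rgba:   (H, W, 4) uint8 card layer; alpha 0 leaves the frame visible.
    """
    alpha = top_rgba[..., 3:4].astype(np.float32) / 255.0
    blended = top_rgba[..., :3] * alpha + bottom_rgb * (1.0 - alpha)
    return blended.astype(np.uint8)

frame = np.zeros((4, 4, 3), dtype=np.uint8)  # stand-in digital person frame
card = np.zeros((4, 4, 4), dtype=np.uint8)
card[..., :3], card[..., 3] = 255, 128       # half-transparent white card
print(composite_layers(frame, card)[0, 0])   # -> roughly [128 128 128]
```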
In this embodiment, by providing the electronic card generating unit 131 in the image generating module 130, the electronic card 700 can be generated by the electronic card generating unit 131, and the interactive system 100 can send the electronic card 700 together with the digital person image 600 to the intelligent terminal 301, so that the user can view the prompt content of the electronic card 700 while holding a high-definition call with the digital person service, helping the user better understand and transact business.
Referring to fig. 7, an embodiment of the present application further provides an interaction method, which can be applied to the interactive system 100 described above. Specifically, the interaction method may include the following steps S11 to S14.
Step S11: user information is received.
In this embodiment, since the way user information is received is similar to the way the central control module 110 receives user information in the foregoing embodiment, reference may be made to the corresponding description above; details are not repeated here.
Step S12: and generating driving information according to the user information, wherein the driving information is used for driving the digital person.
In this embodiment, since the way driving information is generated from the user information is similar to the way the information processing module 120 generates driving information in the foregoing embodiment, reference may be made to the corresponding description above; details are not repeated here.
Step S13: and driving the digital person based on the driving information, acquiring a digital person image comprising the digital person, and taking the digital person image as an interactive feedback result, wherein the form of the digital person in the digital person image corresponds to the driving information.
In the present embodiment, since driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as the interactive feedback result are similar to the corresponding operations of the image generation module 130 in the foregoing embodiment, reference may be made to the description of the image generation module 130 above; details are not repeated here.
Step S14: and outputting an interactive feedback result.
In this embodiment, since the way the interactive feedback result is output is similar to the way the central control module 110 outputs the interactive feedback result in the foregoing embodiment, reference may be made to the corresponding description above; details are not repeated here.
In this embodiment, a digital person can be displayed on the intelligent terminal 301. When the interactive system 100 is applied to customer service, user consultations can be answered quickly even during peak periods. Meanwhile, the digital person can be driven by the user information, so that the digital person presented on the intelligent terminal embodies the information fed back to the user, simulating a face-to-face conversation between the user and the simulation digital person; the user feels more at ease, and the user experience is improved.
It should be noted that the interaction method provided in this embodiment may be applied to the interactive system 100 provided in the above embodiment and operates on the same principle; details are not repeated here.
The embodiment of the present application further provides an interaction method, which may include the following steps S21 to S24. The interaction method provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S21: user information is received.
Step S22: and generating driving information according to the user information, wherein the driving information is used for driving the digital person.
Further, as an implementation manner of this embodiment, as shown in fig. 8, the digital person may be driven by text; the driving information includes text output information; and the above step S22 may include the following steps S221 to S222.
Step S221: the user intent is determined from the user information.
In the present embodiment, the user intention may be determined in a manner corresponding to the type of the user information. Illustratively, speech may be converted into text input information by ASR technology, and the user intention determined from the text input information. Further, the user speech information may be recognized by approaches based on linguistics and acoustics, stochastic models, artificial neural networks, probabilistic grammars, and the like; the manner of recognizing the user speech information is not particularly limited here.
Further, the user intention may be derived from an intention recognition model. For example, the intention recognition model may be used to recognize the intention of the user information, and it may be a machine learning model such as an RNN model, a CNN model, a VAE model, BERT, a support vector machine, or a variant or combination of these models, which is not limited here. In addition, the intention recognition model may be deployed in the intelligent terminal 301 or in the server 302; its deployment carrier is not specifically limited and may be determined according to the actual scenario in which the user intention is obtained.
Step S222: text output information for feedback to a user is obtained based on the user's intent.
In this embodiment, a feedback platform may be preset, and the feedback platform may determine text output information fed back to the user according to the user intention. The feedback platform can be a question and answer library, a customer service call library and the like which are constructed based on customer service type requirements, and the construction of the feedback platform is not particularly limited.
It should be noted that the forms of the digital person may correspond one-to-one to texts; when the text output information is acquired, the digital person may be driven by the text in the text output information, so that the form of the digital person in the output digital person image corresponds to the text output information.
In this embodiment, text input information can be obtained, the user intention can be derived from the text input information, text output information can be obtained based on the user intention, and the digital person can be driven by the text output information. The content fed back to the user is thus presented in the form of a digital person, simulating a face-to-face conversation between the user and customer service staff and improving the user experience.
Step S23: and driving the digital person based on the driving information, acquiring a digital person image comprising the digital person, and taking the digital person image as an interactive feedback result, wherein the form of the digital person in the digital person image corresponds to the driving information.
Step S24: and outputting an interactive feedback result.
Further, as an implementation manner of this embodiment, as shown in fig. 9, the digital person may be driven by speech; the driving information includes feedback voice information; and the interaction method provided by this embodiment may further include the following steps S25 to S27.
Step S25: the user intent is determined from the user information.
Step S26: text output information for feedback to a user is obtained based on the user's intent.
It should be noted that the operation principle of step S25 is substantially the same as that of step S221, the operation principle of step S26 is substantially the same as that of step S222, and steps S221 and S222 can be referred to for step S25 and step S26, which are not repeated herein.
Further, when the user information includes speech, the speech may be recognized to determine the user's intention; as an implementation manner of this embodiment, as shown in fig. 10, the user information may include user voice information; the above step S25 may include the following steps S251 to S253.
Step S251: and acquiring the user voice information in the user information.
Step S252: and recognizing the voice information of the user to obtain the text input information.
In this embodiment, speech may be converted to text input information by ASR techniques, and user intent may be determined based on the text input information. Further, the user speech information may be recognized by a method based on linguistics and acoustics, a stochastic model, an artificial neural network, a probabilistic grammar, and the like, where the method of recognizing the user speech information is not particularly limited.
Step S253: and performing semantic recognition on the text input information to determine the user intention.
In the present embodiment, the user intention may be derived from an intention recognition model. For example, the intention recognition model may be used to recognize the intention of the user information, and it may be a machine learning model such as an RNN model, a CNN model, a VAE model, BERT, a support vector machine, or a variant or combination of these models, which is not limited here. In addition, the intention recognition model may be deployed in the intelligent terminal 301 or in the server 302; its deployment carrier is not specifically limited and may be determined according to the actual scenario in which the user intention is obtained.
Further, in order to distinguish the target audio in the voice information from other audio, the user voice information may be analyzed; as shown in fig. 11, the above step S252 may include the following steps S2521 to S2523.
Step S2521: and acquiring the audio features of the voice information of the user.
In this embodiment, the audio features may be used to characterize the nature of the sound. For example, the audio features may include loudness, pitch, timbre, etc. of the sound.
Step S2522: and acquiring target audio data from the user voice information, wherein the target audio data is associated with the voice characteristics which accord with the preset audio characteristics.
In this embodiment, the speech feature conforming to the preset audio feature may be a speech parameter characterizing the audio feature. For example, the speech parameter may be the sound wave frequency corresponding to the user's pitch, the sound wave amplitude corresponding to the user's loudness, or the spectrum structure corresponding to the user's timbre.
In this embodiment, speech features for the audio features may be preset to screen the user voice information and eliminate non-user sounds.
Step S2523: and identifying the target audio data to obtain the text input information.
In this embodiment, speech may be converted to text input information by ASR techniques, and user intent may be determined based on the text input information. Further, the user speech information may be recognized by a method based on linguistics and acoustics, a stochastic model, an artificial neural network, a probabilistic grammar, and the like, where the method of recognizing the user speech information is not particularly limited.
In this embodiment, by analyzing the audio features of the user voice information, sounds other than the device owner's can be filtered out and the owner's speech then recognized to obtain the text input information. This prevents sounds other than the owner's from interfering with the speech recognition process and improves the accuracy of the obtained text input information.
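A minimal sketch of such screening, using only short-time energy as the gating feature; the gate value and frame length are assumptions, and a real system would also compare pitch and timbre features against the enrolled owner's profile as steps S2521 to S2523 describe:

```python
import numpy as np

def frame_energy(samples, frame_len=400):
    """Short-time energy per frame of a mono PCM signal (float array)."""
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).mean(axis=1)

def extract_target_audio(samples, energy_gate=0.01, frame_len=400):
    """Keep only frames whose energy matches the preset speech feature."""
    energies = frame_energy(samples, frame_len)
    keep = energies > energy_gate            # frames treated as target speech
    mask = np.repeat(keep, frame_len)
    usable = samples[: len(mask)]
    return usable[mask]                      # concatenated target audio frames

rng = np.random.default_rng(0)
quiet = rng.normal(0, 0.01, 400)             # background-noise frame
loud = rng.normal(0, 0.5, 400)               # speech-like frame
print(len(extract_target_audio(np.concatenate([quiet, loud]))))  # -> 400
```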
Step S27: and generating feedback voice information according to the text output information.
In this embodiment, the text output information may be generated into feedback speech information by TTS technology. Further, the text output information may be synthesized by a parameter synthesis method, a waveform synthesis method, a rule synthesis method, or the like, to obtain feedback voice information.
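Purely for illustration of the waveform-synthesis idea, the toy below concatenates one short tone per character; the per-character pitch mapping is invented, and real feedback voice would come from a TTS engine:

```python
import math

SAMPLE_RATE = 8000

def tone(freq_hz, dur_s=0.08):
    """Generate one short sine tone as a list of float samples."""
    n = int(SAMPLE_RATE * dur_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def toy_waveform_synthesis(text_output):
    """Toy 'waveform concatenation' synthesis: one fixed tone per character."""
    samples = []
    for ch in text_output:
        samples.extend(tone(200 + (ord(ch) % 32) * 20))  # assumed pitch mapping
    return samples

voice = toy_waveform_synthesis("Hi")
print(len(voice), "samples of feedback voice")
```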
It should be noted that the forms of the digital person may correspond one-to-one to speech; when the feedback voice information is acquired, the digital person may be driven by the speech in the feedback voice information, so that the form of the digital person in the output digital person image corresponds to the feedback voice information.
In addition, when the digital person is driven by speech, the mouth shape parameters of the digital person can be controlled by the speech. For example, the changes of the mouth key points of the digital person when uttering the sound corresponding to the feedback voice information may be obtained, yielding mouth shape parameters that characterize those key point changes. The mouth key points may include locations used to identify, locate, and control the various parts of the digital person's mouth; for example, the left corner of the mouth, the right corner of the mouth, the genioglossus sulcus, the bottom of the nose, and so forth. Meanwhile, the speech in the feedback voice information can be continuous and carry a time attribute, and the mouth shape of the digital person can be controlled at each time node of the speech, so that the changing process of the digital person's mouth shape is accurately presented on the intelligent terminal 301.
In this embodiment, the text output information can be synthesized into feedback voice information, and the digital person can be driven by the feedback voice information, simulating a face-to-face conversation between the user and customer service staff and improving the user experience.
Further, as an implementation manner of this embodiment, as shown in fig. 12, the feedback voice information may be used as an interactive feedback result, and the interaction method provided in this embodiment may further include the following steps S28 to S29.
Step S28: and taking the feedback voice information as an interactive feedback result.
Step S29: and outputting an interactive feedback result.
Further, the interactive feedback result may include a plurality of continuous frame digital human images and feedback voice information, and the plurality of continuous frame digital human images correspond to the feedback voice information according to a time sequence.
In this embodiment, the time sequence information of each digital person image may be acquired, and the plurality of digital person images are sequentially ordered according to the time sequence information, thereby synthesizing the simulated digital person video.
In this embodiment, the feedback voice information and the digital human image may be in one-to-one correspondence according to a time sequence, so that when the intelligent terminal 301 presents the digital human image, the voice in the feedback voice information corresponding to the digital human image video can be played synchronously.
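A small sketch of this time-based synchronization, assuming each digital person frame carries a presentation timestamp sharing the audio's time origin; the frame rate and data layout are assumptions:

```python
from dataclasses import dataclass

@dataclass
class DigitalPersonFrame:
    index: int
    time_s: float  # presentation timestamp of this frame

def frame_for_audio_time(frames, audio_time_s):
    """Pick the digital person frame to show while a given audio instant plays.

    frames are assumed sorted by time_s; we take the latest frame whose
    timestamp does not exceed the audio playback position.
    """
    current = frames[0]
    for f in frames:
        if f.time_s <= audio_time_s:
            current = f
        else:
            break
    return current

frames = [DigitalPersonFrame(i, i / 25.0) for i in range(50)]  # 25 fps, 2 s clip
print(frame_for_audio_time(frames, 1.0).index)                 # -> frame 25
```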
Further, as an implementation manner of this embodiment, in order to prevent some information in the interactive feedback result from leaking, the above step S29 may include the following steps S291 to S293.
Step S291: and obtaining the privacy information in the interactive feedback result.
In this embodiment, the privacy information may include information relating to the user's privacy. For example, the privacy information may be an identification number, a user balance, a dialing record, and the like.
In this embodiment, privacy information in the interaction feedback result may be obtained in different manners based on the content type in the interaction feedback result.
Illustratively, when the interactive feedback result includes an electronic card, the text output information contained in the electronic card may be acquired, and the information in it relating to user privacy taken as the privacy information. When the interactive feedback result includes feedback voice information, the text output information corresponding to the feedback voice information may be acquired, and the information in it relating to user privacy taken as the privacy information.
Step S292: and updating the interactive feedback result according to a preset information protection mode, so that the privacy information in the updated interactive feedback result is hidden.
In this embodiment, the preset information protection manner may be set based on an actual scene.
Illustratively, when the privacy information is text in an electronic card, one or more characters in the text may be hidden. For example, when the text in the electronic card is the identification number 110101200108150612, the identification number may be updated so that the text shown on the electronic card becomes 1101012001XXXXXXXX.
When the privacy information is feedback voice information, the text output information corresponding to it can be acquired. If the text output information is the user balance, the digits in the balance can be mapped to letters, and the mapped letters used as the updated interactive feedback result. For example, the digit "1" may be mapped to the letter "A", "2" to "B", "3" to "C", "4" to "D", "5" to "E", and "6" to "F"; when the user balance is 12345, it may be updated to "ABCDE", and the speech corresponding to "ABCDE" used as the updated interactive feedback result. The digit-to-letter mapping can be preset by the user.
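Both protection modes above are simple text transforms; the sketch below reproduces the examples from the text, where the mapping chosen for the digit "0" is an assumption since the text leaves it open:

```python
DIGIT_TO_LETTER = {str(d): chr(ord("A") + d - 1) for d in range(1, 10)}
DIGIT_TO_LETTER["0"] = "J"  # assumed mapping for 0; the text leaves it open

def mask_id_number(id_number: str, visible: int = 10) -> str:
    """Hide the tail of an identification number shown on an electronic card."""
    return id_number[:visible] + "X" * (len(id_number) - visible)

def encode_balance(balance: str) -> str:
    """Replace digits with the user's preset letters before speech synthesis."""
    return "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in balance)

print(mask_id_number("110101200108150612"))  # -> 1101012001XXXXXXXX
print(encode_balance("12345"))               # -> ABCDE
```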
Step S293: and outputting the updated interactive feedback result.
In this embodiment, after sending the updated interaction feedback result to the intelligent terminal 301, the intelligent terminal 301 can play the updated interaction feedback result, so as to protect the privacy information in the interaction feedback result.
Further, in order to better prompt the user, the electronic card may be generated at the same time as the digital human image is generated, and for this purpose, as shown in fig. 14, the embodiment of the present application further provides an interactive method, which may include the following steps S31 to S36. The interaction method provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S31: user information is received.
Step S32: and generating driving information according to the user information, wherein the driving information is used for driving the digital person.
Step S33: and acquiring prompt content for presenting at the intelligent terminal 301 according to the driving information.
In this embodiment, the prompt content that the user needs to obtain can be obtained from the driving information. For example, when the drive information indicates that packages that the user can handle include package D, package E, and package F, the prompt content may be package D, package E, and package F arranged according to a preset format.
Step S34: and generating an electronic card comprising the prompt content, and taking the electronic card as an interactive feedback result.
In this embodiment, the electronic card may be combined with the digital human image and then sent to the intelligent terminal 301 as an interactive feedback result, or may be independently sent to the intelligent terminal 301 as a feedback result.
For example, when the electronic card is sent to the intelligent terminal 301 as a feedback result, the electronic card and the digital person image may each be transmitted to the intelligent terminal 301 in the form of H264 data packets. Upon receiving them, the intelligent terminal 301 may treat the electronic card as a first layer and the digital person image as a second layer; when both layers need to be displayed simultaneously, the first layer may be placed on top and the second layer at the bottom.
In this embodiment, an electronic card may be generated and sent at the same time as the digital person image is sent to the intelligent terminal 301, so that the user can view the prompt content of the electronic card while holding a high-definition call with the digital person service, helping the user better understand and transact business.
Step S35: and driving the digital person based on the driving information, acquiring a digital person image comprising the digital person, and taking the digital person image as an interactive feedback result, wherein the form of the digital person in the digital person image corresponds to the driving information.
Step S36: and outputting an interactive feedback result.
Further, a card picture and a plurality of digital person images may be output based on the correspondence between the electronic card and the digital person video. As shown in fig. 15, the digital person images are plural in number, and the above step S36 may include the following steps S361 to S364.
Step S361: acquiring a card type of the electronic card, wherein the card type comprises a dynamic type.
In the present embodiment, the card types of the electronic card may include a static type and a dynamic type.
In this embodiment, when the electronic card is of a dynamic type, a plurality of electronic cards may be generated, and the plurality of electronic cards may be continuous electronic cards, and when the intelligent terminal 301 presents the plurality of electronic cards according to the time sequence of the electronic cards, the effect of changing and playing the video of the electronic cards may be presented; when the electronic card is of a static type, one electronic card may be generated, so that the intelligent terminal 301 presents the electronic card, thereby obtaining a presentation effect that the electronic card is in a static state.
Step S362: and if the card type of the electronic card is a dynamic type, acquiring the time sequence of the images of the plurality of digital persons.
In this embodiment, the timing of the digital human image may be the timing of the digital human image generation.
Step S363: and acquiring card pictures of the electronic card at different times according to the time sequence of the images of the plurality of digital persons.
In this embodiment, the card pictures of the electronic card at different times are the electronic card contents for those times, acquired based on the driving information.
Step S364: and outputting the card picture and the plurality of digital human images.
In this embodiment, since the electronic card and the digital person image are both obtained from the driving information, and the driving information may carry a time attribute, the electronic card and the digital person image can be placed in one-to-one correspondence based on their respective time relationships with the driving information.
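A sketch of this pairing, assuming both sequences carry ascending presentation times derived from the same driving information and thus share a time origin; the frame rate and card labels are invented:

```python
def pair_card_frames(digital_frames_ts, card_frames_ts, card_frames):
    """Pick, for each digital person frame time, the card picture to show."""
    paired, j = [], 0
    for t in digital_frames_ts:
        while j + 1 < len(card_frames_ts) and card_frames_ts[j + 1] <= t:
            j += 1  # advance to the latest card at or before time t
        paired.append(card_frames[j])
    return paired

video_ts = [i / 25.0 for i in range(10)]  # 25 fps digital person images
card_ts = [0.0, 0.2]                      # card changes once at 0.2 s
print(pair_card_frames(video_ts, card_ts, ["card_A", "card_B"]))
```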
In this embodiment, through implementation of the above steps S361 to S364, the card screen and the digital person image can be output, so that the card screen is presented while the smart terminal 301 presents the digital person video.
Further, a call connection may be established with the intelligent terminal 301, and for this reason, an interaction method may further be provided in the embodiment of the present application, as shown in fig. 16, where the interaction method may include the following steps S41 to S47. The interaction method provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S41: and acquiring a terminal identifier of the intelligent terminal.
In this embodiment, the terminal identifier may be an identifier for establishing a high-definition call connection. For example, the terminal identification may be a standard SIM card number, a mini SIM card number, a Micro SIM card number, or the like.
Step S42: and sending a call request to the intelligent terminal through the terminal identifier.
In this embodiment, the intelligent terminal 301 may be called by the call platform 200 based on the terminal identification.
Step S43: if it is determined that the intelligent terminal allows the call request, the server is controlled to establish a data channel with the intelligent terminal 301.
In this embodiment, when the intelligent terminal 301 allows the call request, the server 302 may be controlled to establish a data channel with the intelligent terminal 301. The data channel may be a high definition telephony data channel.
Further, after the intelligent terminal 301 establishes the data channel, the initial information may be sent to the user. As shown in fig. 17, the interaction method provided in this embodiment may further include the following steps S44 to S46.
Step S44: and acquiring the type of the customer service business of the data channel established between the server and the intelligent terminal.
In this embodiment, the customer service type may be obtained based on the service types the user can currently transact, the service types already transacted, and the like.
Step S45: and acquiring an original digital human image and original voice information according to the customer service type.
In this embodiment, the original digital person image and the original voice information may be preset to correspond to the customer service type. For example, customer service type F may correspond to original digital person image F and original voice information X, customer service type G to original digital person image G and original voice information Y, and customer service type H to original digital person image H and original voice information Z; when customer service type F is obtained, original digital person image F and original voice information X are acquired.
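This preset correspondence is essentially a lookup table, as in the sketch below; the file names are hypothetical placeholders for the stored image and voice assets:

```python
GREETING_BY_SERVICE = {  # assumed preset pairs: (original image, original voice)
    "F": ("digital_person_f.png", "greeting_x.wav"),
    "G": ("digital_person_g.png", "greeting_y.wav"),
    "H": ("digital_person_h.png", "greeting_z.wav"),
}

def initial_greeting(service_type: str):
    """Look up the preset original digital person image and original voice."""
    try:
        return GREETING_BY_SERVICE[service_type]
    except KeyError:
        raise ValueError(f"no greeting configured for service type {service_type!r}")

image, voice = initial_greeting("F")
print(image, voice)  # to be sent to the terminal over the data channel
```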
Step S46: and sending the original digital human image and the original voice information to the intelligent terminal through a data channel.
In this embodiment, information can be pushed to the user by sending the original digital person image and the original voice information to the intelligent terminal 301.
Step S47: user information is received.
Step S48: and generating driving information according to the user information, wherein the driving information is used for driving the digital person.
Step S49: and driving the digital person based on the driving information, acquiring a digital person image comprising the digital person, and taking the digital person image as an interactive feedback result, wherein the form of the digital person in the digital person image corresponds to the driving information.
Step S50: and outputting an interactive feedback result.
In this embodiment, through the implementation of the above steps S41 to S50, a digital person can be displayed on the intelligent terminal 301. When the interaction method is applied to customer service, user consultations can be answered quickly even during peak periods. Meanwhile, the digital person can be driven by the user information, so that the digital person presented on the intelligent terminal embodies the information fed back to the user, simulating a face-to-face conversation between the user and the simulation digital person; the user feels more at ease, and the user experience is improved.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the interactive system described above may also refer to the corresponding process in the foregoing interactive method embodiment, and is not described herein again.
Referring to fig. 18, a block diagram of an interaction apparatus provided in an embodiment of the present application is shown, where the interaction apparatus includes a user information receiving module 51, a digital human driver module 52, a digital human image obtaining module 53, and an interaction feedback result sending module 54. The user information receiving module 51 is configured to receive user information. The digital person driving module 52 is configured to generate driving information according to the user information, and the driving information is used to drive the digital person. The digital person image obtaining module 53 is configured to drive the digital person based on the driving information, obtain a digital person image including the digital person, and use the digital person image as an interactive feedback result, where a form of the digital person in the digital person image corresponds to the driving information. The interactive feedback result sending module 54 is configured to output an interactive feedback result.
Further, as an implementation manner of this embodiment, the driving information may include text output information; the digital human driver module 52 may include a user intention determining unit and a text output information acquiring unit. The user intention determining unit is used for determining the user intention according to the user information. The text output information acquisition unit is used for acquiring text output information used for feeding back to a user based on the user intention.
Further, as an implementation manner of this embodiment, the driving information may include feedback voice information; the interactive apparatus may further include a user intention determining unit, a text output information acquiring unit, and a feedback voice information generating unit. The user intention determining unit is used for determining the user intention according to the user information. The text output information acquisition unit is used for acquiring text output information used for feeding back to a user based on the user intention. The feedback voice information generating unit is used for generating feedback voice information according to the text output information.
Further, as an implementation manner of this embodiment, the interaction apparatus may further include an interaction feedback result obtaining module. The interactive feedback result acquisition module is used for taking the feedback voice information as an interactive feedback result. The interactive feedback result sending module 54 is further configured to output an interactive feedback result.
Further, as an implementation manner of this embodiment, the interaction feedback result sending module 54 may include a privacy information obtaining unit, an interaction feedback result updating unit, and an interaction feedback result outputting unit. The privacy information acquisition unit is used for acquiring privacy information in the interaction feedback result. The interaction feedback result updating unit is used for updating the interaction feedback result according to a preset information protection mode, so that the privacy information in the updated interaction feedback result is hidden. And the interactive feedback result output unit is used for outputting the updated interactive feedback result.
Further, as an implementation manner of this embodiment, the interactive feedback result may include a plurality of consecutive frame digital human images and feedback voice information, and the plurality of consecutive frame digital human images correspond to the feedback voice information in a time sequence.
Further, as an implementation manner of this embodiment, the interactive device may further include a prompt content obtaining module and an electronic card generating module. The prompt content obtaining module is configured to obtain the prompt content for presentation at the intelligent terminal 301 according to the driving information. The electronic card generating module is configured to generate an electronic card including the prompt content and use the electronic card as an interactive feedback result.
Further, as an implementation manner of the present embodiment, the number of the digital human images is multiple; the interactive feedback result sending module 54 may further include a card type acquisition unit, a timing acquisition unit, a card picture acquisition unit, and a digital human image output unit. The card type acquiring unit is used for acquiring the card type of the electronic card, and the card type can comprise a dynamic type. The time sequence acquisition unit is used for acquiring the time sequences of a plurality of digital human images if the card type of the electronic card is a dynamic type. The card picture acquiring unit is used for acquiring the card pictures of the electronic card at different moments according to the time sequence of the images of the plurality of digital persons. The digital human image output unit is used for outputting the card picture and the plurality of digital human images.
Further, as an implementation manner of this embodiment, the user information may include user voice information; the user intention determining unit may include a user voice information acquiring sub-unit, a text input information acquiring sub-unit, and a user intention determining sub-unit. The user voice information acquisition subunit is used for acquiring the user voice information in the user information. The text input information acquisition subunit is used for identifying the user voice information to obtain text input information. The user intention determining subunit is used for performing semantic recognition on the text input information and determining the user intention.
Further, as an implementation manner of the present embodiment, the text input information acquiring subunit may include an audio feature acquiring component, a speech feature associating component, and a text input information acquiring component. The audio characteristic acquisition component is used for acquiring the audio characteristics of the voice information of the user. The voice feature association component is used for acquiring target audio data from the voice information of the user, and the target audio data is associated with the voice features which accord with preset audio features. The text input information acquisition component is used for identifying the target audio data to obtain text input information.
Further, as an implementation manner of this embodiment, the interaction apparatus may include a terminal identifier obtaining module, a call request issuing module, and a data channel establishing module. The terminal identifier obtaining module is configured to obtain a terminal identifier of the intelligent terminal 301. The call request sending module is configured to send a call request to the intelligent terminal 301 through the terminal identifier. The data channel establishing module is configured to control the server 302 to establish a data channel with the intelligent terminal 301 if it is determined that the intelligent terminal 301 allows the call request.
Further, as an implementation manner of this embodiment, the interaction device may include a customer service type obtaining module, an original voice information obtaining module, and an information sending unit. The customer service type obtaining module is used for obtaining the customer service type of a data channel established between the server 302 and the intelligent terminal 301. The original voice information acquisition module is used for acquiring original digital human images and original voice information according to the customer service type. The information sending unit is used for sending the original digital human image and the original voice information to the intelligent terminal 301 through a data channel.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the modules/units/sub-units/components in the above-described apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be in an electrical, mechanical or other form.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Referring to fig. 19, an electronic device provided in an embodiment of the present application is shown, which may include a processor 810, a communication module 820, a memory 830, and a bus. The bus may be an ISA bus, PCI bus, EISA bus, CAN bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. Wherein:
and a memory 830 for storing programs. Specifically, the memory 830 may store software programs as well as various data, and may mainly include a program storage area and a data storage area; the program storage area may store the programs required for at least one function, including program code containing computer operation instructions. Besides storing programs, the memory 830 may temporarily store messages that the communication module 820 needs to send. The memory 830 may include high-speed RAM memory, and may further include a non-volatile memory, such as at least one Solid State Disk (SSD).
The processor 810 is configured to execute programs stored in the memory 830. The program when executed by a processor implements the steps of the interaction method of the embodiments described above.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the processes of the interaction methods in the embodiments, and can achieve the same technical effects, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium includes, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), an SSD, a charged Erasable Programmable Read-Only Memory (EEPROM), or a Flash Memory (Flash).
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general hardware platform, or by hardware alone, although the former is in many cases the preferred implementation. Based on such an understanding, the technical solutions of the present application may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, SSD, or Flash), including several instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods of the embodiments of the present application.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, and some of their technical features may be equivalently replaced; such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

1. An interactive system, characterized in that the interactive system is adapted to interact with a user by using a digital person, the interactive system comprising:
a central control module, configured to receive user information sent by an intelligent terminal;
an information processing module, configured to generate driving information according to the user information, wherein the driving information is used for driving the digital person; and
an image generation module, configured to drive the digital person based on the driving information and obtain a digital person image comprising the digital person, wherein a form of the digital person in the digital person image corresponds to the driving information, and to send the digital person image to the central control module as an interactive feedback result, so that the central control module outputs the interactive feedback result.
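By way of non-limiting illustration only (and not as part of the claims), a minimal Python sketch of this three-module pipeline follows; the class names, method signatures, and dictionary-shaped driving information are assumptions of this description, not the patent's implementation.

    # Minimal sketch of the module pipeline of claim 1; all names are assumed.
    class InformationProcessingModule:
        def generate_driving_info(self, user_info: dict) -> dict:
            # Stands in for the speech recognition / semantics / synthesis
            # chain described in the dependent claims.
            return {"text": user_info.get("text", ""), "emotion": "neutral"}

    class ImageGenerationModule:
        def render(self, driving: dict) -> bytes:
            # Drive the digital person so its form matches the driving info.
            return f"frame[{driving['emotion']}]".encode()

    class CentralControlModule:
        def __init__(self, info: InformationProcessingModule,
                     image_gen: ImageGenerationModule):
            self.info, self.image_gen = info, image_gen

        def on_user_info(self, user_info: dict) -> dict:
            driving = self.info.generate_driving_info(user_info)
            frame = self.image_gen.render(driving)
            # The digital person image is output as the interactive feedback result.
            return {"digital_person_image": frame}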
2. The interactive system of claim 1, wherein the driving information comprises text output information, and the information processing module comprises:
a voice recognition unit, configured to obtain user voice information in the user information and recognize the user voice information to obtain text input information; and
a semantic unit, configured to determine a user intention according to the text input information and obtain the text output information fed back to the user.
3. The interactive system of claim 1, wherein the driving information comprises feedback voice information, and the information processing module comprises:
a voice recognition unit, configured to obtain user voice information in the user information and recognize the user voice information to obtain text input information; and
a voice synthesis unit, configured to generate the feedback voice information according to the user intention represented by the text input information.
4. The interactive system of claim 3, wherein the driving information further comprises text output information; the information processing module further comprises a semantic unit, configured to determine the user intention according to the text input information and obtain the text output information fed back to the user based on the user intention; and the voice synthesis unit is configured to generate the feedback voice information according to the text output information.
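Claims 2 to 4 together describe a recognize-understand-reply-synthesize chain. The toy pipeline below is a hedged, non-limiting sketch of that chain; every function is an assumed stand-in (a real system would call ASR, NLU, and TTS engines).

    # Assumed stand-ins for the units of claims 2 to 4; not the patent's APIs.
    def recognize(user_voice: bytes) -> str:          # voice recognition unit
        return user_voice.decode(errors="ignore")

    def infer_intention(text_input: str) -> str:      # semantic unit, step 1
        return "check_balance" if "balance" in text_input else "chitchat"

    def reply_for(intention: str) -> str:             # semantic unit, step 2
        return {"check_balance": "Your balance is ..."}.get(intention, "How can I help you?")

    def synthesize(text_output: str) -> bytes:        # voice synthesis unit
        return text_output.encode()                   # placeholder for TTS audio

    def process(user_voice: bytes) -> bytes:
        # user voice -> text input -> intention -> text output -> feedback voice
        return synthesize(reply_for(infer_intention(recognize(user_voice))))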
5. The interactive system according to any one of claims 1 to 4, wherein the image generation module comprises:
an electronic card generating unit, configured to obtain, according to the driving information, prompt content to be presented at the intelligent terminal, generate an electronic card comprising the prompt content, and send the electronic card to the central control module, so that the central control module sends the electronic card to the intelligent terminal as an interactive feedback result.
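In the simplest reading of claim 5, a card is structured prompt content derived from the driving information; the dictionary shape below is an assumption for illustration only.

    # Assumed sketch of the electronic card generating unit.
    def generate_electronic_card(driving: dict) -> dict:
        prompt_content = driving.get("text", "")
        # The central control module would forward this card to the terminal.
        return {"card_type": "static", "title": "Notice", "content": prompt_content}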
6. An interaction method, comprising:
receiving user information;
generating driving information according to the user information, wherein the driving information is used for driving a digital person;
driving the digital person based on the driving information, obtaining a digital person image comprising the digital person, and taking the digital person image as an interactive feedback result, wherein a form of the digital person in the digital person image corresponds to the driving information; and
outputting the interactive feedback result.
7. The interaction method of claim 6, wherein the driving information comprises text output information, and the generating driving information according to the user information comprises:
determining a user intention according to the user information; and
obtaining the text output information fed back to the user based on the user intention.
8. The interaction method according to claim 7, wherein the driving information further comprises feedback voice information, and the method further comprises:
determining the user intention according to the user information;
obtaining the text output information fed back to the user based on the user intention; and
generating the feedback voice information according to the text output information.
9. The interaction method of claim 8, wherein the method further comprises:
taking the feedback voice information as an interactive feedback result; and
outputting the interactive feedback result.
10. The interaction method according to any one of claims 7 to 9, wherein the outputting the interactive feedback result comprises:
obtaining privacy information in the interactive feedback result;
updating the interactive feedback result according to a preset information protection mode, so that the privacy information in the updated interactive feedback result is hidden; and
outputting the updated interactive feedback result.
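Claim 10 leaves the information protection mode open. One plausible mode, assumed here purely for illustration, is masking long digit runs (such as phone or card numbers) before output.

    import re

    # Assumed information protection mode: keep the first three and last two
    # digits of any run of 7 or more digits and mask the rest.
    def hide_privacy(feedback_text: str) -> str:
        def mask(m: re.Match) -> str:
            s = m.group()
            return s[:3] + "*" * (len(s) - 5) + s[-2:]
        return re.sub(r"\d{7,}", mask, feedback_text)

    # hide_privacy("Card 6222021234567890") -> "Card 622***********90"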
11. The interaction method according to any one of claims 7 to 9, wherein the interactive feedback result comprises a plurality of consecutive frames of the digital person image and the feedback voice information, and the plurality of consecutive frames of the digital person image correspond to the feedback voice information in time sequence.
12. The interaction method according to any one of claims 6 to 9, wherein before the outputting the interactive feedback result, the method further comprises:
obtaining, according to the driving information, prompt content to be presented at the intelligent terminal; and
generating an electronic card comprising the prompt content, and taking the electronic card as the interactive feedback result.
13. The interaction method of claim 12, wherein there are a plurality of the digital person images; the outputting the interactive feedback result comprises:
obtaining a card type of the electronic card, wherein the card type comprises a dynamic type;
if the card type of the electronic card is the dynamic type, obtaining a time sequence of the plurality of digital person images;
obtaining card pictures of the electronic card at different moments according to the time sequence of the digital person images; and
outputting the card pictures and the plurality of digital person images.
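For the dynamic card of claim 13, the card picture shown at each moment has to track the timing of the digital person frames. The alignment helper below is an assumed, non-limiting sketch of one way to do that.

    from typing import Dict, Iterator, List, Tuple

    # Assumed helper: pair each timestamped digital person frame with the most
    # recent card picture at or before that frame's moment. Assumes at least
    # one card picture; frames earlier than all cards get the earliest card.
    def frames_with_card(
        frames: List[Tuple[float, bytes]],            # (timestamp, frame image)
        card_pictures: Dict[float, bytes],            # timestamp -> card picture
    ) -> Iterator[Tuple[bytes, bytes]]:
        times = sorted(card_pictures)
        for t, frame in frames:
            current = max((ct for ct in times if ct <= t), default=times[0])
            yield frame, card_pictures[current]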
14. The interaction method according to any one of claims 7 to 9, wherein the user information comprises user voice information, and the determining the user intention according to the user information comprises:
obtaining the user voice information in the user information;
recognizing the user voice information to obtain text input information; and
performing semantic recognition on the text input information to determine the user intention.
15. The interaction method of claim 14, wherein the recognizing the user voice information to obtain text input information comprises:
obtaining audio features of the user voice information;
obtaining target audio data from the user voice information, wherein the target audio data is associated with voice features conforming to preset audio features; and
recognizing the target audio data to obtain the text input information.
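Claim 15 filters the raw audio down to target audio data whose features match a preset profile before recognition. An energy-threshold window filter, assumed here as a stand-in for the preset audio features, is one simple example.

    from typing import List

    # Assumed feature filter: keep only windows whose mean energy clears a
    # preset threshold, approximating "voice features conforming to preset
    # audio features" with simple voice-activity detection.
    def select_target_audio(samples: List[float], window: int = 160,
                            threshold: float = 0.01) -> List[float]:
        target: List[float] = []
        for i in range(0, len(samples), window):
            chunk = samples[i:i + window]
            energy = sum(s * s for s in chunk) / max(len(chunk), 1)
            if energy >= threshold:
                target.extend(chunk)
        return target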
16. The interaction method according to any one of claims 6 to 9, wherein before the receiving user information, the method comprises:
obtaining a terminal identifier of an intelligent terminal;
sending a call request to the intelligent terminal through the terminal identifier; and
if it is determined that the intelligent terminal allows the call request, controlling a server to establish a data channel with the intelligent terminal.
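The call setup of claim 16 reduces to: dial by terminal identifier, then open the data channel only on acceptance. The sketch below uses an assumed server interface (FakeServer and both of its methods are illustrative, not from the patent).

    from typing import Optional

    # FakeServer is an assumed stand-in for the real server interface.
    class FakeServer:
        def send_call_request(self, terminal_id: str) -> bool:
            return True                                # assume the user accepts

        def open_data_channel(self, terminal_id: str) -> str:
            return f"channel-to-{terminal_id}"

    def establish_channel(server: FakeServer, terminal_id: str) -> Optional[str]:
        if server.send_call_request(terminal_id):      # terminal allows the call
            return server.open_data_channel(terminal_id)
        return None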
17. The interaction method of claim 16, wherein the method further comprises:
obtaining the customer service type of the data channel established between the server and the intelligent terminal;
obtaining an original digital person image and original voice information according to the customer service type; and
sending the original digital person image and the original voice information to the intelligent terminal through the data channel.
18. An interactive apparatus, comprising:
a user information receiving module, configured to receive user information;
a digital person driving module, configured to generate driving information according to the user information, wherein the driving information is used for driving a digital person;
a digital person image acquisition module, configured to drive the digital person based on the driving information, obtain a digital person image comprising the digital person, and take the digital person image as an interactive feedback result, wherein a form of the digital person in the digital person image corresponds to the driving information; and
an interactive feedback result sending module, configured to output the interactive feedback result.
19. An electronic device, comprising:
one or more processors;
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors to perform the interaction method according to any one of claims 6 to 17.
20. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the interaction method according to any one of claims 6 to 17.
CN202110278539.3A 2021-03-16 2021-03-16 Interactive system, method, device, electronic equipment and storage medium Pending CN112669846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110278539.3A CN112669846A (en) 2021-03-16 2021-03-16 Interactive system, method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112669846A (en) 2021-04-16

Family

ID=75399483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110278539.3A Pending CN112669846A (en) 2021-03-16 2021-03-16 Interactive system, method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112669846A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001357413A (en) * 2000-06-13 2001-12-26 Minolta Co Ltd Animation conversation system and server to be used for it
CN103383638A (en) * 2012-05-03 2013-11-06 国际商业机器公司 Voice entry of sensitive information
US20170011745A1 (en) * 2014-03-28 2017-01-12 Ratnakumar Navaratnam Virtual photorealistic digital actor system for remote service of customers
CN109599113A (en) * 2019-01-22 2019-04-09 北京百度网讯科技有限公司 Method and apparatus for handling information
CN110286756A (en) * 2019-06-13 2019-09-27 深圳追一科技有限公司 Method for processing video frequency, device, system, terminal device and storage medium
CN110413841A (en) * 2019-06-13 2019-11-05 深圳追一科技有限公司 Polymorphic exchange method, device, system, electronic equipment and storage medium
CN112181127A (en) * 2019-07-02 2021-01-05 上海浦东发展银行股份有限公司 Method and device for man-machine interaction
CN111401921A (en) * 2020-03-05 2020-07-10 成都威爱新经济技术研究院有限公司 Remote customer service method based on virtual human
CN112379812A (en) * 2021-01-07 2021-02-19 深圳追一科技有限公司 Simulation 3D digital human interaction method and device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023010873A1 (en) * 2021-08-03 2023-02-09 达闼机器人股份有限公司 Method and apparatus for audio driving of avatar, and electronic device
CN114040142A (en) * 2021-11-05 2022-02-11 深圳壹账通智能科技有限公司 Video call method, device, equipment and storage medium of intelligent outbound robot
CN117152843A (en) * 2023-09-06 2023-12-01 世优(北京)科技有限公司 Digital person action control method and system
CN117152843B (en) * 2023-09-06 2024-05-07 世优(北京)科技有限公司 Digital person action control method and system

Similar Documents

Publication Publication Date Title
JP7408048B2 (en) Anime character driving method and related device based on artificial intelligence
WO2020204000A1 (en) Communication assistance system, communication assistance method, communication assistance program, and image control program
CN112669846A (en) Interactive system, method, device, electronic equipment and storage medium
JP6019108B2 (en) Video generation based on text
US11670015B2 (en) Method and apparatus for generating video
CN110555507B (en) Interaction method and device for virtual robot, electronic equipment and storage medium
JP2014519082A5 (en)
CN110446000A (en) A kind of figural method and apparatus of generation dialogue
CN110794964A (en) Interaction method and device for virtual robot, electronic equipment and storage medium
CN112669422B (en) Simulated 3D digital person generation method and device, electronic equipment and storage medium
CN110992222A (en) Teaching interaction method and device, terminal equipment and storage medium
CN112669416B (en) Customer service system, method, device, electronic equipment and storage medium
CN112839196B (en) Method, device and storage medium for realizing online conference
CN110674398A (en) Virtual character interaction method and device, terminal equipment and storage medium
JP6796762B1 (en) Virtual person dialogue system, video generation method, video generation program
CN107623622A (en) A kind of method and electronic equipment for sending speech animation
CN113050791A (en) Interaction method, interaction device, electronic equipment and storage medium
KR20190002386A (en) Apparatus for providing character service in character service system
CN111476903A (en) Virtual interaction implementation control method and device, computer equipment and storage medium
CN113850898A (en) Scene rendering method and device, storage medium and electronic equipment
Verma et al. Animating expressive faces across languages
CN113031768A (en) Customer service method, customer service device, electronic equipment and storage medium
CN112767520A (en) Digital human generation method and device, electronic equipment and storage medium
JP7496128B2 (en) Virtual person dialogue system, image generation method, and image generation program
CN111696182A (en) Virtual anchor generation system, method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210416)