CN111599359A - Man-machine interaction method, server, client and storage medium - Google Patents


Info

Publication number
CN111599359A
Authority
CN
China
Prior art keywords
information
expert
virtual
voice
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010390340.5A
Other languages
Chinese (zh)
Inventor
穆向禹
李秀林
胡帅君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Databaker Beijing Technology Co., Ltd.
Original Assignee
Databaker Beijing Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Databaker Beijing Technology Co., Ltd.
Priority to CN202010390340.5A
Publication of CN111599359A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 Architecture of speech synthesisers
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a man-machine interaction method for a server, the server, a man-machine interaction method for a client, the client, and storage media. The server-side method comprises the following steps: receiving, from a client, consultation information related to the current user's consultation question, the consultation information comprising user text information and/or user voice information; outputting the consultation information in text form; receiving expert reply information, input by an expert customer service, corresponding to the consultation question, the expert reply information comprising expert text information and/or expert voice information; and outputting expert feedback information to the client so that the client outputs virtual image information, wherein the expert feedback information comprises the expert reply information, virtual voice information, or virtual image information, the virtual voice information being generated by converting the expert reply information into voice information corresponding to the virtual expert image, and the virtual image information being generated by at least superimposing the virtual voice information on the virtual expert image. The method can improve staff efficiency and user experience.

Description

Man-machine interaction method, server, client and storage medium
Technical Field
The invention relates to the technical field of human-computer interaction, and in particular to a human-computer interaction method for a server, a server, and a storage medium, as well as a human-computer interaction method for a client, a client, and a storage medium.
Background
At present, in the finance and banking fields there is a large volume of professional consulting services that require trained personnel. Existing customer-service approaches fall mainly into the following three categories:
1. The customer visits a service outlet for offline consultation, where professional customer-service staff communicate and answer questions face to face. This approach costs the customer travel time, offers little privacy protection, and often adds queuing time. The financial-services provider must also commit fixed labor costs, and regional constraints prevent resources from being effectively shared and allocated;
2. The customer dials a call center and communicates with customer-service staff by voice. Although this avoids in-store consultation, only voice interaction is available, so the natural feel of face-to-face communication is lost and the means of expression are relatively limited;
3. The customer communicates with back-office customer-service staff through an online customer-service portal on a web page or mobile phone. This reduces the customer's time cost, but the simple communication mode and lack of visualization still cause low information-exchange efficiency and high potential costs.
Disclosure of Invention
In order to at least partially solve the above problems in the prior art, a human-computer interaction method for a server, a server, and a storage medium, together with a human-computer interaction method for a client, a client, and a storage medium, are provided.
According to one aspect of the invention, a man-machine interaction method for a server is provided, comprising the following steps: receiving, from a client, consultation information related to the current user's consultation question, the consultation information comprising user text information and/or user voice information; outputting the consultation information in text form for the expert customer service to review; receiving expert reply information, input by the expert customer service, corresponding to the consultation question, the expert reply information comprising expert text information and/or expert voice information; and outputting expert feedback information to the client so that the client outputs avatar information, wherein the expert feedback information comprises the expert reply information, virtual voice information generated based on the expert reply information, or avatar information generated based on the virtual voice information, the virtual voice information being generated by converting the expert reply information into voice information corresponding to the virtual expert image, and the avatar information being generated by at least superimposing the virtual voice information on the virtual expert image.
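The four server-side steps above can be sketched as follows. Every name here (`to_expert_text`, `build_feedback`, the dict shapes) is illustrative — the patent specifies no API — and each helper is a stand-in for a real speech component:

```python
def transcribe(audio: bytes) -> str:
    # placeholder for the speech-recognition step applied to user voice
    return "<transcript of %d bytes>" % len(audio)

def synthesize(text: str) -> bytes:
    # placeholder for text-to-speech in the virtual expert's voice
    return ("tts:" + text).encode()

def render_avatar(voice: bytes) -> bytes:
    # placeholder for superimposing the virtual voice on the expert image
    return b"video+" + voice

def to_expert_text(consultation: dict) -> str:
    """Second step: show the consultation to the expert in text form.
    User text passes through unchanged; user voice is transcribed first."""
    if consultation.get("text"):
        return consultation["text"]
    return transcribe(consultation["audio"])

def build_feedback(expert_reply: str, mode: str) -> dict:
    """Fourth step: the feedback sent to the client may be the raw reply,
    synthesized virtual voice, or fully rendered avatar information,
    depending on where the avatar pipeline runs (server or client)."""
    if mode == "text":
        return {"kind": "expert_reply", "payload": expert_reply}
    voice = synthesize(expert_reply)
    if mode == "voice":
        return {"kind": "virtual_voice", "payload": voice}
    return {"kind": "avatar", "payload": render_avatar(voice)}
```

The three `mode` values mirror the three alternative contents of the expert feedback information listed in the claim.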
Illustratively, the expert reply information includes expert text information, the expert feedback information includes virtual voice information or avatar information, and the human-computer interaction method further includes, before outputting the expert feedback information to the client to output the avatar information by the client: and carrying out voice synthesis on the expert text information to obtain virtual voice information.
Illustratively, the expert reply information includes expert voice information, the expert feedback information includes virtual voice information or avatar information, and before outputting the expert feedback information to the client so that the client outputs the avatar information, the human-computer interaction method further includes: performing timbre conversion on the expert voice information to convert it into the virtual voice information.
Illustratively, the expert reply information includes expert voice information, the expert feedback information includes virtual voice information or avatar information, and the human-computer interaction method further includes, before outputting the expert feedback information to the client to output the avatar information by the client: carrying out voice recognition on the expert voice information to obtain corresponding recognition character information; and carrying out voice synthesis on the recognized character information to obtain virtual voice information.
Illustratively, before outputting the consultation information in text form for the expert customer service to review, the human-computer interaction method further comprises: retrieving in a preset knowledge base based on the consultation information, wherein the preset knowledge base stores preset questions and preset reply information corresponding to the preset questions; and, when a specific preset question matching the consultation question is retrieved, outputting knowledge base feedback information to the client so that the client outputs specific avatar information, wherein the knowledge base feedback information includes the specific preset reply information corresponding to the specific preset question, specific virtual voice information generated based on the specific preset reply information, or specific avatar information generated based on the specific virtual voice information, the specific virtual voice information being generated by converting the specific preset reply information into voice information corresponding to the virtual expert image, and the specific avatar information being generated by at least superimposing the specific virtual voice information on the virtual expert image; wherein the step of outputting the consultation information in text form for the expert customer service to review is executed only when no specific preset question matching the consultation question is retrieved.
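The pre-screening step can be illustrated as below: match the consultation question against a preset knowledge base and escalate only unmatched questions to the human expert. A real system would use semantic retrieval rather than the exact normalized lookup sketched here; all names are assumptions.

```python
def route_consultation(question: str, knowledge_base: dict) -> dict:
    """Answer from the knowledge base when a matching preset question
    exists; otherwise hand the question to the expert customer service."""
    key = question.strip().lower()
    if key in knowledge_base:
        # matched: reply directly with the preset reply information
        return {"handled_by": "knowledge_base", "reply": knowledge_base[key]}
    # no match: fall through to the expert customer service
    return {"handled_by": "expert", "reply": None}

# toy knowledge base for illustration
kb = {"how do i reset my password?": "Use the 'Forgot password' link."}
```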
Illustratively, before outputting the advisory information in a text form for viewing by the expert customer service, the human-computer interaction method further comprises: determining a problem grade of the consultation problem based on the type of the consultation problem; wherein, the step of outputting the consultation information in text form for the expert customer service to check is executed under the condition that the problem grade of the consultation problem is higher than the preset problem grade.
Illustratively, before outputting the advisory information in a text form for viewing by the expert customer service, the human-computer interaction method further comprises: receiving identity related information which is sent by a client and related to the identity of a current user; determining identity information of the user based on the identity-related information; and determining a customer rating of the current user based on the identity information; wherein, the step of outputting the consultation information in text form for the expert customer service to check is executed under the condition that the customer grade of the current user is higher than the preset customer grade.
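The two escalation rules above (by question grade and by customer grade) can be combined in a single predicate. The numeric grades and thresholds here are illustrative assumptions, not values from the patent:

```python
def needs_expert(question_grade: int, customer_grade: int,
                 question_threshold: int = 2,
                 customer_threshold: int = 2) -> bool:
    """Route the consultation to the human expert when either the
    question grade or the customer grade exceeds its preset threshold;
    otherwise automated (knowledge-base) handling suffices."""
    return (question_grade > question_threshold
            or customer_grade > customer_threshold)
```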
Illustratively, the human-computer interaction method further comprises: receiving identity related information which is sent by a client and related to the identity of a current user; determining identity information of the user based on the identity-related information; and associating the identity information, the consultation information and the expert reply information together, and storing the associated information in a preset database.
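The archival step above — associating identity, consultation, and reply into one stored record — might look like the following sketch. Here `db` is a plain in-memory mapping standing in for the preset database; field names are illustrative:

```python
def archive_session(db: dict, identity: dict,
                    consultation: str, reply: str) -> dict:
    """Associate the identity information, consultation information and
    expert reply information together, keyed by user, so later sessions
    with the same customer have full history available."""
    record = {"identity": identity,
              "consultation": consultation,
              "reply": reply}
    db.setdefault(identity["user_id"], []).append(record)
    return db

db = archive_session({}, {"user_id": "u1", "name": "A"},
                     "What is the rate?", "The rate is ...")
```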
Illustratively, the identity-related information includes a face image, and determining the identity information of the user based on the identity-related information includes: and carrying out face recognition on the face image to determine identity information.
Illustratively, the advisory information includes user voice information, and outputting the advisory information in text form for viewing by the expert customer service includes: carrying out voice recognition on the user voice information so as to transcribe the user voice information into corresponding text information; and outputting the corresponding text information for the expert customer service to check.
Illustratively, speech recognizing the user speech information to transcribe the user speech information into corresponding text information comprises: performing voice recognition on the voice information of the user to obtain at least one group of candidate texts and at least one overall score in one-to-one correspondence with the at least one group of candidate texts, wherein each overall score is used for indicating the confidence degree of the corresponding candidate text; determining that the corresponding text information includes a selected candidate text in the at least one group of candidate texts, wherein the selected candidate text includes a candidate text with an overall score exceeding a preset threshold value or a preset number of candidate texts with the highest overall score in the at least one group of candidate texts.
Illustratively, speech recognizing the user speech information to obtain at least one group of candidate texts and at least one overall score corresponding to the at least one group of candidate texts in a one-to-one correspondence comprises: performing voice recognition on the voice information to obtain at least one group of candidate texts, at least one overall score and a word score of each word of each candidate text in the at least one group of candidate texts, wherein each word score is used for indicating the confidence degree of the corresponding word; determining that the corresponding textual information includes a selected candidate text of the at least one set of candidate texts comprises: determining that the corresponding text information includes the selected candidate text and a word score for each word in the selected candidate text.
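The recognizer output described in the two paragraphs above is essentially an n-best list, where each hypothesis carries an overall confidence plus per-word confidences. The field names below are assumptions, not from the patent:

```python
# toy n-best output: two hypotheses for the same utterance
nbest = [
    {"words": ["transfer", "five", "thousand"], "overall": 0.91,
     "word_scores": [0.97, 0.88, 0.90]},
    {"words": ["transfer", "find", "thousand"], "overall": 0.62,
     "word_scores": [0.97, 0.41, 0.90]},
]

def select_candidates(hypotheses, threshold=0.8, top_k=1):
    """Keep every hypothesis whose overall score exceeds the preset
    threshold; if none qualifies, fall back to the top-k by score."""
    above = [h for h in hypotheses if h["overall"] > threshold]
    if above:
        return above
    return sorted(hypotheses, key=lambda h: h["overall"], reverse=True)[:top_k]
```

Keeping the per-word scores alongside the selected text is what lets the expert spot low-confidence words (here, "find" at 0.41) at a glance.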
Illustratively, after outputting the consultation information in text form for the expert customer service to review, the human-computer interaction method further comprises: receiving a selection instruction of the expert customer service on a target word in the corresponding text information; and outputting the portion of the user voice information within a preset duration after the target voice for the expert customer service to review, wherein the target voice is the speech corresponding to the target word.
Illustratively, after outputting the advisory information in a text form for viewing by the expert customer service, the human-computer interaction method further comprises: receiving a selection instruction of the expert customer service on the target segment in the corresponding text information; and outputting the voice fragment corresponding to the target fragment in the user voice information for the expert customer service to check.
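The replay helpers in the two paragraphs above presuppose word-level timestamps from the recognizer. Assuming such timestamps are available (an assumption; the patent does not specify the mechanism), selecting the audio window to replay reduces to time arithmetic — times are in milliseconds to keep it exact:

```python
def replay_window(words, target_index, window_ms=3000):
    """Given word-level timestamps and the index of the word the expert
    selected, return the (start, end) time span of the preset-duration
    window that follows the target voice, for playback."""
    start = words[target_index]["end_ms"]
    return (start, start + window_ms)

# toy alignment; "find" is the uncertain word the expert clicks on
words = [
    {"w": "transfer", "start_ms": 0,   "end_ms": 500},
    {"w": "find",     "start_ms": 500, "end_ms": 900},
]
```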
Illustratively, the expert feedback information includes virtual voice information or avatar information, and before outputting the expert feedback information to the client to output the avatar information by the client, the human-computer interaction method further includes: receiving a style selection instruction input by expert customer service; and converting the expert reply information into virtual voice information which corresponds to the virtual expert image and has the voice style indicated by the style selection instruction.
Illustratively, the expert feedback information includes avatar information, and before outputting the expert feedback information to the client to output the avatar information by the client, the human-computer interaction method further includes: generating virtual voice information based on the expert reply information; generating character features of a virtual expert image based on the face image of the expert customer service; generating virtual video information including a virtual specialist image having a pronunciation action matched with the virtual voice information based on the virtual voice information and the character characteristics; and superimposing at least the virtual voice information and the virtual video information together to obtain avatar information.
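The four generation steps listed above form an ordered pipeline. In this sketch every function is a stand-in for a real component (a TTS engine, a talking-head model, an audio/video muxer); names and byte formats are purely illustrative:

```python
def tts(reply_text: str) -> bytes:
    return ("voice:" + reply_text).encode()        # step 1: virtual voice

def make_character(face_image: bytes) -> bytes:
    return b"character<" + face_image + b">"       # step 2: character features

def lip_sync(voice: bytes, character: bytes) -> bytes:
    return b"video[" + character + b"]"            # step 3: pronunciation action

def build_avatar_info(reply_text: str, face_image: bytes) -> dict:
    """Run the pipeline in order and superimpose audio and video
    (step 4) into the avatar information sent to the client."""
    voice = tts(reply_text)
    character = make_character(face_image)
    video = lip_sync(voice, character)
    return {"audio": voice, "video": video}
```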
According to another aspect of the present invention, there is provided a human-computer interaction method for a client, including: receiving consultation information which is input by a current user and is related to the consultation problem, wherein the consultation information comprises user text information and/or user voice information; outputting the consultation information to a server; receiving expert feedback information sent by a server, wherein the expert feedback information comprises expert reply information, virtual voice information generated based on the expert reply information or virtual image information generated based on the virtual voice information, the expert reply information is reply information which is input by an expert customer service and corresponds to a consultation problem, the expert reply information comprises expert text information and/or expert voice information, the virtual voice information is generated by converting the expert reply information into voice information corresponding to a virtual expert image, and the virtual image information is generated by at least overlapping the virtual voice information and the virtual expert image; obtaining virtual image information based on expert feedback information; and outputting the virtual image information for the current user to view.
Illustratively, the expert feedback information includes expert reply information including expert text information, and the obtaining the avatar information based on the expert feedback information includes: carrying out voice synthesis on the expert text information to obtain virtual voice information; and at least superposing the virtual voice information and the virtual expert image to obtain virtual image information.
Illustratively, the expert feedback information includes expert reply information including expert voice information, and obtaining the avatar information based on the expert feedback information includes: performing timbre conversion on the expert voice information to convert it into virtual voice information; and at least superimposing the virtual voice information on the virtual expert image to obtain the avatar information.
Illustratively, the expert feedback information includes expert reply information including expert voice information, and the obtaining the avatar information based on the expert feedback information includes: carrying out voice recognition on the expert voice information to obtain corresponding recognition character information; carrying out voice synthesis on the recognized character information to obtain virtual voice information; and at least superposing the virtual voice information and the virtual expert image to obtain virtual image information.
Illustratively, the human-computer interaction method further comprises: and sending the identity related information related to the identity of the current user to the server.
Illustratively, the identity-related information comprises a face image.
Illustratively, the counseling information includes user voice information, and outputting the counseling information to the service end includes: carrying out voice recognition on the user voice information so as to transcribe the user voice information into corresponding text information; and outputting the corresponding text information to the server.
Illustratively, speech recognizing the user speech information to transcribe the user speech information into corresponding text information comprises: performing voice recognition on the voice information of the user to obtain at least one group of candidate texts and at least one overall score in one-to-one correspondence with the at least one group of candidate texts, wherein each overall score is used for indicating the confidence degree of the corresponding candidate text; determining that the corresponding text information includes a selected candidate text in the at least one group of candidate texts, wherein the selected candidate text includes a candidate text with an overall score exceeding a preset threshold value or a preset number of candidate texts with the highest overall score in the at least one group of candidate texts.
Illustratively, speech recognizing the user speech information to obtain at least one group of candidate texts and at least one overall score corresponding to the at least one group of candidate texts in a one-to-one correspondence comprises: performing voice recognition on the voice information to obtain at least one group of candidate texts, at least one overall score and a word score of each word of each candidate text in the at least one group of candidate texts, wherein each word score is used for indicating the confidence degree of the corresponding word; determining that the corresponding textual information includes a selected candidate text of the at least one set of candidate texts comprises: determining that the corresponding text information includes the selected candidate text and a word score for each word in the selected candidate text.
Illustratively, the expert feedback information includes expert reply information or virtual voice information, and obtaining the avatar information based on the expert feedback information includes: acquiring virtual voice information based on expert feedback information; generating character features of a virtual expert image based on the face image of the expert customer service; generating virtual video information including a virtual specialist image having a pronunciation action matched with the virtual voice information based on the virtual voice information and the character characteristics; and superimposing at least the virtual voice information and the virtual video information together to obtain avatar information.
According to another aspect of the present invention, there is provided a server, including: the first receiving module is used for receiving consultation information which is sent by the client and is related to the consultation problem of the current user, wherein the consultation information comprises user text information and/or user voice information; the first output module is used for outputting the consultation information in a text form for the expert customer service to check; the second receiving module is used for receiving expert reply information which is input by the expert customer service and corresponds to the consultation problem, wherein the expert reply information comprises expert text information and/or expert voice information; and the second output module is used for outputting the expert feedback information to the client so as to output the virtual image information by the client, wherein the expert feedback information comprises expert reply information, virtual voice information generated based on the expert reply information or virtual image information generated based on the virtual voice information, the virtual voice information is generated by converting the expert reply information into voice information corresponding to the virtual expert image, and the virtual image information is generated by at least overlapping the virtual voice information and the virtual expert image.
According to another aspect of the present invention, there is provided a client, including: the first receiving module is used for receiving the consultation information which is input by the current user and is related to the consultation problem, and the consultation information comprises user text information and/or user voice information; the first output module is used for outputting the consultation information to the server; the second receiving module is used for receiving expert feedback information sent by the server, wherein the expert feedback information comprises expert reply information, virtual voice information generated based on the expert reply information or virtual image information generated based on the virtual voice information, the expert reply information is reply information which is input by an expert customer service and corresponds to a consultation problem, the expert reply information comprises expert text information and/or expert voice information, the virtual voice information is generated by converting the expert reply information into voice information corresponding to a virtual expert image, and the virtual image information is generated by at least overlapping the virtual voice information and the virtual expert image; the obtaining module is used for obtaining the virtual image information based on the expert feedback information; and the second output module is used for outputting the virtual image information for the current user to view.
According to another aspect of the present invention, there is also provided a server, including a processor and a memory, where the memory stores computer program instructions, and the computer program instructions are used by the processor to execute the above human-computer interaction method for the server.
According to another aspect of the present invention, there is also provided a client, including a processor and a memory, where the memory stores computer program instructions, and the computer program instructions are used for executing the above human-computer interaction method for the client when being executed by the processor.
According to another aspect of the present invention, a storage medium is further provided, on which program instructions are stored, and the program instructions are used for executing the above human-computer interaction method for the server when running.
According to another aspect of the present invention, there is also provided a storage medium on which program instructions are stored, the program instructions being configured to, when executed, perform the above-mentioned human-computer interaction method for a client.
According to the man-machine interaction method for the server, the server and its storage medium, and the man-machine interaction method for the client, the client and its storage medium provided by the embodiments of the invention, one expert customer service agent can serve multiple customers simultaneously, which improves staff efficiency, greatly saves human resources, and effectively improves user experience.
This summary introduces a selection of concepts in simplified form that are described in further detail in the detailed description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The advantages and features of the present invention are described in detail below with reference to the accompanying drawings.
Drawings
The following drawings are included to provide a further understanding of the invention. They illustrate embodiments of the invention and, together with the description, serve to explain its principles. In the drawings:
FIG. 1 shows a schematic flow diagram of a human-computer interaction method for a server according to one embodiment of the invention;
FIG. 2 shows a schematic flow diagram of a human-computer interaction method for a client according to one embodiment of the invention;
FIG. 3 shows a schematic diagram of a human-computer interaction system according to one embodiment of the invention;
FIG. 4 shows a schematic block diagram of a server in accordance with one embodiment of the present invention;
FIG. 5 shows a schematic block diagram of a client according to one embodiment of the present invention;
FIG. 6 shows a schematic block diagram of a server in accordance with one embodiment of the present invention; and
fig. 7 shows a schematic block diagram of a client according to an embodiment of the invention.
Detailed Description
In the following description, numerous details are provided to provide a thorough understanding of the present invention. One skilled in the art, however, will understand that the following description merely illustrates a preferred embodiment of the invention and that the invention may be practiced without one or more of these details. In other instances, well known features have not been described in detail so as not to obscure the invention.
In order to at least partially solve the above technical problems, embodiments of the present invention provide a human-computer interaction method, a server, and a storage medium for the server side, and a human-computer interaction method, a client, and a storage medium for the client side. The embodiments provide a remote virtual customer-service access technology that builds human-like interactive customer service around a client-side virtual expert image, using avatar technology, speech recognition and synthesis, and related techniques. Through foreground avatar display and background expert customer-service operation, multidimensional communication between customers (i.e., users) and expert customer service is achieved: one expert can serve several customers simultaneously, which improves staff efficiency, greatly saves human resources, and effectively improves user experience.
The overall scheme can be divided into at least two layers. One layer is the user experience area, which mainly receives the user's input through the client; the client can be any suitable terminal device, including but not limited to a bank self-service kiosk, a personal computer, or a smartphone. The client is capable of voice (and video) signal acquisition and/or text input, can be used to enter the text and/or voice of the user's consultation question, and can optionally capture the user's face information. In addition, the client can receive the information fed back by the expert customer service, including expert text and/or voice information, avatar information (in which the virtual expert image is displayed as video), and so on.
The other area is the background expert service area. User voice information transmitted from the user experience area is converted into text information through technologies such as speech recognition (user text information transmitted from the user experience area needs no conversion and can be output directly to the expert customer service). The expert customer service quickly judges the content of the user's question from the text information, which reduces user waiting time and allows the questions of a plurality of users to be handled simultaneously. The server may also be any suitable device, such as a personal computer or server device. Implementations of embodiments of the present invention are described below.
According to one aspect of the invention, a man-machine interaction method for a server is provided. Fig. 1 shows a schematic flow diagram of a human-computer interaction method 100 for a server according to an embodiment of the invention. As shown in fig. 1, the human-computer interaction method 100 includes steps S110, S120, S130, and S140.
In step S110, consultation information related to the consultation question of the current user, sent by the client, is received, wherein the consultation information includes user text information and/or user voice information.
The client may include an input device such as a microphone, keyboard, touch screen, etc. A user may input user text information to the client through an input device such as a keyboard and/or touch screen, and may also input user voice information to the client through an input device such as a microphone. In addition, optionally, the client may further include an acquisition device such as a camera, and the client may acquire a face image of the user through the camera, and the subsequent server may determine the identity information of the user based on the face image. Alternatively, the user may input the identity-related information thereof, such as an identification number, a name, a login password, and the like, to the client through an input device such as a keyboard and/or a touch screen, and the subsequent client may output the identity-related information to the server.
Optionally, the number of clients accessed by the same server may be one or more, and accordingly, the number of current users may also be one or more. Steps S110-S140 may be performed for each client or each current user, respectively.
In step S120, the consultation information is output in text form for the expert customer service to view.
In one example, the consultation information may include user text information, and the server may directly output the received user text information for the expert customer service to view. In another example, the consultation information may include user voice information, and the server may perform speech recognition on the user voice information by itself or through a third party (e.g., a speech recognition and synthesis server) to obtain corresponding recognized text information, and output the recognized text information for the expert customer service to view. Of course, the consultation information may include both user text information and user voice information, and the server may transcribe the user voice information into recognized text information by itself or through a third party (e.g., a speech recognition and synthesis server).
Illustratively, the server may include a display screen, which may optionally be a touch screen, and the server may display the user text information or the recognized text information obtained based on the user voice information recognition on the display screen.
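Step S120 can be sketched as a small dispatch function. This is a minimal illustration, not the patent's implementation: `consultation_to_text` and the pluggable `recognize_speech` callback are hypothetical names standing in for the server's own logic and for a local or third-party ASR service.

```python
def consultation_to_text(user_text=None, user_voice=None, recognize_speech=None):
    """Return the text to show the expert customer service (step S120).
    User text passes through unchanged; user voice is transcribed by a
    pluggable ASR function (a stand-in for a speech recognition server)."""
    parts = []
    if user_text:
        parts.append(user_text)  # text information needs no conversion
    if user_voice is not None:
        parts.append(recognize_speech(user_voice))  # voice is transcribed
    return "\n".join(parts)
```

The same function covers the mixed case where the consultation information contains both text and voice: both pieces are shown to the expert, with only the voice part passing through recognition.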
In step S130, expert reply information corresponding to the consultation question and input by the expert customer service is received, wherein the expert reply information comprises expert text information and/or expert voice information.
The server may include input devices such as a microphone, a keyboard, a touch screen, etc. The expert customer service may input expert text information to the server through an input device such as a keyboard and/or touch screen, and may also input expert voice information to the server through an input device such as a microphone. Illustratively, the expert customer service can perform a quick retrieval through a background knowledge base system to obtain a standard reply to the current user's consultation question, obtain expert text information and/or expert voice information directly or after processing, and input the expert text information and/or expert voice information to the server. Alternatively, the expert customer service can compose the reply to the current user's consultation question himself or herself and input the corresponding expert text information and/or expert voice information to the server.
In step S140, expert feedback information is output to the client so that the client outputs avatar information, wherein the expert feedback information includes the expert reply information, virtual voice information generated based on the expert reply information, or avatar information generated based on the virtual voice information; the virtual voice information is generated by converting the expert reply information into voice information corresponding to the virtual expert image, and the avatar information is generated by superimposing at least the virtual voice information with the virtual expert image.
In one example, the server may directly output the expert reply information to the client, and the client may generate virtual voice information based on the expert reply information by itself or through a third party (e.g., a voice recognition and synthesis server), and further generate avatar information based on the virtual voice information, and output the avatar information to the current user for viewing.
In another example, the server may generate virtual voice information based on the expert reply information by itself or by a third party (e.g., a voice recognition and synthesis server) and output the virtual voice information to the client, and the client may generate avatar information based on the virtual voice information and output the avatar information to the current user for viewing.
In yet another example, the server may generate the virtual voice information based on the expert reply information by itself or through a third party (e.g., a voice recognition synthesis server), and generate the avatar information based on the virtual voice information, and output the avatar information to the client, which may output the avatar information to the current user for viewing.
The avatar may be any suitable avatar, for example, an avatar obtained by training a face image of another person different from the expert customer service, an avatar obtained by training a face image of the expert customer service, or an avatar formed by two-dimensional or three-dimensional modeling. Of course, the avatar may alternatively be a non-human avatar, such as a cartoon of some kind, or the like.
As will be appreciated by those skilled in the art, when training or modeling the virtual expert image, character features of the virtual expert image may be obtained through the training or modeling; these features characterize the virtual expert image. Illustratively, the character features may include limb movement features and/or facial expression features, and the like. Superimposing the virtual voice information with the virtual expert image may be understood as generating, based on the virtual voice information and the character features of the virtual expert image, virtual video information containing the virtual expert image performing pronunciation actions that match the virtual voice information, and then superimposing the virtual voice information with the virtual video information. The pronunciation actions may include facial expressions and/or limb actions, etc. Those skilled in the art can understand the training or modeling of the virtual expert image and the implementation of the superposition of the virtual expert image and the virtual voice information, which are not described here in detail.
Optionally, when the avatar information is generated, the expert text information may further be superimposed; that is, the virtual voice information, the virtual video information, and the expert text information may be superimposed together, so that when output at the client, the virtual expert image performs pronunciation actions matching the virtual voice information, synchronously plays the virtual voice information, and synchronously displays the expert text information.
Alternatively, in the case where the expert customer service inputs expert voice information, voice recognition may be performed on the expert voice information, corresponding recognition text information may be obtained, and the obtained recognition text information may be superimposed with the virtual voice information and the virtual video information.
In the embodiment of the invention, whether the expert reply information includes expert text information or expert voice information, it is converted into uniform virtual voice information, so that the user always hears a voice with the same quality and timbre; no voice switching is perceptible at the user level, and the user experience is better.
Through the above method, the expert customer service can intervene in the user's question consultation service in real time, and the scheme is not limited to a single user: the server can be connected to one or more clients simultaneously, and the expert customer service can serve one or more users simultaneously. Therefore, compared with existing offline consultation and telephone customer service, the human-computer interaction method provided by the embodiment of the invention enables one expert customer service to serve a plurality of external customers simultaneously, which improves staff efficiency while ensuring service quality. In addition, compared with existing telephone customer service and online customer service, the human-computer interaction method provided by the embodiment of the invention offers a lifelike visual interaction experience; the interactive information is no longer limited to simple text and voice input, more information can be conveyed in the communication, and the user experience is better.
According to the embodiment of the present invention, the expert reply information includes expert text information, the expert feedback information includes virtual voice information or avatar information, and before the expert feedback information is output to the client so that the client outputs the avatar information (step S140), the human-computer interaction method 100 may further include: performing speech synthesis on the expert text information to obtain the virtual voice information.
In the case where the expert reply information includes expert text information, the server may perform speech synthesis on the expert text information by itself or through a third party (e.g., a speech recognition and synthesis server) to synthesize it into virtual voice information whose voice quality and timbre match the virtual expert image. The speech synthesis may be implemented by any existing or future speech synthesis technology, which is not described here.
Under the condition that the expert feedback information comprises the virtual voice information, the server side can directly output the synthesized virtual voice information to the client side. Under the condition that the expert feedback information comprises the virtual image information, the server can overlay the virtual voice information and the virtual expert image to obtain the virtual image information and output the virtual image information to the client.
When the expert customer service inputs expert text information, the scheme of synthesizing the virtual voice information from the expert text information at the server requires little computation, placing little computational load on the server.
According to the embodiment of the present invention, the expert reply information includes expert voice information, the expert feedback information includes virtual voice information or avatar information, and before the expert feedback information is output to the client so that the client outputs the avatar information (step S140), the human-computer interaction method 100 may further include: performing timbre conversion on the expert voice information to convert it into the virtual voice information.
In the case where the expert reply information includes expert voice information, the server may perform timbre conversion (voice conversion) on the expert voice information by itself or through a third party (e.g., a speech recognition and synthesis server) to convert it into virtual voice information whose voice quality and timbre match the virtual expert image. The timbre conversion may be implemented by any existing or future voice conversion technology, which is not described here.
Under the condition that the expert feedback information comprises the virtual voice information, the server side can directly output the converted virtual voice information to the client side. Under the condition that the expert feedback information comprises the virtual image information, the server can overlay the virtual voice information and the virtual expert image to obtain the virtual image information and output the virtual image information to the client.
When the expert customer service inputs expert voice information and the server converts it into the virtual voice information, the workload of the expert customer service is small, the user's questions are answered quickly, and the service efficiency is high.
According to the embodiment of the present invention, the expert reply information includes expert voice information, the expert feedback information includes virtual voice information or avatar information, and before the expert feedback information is output to the client so that the client outputs the avatar information (step S140), the human-computer interaction method 100 may further include: performing speech recognition on the expert voice information to obtain corresponding recognized text information; and performing speech synthesis on the recognized text information to obtain the virtual voice information.
In the case where the expert reply information includes expert voice information, the server may perform speech recognition on the expert voice information by itself or through a third party (e.g., a speech recognition and synthesis server), and then perform speech synthesis based on the recognized text information to obtain virtual voice information whose voice quality and timbre match the virtual expert image. The speech recognition and speech synthesis may be implemented by any existing or future speech recognition and speech synthesis technologies, which are not described in detail here.
Under the condition that the expert feedback information comprises the virtual voice information, the server side can directly output the recognized and synthesized virtual voice information to the client side. Under the condition that the expert feedback information comprises the virtual image information, the server can overlay the virtual voice information and the virtual expert image to obtain the virtual image information and output the virtual image information to the client.
When the expert customer service inputs expert voice information and the server recognizes it and then synthesizes the virtual voice information, the workload of the expert customer service is small, the user's questions are answered quickly, and the service efficiency is high.
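The three routes to uniform virtual voice described above (text to speech synthesis; voice to timbre conversion; voice to recognition then synthesis) can be sketched together. This is an illustrative skeleton only: `to_virtual_voice` and the pluggable `tts`, `voice_convert`, and `asr` callables are hypothetical stand-ins for the actual synthesis, voice-conversion, and recognition engines.

```python
def to_virtual_voice(reply_text=None, reply_voice=None,
                     tts=None, voice_convert=None, asr=None):
    """Convert expert reply information into uniform virtual voice.
    Route 1: expert text -> speech synthesis.
    Route 2: expert voice -> timbre (voice) conversion.
    Route 3: expert voice -> speech recognition -> speech synthesis."""
    if reply_text is not None:
        return tts(reply_text)             # route 1: synthesize from text
    if voice_convert is not None:
        return voice_convert(reply_voice)  # route 2: convert timbre directly
    return tts(asr(reply_voice))           # route 3: recognize, then synthesize
```

Whichever route is taken, the output has the same voice quality and timbre, so the user never perceives a switch between the automatic replies and the expert's replies.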
According to the embodiment of the present invention, before outputting the consultation information in text form for the expert customer service to view (step S120), the human-computer interaction method 100 may further include: retrieving in a preset knowledge base based on the consultation information, wherein the preset knowledge base is used for storing preset questions and preset reply information corresponding to the preset questions; and, when a specific preset question matching the consultation question is retrieved, outputting knowledge base feedback information to the client so that the client outputs specific avatar information, wherein the knowledge base feedback information includes specific preset reply information corresponding to the specific preset question, specific virtual voice information generated based on the specific preset reply information, or specific avatar information generated based on the specific virtual voice information; the specific virtual voice information is generated by converting the specific preset reply information into voice information corresponding to the virtual expert image, and the specific avatar information is generated by superimposing at least the specific virtual voice information with the virtual expert image. The step of outputting the consultation information in text form for the expert customer service to view (step S120) is performed in case no specific preset question matching the consultation question is retrieved.
The specific preset question matched with the consultation question means that the similarity between the specific preset question and the consultation question is greater than a similarity threshold. The similarity threshold may be set to any suitable value, such as 90%, 95%, etc., as desired.
The preset knowledge base may store a large number of preset questions and preset reply information corresponding to each preset question. The server may perform retrieval in the preset knowledge base based on the consultation information; for example, when a specific preset question with a similarity greater than 90% to the consultation question is retrieved, specific preset reply information corresponding to the specific preset question, specific virtual voice information generated based on the specific preset reply information, or specific avatar information generated based on the specific virtual voice information may be output to the client.
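The retrieval step can be sketched as follows. This is a simplified illustration: the patent does not specify the similarity measure, so `difflib.SequenceMatcher` is used here purely as a stand-in, and `search_knowledge_base` is a hypothetical name.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # e.g. 90%; any suitable value may be chosen

def search_knowledge_base(question, knowledge_base):
    """Return the preset reply whose preset question is most similar to the
    consultation question, or None so the question falls through to the
    expert customer service. The similarity model here is a stand-in."""
    best_reply, best_score = None, 0.0
    for preset_question, preset_reply in knowledge_base.items():
        score = SequenceMatcher(None, question, preset_question).ratio()
        if score > best_score:
            best_reply, best_score = preset_reply, score
    return best_reply if best_score > SIMILARITY_THRESHOLD else None
```

A `None` result corresponds to the fall-through case: no preset question exceeds the similarity threshold, so the consultation is forwarded to the expert (step S120).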
The specific virtual voice information generated based on the specific preset reply information and the specific avatar information generated based on the specific virtual voice information may be understood with reference to the above description regarding step S140, and are not described herein again.
In this way, the server can automatically retrieve the user's consultation question in the preset knowledge base; if a matching reply is retrieved, it can be fed back to the client automatically, and if no matching reply is retrieved, the question cannot be answered accurately by automatic means and is forwarded to the expert customer service for handling. This scheme can reduce the manpower occupied by expert customer service, effectively improve the efficiency of question handling, and thus greatly improve the customer experience.
According to the embodiment of the present invention, before outputting the consultation information in text form for the expert customer service to view (step S120), the human-computer interaction method 100 may further include: determining a question level of the consultation question based on the type of the consultation question; wherein the step of outputting the consultation information in text form for the expert customer service to view (step S120) is performed in case the question level of the consultation question is higher than a preset question level.
The server can classify user questions in advance and assign a different question level to each type of question. For example, questions related to futures trading can be assigned a higher question level so as to be handled preferentially, while questions related to bank passwords can be assigned a lower question level: futures trading is closely tied to the user's assets, and an erroneous reply could greatly affect the user's vital interests, whereas a mistyped password does not damage the user's property.
When the server receives the consultation information, it can judge the question level; questions of high level are preferentially forwarded to the expert customer service for handling, and questions of low level can optionally be handled by way of automatic replies. This scheme can reduce the manpower occupied by expert customer service, effectively improve the efficiency of question handling, and thus greatly improve the customer experience.
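The level-based routing can be sketched in a few lines. The question types and level values below are hypothetical examples (echoing the futures/password example above), not part of the patent.

```python
# Hypothetical question types and levels; futures trading outranks
# password questions, as in the example above.
QUESTION_LEVELS = {"futures_trading": 3, "account_query": 2, "password": 1}
PRESET_LEVEL = 1  # questions above this level go to the expert

def route_question(question_type):
    """Route by question level: high-level questions are forwarded to the
    expert customer service (step S120); the rest may be auto-replied."""
    level = QUESTION_LEVELS.get(question_type, PRESET_LEVEL)
    return "expert" if level > PRESET_LEVEL else "auto_reply"
```

Unknown question types default to the preset level and therefore to the automatic-reply path; a real deployment would tune both the classification and the threshold.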
According to the embodiment of the present invention, before outputting the consultation information in text form for the expert customer service to view (step S120), the human-computer interaction method 100 may further include: receiving identity-related information, sent by the client, related to the identity of the current user; determining identity information of the current user based on the identity-related information; and determining a customer rating of the current user based on the identity information; wherein the step of outputting the consultation information in text form for the expert customer service to view (step S120) is performed in case the customer rating of the current user is higher than a preset customer rating.
The strategy for replying to a user's question can be adjusted according to the user's identity. For example, if the user is a VIP user, the question can be forwarded directly to the expert customer service, who intervenes throughout the whole process. If the user is an ordinary user, an automatic reply can be attempted first, and when a question that cannot be answered accurately is encountered, it is forwarded to the expert customer service for handling. This scheme can effectively ensure the service quality for important users, thereby improving the customer experience.
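The identity-based routing can be sketched similarly. The customer IDs and the two-valued rating scale below are hypothetical; a real system would look ratings up from the account database after identity determination.

```python
# Hypothetical customer records; the rating scale is an assumption.
CUSTOMER_RATINGS = {"u001": 2, "u002": 1}  # 2 = VIP, 1 = ordinary
PRESET_RATING = 1

def route_by_identity(user_id):
    """VIP users (rating above the preset) go straight to the expert
    customer service; ordinary users first get an automatic reply attempt."""
    rating = CUSTOMER_RATINGS.get(user_id, PRESET_RATING)
    return "expert" if rating > PRESET_RATING else "auto_first"
```

Users absent from the records are treated as ordinary, which is the safe default for this policy since the automatic path still escalates unanswerable questions to the expert.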
According to the embodiment of the present invention, the human-computer interaction method 100 may further include: receiving identity-related information, sent by the client, related to the identity of the current user; determining identity information of the current user based on the identity-related information; and associating the identity information, the consultation information, and the expert reply information together and storing the associated information in a preset database.
Optionally, the identity-related information may include login information currently input by the user, such as an identification number, a login password, a name, and the like. Optionally, the identity-related information may further include a face image of the current user; as described above, the client may collect the face image of the current user and output it to the server. The server may perform face recognition on the face image of the current user by itself or through a third party (e.g., a face recognition server) to determine the identity information of the current user. Of course, the server may also determine the identity information of the current user based on the login information input by the current user.
After the identity information is determined, the server can store the questions consulted by the current user and the replies of the expert customer service in a preset database for the record, so that a corresponding file can be established for each user, which facilitates user management. The above archiving operations may be performed at any suitable time.
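The association-and-archiving step can be sketched with a small relational table. The patent does not name a database, so `sqlite3` and the `sessions` schema below are stand-ins chosen for illustration.

```python
import sqlite3

def archive_session(db, identity, consultation, expert_reply):
    """Associate identity, consultation and expert reply and store them,
    so a per-user file builds up over time (the preset database here is
    an in-memory sqlite3 stand-in)."""
    db.execute("CREATE TABLE IF NOT EXISTS sessions "
               "(identity TEXT, consultation TEXT, expert_reply TEXT)")
    db.execute("INSERT INTO sessions VALUES (?, ?, ?)",
               (identity, consultation, expert_reply))
    db.commit()

def user_file(db, identity):
    """Look up every archived exchange for one user."""
    rows = db.execute("SELECT consultation, expert_reply FROM sessions "
                      "WHERE identity = ?", (identity,))
    return list(rows)
```

Keying the table on the determined identity (rather than on raw login input) is what lets one file accumulate per user across sessions and devices.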
According to the embodiment of the present invention, the identity-related information includes a face image, and determining the identity information of the user based on the identity-related information may include: and carrying out face recognition on the face image to determine identity information.
The face recognition may be implemented by any existing or future face recognition technology, which is not described here. Acquiring the identity information of the user through face acquisition and recognition is an intelligent, automated scheme that requires no login information from the user and reduces user operations.
According to the embodiment of the present invention, outputting the consultation information in text form for the expert customer service to view (step S120) may include: performing speech recognition on the user voice information to transcribe it into corresponding text information; and outputting the corresponding text information for the expert customer service to view.
Compared with voice information, text information is more convenient to read, so the expert customer service can understand the user's consultation question in a shorter time. Therefore, if the consultation information includes user text information, it can be output directly for the expert customer service to view; if the consultation information includes user voice information, it can be transcribed into corresponding text information and then output for the expert customer service to view, which can effectively improve the speed at which the expert customer service handles questions.
According to the embodiment of the invention, performing speech recognition on the user voice information to transcribe it into the corresponding text information includes the following steps: performing speech recognition on the user voice information to obtain at least one group of candidate texts and at least one overall score in one-to-one correspondence with the at least one group of candidate texts, wherein each overall score is used for indicating the confidence of the corresponding candidate text; and determining that the corresponding text information includes a selected candidate text in the at least one group of candidate texts, wherein the selected candidate text includes the candidate texts whose overall scores exceed a preset threshold, or a preset number of candidate texts with the highest overall scores, in the at least one group of candidate texts.
The recognition result of the user voice information may contain errors and ambiguities. For this reason, speech recognition may produce at least one group of candidate texts, and multiple candidates, i.e., all or a portion of the top-ranked candidate texts, may be selected for output.
It will be appreciated by those skilled in the art that during speech recognition, each group of candidate texts corresponding to the user voice information may include an overall score and a word score for each word in the candidate text. The candidate texts may be ranked based on the overall score, and the top few groups of candidate texts with high scores may be output. The preset number may be any suitable number, which is not limited here; for example, the preset number may be three groups.
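The candidate-selection rule (score threshold, or top-N by overall score) can be sketched as follows; `select_candidates` is a hypothetical name and the scores are illustrative.

```python
def select_candidates(candidates, threshold=None, top_n=None):
    """From (text, overall_score) ASR candidates, keep either every
    candidate whose confidence exceeds a preset threshold, or the
    top_n highest-scoring candidates, for the expert to compare."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if threshold is not None:
        return [c for c in ranked if c[1] > threshold]
    return ranked[:top_n]
```

Either rule yields a short, ranked list rather than a single guess, which is what lets the expert resolve ambiguous recognitions quickly.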
And a plurality of groups of candidate texts are output, so that the expert customer service can more accurately know the correct meaning of the user voice information.
According to the embodiment of the invention, performing speech recognition on the user voice information to obtain at least one group of candidate texts and at least one overall score in one-to-one correspondence with the at least one group of candidate texts includes: performing speech recognition on the user voice information to obtain at least one group of candidate texts, at least one overall score, and a word score for each word of each candidate text in the at least one group of candidate texts, wherein each word score is used for indicating the confidence of the corresponding word; and determining that the corresponding text information includes a selected candidate text in the at least one group of candidate texts includes: determining that the corresponding text information includes the selected candidate text and the word score of each word in the selected candidate text.
Alternatively, the word score of each word may be output as an actual numerical value, or represented by a preset color. For example, words with a word score below a preset score threshold may be marked with a red font or a red background color. In this way, the expert customer service can quickly judge which words may have been recognized inaccurately, and can listen again to the corresponding voice fragments for those words to obtain accurate user voice information.
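The low-confidence marking can be sketched in one function. Brackets stand in for the red font or background a real UI would use; `mark_low_confidence` and the threshold value are hypothetical.

```python
SCORE_THRESHOLD = 0.6  # words below this confidence get flagged (assumed value)

def mark_low_confidence(words):
    """Render (word, score) pairs, wrapping low-confidence words in a
    marker so the expert can spot likely recognition errors at a glance.
    A real UI would use a red font or background instead of brackets."""
    return " ".join(w if s >= SCORE_THRESHOLD else f"[{w}]"
                    for w, s in words)
```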
According to the embodiment of the present invention, after outputting the consultation information in text form for the expert customer service to view (step S120), the human-computer interaction method 100 may further include: receiving a selection instruction of the expert customer service for a target word in the corresponding text information; and outputting, for the expert customer service to review, the voice information within a preset duration after the target voice in the user voice information, wherein the target voice is the voice corresponding to the target word.
The preset duration may be set as desired and may be any suitable duration. In one example, the corresponding text information contains 100 words, of which the 50th word is ambiguous; the expert customer service can select (e.g., click) the 50th word with a mouse or touch screen, and the server can play back the voice corresponding to the span starting from the 50th word (which may include the 50th word) and continuing for the following 10 words, or 20 words, or until the end of the whole question.
According to the embodiment of the present invention, after outputting the consultation information in text form for the expert customer service to view (step S120), the human-computer interaction method 100 may further include: receiving a selection instruction of the expert customer service for a target segment in the corresponding text information; and outputting, for the expert customer service to review, the voice fragment corresponding to the target segment in the user voice information.
In one example, the corresponding text information contains 100 words, of which the 50th to 70th words are ambiguous; the expert customer service can select a target segment composed of the 50th to 70th words with a mouse or touch screen, and the server can play back the voice corresponding to the segment from the 50th word through the 70th word.
In this way, the voice corresponding to the words that the expert customer service considers ambiguous can be replayed, so that the customer service expert can quickly confirm the content actually contained in the voice.
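Both replay cases above (a single target word plus the following words, or a selected target segment) reduce to mapping word indices onto the per-word timestamps that ASR typically emits. This is a sketch under that assumption; `segment_for_words` is a hypothetical name.

```python
def segment_for_words(word_timings, start_idx, end_idx=None, extra_words=10):
    """word_timings is a list of (start_sec, end_sec) per recognized word.
    Return the audio span to replay: either the fragment the expert
    selected (start_idx..end_idx), or the selected word plus a preset
    number of following words."""
    if end_idx is None:
        end_idx = min(start_idx + extra_words, len(word_timings) - 1)
    return (word_timings[start_idx][0], word_timings[end_idx][1])
```

The returned `(start, end)` pair can then be used to seek into the stored user voice recording and play only the ambiguous stretch back to the expert.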
According to an embodiment of the present invention, the expert feedback information includes virtual voice information or avatar information, and before outputting the expert feedback information to the client so that the client outputs the avatar information (step S140), the human-computer interaction method 100 may further include: receiving a style selection instruction input by the expert customer service; and converting the expert reply information into virtual voice information which corresponds to the virtual expert image and has the voice style indicated by the style selection instruction.
The voice style may refer to feeling, emotion, and the like; the same person's voice may have different styles suitable for different scenes. For example, a calm voice may be used in a news broadcast scene, and a rich, emotional voice may be used in a storytelling scene. When replying to the user, the expert customer service can independently select different voice styles, so that the virtual expert image replies more naturally and the user experience is better. Of course, alternatively, the server may automatically assign different voice styles to reply messages with different contents, which may be preset.
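Passing the selected style to the synthesis back end can be sketched as a parameter lookup. The style names and the `pitch_var`/`rate` parameters below are invented for illustration; a real TTS engine's style controls will differ.

```python
# Hypothetical style presets; a real TTS engine's parameters will differ.
VOICE_STYLES = {"calm": {"pitch_var": 0.2, "rate": 1.0},
                "expressive": {"pitch_var": 0.8, "rate": 0.9}}

def synthesize_with_style(text, style, tts_engine):
    """Forward the expert-selected voice style to the TTS back end.
    tts_engine is a stand-in for the speech synthesis service."""
    params = VOICE_STYLES.get(style, VOICE_STYLES["calm"])
    return tts_engine(text, **params)
```

Falling back to the calm preset for unknown styles matches the alternative in the text where the server assigns a default style automatically.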
According to an embodiment of the present invention, the expert feedback information includes avatar information, and before outputting the expert feedback information to the client to output the avatar information by the client (step S140), the human-computer interaction method 100 may further include: generating virtual voice information based on the expert reply information; generating character features of a virtual expert image based on the face image of the expert customer service; generating virtual video information containing a virtual expert image with pronunciation action matched with the virtual voice information based on the virtual voice information and the character characteristics; and superimposing at least the virtual voice information and the virtual video information together to obtain avatar information.
Generating the character features of the virtual expert image based on the face image of the expert customer service may be performed during a training or modeling phase for the virtual expert image. For example, a large amount of face image data of the expert customer service can be collected, and the character features of the virtual expert image can be obtained by training a neural network model or the like. Optionally, character features consistent with the expert customer service can be replicated from only a small amount of face image data. Subsequently, when actually replying to the user's consultation information, the virtual voice information can be generated based on the expert reply information, the virtual video information containing a virtual expert image whose pronunciation actions match the virtual voice information can be generated based on the virtual voice information and the character features, and at least the virtual voice information and the virtual video information can be superimposed together to obtain the avatar information.
Similarly, a large amount of voice data of the expert customer service can be collected, and the voice characteristics of the virtual expert image can be obtained by training a neural network model or the like. Optionally, the voice characteristics of the expert customer service can be replicated from only a small amount of voice data. Subsequently, when actually replying to the user's consultation information, the virtual voice information is generated based on the expert reply information and the voice characteristics of the expert customer service.
The finally obtained virtual expert image has an appearance and timbre consistent with the expert customer service, while keeping pronunciation actions consistent with the expert reply information. A virtual expert image built from the real image of the expert customer service avoids the monotonous experience of presenting one identical face to thousands of users. The above embodiments are only examples and are not intended to limit the present invention; as described above, the virtual expert image may also be the image of another person or an entirely imaginary image.
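The data flow of the avatar pipeline described above can be sketched as three ordered stages: synthesize the voice, drive a virtual expert video from the voice and the pre-trained character features, then superimpose the two. Every stage below is a stand-in stub (a real system would call a TTS engine and a talking-head model); only the ordering and the data dependencies mirror the text.

```python
def synthesize_voice(reply_text):
    # stand-in for TTS: produces the virtual voice information
    return {"kind": "voice", "content": reply_text}

def render_virtual_video(voice, character_features):
    # stand-in for a talking-head model: pronunciation actions are driven by
    # the voice track so mouth movement matches what is spoken
    return {"kind": "video", "driven_by": voice["content"], "features": character_features}

def superimpose(voice, video):
    # combine the two tracks into the avatar information
    return {"voice": voice, "video": video}

def build_avatar_information(reply_text, character_features):
    voice = synthesize_voice(reply_text)
    video = render_virtual_video(voice, character_features)
    return superimpose(voice, video)

info = build_avatar_information("hello", {"face": "expert"})
print(info["video"]["driven_by"])  # hello
```

The key design point the sketch preserves is that the video stage consumes the voice output, which is why the resulting pronunciation actions stay consistent with the expert reply information.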
According to another aspect of the invention, a human-computer interaction method for a client is provided. FIG. 2 shows a schematic flow diagram of a human-computer interaction method 200 for a client according to one embodiment of the invention. As shown in fig. 2, the human-computer interaction method 200 includes steps S210, S220, S230, S240, and S250.
At step S210, counseling information related to the counseling question input by the current user is received, and the counseling information includes user text information and/or user voice information.
In step S220, the consultation information is output to the server.
In step S230, expert feedback information sent by the server is received, wherein the expert feedback information includes expert reply information, virtual voice information generated based on the expert reply information, or virtual image information generated based on the virtual voice information, wherein the expert reply information is reply information corresponding to the consultation question input by the expert customer service, the expert reply information includes expert text information and/or expert voice information, the virtual voice information is generated by converting the expert reply information into voice information corresponding to the virtual expert image, and the virtual image information is generated by at least overlapping the virtual voice information and the virtual expert image.
In step S240, avatar information is obtained based on the expert feedback information. In the case where the expert feedback information includes expert reply information, virtual voice information may be generated based on the expert reply information and avatar information may be generated based on the virtual voice information. In the case where the expert feedback information includes virtual voice information, avatar information may be generated based on the virtual voice information. In the case where the expert feedback information includes avatar information, the avatar information may be directly obtained.
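The three cases of step S240 can be sketched as a simple dispatch on the client: the form in which the expert feedback arrives determines how much work remains to obtain the avatar information. The field names and the stub converters below are illustrative assumptions.

```python
def synthesize(reply):
    # stand-in for generating virtual voice information from expert reply information
    return {"voice": reply}

def voice_to_avatar(voice):
    # stand-in for superimposing virtual voice information with the virtual expert image
    return {"avatar": voice}

def obtain_avatar_information(feedback):
    if "avatar_information" in feedback:   # case 3: already complete, use directly
        return feedback["avatar_information"]
    if "virtual_voice" in feedback:        # case 2: only the superimposition remains
        return voice_to_avatar(feedback["virtual_voice"])
    # case 1: only the expert reply arrived; synthesize voice, then build the avatar
    return voice_to_avatar(synthesize(feedback["expert_reply"]))

print(obtain_avatar_information({"expert_reply": "hi"}))  # {'avatar': {'voice': 'hi'}}
```

This also shows the trade-off the three cases imply: the more the server pre-computes, the less the client has to do.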
In step S250, avatar information is output for viewing by the current user.
The description of the man-machine interaction method 100 for the server above already explains how the client sends the consultation information and receives the expert feedback information, as well as the content of the various kinds of information; the present embodiment can be understood with reference to that description, which is not repeated here.
Compared with existing offline consultation and telephone customer service, the human-computer interaction method provided by the embodiment of the present invention allows one expert customer service to serve multiple external customers simultaneously, which improves staff efficiency while ensuring service quality. In addition, compared with existing telephone and online customer service, the method provides a lifelike visual interaction experience: the interaction is not limited to simple text and voice input, more information can be conveyed in the communication, and the user experience is better.
According to the embodiment of the present invention, the expert feedback information includes expert reply information, the expert reply information includes expert text information, and obtaining the avatar information based on the expert feedback information (step S240) may include: carrying out voice synthesis on the expert text information to obtain virtual voice information; and at least superposing the virtual voice information and the virtual expert image to obtain virtual image information.
The client may perform speech synthesis on the expert text information by itself or by a third party (e.g., a speech recognition synthesis server) to obtain the virtual speech information. Speech synthesis may be implemented using any existing or future speech synthesis technology. The implementation manner of at least superimposing the virtual speech information and the virtual expert image to obtain the virtual image information may refer to the above description, which is not repeated herein.
According to the embodiment of the present invention, the expert feedback information includes expert reply information, the expert reply information includes expert voice information, and obtaining the avatar information based on the expert feedback information (step S240) may include: performing tone conversion on the expert voice information to convert the expert voice information into virtual voice information; and at least superposing the virtual voice information and the virtual expert image to obtain virtual image information.
In the case where the expert reply information includes expert speech information, the client may perform timbre conversion on the expert speech information, by itself or via a third party (e.g., a speech recognition and synthesis server), to convert it into virtual speech information whose timbre matches the virtual expert image.
According to the embodiment of the present invention, the expert feedback information includes expert reply information, the expert reply information includes expert voice information, and obtaining the avatar information based on the expert feedback information (step S240) may include: carrying out voice recognition on the expert voice information to obtain corresponding recognition character information; carrying out voice synthesis on the recognized character information to obtain virtual voice information; and at least superposing the virtual voice information and the virtual expert image to obtain virtual image information.
In the case where the expert reply information includes expert voice information, the client may perform speech recognition on the expert voice information, by itself or via a third party (e.g., a speech recognition and synthesis server), and then perform speech synthesis on the recognized text information to obtain virtual voice information whose timbre matches the virtual expert image.
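The three conversion paths described above (text-to-speech for expert text, timbre conversion for expert voice, and recognition followed by re-synthesis) can be sketched together. All converters are stand-in stubs; a real client would call a speech recognition/synthesis service, and the path names are labels chosen for this sketch.

```python
def tts(text):
    # stand-in for speech synthesis in the virtual expert's voice
    return {"voice": text, "timbre": "virtual-expert"}

def timbre_conversion(expert_voice):
    # stand-in for converting the expert's timbre to the virtual expert's
    return {"voice": expert_voice["content"], "timbre": "virtual-expert"}

def asr(expert_voice):
    # stand-in for speech recognition: yields the recognized text
    return expert_voice["content"]

def to_virtual_voice(reply, path):
    if path == "text-tts":
        return tts(reply)                # expert text information -> synthesis
    if path == "timbre":
        return timbre_conversion(reply)  # expert voice -> direct timbre conversion
    if path == "asr-tts":
        return tts(asr(reply))           # expert voice -> recognize, then re-synthesize
    raise ValueError(path)

print(to_virtual_voice({"content": "hello"}, "asr-tts"))  # {'voice': 'hello', 'timbre': 'virtual-expert'}
```

All three paths end in virtual voice information with the virtual expert's timbre; they differ only in the input modality and in whether an intermediate text representation exists.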
According to the embodiment of the present invention, the human-computer interaction method 200 may further include: and sending the identity related information related to the identity of the current user to the server.
According to an embodiment of the invention, the identity-related information comprises a face image.
The identity related information related to the identity of the current user is sent to the server, so that the server can obtain the identity information of the current user based on the identity related information, and further, filing or customer level confirmation is facilitated, which can be understood by referring to the description above.
According to the embodiment of the present invention, the outputting the advisory information to the service end (step S220) may include: carrying out voice recognition on the user voice information so as to transcribe the user voice information into corresponding text information; and outputting the corresponding text information to the server.
According to the embodiment of the invention, the voice recognition of the user voice information to transcribe the user voice information into the corresponding text information comprises the following steps: performing voice recognition on the voice information of the user to obtain at least one group of candidate texts and at least one overall score in one-to-one correspondence with the at least one group of candidate texts, wherein each overall score is used for indicating the confidence degree of the corresponding candidate text; determining that the corresponding text information includes a selected candidate text in the at least one group of candidate texts, wherein the selected candidate text includes a candidate text with an overall score exceeding a preset threshold value or a preset number of candidate texts with the highest overall score in the at least one group of candidate texts.
According to the embodiment of the invention, performing voice recognition on the voice information of the user to obtain at least one group of candidate texts and at least one overall score corresponding to the at least one group of candidate texts in a one-to-one mode comprises the following steps: performing voice recognition on the voice information to obtain at least one group of candidate texts, at least one overall score and a word score of each word of each candidate text in the at least one group of candidate texts, wherein each word score is used for indicating the confidence degree of the corresponding word; determining that the corresponding textual information includes a selected candidate text of the at least one set of candidate texts comprises: determining that the corresponding text information includes the selected candidate text and a word score for each word in the selected candidate text.
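The selection rule above (keep candidates whose overall score exceeds a preset threshold, or keep the preset number of highest-scoring candidates) can be sketched as follows. The `(text, overall_score, word_scores)` tuple layout and the sample scores are illustrative assumptions.

```python
def select_candidates(candidates, threshold=None, top_n=None):
    """candidates: list of (text, overall_score, word_scores) tuples.
    Apply the threshold rule when a threshold is given; otherwise keep the
    top_n candidates with the highest overall score."""
    if threshold is not None:
        return [c for c in candidates if c[1] > threshold]
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]

candidates = [
    ("set an alarm for nine", 0.92, [0.90, 0.95, 0.99, 0.88, 0.93]),
    ("set an alarm for five", 0.61, [0.90, 0.95, 0.99, 0.88, 0.31]),
    ("pet an alarm for nine", 0.12, [0.20, 0.95, 0.99, 0.88, 0.93]),
]
print([c[0] for c in select_candidates(candidates, threshold=0.5)])
print([c[0] for c in select_candidates(candidates, top_n=1)])
```

Carrying the per-word scores along with each selected candidate is what lets the expert customer service see which individual words are low-confidence and select them for playback.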
The implementation manners of identifying to obtain the candidate text and selecting the selected candidate text may refer to the above description, which is not repeated herein.
According to the embodiment of the present invention, the expert feedback information includes expert reply information or virtual voice information, and obtaining the avatar information based on the expert feedback information (step S240) may include: acquiring virtual voice information based on expert feedback information; generating character features of a virtual expert image based on the face image of the expert customer service; generating virtual video information including a virtual specialist image having a pronunciation action matched with the virtual voice information based on the virtual voice information and the character characteristics; and superimposing at least the virtual voice information and the virtual video information together to obtain avatar information.
The avatar information may be generated at the client, and the generation manner may refer to the above description, which is not described herein again.
FIG. 3 shows a schematic diagram of a human-computer interaction system according to one embodiment of the invention. As shown in fig. 3, the human-computer interaction system can be divided into a user experience area and a background expert service area. In the user experience area, a user can send user voice information and/or user text information to the server through a client such as a video telephone. Optionally, the client may also synchronously send the identity-related information of the user to the server. In the background expert service area, two cases can be distinguished: expert reply and automatic reply. In the expert-reply case, the expert customer service receives the user text information and/or the text transcribed from the user voice information, and inputs expert voice information and/or expert text information to the server, which outputs expert feedback information to the client. In the automatic-reply case, the server searches a preset knowledge base, generates knowledge base feedback information based on the retrieved specific preset reply information, and outputs it to the client. The server or the client may convert the expert reply information (or the specific preset reply information) into virtual voice information (or specific virtual voice information) of the virtual expert image, superimpose it with the virtual expert image, and finally output the resulting virtual image information (or specific virtual image information) at the client.
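The two background-service cases of Fig. 3 can be sketched as a routing decision: a knowledge base hit produces an automatic reply, otherwise the consultation is forwarded to the expert customer service. The dict-based knowledge base below is a deliberate simplification of the retrieval the text describes.

```python
def route_consultation(consult_text, knowledge_base):
    """Return ('auto', preset_reply) on a knowledge base hit,
    or ('expert', consult_text) to forward to the expert customer service."""
    preset = knowledge_base.get(consult_text)
    if preset is not None:
        return ("auto", preset)      # knowledge base feedback information
    return ("expert", consult_text)  # queued for the expert customer service

kb = {"business hours?": "We are open 9:00-18:00 on weekdays."}
print(route_consultation("business hours?", kb)[0])   # auto
print(route_consultation("my order is late", kb)[0])  # expert
```

Either branch then feeds the same avatar pipeline, which is why the user sees a consistent virtual expert image regardless of whether a human answered.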
According to another aspect of the present invention, a server is provided. Fig. 4 shows a schematic block diagram of a server 400 according to an embodiment of the invention.
As shown in fig. 4, the server 400 according to the embodiment of the present invention includes a first receiving module 410, a first outputting module 420, a second receiving module 430, and a second outputting module 440. The modules may respectively perform the steps/functions of the human-computer interaction method 100 for the server described above in conjunction with fig. 1. Only the main functions of the components of the server 400 are described below, and details that have been described above are omitted.
The first receiving module 410 is used for receiving consultation information related to a consultation problem of a current user, which is sent by a client, wherein the consultation information comprises user text information and/or user voice information.
The first output module 420 is used for outputting the consulting information in text form for the expert customer service to view.
The second receiving module 430 is configured to receive expert reply information corresponding to the consultation problem, which is input by the expert customer service, where the expert reply information includes expert text information and/or expert voice information.
The second output module 440 is configured to output expert feedback information to the client so that the client outputs avatar information, wherein the expert feedback information includes the expert reply information, virtual voice information generated based on the expert reply information, or avatar information generated based on the virtual voice information, wherein the virtual voice information is generated by converting the expert reply information into voice information corresponding to the virtual expert image, and the avatar information is generated by at least superimposing the virtual voice information with the virtual expert image.
According to another aspect of the present invention, a client is provided. Fig. 5 shows a schematic block diagram of a client 500 according to an embodiment of the invention.
As shown in fig. 5, the client 500 according to an embodiment of the present invention includes a first receiving module 510, a first outputting module 520, a second receiving module 530, an obtaining module 540, and a second outputting module 550. The respective modules may respectively perform the respective steps/functions of the human-computer interaction method 200 for the client described above in connection with fig. 2. Only the main functions of the respective components of the client 500 are described below, and details that have been described above are omitted.
The first receiving module 510 is configured to receive consulting information related to a consulting question input by a current user, where the consulting information includes user text information and/or user voice information;
the first output module 520 is used for outputting the consultation information to the server;
the second receiving module 530 is configured to receive expert feedback information sent by the server, where the expert feedback information includes expert reply information, virtual voice information generated based on the expert reply information, or virtual image information generated based on the virtual voice information, where the expert reply information is reply information corresponding to a consultation question input by an expert customer service, the expert reply information includes expert text information and/or expert voice information, the virtual voice information is generated by converting the expert reply information into voice information corresponding to a virtual expert image, and the virtual image information is generated by at least superimposing the virtual voice information and the virtual expert image.
The obtaining module 540 is configured to obtain the avatar information based on the expert feedback information.
The second output module 550 is used for outputting the avatar information for the current user to view.
According to another aspect of the present invention, a server is provided. Fig. 6 shows a schematic block diagram of a server 600 according to an embodiment of the invention. The server 600 includes a processor 610 and a memory 620.
The memory 620 stores computer program instructions for implementing corresponding steps in the human-computer interaction method 100 for a server according to an embodiment of the present invention.
The processor 610 is configured to execute the computer program instructions stored in the memory 620 to perform the corresponding steps of the human-computer interaction method 100 for the server according to the embodiment of the present invention.
Illustratively, the server 600 may further include a server communication device (not shown) for receiving the advisory information sent by the client and outputting the advisory information to the processor 610. The server communication device can also be used for outputting expert feedback information to the client. Optionally, the server communication device may include a wired communication interface or a wireless communication interface.
Illustratively, the server 600 may further include a server output means (not shown) for outputting the counseling information in a text form. Optionally, the server output device may include a display screen, and the display screen may be a touch screen.
Illustratively, the server 600 may further include a server input device (not shown) for receiving the expert reply information input by the expert customer service. Optionally, the server-side input device may include one or more of a keyboard, a touch screen, a microphone, and the like. Alternatively, the server input device and the server output device may be implemented by using the same touch screen.
According to another aspect of the present invention, a client is provided. Fig. 7 shows a schematic block diagram of a client 700 according to an embodiment of the invention. Client 700 includes a processor 710 and memory 720.
The memory 720 stores computer program instructions for implementing the corresponding steps in the human-computer interaction method 200 for a client according to an embodiment of the present invention.
The processor 710 is configured to execute the computer program instructions stored in the memory 720 to perform the corresponding steps of the human-computer interaction method 200 for the client according to the embodiment of the present invention.
Illustratively, the client 700 may further include a client communication device (not shown) for outputting the consultation information to the server, receiving the expert feedback information transmitted from the server, and outputting the expert feedback information to the processor 720. Optionally, the client communication device may include a wired communication interface or a wireless communication interface.
Illustratively, the client 700 may further include a client input device (not shown) for receiving the counseling information input by the current user. Alternatively, the client input device may include one or more of a keypad, a touch screen, a microphone, and the like.
Illustratively, the client 700 may further include a client output device (not shown) for outputting avatar information for viewing by the current user. Optionally, the client output device may include a display screen, which may be a touch screen. Alternatively, the client input device and the client output device may be implemented using the same touch screen.
According to another aspect of the present invention, a storage medium is provided, on which program instructions are stored, which when executed by a computer or a processor are used for executing the corresponding steps of the human-computer interaction method 100 for a server according to an embodiment of the present invention, and are used for implementing the corresponding modules in the server 400 according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media.
According to another aspect of the present invention, there is provided a storage medium on which program instructions are stored, which when executed by a computer or a processor are used for executing the corresponding steps of the human-computer interaction method 200 for a client according to an embodiment of the present invention, and for implementing the corresponding modules in the client 500 according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some of the modules in a server or client according to embodiments of the present invention. The present invention may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
The above description is only for the specific embodiment of the present invention or the description thereof, and the protection scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A man-machine interaction method for a server side comprises the following steps:
receiving consultation information which is sent by a client and is related to the consultation problem of the current user, wherein the consultation information comprises user text information and/or user voice information;
outputting the consultation information in a text form for the expert customer service to check;
receiving expert reply information which is input by the expert customer service and corresponds to the consultation problem, wherein the expert reply information comprises expert text information and/or expert voice information;
outputting expert feedback information to the client so that the client outputs avatar information, wherein the expert feedback information comprises the expert reply information, virtual voice information generated based on the expert reply information, or avatar information generated based on the virtual voice information, wherein the virtual voice information is generated by converting the expert reply information into voice information corresponding to a virtual expert image, and the avatar information is generated by at least superimposing the virtual voice information with the virtual expert image.
2. The human-computer interaction method according to claim 1, wherein the expert reply information comprises the expert text information, the expert feedback information comprises the virtual voice information or the avatar information, and before the outputting of the expert feedback information to the client so that the client outputs the avatar information, the human-computer interaction method further comprises:
performing speech synthesis on the expert text information to obtain the virtual voice information.
3. The human-computer interaction method according to claim 1, wherein the expert reply information comprises the expert voice information, the expert feedback information comprises the virtual voice information or the avatar information, and before the outputting of the expert feedback information to the client so that the client outputs the avatar information, the human-computer interaction method further comprises:
performing timbre conversion on the expert voice information to convert it into the virtual voice information.
4. A human-computer interaction method for a client comprises the following steps:
receiving consultation information which is input by a current user and is related to consultation problems, wherein the consultation information comprises user text information and/or user voice information;
outputting the consultation information to a server;
receiving expert feedback information sent by the server, wherein the expert feedback information comprises expert reply information, virtual voice information generated based on the expert reply information or virtual image information generated based on the virtual voice information, the expert reply information is reply information which is input by expert customer service and corresponds to the consultation question, the expert reply information comprises expert text information and/or expert voice information, the virtual voice information is generated by converting the expert reply information into voice information corresponding to a virtual expert image, and the virtual image information is generated by at least overlapping the virtual voice information and the virtual expert image;
obtaining the avatar information based on the expert feedback information; and
and outputting the virtual image information for the current user to view.
5. A server, comprising:
the system comprises a first receiving module, a second receiving module and a third receiving module, wherein the first receiving module is used for receiving consultation information which is sent by a client and is related to a consultation problem of a current user, and the consultation information comprises user text information and/or user voice information;
the first output module is used for outputting the consultation information in a text form for being checked by expert customer service;
the second receiving module is used for receiving expert reply information which is input by the expert customer service and corresponds to the consultation problem, and the expert reply information comprises expert text information and/or expert voice information;
and the second output module is used for outputting expert feedback information to the client so that the client outputs virtual image information, wherein the expert feedback information comprises the expert reply information, virtual voice information generated based on the expert reply information, or virtual image information generated based on the virtual voice information, wherein the virtual voice information is generated by converting the expert reply information into voice information corresponding to a virtual expert image, and the virtual image information is generated by at least superimposing the virtual voice information with the virtual expert image.
6. A client, comprising:
a first receiving module, configured to receive consultation information, input by a current user, that relates to a consultation question, wherein the consultation information comprises user text information and/or user voice information;
a first output module, configured to output the consultation information to a server;
a second receiving module, configured to receive expert feedback information sent by the server, wherein the expert feedback information comprises the expert reply information, virtual voice information generated based on the expert reply information, or virtual image information generated based on the virtual voice information; the expert reply information is reply information, input by an expert customer service agent, that corresponds to the consultation question, and comprises expert text information and/or expert voice information; the virtual voice information is generated by converting the expert reply information into voice information corresponding to a virtual expert image; and the virtual image information is generated by at least superimposing the virtual voice information on the virtual expert image;
an obtaining module, configured to obtain the virtual image information based on the expert feedback information; and
a second output module, configured to output the virtual image information for the current user to view.
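What the client's obtaining module does depends on which of the three forms the expert feedback arrives in. A hypothetical sketch of that branching (the local fallback helpers and all names are invented for illustration, not prescribed by the claim):

```python
# Illustrative sketch of claim 6's obtaining module: if the server already
# composited the virtual image information, use it as-is; otherwise derive
# it locally from the virtual voice or from the raw expert reply. The local
# helper functions are placeholders, not a real TTS engine or renderer.

def local_tts(text: str) -> dict:
    """Stand-in for synthesizing the reply in the virtual expert's voice."""
    return {"waveform": f"tts({text})"}

def local_overlay(virtual_voice: dict) -> dict:
    """Stand-in for superimposing the voice on the virtual expert image."""
    return {"avatar": "expert-avatar-01", "audio": virtual_voice}

def obtain_virtual_image(feedback: dict) -> dict:
    if "virtual_image" in feedback:     # server sent the composited form
        return feedback["virtual_image"]
    if "virtual_voice" in feedback:     # server sent synthesized voice only
        return local_overlay(feedback["virtual_voice"])
    # only the raw expert reply arrived: synthesize locally, then composite
    return local_overlay(local_tts(feedback["expert_reply"]["text"]))
```

The second output module would then render the returned structure (audio plus animated avatar) for the current user to view.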
7. A server comprising a processor and a memory, wherein the memory stores computer program instructions which, when executed by the processor, perform the human-computer interaction method for a server according to any one of claims 1 to 3.
8. A client comprising a processor and a memory, wherein the memory stores computer program instructions which, when executed by the processor, perform the human-computer interaction method for a client according to claim 4.
9. A storage medium having stored thereon program instructions which, when executed, perform the human-computer interaction method for a server according to any one of claims 1 to 3.
10. A storage medium having stored thereon program instructions which, when executed, perform the human-computer interaction method for a client according to claim 4.
CN202010390340.5A 2020-05-09 2020-05-09 Man-machine interaction method, server, client and storage medium Pending CN111599359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010390340.5A CN111599359A (en) 2020-05-09 2020-05-09 Man-machine interaction method, server, client and storage medium


Publications (1)

Publication Number Publication Date
CN111599359A true CN111599359A (en) 2020-08-28

Family

ID=72182032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010390340.5A Pending CN111599359A (en) 2020-05-09 2020-05-09 Man-machine interaction method, server, client and storage medium

Country Status (1)

Country Link
CN (1) CN111599359A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106098060A (en) * 2016-05-19 2016-11-09 北京搜狗科技发展有限公司 The correction processing method of voice and device, the device of correction process for voice
CN108073976A (en) * 2016-11-18 2018-05-25 科沃斯商用机器人有限公司 Man-machine interactive system and its man-machine interaction method
CN109618068A (en) * 2018-11-08 2019-04-12 上海航动科技有限公司 A kind of voice service method for pushing, device and system based on artificial intelligence
CN110148416A (en) * 2019-04-23 2019-08-20 腾讯科技(深圳)有限公司 Audio recognition method, device, equipment and storage medium
CN110322883A (en) * 2019-06-27 2019-10-11 上海麦克风文化传媒有限公司 A kind of effective speech turns text effects evaluation optimization method
CN110647636A (en) * 2019-09-05 2020-01-03 深圳追一科技有限公司 Interaction method, interaction device, terminal equipment and storage medium
WO2020067666A1 (en) * 2018-09-28 2020-04-02 주식회사 솔루게이트 Virtual counseling system and counseling method using same
CN111045510A (en) * 2018-10-15 2020-04-21 ***通信集团山东有限公司 Man-machine interaction method and system based on augmented reality


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111988477A (en) * 2020-09-02 2020-11-24 中国银行股份有限公司 Incoming call processing method and device, server and storage medium
CN112200588A (en) * 2020-09-30 2021-01-08 广东岭南通股份有限公司 Method and system for automatically processing card-opening customer service of air card
CN112992138A (en) * 2021-02-04 2021-06-18 中通天鸿(北京)通信科技股份有限公司 TTS-based voice interaction method and system
CN113031768A (en) * 2021-03-16 2021-06-25 深圳追一科技有限公司 Customer service method, customer service device, electronic equipment and storage medium
CN113382020A (en) * 2021-07-07 2021-09-10 北京市商汤科技开发有限公司 Interaction control method and device, electronic equipment and computer readable storage medium
CN114979723A (en) * 2022-02-14 2022-08-30 杭州脸脸会网络技术有限公司 Virtual intelligent customer service method, device, electronic device and storage medium
CN114979723B (en) * 2022-02-14 2023-08-29 杭州脸脸会网络技术有限公司 Virtual intelligent customer service method, device, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN111599359A (en) Man-machine interaction method, server, client and storage medium
CN107680019B (en) Examination scheme implementation method, device, equipment and storage medium
CN106686339B (en) Electronic meeting intelligence
CN106685916B (en) Intelligent device and method for electronic conference
CN112233690B (en) Double recording method, device, terminal and storage medium
US20240070397A1 (en) Human-computer interaction method, apparatus and system, electronic device and computer medium
CN110895568B (en) Method and system for processing court trial records
CN111291151A (en) Interaction method and device and computer equipment
CN111427990A (en) Intelligent examination control system and method assisted by intelligent campus teaching
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
CN114065720A (en) Conference summary generation method and device, storage medium and electronic equipment
CN113868472A (en) Method for generating digital human video and related equipment
CN113407696A (en) Collection table processing method, device, equipment and storage medium
CN113573128A (en) Audio processing method, device, terminal and storage medium
CN116737883A (en) Man-machine interaction method, device, equipment and storage medium
CN111933133A (en) Intelligent customer service response method and device, electronic equipment and storage medium
CN113763925B (en) Speech recognition method, device, computer equipment and storage medium
US11704585B2 (en) System and method to determine outcome probability of an event based on videos
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
CN111160051B (en) Data processing method, device, electronic equipment and storage medium
CN113961680A (en) Human-computer interaction based session processing method and device, medium and electronic equipment
CN112487164A (en) Artificial intelligence interaction method
JP2020042471A (en) Information sharing support device, information sharing support method, and program
US11810132B1 (en) Method of collating, abstracting, and delivering worldwide viewpoints
CN114399821B (en) Policy recommendation method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200828)