CN113946674A - Method and device for realizing real-time conversation during man-machine conversation - Google Patents


Info

Publication number
CN113946674A
CN113946674A (application CN202111576051.5A)
Authority
CN
China
Prior art keywords
audio
answer
user
conversation
user terminal
Prior art date
Legal status
Pending
Application number
CN202111576051.5A
Other languages
Chinese (zh)
Inventor
余文芳
曾文佳
陈新月
宋成业
冯梦盈
梁鹏斌
李航
韩亚昕
Current Assignee
Lingxi Beijing Technology Co Ltd
Original Assignee
Lingxi Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Lingxi Beijing Technology Co Ltd filed Critical Lingxi Beijing Technology Co Ltd
Priority to CN202111576051.5A priority Critical patent/CN113946674A/en
Publication of CN113946674A publication Critical patent/CN113946674A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/50Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
    • H04M3/51Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M3/5166Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing in combination with interactive voice response systems or voice portals, e.g. as front-ends

Abstract

The application belongs to the technical field of communications and discloses a method and a device for realizing a real-time call during a man-machine conversation. The method includes: obtaining the user's voice audio when it is determined that a voice message sent by a user terminal has been received; performing audio recognition on the voice audio to obtain the audio text corresponding to the voice audio; performing semantic analysis on the audio text to obtain the dialog mode requested by the user; if the dialog mode is a manual dialog, establishing a connection between the user terminal and a manual customer service terminal; and sending a dialog response message to the user terminal based on the answer speech file corresponding to the audio text. In this way, when switching to a manual dialog, the transfer to a human agent can be performed in real time, improving transfer efficiency and the user experience.

Description

Method and device for realizing real-time conversation during man-machine conversation
Technical Field
The application relates to the technical field of communication, in particular to a method and a device for realizing real-time conversation during man-machine conversation.
Background
With the development of internet technology, in order to reduce labor costs, a dialog is usually conducted with the user through a virtual customer service program that provides a consulting service. When the virtual customer service program determines that the user requests a dialog with a human agent, the user's call is transferred to a manual customer service terminal.
However, in the prior art, when the virtual customer service program transfers a call to a human agent, the user usually has to wait for a long time; the transfer cannot be performed in real time, so transfer efficiency is low.
Therefore, how to improve transfer efficiency when switching to a manual dialog is a technical problem to be solved.
Disclosure of Invention
The application aims to provide a method and a device for realizing real-time conversation during man-machine conversation, which are used for improving the switching efficiency during the switching of the man-machine conversation.
In one aspect, a method for realizing real-time conversation during man-machine conversation is provided, which includes:
when determining that a voice message sent by a user terminal is received, acquiring a user voice audio;
performing audio recognition on the voice audio to obtain an audio text corresponding to the voice audio;
performing semantic analysis on the audio text to obtain a conversation mode requested by a user;
if the conversation mode is manual conversation, establishing connection between the user terminal and the manual customer service terminal;
and sending a dialogue response message to the user terminal based on the response dialogue file corresponding to the audio text.
In the implementation process, the audio text corresponding to the user's voice audio is recognized; when the user requests a manual dialog, a connection is established between the user terminal and the manual customer service terminal, and a dialog response message is sent to the user terminal based on the answer speech file corresponding to the audio text, so that the transfer to a human agent is performed in real time and the user experience is improved.
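The five claimed steps can be pictured as a simple dispatch flow. Below is a minimal Python sketch; all helper names (`recognize_audio`, `analyze_dialog_mode`, `lookup_answer_file`) are hypothetical stubs standing in for the ASR, NLU, and dialog-management components, not part of the patent disclosure:

```python
# Hypothetical sketch of the claimed flow; all helper names are
# illustrative stubs, not part of the patent disclosure.

def handle_voice_message(voice_audio: bytes) -> dict:
    audio_text = recognize_audio(voice_audio)        # step: audio recognition
    dialog_mode = analyze_dialog_mode(audio_text)    # step: semantic analysis
    if dialog_mode == "manual":
        # Hand the user terminal over to a manual customer service terminal.
        return {"action": "connect_agent"}
    # Otherwise reply with the answer speech file for this audio text.
    return {"action": "reply", "answer": lookup_answer_file(audio_text)}

def recognize_audio(audio: bytes) -> str:
    return audio.decode("utf-8")                     # stand-in for real ASR

def analyze_dialog_mode(text: str) -> str:
    return "manual" if "transfer" in text else "auto"

def lookup_answer_file(text: str) -> str:
    return "default_answer.wav"                      # stand-in for DM lookup
```

In a real deployment the stubs would be replaced by the speech recognition, natural language understanding, and dialog management modules described later in the specification.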
In one embodiment, establishing the connection between the user terminal and the manual customer service terminal includes:
screening out, from the manual customer service terminals, a manual customer service terminal in an idle state;
and sending the connection request message sent by the user terminal to a voice server, so that the voice server establishes the connection between the user terminal and the idle manual customer service terminal based on the connection request message.
In the implementation process, the selection range of manual customer service terminals is narrowed according to their usage states, further improving the efficiency of connecting the user terminal with a manual customer service terminal.
In one embodiment, semantic analysis is performed on audio text to determine a dialog mode requested by a user, and the method comprises the following steps:
and if the audio text contains the specified information, determining that the conversation mode requested by the user is manual conversation.
In the implementation process, the conversation mode requested by the user can be accurately obtained by identifying the specified information contained in the audio text.
In one embodiment, the answer speech file includes at least one of answer text, answer audio, answer video, and answer image.
In the implementation process, the answer corresponding to the user's voice message can be quickly found from the answer speech file, further improving transfer efficiency.
In one embodiment, semantic analysis is performed on an audio text to obtain a dialog mode requested by a user, including:
and inputting the audio text into the decision scientific model to obtain a conversation mode.
In the implementation process, the dialogue mode of the user can be accurately obtained according to the decision scientific model.
In one embodiment, sending a response dialog file corresponding to the audio text to the user terminal includes:
acquiring a user service requirement corresponding to the audio text, wherein the user service requirement is acquired after semantic analysis is carried out on the audio text;
acquiring the answer speech file set for the user service requirement;
and sending the answer file to the user terminal.
In the implementation process, the answer speech file set for the user service requirement can be quickly acquired according to that requirement, improving the efficiency of acquiring the answer speech file.
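The requirement-to-file lookup described above can be pictured as a small keyed table. This is a hypothetical Python sketch; the service-requirement keys, file contents, and fallback reply are invented for illustration, as the patent only states that the answer speech file is retrieved according to the user service requirement:

```python
# Hypothetical requirement-to-file mapping; keys and contents are invented.
ANSWER_FILES = {
    "billing": {"type": "text", "content": "Your bill is issued monthly."},
    "refund": {"type": "audio", "content": "refund_policy.wav"},
}

def get_answer_file(service_requirement: str) -> dict:
    """Return the answer speech file set for the user service requirement,
    falling back to a generic reply when no file is configured."""
    fallback = {"type": "text", "content": "Sorry, could you rephrase that?"}
    return ANSWER_FILES.get(service_requirement, fallback)
```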
In one embodiment, sending a response dialog file corresponding to the audio text to the user terminal includes:
if the type of the answer speech file is answer text, performing audio conversion on the answer speech file to obtain answer audio, and sending the answer audio to the user terminal;
and if the type of the answer speech file is audio, sending the answer speech file to the user terminal.
In the implementation process, if the obtained answer speech file contains only answer text, the answer text may be converted to audio to obtain the answer audio.
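The two branches above (convert answer text to audio, pass audio through unchanged) can be sketched as follows; `text_to_speech` is a hypothetical stand-in for a real TTS engine, not an API named in the patent:

```python
def text_to_speech(text: str) -> bytes:
    """Stand-in for a real TTS engine so the sketch runs end to end."""
    return text.encode("utf-8")

def prepare_dialog_response(answer_file: dict) -> dict:
    """If the answer speech file is text, convert it to answer audio first;
    if it is already audio, send it as-is."""
    if answer_file["type"] == "text":
        return {"type": "audio",
                "content": text_to_speech(answer_file["content"])}
    return answer_file
```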
In one aspect, an apparatus for realizing real-time conversation during a human-computer conversation is provided, including:
the obtaining unit is used for obtaining the voice audio of the user when the voice message sent by the user terminal is determined to be received;
the recognition unit is used for carrying out audio recognition on the voice audio to obtain an audio text corresponding to the voice audio;
the analysis unit is used for carrying out semantic analysis on the audio text to obtain a conversation mode requested by a user;
the connection unit is used for establishing the connection between the user terminal and the manual customer service terminal if the conversation mode is manual conversation;
and the sending unit is used for sending a dialogue response message to the user terminal based on the response dialogue file corresponding to the audio text.
In one embodiment, the connection unit is configured to:
screening out, from the manual customer service terminals, a manual customer service terminal in an idle state;
and sending the connection request message sent by the user terminal to a voice server, so that the voice server establishes the connection between the user terminal and the idle manual customer service terminal based on the connection request message.
In one embodiment, the analysis unit is configured to:
and if the audio text contains the specified information, determining that the conversation mode requested by the user is manual conversation.
In one embodiment, the answer speech file includes at least one of answer text, answer audio, answer video, and answer image.
In one embodiment, the analysis unit is configured to:
and inputting the audio text into the decision scientific model to obtain a conversation mode.
In one embodiment, the sending unit is configured to:
acquiring a user service requirement corresponding to the audio text, wherein the user service requirement is acquired after semantic analysis is carried out on the audio text;
acquiring the answer speech file set for the user service requirement;
and sending the answer file to the user terminal.
In one embodiment, the sending unit is configured to:
if the type of the answer speech file is answer text, performing audio conversion on the answer speech file to obtain answer audio, and sending the answer audio to the user terminal;
and if the type of the answer speech file is audio, sending the answer speech file to the user terminal.
In one aspect, an electronic device is provided, comprising a processor and a memory, the memory storing computer readable instructions which, when executed by the processor, perform the steps of the method provided in any of the various alternative implementations for real-time conversation while conducting a human-machine conversation as described above.
In one aspect, a storage medium is provided, on which a computer program is stored, which, when being executed by a processor, performs the steps of the method provided in any of the various alternative implementations for real-time conversation while conducting a human-machine conversation as described above.
In one aspect, a computer program product is provided which, when run on a computer, causes the computer to perform the steps of the method provided in any of the various alternative implementations of the method for realizing a real-time call during a man-machine conversation as described above.
In the embodiment of the application, when it is determined that a voice message sent by the user terminal has been received, the user's voice audio is obtained; audio recognition is performed on the voice audio to obtain the corresponding audio text; semantic analysis is performed on the audio text to obtain the dialog mode requested by the user; if the dialog mode is a manual dialog, a connection between the user terminal and the manual customer service terminal is established; and a dialog response message is sent to the user terminal based on the answer speech file corresponding to the audio text. Thus, during dialog transfer, if the obtained audio text is determined to contain the user's request for a manual dialog, the connection between the user terminal and the manual customer service terminal is established and a dialog response message is sent to the user terminal based on the answer speech file corresponding to the audio text, so the transfer to a human agent is performed in real time, further improving the user experience.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic architecture diagram of a system for implementing a real-time call during a human-computer conversation according to an embodiment of the present application;
fig. 2 is a flowchart illustrating an implementation of a method for implementing a real-time call during a human-computer conversation according to an embodiment of the present disclosure;
fig. 3 is an interaction flowchart of a method for implementing a real-time call during a human-computer conversation according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an apparatus for implementing real-time conversation during a human-computer conversation according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
First, some terms referred to in the embodiments of the present application will be described to facilitate understanding by those skilled in the art.
The user equipment: may be a mobile terminal, a fixed terminal, or a portable terminal such as a mobile handset, station, unit, device, multimedia computer, multimedia tablet, internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system device, personal navigation device, personal digital assistant, audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, gaming device, or any combination thereof, including the accessories and peripherals of these devices, or any combination thereof. It is also contemplated that the terminal device can support any type of interface to the user (e.g., wearable device), and the like.
The voice server: may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud-computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, big data, and artificial intelligence platforms.
Robotic Process Automation (RPA): an application technology that automates business processes by mimicking the manual operations an end user performs at a computer.
Dialog Management (Dialog Manager, DM): the DM determines the response to the user according to the dialog history information.
Automatic Speech Recognition (ASR): a technology for converting human speech into text.
Natural Language Understanding (NLU): a branch of artificial intelligence that studies how to use computers to simulate the human process of language communication, so that computers can understand and use the natural languages of human society, such as Chinese and English, to achieve natural-language communication between humans and machines and to take over part of human mental labor, including querying data, answering questions, summarizing documents, compiling information, and other processing of natural-language information.
Decision Science (DS): a comprehensive discipline, built on modern natural and social science, that studies decision principles, decision procedures, and decision methods.
Frequently Asked Questions (FAQ) library: a main means of providing online help on the current web; consulting services are provided to users by organizing likely frequently-asked question-and-answer pairs in advance and publishing them on a web page.
In order to realize real-time switching manual work and improve user experience when conversation switching is carried out, the embodiment of the application provides a method and a device for realizing real-time conversation during man-machine conversation.
Fig. 1 is a schematic diagram of the architecture of a system for implementing a real-time call during a man-machine conversation according to an embodiment of the present application. The system includes a user terminal, a virtual customer service device, a voice server, and manual customer service terminals, where the number of manual customer service terminals may be 1 to n, n being a positive integer, which is not limited here. The virtual customer service device includes a robotic process automation module, an automatic speech recognition module, a natural language understanding module, and a dialog management module.
A user terminal: may be a terminal device or a server, used for sending voice messages to the virtual customer service device, receiving the answer speech files returned by the virtual customer service device, and conversing with the manual customer service terminal.
The robotic process automation module: used for receiving the voice message sent by the user terminal and obtaining the voice audio.
The automatic speech recognition module: used for performing audio recognition on the voice audio to obtain the audio text corresponding to the voice audio.
A natural language understanding module: and the system is used for analyzing the user intention, the user attitude and the entity semantic meaning of the audio text corresponding to the user voice audio to obtain an analysis result and sending the analysis result to the dialogue management module.
The dialog management module: used for acquiring the answer speech file corresponding to the audio text according to the analysis result sent by the natural language understanding module, and sending the answer speech file to the robotic process automation module, so that the robotic process automation module sends a dialog response message to the user terminal according to the acquired answer speech file.
The voice server: used for establishing the connection between the user terminal and the manual customer service terminal based on the connection request message forwarded from the user terminal by the virtual customer service device.
Manual customer service terminal: the method is used for establishing connection with the user terminal and interacting with the user terminal based on the established connection.
In one embodiment, if the obtained analysis result indicates that the dialog mode requested by the user is a manual dialog, the virtual customer service device screens out a manual customer service terminal in an idle state and sends the connection request message from the user terminal to the voice server, so that the voice server establishes the connection between the user terminal and the manual customer service terminal based on the connection request message, and the user can interact directly with the manual customer service terminal through the user terminal.
In the embodiment of the present application, the execution subject may be a virtual customer service device in a system that implements real-time conversation during a human-computer conversation as shown in fig. 1.
Referring to fig. 2, an implementation flow chart of a method for implementing a real-time call during a human-computer conversation provided in the embodiment of the present application is shown, and with reference to the virtual customer service device shown in fig. 1, a specific implementation flow of the method is as follows:
step 200: and when the voice message sent by the user terminal is determined to be received, the voice audio of the user is obtained.
Specifically, when the virtual customer service device, monitoring the sound card in real time, detects a voice message sent by the user through the user terminal, it parses the voice message to obtain the user's voice audio.
Optionally, the voice message monitored in real time from the user terminal may be a single voice segment or a continuous voice segment, which is not limited here.
In this way, the virtual customer service device can monitor, in real time through the sound card, the voice messages sent by the user through the user terminal, improving the accuracy of obtaining the user's voice audio.
Step 201: and performing audio recognition on the voice audio to obtain an audio text corresponding to the voice audio.
Specifically, the virtual customer service device recognizes the acquired user voice audio through the automatic speech recognition module to obtain the audio text corresponding to the voice audio.
Therefore, in the subsequent execution steps, the virtual customer service equipment can accurately judge the user intention according to the audio text corresponding to the voice audio, and the accuracy of obtaining the user intention is improved.
Step 202: and carrying out semantic analysis on the audio text to obtain a conversation mode requested by the user.
Specifically, the virtual customer service device performs semantic analysis on the audio text through the natural language understanding module to obtain a conversation mode requested by the user.
Further, when step 202 is executed, any one of the following manners may be adopted.
Mode 1: and if the audio text contains the specified information, determining that the conversation mode requested by the user is manual conversation.
Optionally, the specified information may be a specified character string representing a manual dialog, or a specified tag representing a manual dialog, for example, @@transfer@@. In practical applications, the specified information may be set according to the actual application scenario, which is not limited here.
Mode 2: inputting the audio text into a decision science model to obtain the dialog mode.
Specifically, the virtual customer service device inputs the obtained audio text into a pre-trained decision science model, obtains the classification decision result output by the model, and determines the dialog mode according to that result.
In one embodiment, the virtual customer service device inputs the obtained audio text into the pre-trained decision science model and obtains the classification decision result; for example, if the proportion of the manual-dialog class in the decision result is higher than a preset threshold, the dialog mode may be determined to be a manual dialog.
It should be noted that the preset threshold may be set according to an actual application scenario, for example, 0.6, and is not limited herein.
Therefore, the conversation mode requested by the user can be obtained based on the audio text, and further the conversation mode can be adjusted in real time according to the conversation mode requested by the user, so that the user experience is further improved.
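Both modes of determining the dialog mode can be sketched together in a few lines of Python. The tag `@@transfer@@` and the 0.6 threshold mirror the examples given in the text, but the function names and representations are assumptions for illustration:

```python
# Mode 1 (specified information) and Mode 2 (decision science model output)
# in one sketch; tag and threshold values mirror the text's examples.
MANUAL_TAG = "@@transfer@@"
MANUAL_THRESHOLD = 0.6

def dialog_mode_by_tag(audio_text: str) -> str:
    """Mode 1: check whether the audio text contains the specified information."""
    return "manual" if MANUAL_TAG in audio_text else "auto"

def dialog_mode_by_model(manual_proportion: float) -> str:
    """Mode 2: compare the model's manual-class proportion to the preset
    threshold (e.g. 0.6, as in the text's example)."""
    return "manual" if manual_proportion > MANUAL_THRESHOLD else "auto"
```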
Step 203: and if the conversation mode is a manual conversation, establishing connection between the user terminal and the manual customer service terminal.
Specifically, when step 203 is executed, the following steps may be executed:
s2031: and screening out the idle artificial customer service terminals in the idle state from the artificial customer service terminals.
Specifically, the virtual customer service equipment screens out idle artificial customer service terminals in an idle state from the artificial customer service terminals according to the use states of the artificial customer service terminals.
It should be noted that, if there are a plurality of screened idle artificial customer service terminals in the idle state, one idle artificial customer service terminal is arbitrarily selected from the plurality of screened idle artificial customer service terminals.
In one embodiment, one idle artificial customer service terminal may be selected according to the sorted serial numbers of the plurality of idle artificial customer service terminals.
Therefore, the manual customer service terminals can be screened according to the use states of the manual customer service terminals, the screening range of the manual customer service terminals is narrowed, and the interaction efficiency between the user terminal and the manual customer service terminals is further improved.
S2032: and sending the connection request message sent by the user terminal to the voice server, so that the voice server establishes the connection between the user terminal and the idle artificial customer service terminal based on the connection request message.
Specifically, the virtual customer service device sends the connection request message from the user terminal to the voice server; the voice server obtains the user terminal account and the idle manual customer service terminal account included in the connection request message, so that the user terminal and the idle manual customer service terminal establish a persistent (long) connection through the voice server.
In one embodiment, in the man-machine conversation process, when the conversation mode of the user is determined to be a manual conversation, the virtual customer service device sends a voice conversation client account (namely, a user terminal account) and a voice conversation client account (namely, an idle manual customer service terminal account) of the manual customer service terminal to the voice server, and long connection between the user terminal and the manual customer service terminal is established through the voice server.
Therefore, the real-time interaction between the user and the artificial customer service can be realized by establishing the connection between the user terminal and the idle artificial customer service terminal.
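The account pairing in S2032 can be sketched as follows. All class and field names are hypothetical stand-ins for the voice server and connection request message described above; this is an illustration, not the patented protocol.

```python
# Hypothetical sketch of S2032: the virtual customer-service device forwards
# the user's connection request to the voice server, which pairs the two
# client accounts into a long-lived connection.
class VoiceServer:
    def __init__(self):
        self.connections = {}  # user account -> agent account

    def establish(self, request):
        user = request["user_account"]
        agent = request["agent_account"]
        self.connections[user] = agent  # long connection: user <-> agent
        return True

def transfer_to_agent(voice_server, user_account, idle_agent_account):
    request = {"user_account": user_account, "agent_account": idle_agent_account}
    return voice_server.establish(request)
```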
Step 204: send a dialog response message to the user terminal based on the answer dialog file corresponding to the audio text.
It should be noted that the answer dialog file includes at least one of answer text, answer audio, answer video, and answer image, which is not limited here.
In one implementation, the virtual customer service device sends a request to the server storing the answer audio files to acquire the answer audio corresponding to the answer text, obtains and caches the answer audio based on the response message returned by the server, and sends the answer audio to the user terminal.
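The fetch-and-cache behavior in that implementation can be sketched as follows. The cache class and the fetch callable are hypothetical; the point is only that a given answer text is requested from the audio server once and served locally thereafter.

```python
# Hypothetical sketch: request the answer audio for an answer text from the
# audio server once, cache it locally, and serve later requests from the cache.
class AnswerAudioCache:
    def __init__(self, fetch_from_server):
        self.fetch = fetch_from_server  # callable: answer_text -> audio bytes
        self.cache = {}

    def get_audio(self, answer_text):
        if answer_text not in self.cache:
            self.cache[answer_text] = self.fetch(answer_text)
        return self.cache[answer_text]
```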
Specifically, when step 204 is executed, the following steps may be executed:
S2041: acquire the user service requirement corresponding to the audio text.
It should be noted that the user service requirement is obtained by performing semantic analysis on the audio text.
In one embodiment, the audio text is analyzed to obtain the user's service requirement, and the answer dialog file set for that service requirement is acquired accordingly.
Optionally, the user service requirement may be, for example, an inquiry about data usage or about the subscribed plan type; in actual applications, the user service requirement may be set according to the application scenario, which is not limited here.
In this way, the corresponding answer dialog file can be quickly acquired according to the user's service requirement, improving the user experience.
S2042: acquire the answer dialog file set for the user service requirement.
S2043: send the answer dialog file to the user terminal.
Specifically, the virtual customer service device sends to the user terminal at least one of the answer text, answer audio, answer video, and answer image contained in the answer dialog file.
In one embodiment, the virtual customer service device sends the locally cached answer dialog file to the user terminal.
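Steps S2041-S2043 can be sketched as a lookup from requirement to pre-set answer file. The requirement keys, file contents, and fallback reply below are illustrative placeholders, not values from the application.

```python
# Hypothetical sketch of S2041-S2043: semantic analysis yields a user
# service requirement, which keys into a table of pre-set answer dialog files.
ANSWER_FILES = {
    "query_data_usage": {"text": "Your remaining data allowance is ...", "type": "text"},
    "query_plan_type":  {"text": "Your current plan is ...", "type": "text"},
}

def answer_for_requirement(requirement):
    # Fall back to a generic reply when no answer file is configured.
    return ANSWER_FILES.get(requirement, {"text": "Sorry, please rephrase.", "type": "text"})
```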
Further, when step 204 is executed, the following steps may also be executed:
Step 1: if the answer dialog file is of the answer text type, perform audio conversion on it to obtain the answer audio, and send the answer audio to the user terminal.
Step 2: if the answer dialog file is of the answer audio type, send the answer dialog file to the user terminal directly.
In one embodiment, if the answer dialog file is of the answer text type, the virtual customer service device sends a request to the manual customer service terminal for the recording file corresponding to the answer text, downloads and caches the answer audio according to the recording download address returned by the manual customer service terminal, and sends the answer audio to the user terminal.
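The type dispatch in steps 1-2 can be sketched as follows. The `text_to_speech` callable is a hypothetical hook for whatever audio-conversion service is used; the file layout is illustrative.

```python
# Hypothetical sketch of the type dispatch above: answer text is converted
# to audio before sending; answer audio is forwarded as-is.
def prepare_outgoing_audio(answer_file, text_to_speech):
    if answer_file["type"] == "text":
        return text_to_speech(answer_file["content"])
    if answer_file["type"] == "audio":
        return answer_file["content"]
    raise ValueError("unsupported answer file type: " + answer_file["type"])
```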
In one implementation, a long connection between the user terminal and the manual customer service terminal is established through the voice server. If the long connection is established successfully, the call has been successfully switched to the manual customer service terminal: the virtual customer service device sends a response message indicating the successful switch to the user terminal, and after that message finishes playing, writes a prompt tone (informing the manual customer service terminal to start the conversation with the user terminal) to the sound card. The prompt tone is transmitted to the manual customer service terminal through the voice server, and the manual customer service terminal begins the conversation with the user terminal upon receiving it.
In one embodiment, during the man-machine conversation, the virtual customer service device starts a timer after sending the answer audio to the user terminal. If no audio text corresponding to the user's voice audio is received within the preset time, the virtual customer service device performs an inquiry operation, for example, continuing the dialogue flow and replying to the user terminal with a prompt such as asking whether the user is still on the line.
If the virtual customer service device still receives no audio text after a specified number of inquiries, it performs an on-hook operation.
It should be noted that the preset time and the specified number of times may be set according to the actual application scenario; for example, the preset time may be, but is not limited to, 5 s, and the specified number of times may be 3, which is not limited here.
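The inquiry/hang-up policy above can be sketched as follows. The 3-attempt budget matches the example values in the text; the event-list representation of silent intervals is an assumption made for illustration.

```python
# Hypothetical sketch of the inquiry/hang-up policy: after each silent
# interval the device re-prompts the user, and it hangs up once the
# specified number of attempts is exhausted.
def silence_policy(received_text_events, max_attempts=3):
    """received_text_events: one bool per preset interval, True if audio text arrived."""
    attempts = 0
    for arrived in received_text_events:
        if arrived:
            return "continue"   # user replied; resume the dialogue flow
        attempts += 1
        if attempts >= max_attempts:
            return "hang_up"    # on-hook operation after repeated silence
        # otherwise: send an inquiry prompt and keep waiting
    return "waiting"
```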
Referring to fig. 3, an interactive flowchart of a method for implementing a real-time call during a human-computer conversation according to an embodiment of the present application is shown, where the method includes the following specific implementation flows:
Step 300: the user terminal sends a user voice message to the virtual customer service device.
Step 301: the virtual customer service device acquires the user's voice audio from the user voice message.
Step 302: the virtual customer service device performs audio recognition on the voice audio to obtain the audio text corresponding to the voice audio.
Step 303: the virtual customer service device performs semantic analysis on the audio text to obtain the conversation mode requested by the user.
Step 304: if the virtual customer service device determines that the conversation mode is manual conversation, it sends a message to each manual customer service terminal requesting its usage state.
Step 305: the virtual customer service device screens out the idle manual customer service terminals based on the response messages returned by the manual customer service terminals.
Step 306: the virtual customer service device sends a connection request message to the voice server.
Step 307: the voice server establishes a connection between the user terminal and the idle agent based on the connection request message.
Step 308: the virtual customer service device sends a dialog response message to the user terminal based on the answer dialog file corresponding to the audio text.
Step 309: the user terminal sends a user voice message based on the dialog response message.
Step 310: the manual customer service terminal sends the answer dialog file corresponding to the user voice message.
For the specific implementation of steps 300 to 310, refer to steps 200 to 204; details are not repeated here.
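Steps 300-310 can be condensed into one orchestration function. Every collaborator here is a hypothetical stub passed in by the caller; this sketches the control flow only, not the actual devices or protocols.

```python
# Hypothetical sketch of steps 300-310: the virtual customer-service device
# recognizes the user's audio, analyzes the requested dialogue mode, and
# either replies itself or transfers the call to a manual agent.
def handle_user_voice(voice_audio, asr, analyze_mode, connect_agent, answer_for):
    audio_text = asr(voice_audio)        # steps 301-302: audio recognition
    mode = analyze_mode(audio_text)      # step 303: semantic analysis
    if mode == "manual":
        connect_agent()                  # steps 304-307: transfer to an idle agent
    return answer_for(audio_text)        # step 308: dialog response message
```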
Referring to fig. 4, a schematic structural diagram of an apparatus for implementing real-time conversation during a human-computer conversation according to an embodiment of the present application is shown, including:
an obtaining unit 401, configured to obtain the user's voice audio when it is determined that a voice message sent by the user terminal has been received;
a recognition unit 402, configured to perform audio recognition on the voice audio to obtain the audio text corresponding to the voice audio;
an analysis unit 403, configured to perform semantic analysis on the audio text to obtain the conversation mode requested by the user;
a connection unit 404, configured to establish a connection between the user terminal and the manual customer service terminal if the conversation mode is manual conversation;
a sending unit 405, configured to send a dialog response message to the user terminal based on the answer dialog file corresponding to the audio text.
In one embodiment, the connection unit 404 is configured to:
screening out the idle manual customer service terminals in an idle state from the manual customer service terminals;
and sending the connection request message sent by the user terminal to the voice server, so that the voice server establishes the connection between the user terminal and the idle manual customer service terminal based on the connection request message.
In one embodiment, the analysis unit 403 is configured to:
and if the audio text contains the specified information, determining that the conversation mode requested by the user is manual conversation.
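The specified-information check can be sketched as a keyword match. The phrase list below is purely illustrative; the application does not enumerate the specified information.

```python
# Hypothetical sketch: the dialogue mode is manual when the recognized text
# contains any phrase configured as "specified information".
SPECIFIED_PHRASES = ("human agent", "manual service", "transfer to agent")

def dialogue_mode(audio_text):
    text = audio_text.lower()
    if any(phrase in text for phrase in SPECIFIED_PHRASES):
        return "manual"
    return "machine"
```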
In one embodiment, the answer dialog file includes at least one of answer text, answer audio, answer video, and answer image.
In one embodiment, the analysis unit 403 is configured to:
and inputting the audio text into a decision model to obtain the conversation mode.
In one embodiment, the sending unit 405 is configured to:
acquiring the user service requirement corresponding to the audio text, wherein the user service requirement is obtained by performing semantic analysis on the audio text;
acquiring the answer dialog file set for the user service requirement;
and sending the answer dialog file to the user terminal.
In one embodiment, the sending unit 405 is configured to:
if the answer dialog file is of the answer text type, performing audio conversion on the answer dialog file to obtain the answer audio;
sending the answer audio to the user terminal;
and if the answer dialog file is of the answer audio type, sending the answer dialog file to the user terminal.
In the embodiment of the application, when it is determined that a voice message sent by the user terminal has been received, the user's voice audio is obtained; audio recognition is performed on the voice audio to obtain the corresponding audio text; semantic analysis is performed on the audio text to obtain the conversation mode requested by the user; if the conversation mode is manual conversation, a connection is established between the user terminal and the manual customer service terminal; and a dialog response message is sent to the user terminal based on the answer dialog file corresponding to the audio text. Thus, during conversation switching, if the obtained audio text is determined to contain the user's request for manual conversation, the connection between the user terminal and the manual customer service terminal is established and a dialog response message is sent based on the corresponding answer dialog file, thereby realizing real-time switching to a human agent and further improving switching efficiency.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
The electronic apparatus 5000 includes a processor 5050 and a memory 5020, and may optionally further include a power supply 5030, a display unit 5040, and an input unit 5050.
The processor 5050 is a control center of the electronic apparatus 5000, connects various components using various interfaces and lines, and performs various functions of the electronic apparatus 5000 by running or executing software programs and/or data stored in the memory 5020, thereby performing overall monitoring of the electronic apparatus 5000.
In the embodiment of the present application, when invoking a computer program stored in the memory 5020, the processor 5050 performs the method for implementing a real-time call during a man-machine conversation provided in the embodiment shown in fig. 2.
Optionally, the processor 5050 may include one or more processing units. Preferably, the processor 5050 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It is to be appreciated that the modem processor may also not be integrated into the processor 5050. In some embodiments, the processor and the memory may be implemented on a single chip, or they may be implemented on separate chips.
The memory 5020 may mainly include a program storage area and a data storage area, wherein the program storage area may store the operating system, various applications, and the like, and the data storage area may store data created according to the use of the electronic device 5000. Further, the memory 5020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The electronic device 5000 also includes a power supply 5030 (e.g., a battery) that provides power to the various components and which may be logically connected to the processor 5050 via a power management system to manage charging, discharging, and power consumption via the power management system.
The display unit 5040 may be configured to display information input by a user or information provided to the user, and various menus of the electronic device 5000, and in the embodiment of the present invention, the display unit is mainly configured to display a display interface of each application in the electronic device 5000 and objects such as texts and pictures displayed in the display interface. The display unit 5040 may include a display panel 5041. The Display panel 5041 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The input unit 5050 may be used to receive information such as numbers or characters input by a user. The input unit 5050 may include a touch panel 5051 and other input devices 5052. The touch panel 5051, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations performed by the user on or near the touch panel 5051 using a finger, a stylus, or any other suitable object or attachment).
Specifically, the touch panel 5051 can detect a touch operation by a user, detect signals generated by the touch operation, convert the signals into touch point coordinates, send the touch point coordinates to the processor 5050, and receive a command sent from the processor 5050 and execute the command. In addition, the touch panel 5051 may be implemented in various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. Other input devices 5052 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, power on/off keys, etc.), a trackball, a mouse, a joystick, and the like.
Of course, the touch panel 5051 may overlay the display panel 5041. When the touch panel 5051 detects a touch operation on or near it, it passes the operation to the processor 5050 to determine the type of touch event, and the processor 5050 then provides a corresponding visual output on the display panel 5041 according to the type of touch event. Although in fig. 5 the touch panel 5051 and the display panel 5041 are shown as two separate components implementing the input and output functions of the electronic device 5000, in some embodiments the touch panel 5051 and the display panel 5041 may be integrated to implement both functions.
The electronic device 5000 may also include one or more sensors, such as pressure sensors, gravitational acceleration sensors, proximity light sensors, and the like. Of course, the electronic device 5000 may further include other components such as a camera according to the requirements of a specific application, and these components are not shown in fig. 5 and are not described in detail since they are not components used in this embodiment of the present application.
Those skilled in the art will appreciate that fig. 5 is merely an example of an electronic device and is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or different components.
In an embodiment of the present application, a storage medium has a computer program stored thereon, and the computer program, when executed by a processor, enables a communication device to perform the steps in the above-described embodiments.
For convenience of description, the above parts are separately described as modules (or units) according to functional division. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (16)

1. A method for realizing real-time conversation during man-machine conversation is characterized by comprising the following steps:
when determining that a voice message sent by a user terminal is received, acquiring a user voice audio;
performing audio recognition on the voice audio to obtain an audio text corresponding to the voice audio;
performing semantic analysis on the audio text to obtain a conversation mode requested by a user;
if the conversation mode is a manual conversation, establishing connection between the user terminal and a manual customer service terminal;
and sending a dialogue response message to the user terminal based on the response dialogue file corresponding to the audio text.
2. The method of claim 1, wherein establishing the connection between the user terminal and the manual customer service terminal comprises:
screening out an idle manual customer service terminal in an idle state from the manual customer service terminals;
and sending the connection request message sent by the user terminal to a voice server, so that the voice server establishes the connection between the user terminal and the idle manual customer service terminal based on the connection request message.
3. The method of claim 1, wherein the semantic analyzing the audio text to obtain a dialog style requested by a user comprises:
and if the audio text contains the specified information, determining that the conversation mode requested by the user is manual conversation.
4. The method of any of claims 1-3, wherein the answer dialog file includes at least one of answer text, answer audio, answer video, and answer image.
5. The method of claim 4, wherein the semantic analyzing the audio text to obtain a dialog style requested by a user comprises:
and inputting the audio text into a decision model to obtain the conversation mode.
6. The method of claim 1, wherein sending a dialog response message to the user terminal based on the answer dialog file corresponding to the audio text comprises:
acquiring a user service requirement corresponding to the audio text, wherein the user service requirement is obtained by performing semantic analysis on the audio text;
acquiring an answer dialog file set for the user service requirement;
and sending the answer dialog file to the user terminal.
7. The method according to claim 5 or 6, wherein the sending a dialog response message to the user terminal based on the answer dialog file corresponding to the audio text comprises:
if the type of the answer dialog file is answer text, performing audio conversion on the answer dialog file to obtain answer audio, and sending the answer audio to the user terminal;
and if the type of the answer dialog file is answer audio, sending the answer dialog file to the user terminal.
8. An apparatus for realizing real-time conversation during man-machine conversation, comprising:
the obtaining unit is used for obtaining the voice audio of the user when the voice message sent by the user terminal is determined to be received;
the recognition unit is used for carrying out audio recognition on the voice audio to obtain an audio text corresponding to the voice audio;
the analysis unit is used for carrying out semantic analysis on the audio text to obtain a conversation mode requested by a user;
the connection unit is used for establishing the connection between the user terminal and the manual customer service terminal if the conversation mode is manual conversation;
and the sending unit is used for sending a dialogue response message to the user terminal based on the response dialogue file corresponding to the audio text.
9. The apparatus according to claim 8, wherein the connection unit is specifically configured to:
screening out an idle manual customer service terminal in an idle state from the manual customer service terminals;
and sending the connection request message sent by the user terminal to a voice server, so that the voice server establishes the connection between the user terminal and the idle manual customer service terminal based on the connection request message.
10. The apparatus according to claim 8, wherein the analysis unit is specifically configured to:
and if the audio text contains the specified information, determining that the conversation mode requested by the user is manual conversation.
11. The apparatus of any one of claims 8-10, wherein the answer dialog file includes at least one of answer text, answer audio, answer video, and answer image.
12. The apparatus according to claim 11, wherein the analysis unit is specifically configured to:
and inputting the audio text into a decision model to obtain the conversation mode.
13. The apparatus according to claim 8, wherein the sending unit is specifically configured to:
acquiring a user service requirement corresponding to the audio text, wherein the user service requirement is obtained by performing semantic analysis on the audio text;
acquiring an answer dialog file set for the user service requirement;
and sending the answer dialog file to the user terminal.
14. The apparatus according to claim 12 or 13, wherein the sending unit is specifically configured to:
if the type of the answer dialog file is answer text, performing audio conversion on the answer dialog file to obtain answer audio, and sending the answer audio to the user terminal;
and if the type of the answer dialog file is answer audio, sending the answer dialog file to the user terminal.
15. An electronic device comprising a processor and a memory, the memory storing computer readable instructions that, when executed by the processor, perform the method of any of claims 1-7.
16. A storage medium on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the method according to any one of claims 1-7.
CN202111576051.5A 2021-12-22 2021-12-22 Method and device for realizing real-time conversation during man-machine conversation Pending CN113946674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111576051.5A CN113946674A (en) 2021-12-22 2021-12-22 Method and device for realizing real-time conversation during man-machine conversation


Publications (1)

Publication Number Publication Date
CN113946674A true CN113946674A (en) 2022-01-18

Family

ID=79339218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111576051.5A Pending CN113946674A (en) 2021-12-22 2021-12-22 Method and device for realizing real-time conversation during man-machine conversation

Country Status (1)

Country Link
CN (1) CN113946674A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109688281A (en) * 2018-12-03 2019-04-26 复旦大学 A kind of intelligent sound exchange method and system
CN110035187A (en) * 2019-04-16 2019-07-19 浙江百应科技有限公司 A method of realizing AI and operator attendance seamless switching in the phone
CN110086946A (en) * 2019-03-15 2019-08-02 深圳壹账通智能科技有限公司 Intelligence chat sound control method, device, computer equipment and storage medium
CN110351443A (en) * 2019-06-17 2019-10-18 深圳壹账通智能科技有限公司 Intelligent outgoing call processing method, device, computer equipment and storage medium
CN110738981A (en) * 2019-10-22 2020-01-31 集奥聚合(北京)人工智能科技有限公司 interaction method based on intelligent voice call answering
CN113676602A (en) * 2021-07-23 2021-11-19 上海原圈网络科技有限公司 Method and device for processing manual transfer in automatic response


Similar Documents

Publication Publication Date Title
CN111754985B (en) Training of voice recognition model and voice recognition method and device
CN109514586B (en) Method and system for realizing intelligent customer service robot
CN107609092B (en) Intelligent response method and device
CN110489101A (en) Interface analogy method, system, medium and electronic equipment
US11526681B2 (en) Dynamic multilingual speech recognition
CN113870083A (en) Policy matching method, device and system, electronic equipment and readable storage medium
US20180095807A1 (en) Method and Apparatus for Automatic Processing of Service Requests on an Electronic Device
CN108197105B (en) Natural language processing method, device, storage medium and electronic equipment
CN109271503A (en) Intelligent answer method, apparatus, equipment and storage medium
CN114357278A (en) Topic recommendation method, device and equipment
CN113378583A (en) Dialogue reply method and device, dialogue model training method and device, and storage medium
CN112685578A (en) Multimedia information content providing method and device
CN115022098A (en) Artificial intelligence safety target range content recommendation method, device and storage medium
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
CN109948155B (en) Multi-intention selection method and device and terminal equipment
CN112165627A (en) Information processing method, device, storage medium, terminal and system
CN113946674A (en) Method and device for realizing real-time conversation during man-machine conversation
CN115328303A (en) User interaction method and device, electronic equipment and computer-readable storage medium
CN114547242A (en) Questionnaire investigation method and device, electronic equipment and readable storage medium
CN114255751A (en) Audio information extraction method and device, electronic equipment and readable storage medium
CN110472113B (en) Intelligent interaction engine optimization method, device and equipment
CN113922998A (en) Vulnerability risk assessment method and device, electronic equipment and readable storage medium
CN113413590A (en) Information verification method and device, computer equipment and storage medium
CN111276144A (en) Platform matching method, device, equipment and medium
CN115757746A (en) Session interruption method and device, electronic equipment and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220118