CN111128147A

CN111128147A - System and method for terminal equipment to automatically access AI multi-turn conversation capability

Info

Publication number: CN111128147A
Application number: CN201911129150.1A
Authority: CN
Inventors: 李旭滨
Original assignee: Unisound Intelligent Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Priority date: 2019-11-18
Filing date: 2019-11-18
Publication date: 2020-05-08

Abstract

The invention provides a system for terminal equipment to automatically access an AI multi-turn conversation capability. The system comprises a visual process configuration module, a device end and an IVN cloud server. The visual process configuration module is used for acquiring a customer's dialogue script flow and sending it to the IVN cloud server. The device end is used for acquiring the speaking intention of a user and transmitting it to the IVN cloud server as a voice stream. The IVN cloud server is used for receiving the customer's dialogue script flow and the speaking intention of the user, and for controlling the terminal equipment to automatically access the system with AI multi-turn conversation capability according to both, so as to realize multi-turn conversation between the user and the device end. With this scheme, professionals do not need to develop code regularly, the implementation is simple, and time and labor are saved.

Description

System and method for terminal equipment to automatically access AI multi-turn conversation capability

Technical Field

The invention relates to the technical field of internet, in particular to a system and a method for automatically accessing AI multi-turn conversation capability by terminal equipment.

Background

Artificial Intelligence, abbreviated as AI, is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Research in the field includes robotics, speech recognition, image recognition, natural language processing, expert systems, and the like. IVN (Interactive Voice Navigation) refers to an intelligent voice navigation system.

At present, a wide variety of terminal hardware is on the market. For such hardware to support AI multi-turn conversations, professionals with specialized knowledge of language and semantics must regularly develop large amounts of code; the logic is complex, and the work is time-consuming and labor-intensive.

Disclosure of Invention

The invention provides a system for terminal equipment to automatically access AI multi-turn conversation capability. The system carries out multi-turn conversation by acquiring the customer's dialogue script flow and the speaking intention of the user, does not require professionals to develop code regularly, and is simple to realize, time-saving and labor-saving.

The invention provides a system for terminal equipment to automatically access AI multi-turn conversation capability, the system comprising: a visual process configuration module, a device end and an IVN cloud server;

the visual process configuration module is used for acquiring a customer's dialogue script flow and sending the customer's dialogue script flow to the IVN cloud server;

the device end is used for acquiring the speaking intention of a user and transmitting the speaking intention of the user to the IVN cloud server as a voice stream;

the IVN cloud server is used for receiving the customer's dialogue script flow and the speaking intention of the user, and for controlling the terminal equipment to automatically access the system with AI multi-turn conversation capability according to the customer's dialogue script flow and the speaking intention of the user, so as to realize multi-turn conversation between the user and the device end.

Preferably, the visual process configuration module includes:

the IVN project visualization sub-module is used for configuring the customer's dialogue script flow through a script flow editing interface, and for configuring and modifying the script text, audio and intention of each script node in the customer's dialogue script flow; the IVN project visualization sub-module is also used for providing a management back end for customer operations.

Preferably, the IVN cloud server includes:

the voice interaction interface service module is used for receiving the speaking intention of the user transmitted by the device end; the voice interaction interface service module is further used for sending, to the device end, the response audio generated by the IVN cloud server for the speaking intention of the user;

and the data interface service module is used for receiving the customer's dialogue script flow sent by the visual process configuration module.

Preferably, the IVN cloud server includes:

the voice recognition service module is used for performing voice recognition on the speaking intention of the user and for performing intelligent sentence segmentation on the speaking intention of the user;

the semantic understanding service module is used for understanding the meaning of the speaking intention of the user;

the voice synthesis service is used for performing audio synthesis on the response of the IVN cloud server to the speaking intention of the user; the voice synthesis service is also used for dynamic parametric audio synthesis.

Preferably, the IVN cloud server includes:

the data center service module is used for providing big data services, analyzing the logs of direct interaction between the user and the device, and screening out the intention of the user;

and the business service module is used for performing a unified jump when a user abnormality occurs.

Preferably, the voice interaction interface service module, the data interface service module, the voice recognition service module, the semantic understanding service module, the voice synthesis module, the data center service module and the business service module, which are included in the IVN cloud server, can all perform horizontal cluster expansion and vertical expansion.

The system for terminal equipment to automatically access AI multi-turn conversation capability provided by this embodiment has the following beneficial effects: multi-turn conversation is carried out by acquiring the customer's dialogue script flow and the speaking intention of the user, and professionals do not need to develop code regularly, so the system is easy to implement and saves time and labor.

The invention also provides a method for automatically accessing the AI multi-turn conversation capability by the terminal equipment, which is characterized by comprising the following steps:

acquiring a customer's dialogue script flow, and sending the customer's dialogue script flow to an IVN cloud server;

acquiring the speaking intention of a user, and transmitting the speaking intention of the user to the IVN cloud server as a voice stream;

and receiving the customer's dialogue script flow and the speaking intention of the user, and controlling the terminal equipment to automatically access the system with AI multi-turn conversation capability according to the customer's dialogue script flow and the speaking intention of the user, so as to realize multi-turn conversation between the user and the device end.

Preferably, in acquiring the customer's dialogue script flow, the customer's dialogue script flow is configured through a script flow editing interface, and the script text, audio and intention of each script node in the customer's dialogue script flow are configured and modified.

Preferably, controlling the terminal equipment to automatically access the system with AI multi-turn conversation capability according to the customer's dialogue script flow and the speaking intention of the user, so as to realize multi-turn conversation between the user and the device end, includes:

performing voice recognition on the speaking intention of the user;

performing intelligent sentence segmentation on the speaking intention of the user;

understanding the meaning in the speaking intention of the user.

Preferably, controlling the terminal equipment to automatically access the system with AI multi-turn conversation capability according to the customer's dialogue script flow and the speaking intention of the user, so as to realize multi-turn conversation between the user and the device end, further includes:

generating a response to the speaking intention of the user according to the meaning in the speaking intention of the user and the customer's dialogue script flow;

and performing audio synthesis on the response, and sending the synthesized audio to the device end.

The method for terminal equipment to automatically access AI multi-turn conversation capability provided by this embodiment has the following beneficial effects: multi-turn conversation is carried out by acquiring the customer's dialogue script flow and the speaking intention of the user, and professionals do not need to develop code regularly, so the method is easy to implement and saves time and labor.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 is a block diagram of a system for terminal equipment to automatically access AI multi-turn conversation capability according to an embodiment of the present invention;

Fig. 2 is a block diagram illustrating an example of a system for terminal equipment to automatically access AI multi-turn conversation capability according to an embodiment of the present invention;

Fig. 3 is a flowchart of a method for terminal equipment to automatically access AI multi-turn conversation capability according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

Fig. 1 shows a system for terminal equipment to automatically access AI multi-turn conversation capability according to an embodiment of the present invention. As shown in Fig. 1, the system includes: a visual process configuration module 11, a device end 12 and an IVN cloud server 13;

the visual process configuration module 11 is configured to acquire a customer's dialogue script flow and send the customer's dialogue script flow to the IVN cloud server 13;

the device end 12 is configured to acquire the speaking intention of a user and transmit the speaking intention of the user to the IVN cloud server 13 as a voice stream;

the IVN cloud server 13 is configured to receive the customer's dialogue script flow and the speaking intention of the user, and to control the terminal equipment to automatically access the system with AI multi-turn conversation capability according to the customer's dialogue script flow and the speaking intention of the user, so as to realize multi-turn conversation between the user and the device end 12.

The working principle of this embodiment is as follows: the visual process configuration module 11 acquires the customer's dialogue script flow and sends it to the IVN cloud server 13. The device end 12, that is, the terminal, acquires the speaking intention of the user, where the user is the person using the device end, and transmits it to the IVN cloud server 13 as a voice stream. The IVN cloud server 13 receives the customer's dialogue script flow sent by the visual process configuration module 11 and the speaking intention of the user sent by the device end 12, and then realizes multi-turn conversation between the user and the device end 12 according to both. The IVN cloud server 13 can also control the business process according to the customer's dialogue script flow and the speaking intention of the user.
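The data paths described in this working principle can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the class names, the method names and the flat utterance-to-reply dictionary are all hypothetical, and the voice stream is simulated by chunks of UTF-8 text rather than audio.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IVNCloudServer:
    """Hypothetical stand-in for the IVN cloud server 13."""

    script_flow: Dict[str, str] = field(default_factory=dict)

    def receive_script_flow(self, flow: Dict[str, str]) -> None:
        # Input from the visual process configuration module 11.
        self.script_flow = dict(flow)

    def receive_voice_stream(self, chunks: List[bytes]) -> str:
        # Assumption: the "voice stream" is UTF-8 text split into chunks.
        # A real server would run speech recognition on audio here.
        utterance = b"".join(chunks).decode("utf-8")
        # Look up the reply configured for this utterance, else the default.
        return self.script_flow.get(utterance, self.script_flow.get("*", ""))


class VisualProcessConfigModule:
    """Hypothetical stand-in for the visual process configuration module 11."""

    def publish(self, server: IVNCloudServer, flow: Dict[str, str]) -> None:
        server.receive_script_flow(flow)


class DeviceEnd:
    """Hypothetical stand-in for the device end 12."""

    def speak(self, server: IVNCloudServer, text: str, chunk_size: int = 4) -> str:
        data = text.encode("utf-8")
        # Send the utterance as a sequence of small chunks (the "stream").
        chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        return server.receive_voice_stream(chunks)


server = IVNCloudServer()
VisualProcessConfigModule().publish(
    server, {"hello": "Hi, how can I help?", "*": "Sorry, please repeat."}
)
device = DeviceEnd()
print(device.speak(server, "hello"))    # reply configured for "hello"
print(device.speak(server, "weather"))  # falls back to the "*" entry
```

The point of the sketch is only the separation of roles: the configuration side pushes the flow once, the device side streams each utterance, and the server combines the two to choose a reply.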

The beneficial effects of this embodiment are as follows: multi-turn conversation is carried out by acquiring the customer's dialogue script flow and the speaking intention of the user, and professionals do not need to develop code regularly, so the system is easy to implement and saves time and labor.

In one embodiment, the visual process configuration module 11 includes:

the IVN project visualization sub-module 111 is used for configuring the customer's dialogue script flow through a script flow editing interface, and for configuring and modifying the script text, audio and intention of each script node in the customer's dialogue script flow; the IVN project visualization sub-module is also used for providing a management back end for customer operations.

The working principle of this embodiment is as follows: the customer's dialogue script flow is configured through the script flow editing interface.

The beneficial effects of this embodiment are as follows: the script text, audio and intention of each script node in the customer's dialogue script flow are configured and modified through the IVN project visualization sub-module, which greatly improves the efficiency of flow development and improves the visual presentation of the customer's business process; in addition, through the management back end for customer operations, the customer can conveniently query the conversation logs between the user and the device end in real time.
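One possible in-memory representation of such a configurable flow is sketched below in Python. The source does not specify any data format, so the `ScriptNode` and `ScriptFlow` names, their fields, and the idea of mapping each intention to the next node are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ScriptNode:
    """One script node: its text, optional audio, and intent transitions."""

    node_id: str
    text: str
    audio: Optional[str] = None  # assumed: path of a pre-recorded prompt
    transitions: Dict[str, str] = field(default_factory=dict)  # intent -> node id


@dataclass
class ScriptFlow:
    """A customer's flow as an editing interface might store it (hypothetical)."""

    start: str
    nodes: Dict[str, ScriptNode] = field(default_factory=dict)

    def add_node(self, node: ScriptNode) -> None:
        self.nodes[node.node_id] = node

    def update_node(self, node_id: str, **changes: object) -> None:
        # Modify the text, audio or transitions of an existing node.
        node = self.nodes[node_id]
        for name, value in changes.items():
            if not hasattr(node, name):
                raise AttributeError(f"unknown node field: {name}")
            setattr(node, name, value)

    def next_node(self, node_id: str, intent: str) -> Optional[ScriptNode]:
        # Follow the transition configured for this intent, if any.
        target = self.nodes[node_id].transitions.get(intent)
        return self.nodes.get(target) if target is not None else None


flow = ScriptFlow(start="greet")
flow.add_node(ScriptNode("greet", "Hello, may I have a minute?", transitions={"yes": "offer"}))
flow.add_node(ScriptNode("offer", "We have a new plan for you."))
flow.update_node("offer", text="We have a new data plan for you.", audio="offer_v2.wav")
```

Keeping the text, audio and intention of each node as plain data is what lets a customer change the conversation without touching code, which is the benefit this embodiment claims.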

In one embodiment, the IVN cloud server 13 includes:

the voice interaction interface service module 131 is used for receiving the speaking intention of the user transmitted by the device end; the voice interaction interface service module is further used for sending, to the device end, the response audio generated by the IVN cloud server for the speaking intention of the user;

the data interface service module 132 is used for receiving the customer's dialogue script flow sent by the visual process configuration module.

It should be noted that the IVN cloud server provides a unified input interface for customers, namely the data interface service module.

The working principle of this embodiment is as follows: the speaking intention of the user is received, the response audio for the speaking intention of the user is sent, and the customer's dialogue script flow is received.

The beneficial effect of this embodiment is that convenient customer access is ensured.

In one embodiment, the IVN cloud server 13 includes:

the voice recognition service module 133 is used for performing voice recognition on the speaking intention of the user and for performing intelligent sentence segmentation on the speaking intention of the user;

the semantic understanding service module 134 is used for understanding the meaning in the speaking intention of the user;

the voice synthesis service 135 is used for performing audio synthesis on the response generated by the IVN cloud server for the speaking intention of the user; the voice synthesis service is also used for dynamic parametric audio synthesis.

It should be noted that the response generated by the IVN cloud server for the speaking intention of the user may also be synthesized as text.

The working principle of this embodiment is as follows: voice recognition, intelligent sentence segmentation and semantic understanding are performed on the speaking intention of the user, and voice synthesis is performed on the response generated for the speaking intention of the user.

The beneficial effect of this embodiment is that the user can obtain a response corresponding to his or her speaking intention.
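The order of these processing stages can be sketched in Python as follows. Every stage is a stub and an assumption: recognition simply decodes text, sentence segmentation splits on punctuation, semantic understanding is a keyword lookup, and "dynamic parametric" synthesis is modeled as filling a template and returning bytes. None of this reflects the actual services of the IVN cloud server.

```python
import re
from typing import Dict, List

# Hypothetical keyword-to-intent table used by the stub understanding stage.
KEYWORDS: Dict[str, str] = {"balance": "query_balance", "bye": "goodbye"}


def recognize(voice_stream: List[bytes]) -> str:
    # Stub for voice recognition: the "audio" is assumed to be UTF-8 text.
    return b"".join(voice_stream).decode("utf-8")


def segment(transcript: str) -> List[str]:
    # Stub for sentence segmentation: split on sentence-ending punctuation.
    parts = re.split(r"[.?!;]+", transcript)
    return [part.strip() for part in parts if part.strip()]


def understand(sentence: str) -> str:
    # Stub for semantic understanding: first matching keyword wins.
    lowered = sentence.lower()
    for keyword, intent in KEYWORDS.items():
        if keyword in lowered:
            return intent
    return "unknown"


def synthesize(template: str, **params: object) -> bytes:
    # Stub for dynamic parametric synthesis: fill the template, return "audio".
    return template.format(**params).encode("utf-8")


stream = [b"What is my bal", b"ance? Bye."]
sentences = segment(recognize(stream))
intents = [understand(sentence) for sentence in sentences]
reply = synthesize("Hello {name}, your balance is {amount}.", name="Ann", amount=42)
```

Here `sentences` holds the two segmented utterances and `intents` the intention found for each, which is the input a dialogue manager would need to pick the next script node.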

In one embodiment, the IVN cloud server 13 includes:

the data center service module 136 is used for providing big data services, analyzing the logs of direct interaction between the user and the device, and screening out the intention of the user;

the business service module 137 is used for performing a unified jump when a user abnormality occurs.

The working principle of this embodiment is as follows: the logs of direct interaction between the user and the device are analyzed, and the intention of the user is screened out.

The beneficial effect of this embodiment is that, by screening the intentions of users, customers can be helped to accurately locate their target groups.
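A minimal Python sketch of this kind of log screening is given below. The log record shape (a `user` and an `intent` field) and both function names are hypothetical; the source does not describe the log format or the analysis method.

```python
from collections import Counter
from typing import Dict, Iterable, List

LogRecord = Dict[str, str]  # assumed shape: {"user": ..., "intent": ...}


def intent_counts(logs: Iterable[LogRecord]) -> Counter:
    # How often each intention appears across all interaction logs.
    return Counter(record["intent"] for record in logs)


def users_with_intent(logs: Iterable[LogRecord], intent: str) -> List[str]:
    # The distinct users who expressed the given intention, sorted by id.
    return sorted({record["user"] for record in logs if record["intent"] == intent})


LOGS: List[LogRecord] = [
    {"user": "u1", "intent": "buy"},
    {"user": "u2", "intent": "buy"},
    {"user": "u3", "intent": "refuse"},
    {"user": "u1", "intent": "buy"},
]

counts = intent_counts(LOGS)
target_group = users_with_intent(LOGS, "buy")
```

In this toy data set, `target_group` is the list of users a customer might follow up with, which corresponds to the target group mentioned above.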

In an embodiment, the voice interaction interface service module, the data interface service module, the voice recognition service module, the semantic understanding service module, the voice synthesis module, the data center service module, and the business service module included in the IVN cloud server may all perform horizontal cluster expansion and vertical expansion.

The working principle of this embodiment is as follows: high availability and load balancing are achieved through multi-instance deployment, thereby realizing horizontal cluster expansion.

The beneficial effect of this embodiment is that performing horizontal cluster expansion and vertical expansion on each module ensures that the system can be extended to meet customer requirements, for example by integrating additional modules such as a face recognition service or a speech processing service.
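Multi-instance deployment with load distribution can be illustrated by a simple round-robin dispatcher in Python. Round-robin is a technique chosen here for illustration only; the source does not say which load balancing strategy is used, and the `ServicePool` class and instance names are hypothetical.

```python
from typing import Callable, List

Handler = Callable[[str], str]


class ServicePool:
    """Round-robin dispatch over several instances of one service."""

    def __init__(self, instances: List[Handler]) -> None:
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._next = 0

    def add_instance(self, instance: Handler) -> None:
        # Horizontal expansion: add one more instance to the cluster.
        self._instances.append(instance)

    def dispatch(self, request: str) -> str:
        # Pick the next instance in turn and let it handle the request.
        instance = self._instances[self._next % len(self._instances)]
        self._next += 1
        return instance(request)


def make_instance(name: str) -> Handler:
    # Each fake instance tags the request with its own name.
    return lambda request: f"{name}:{request}"


pool = ServicePool([make_instance("asr-1"), make_instance("asr-2")])
first_round = [pool.dispatch("x") for _ in range(3)]
pool.add_instance(make_instance("asr-3"))
second_round = [pool.dispatch("x") for _ in range(3)]
```

After `add_instance`, requests are spread over three instances instead of two without any change to the callers, which is the property horizontal cluster expansion relies on.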

In one embodiment, a system for terminal equipment to automatically access AI multi-turn conversation capability includes: a visual process configuration module, a device end and an IVN cloud server;

the visual process configuration module comprises an IVN project visualization sub-module;

the IVN cloud server comprises a voice interaction interface service module, a data interface service module, a voice recognition service module, a semantic understanding service module, a voice synthesis module, a data center service module and a business service module.

For example, as shown in Fig. 2, the device end is a terminal device, which may be an intelligent robot, an intelligent speaker, or the like. The terminal device transmits the speaking intention of the user as a voice stream to the IVN cloud server, namely the IVN service mechanism, and the IVN cloud server sends the response audio or text generated for the speaking intention of the user to the terminal device. The visual process configuration module, namely the visual process configuration system, sends the customer's dialogue script flow to the IVN cloud server. The voice interaction interface service module, the data interface service module, the voice recognition service module, the semantic understanding service module, the voice synthesis module, the data center service module and the business service module respectively correspond to the voice interaction interface service, the data interface service, the voice recognition service, the semantic understanding service, the voice synthesis, the data center service and the business service in the IVN service architecture.

Fig. 3 shows a method for terminal equipment to automatically access AI multi-turn conversation capability according to an embodiment of the present invention. As shown in Fig. 3, the method may be implemented as steps S31 to S33:

in step S31, a customer's dialogue script flow is acquired and sent to the IVN cloud server;

in step S32, the speaking intention of the user is acquired and transmitted to the IVN cloud server as a voice stream;

in step S33, the customer's dialogue script flow and the speaking intention of the user are received, and the terminal equipment is controlled to automatically access the system with AI multi-turn conversation capability according to the customer's dialogue script flow and the speaking intention of the user, so as to realize multi-turn conversation between the user and the device end.

The working principle of this embodiment is as follows: the customer's dialogue script flow is acquired and sent to the IVN cloud server. The speaking intention of the user, where the user is the person using the device end, is acquired and sent to the IVN cloud server. The customer's dialogue script flow and the speaking intention of the user are received, multi-turn conversation between the user and the device end is then realized according to both, and the business process is controlled according to the customer's dialogue script flow and the speaking intention of the user.
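Steps S31 to S33 can be sketched as a small dialogue loop in Python. The flow content, the `run_dialogue` function and the fallback behavior are all hypothetical: the flow is a plain dictionary, the user's intentions are assumed to have been recognized already, and the fallback reply is only a loose illustration of handling input the flow does not cover.

```python
from typing import Dict, List

# S31: a flow as a customer might configure it (all content hypothetical).
FLOW: Dict[str, dict] = {
    "greet": {"text": "Hello, are you interested in our offer?",
              "next": {"yes": "details", "no": "goodbye"}},
    "details": {"text": "Great, shall I send the details?",
                "next": {"yes": "goodbye", "no": "goodbye"}},
    "goodbye": {"text": "Thank you, goodbye.", "next": {}},
    "fallback": {"text": "Sorry, I did not catch that.", "next": {}},
}


def run_dialogue(flow: Dict[str, dict], intents: List[str], start: str = "greet") -> List[str]:
    """Walk the flow over a sequence of user intentions and collect replies."""
    current = start
    replies = [flow[current]["text"]]
    for intent in intents:  # S32: one recognized intention per user turn
        target = flow[current]["next"].get(intent)
        if target is None:
            # Intention not covered by the flow: reply with the fallback
            # text and stay on the current node.
            replies.append(flow["fallback"]["text"])
            continue
        current = target  # S33: move on according to flow and intention
        replies.append(flow[current]["text"])
        if not flow[current]["next"]:
            break  # terminal node reached, the conversation ends
    return replies


replies = run_dialogue(FLOW, ["maybe", "yes", "yes"])
```

The conversation state is just the identifier of the current node, so each new turn only needs the recognized intention and the configured flow to produce the next reply.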

In one embodiment, in step S31, the customer's dialogue script flow is configured through the script flow editing interface, and the script text, audio and intention of each script node in the customer's dialogue script flow are configured and modified.

The working principle of this embodiment is as follows: the customer can configure the dialogue script flow through the script flow editing interface, and can also configure and modify the script text, audio and intention of each script node in the flow.

The beneficial effect of this embodiment is that the visual presentation of the customer's business process is improved.

In one embodiment, the step S33 includes:

performing voice recognition on the speaking intention of the user;

understanding the meaning in the user's speaking intent.

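In an embodiment, the step S33 further includes:

generating a response to the speaking intention of the user according to the meaning in the speaking intention of the user and the customer's dialogue script flow;

performing audio synthesis on the response, and sending the synthesized audio to the device end.

The working principle of this embodiment is as follows: a response to the speaking intention of the user is generated according to the meaning in the speaking intention of the user and the customer's dialogue script flow, audio is synthesized from the response, and the audio is sent to the device end.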

The beneficial effect of this embodiment lies in: the speaking intention of the user can be satisfied.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A system for terminal equipment to automatically access AI multi-turn conversation capability, characterized by comprising: a visual process configuration module, a device end and an IVN cloud server;

2. The system of claim 1, wherein the visualization process configuration module comprises:

3. The system of claim 1, wherein the IVN cloud server comprises:

4. The system of claim 1, wherein the IVN cloud server comprises:

5. The system of claim 1, wherein the IVN cloud server comprises:

6. The system of any one of claims 3 to 5, wherein the IVN cloud server comprises a voice interaction interface service module, a data interface service module, a voice recognition service module, a semantic understanding service module, a voice synthesis module, a data center service module and a business service module, which are all capable of performing horizontal cluster expansion and vertical expansion.

7. A method for automatically accessing AI multi-turn conversation capability by a terminal device is characterized by comprising the following steps:

8. The method of claim 6, wherein in acquiring the customer's dialogue script flow, the customer's dialogue script flow is configured through a script flow editing interface, and the script text, audio and intention of each script node in the customer's dialogue script flow are configured and modified.

9. The method of claim 6, wherein controlling the terminal equipment to automatically access the system with AI multi-turn conversation capability according to the customer's dialogue script flow and the speaking intention of the user, so as to realize multi-turn conversation between the user and the device end, comprises:

performing voice recognition on the speaking intention of the user;

understanding the meaning in the user's speaking intent.

10. The method of claim 9, wherein controlling the terminal equipment to automatically access the system with AI multi-turn conversation capability according to the customer's dialogue script flow and the speaking intention of the user, so as to realize multi-turn conversation between the user and the device end, further comprises: