WO2020129419A1 - Voice inquiry processing system and method, smart speaker operation server device, and program

Voice inquiry processing system and method, smart speaker operation server device, and program

Info

Publication number
WO2020129419A1
Authority
WO
WIPO (PCT)
Prior art keywords
chatbot
server device
text
answer
question
Prior art date
Application number
PCT/JP2019/042493
Other languages
English (en)
Japanese (ja)
Inventor
敏秀 金
Original Assignee
Jeインターナショナル株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jeインターナショナル株式会社
Publication of WO2020129419A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/02 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/06 Message adaptation to terminal or network requirements
    • H04L51/066 Format adaptation, e.g. format conversion or compression

Definitions

  • The present invention relates to a voice inquiry system, a voice inquiry processing method, a smart speaker operation server device, a chatbot portal server device, and a program.
  • The present application claims priority based on Japanese Patent Application No. 2018-237446 filed in Japan on December 19, 2018, the contents of which are incorporated herein by reference.
  • Smart speakers are also called "AI speakers".
  • AI stands for artificial intelligence.
  • The smart speaker acquires a human voice, outputs a response to the acquired voice as a voice, and performs an operation corresponding to the acquired voice (for example, control of an external electric/electronic device).
  • Non-Patent Document 1 outlines the technology of a smart speaker.
  • AI chatbots are currently in practical use.
  • The AI chatbot uses artificial intelligence or the like to output text data that is an appropriate response to input text data.
  • The AI chatbot has, for example, a wealth of information about products and services provided by companies and stores, and is widely used for the purpose of answering questions from customers of those companies and stores.
  • Companies and stores have succeeded in providing useful, high-quality information to customers at a low cost.
  • Patent Literature 1 describes a chatbot server device that learns a relationship between an input text and a response text by machine learning processing and generates a response text suitable for the input text based on knowledge data obtained as a learning result.
  • However, there are limits to the range of questions that the conventional smart speaker can support. It is desirable that the smart speaker can deal with specific and detailed matters in various fields, but it is not easy to accumulate detailed knowledge in a specific field in the current backbone system that supports the smart speaker. Moreover, it is very difficult to prepare learning data for machine learning and to perform learning processing on a huge amount of learning data with limited computing resources.
  • AI chatbots have been introduced by various companies, and knowledge is accumulating.
  • The conventional smart speaker technology has no mechanism for utilizing the knowledge accumulated in these AI chatbots.
  • The present invention has been made in recognition of the above problem, and intends to provide a technique by which a smart speaker can answer a question by using the knowledge already stored in AI chatbot server devices.
  • The present invention is intended to provide a voice inquiry system, a voice inquiry processing method, a smart speaker operation server device, a chatbot portal server device, and a program configured using such a technique.
  • A voice inquiry system according to one aspect is as follows. It comprises a terminal device having a function of inputting/outputting voice, a smart speaker operation server device that executes a function for operating the terminal device, a chatbot server device that outputs an answer to a question passed from the terminal device, and a chatbot portal server device that performs processing for identifying the chatbot server device that matches the question.
  • The terminal device comprises: a voice input unit that acquires a question voice, which is a question posed by voice; a voice transmitting unit that transmits the question voice to the smart speaker operation server device; a unit that receives an answer voice corresponding to the question voice from the smart speaker operation server device; and a voice output unit that outputs the answer voice as sound.
  • The smart speaker operation server device comprises: a voice recognition unit that converts the question voice transmitted from the terminal device into a question text; a chatbot portal server transfer unit that transmits the question text to the chatbot portal server device; an answer receiving unit that receives an answer text corresponding to the question text sent to the chatbot portal server device; and a voice synthesizing unit that converts the answer text received by the answer receiving unit into an answer voice.
  • The chatbot portal server device comprises: a chatbot specifying data management unit that holds, for a plurality of chatbot server devices, data representing the characteristics of each chatbot server device; and a chatbot specifying unit that, based on the question text sent from the smart speaker operation server device, identifies the chatbot server device that matches the question text from the characteristics of each chatbot server device held by the chatbot specifying data management unit and the characteristics of the question text, and outputs chatbot information including location information indicating the location of the identified chatbot server device.
  • The chatbot server device comprises: a model that has been machine-learned on the relationship between the question text and the answer text; and an answer inference unit that infers and outputs the answer text corresponding to the question text from the question text received from the outside and the model.
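The flow of data through the units described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the `ChatbotInfo` class, the example URL, and the toy stand-ins for voice recognition, chatbot identification, answer inference, and voice synthesis are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChatbotInfo:
    """Chatbot information output by the chatbot specifying unit (hypothetical shape)."""
    name: str
    location: str  # location information, e.g. a URL (example value only)


def handle_question_voice(
    question_voice: bytes,
    recognize: Callable[[bytes], str],                # voice recognition unit
    identify: Callable[[str], ChatbotInfo],           # chatbot specifying unit
    infer_answer: Callable[[ChatbotInfo, str], str],  # answer inference unit
    synthesize: Callable[[str], bytes],               # voice synthesizing unit
) -> bytes:
    """Question voice -> question text -> answer text -> answer voice."""
    question_text = recognize(question_voice)
    chatbot = identify(question_text)
    answer_text = infer_answer(chatbot, question_text)
    return synthesize(answer_text)


# Toy stand-ins so the sketch runs without any speech or ML library.
def fake_recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def fake_identify(question_text: str) -> ChatbotInfo:
    return ChatbotInfo(name="museum-bot", location="https://chatbot.example/museum")


def fake_infer_answer(chatbot: ChatbotInfo, question_text: str) -> str:
    return f"[{chatbot.name}] answer to: {question_text}"


def fake_synthesize(answer_text: str) -> bytes:
    return answer_text.encode("utf-8")


answer_voice = handle_question_voice(
    b"When does the museum open?",
    fake_recognize,
    fake_identify,
    fake_infer_answer,
    fake_synthesize,
)
```

Passing each unit in as a callable keeps the sketch independent of where the unit runs; in the system described, the units are spread across the terminal device and the three kinds of server device.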
  • In one aspect, the chatbot portal server device further comprises a question transmitting unit that transmits the question text received from the smart speaker operation server device to the chatbot server device specified by the chatbot specifying unit.
  • In another aspect, the chatbot portal server device further comprises a chatbot information transmitting unit that transmits the chatbot information output by the chatbot specifying unit to the smart speaker operation server device, and the smart speaker operation server device further comprises a chatbot server transfer unit that transmits the question text output from the voice recognition unit to the specified chatbot server device, based on the location information of that chatbot server device included in the chatbot information transmitted from the chatbot information transmitting unit.
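The two routing variants just described (the portal forwards the question itself, or the portal returns chatbot information and the smart speaker operation server device contacts the chatbot directly) can be contrasted in a short sketch. Everything here is an assumption for illustration: the in-memory `CHATBOT_SERVERS` table stands in for network calls, the URLs are made up, the keyword rule in `specify_chatbot` is a toy, and the text does not state how the answer travels back in the first variant, so the sketch simply returns it to the caller.

```python
from typing import Callable, Dict

# Hypothetical chatbot server devices, keyed by their location information.
CHATBOT_SERVERS: Dict[str, Callable[[str], str]] = {
    "https://chatbot.example/hotel": lambda question: "hotel-bot: " + question,
    "https://chatbot.example/bank": lambda question: "bank-bot: " + question,
}


def specify_chatbot(question_text: str) -> Dict[str, str]:
    """Chatbot specifying unit (toy rule): return chatbot information."""
    key = "hotel" if "hotel" in question_text.lower() else "bank"
    return {"location": f"https://chatbot.example/{key}"}


def portal_forward(question_text: str) -> str:
    """Variant 1: the portal's question transmitting unit sends the question on."""
    info = specify_chatbot(question_text)
    return CHATBOT_SERVERS[info["location"]](question_text)


def portal_return_info(question_text: str) -> Dict[str, str]:
    """Variant 2, portal side: the chatbot information transmitting unit replies."""
    return specify_chatbot(question_text)


def operation_server_direct(question_text: str) -> str:
    """Variant 2, operation server side: the chatbot server transfer unit
    uses the returned location information to contact the chatbot itself."""
    info = portal_return_info(question_text)
    return CHATBOT_SERVERS[info["location"]](question_text)


question = "Does the hotel have parking?"
via_portal = portal_forward(question)
direct = operation_server_direct(question)
```

Both variants reach the same chatbot; they differ only in which device holds the connection to it.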
  • The smart speaker operation server device may further comprise a second model that has been machine-learned on the relationship between the question text and the answer text, and a second answer inference unit that infers and outputs a second answer text corresponding to the question text from the question text output from the voice recognition unit and the second model. If the question text is not transferred to the chatbot server device, the voice synthesizing unit converts the second answer text into the answer voice instead of the answer text.
  • When outputting the second answer text, the second answer inference unit also outputs the goodness of fit of the second answer text. The chatbot portal server transfer unit transmits the question text to the chatbot portal server device, in order to identify the chatbot server device, only when the goodness of fit output by the second answer inference unit is less than a predetermined threshold; when the goodness of fit is equal to or greater than the threshold, it suppresses transmission of the question text to the chatbot portal server device.
  • The smart speaker operation server device may further comprise a chatbot portal server transfer determination unit that compares the goodness of fit of the second answer text with the predetermined threshold, determines that the question text is to be transmitted to the chatbot portal server device if the goodness of fit is less than the threshold, and determines that the question text is not to be transmitted if the goodness of fit is equal to or greater than the threshold. The chatbot portal server transfer unit transmits the question text to the chatbot portal server device when the chatbot portal server transfer determination unit determines to transmit it, and suppresses transmission when the determination unit determines not to transmit it.
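The threshold rule above can be written down in a few lines. This is a sketch under assumptions: the function names, the 0.7 threshold, and the toy inference function are hypothetical; only the comparison itself (transfer when the goodness of fit is less than the threshold, suppress when it is equal to or greater) comes from the text.

```python
from typing import Callable, Tuple


def should_transfer_to_portal(goodness_of_fit: float, threshold: float) -> bool:
    """Transfer determination: send to the portal only if fit < threshold."""
    return goodness_of_fit < threshold


def choose_answer(
    question_text: str,
    second_inference: Callable[[str], Tuple[str, float]],  # second answer inference unit
    ask_portal: Callable[[str], str],                      # path via the chatbot portal
    threshold: float = 0.7,                                # assumed value
) -> str:
    second_answer_text, goodness_of_fit = second_inference(question_text)
    if should_transfer_to_portal(goodness_of_fit, threshold):
        return ask_portal(question_text)
    # Fit is good enough: transmission to the portal is suppressed.
    return second_answer_text


def toy_second_inference(question_text: str) -> Tuple[str, float]:
    """Toy second model: confident only about weather questions."""
    if "weather" in question_text.lower():
        return ("It is sunny today.", 0.9)
    return ("Sorry, I am not sure.", 0.2)


def toy_ask_portal(question_text: str) -> str:
    return "Answer from a specialised chatbot."


local = choose_answer("What is the weather?", toy_second_inference, toy_ask_portal)
remote = choose_answer("What is the check-in time?", toy_second_inference, toy_ask_portal)
```

Note the boundary case: a goodness of fit exactly equal to the threshold keeps the question local.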
  • A voice inquiry processing method according to one aspect uses a terminal device having a function of inputting/outputting voice, a smart speaker operation server device that executes a function for operating the terminal device, a chatbot server device that outputs an answer to a question passed from the terminal device, and a chatbot portal server device that performs processing for identifying the chatbot server device that matches the question.
  • In the terminal device, the voice input unit acquires a question voice, which is a question posed by voice; the voice transmitting unit transmits the question voice to the smart speaker operation server device; the answer voice corresponding to the question voice is received from the smart speaker operation server device; and the voice output unit outputs the answer voice as sound.
  • In the smart speaker operation server device, the voice recognition unit converts the question voice transmitted from the terminal device into a question text; the chatbot portal server transfer unit transmits the question text to the chatbot portal server device; the answer receiving unit receives the answer text corresponding to the question text sent to the chatbot portal server device; and the voice synthesizing unit converts the answer text received by the answer receiving unit into an answer voice.
  • In the chatbot portal server device, the chatbot specifying data management unit holds, for a plurality of chatbot server devices, data representing the characteristics of each chatbot server device; and the chatbot specifying unit, based on the question text transmitted from the smart speaker operation server device, identifies the chatbot server device that matches the question text from the characteristics of each chatbot server device held by the chatbot specifying data management unit and the characteristics of the question text, and outputs chatbot information including location information indicating the location of the identified chatbot server device.
  • In the chatbot server device, the answer inference unit, which has a model that has been machine-learned on the relationship between the question text and the answer text, infers and outputs the answer text corresponding to the question text from the question text received from the outside and the model.
  • In the smart speaker operation server device, the second answer inference unit, which has a second model that has been machine-learned on the relationship between the question text and the answer text, infers and outputs a second answer text corresponding to the question text from the question text output from the voice recognition unit and the second model. If the question text is not transferred to the chatbot server device, the voice synthesizing unit converts the second answer text into the answer voice instead of the answer text. When outputting the second answer text, the second answer inference unit also outputs the goodness of fit of the second answer text. The chatbot portal server transfer unit transmits the question text to the chatbot portal server device, in order to identify the chatbot server device, only when the goodness of fit output by the second answer inference unit is less than a predetermined threshold; when the goodness of fit is equal to or greater than the threshold, it suppresses transmission of the question text to the chatbot portal server device.
  • An aspect of the present invention is the voice inquiry processing method described above, wherein, in the smart speaker operation server device, the chatbot portal server transfer determination unit compares the goodness of fit of the second answer text with the predetermined threshold, determines that the question text is to be transmitted to the chatbot portal server device if the goodness of fit is less than the threshold, and determines that the question text is not to be transmitted if the goodness of fit is equal to or greater than the threshold. The chatbot portal server transfer unit transmits the question text to the chatbot portal server device when the chatbot portal server transfer determination unit determines to transmit it, and suppresses transmission when the determination unit determines not to transmit it.
  • A voice inquiry system according to another aspect comprises a smart speaker operation server device that executes a function for operating a terminal device having a function of inputting/outputting voice, and a chatbot portal server device that performs processing for identifying a chatbot server device that matches a question.
  • The smart speaker operation server device comprises: a voice recognition unit that converts the question voice transmitted from the terminal device into a question text; a chatbot portal server transfer unit that transmits the question text to the chatbot portal server device; an answer receiving unit that receives an answer text corresponding to the question text sent to the chatbot portal server device; and a voice synthesizing unit that converts the answer text received by the answer receiving unit into an answer voice.
  • The chatbot portal server device comprises: a chatbot specifying data management unit that holds, for a plurality of chatbot server devices, data representing the characteristics of each chatbot server device; and a chatbot specifying unit that, based on the question text transmitted from the smart speaker operation server device, identifies the chatbot server device that matches the question text from the characteristics of each chatbot server device held by the chatbot specifying data management unit and the characteristics of the question text, and outputs chatbot information including location information indicating the location of the identified chatbot server device.
  • In one aspect of this voice inquiry system, the chatbot portal server device further comprises a question transmitting unit that transmits the question text received from the smart speaker operation server device to the chatbot server device specified by the chatbot specifying unit.
  • In another aspect, the chatbot portal server device further comprises a chatbot information transmitting unit that transmits the chatbot information output by the chatbot specifying unit to the smart speaker operation server device, and the smart speaker operation server device further comprises a chatbot server transfer unit that transmits the question text output from the voice recognition unit to the specified chatbot server device, based on the location information of that chatbot server device included in the chatbot information.
  • The smart speaker operation server device may further comprise a second model that has been machine-learned on the relationship between the question text and the answer text, and a second answer inference unit that infers and outputs a second answer text corresponding to the question text from the question text output from the voice recognition unit and the second model. In that case, the voice synthesizing unit converts the second answer text into the answer voice instead of the answer text.
  • When outputting the second answer text, the second answer inference unit also outputs the goodness of fit of the second answer text. The chatbot portal server transfer unit transmits the question text to the chatbot portal server device, in order to identify the chatbot server device, only when the goodness of fit is less than a predetermined threshold; when the goodness of fit is equal to or greater than the threshold, it suppresses transmission of the question text to the chatbot portal server device.
  • The smart speaker operation server device may further comprise a chatbot portal server transfer determination unit that compares the goodness of fit of the second answer text with the predetermined threshold, determines that the question text is to be transmitted to the chatbot portal server device if the goodness of fit is less than the threshold, and determines that it is not to be transmitted if the goodness of fit is equal to or greater than the threshold. The chatbot portal server transfer unit transmits or suppresses transmission of the question text according to that determination.
  • Another aspect of the present invention is a voice inquiry processing method using a smart speaker operation server device that executes a function for operating a terminal device having a function of inputting/outputting voice, and a chatbot portal server device that performs processing for identifying a chatbot server device that matches a question.
  • In the smart speaker operation server device, the voice recognition unit converts the question voice transmitted from the terminal device into a question text; the chatbot portal server transfer unit transmits the question text to the chatbot portal server device; the answer receiving unit receives the answer text corresponding to the question text sent to the chatbot portal server device; and the voice synthesizing unit converts the answer text received by the answer receiving unit into an answer voice.
  • In the chatbot portal server device, the chatbot specifying data management unit holds, for a plurality of chatbot server devices, data representing the characteristics of each chatbot server device; and the chatbot specifying unit, based on the question text transmitted from the smart speaker operation server device, identifies the chatbot server device that matches the question text from the characteristics of each chatbot server device held by the chatbot specifying data management unit and the characteristics of the question text, and outputs chatbot information including location information indicating the location of the identified chatbot server device.
  • In the smart speaker operation server device, the second answer inference unit, which has a second model that has been machine-learned on the relationship between the question text and the answer text, infers and outputs a second answer text corresponding to the question text from the question text output from the voice recognition unit and the second model. The voice synthesizing unit converts the second answer text into the answer voice instead of the answer text. When outputting the second answer text, the second answer inference unit also outputs the goodness of fit of the second answer text. The chatbot portal server transfer unit transmits the question text to the chatbot portal server device, in order to identify the chatbot server device, only when the goodness of fit output by the second answer inference unit is less than a predetermined threshold; when the goodness of fit is equal to or greater than the threshold, it suppresses transmission of the question text to the chatbot portal server device.
  • The chatbot portal server transfer determination unit compares the goodness of fit of the second answer text with the predetermined threshold, determines that the question text is to be transmitted to the chatbot portal server device if the goodness of fit is less than the threshold, and determines that it is not to be transmitted if the goodness of fit is equal to or greater than the threshold. The chatbot portal server transfer unit transmits the question text to the chatbot portal server device when the chatbot portal server transfer determination unit determines to transmit it, and suppresses transmission when the determination unit determines not to transmit it.
  • A smart speaker operation server device according to one aspect executes a function for operating a terminal device having a function of inputting/outputting voice. It includes a voice recognition unit that converts the question voice transmitted from the terminal device into a question text, and a chatbot portal server transfer unit that transmits the question text to a chatbot portal server device that performs processing for identifying a chatbot server device that matches the question text.
  • In one aspect, a chatbot server transfer unit that transmits the question text output from the voice recognition unit to the specified chatbot server device is further provided.
  • In one aspect, the smart speaker operation server device further comprises a second model that has been machine-learned on the relationship between the question text and the answer text, and a second answer inference unit that infers and outputs a second answer text corresponding to the question text from the question text output from the voice recognition unit and the second model. The second answer text is then converted into the answer voice instead of the answer text.
  • When outputting the second answer text, the second answer inference unit also outputs the goodness of fit of the second answer text. The chatbot portal server transfer unit transmits the question text to the chatbot portal server device, in order to identify the chatbot server device, only when the goodness of fit output by the second answer inference unit is less than a predetermined threshold; when the goodness of fit is equal to or greater than the threshold, it suppresses transmission of the question text to the chatbot portal server device.
  • In one aspect, the smart speaker operation server device further comprises a chatbot portal server transfer determination unit that compares the goodness of fit of the second answer text with the predetermined threshold, determines that the question text is to be transmitted to the chatbot portal server device if the goodness of fit is less than the threshold, and determines that it is not to be transmitted if the goodness of fit is equal to or greater than the threshold. The chatbot portal server transfer unit transmits the question text to the chatbot portal server device when the chatbot portal server transfer determination unit determines to transmit it, and suppresses transmission when the determination unit determines not to transmit it.
  • Another aspect of the present invention is a program for causing a computer to function as a smart speaker operation server device that executes a function for operating a terminal device having a function of inputting/outputting voice, the smart speaker operation server device comprising: a voice recognition unit that converts a question voice transmitted from the terminal device into a question text; a chatbot portal server transfer unit that transmits the question text to a chatbot portal server device that performs processing for identifying a chatbot server device that matches the question text; an answer receiving unit that receives an answer text corresponding to the question text transmitted to the chatbot portal server device; and a voice synthesizing unit that converts the answer text received by the answer receiving unit into an answer voice.
  • A chatbot portal server device according to one aspect comprises: a chatbot specifying data management unit that holds, for a plurality of chatbot server devices, data representing the characteristics of each chatbot server device; and a chatbot specifying unit that, based on the question text transmitted from a smart speaker operation server device that executes a function for operating a terminal device having a function of inputting/outputting voice, identifies the chatbot server device that matches the question text from the characteristics of each chatbot server device held by the chatbot specifying data management unit and the characteristics of the question text, and outputs chatbot information including location information indicating the location of the identified chatbot server device.
  • In one aspect, the chatbot portal server device further comprises a question transmitting unit that transmits the question text received from the smart speaker operation server device to the chatbot server device identified by the chatbot specifying unit.
  • In another aspect, the chatbot portal server device further comprises a chatbot information transmitting unit that transmits the chatbot information output by the chatbot specifying unit to the smart speaker operation server device.
  • Another aspect of the present invention is a program for causing a computer to function as a chatbot portal server device comprising: a chatbot specifying data management unit that holds, for a plurality of chatbot server devices, data representing the characteristics of each chatbot server device; and a chatbot specifying unit that, based on the question text transmitted from a smart speaker operation server device that executes a function for operating a terminal device having a function of inputting/outputting voice, identifies the chatbot server device that matches the question text from the characteristics of each chatbot server device held by the chatbot specifying data management unit and the characteristics of the question text, and outputs chatbot information including location information indicating the location of the identified chatbot server device.
  • A question posed by voice and acquired by the smart speaker can be answered using not only the knowledge (learning model) possessed by the smart speaker operation server device but also the knowledge (learning models) possessed by many chatbot server devices.
  • Each of the many chatbot server devices can perform learning processing in a distributed manner and accumulate knowledge.
  • The smart speaker can answer a voice question about a detailed matter in a narrow field based on that knowledge.
  • FIG. 1 is a configuration diagram showing a schematic configuration of a voice inquiry system according to a first embodiment of the present invention.
  • It is a functional block diagram showing a schematic functional configuration of the smart speaker according to the first embodiment.
  • It is a functional block diagram showing a schematic functional configuration of the smart speaker operation server device according to the first embodiment.
  • It is a functional block diagram showing a schematic functional configuration of the chatbot portal server device according to the first embodiment.
  • It is a functional block diagram showing a schematic functional configuration of the chatbot server device according to the first embodiment.
  • It is a sequence diagram (part 1) showing the flow of the inquiry process by the voice implement
  • FIG. 1 is a configuration diagram showing a schematic configuration of the voice inquiry system according to the present embodiment.
  • The voice inquiry system 9 is configured to include a smart speaker 1 (terminal device), a smart speaker operation server device 2, a chatbot portal server device 3, and chatbot server devices 4A, 4B, 4C, and so on.
  • The smart speaker 1 may be called a "terminal device".
  • Each of the chatbot server devices 4A, 4B, 4C, and so on may also be referred to as a "chatbot server device 4".
  • Each device constituting the voice inquiry system 9 is realized by using an electronic circuit, for example.
  • A part of the functions of each device may be realized using a computer and a program.
  • In the illustrated example, the voice inquiry system 9 includes one smart speaker 1, one smart speaker operation server device 2, one chatbot portal server device 3, and three chatbot server devices 4. In practice, the number of each device constituting the voice inquiry system 9 is not limited to the numbers illustrated here and is arbitrary.
  • the smart speaker 1 acquires a voice question from a user and answers the question by voice.
  • the smart speaker 1 is also called an "AI speaker".
  • the smart speaker 1 may not only output the answer to the question but also interpret a voice command from the user and execute the content of the command.
  • the command is, for example, to turn on or off a switch of a home electric appliance (a television receiver, a lighting device, a personal computer, an air conditioner, a cooking device, or the like).
  • the command is, for example, to control the operation of those home electric appliances (for example, to raise or lower the set temperature of the air conditioner).
  • the smart speaker operation server device 2 is a server device that provides the functions necessary for the smart speaker 1 to operate.
  • the chatbot portal server device 3 identifies the chatbot server device 4 suitable for the question based on the information (knowledge) accumulated in advance according to the content of the question passed from the smart speaker 1.
  • the chatbot portal server device 3 stores in advance the location information (for example, URL (uniform resource locator) or the like) of the specified chatbot server device 4. By using this location information, it becomes possible to access the specified chatbot server device 4. That is, the chatbot portal server device 3 functions as a portal to many chatbot server devices 4.
  • Each of the chatbot server devices 4A, 4B, 4C, ... receives a question from the outside, estimates an optimal answer based on the received question, and sends the answer that is the estimation result to the sender of the question.
  • The devices constituting the voice inquiry system 9 are mutually connected by a communication network (for example, the Internet or a wireless LAN (local area network)). As a result, the devices can communicate with each other and send and receive data.
  • FIG. 2 is a functional block diagram showing a schematic functional configuration of the smart speaker according to this embodiment.
  • the smart speaker 1 includes a microphone 11 (voice input unit), a voice transmitting unit 12, and a speaker 13 (voice output unit).
  • the microphone 11 acquires sound from outside and outputs it as an electric signal.
  • the microphone 11 passes the acquired voice signal to the voice transmitting unit 12.
  • the microphone 11 is also referred to as a “voice input unit”.
  • the voice transmitting unit 12 transmits the voice passed from the microphone 11 to the smart speaker operation server device 2. At this time, the voice transmitting unit 12 appropriately encodes and transmits voice. The voice transmitting unit 12 also receives voice from the smart speaker operation server device 2 and passes the received voice to the speaker 13. At this time, the voice transmitting unit 12 appropriately decodes the voice and passes it to the speaker 13.
  • The speaker 13 converts the voice passed from the voice transmitting unit 12 as an electric signal into vibration of a medium such as air and outputs it to the outside.
  • the speaker 13 may be, for example, a bone conduction type speaker in addition to the type that vibrates air.
  • the speaker 13 is also referred to as a “voice output unit”.
  • The smart speaker 1 sends, for example, a question uttered by the user to the smart speaker operation server device 2 as voice. The smart speaker 1 also receives the answer as a voice signal from the smart speaker operation server device 2 and outputs it as voice. This answer is the answer to the question sent to the smart speaker operation server device 2. That is, the content of the voice passed from the microphone 11 to the voice transmitting unit 12 is a question, and the content of the voice passed from the voice transmitting unit 12 to the speaker 13 is the answer to that question.
  • The function of the smart speaker 1 itself is realized by conventional technology.
  • A feature of the present embodiment not found in the conventional art lies in the back-end functions for deriving an optimal answer to a question, that is, in the combination of the smart speaker operation server device 2, the chatbot portal server device 3, and the chatbot server device 4.
  • FIG. 3 is a functional block diagram showing a schematic functional configuration of the smart speaker operation server device according to the present embodiment.
  • The smart speaker operation server device 2 is configured to include a voice receiving unit 21, a voice recognition unit 22, an answer inference unit 23, a chatbot portal server transfer determination unit 24, a chatbot portal server transfer unit 25, an answer reception unit 26, a voice synthesis unit 27, and a voice transmitting unit 28.
  • The answer inference unit 23 may be referred to as a “second answer inference unit”.
  • the voice receiving unit 21 receives voice from the smart speaker 1.
  • the voice receiving unit 21 passes the received voice to the voice recognizing unit 22.
  • the voice recognition unit 22 converts the voice passed from the voice reception unit 21 into text and outputs it. That is, the voice recognition unit 22 performs a voice recognition process.
  • The voice recognition unit 22 passes the text (question) (also referred to as “question text”) that is the result of voice recognition to the answer inference unit 23.
  • The answer inference unit 23 receives the text (question) from the voice recognition unit 22. Based on the text, the answer inference unit 23 infers an appropriate answer using the knowledge accumulated in advance.
  • The answer inference unit 23 uses an AI (artificial intelligence) method to infer the answer. That is, the answer inference unit 23 stores a model (second model) that has learned the relationship between questions and answers, and applies the model to the text (question) passed from the voice recognition unit 22 to obtain the text (answer) that is the inference result (also referred to as “answer text”; in particular, the text (answer) inferred by the answer inference unit 23 is also referred to as “second answer text”).
  • The answer inference unit 23 passes the text (answer) that is the result of the inference to the chatbot portal server transfer determination unit 24.
  • The answer inference unit 23 may pass, together with the text (answer), numerical data representing the goodness of fit of the text (answer) to the chatbot portal server transfer determination unit 24.
  • The goodness of fit is obtained during the process of inference, and represents the degree to which the text (answer) fits the text (question).
  • The answer inference unit 23 calculates the goodness of fit for each of a plurality of text (answer) candidates.
  • The function of the answer inference unit 23 itself can be realized by a conventional technique.
  • The chatbot portal server transfer determination unit 24 receives the text (answer) from the answer inference unit 23 and determines whether to transfer the text (question) to the chatbot portal server device 3. As an example, the chatbot portal server transfer determination unit 24 compares the goodness of fit of the text (answer) passed from the answer inference unit 23 with a predetermined threshold value. When the goodness of fit of the text (answer) is less than the threshold value, it determines to transfer the text (question) to the chatbot portal server device 3. When the goodness of fit of the text (answer) is equal to or greater than the threshold value, it determines not to transfer (transmit) the text (question) to the chatbot portal server device 3.
  • When it determines not to transfer, the chatbot portal server transfer determination unit 24 causes the chatbot portal server transfer unit 25 not to transfer the text (question) to the chatbot portal server device 3.
  • In that case, the chatbot portal server transfer determination unit 24 also passes the text (answer) output from the answer inference unit 23 to the voice synthesis unit 27. That is, when the goodness of fit of the text (answer) output from the answer inference unit 23 is equal to or greater than the threshold value, that text (answer) is used as-is as the reply to the smart speaker 1 side.
  • When the chatbot portal server transfer determination unit 24 determines to transfer the text (question) to the chatbot portal server device 3, the chatbot portal server transfer unit 25 transfers the text (question) to the chatbot portal server device 3. When the chatbot portal server transfer determination unit 24 determines not to transfer (transmit) the text (question), the chatbot portal server transfer unit 25 suppresses the transfer of the text (question) to the chatbot portal server device 3.
  • the answer reception unit 26 receives a text (an answer) from the chatbot portal server device 3. This text (answer) is transmitted from the chatbot portal server device 3 in response to the text (question) transmitted by the chatbot portal server transfer unit 25.
  • When the text (question) has been transferred, the chatbot portal server transfer determination unit 24 passes the text (answer) received by the answer reception unit 26 from the chatbot portal server device 3 to the voice synthesis unit 27. When it has determined not to transfer the text (question) to the chatbot portal server device 3, however, the chatbot portal server transfer determination unit 24 passes the text (answer) output from the answer inference unit 23 to the voice synthesis unit 27.
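  • The threshold comparison performed by the chatbot portal server transfer determination unit 24 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the names `InferredAnswer`, `should_transfer`, `select_answer`, and the threshold value 0.7 are assumptions, as is the idea that the goodness of fit lies between 0 and 1. The source only states that the threshold is predetermined.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InferredAnswer:
    text: str   # text (answer) output by the answer inference unit 23
    fit: float  # goodness of fit; assumed here to lie in [0, 1]


# Hypothetical value; the source only says the threshold is "predetermined".
FIT_THRESHOLD = 0.7


def should_transfer(answer: InferredAnswer, threshold: float = FIT_THRESHOLD) -> bool:
    """Transfer the question only when the fit is less than the threshold."""
    return answer.fit < threshold


def select_answer(
    question: str,
    local: InferredAnswer,
    ask_portal: Callable[[str], str],
    threshold: float = FIT_THRESHOLD,
) -> str:
    """Return the text (answer) that is handed to the voice synthesis unit 27."""
    if should_transfer(local, threshold):
        # The text (question) goes to the portal; its reply is used instead.
        return ask_portal(question)
    # The locally inferred answer is used as-is and the portal is not contacted.
    return local.text
```

  • Passing the portal access as a callable (`ask_portal`) keeps the decision logic separate from the network transfer, which mirrors the split between the determination unit 24 and the transfer unit 25.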
  • the voice synthesizing unit 27 synthesizes a voice based on the text (answer) passed from the chatbot portal server transfer determining unit 24. In other words, the voice synthesis unit 27 converts the text (reply) to voice (reply) (also referred to as “reply voice”).
  • the text-based speech synthesis process itself can be realized using existing technology.
  • the voice transmitting unit 28 transmits the voice (answer) synthesized by the voice synthesizing unit 27 to the smart speaker 1 that has transmitted the original voice (question) (also referred to as “question voice”). As a result, the voice (answer) is output as voice on the smart speaker 1 side.
  • FIG. 4 is a functional block diagram showing a schematic functional configuration of the chatbot portal server device in this embodiment.
  • The chatbot portal server device 3 is configured to include a question receiving unit 31, a chatbot specifying unit 32, a chatbot specifying data management unit 33, a question transmitting unit 34, an answer reception unit 35, and an answer transmission unit 36.
  • the question receiving unit 31 receives a text (question) from the smart speaker operation server device 2.
  • the question receiving unit 31 passes the received text (question) to the chatbot specifying unit 32.
  • the chatbot specifying unit 32 specifies the chatbot server device 4 suitable for the text (question) passed from the question receiving unit 31 by referring to the information held by the chatbot specifying data management unit 33.
  • the chatbot specifying unit 32 acquires the location information of the specified chatbot server device 4 from the chatbot specifying data management unit 33.
  • the chatbot identifying unit 32 delivers the text (question) passed from the question receiving unit 31 and the location information of the identified chatbot server device 4 to the question transmitting unit 34.
  • The chatbot specifying unit 32 performs a morphological analysis process on the text (question) passed from the question receiving unit 31 to generate vector information representing the distribution of words included in the text (question).
  • If necessary, the chatbot specifying unit 32 also performs a syntax analysis process on the text (question) to generate a syntax tree corresponding to the text (question).
  • The chatbot specifying unit 32 compares these analysis results (the vector indicating the distribution of words and the syntax tree) with the data that the chatbot specifying data management unit 33 holds for each chatbot server device 4, and calculates the goodness of fit.
  • The chatbot specifying unit 32 specifies the chatbot server device 4 having a high goodness of fit (for example, the highest goodness of fit) with respect to the text (question).
  • For example, the text (question) may be something like “What is the price of the Shibuya porridge noodles?”
  • the chatbot specifying unit 32 reads the location information (URL or the like) of the specified chatbot server device 4 from the chatbot specifying data managing unit 33.
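  • One way to picture how a chatbot server device 4 is specified from a word-distribution vector is the sketch below. It rests on several assumptions that are not in the source: morphological analysis is replaced by simple lowercase word splitting, the goodness of fit is computed as cosine similarity, syntax trees are ignored, and the `REGISTRY` entries and their `example` URLs are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict


def word_vector(text: str) -> Counter:
    # Stand-in for morphological analysis: count lowercase word tokens.
    return Counter(re.findall(r"\w+", text.lower()))


def fit(question: Counter, profile: Counter) -> float:
    # Cosine similarity between two word-count vectors (assumed fit measure).
    dot = sum(count * profile[word] for word, count in question.items())
    norm_q = math.sqrt(sum(c * c for c in question.values()))
    norm_p = math.sqrt(sum(c * c for c in profile.values()))
    return dot / (norm_q * norm_p) if norm_q and norm_p else 0.0


# Hypothetical data of the kind held by the chatbot specifying data
# management unit 33: location information mapped to characteristic terms.
REGISTRY: Dict[str, str] = {
    "https://ramen.example/chatbot": "shibuya noodles restaurant menu price business hours",
    "https://air.example/chatbot": "airline flight reservation ticket discount airport",
}


def identify_chatbot(question: str, registry: Dict[str, str] = REGISTRY) -> str:
    """Return the location information of the best-fitting chatbot server."""
    q = word_vector(question)
    return max(registry, key=lambda url: fit(q, word_vector(registry[url])))
```

  • With this sketch, a question about noodle prices in Shibuya resolves to the restaurant entry because it shares the terms "price", "noodles", and "shibuya" with that profile and none with the airline profile.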
  • the chatbot specifying data management unit 33 holds and manages information necessary for the chatbot specifying unit 32 to specify the chatbot server device 4.
  • Each chatbot server device 4 is operated by, for example, a company, a public institution, an individual store, or the like.
  • The chatbot specifying data management unit 33 holds, for each chatbot server device 4, data such as terms or sentences representing the characteristics of these companies, public institutions, stores, and the like.
  • For example, as terms indicating the characteristics of companies, public institutions, stores, and the like, the chatbot specifying data management unit 33 holds company names, institution names, store names, location information (addresses and the like), business type information, and information indicating the content of the products or services provided.
  • For a restaurant, for example, the chatbot specifying data management unit 33 holds the name, location, telephone number, area name, food menu, store characteristics, store owner name, chef name, business hours, business days and holidays (days of the week and the like), and so on.
  • For an airline company, for example, the chatbot specifying data management unit 33 holds the company name, flight departure and arrival points (place names, airport names), flight numbers, service content related to flights, information related to flight reservations, information related to ticket discounts, and so on.
  • the chatbot specifying data management unit 33 also holds location information (URL etc.) for accessing each chatbot server device 4 and provides it to the chatbot specifying unit 32.
  • The chatbot specifying data management unit 33 acquires information about the chatbot server devices 4 as appropriate, for example by using the technology of robots that crawl the Internet (web crawlers), and updates the information it manages.
  • the chatbot specifying data management unit 33 holds information representing the characteristics of each chatbot server device 4 and updates it as necessary.
  • the chatbot specifying unit 32 which refers to the information of the chatbot specifying data management unit 33, can specify the chatbot server device 4 that is well suited to the text (question).
  • the question transmitting unit 34 receives the target text (question) and the location information of the identified chatbot server device 4 from the chatbot identifying unit 32.
  • the question transmitting unit 34 accesses the chatbot server device 4 and transmits the received text (question) to the chatbot server device 4.
  • the answer reception unit 35 receives a text (an answer) from the chatbot server device 4. This text (answer) is transmitted from the chatbot server device 4 side in response to the text (question) transmitted by the question transmitting unit 34.
  • The answer reception unit 35 passes the received text (answer) to the answer transmission unit 36.
  • The answer transmission unit 36 receives the text (answer) from the answer reception unit 35.
  • The answer transmission unit 36 transmits the text (answer) to the smart speaker operation server device 2.
  • The answer transmission unit 36 may add information for associating the text (answer) with the corresponding original text (question).
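  • The relay role of the chatbot portal server device 3, including the optional information that ties an answer to its original question, could look like the following sketch. The function name `relay_question`, the `question_id` field, and the use of a random UUID as the associating information are assumptions; the source does not say what form that information takes.

```python
import uuid
from typing import Callable, Dict, Optional


def relay_question(
    question: str,
    identify: Callable[[str], str],
    send: Callable[[str, str], str],
    question_id: Optional[str] = None,
) -> Dict[str, str]:
    """Specify a chatbot server, forward the question, and return its answer."""
    # Hypothetical associating information: reuse the caller's ID or make one.
    question_id = question_id or uuid.uuid4().hex
    location = identify(question)      # chatbot specifying unit 32
    answer = send(location, question)  # question transmitting unit 34 and answer reception unit 35
    # The answer is returned together with the ID of the original question.
    return {"question_id": question_id, "answer": answer}
```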
  • FIG. 5 is a functional block diagram showing a schematic functional configuration of the chatbot server device in this embodiment.
  • The chatbot server device 4 is configured to include a question receiving unit 41, an answer inference unit 42, and an answer transmission unit 43.
  • the question receiving unit 41 receives a text (question) from an external device.
  • the question receiving unit 41 passes the received text (question) to the answer inference unit 42.
  • the question receiving unit 41 receives a text (question) from the chatbot portal server device 3.
  • The answer inference unit 42 receives the text (question) from the question receiving unit 41.
  • The answer inference unit 42 infers an appropriate answer based on the text, using the knowledge accumulated in advance.
  • The answer inference unit 42 uses an AI (artificial intelligence) method to infer the answer. That is, the answer inference unit 42 performs the inference process using the same or a similar technique as the answer inference unit 23 (smart speaker operation server device 2) already described.
  • The answer inference unit 42 performs inference using a model trained in advance on the learning data for that chatbot server device 4.
  • The answer inference unit 42 outputs the text (answer) that is the result of the inference.
  • The answer inference unit 42 passes the text (answer) that is the inference result to the answer transmission unit 43.
  • The function of the answer inference unit 42 itself can be realized by a conventional technique.
  • The answer transmission unit 43 transmits the text (answer) passed from the answer inference unit 42 to an external device.
  • The answer transmission unit 43 returns the text (answer) to the chatbot portal server device 3.
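  • The receive, infer, and reply structure of the chatbot server device 4 can be illustrated with the sketch below. The learned model of the embodiment is replaced here by a simple word-overlap lookup over a small question-and-answer table, which is an assumption made only to keep the example self-contained; the class name `ChatbotServer`, the table contents, and the fallback message are likewise hypothetical.

```python
import re
from typing import Dict, Optional, Set


class ChatbotServer:
    """Toy stand-in for a chatbot server device 4 with domain-specific knowledge."""

    def __init__(self, faq: Dict[str, str], fallback: str = "Sorry, I do not know.") -> None:
        self.faq = faq            # hypothetical knowledge: key terms mapped to answers
        self.fallback = fallback

    @staticmethod
    def _words(text: str) -> Set[str]:
        return set(re.findall(r"\w+", text.lower()))

    def infer(self, question: str) -> str:
        # Stand-in for the answer inference unit 42: choose the entry whose
        # key terms overlap most with the words of the question.
        q = self._words(question)
        best_key: Optional[str] = None
        best_score = 0
        for key in self.faq:
            score = len(q & self._words(key))
            if score > best_score:
                best_key, best_score = key, score
        return self.faq[best_key] if best_key is not None else self.fallback

    def handle(self, question: str) -> str:
        # Question receiving unit 41 -> answer inference unit 42 -> answer transmission unit 43.
        return self.infer(question)
```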
  • FIGS. 6 and 7 are sequence diagrams showing the flow of the voice inquiry process realized by the voice inquiry system 9 according to the present embodiment. The procedure will be described below with reference to these sequence diagrams.
  • In step S1 of FIG. 6, the smart speaker 1 acquires the voice that is the utterance of the user.
  • The content of this voice is a question regarding a predetermined field.
  • In step S2, the smart speaker 1 transmits the voice (question) acquired in step S1 to the smart speaker operation server device 2.
  • The smart speaker operation server device 2 receives this voice (question).
  • In step S3, the smart speaker operation server device 2 performs voice recognition processing on the voice (question). As a result, the smart speaker operation server device 2 converts the voice (question) into text (question).
  • In step S4, the smart speaker operation server device 2 performs inference processing based on the text (question) obtained in step S3 and obtains a text (answer) corresponding to the text (question).
  • In step S5, the smart speaker operation server device 2 decides whether or not to transfer the text (question) to the chatbot portal server device 3 based on, for example, the goodness of fit between the text (question) and the text (answer).
  • When the smart speaker operation server device 2 transfers the text (question) to the chatbot portal server device 3, processing continues from the next step S6. On the other hand, when the smart speaker operation server device 2 does not transfer the text (question) to the chatbot portal server device 3 (that is, when the answer that is the inference result of the smart speaker operation server device 2 is returned to the smart speaker 1), the processing of steps S6 to S11 is skipped and processing moves to step S12.
  • In step S6, the smart speaker operation server device 2 transmits the text (question) to the chatbot portal server device 3.
  • The chatbot portal server device 3 receives this text (question).
  • In step S7, the chatbot portal server device 3 identifies the optimum chatbot server device 4 corresponding to the received text (question).
  • Using the location information of the specified chatbot server device 4, the chatbot portal server device 3 can access that chatbot server device 4.
  • In step S8, the chatbot portal server device 3 transmits the text (question) received in step S6 to the specified chatbot server device 4.
  • The chatbot server device 4 receives this text (question).
  • In step S9, the chatbot server device 4 infers the text (answer) to be returned based on the text (question) received in step S8.
  • In step S10, the chatbot server device 4 transmits the text (answer) obtained as a result of the inference process in step S9 to the chatbot portal server device 3.
  • The chatbot portal server device 3 receives this text (answer).
  • In step S11, the chatbot portal server device 3 sends the text (answer) received in step S10 to the smart speaker operation server device 2.
  • The smart speaker operation server device 2 receives this text (answer).
  • In step S12, the smart speaker operation server device 2 performs voice synthesis processing based on the obtained text (answer).
  • The voice generated by this process is a voice that reads out the text (answer).
  • The text (answer) on which the voice synthesis process is based is either the text (answer) obtained by inference by the smart speaker operation server device 2 in step S4 or the text (answer) received in step S11.
  • When the smart speaker operation server device 2 transferred the text (question) to the chatbot portal server device 3 in step S6, it performs the voice synthesis process based on the text (answer) returned from the chatbot portal server device 3.
  • Otherwise, the smart speaker operation server device 2 performs the voice synthesis process based on the text (answer) obtained in step S4.
  • In step S13, the smart speaker operation server device 2 transmits the voice (answer) generated in step S12 to the smart speaker 1.
  • The smart speaker 1 receives this voice (answer).
  • In step S14, the smart speaker 1 reproduces and outputs the voice (answer) received in step S13. The user of the smart speaker 1 can thereby listen to the voice (answer) corresponding to the voice (question) uttered in step S1.
  • the smart speaker operation server device 2 transfers the text (question) to the chatbot portal server device 3.
  • the chatbot portal server device 3 identifies the chatbot server device 4 that matches the text (question).
  • the chatbot portal server device 3 transfers the text (question) to the specified chatbot server device 4. Then, the resulting text (answer) inferred by the chatbot server device 4 can be output as voice from the smart speaker 1.
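  • The order of steps S1 to S14 in the first embodiment, including the skip from step S5 to step S12, can be traced with the small simulation below. Everything except the step order is an assumption: voice is modelled as UTF-8 bytes so that recognition and synthesis become decode and encode, the inference stub and its fit values are invented, and the chatbot answer and `example` URL are made-up demonstration data.

```python
from typing import Callable, Dict, List, Tuple


def run_inquiry(
    voice_question: bytes,
    recognize: Callable[[bytes], str],
    infer: Callable[[str], Tuple[str, float]],
    identify: Callable[[str], str],
    chatbots: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str], bytes],
    threshold: float = 0.7,  # hypothetical threshold value
) -> Tuple[bytes, List[str]]:
    """Run one inquiry and return the voice (answer) plus the steps executed."""
    trace = ["S1", "S2"]                    # speaker acquires and sends the voice
    question = recognize(voice_question)    # S3: voice recognition
    trace.append("S3")
    answer, fit = infer(question)           # S4: local inference
    trace.append("S4")
    trace.append("S5")                      # S5: transfer decision
    if fit < threshold:
        trace.append("S6")                  # S6: question sent to the portal
        location = identify(question)       # S7: chatbot server specified
        trace.append("S7")
        trace.append("S8")                  # S8: question sent to the chatbot server
        answer = chatbots[location](question)  # S9: chatbot-side inference
        trace.append("S9")
        trace += ["S10", "S11"]             # answer relayed back via the portal
    voice_answer = synthesize(answer)       # S12: voice synthesis
    trace.append("S12")
    trace += ["S13", "S14"]                 # voice sent to and played by the speaker
    return voice_answer, trace


# Demonstration stubs (all hypothetical).
def demo_recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def demo_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def demo_infer(question: str) -> Tuple[str, float]:
    if "weather" in question.lower():
        return "It is sunny today.", 0.9
    return "I am not sure.", 0.1


DEMO_CHATBOTS = {"https://ramen.example/chatbot": lambda q: "The noodles cost 800 yen."}


def demo_identify(question: str) -> str:
    return "https://ramen.example/chatbot"
```

  • Calling `run_inquiry` with the demonstration stubs and a weather question yields a trace that jumps from S5 to S12, while a question the local stub cannot answer well passes through S6 to S11 before synthesis.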
  • FIG. 8 is a configuration diagram showing a schematic configuration of the voice inquiry system according to the present embodiment.
  • The voice inquiry system 109 is configured to include a smart speaker 1, a smart speaker operation server device 102, a chatbot portal server device 103, and chatbot server devices 4A, 4B, 4C, ... (chatbot server devices 4).
  • the smart speaker 1 and the chatbot server device 4 have functions equivalent to those described in the first embodiment.
  • the smart speaker operation server device 102 and the chatbot portal server device 103 include functions unique to this embodiment and can execute processing unique to this embodiment.
  • the internal functional configurations of the smart speaker 1 and the chatbot server device 4 are the same as those in the first embodiment, and therefore detailed description thereof will be omitted here.
  • the smart speaker operation server device 102 is a server device that provides functions necessary for the smart speaker 1 to operate.
  • In the first embodiment, the smart speaker operation server device 2 received the text (answer) corresponding to the text (question) from the chatbot portal server device 3.
  • In the present embodiment, by contrast, the smart speaker operation server device 102 receives, from the chatbot portal server device 103, the location information of the chatbot server device 4 corresponding to the text (question) transmitted to the chatbot portal server device 103.
  • The smart speaker operation server device 102 transmits the text (question) to the chatbot server device 4 by using the location information of the chatbot server device 4.
  • The smart speaker operation server device 102 then receives the text (answer) corresponding to the text (question) from the chatbot server device 4.
  • the other points of the function of the smart speaker operation server device 102 are the same as (or similar to) the smart speaker operation server device 2 in the first embodiment.
  • the chatbot portal server device 103 identifies the chatbot server device 4 suitable for the question based on the information (knowledge) accumulated in advance according to the content of the question passed from the smart speaker 1.
  • The chatbot portal server device 3 in the first embodiment transmitted the text (question) to the chatbot server device 4 suitable for the text (question).
  • By contrast, the chatbot portal server device 103 in the present embodiment returns the location information of the specified chatbot server device 4 to the smart speaker operation server device 102.
  • This allows the smart speaker operation server device 102 to access the specified chatbot server device 4.
  • In other words, the chatbot portal server device 103 itself neither receives a text (answer) from the chatbot server device 4 nor transfers the text (answer) to the smart speaker operation server device 102. In this embodiment as well, the chatbot portal server device 103 functions as a portal to many chatbot server devices 4.
  • FIG. 9 is a block diagram showing a schematic functional configuration of the smart speaker operation server device according to the present embodiment.
  • The smart speaker operation server device 102 is configured to include a voice receiving unit 21, a voice recognition unit 22, an answer inference unit 23, a chatbot portal server transfer determination unit 124, a chatbot portal server transfer unit 125, a chatbot server transfer unit 126, a voice synthesis unit 27, and a voice transmitting unit 28.
  • the functions of the voice receiving unit 21, the voice recognizing unit 22, the answer inference unit 23, the voice synthesizing unit 27, and the voice transmitting unit 28 are the same as those in the first embodiment. Therefore, detailed description of the functions of these units will be omitted.
  • Each of the chatbot portal server transfer determination unit 124, the chatbot portal server transfer unit 125, and the chatbot server transfer unit 126 includes a function unique to this embodiment and executes a unique process. The function of each part is as described below.
  • The chatbot portal server transfer determination unit 124 receives the text (answer) from the answer inference unit 23 and determines whether to transfer the text (question) to the chatbot portal server device 103. As in the first embodiment, the chatbot portal server transfer determination unit 124 compares the goodness of fit of the text (answer) passed from the answer inference unit 23 with a predetermined threshold value. When the goodness of fit of the text (answer) is less than the threshold value, it determines to transfer the text (question) to the chatbot portal server device 103. When the goodness of fit of the text (answer) is equal to or greater than the threshold value, it determines not to transfer the text (question) to the chatbot portal server device 103.
  • When it determines not to transfer, the chatbot portal server transfer determination unit 124 passes the text (answer) output from the answer inference unit 23 to the voice synthesis unit 27. That is, when the goodness of fit of the text (answer) output from the answer inference unit 23 is equal to or greater than the threshold value, that text (answer) is used as-is as the reply to the smart speaker 1 side.
  • When it determines to transfer, the chatbot portal server transfer determination unit 124 passes the text (question) to the chatbot portal server transfer unit 125.
  • It then receives the location information of the chatbot server device 4 corresponding to the text (question).
  • The chatbot portal server transfer determination unit 124 passes the text (question) and the location information of the chatbot server device 4 to the chatbot server transfer unit 126.
  • The chatbot portal server transfer determination unit 124 receives the text (answer) corresponding to the text (question) from the chatbot server transfer unit 126.
  • The chatbot portal server transfer determination unit 124 passes either the text (answer) received from the answer inference unit 23 or the text (answer) received from the chatbot server transfer unit 126 to the voice synthesis unit 27.
  • When the chatbot portal server transfer determination unit 124 determines to transfer the text (question) to the chatbot portal server device 103, the chatbot portal server transfer unit 125 transmits the text (question) to the chatbot portal server device 103. The chatbot portal server transfer unit 125 then receives the location information of the chatbot server device 4 corresponding to the transmitted text (question) from the chatbot portal server device 103.
  • When the chatbot portal server transfer determination unit 124 determines not to transfer the text (question) to the chatbot portal server device 103, the chatbot portal server transfer unit 125 suppresses transmission of the text (question) to the chatbot portal server device 103.
  • the chatbot server transfer unit 126 receives the above-mentioned text (question) and the location information of the chatbot server device 4 corresponding to this text (question) from the chatbot portal server transfer determination unit 124. Then, the chatbot server transfer unit 126 uses the location information of the chatbot server device 4 to access the chatbot server device 4. Then, the chatbot server transfer unit 126 transmits the text (question) to the chatbot server device 4. Then, the chatbot server transfer unit 126 receives, from the chatbot server device 4, a text (reply) which is a response to the text (question). The chatbot server transfer unit 126 passes the received text (answer) to the chatbot portal server transfer determination unit 124.
  • Because the chatbot server transfer unit 126 receives the text (answer) from the chatbot server device 4, it is also called an “answer receiving unit”.
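  • The difference from the first embodiment, where the portal only returns location information and the smart speaker operation server device 102 contacts the chatbot server device 4 itself, can be sketched as follows. The function names `portal_lookup` and `ask_via_location` and the dictionary key `location` are assumptions, and the network transfers are replaced by plain callables.

```python
from typing import Callable, Dict


def portal_lookup(question: str, identify: Callable[[str], str]) -> Dict[str, str]:
    """Chatbot portal server device 103: return chatbot information only."""
    # The portal specifies the chatbot server but never forwards the question.
    return {"location": identify(question)}


def ask_via_location(
    question: str,
    lookup: Callable[[str], Dict[str, str]],
    send: Callable[[str, str], str],
) -> str:
    """Smart speaker operation server device 102: resolve, then ask directly."""
    info = lookup(question)                  # chatbot portal server transfer unit 125
    return send(info["location"], question)  # chatbot server transfer unit 126
```

  • Because `portal_lookup` takes no `send` argument, the sketch makes it explicit that the answer path bypasses the portal in this embodiment.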
  • FIG. 10 is a block diagram showing a schematic functional configuration of the chatbot portal server device according to the present embodiment.
  • the chatbot portal server device 103 includes a question receiving unit 31, a chatbot identifying unit 132, a chatbot identifying data managing unit 33, and a chatbot information transmitting unit 137.
  • the functions of the question receiving unit 31 and the chatbot specifying data management unit 33 are the same as those in the first embodiment, and therefore detailed description thereof will be omitted here.
  • the chatbot identifying unit 132 includes a function unique to this embodiment.
  • the chatbot information transmission unit 137 is a function not provided in the first embodiment.
  • The chatbot specifying unit 132 refers to the information held by the chatbot specifying data management unit 33 and specifies the chatbot server device 4 suitable for the text (question) passed from the question receiving unit 31.
  • Specifically, the chatbot specifying unit 132 specifies the chatbot server device 4 by using the same method as the chatbot specifying unit 32 of the first embodiment. In the present embodiment, however, the chatbot portal server device 103 does not send the text (question) to the chatbot server device 4.
  • the chatbot specifying unit 132 passes the specified information on the chatbot server device 4 to the chatbot information transmitting unit 137.
  • the chatbot information transmitting unit 137 receives the chatbot information from the chatbot identifying unit 132.
  • the chatbot information includes information necessary for accessing the specific chatbot server device 4.
  • the chatbot information transmitting unit 137 transmits the received chatbot information to the smart speaker operation server device 102. As a result, the smart speaker operation server device 102 can access the specified chatbot server device 4.
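  • The division of roles described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the class names `ChatbotInfo` and `ChatbotPortalServer103`, the `location` field, and the injected `specify` callable are all hypothetical, and the source does not state what form the location information takes. The point shown is only that the portal returns chatbot information to the caller instead of forwarding the question itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ChatbotInfo:
    """Information needed to access one chatbot server device (assumed shape)."""
    server_id: str
    location: str  # assumed to be a URL-like string


class ChatbotPortalServer103:
    """Sketch of the portal in the second embodiment: identify, then return info."""

    def __init__(self, specify: Callable[[str], Optional[ChatbotInfo]]):
        # Stand-in for the chatbot specifying unit 132; the matching method
        # itself is not part of this sketch.
        self._specify = specify

    def handle_question(self, question_text: str) -> Optional[ChatbotInfo]:
        # Role of the chatbot information transmitting unit 137: hand the
        # chatbot information back to the caller. The question is NOT sent
        # to the chatbot server device from here.
        return self._specify(question_text)
```

  • With this shape, the smart speaker operation server device would use the returned `location` to contact the chatbot server device directly.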
  • FIGS. 11 and 12 are sequence diagrams showing the flow of the voice inquiry process in the voice inquiry system 109 according to the present embodiment. The procedure will be described below with reference to these sequence diagrams.
  • The process flow from step S101 to step S107 in FIG. 11 is the same as the process flow from step S101 to step S107 shown in FIG. 6 (first embodiment), and therefore the description thereof is omitted here.
  • the chatbot portal server device 103 transmits the chatbot information including the location information of the specified chatbot server device 4 to the smart speaker operation server device 102.
  • the smart speaker operation server device 102 receives this chatbot information.
  • the smart speaker operation server device 102 can access the specific chatbot server device 4.
  • In step S109, the smart speaker operation server device 102 transmits the text (question) obtained in step S103 to the chatbot server device 4.
  • The chatbot server device 4 receives this text (question).
  • In step S110, the chatbot server device 4 infers an optimal answer based on the text (question) received in step S109. As a result of the inference, the chatbot server device 4 obtains the text (answer).
  • In step S111, the chatbot server device 4 transmits the text (answer) obtained by the inference process of step S110 to the smart speaker operation server device 102.
  • The smart speaker operation server device 102 receives this text (answer).
  • In step S112, the smart speaker operation server device 102 performs voice synthesis processing based on the text (answer).
  • The text (answer) used here is either the text obtained as the inference result in step S104 or the text obtained as the inference result in step S110.
  • When the chatbot server device 4 is called, the text (answer) output by the chatbot server device 4 in step S110 is used. Since the other points of the process of step S112 are the same as the process of step S12 of FIG. 7 (first embodiment), detailed description thereof will be omitted here.
  • Since the processes of steps S113 and S114 are the same as the processes in steps S13 and S14 of FIG. 7 (first embodiment), detailed description thereof will be omitted here.
  • the smart speaker operation server device 102 transfers the text (question) to the chatbot portal server device 103.
  • the chatbot portal server device 103 identifies the chatbot server device 4 that matches the text (question).
  • the chatbot portal server device 103 passes the information on the specified chatbot server device 4 to the smart speaker operation server device 102.
  • the smart speaker operation server device 102 transmits a text (question) to the chatbot server device 4 specified by the chatbot portal server device 103 based on the received chatbot server information. Then, the resulting text (answer) inferred by the chatbot server device 4 can be output as voice from the smart speaker 1.
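  • The flow summarized above can be sketched as one function on the smart speaker operation server device 102 side. This is only an illustration under assumptions: the function name and the four injected callables are hypothetical, the local inference of step S104 is left out, and the sketch assumes the portal always finds a chatbot server device.

```python
from typing import Callable


def handle_voice_inquiry(
    question_voice: bytes,
    recognize: Callable[[bytes], str],       # voice recognition (assumed interface)
    lookup_chatbot: Callable[[str], str],    # portal query, returns location info
    ask_chatbot: Callable[[str, str], str],  # (location, question) -> answer text
    synthesize: Callable[[str], bytes],      # voice synthesis (assumed interface)
) -> bytes:
    """Sketch of the second-embodiment flow on the operation server side."""
    # Convert the question voice into a text (question).
    question_text = recognize(question_voice)
    # Ask the portal which chatbot server device matches; only location
    # information comes back, not an answer.
    location = lookup_chatbot(question_text)
    # Steps S109 to S111: send the text (question) directly to that chatbot
    # server device and receive the text (answer).
    answer_text = ask_chatbot(location, question_text)
    # Step S112: synthesize the answer voice to return to the smart speaker.
    return synthesize(answer_text)
```

  • Passing the four stages in as callables keeps the sketch independent of any particular speech or network library.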
  • Instead of the smart speaker 1, another device in an equivalent position may be used, for example a PC (personal computer), a smartphone, a tablet terminal, a smart watch (wristwatch-type information terminal device), another wearable terminal, or the like.
  • Such an alternative device is implemented so as to have the functions of the smart speaker 1 described above.
  • The smart speaker operation server device (2, 102) may not have the answer reasoning unit 23, or the answer reasoning unit 23 may not function.
  • In this case, the smart speaker operation server device (2, 102) does not perform the process of inferring the text (answer) based on the text (question).
  • Instead, the chatbot server device 4 called directly or indirectly from the smart speaker operation server device (2, 102) infers the text (answer) based on the text (question). Therefore, the voice synthesis unit 27 in the smart speaker operation server device (2, 102) always performs the voice synthesis process based on the text (answer) transmitted from the chatbot server device 4 side.
  • When the chatbot portal server device (3, 103) cannot specify a chatbot server device 4 having a high degree of conformity to the text (question), the smart speaker operation server device (2, 102) may return information indicating that no appropriate chatbot server device 4 exists to the smart speaker 1 side.
  • In that case, information (for example, voice) indicating that no appropriate chatbot server device 4 exists is returned from the smart speaker operation server device (2, 102) to the smart speaker 1.
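  • A hedged sketch of this fallback follows. The source does not say how the degree of conformity is represented or what counts as high, so the score dictionary, the `min_conformity` cutoff, and the notice wording are all assumptions introduced for illustration.

```python
from typing import Callable, Optional

# Hypothetical wording; the source only says that such information
# (for example, voice) is returned to the smart speaker.
NO_CHATBOT_NOTICE = "No suitable chatbot is available for this question."


def pick_chatbot(conformity: dict[str, float], min_conformity: float) -> Optional[str]:
    """Return the location with the highest conformity, or None if it is too low."""
    if not conformity:
        return None
    location, score = max(conformity.items(), key=lambda item: item[1])
    return location if score >= min_conformity else None


def text_to_speak(
    question_text: str,
    conformity: dict[str, float],
    min_conformity: float,
    ask_chatbot: Callable[[str, str], str],
) -> str:
    """Return the answer text, or the notice when no chatbot conforms well enough."""
    location = pick_chatbot(conformity, min_conformity)
    if location is None:
        return NO_CHATBOT_NOTICE
    return ask_chatbot(location, question_text)
```

  • The returned string would then go through the usual voice synthesis step, so the user hears the notice in the same way as a normal answer.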
  • In the embodiments described above, the voice recognition process (voice recognition unit 22) and the voice synthesis process (voice synthesis unit 27) are performed in the smart speaker operation server device (2, 102).
  • However, at least one of the voice recognition process and the voice synthesis process may be performed on the smart speaker 1 side.
  • When the voice recognition process is performed on the smart speaker 1 side, the smart speaker 1 transmits text (question) instead of voice to the smart speaker operation server device (2, 102).
  • When the voice synthesis process is performed on the smart speaker 1 side, the smart speaker 1 receives not the voice but the text (answer) from the smart speaker operation server device (2, 102).
  • The voice inquiry system is composed of a smart speaker (1, terminal device), a smart speaker operation server device (2, 102), a chatbot portal server device (3, 103), and a chatbot server device (4).
  • a system including only the smart speaker operation server device (2, 102) and the chatbot portal server device (3, 103) may be called a “voice inquiry system”.
  • the smart speaker (1, terminal device) has the function of inputting and outputting voice. As a result, the user can cast a voice such as a question to the smart speaker 1 and hear a voice such as an answer to the question from the smart speaker (1).
  • the smart speaker (1) includes a microphone (11, voice input unit), a voice transmission unit 12, and a speaker (13, voice output unit).
  • the microphone (11) acquires a question voice that is a voice question.
  • The voice transmitting unit (12) transmits the question voice to the smart speaker operation server device (2, 102) and receives an answer voice corresponding to the question voice from the smart speaker operation server device (2, 102).
  • the speaker (13) outputs the answer voice as voice.
  • the smart speaker operation server device (2, 102) executes a function for operating the terminal device (1).
  • The smart speaker operation server device (2, 102) includes a voice recognition unit (22), a chatbot portal server transfer unit (25, 125), an answer receiving unit (26, chatbot server transfer unit 126), and a voice synthesis unit (27).
  • the voice recognition unit (22) converts the question voice transmitted from the terminal device (1) into a question text.
  • the chatbot portal server transfer unit (25, 125) sends the question text to the chatbot portal server device (3, 103).
  • the answer receiving unit (26, 126) receives the answer text corresponding to the question text transmitted to the chatbot portal server device (3, 103).
  • the voice synthesis unit (27) converts the answer text received by the answer reception unit (26) into answer voice.
  • the chatbot portal server device (3, 103) performs a process for identifying the chatbot server device (4) that matches the question.
  • the chatbot portal server device (3, 103) includes a chatbot specifying data management unit (33) and a chatbot specifying unit (32, 132).
  • the chatbot specifying data management unit (33) holds data representing the characteristics of each of the chatbot server devices (4) with respect to the plurality of chatbot server devices (4).
  • Based on the question text transmitted from the smart speaker operation server device (2, 102), the chatbot specifying unit (32, 132) identifies the chatbot server device (4) that matches the question text from the features of each chatbot server device (4) held by the chatbot specifying data management unit (33) and the features of the question text.
  • the chatbot specifying unit (32, 132) outputs chatbot information including location information indicating the location of the specified chatbot server device (4).
  • the question text can be transmitted to the identified chatbot server device (4) using the chatbot information output by the chatbot identifying unit (32, 132).
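  • The feature-based identification can be illustrated as below. This passage does not describe how features are represented or compared, so the sketch swaps in a simple stand-in: keyword sets compared by Jaccard similarity. `ChatbotEntry`, `question_features`, and `specify_chatbot` are hypothetical names, not units of the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatbotEntry:
    """One record of the kind the data management unit (33) is said to hold (assumed shape)."""
    location: str
    features: frozenset[str]


def question_features(text: str) -> frozenset[str]:
    # Stand-in feature extraction: lowercase whitespace-separated tokens.
    return frozenset(text.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def specify_chatbot(question_text: str, entries: list[ChatbotEntry]) -> ChatbotEntry:
    """Return the entry whose features best match the question.

    Raises ValueError if `entries` is empty.
    """
    q = question_features(question_text)
    return max(entries, key=lambda entry: jaccard(q, entry.features))
```

  • The `location` of the returned entry plays the role of the location information included in the chatbot information.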
  • the chatbot server device (4) outputs an answer to the question passed from the terminal device (1).
  • the chatbot server device (4) includes an answer reasoning unit (42).
  • The answer reasoning unit (42) includes a model that has been machine-learned about the relationship between the question text and the answer text, and infers and outputs the answer text corresponding to the question text from the question text received from the outside and the model.
  • the chatbot portal server device (3) may further include a question sending unit (34).
  • the question transmitting unit (34) transmits the question text received from the smart speaker operation server device (2) to the chatbot server device (4) identified by the chatbot identifying unit (32). At this time, the question transmitting unit (34) obtains the location information of the chatbot server device (4), which is the destination of the question text, from the chatbot information output by the chatbot identifying unit (32).
  • the chatbot portal server device (103) may further include a chatbot information transmission unit (137).
  • a chatbot information transmission unit (137) transmits the chatbot information output by the chatbot identification unit (132) to the smart speaker operation server device (102).
  • the smart speaker operation server device (102) further includes a chatbot server transfer unit (126).
  • The chatbot server transfer unit (126) transmits the question text output from the voice recognition unit (22) to the specified chatbot server device (4) on the basis of the location information of that chatbot server device (4), which is included in the chatbot information transmitted from the chatbot information transmission unit (137) of the chatbot portal server device (103).
  • the smart speaker operation server device (2, 102) may further include an answer reasoning unit (23, second answer reasoning unit).
  • The answer reasoning unit (23) includes a second model that has been machine-learned about the relationship between the question text and the answer text, and infers and outputs the second answer text corresponding to the question text from the question text output from the voice recognition unit (22) and the second model.
  • In this case, the voice synthesizing unit (27) of the smart speaker operation server device (2, 102) converts the second answer text into the answer voice instead of the transmitted answer text (that is, the answer text output from the chatbot server device (4)).
  • As a result, when the smart speaker operation server device (2, 102) can infer an appropriate answer by itself, that answer can be returned to the smart speaker (1) side. Only when the smart speaker operation server device (2, 102) cannot infer an appropriate answer is the answer inferred by the chatbot server device (4) returned to the smart speaker (1) side.
  • the smart speaker operation server device (2, 102) may be further configured as follows.
  • When outputting the second answer text, the second answer reasoning unit (23) also outputs the matching degree of the second answer text.
  • The chatbot portal server transfer unit (25, 125) transmits the question text to the chatbot portal server device (3, 103), in order to have the chatbot server device (4) specified, only when the matching degree output by the second answer reasoning unit (23) is less than a predetermined threshold value. When the matching degree is equal to or higher than the threshold value, it suppresses transmission of the question text to the chatbot portal server device (3, 103).
  • the smart speaker operation server device (2, 102) may include a chatbot portal server transfer determination unit (24, 124).
  • The chatbot portal server transfer determination unit (24, 124) compares the matching degree of the second answer text output by the second answer reasoning unit (23) with a predetermined threshold value.
  • The chatbot portal server transfer determination unit (24, 124) determines to transmit the question text to the chatbot portal server device (3, 103) when the matching degree is less than the threshold value, and determines not to transmit it when the matching degree is equal to or higher than the threshold value.
  • When the chatbot portal server transfer determination unit (24, 124) determines to transmit the question text, the chatbot portal server transfer unit (25, 125) sends the question text to the chatbot portal server device (3, 103). When the chatbot portal server transfer determination unit (24, 124) determines not to transmit it, the chatbot portal server transfer unit (25, 125) suppresses sending of the question text to the chatbot portal server device (3, 103).
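  • The threshold rule above can be sketched in a few lines. The names `LocalInference`, `should_transfer_to_portal`, and `resolve_answer`, the 0.0 to 1.0 score range, and the default threshold of 0.7 are assumptions for illustration; the source only specifies the comparison itself (transfer when the matching degree is less than the threshold, suppress otherwise).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LocalInference:
    """Output of the second answer reasoning unit (23): text plus matching degree."""
    answer_text: str
    matching_degree: float  # assumed to lie in [0.0, 1.0]


def should_transfer_to_portal(inference: LocalInference, threshold: float) -> bool:
    # Transfer only when the matching degree is strictly less than the threshold.
    return inference.matching_degree < threshold


def resolve_answer(
    question_text: str,
    infer_locally: Callable[[str], LocalInference],
    ask_via_portal: Callable[[str], str],
    threshold: float = 0.7,  # hypothetical default value
) -> str:
    local = infer_locally(question_text)
    if should_transfer_to_portal(local, threshold):
        # Local answer is not good enough: go through the chatbot portal.
        return ask_via_portal(question_text)
    # Matching degree is equal to or higher than the threshold: no transfer.
    return local.answer_text
```

  • Note that a matching degree exactly equal to the threshold keeps the local answer, which mirrors the "equal to or higher" wording of the text.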
  • Each device, such as the smart speaker, the smart speaker operation server device, the chatbot portal server device, and the chatbot server device according to the above-described embodiments (including modifications), can be realized by a computer.
  • the program for realizing this function may be recorded in a computer-readable recording medium, and the program recorded in this recording medium may be read by a computer system and executed.
  • the “computer system” mentioned here includes an OS and hardware such as peripheral devices.
  • the term “computer-readable recording medium” means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, a DVD-ROM, a USB memory, or a storage device such as a hard disk built in a computer system.
  • Furthermore, the “computer-readable recording medium” may include a medium that dynamically holds a program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
  • It may also include a medium that holds a program for a certain period of time, such as a volatile memory inside a computer system that serves as a server or a client.
  • the above program may be one for realizing some of the functions described above, or may be one that can realize the above functions in combination with a program already recorded in the computer system.
  • The present invention can be used, for example, for providing information using a communication network such as the Internet.
  • There are no restrictions on the industries to which the present invention can be applied. Note that the scope of use of the present invention is not limited to the examples given here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to a voice inquiry system (9) comprising a smart speaker operation server device (2) and a chatbot portal server device (3). The smart speaker operation server device (2) comprises a voice recognition unit (22), a chatbot portal server transfer unit (25), and a voice synthesis unit (27). The voice recognition unit (22) converts a question voice into a question text. The chatbot portal server transfer unit (25) transmits the question text to the chatbot portal server device (3). The voice synthesis unit (27) converts a received answer text into an answer voice. The chatbot portal server device (3) comprises a chatbot specifying data management unit (33) and a chatbot specifying unit (32). The chatbot specifying data management unit (33) holds features of each chatbot server device (4). The chatbot specifying unit (32) specifies a chatbot server device (4) that matches the question text from the features of each chatbot server device (4) and the features of the question text.
PCT/JP2019/042493 2018-12-19 2019-10-30 Système et procédé de traitement d'interrogation vocale, dispositif serveur d'exploitation de haut-parleur intelligent et programme WO2020129419A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-237446 2018-12-19
JP2018237446A JP6555838B1 (ja) 2018-12-19 2018-12-19 音声問合せシステム、音声問合せ処理方法、スマートスピーカー運用サーバー装置、チャットボットポータルサーバー装置、およびプログラム。

Publications (1)

Publication Number Publication Date
WO2020129419A1 true WO2020129419A1 (fr) 2020-06-25

Family

ID=67539799

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/042493 WO2020129419A1 (fr) 2018-12-19 2019-10-30 Système et procédé de traitement d'interrogation vocale, dispositif serveur d'exploitation de haut-parleur intelligent et programme

Country Status (3)

Country Link
JP (1) JP6555838B1 (fr)
TW (1) TW202028992A (fr)
WO (1) WO2020129419A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410576A (zh) * 2022-08-26 2022-11-29 国网河南省电力公司信息通信公司 一种智能客服系统及智能客服机器人
WO2023239779A1 (fr) * 2022-06-08 2023-12-14 Liveperson, Inc. Messagerie d'attente basée sur l'ia

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0133614B2 (fr) * 1982-01-27 1989-07-14 Purepai Kogyo Kk
JP2003295890A (ja) * 2002-04-04 2003-10-15 Nec Corp 音声認識対話選択装置、音声認識対話システム、音声認識対話選択方法、プログラム
JP2006039120A (ja) * 2004-07-26 2006-02-09 Sony Corp 対話装置および対話方法、並びにプログラムおよび記録媒体
US20180025085A1 (en) * 2016-07-19 2018-01-25 Microsoft Technology Licensing, Llc Systems and methods for responding to an online user query
JP2018041124A (ja) * 2016-09-05 2018-03-15 株式会社Nextremer 対話制御装置、対話エンジン、管理端末、対話装置、対話制御方法、対話方法、およびプログラム
WO2018088355A1 (fr) * 2016-11-08 2018-05-17 国立研究開発法人情報通信研究機構 Système d'interaction vocale, dispositif d'interaction vocale, terminal utilisateur et procédé d'interaction vocale
JP2018081444A (ja) * 2016-11-15 2018-05-24 ソフトバンク株式会社 ユーザーサポートシステム、ユーザーサポートプログラム及びユーザーサポート方法
JP2018189984A (ja) * 2013-06-19 2018-11-29 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America 音声対話方法、及び、音声対話エージェントサーバ
JP2019074865A (ja) * 2017-10-13 2019-05-16 ロボットスタート株式会社 会話収集装置、会話収集システム及び会話収集方法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3734427B2 (ja) * 2001-03-30 2006-01-11 株式会社ジャストシステム 番組関連情報出力装置
EP2612261B1 (fr) * 2010-09-08 2018-11-07 Nuance Communications, Inc. Procédés et appareils associés aux recherches sur internet
JP6433614B1 (ja) * 2018-04-16 2018-12-05 Jeインターナショナル株式会社 チャットボット検索システムおよびプログラム

Also Published As

Publication number Publication date
JP2020098308A (ja) 2020-06-25
JP6555838B1 (ja) 2019-08-07
TW202028992A (zh) 2020-08-01

Similar Documents

Publication Publication Date Title
US10455091B1 (en) User input driven short message service (SMS) applications
KR102146884B1 (ko) 채팅 시스템, 채팅봇 서버 장치, 채팅봇 id 관리장치, 채팅 중개 서버 장치, 프로그램, 채팅 방법 및 채팅 중개 방법
US11682393B2 (en) Method and system for context association and personalization using a wake-word in virtual personal assistants
US9020107B2 (en) Performing actions for users based on spoken information
CN105284107A (zh) 用于提供交互式广告的设备、系统、方法和计算机可读介质
CN107835444A (zh) 信息交互方法、装置、音频终端及计算机可读存储介质
CN103631853B (zh) 基于相关性的语音搜索和响应
CN107748500A (zh) 用于控制智能设备的方法和装置
CN101023658A (zh) 伴有通话的推送型信息通信系统
US20100112991A1 (en) Ambient sound detection and recognition method
CN108648756A (zh) 语音交互方法、装置和系统
CN108932946A (zh) 客需服务的语音交互方法和装置
JP5441455B2 (ja) ネットワーク基盤のサービス提供システム
CN102292766A (zh) 用于提供用于语音识别自适应的复合模型的方法、装置和计算机程序产品
WO2020129419A1 (fr) Système et procédé de traitement d'interrogation vocale, dispositif serveur d'exploitation de haut-parleur intelligent et programme
JPWO2019073669A1 (ja) 情報処理装置、情報処理方法、及びプログラム
CN116701601A (zh) 人机交互的方法
JP2019057093A (ja) 情報処理装置及びプログラム
KR20160047244A (ko) 통번역 서비스 제공 방법, 휴대 단말 및 컴퓨터 판독 가능 매체
CN112447179A (zh) 一种语音交互方法、装置、设备及计算机可读存储介质
KR20200063646A (ko) 음성 광고 제공 방법 및 시스템
JP2019204271A (ja) オペレータ支援装置、オペレータ支援システム、及びプログラム
WO2019176018A1 (fr) Système de haut-parleurs à intelligence artificielle, procédé de commande d'un système de haut-parleurs à intelligence artificielle, et programme
KR20220058745A (ko) 대용어를 포함하는 텍스트에 관한 보이스 어시스턴트 서비스를 제공하는 시스템 및 방법
CN109981770A (zh) 用于推送信息的方法、装置和系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19897600; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19897600; Country of ref document: EP; Kind code of ref document: A1)