WO2019026314A1 - Information processing apparatus, speech recognition system, and information processing method
- Publication number
- WO2019026314A1 (PCT/JP2018/003522)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- control unit
- activation word
- vpa
- word
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/162—Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- the present disclosure relates to an information processing apparatus used for a speech recognition system.
- Patent Document 1 discloses an information processing apparatus that predicts a user's speech when an activation word is detected.
- When main speech recognition is performed by a cloud server, the information processing apparatus starts transmission of a speech signal to the speech recognition server upon recognizing an activation word, for example.
- the present disclosure provides an information processing apparatus capable of selectively transmitting voice signals to a plurality of voice recognition servers.
- An information processing apparatus includes: a voice acquisition unit configured to acquire a user's voice; a first control unit that, when the voice acquired by the voice acquisition unit is recognized as a first activation word, outputs an audio signal corresponding to the first activation word; and a second control unit that, when it recognizes that the audio signal output by the first control unit indicates the first activation word, starts a first voice transmission process of transmitting the audio signal of the voice acquired by the voice acquisition unit to a first voice recognition server.
- When the voice acquired by the voice acquisition unit during the first voice transmission process is recognized as a second activation word, the first control unit determines, based on a predetermined priority, whether or not to output an audio signal corresponding to the second activation word to the second control unit.
- The second voice transmission process is a process of transmitting the audio signal of the voice acquired by the voice acquisition unit to a second voice recognition server different from the first voice recognition server.
- the information processing apparatus of the present disclosure can selectively transmit voice signals to a plurality of voice recognition servers.
- FIG. 1 is a diagram for explaining the function of the smart speaker.
- FIG. 2 is a diagram for explaining control of a home appliance using a smart speaker.
- FIG. 3 is a diagram showing the relationship between services and activation words.
- FIG. 4 is a diagram for describing a case where a user uses a smartphone to call a service.
- FIG. 5 is a block diagram showing the configuration of the speech recognition system according to the first embodiment.
- FIG. 6 is a flowchart of the operation of the speech recognition system according to the first embodiment.
- FIG. 7 is a diagram showing the relationship between the service and the activation word in the first embodiment.
- FIG. 8 is a flowchart of the operation of the speech recognition system according to the second embodiment.
- VPA: Virtual Personal Assistant
- The user can receive weather forecast information from the cloud server 132, which provides a weather forecast service, by speaking "today's weather" to the smart speaker 110.
- In response, the smart speaker 110 outputs a voice response such as "fine."
- If the user's purchase history is stored in the cloud server 133, which provides an e-commerce site, the user can purchase the same product again by uttering "buy the same item" to the smart speaker 110.
- Such a variety of interactive functions through the smart speaker 110 is realized by the microphone provided in the smart speaker 110, placed in a house or the like, recording the user's voice, television audio, radio audio, and so on, and transferring the recording to the VPA cloud server 120.
- The VPA cloud server 120 converts an audio signal into text with a speech recognition function (ASR: automatic speech recognition) and converts the text into a machine language with a natural language processing function (NLP: natural language processing). Furthermore, the VPA cloud server 120 resolves the machine language into a meaning within a specific context with a context understanding function, and finally adapts the instruction content to each user's information with a personalization function.
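The four-stage conversion described above (ASR, natural language processing, context understanding, personalization) can be sketched as a simple pipeline. Every stage implementation below is an illustrative stand-in: the function names, the intent dictionary, and the device list are assumptions for the sketch, not the patented method.

```python
def asr(audio_signal: bytes) -> str:
    """Speech recognition (ASR): convert an audio signal to text (stubbed)."""
    return audio_signal.decode("utf-8")  # stand-in for a real recognizer

def nlp(text: str) -> dict:
    """Natural language processing: map text to a machine-readable intent."""
    if "going out" in text:
        return {"intent": "set_mode", "mode": "going_out"}
    return {"intent": "unknown"}

def understand_context(intent: dict) -> dict:
    """Context understanding: resolve the intent into a concrete command."""
    if intent.get("mode") == "going_out":
        return {"command": "turn_off_electric_devices"}
    return {"command": "none"}

def personalize(command: dict, user_devices: list) -> list:
    """Personalization: expand the command using per-user device information."""
    if command["command"] == "turn_off_electric_devices":
        return [{"device": d, "action": "off"} for d in user_devices]
    return []

def vpa_pipeline(audio_signal: bytes, user_devices: list) -> list:
    """Chain the four stages, as the VPA cloud server 120 does conceptually."""
    return personalize(understand_context(nlp(asr(audio_signal))), user_devices)
```

With this sketch, the going-out utterance of FIG. 2 fans out into one "off" command per air conditioner registered for the user.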
- the VPA cloud server 120 can call the third party cloud server group 130 by transmitting such an instruction content as a command.
- FIG. 2 is a diagram for explaining control of a home appliance using the smart speaker 110.
- an air conditioner group 150 including air conditioners a to d is illustrated as devices to be controlled.
- the voice signal of this voice is transmitted to the VPA cloud server 120.
- the VPA cloud server 120 converts the speech signal into text by the speech recognition function, and converts the speech signal into a machine language instructing setting of the outing mode by the natural language processing function.
- the VPA cloud server 120 converts the machine language instructing setting of the going-out mode into a command to turn off the electric device in the user's home by the context understanding function, and the four air conditioners based on the user information by the personalized function. Convert to a command to turn off.
- In this way, the four air conditioners a to d are turned off by the user uttering, for example, "Please set it, as I'm going out." That is, home appliance control using the VPA realizes functionality beyond turning off the individual air conditioners a to d with a remote controller.
- The user can also activate the VPA by operating a button provided on the smartphone or by performing an operation such as touching an icon displayed on the smartphone.
- the activation word specified by the VPA service provider will be used. Assuming that the VPA service provider is company B, the activation word is, for example, "company B" or "Hey B company”.
- a user who recognizes that there are two service systems may purchase and use a VPA device such as the smart speaker 110 manufactured and sold by Company A.
- Using the activation word specified by the VPA service provider is similar to the user running an application on a smartphone. It therefore feels natural to the user, with no sense of discomfort.
- The user may misunderstand that the provider of the home appliance control service is not Company A but the VPA service provider (that is, Company B). If the home appliance control service becomes unavailable due to a fault in the home appliance control server 140, the user may regard it as a problem of the VPA service provider and call that provider's telephone consultation center. Thus, when the activation word specified by the VPA service provider is used, there is also the problem that the user may not understand who the service provider (that is, the party responsible for the service) is.
- FIG. 5 is a block diagram showing the configuration of the speech recognition system according to the first embodiment.
- the smart speaker 110 includes an information processing apparatus 10 that transmits an audio signal to the VPA cloud server 120 as an audio user interface.
- the information processing apparatus 10 includes a voice acquisition unit 11, a first control unit 12, a second control unit 13, a communication unit 14, a sound output unit 15, and a storage unit 16.
- When the second control unit 13 recognizes that the audio signal output by the first control unit 12 indicates the second activation word, the second control unit 13 starts a process for starting transmission of the voice signal of the voice acquired by the voice acquisition unit 11 to the VPA cloud server 120. Specifically, the second control unit 13 executes the VPA SDK stored in the storage unit 16.
- the second control unit 13 is realized by, for example, a microcomputer, but may be realized by a processor.
- the communication unit 14 transmits an audio signal to the communication unit 121 of the VPA cloud server 120 based on the control of the second control unit 13.
- the communication unit 14 is a communication module.
- the communication module is, in other words, a communication circuit.
- the communication unit 14 may perform wired communication or wireless communication.
- a relay device such as a broadband router and a communication network such as the Internet intervene between the communication unit 14 and the communication unit 121.
- the sound output unit 15 outputs sound based on the control of the second control unit 13.
- the sound output unit 15 outputs, for example, music transferred from the cloud server 131 providing the audio streaming service to the communication unit 14.
- the sound output unit 15 is specifically a speaker.
- The storage unit 16 is a storage device that stores a voice recognition program that the first control unit 12 executes to recognize the first activation word and the second activation word, and the VPA SDK that the second control unit 13 executes.
- the storage unit 16 may store audio data read out by the first control unit 12 to output an audio signal corresponding to the first activation word or the second activation word.
- the storage unit 16 may be used as a buffer memory in which the voice acquired by the voice acquisition unit 11 is temporarily stored as voice data.
- the storage unit 16 is realized by a semiconductor memory or the like.
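The buffer-memory role described for the storage unit 16 can be illustrated with a minimal fixed-size frame buffer: the most recent audio frames are held until the first control unit hands them to the second control unit. The class name, frame representation, and capacity are hypothetical; a real implementation would hold encoded audio in semiconductor memory.

```python
from collections import deque

class AudioBuffer:
    """Illustrative sketch of the storage unit used as a buffer memory:
    temporarily holds the most recent audio frames acquired by the
    voice acquisition unit."""

    def __init__(self, max_frames: int):
        # Bounded deque: when full, the oldest frame is discarded on append.
        self.frames = deque(maxlen=max_frames)

    def store(self, frame: bytes):
        """Temporarily store one acquired audio frame."""
        self.frames.append(frame)

    def drain(self) -> list:
        """Hand all buffered frames to the next stage and clear the buffer."""
        out = list(self.frames)
        self.frames.clear()
        return out
```

The bounded capacity reflects that the buffer is transient working storage, not a recording of everything the microphone hears.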
- the VPA cloud server 120 receives the voice signal of the voice acquired by the voice acquisition unit 11 after activation of the smart speaker 110 (after the VPA function is turned on), and performs voice recognition processing on the received voice signal.
- the VPA cloud server 120 is an example of a speech recognition server.
- the VPA cloud server 120 includes a communication unit 121, a VPA control unit 122, and a storage unit 123.
- the VPA control unit 122 performs voice recognition processing on the voice signal received by the communication unit 121, and causes the communication unit 121 to transmit a command obtained as a result of the voice recognition processing.
- In the voice recognition processing, a speech recognition function, a natural language processing function, a context understanding function, a personalization function, and the like are used.
- the VPA control unit 122 is realized by, for example, a microcomputer, but may be realized by a processor.
- The communication unit 141 receives the command transmitted by the communication unit 121 of the VPA cloud server 120. Further, the communication unit 141 transmits a control signal to the air conditioner group 150 based on the control of the home appliance control unit 142. Specifically, the communication unit 141 is a communication module.
- the communication module is, in other words, a communication circuit.
- the home appliance control unit 142 causes the communication unit 141 to transmit a control signal according to the command received by the communication unit 141.
- the home appliance control unit 142 is realized by, for example, a microcomputer, but may be realized by a processor.
- the storage unit 143 is a storage device in which a control program or the like for the home appliance control unit 142 to control the air conditioner group 150 is stored.
- storage unit 143 is realized by a semiconductor memory or the like.
- the second control unit 13 of the smart speaker 110 performs an initialization process (S11).
- the initialization process is performed, for example, when power supply to the smart speaker 110 is started.
- Through the initialization process, the smart speaker 110 enters a standby state in which the first activation word and the second activation word can be recognized. In the standby state, transmission of the audio signal to the VPA cloud server 120 is stopped.
- the second activation word is an activation word specified by the VPA service provider (that is, company B).
- the second activation word is, for example, "company B", “Hey B company” or the like.
- When the first control unit 12 determines that the voice acquired by the voice acquisition unit 11 is not the first activation word (No in S13), it determines whether or not the voice acquired by the voice acquisition unit 11 is the second activation word (S15).
- the activation process is a process for starting transmission of the voice signal of the voice acquired by the voice acquisition unit 11 to the VPA cloud server 120, and as a result, the VPA function is turned on.
- The voice acquisition unit 11 continues to acquire voice even after the activation process (S18), and the second control unit 13 causes the communication unit 14 to transmit the audio signal of the voice acquired after the activation process to the VPA cloud server 120 in real time (S19).
- The communication unit 121 of the VPA cloud server 120 receives the audio signal from the communication unit 14 and performs voice recognition processing on the received audio signal (S20). As a result, various services are provided to the user according to the voice acquired in step S18.
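The flow from initialization (S11) through standby, activation-word recognition (S13/S15), the activation process (S17), and real-time transmission (S18/S19) can be sketched as a small state machine. The class, the example activation words, and the string representation of voice are illustrative assumptions only.

```python
class SmartSpeaker:
    """Illustrative sketch of the S11-S19 flow of the first embodiment."""

    FIRST_WORD = "company A"   # example word specified by the hardware provider
    SECOND_WORD = "company B"  # example word specified by the VPA service provider

    def __init__(self):
        self.initialize()

    def initialize(self):
        """S11: initialization process; transmission to the cloud is stopped."""
        self.state = "standby"
        self.transmitted = []  # stand-in for audio sent to the VPA cloud server

    def on_voice(self, utterance: str):
        """Handle one acquired utterance according to the current state."""
        if self.state == "standby":
            # S13/S15: determine whether the voice is an activation word.
            if utterance in (self.FIRST_WORD, self.SECOND_WORD):
                self.state = "active"  # S17: activation process (VPA turned on)
        else:
            # S18/S19: transmit subsequent voice to the cloud in real time.
            self.transmitted.append(utterance)
```

Note that, matching the description, nothing is transmitted while in standby; transmission starts only after an activation word is recognized.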
- the activation word may be designated by the user, and for example, as shown in FIG. 7, the name of the user's pet may be used.
- the storage unit 16 stores a voice recognition program for making the activation word a word designated by the user based on the voice of the user.
- The user first speaks "Company A," the first activation word, to activate the smart speaker 110, then speaks "Company A" again as the start-up word, and further utters "going-out mode" as the command content. In other words, the user needs to speak "Company A" twice.
- Such a function activation word is hereinafter also referred to as a designated word.
- the VPA cloud server 120 transmits a command to another server according to the result of speech recognition of the speech signal received from the smart speaker 110 (that is, the information processing apparatus 10) after the activation process.
- the designated word is a word for designating the server to which this command is to be sent.
- the first control unit 12 outputs an audio signal corresponding to the second activation word to the second control unit 13 (S16), and the second control unit 13 recognizes this as the second activation word.
- the activation process is performed (S17).
- When the first control unit 12 determines that the designated word transmission mode is on (Yes in S22), the first control unit 12 reads the audio data corresponding to the designated word stored in advance in the storage unit 16 and outputs an audio signal corresponding to the designated word to the second control unit 13. Then, the second control unit 13 causes the communication unit 14 to transmit the audio signal corresponding to the designated word to the VPA cloud server 120 (S23).
- When the designated word transmission mode is off (No in S22), step S23 is omitted.
- This is useful when the user utters the second activation word to receive the VPA-provided service and utters the first activation word to receive the home appliance control service.
- the designated word is, for example, "company A” which is the same as the first activation word. That is, the first control unit 12 outputs an audio signal corresponding to the first activation word as an audio signal corresponding to the designated word.
- the designated word and the first activation word may be different. For example, based on FIG. 9, the first activation word may be "company A" and the designated word may be "television.”
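The designated-word transmission mode (S22/S23) can be sketched as follows: when the mode is on, a pre-stored audio signal of the designated word is sent ahead of the user's speech, so the user need not utter the activation word a second time. The class name and the string stand-in for audio signals are hypothetical.

```python
class DesignatedWordSender:
    """Illustrative sketch of steps S22/S23 of the second embodiment."""

    def __init__(self, designated_word_audio: str, mode_on: bool):
        # Audio data for the designated word, stored in advance (storage unit 16).
        self.designated_word_audio = designated_word_audio
        self.mode_on = mode_on
        self.outbox = []  # stand-in for signals sent via the communication unit

    def send_command(self, user_speech: str):
        """Transmit the user's command, prepending the designated word if needed."""
        if self.mode_on:  # Yes in S22
            # S23: send the stored designated-word signal ahead of the speech.
            self.outbox.append(self.designated_word_audio)
        # S19: the user's subsequent speech is transmitted in real time.
        self.outbox.append(user_speech)
```

When the mode is off, only the user's speech is transmitted, i.e. step S23 is skipped.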
- the voice recognition system 100a includes a washing machine 170, a VPA cloud server 120, a home appliance control server 140, and a washing machine group 180.
- the washing machine 170 is installed in the user's house or the like, and is also included in the washing machine group 180.
- the washing machine 170 includes the information processing apparatus 10 in addition to the washing control unit 20 for realizing the washing function. That is, the washing machine 170 is a home appliance corresponding to the VPA.
- the washing machine 170 is, for example, a home appliance manufactured and sold by Company A.
- the home appliance control server 140 transmits a completion message to the smartphone 160 of the user.
- the home appliance control service related to the washing machine 170 is mainly received.
- The third party cloud server group 130 is not included in the speech recognition system 100a. Therefore, using the second activation word (for example, "Company B" or "Hey Company B") specified by the VPA service provider as the activation word for the washing machine 170 manufactured and sold by Company A is unnatural.
- FIG. 11 is a flowchart of the operation of such a speech recognition system 100a.
- step S15 shown in the flowchart of FIG. 6 is omitted.
- the voice acquisition unit 11 performs voice acquisition (S11).
- While the VPA function can be turned on by the first activation word, the VPA function cannot be turned on by the second activation word.
- the change of the activation word from the second activation word specified by the VPA service provider to the first activation word specified by the hardware provider is realized.
- Embodiment 4: In the first to third embodiments, the information processing apparatus 10 can connect to only one VPA cloud server 120, but the information processing apparatus 10 may be connectable to a plurality of VPA cloud servers.
- FIG. 12 is a block diagram showing the configuration of a speech recognition system 100b according to the fourth embodiment.
- The voice recognition system 100b includes a smart speaker 110b, a VPA cloud server 120b, a VPA cloud server 120c, the third party cloud server group 130, the home appliance control server 140, and the air conditioner group 150.
- The smart speaker 110b includes an information processing apparatus 10b that transmits audio signals to the VPA cloud server 120b and the VPA cloud server 120c as an audio user interface.
- the information processing apparatus 10b includes a voice acquisition unit 11, a first control unit 12b, a second control unit 13b, a communication unit 14b, a sound output unit 15, and a storage unit 16b.
- the definitions of the first activation word and the second activation word are different from those of the first to third embodiments.
- The first activation word is an activation word for connecting the smart speaker 110b to the VPA cloud server 120b, and the second activation word is an activation word for connecting the smart speaker 110b to the VPA cloud server 120c.
- The first control unit 12b is located between the voice acquisition unit 11 and the second control unit 13b, and constantly monitors the user's voice acquired by the voice acquisition unit 11. For example, when the voice acquired by the voice acquisition unit 11 is recognized as the first activation word, the first control unit 12b outputs an audio signal corresponding to the first activation word to the second control unit 13b. When the first control unit 12b recognizes that the voice acquired by the voice acquisition unit 11 is the second activation word, the first control unit 12b outputs an audio signal corresponding to the second activation word to the second control unit 13b. For example, the first control unit 12b temporarily stores the acquired voice signal in the storage unit 16b and outputs the stored voice signal to the second control unit 13b.
- The first control unit 12b is realized by, for example, a microcomputer, but may be realized by a processor.
- When the second control unit 13b recognizes that the audio signal output by the first control unit 12b indicates the first activation word, the second control unit 13b starts a first voice transmission process of transmitting the audio signal of the voice acquired by the voice acquisition unit 11 to the VPA cloud server 120b. Specifically, the second control unit 13b executes the VPA SDK-B stored in the storage unit 16b.
- The VPA SDK-B is provided by Company B, which provides a speech recognition service using the VPA cloud server 120b.
- When the second control unit 13b recognizes that the audio signal output by the first control unit 12b indicates the second activation word, the second control unit 13b starts a second voice transmission process of transmitting the audio signal of the voice acquired by the voice acquisition unit 11 to the VPA cloud server 120c. Specifically, the second control unit 13b executes the VPA SDK-C stored in the storage unit 16b.
- the VPA SDK-C is provided by Company C that provides speech recognition service using the VPA cloud server 120c.
- The second control unit 13b is realized by, for example, a microcomputer, but may be realized by a processor.
- The communication unit 14b transmits an audio signal to the VPA cloud server 120b based on the control of the second control unit 13b during the first voice transmission process, and transmits an audio signal to the VPA cloud server 120c based on the control of the second control unit 13b during the second voice transmission process.
- the communication unit 14b is specifically a communication module.
- the communication module is, in other words, a communication circuit.
- The storage unit 16b is a storage device that stores a voice recognition program that the first control unit 12b executes to recognize the first activation word and the second activation word, and the VPA SDK-B and VPA SDK-C that the second control unit 13b executes. Priority information, described later, is also stored in the storage unit 16b.
- The storage unit 16b may be used as a buffer memory in which the audio signal of the voice acquired by the voice acquisition unit 11 is temporarily stored. Specifically, the storage unit 16b is realized by a semiconductor memory or the like.
- the VPA cloud server 120b receives the voice signal of the voice acquired by the voice acquisition unit 11 during the first voice transmission process, and implements the VPA providing service by performing voice recognition processing on the received voice signal.
- The VPA cloud server 120b is an example of a first speech recognition server.
- The specific configuration of the VPA cloud server 120b is similar to that of the VPA cloud server 120.
- the VPA cloud server 120c receives the audio signal of the voice acquired by the voice acquisition unit 11 during the second voice transmission processing, and implements the home appliance control service by performing voice recognition processing on the received voice signal.
- the VPA cloud server 120c is an example of a second speech recognition server.
- the specific configuration of the VPA cloud server 120c is similar to that of the VPA cloud server 120.
- the division of roles of the two VPA cloud servers is clarified.
- the user may utter the first activation word if he wishes to receive the VPA provided service, and may utter the second activation word if he wishes to receive the home appliance control service. Therefore, the user is prevented from being confused by the activation word.
- In the speech recognition system 100b, the user may want to switch to the other transmission process while one of the first voice transmission process and the second voice transmission process is being performed.
- a word for switching voice transmission processing, a switching button for voice transmission processing, or the like is prepared.
- the first control unit 12b may control switching from one of the first voice transmission process and the second voice transmission process to the other according to a predetermined priority.
- FIG. 13 is a flowchart of the operation of such a speech recognition system 100b. Although it is determined in the flowchart of FIG. 13 whether or not to switch to the second voice transmission process during the first voice transmission process, it is also possible to determine whether to switch to the first voice transmission process during the second voice transmission process. The same operation is performed.
- the second control unit 13b performs a first voice transmission process (S31).
- the first voice transmission process is a process in which the voice signal of the voice acquired by the voice acquisition unit 11 is transmitted to the VPA cloud server 120b in real time.
- the second control unit 13b starts the first sound transmission process when it recognizes that the sound signal output by the first control unit 12b indicates the first activation word.
- When it is determined in step S32 that the audio signal indicates the second activation word (Yes in S33), that is, when the first control unit 12b recognizes that the voice is the second activation word, the first control unit 12b makes a determination based on the priority (S34).
- the priority is stored in advance in the storage unit 16b as priority information, and in step S34, the first control unit 12b refers to the priority information stored in the storage unit 16b.
- the priority is set, for example, for each VPA cloud server to which the smart speaker 110b is connected (in other words, for each activation word). In this case, the first control unit 12b determines whether the first priority of the VPA cloud server 120b is lower than the second priority of the VPA cloud server 120c.
- the priority may be determined for each service provided to the user. Priorities may be defined, for example, for audio streaming services, weather forecasting services, e-commerce services, and home appliance control services.
- the first control unit 12b determines that the first priority of the service provided as a result of the first voice transmission process is lower than the second priority of the service provided as a result of the second voice transmission process. (Yes in S34), an audio signal corresponding to the second activation word is output to the second control unit 13b (S35). In addition, when the priority of the first service is higher than the priority of the second service, the first control unit 12b does not output an audio signal corresponding to the second activation word to the second control unit 13b. As a result, the first voice transmission process is continued (S31).
- the first control unit 12b can recognize which service is currently being provided because the communication unit 14b receives information indicating the service content from the third party cloud server group 130 or the like that provides the service. Such information is unnecessary when the service content and the VPA cloud server have a one-to-one relationship.
- FIG. 14 is a simplified block diagram showing the configuration of a speech recognition system 100c according to the fifth embodiment.
- the voice recognition system 100c includes a smart speaker 110, a television 190, a washing machine 170, a VPA cloud server 120b, a VPA cloud server 120c, a third party cloud server group 130, a home appliance control server 140, an air conditioner group 150, a home appliance control server 200, a television group 210, and a washing machine group 180.
- the smart speaker 110, the television 190, and the washing machine 170 are home appliances manufactured and sold by the company A, and are installed in a user's home or the like.
- the smart speaker 110 includes the information processing apparatus 10 having a start word conversion function.
- the storage unit 16 included in the information processing apparatus 10 stores VPA SDK-B supplied from Company B, which is a VPA service provider that provides a voice recognition service using the VPA cloud server 120b. That is, the smart speaker 110 can be connected to the VPA cloud server 120b.
- the television 190 includes an information processing apparatus 10 having a television function and an activation word conversion function.
- the storage unit 16 included in the information processing apparatus 10 stores VPA SDK-C supplied from Company C, which is a VPA service provider that provides a voice recognition service using the VPA cloud server 120c. That is, the television 190 can be connected to the VPA cloud server 120c.
- the washing machine 170 includes an information processing apparatus 10 having a washing function and an activation word conversion function.
- the storage unit 16 included in the information processing apparatus 10 stores VPA SDK-C supplied from Company C, which is a VPA service provider that provides a voice recognition service using the VPA cloud server 120c. That is, the washing machine 170 can be connected to the VPA cloud server 120c.
- the VPA cloud server 120b is managed by company B and can be connected to the third party cloud server group 130 and the home appliance control server 140.
- the home appliance control server 140 has a function of controlling the air conditioner group 150, and is managed by company A.
- to turn on the VPA function of the smart speaker 110, the user needs to speak the activation word designated by company B, and to turn on the VPA function of the television 190 and the washing machine 170, the user needs to speak the activation word designated by company C.
- with the activation word conversion function, the user can unify the activation word for the smart speaker 110, the television 190, and the washing machine 170.
- the activation word may be unified to the activation word specified by company B, may be unified to the activation word specified by company C, or may be unified to other activation words.
- FIG. 15 is a simplified block diagram showing the configuration of a speech recognition system 100d according to the sixth embodiment.
- the user can unify the activation words for the smart speaker 110, the television 190, and the washing machine 170 into either the activation word specified by company B or the activation word specified by company C.
- the information processing apparatus 10 includes the voice acquisition unit 11 that acquires the voice of the user, the first control unit 12 that outputs a voice signal corresponding to a second activation word different from a first activation word when it recognizes that the voice acquired by the voice acquisition unit 11 is the first activation word, and the second control unit 13 that performs an activation process for starting transmission of the voice signal of the voice acquired by the voice acquisition unit 11 to the VPA cloud server 120 when it recognizes that the voice signal output by the first control unit 12 indicates the second activation word.
- the VPA cloud server 120 is an example of a speech recognition server.
- Such an information processing apparatus 10 can start transmission of voice to the VPA cloud server 120 using the first activation word, which differs from the second activation word specified by the VPA service provider.
- the activation word can be unified.
- when the first control unit 12 recognizes that the voice acquired by the voice acquisition unit 11 is the second activation word, the first control unit 12 outputs the audio signal corresponding to the second activation word to the second control unit 13.
- Such an information processing apparatus 10 can start transmission of voice to the VPA cloud server 120 using the first activation word, which differs from the second activation word specified by the VPA service provider.
- when the first control unit 12 recognizes that the voice acquired by the voice acquisition unit 11 is the second activation word, the first control unit 12 does not output the audio signal corresponding to the second activation word to the second control unit 13.
- Such an information processing apparatus 10 can activate the speech recognition system 100a using only the first activation word among the first activation word and the second activation word.
- the VPA cloud server 120 transmits a command to another server according to the result of speech recognition of the speech signal received from the information processing apparatus 10 after the activation processing.
- when the first control unit 12 recognizes that the voice acquired by the voice acquisition unit 11 is the first activation word, the first control unit 12 outputs an audio signal corresponding to the second activation word and further outputs, to the second control unit 13, an audio signal corresponding to a designated word for designating the transmission destination of the command.
- the first control unit 12 outputs, to the second control unit 13, an audio signal corresponding to the first activation word as an audio signal corresponding to the designated word.
- the user can thus designate the transmission destination of the command without uttering the first activation word twice, as would normally be required.
- Such a speech recognition system 100 or speech recognition system 100a can start transmission of the voice signal to the VPA cloud server 120 using the first activation word, which differs from the second activation word specified by the VPA service provider.
- in the information processing method, the voice of the user is acquired; when the acquired voice is recognized as the first activation word, an audio signal corresponding to the second activation word different from the first activation word is output; and when the output audio signal is recognized as indicating the second activation word, an activation process for starting transmission of the voice signal of the acquired voice to the VPA cloud server is performed.
- Such an information processing method can start transmission of voice to the VPA cloud server 120 using the first activation word, which differs from the second activation word specified by the VPA service provider.
- the information processing apparatus 10b includes the voice acquisition unit 11 that acquires the voice of the user, the first control unit 12b that outputs an audio signal corresponding to the first activation word when it recognizes that the voice acquired by the voice acquisition unit 11 is the first activation word, and the second control unit 13b that starts a first voice transmission process of transmitting the voice signal of the voice acquired by the voice acquisition unit 11 to the VPA cloud server 120b when it recognizes that the audio signal output by the first control unit 12b indicates the first activation word.
- when the first control unit 12b recognizes that the voice acquired by the voice acquisition unit 11 during the first voice transmission process is a second activation word for causing the second control unit 13b to start the second voice transmission process, the first control unit 12b determines, based on a predetermined priority, whether to output an audio signal corresponding to the second activation word to the second control unit 13b.
- the VPA cloud server 120b is an example of a first speech recognition server
- the VPA cloud server 120c is an example of a second speech recognition server.
- Such an information processing apparatus 10b can recognize both the first activation word and the second activation word, and can selectively transmit voice to the VPA cloud server 120b and the VPA cloud server 120c. Specifically, the information processing apparatus 10b can switch the first sound transmission process to the second sound transmission process in consideration of a predetermined priority.
- Such an information processing apparatus 10b can switch the first voice transmission process to the second voice transmission process based on the priority of the VPA cloud server.
- based on the determination, the first control unit 12b does not output the audio signal corresponding to the second activation word to the second control unit 13b.
- Such an information processing apparatus 10b can switch the first voice transmission process to the second voice transmission process based on the priority of the service.
- based on the determination, the first control unit 12b does not output the audio signal corresponding to the second activation word to the second control unit 13b.
- Such an information processing apparatus 10b can continue the first voice transmission process based on the priority of the service.
- the comprehensive or specific aspects of the present disclosure may be realized by a device, a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or by any combination of a device, a system, a method, an integrated circuit, a computer program, and a recording medium.
- the present disclosure may also be realized as a program for causing a computer to execute the information processing method of the above embodiments, or as a computer-readable non-transitory recording medium on which such a program is recorded.
- another processing unit may execute the processing executed by a specific processing unit.
- the order of the plurality of processes in the operation of the speech recognition system described in the above embodiment is an example.
- the order of the plurality of processes may be changed, or the plurality of processes may be performed in parallel.
- the information processing apparatus of the present disclosure can selectively transmit an audio signal to a plurality of VPA cloud servers.
- the information processing apparatus of the present disclosure can easily switch the VPA cloud server as the connection destination, and thus can contribute to the spread of VPA devices and the spread of services using the VPA cloud server.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
- Computer And Data Communications (AREA)
Description
So-called VPA (Virtual Personal Assistance) services, which operate devices by voice, such as Alexa (registered trademark) provided by Amazon (registered trademark) of the United States, Google Assistant (registered trademark) provided by Google (registered trademark) of the United States, and Cortana (registered trademark) provided by Microsoft (registered trademark) of the United States, have begun to spread.
[Configuration]
The configuration of the speech recognition system according to Embodiment 1 is described below. FIG. 5 is a block diagram showing the configuration of the speech recognition system according to Embodiment 1.
The smart speaker 110 includes, as a voice user interface, the information processing apparatus 10 that transmits voice signals to the VPA cloud server 120. The information processing apparatus 10 includes a voice acquisition unit 11, a first control unit 12, a second control unit 13, a communication unit 14, a sound output unit 15, and a storage unit 16.
The VPA cloud server 120 receives the voice signal of the voice acquired by the voice acquisition unit 11 after the smart speaker 110 is activated (after the VPA function is turned on), and realizes a VPA-provided service or a home appliance control service by performing speech recognition processing on the received voice signal. The VPA cloud server 120 is an example of a speech recognition server. The VPA cloud server 120 includes a communication unit 121, a VPA control unit 122, and a storage unit 123.
The home appliance control server 140 receives commands from the VPA cloud server 120 and provides a home appliance control service to the user by controlling the air conditioner group 150 based on the received commands. The air conditioner group 150 is an example of home appliances to be controlled, and the home appliances to be controlled may be home appliances other than air conditioners. The home appliance control server 140 includes a communication unit 141, a home appliance control unit 142, and a storage unit 143.
Next, the operation of the speech recognition system 100 is described. FIG. 6 is a flowchart of the operation of the speech recognition system 100.
As shown in FIG. 7, to have the air conditioner operate in away mode, for example, the user utters "Company A", the first activation word, to activate the smart speaker 110, then utters "Company A" again as the function activation word, and further utters "away mode" as the command content. That is, the user must utter "Company A" twice.
The information processing apparatus 10 may be implemented in home appliances other than the smart speaker 110. For example, the information processing apparatus 10 may be implemented in a washing machine. FIG. 10 is a block diagram showing the configuration of such a speech recognition system according to Embodiment 3.
In Embodiments 1 to 3 above, the information processing apparatus 10 can connect to only one VPA cloud server 120; however, the information processing apparatus 10 may be connectable to a plurality of VPA cloud servers. FIG. 12 is a block diagram showing the configuration of such a speech recognition system 100b according to Embodiment 4.
According to the information processing apparatus 10 having the activation word conversion function described in Embodiments 1 to 3, the activation word can be unified in the speech recognition system 100c in which a plurality of VPA devices coexist, as shown in FIG. 14. FIG. 14 is a simplified block diagram showing the configuration of the speech recognition system 100c according to Embodiment 5.
According to the information processing apparatus 10b having the function of connecting to a plurality of VPA cloud servers described in Embodiment 4, the activation words can be organized in the speech recognition system 100d in which a plurality of VPA devices coexist, as shown in FIG. 15. FIG. 15 is a simplified block diagram showing the configuration of the speech recognition system 100d according to Embodiment 6.
As described above, the information processing apparatus 10 includes the voice acquisition unit 11 that acquires the voice of the user, the first control unit 12 that outputs a voice signal corresponding to a second activation word different from a first activation word when it recognizes that the voice acquired by the voice acquisition unit 11 is the first activation word, and the second control unit 13 that performs an activation process for starting transmission of the voice signal of the voice acquired by the voice acquisition unit 11 to the VPA cloud server 120 when it recognizes that the voice signal output by the first control unit 12 indicates the second activation word. The VPA cloud server 120 is an example of a speech recognition server.
Although the embodiments have been described above, the present disclosure is not limited to the above embodiments.
11 Voice acquisition unit
12, 12b First control unit
13, 13b Second control unit
14, 14b, 121, 141 Communication unit
15 Sound output unit
16, 16b, 123, 143 Storage unit
20 Washing control unit
100, 100a, 100b, 100c, 100d Speech recognition system
110, 110b Smart speaker
120, 120b, 120c VPA cloud server
122 VPA control unit
130 Third party cloud server group
131, 132, 133 Cloud server
140, 200 Home appliance control server
142 Home appliance control unit
150 Air conditioner group
160 Smartphone
170 Washing machine
180 Washing machine group
190 Television
210 Television group
Claims (7)
- An information processing apparatus comprising:
a voice acquisition unit that acquires a voice of a user;
a first control unit that outputs a voice signal corresponding to a first activation word when recognizing that the voice acquired by the voice acquisition unit is the first activation word; and
a second control unit that starts a first voice transmission process of transmitting a voice signal of the voice acquired by the voice acquisition unit to a first speech recognition server when recognizing that the voice signal output by the first control unit indicates the first activation word,
wherein, when recognizing that a voice acquired by the voice acquisition unit during the first voice transmission process is a second activation word for causing the second control unit to start a second voice transmission process, the first control unit determines, based on a predetermined priority, whether to output a voice signal corresponding to the second activation word to the second control unit, and
the second voice transmission process is a process of transmitting the voice signal of the voice acquired by the voice acquisition unit to a second speech recognition server different from the first speech recognition server.
- The information processing apparatus according to claim 1, wherein the first control unit outputs the voice signal corresponding to the second activation word to the second control unit based on the determination when the priority of the first speech recognition server is lower than the priority of the second speech recognition server.
- The information processing apparatus according to claim 2, wherein the first control unit does not output the voice signal corresponding to the second activation word to the second control unit based on the determination when the priority of the first speech recognition server is higher than the priority of the second speech recognition server.
- The information processing apparatus according to claim 1, wherein the first control unit outputs the voice signal corresponding to the second activation word to the second control unit based on the determination when the priority of a service provided as a result of the first voice transmission process is lower than the priority of a service provided as a result of the second voice transmission process.
- The information processing apparatus according to claim 4, wherein the first control unit does not output the voice signal corresponding to the second activation word to the second control unit based on the determination when the priority of the first service is higher than the priority of the second service.
- A speech recognition system comprising:
the information processing apparatus according to any one of claims 1 to 5;
the first speech recognition server; and
the second speech recognition server.
- An information processing method executed by a computer, the method comprising:
acquiring a voice of a user;
outputting a voice signal corresponding to a first activation word when recognizing that the acquired voice is the first activation word;
starting, when recognizing that the output voice signal indicates the first activation word, a first voice transmission process of transmitting a voice signal of the acquired voice to a first speech recognition server; and
determining, based on a predetermined priority, whether to output a voice signal corresponding to a second activation word when recognizing that a voice acquired during the first voice transmission process indicates the second activation word for starting a second voice transmission process,
wherein the second voice transmission process is a process of transmitting a voice signal of the acquired voice to a second speech recognition server different from the first speech recognition server.
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201880003037.6A CN109601016B (zh) | 2017-08-02 | 2018-02-02 | 信息处理装置、声音识别系统及信息处理方法 |
US16/325,844 US10803872B2 (en) | 2017-08-02 | 2018-02-02 | Information processing apparatus for transmitting speech signals selectively to a plurality of speech recognition servers, speech recognition system including the information processing apparatus, and information processing method |
BR112019002607A BR112019002607A2 (pt) | 2017-08-02 | 2018-02-02 | aparelho de processamento de informação, sistema de reconhecimento de fala e método de processamento de informação |
MX2019001807A MX2019001807A (es) | 2017-08-02 | 2018-02-02 | Aparato para procesamiento de informacion, sistema para reconocimiento de voz, y metodo para procesamiento de informacion. |
JP2018567322A JP7033713B2 (ja) | 2017-08-02 | 2018-02-02 | 情報処理装置、音声認識システム、及び、情報処理方法 |
SG11201901419QA SG11201901419QA (en) | 2017-08-02 | 2018-02-02 | Information processing apparatus, speech recognition system, and information processing method |
EP18842220.8A EP3663906B1 (en) | 2017-08-02 | 2018-02-02 | Information processing apparatus and information processing method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762540415P | 2017-08-02 | 2017-08-02 | |
US62/540415 | 2017-08-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019026314A1 true WO2019026314A1 (ja) | 2019-02-07 |
Family
ID=65232459
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/003522 WO2019026314A1 (ja) | 2017-08-02 | 2018-02-02 | 情報処理装置、音声認識システム、及び、情報処理方法 |
PCT/JP2018/003521 WO2019026313A1 (ja) | 2017-08-02 | 2018-02-02 | 情報処理装置、音声認識システム、及び、情報処理方法 |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/003521 WO2019026313A1 (ja) | 2017-08-02 | 2018-02-02 | 情報処理装置、音声認識システム、及び、情報処理方法 |
Country Status (8)
Country | Link |
---|---|
US (2) | US11145311B2 (ja) |
EP (2) | EP3663906B1 (ja) |
JP (2) | JP6928882B2 (ja) |
CN (2) | CN109601016B (ja) |
BR (2) | BR112019002607A2 (ja) |
MX (2) | MX2019001803A (ja) |
SG (2) | SG11201901419QA (ja) |
WO (2) | WO2019026314A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020194367A1 (ja) * | 2019-03-22 | 2020-10-01 | 三菱重工サーマルシステムズ株式会社 | 制御装置、機器制御システム、制御方法及びプログラム |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102498007B1 (ko) * | 2018-01-08 | 2023-02-08 | 엘지전자 주식회사 | 음성인식을 이용한 세탁물 처리기기 제어시스템 및 동작방법 |
US11501761B2 (en) * | 2019-04-05 | 2022-11-15 | Samsung Electronics Co., Ltd. | Method and apparatus for speech recognition |
JP7236919B2 (ja) * | 2019-04-12 | 2023-03-10 | 三菱電機株式会社 | 音声入力装置、音声操作システム、音声操作方法及びプログラム |
JP2020178177A (ja) * | 2019-04-16 | 2020-10-29 | シャープ株式会社 | ネットワークシステム |
CN110570859B (zh) * | 2019-09-20 | 2022-05-27 | Oppo广东移动通信有限公司 | 智能音箱控制方法、装置、系统及存储介质 |
JP7248564B2 (ja) * | 2019-12-05 | 2023-03-29 | Tvs Regza株式会社 | 情報処理装置及びプログラム |
JP7264071B2 (ja) * | 2020-01-23 | 2023-04-25 | トヨタ自動車株式会社 | 情報処理システム、情報処理装置、及びプログラム |
CN111353771A (zh) * | 2020-02-19 | 2020-06-30 | 北京声智科技有限公司 | 一种远程控制支付的方法、装置、设备和介质 |
CN111768783B (zh) | 2020-06-30 | 2024-04-02 | 北京百度网讯科技有限公司 | 语音交互控制方法、装置、电子设备、存储介质和系统 |
CN114726830A (zh) * | 2020-12-18 | 2022-07-08 | 阿里巴巴集团控股有限公司 | 语音服务访问方法、系统和车辆 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005031758A (ja) * | 2003-07-07 | 2005-02-03 | Canon Inc | 音声処理装置及び方法 |
JP2013064777A (ja) * | 2011-09-15 | 2013-04-11 | Ntt Docomo Inc | 端末装置、音声認識プログラム、音声認識方法および音声認識システム |
JP2016095383A (ja) * | 2014-11-14 | 2016-05-26 | 株式会社ATR−Trek | 音声認識クライアント装置及びサーバ型音声認識装置 |
JP2017138476A (ja) | 2016-02-03 | 2017-08-10 | ソニー株式会社 | 情報処理装置、情報処理方法、及びプログラム |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100719776B1 (ko) * | 2005-02-25 | 2007-05-18 | 에이디정보통신 주식회사 | 휴대형 코드인식 음성 합성출력장치 |
JP2009080183A (ja) * | 2007-09-25 | 2009-04-16 | Panasonic Electric Works Co Ltd | 音声認識制御装置 |
US9117449B2 (en) * | 2012-04-26 | 2015-08-25 | Nuance Communications, Inc. | Embedded system for construction of small footprint speech recognition with user-definable constraints |
US10381001B2 (en) * | 2012-10-30 | 2019-08-13 | Google Technology Holdings LLC | Voice control user interface during low-power mode |
JP2015011170A (ja) | 2013-06-28 | 2015-01-19 | 株式会社ATR−Trek | ローカルな音声認識を行なう音声認識クライアント装置 |
CN103383134B (zh) * | 2013-08-06 | 2016-12-28 | 四川长虹电器股份有限公司 | 一种智能空调***及空调控制方法 |
JP2016531375A (ja) * | 2013-09-20 | 2016-10-06 | アマゾン テクノロジーズ インコーポレイテッド | ローカルとリモートのスピーチ処理 |
US9508345B1 (en) * | 2013-09-24 | 2016-11-29 | Knowles Electronics, Llc | Continuous voice sensing |
CN105280180A (zh) * | 2014-06-11 | 2016-01-27 | 中兴通讯股份有限公司 | 一种终端控制方法、装置、语音控制装置及终端 |
US10339928B2 (en) * | 2014-10-24 | 2019-07-02 | Sony Interactive Entertainment Inc. | Control device, control method, program and information storage medium |
TWI525532B (zh) * | 2015-03-30 | 2016-03-11 | Yu-Wei Chen | Set the name of the person to wake up the name for voice manipulation |
US9996316B2 (en) * | 2015-09-28 | 2018-06-12 | Amazon Technologies, Inc. | Mediation of wakeword response for multiple devices |
JP2017117371A (ja) * | 2015-12-25 | 2017-06-29 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | 制御方法、制御装置およびプログラム |
US10133612B2 (en) * | 2016-03-17 | 2018-11-20 | Nuance Communications, Inc. | Session processing interaction between two or more virtual assistants |
US10115400B2 (en) * | 2016-08-05 | 2018-10-30 | Sonos, Inc. | Multiple voice services |
US10685656B2 (en) | 2016-08-31 | 2020-06-16 | Bose Corporation | Accessing multiple virtual personal assistants (VPA) from a single device |
US10437841B2 (en) | 2016-10-10 | 2019-10-08 | Microsoft Technology Licensing, Llc | Digital assistant extension automatic ranking and selection |
US10127908B1 (en) * | 2016-11-11 | 2018-11-13 | Amazon Technologies, Inc. | Connected accessory for a voice-controlled device |
US10559309B2 (en) * | 2016-12-22 | 2020-02-11 | Google Llc | Collaborative voice controlled devices |
US11164570B2 (en) * | 2017-01-17 | 2021-11-02 | Ford Global Technologies, Llc | Voice assistant tracking and activation |
US10694608B2 (en) * | 2017-02-07 | 2020-06-23 | Lutron Technology Company Llc | Audio-based load control system |
US10748531B2 (en) * | 2017-04-13 | 2020-08-18 | Harman International Industries, Incorporated | Management layer for multiple intelligent personal assistant services |
US20190013019A1 (en) * | 2017-07-10 | 2019-01-10 | Intel Corporation | Speaker command and key phrase management for muli -virtual assistant systems |
-
2018
- 2018-02-02 JP JP2018568454A patent/JP6928882B2/ja active Active
- 2018-02-02 US US16/325,793 patent/US11145311B2/en active Active
- 2018-02-02 SG SG11201901419QA patent/SG11201901419QA/en unknown
- 2018-02-02 MX MX2019001803A patent/MX2019001803A/es unknown
- 2018-02-02 CN CN201880003037.6A patent/CN109601016B/zh active Active
- 2018-02-02 BR BR112019002607A patent/BR112019002607A2/pt unknown
- 2018-02-02 MX MX2019001807A patent/MX2019001807A/es unknown
- 2018-02-02 SG SG11201901441QA patent/SG11201901441QA/en unknown
- 2018-02-02 EP EP18842220.8A patent/EP3663906B1/en active Active
- 2018-02-02 WO PCT/JP2018/003522 patent/WO2019026314A1/ja unknown
- 2018-02-02 US US16/325,844 patent/US10803872B2/en active Active
- 2018-02-02 WO PCT/JP2018/003521 patent/WO2019026313A1/ja unknown
- 2018-02-02 CN CN201880003041.2A patent/CN109601017B/zh active Active
- 2018-02-02 JP JP2018567322A patent/JP7033713B2/ja active Active
- 2018-02-02 BR BR112019002636A patent/BR112019002636A2/pt unknown
- 2018-02-02 EP EP18842080.6A patent/EP3663905B1/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005031758A (ja) * | 2003-07-07 | 2005-02-03 | Canon Inc | 音声処理装置及び方法 |
JP2013064777A (ja) * | 2011-09-15 | 2013-04-11 | Ntt Docomo Inc | 端末装置、音声認識プログラム、音声認識方法および音声認識システム |
JP2016095383A (ja) * | 2014-11-14 | 2016-05-26 | 株式会社ATR−Trek | 音声認識クライアント装置及びサーバ型音声認識装置 |
JP2017138476A (ja) | 2016-02-03 | 2017-08-10 | ソニー株式会社 | 情報処理装置、情報処理方法、及びプログラム |
Non-Patent Citations (1)
Title |
---|
See also references of EP3663906A4 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020194367A1 (ja) * | 2019-03-22 | 2020-10-01 | 三菱重工サーマルシステムズ株式会社 | 制御装置、機器制御システム、制御方法及びプログラム |
JPWO2020194367A1 (ja) * | 2019-03-22 | 2020-10-01 | ||
JP7412414B2 (ja) | 2019-03-22 | 2024-01-12 | 三菱重工サーマルシステムズ株式会社 | 制御装置、機器制御システム、制御方法及びプログラム |
Also Published As
Publication number | Publication date |
---|---|
JPWO2019026313A1 (ja) | 2020-05-28 |
EP3663905B1 (en) | 2020-12-09 |
US20190187953A1 (en) | 2019-06-20 |
JP7033713B2 (ja) | 2022-03-11 |
SG11201901419QA (en) | 2019-03-28 |
US10803872B2 (en) | 2020-10-13 |
JP6928882B2 (ja) | 2021-09-01 |
MX2019001803A (es) | 2019-07-04 |
CN109601016B (zh) | 2023-07-28 |
SG11201901441QA (en) | 2019-03-28 |
EP3663906A1 (en) | 2020-06-10 |
CN109601016A (zh) | 2019-04-09 |
US20190214015A1 (en) | 2019-07-11 |
WO2019026313A1 (ja) | 2019-02-07 |
BR112019002607A2 (pt) | 2019-05-28 |
CN109601017B (zh) | 2024-05-03 |
EP3663906A4 (en) | 2020-07-22 |
BR112019002636A2 (pt) | 2019-05-28 |
CN109601017A (zh) | 2019-04-09 |
JPWO2019026314A1 (ja) | 2020-06-18 |
US11145311B2 (en) | 2021-10-12 |
EP3663906B1 (en) | 2024-04-03 |
EP3663905A4 (en) | 2020-06-17 |
EP3663905A1 (en) | 2020-06-10 |
MX2019001807A (es) | 2019-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7033713B2 (ja) | 情報処理装置、音声認識システム、及び、情報処理方法 | |
JP6942755B2 (ja) | スマート音声機器間のインタラクション方法、装置、機器及び記憶媒体 | |
US11250859B2 (en) | Accessing multiple virtual personal assistants (VPA) from a single device | |
US10115396B2 (en) | Content streaming system | |
US20190304448A1 (en) | Audio playback device and voice control method thereof | |
US10055190B2 (en) | Attribute-based audio channel arbitration | |
WO2014203495A1 (ja) | 音声対話方法、及び機器 | |
JP5753212B2 (ja) | 音声認識システム、サーバ、および音声処理装置 | |
JP7311707B2 (ja) | ヒューマンマシン対話処理方法 | |
JP6619488B2 (ja) | 人工知能機器における連続会話機能 | |
CN111263962A (zh) | 信息处理设备和信息处理方法 | |
US10147426B1 (en) | Method and device to select an audio output circuit based on priority attributes | |
WO2020135773A1 (zh) | 数据处理方法、装置及计算机可读存储介质 | |
JP2014021493A (ja) | 外部入力制御方法及びそれを適用した放送受信装置 | |
CN109600677A (zh) | 数据传输方法及装置、存储介质、电子设备 | |
KR102002872B1 (ko) | 외부 디바이스를 통한 모바일 디바이스에서의 채팅 방법 및 시스템 | |
JP2015002394A (ja) | 情報処理装置及びコンピュータプログラム | |
JP2016206249A (ja) | 対話装置、対話システム、及び対話装置の制御方法 | |
JP5973030B2 (ja) | 音声認識システム、および音声処理装置 | |
WO2022215284A1 (ja) | 発話機器を制御する方法、サーバ、発話機器、およびプログラム | |
CN103747328A (zh) | 电视***、开机广告音频播放装置及方法 | |
TWI466105B (zh) | 音頻系統及音頻處理方法 | |
CN118283202A (zh) | 一种显示设备及音频处理方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2018567322 Country of ref document: JP Kind code of ref document: A |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112019002607 Country of ref document: BR |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18842220 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 112019002607 Country of ref document: BR Kind code of ref document: A2 Effective date: 20190208 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2018842220 Country of ref document: EP Effective date: 20200302 |