WO2015151157A1 - Intention understanding device and method - Google Patents
Intention understanding device and method
- Publication number
- WO2015151157A1 (PCT/JP2014/059445, JP2014059445W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- intention
- understanding
- result
- unit
- intent
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 44
- 238000004364 calculation method Methods 0.000 claims abstract description 52
- 238000012937 correction Methods 0.000 claims abstract description 34
- 238000004458 analytical method Methods 0.000 claims description 36
- 238000012545 processing Methods 0.000 claims description 20
- 238000012790 confirmation Methods 0.000 claims description 9
- 230000000877 morphologic effect Effects 0.000 description 18
- 230000008569 process Effects 0.000 description 16
- 238000012217 deletion Methods 0.000 description 11
- 230000037430 deletion Effects 0.000 description 11
- 238000010586 diagram Methods 0.000 description 11
- 230000006870 function Effects 0.000 description 7
- 230000008859 change Effects 0.000 description 3
- 230000014509 gene expression Effects 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 239000000284 extract Substances 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 239000002245 particle Substances 0.000 description 2
- 230000005236 sound signal Effects 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 238000007476 Maximum Likelihood Methods 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/34—Route searching; Route guidance
- G01C21/36—Input/output arrangements for on-board computers
- G01C21/3605—Destination input or retrieval
- G01C21/3608—Destination input or retrieval using speech input, e.g. using speech recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/268—Morphological analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- the present invention relates to an intention understanding device and method for estimating a user's intention from a speech recognition result.
- a technique for correctly estimating the user's intention from the user's utterance is disclosed in Patent Document 1, for example.
- the speech processing apparatus of Patent Document 1 holds a language dictionary database and a grammar database for each of a plurality of pieces of intention information, each indicating one of a plurality of intentions, and further holds information on the commands executed so far as a pre-score.
- the speech processing apparatus calculates an acoustic score, a language score, and a pre-score for each piece of intention information, as scores indicating the degree to which the speech signal input based on the user's utterance fits that intention information.
- the intention information that maximizes the total score obtained by combining these scores is selected. Furthermore, it is disclosed that, depending on the total score, the speech processing apparatus executes the selected intention information, executes it after confirmation with the user, or rejects it.
- in Patent Document 1, the intentions to be handled are uniquely determined ones such as “Tell me the weather” or “Tell me the time”.
- the technique is not intended for processing that must cover the names of the many facilities that may be required.
- in the speech processing apparatus, since a recognition dictionary is designed for each intention, the apparatus only selects among a plurality of different intentions and decides whether to execute or reject the finally selected intention information; it does not handle the second and subsequent candidates of the speech recognition result. For example, when the user utters something like “Don't listen to music” while listening to music, even if the first candidate of the recognition result corresponds to the intention “I want to listen to music” and the second candidate to “I do not want to listen to music”, the first candidate “I want to listen to music” is selected.
- the present invention has been made to solve the above-described problems, and an object thereof is to provide an intent understanding device and method for correctly understanding a user's intention using input speech.
- An intent understanding device includes a speech recognition unit that recognizes a single speech uttered by a user in a natural language and generates a plurality of speech recognition results, and a morpheme analysis unit that converts each speech recognition result into a morpheme sequence.
- An intention understanding unit that estimates the intention of the user's utterance based on the morpheme sequences, and outputs, from one morpheme sequence, one or more intention understanding result candidates together with a score indicating their degree of likelihood;
- a weight calculation unit that calculates a weight for each intention understanding result candidate; and a correction unit that calculates a final score by correcting the score of each intention understanding result candidate using the weight, and selects an intention understanding result from the intention understanding result candidates based on the final score.
- An intent understanding method includes a speech recognition step for recognizing a single speech uttered by a user in a natural language and generating a plurality of speech recognition results, and a morpheme analysis step for converting each speech recognition result into a morpheme sequence.
- An intention understanding step of estimating the intention of the user's utterance based on the morpheme sequences, and outputting, from one morpheme sequence, one or more intention understanding result candidates together with a score indicating their degree of likelihood;
- a weight calculation step of calculating a weight for each intention understanding result candidate; and a correction step of calculating a final score by correcting the score of each intention understanding result candidate using the weight, and selecting the intention understanding result from the intention understanding result candidates based on the final score.
- according to this invention, a plurality of speech recognition results are generated from a single speech, intention understanding result candidates are generated from each of the speech recognition results, the scores of the intention understanding result candidates are corrected using weights to obtain final scores, and the intention understanding result is selected from the plurality of intention understanding result candidates based on the final scores. The intention understanding result can therefore be selected not only from the first candidate of the speech recognition result for the input speech but also from the subsequent candidates, so it is possible to provide an intention understanding device that can correctly understand the user's intention.
- the intention understanding method of this invention operates in the same way, so it is likewise possible to provide an intention understanding method that can correctly understand the user's intention.
- FIG. 3 (a) is an example of setting information
- FIG.3 (b) is an example of a dialog.
- FIG. 4 is a diagram showing an output result of each unit of the intent understanding device according to Embodiment 1;
- FIG. 4A is an example of a speech recognition result;
- FIGS. 4B to 4D are examples of the intention understanding result candidates for the first- to third-ranked speech recognition results.
- FIG. 5 is a table that defines the correspondence between the constraint conditions and the standby weights used by the weight calculation unit of the intent understanding device according to Embodiment 1.
- FIG. 6 is a flowchart showing the operation of the intent understanding device according to Embodiment 1.
- FIG. 7 is a block diagram showing the configuration of the intent understanding device according to Embodiment 2 of the present invention.
- FIG. 8 is a diagram illustrating an example of dialogue in Embodiment 2.
- FIG. 9 is a diagram showing an output result of each unit of the intent understanding device according to Embodiment 2;
- FIG. 9A is an example of a speech recognition result;
- FIGS. 9B to 9D are examples of the intention understanding result candidates for the first- to third-ranked speech recognition results.
- FIG. 10 is a diagram showing the hierarchical tree used by the intent understanding device according to Embodiment 2.
- FIG. 11 is a list of the intentions of each node in the hierarchical tree of FIG. 10.
- FIG. 12 is a diagram showing an example of the standby weights calculated by the weight calculation unit of the intent understanding device according to Embodiment 2.
- FIG. 13 is a flowchart showing the operation of the intent understanding device according to Embodiment 2.
- FIG. 14 is a flowchart showing the specific operation of step ST20 in FIG. 13.
- FIG. 18 is a diagram illustrating an output result of each unit of the intent understanding device according to Embodiment 3, where FIG. 18(a) is an example of a speech recognition result and FIGS. 18(b) to 18(d) are examples of the intention understanding result candidates for the first- to third-ranked speech recognition results.
- flowcharts showing the operation of the intent understanding device according to Embodiment 3 and its specific operation.
- Embodiment 1.
- the intent understanding device 1 includes: a speech recognition unit 3 that recognizes an input speech 2 uttered by a user and converts it into text;
- a speech recognition dictionary 4 used by the speech recognition unit 3 for speech recognition; a morpheme analysis unit 5 that decomposes a speech recognition result into morphemes; a morpheme analysis dictionary 6 used by the morpheme analysis unit 5 for morphological analysis; an intent understanding unit 7 that generates intention understanding result candidates from the morphological analysis results;
- an intention understanding model 8 used by the intent understanding unit 7 to estimate the user's intention; a setting information storage unit 10 that stores the setting information 9 of the control target device;
- a weight calculation unit 11 that calculates weights using the setting information 9 in the setting information storage unit 10; and an intention understanding correction unit 12 that corrects the intention understanding result candidates using the weights, and selects and outputs the final intention understanding result 13 from the candidates.
- the intent understanding device 1 is configured with a CPU (Central Processing Unit) (not shown); when the CPU executes a program stored in an internal memory, the functions of the speech recognition unit 3, the morpheme analysis unit 5, the intent understanding unit 7, the weight calculation unit 11, and the intention understanding correction unit 12 are realized.
- the speech recognition dictionary 4, the morphological analysis dictionary 6, the intention understanding model 8, and the setting information storage unit 10 are configured by an HDD (Hard Disk Drive), a DVD (Digital Versatile Disc), a memory, and the like.
- FIG. 2 is a block diagram showing a configuration of the navigation device 100 in which the intention understanding device 1 is incorporated as a voice interface.
- This navigation device 100 is an object to be controlled by voice.
- the voice input unit 101 is composed of a microphone or the like, converts voice spoken by the user into a signal, and outputs the signal as input voice 2 to the intention understanding device 1.
- the navigation control unit 102 includes a CPU and the like, and executes functions such as searching for and guiding a route from the current location to the destination.
- Setting information 9 such as a destination is output from the navigation control unit 102 to the intent understanding device 1.
- the navigation control unit 102 receives the intention understanding result 13 from the intention understanding device 1 and executes an operation indicated by the intention understanding result 13, or outputs a voice signal related to the intention understanding result 13 to the voice output unit 103.
- the audio output unit 103 includes a speaker or the like, and reproduces the audio signal input from the navigation control unit 102. Note that the intention understanding device 1 and the navigation control unit 102 may be configured using separate CPUs, or may be configured using a single CPU.
- as an intention understanding method performed by the intention understanding unit 7, for example, a method such as the maximum entropy method can be used.
- the intention understanding unit 7 extracts the features “destination, setting” from the morphological analysis result of the input speech 2 “I want to set the destination”, and statistically estimates, using the intention understanding model 8, which intention is how likely.
- the intention understanding unit 7 outputs a list of a set of an intention and a score representing the likelihood of the intention as a candidate of the intention understanding result.
- the intention understanding unit 7 executes an intention understanding method using the maximum entropy method.
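As a rough illustration of this kind of maximum-entropy scoring, the sketch below sums per-feature weights and normalizes them with a softmax; the feature names, intent labels, and weight values are all hypothetical stand-ins for a trained intention understanding model 8:

```python
import math

# Hypothetical model: feature -> per-intent weight. A real maximum entropy
# model would learn these weights from a corpus of annotated utterances;
# both the feature names and the values here are made up for illustration.
MODEL = {
    "destination": {"destination_setting": 2.0, "waypoint_setting": 0.3},
    "setting":     {"destination_setting": 1.0, "waypoint_setting": 1.0},
    "waypoint":    {"destination_setting": 0.3, "waypoint_setting": 2.0},
}
INTENTS = ["destination_setting", "waypoint_setting"]

def estimate_intents(features):
    """Return (intent, score) candidates in descending score order.

    Scores are a softmax over summed feature weights, so they sum to 1 and
    play the role of the score indicating the likelihood level.
    """
    raw = {i: sum(MODEL.get(f, {}).get(i, 0.0) for f in features) for i in INTENTS}
    z = sum(math.exp(v) for v in raw.values())
    return sorted(((i, math.exp(v) / z) for i, v in raw.items()),
                  key=lambda c: -c[1])

candidates = estimate_intents(["destination", "setting"])
```

In the document's terms, the returned list corresponds to the list of intention understanding result candidates and their scores.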
- FIG. 3A shows an example of setting information 9 in the first embodiment
- FIG. 3B shows an example of dialogue
- the control object by voice is the navigation device 100
- the setting information 9 contains information such as whether or not a destination and waypoints are set, the names of the destination and waypoints if they are set, and other information such as the type of map being used.
- the setting information storage unit 10 of the intention understanding device 1 stores setting information 9 output from the navigation control unit 102 of the navigation device 100.
- the setting information 9 includes entries such as “Destination: XX” and “Waypoint: OO”.
- FIG. 3B shows that the dialogue is proceeding in order from the top between the navigation device 100 and the user.
- “U:” at the beginning of each line represents the input voice 2 spoken by the user
- “S:” represents a response from the navigation device 100.
- FIG. 4 is an example of the output result of each part of the intent understanding device 1.
- FIG. 4A shows an example of a speech recognition result output by the speech recognition unit 3.
- the speech recognition result is a list of a set of a speech recognition result such as “XX is a destination” and a likelihood representing the likelihood of the speech recognition result, which are arranged in descending order of likelihood.
- FIG. 4B shows the intention understanding result candidates, their scores, standby weights, and final scores for the first-ranked speech recognition result “XX is the destination” in FIG. 4A; FIG. 4C shows the same for the second-ranked result “Don't go”, and FIG. 4D for the third-ranked result “Find”.
- the weight calculation unit 11 calculates a standby weight for each intention understanding result candidate output by the intention understanding unit 7.
- the intention understanding correction unit 12 calculates a final score using the standby weight for each intention understanding result candidate output by the intention understanding unit 7.
- FIG. 5 is a table that defines the correspondence between the constraint condition and the standby weight.
- the weight calculation unit 11 holds standby weight information defined in advance based on the likelihood of occurrence of each intention as described above, and selects the standby weight corresponding to each intention based on the setting information 9.
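A minimal sketch of this table lookup, assuming hypothetical intent names, constraint predicates, and weight values (the actual contents of FIG. 5 are not reproduced in the text):

```python
# Hypothetical table in the spirit of FIG. 5: for each intent, a predicate on
# the setting information 9 and the standby weights used when the constraint
# is or is not satisfied.
WEIGHT_TABLE = {
    # Deleting a waypoint only makes sense if a waypoint is currently set.
    "waypoint_deletion":   (lambda s: s.get("waypoint") is not None, 1.0, 0.0),
    # Setting a destination is plausible in any state.
    "destination_setting": (lambda s: True, 1.0, 1.0),
}

def standby_weight_from_table(intent, setting_info):
    """Select the standby weight for an intent from the constraint table."""
    if intent not in WEIGHT_TABLE:
        return 1.0  # default when no constraint is defined for the intent
    condition, w_true, w_false = WEIGHT_TABLE[intent]
    return w_true if condition(setting_info) else w_false
```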
- the intention understanding correction unit 12 corrects the intention understanding result candidates of the intention understanding unit 7 using the following formula (1). Specifically, the intention understanding correction unit 12 multiplies the likelihood of the speech recognition result obtained from the speech recognition unit 3 by the intention understanding score of the intention understanding result candidate obtained from the intention understanding unit 7 to calculate the score (corresponding to the “score” shown in FIG. 4B and the like), and then multiplies this score by the standby weight obtained from the weight calculation unit 11 to obtain the final score (corresponding to the “final score” shown in FIG. 4B and the like).
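The correction of formula (1) then reduces to two multiplications; the numeric values below are hypothetical, not taken from the figures:

```python
def final_score(recognition_likelihood, intent_score, standby_weight):
    """Formula (1): multiply the speech recognition likelihood by the
    intention understanding score to get the 'score', then multiply by the
    standby weight to get the 'final score'."""
    score = recognition_likelihood * intent_score
    return score * standby_weight

# Hypothetical numbers: likelihood 0.6, intent score 0.8, standby weight 0.5
fs = final_score(0.6, 0.8, 0.5)
```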
- here, the intention understanding correction is performed using multiplication as in formula (1), but the present invention is not limited to this method.
- the intention understanding device 1 will be described with reference to the flowchart of FIG.
- the intent understanding device 1 is incorporated in the navigation device 100, which is the control target, and a dialogue is started when the user presses a dialogue start button (not shown).
- assuming that the setting information 9 shown in FIG. 3A is stored in the setting information storage unit 10, the intention understanding process will be described in detail for the dialogue contents in FIG. 3B.
- when the navigation control unit 102 detects that the user has pressed the dialogue start button of the navigation device 100, it outputs the response “Please speak when you hear the beep” from the voice output unit 103, followed by a beep sound.
- the intent understanding device 1 puts the voice recognition unit 3 into a recognizable state and waits for the user to speak.
- the voice input unit 101 converts the utterance into voice data and outputs it as the input voice 2 to the speech recognition unit 3 of the intention understanding device 1.
- the speech recognition unit 3 of the intention understanding device 1 converts the input speech 2 into text using the speech recognition dictionary 4, calculates its likelihood, and outputs the result to the morpheme analysis unit 5 (step ST11).
- the morphological analysis unit 5 performs morphological analysis on the speech recognition result using the morphological analysis dictionary 6, and outputs the result to the intent understanding unit 7 (step ST12).
- the morphological analysis result of the speech recognition result “XX is a destination” is “XX / noun, ha / particle, destination / noun, de / particle”.
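As a toy illustration of the surface/part-of-speech output above, a dictionary lookup over pre-split tokens suffices; a real morphological analyzer for Japanese performs lattice search over a full dictionary, and the entries here are illustrative only:

```python
# Toy stand-in for the morpheme analysis unit 5 and morpheme analysis
# dictionary 6. Tokens are assumed to be pre-split, which a real Japanese
# analyzer would have to do itself.
DICTIONARY = {
    "XX": "noun", "ha": "particle", "destination": "noun", "de": "particle",
}

def analyze(tokens):
    """Return a morpheme sequence of surface/part-of-speech pairs."""
    return [f"{t}/{DICTIONARY.get(t, 'unknown')}" for t in tokens]

morphemes = analyze(["XX", "ha", "destination", "de"])
```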
- the intention understanding unit 7 estimates the intention from the morphological analysis result using the intention understanding model 8 and calculates a score, and outputs the score to the intention understanding correction unit 12 as an intention understanding result candidate (step ST13).
- the intention understanding unit 7 extracts features used for intention understanding from the morphological analysis result, and estimates the intention by matching the features against the intention understanding model 8.
- a list of features “XX, destination” is extracted from the morphological analysis result of the speech recognition result “XX is a destination” in FIG. 4A, and intention understanding result candidates such as the “route point” candidate shown in FIG. 4B are obtained.
- the weight calculation unit 11 reads the setting information 9 from the setting information storage unit 10, selects a standby weight for each intention based on the setting information 9 and a table as shown in FIG. It outputs to the part 12 (step ST14).
- the intent understanding device 1 repeats the processing of steps ST12 to ST15 for the second speech recognition result “Don't go”, and as a result obtains the intention understanding result candidates shown in FIG. 4C.
- the intention understanding device 1 receives, through the voice input unit 101, the input voice 2 of the utterance confirming whether the speech recognition and intention understanding are correct.
- the intention understanding device 1 performs voice recognition and intention understanding on the input voice 2 of “Yes”, and outputs an intention understanding result 13 to the navigation control unit 102.
- the navigation control unit 102 performs an operation of deleting the waypoint “XXX” according to the intention understanding result 13.
- as described above, the intention understanding device 1 according to Embodiment 1 includes: a speech recognition unit 3 that recognizes one input speech 2 uttered by a user in a natural language and generates a plurality of speech recognition results;
- a morpheme analysis unit 5 that converts each speech recognition result into a morpheme sequence; an intention understanding unit 7 that estimates the user's utterance intention based on the morpheme sequences and outputs one or more intention understanding result candidates and their scores from one morpheme sequence;
- a weight calculation unit 11 that calculates a standby weight for each intention understanding result candidate;
- and an intention understanding correction unit 12 that corrects the score of each intention understanding result candidate using the standby weight to calculate a final score, and selects the intention understanding result 13 from the intention understanding result candidates based on the final score. The final intention understanding result 13 can therefore be selected not only from the first speech recognition result for the input speech 2 but also from the second and subsequent speech recognition results, so it is possible to provide an intention understanding device 1 that can correctly understand the user's intention.
- in addition, the intention understanding unit 7 generates the intention understanding result candidates in order from the most likely of the plurality of speech recognition results, and the intention understanding correction unit 12 is configured to calculate a final score each time the intention understanding unit 7 generates an intention understanding result candidate, and to select as the intention understanding result 13 the first candidate that satisfies the preset condition X. For this reason, the amount of calculation of the intent understanding device 1 can be suppressed.
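This early-stopping flow can be sketched as follows; the threshold standing in for “condition X”, and the recognizer and intent-model callables, are assumptions for illustration:

```python
def select_intention(recognition_results, understand, standby_weight, threshold=0.5):
    """Walk speech recognition results in descending likelihood order,
    correct each candidate's score, and stop as soon as one satisfies the
    (here: threshold-based) condition X, so later candidates need not be
    processed at all."""
    best = None
    for text, likelihood in recognition_results:      # most likely first
        for intent, score in understand(text):        # candidates per result
            final = likelihood * score * standby_weight(intent)
            if best is None or final > best[1]:
                best = (intent, final)
            if final >= threshold:
                return intent  # condition X satisfied: stop early
    return best[0] if best else None  # otherwise fall back to best final score

# Hypothetical stand-ins for the recognizer output and intent model:
results = [("set XX as destination", 0.9), ("delete XX", 0.4)]
understand = lambda t: ([("destination_setting", 0.8)] if "destination" in t
                        else [("waypoint_deletion", 0.9)])
chosen = select_intention(results, understand, lambda i: 1.0)
```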
- the weight calculation unit 11 is configured to calculate the standby weights using the setting information 9 of the control target device (for example, the navigation device 100) that operates based on the intention understanding result 13 selected by the intention understanding correction unit 12. Specifically, the weight calculation unit 11 has a table, as shown in FIG. 5, that defines constraint conditions and the standby weight used when each constraint condition is satisfied, and selects the standby weight by judging from the setting information 9 whether each constraint condition is satisfied. For this reason, an appropriate intention can be estimated according to the situation of the control target device.
- Embodiment 2.
- FIG. 7 is a block diagram illustrating the configuration of the intent understanding device 20 according to Embodiment 2.
- the intention understanding device 20 includes a hierarchical tree 21 that expresses intentions hierarchically, and a weight calculator 22 that calculates standby weights based on activated intentions among the intentions of the hierarchical tree 21.
- FIG. 8 shows an example of dialogue in the second embodiment. Similar to FIG. 3B, “U:” at the beginning of a line represents a user utterance, and “S:” represents a response from a device to be controlled (for example, the navigation device 100 shown in FIG. 2).
- FIG. 9 is an example of the output result of each part of the intent understanding device 20.
- FIG. 9A shows a speech recognition result output by the speech recognition unit 3 and its likelihood.
- FIGS. 9B to 9D show intention understanding result candidates output by the intention understanding unit 7 and their scores, standby weights output by the weight calculation unit 22, and final scores output by the intention understanding correction unit 12. It is.
- the intention understanding result candidates for the first-ranked speech recognition result “Don't do XX” in FIG. 9A are shown in FIG. 9B, those for the second-ranked result “via XX” in FIG. 9C, and those for the third-ranked result “with XX as the destination” in FIG. 9D.
- the hierarchical tree 21 has a hierarchical structure of nodes representing intentions: toward the root (upper hierarchy) the nodes represent more abstract intentions, and toward the leaves (lower hierarchy) the nodes represent more specific intentions.
- the node #16, which represents an intention in which a specific slot value (for example, “OO store”) is filled in, is positioned below.
- the intention “navigation” of node #1, positioned in the first hierarchy, is an abstract node representing the set of navigation functions of the navigation control unit 102, and nodes #2 to #5, representing individual navigation functions, are positioned in the second hierarchy below it.
- the intention “destination setting []” of the node # 4 represents a state in which the user wants to set the destination but has not yet decided on a specific place. When the destination is set, the node # 4 changes to the node # 9 or the node # 16.
- FIG. 10 shows a state in which the node # 4 is activated in accordance with the user utterance “set destination” shown in FIG.
- the hierarchical tree 21 activates the intention node according to the information output from the navigation device 100.
- FIG. 12 is an example of the standby weights calculated by the weight calculation unit 22. Since the intention “destination setting []” of node #4 of the hierarchical tree 21 is activated by the user utterance “set the destination”, the standby weight of the intentions of nodes #9 and #10 in the branch-and-leaf direction of node #4 is 1.0, and the standby weight of the other intention nodes is 0.5. The method by which the weight calculation unit 22 calculates the standby weights will be described later.
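A sketch of this tree-based weighting, using a child-to-parent map standing in for the hierarchical tree 21; the node IDs follow the #n numbering of FIG. 10, but the exact parent-child links and the weight values (first weight 1.0, second weight 0.5, per FIG. 12) are partly assumptions:

```python
# Child -> parent links for a fragment of the hierarchical tree 21.
# The links are illustrative; only a few nodes are reproduced here.
PARENT = {
    "#2": "#1", "#3": "#1", "#4": "#1", "#5": "#1",
    "#9": "#4", "#10": "#4",
    "#16": "#9",
}

def in_branch_leaf_direction(node, activated):
    """True if `node` is the activated node or one of its descendants."""
    while node is not None:
        if node == activated:
            return True
        node = PARENT.get(node)
    return False

def standby_weight_from_tree(node, activated, first_weight=1.0, second_weight=0.5):
    """Intentions under the activated node keep the first weight;
    every other intention node gets the smaller second weight."""
    if activated is None:
        return first_weight  # nothing activated yet: no correction
    return first_weight if in_branch_leaf_direction(node, activated) else second_weight
```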
- FIG. 13 is a flowchart showing the operation of the intent understanding device 20. Steps ST11 to ST13, ST15, and ST16 in FIG. 13 are the same as the processes in steps ST11 to ST13, ST15, and ST16 in FIG.
- the weight calculation unit 22 refers to the hierarchical tree 21, calculates the standby weight of the intention understanding result candidate of the intention understanding unit 7, and outputs it to the intention understanding correction unit 12.
- FIG. 14 is a flowchart showing a specific operation of step ST20 of FIG.
- the weight calculation unit 22 compares the intention understanding result candidate of the intention understanding unit 7 with the activated intention of the hierarchical tree 21.
- the weight calculation unit 22 sets the standby weight to the first weight a.
- the basic operation of the intent understanding device 20 is the same as that of the intent understanding device 1 of the first embodiment.
- the difference between the second embodiment and the first embodiment is a method for calculating the standby weight.
- The intention understanding device 20 is incorporated in the navigation device 100 (shown in FIG. 2), which is the control target.
- A dialogue is started when the user presses a dialogue start button (not shown). Since the navigation device 100 has not yet acquired any information from the user at the time of the first user utterance "set the destination" in FIG. 8, there is no activated intention node in the hierarchical tree 21 of the intention understanding device 20. The hierarchical tree 21 activates intention nodes based on the intention understanding result 13 output by the intention understanding correction unit 12.
- the input voice 2 of the utterance is input to the intention understanding device 20.
- the input speech 2 is recognized by the speech recognition unit 3 (step ST11), decomposed into morphemes by the morpheme analysis unit 5 (step ST12), and the intention understanding result candidate is calculated by the intention understanding unit 7 (step ST13).
- the intention understanding correction unit 12 obtains the intention understanding result 13 of “destination setting []”.
- The navigation control unit 102 instructs the voice output unit 103 to prompt the user to specify the facility to be set as the destination, and the voice message "Set the destination. ..." is output. Further, the hierarchical tree 21 activates node #4 corresponding to "destination setting []" of the intention understanding result 13.
- the navigation device 100 responds to prompt the next utterance, so that the dialogue with the user continues and the user utters “with XX as the destination” as shown in FIG.
- the intent understanding device 20 performs the processing of steps ST11 and ST12 for the user utterance “with XX as the destination”.
- The speech recognition results "Don't go to XX", "Via XX", and "With XX as the destination" are obtained, together with the morphological analysis results shown in FIG.
- the intention understanding unit 7 estimates the intention from the morphological analysis result (step ST13).
- the weight calculation unit 22 calculates the standby weight with reference to the hierarchical tree 21 (step ST20).
- the node # 4 of the hierarchical tree 21 is in the activated state, and the weight calculation unit 22 calculates the weight according to this state.
- The intention understanding correction unit 12 uses the likelihood of the speech recognition result calculated by the speech recognition unit 3, the score of the intention understanding result candidate calculated by the intention understanding unit 7, and the standby weight calculated by the weight calculation unit 22 to calculate the final score of the intention understanding result candidate from the above equation (1) (step ST15). The final scores are as shown in FIG.
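The selection in steps ST15 and ST16 can be sketched as follows. This is a minimal illustration of equation (1), (score) x (standby weight) = (final score); the candidate tuples, the function name, and the concrete numbers are assumptions for the example, not the patented implementation.

```python
# Sketch of the final-score computation of equation (1).
# Candidates are examined in descending order of speech-recognition
# likelihood; the first one whose final score reaches the threshold X
# is adopted as the intention understanding result 13.

X = 0.5  # acceptance threshold assumed from the description

def select_intention(candidates):
    """candidates: list of (intent, score, standby_weight) tuples,
    ordered by speech-recognition likelihood."""
    for intent, score, weight in candidates:
        final_score = score * weight  # equation (1)
        if final_score >= X:
            return intent, final_score
    return None, 0.0

# Example: the first two candidates are suppressed by a zero standby
# weight, so the third-ranked "destination setting" candidate wins.
candidates = [
    ("via point deletion [facility=XX]", 0.8, 0.0),
    ("via point setting [facility=XX]", 0.6, 0.0),
    ("destination setting [facility=XX]", 0.7, 1.0),
]
```

With these example numbers the first two candidates fail the threshold, so the third-ranked recognition result is selected, mirroring the dialogue walked through in the description.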
- the intention understanding correction unit 12 determines whether or not the final score satisfies the condition X or more, as in the first embodiment (step ST16).
- Here, the threshold X is, for example, 0.5.
- The intention understanding device 20 repeats the processes of steps ST12 to ST14, ST20, and ST15 for "Via XX", the second-ranked speech recognition result.
- The intention understanding device 20 then repeats the processes of steps ST12, ST13, ST20, and ST15 for "With XX as the destination", the third-ranked speech recognition result, and as a result obtains the intention understanding result 13 as shown in FIG.
- the hierarchical tree 21 activates the node # 16 based on the intention understanding result 13.
- As described above, the weight calculation unit 22 performs weighting so that the intention understanding result candidate corresponding to the intention expected from the flow of the dialogue with the user is more likely to be selected by the intention understanding correction unit 12. For this reason, an appropriate intention can be estimated according to the state of the interaction between the user and the control target device.
- The intention understanding device 20 includes the hierarchical tree 21, which expresses the user's intentions in a tree structure that becomes more abstract toward the root and more specific toward the leaves.
- The weight calculation unit 22 is configured to perform weighting so that intention understanding result candidates located in the branch/leaf direction from the intention corresponding to the immediately previously selected intention understanding result 13 are more likely to be selected.
- the control target device can be operated based on the appropriate speech recognition result and the intention understanding result.
- FIG. 15 is a block diagram illustrating a configuration of the intent understanding device 30 according to the third embodiment.
- The intention understanding device 30 includes a keyword table 31 that stores keywords corresponding to intentions, a keyword search unit 32 that searches the keyword table 31 for the intentions corresponding to the morphological analysis result, and a weight calculation unit 33 that compares the intention corresponding to a keyword with the activated intention of the hierarchical tree 21 to calculate the standby weight.
- FIG. 16 is an example of the keyword table 31.
- The keyword table 31 stores sets of intentions and keywords. For example, keywords that are characteristic expressions of an intention, such as "destination" and "go", are assigned to the intention "destination setting []".
- Keywords are assigned to the intention of each node in the second and lower layers, excluding node #1 of the first layer of the hierarchical tree 21.
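The lookup performed by the keyword search unit in step ST30 can be sketched roughly as below. The table contents and function name are illustrative assumptions, and matching is done per morpheme token rather than by substring.

```python
# Minimal sketch of the keyword search of step ST30.
# The keyword table associates each intention with characteristic
# keywords; the entries here are illustrative, not the full table.
KEYWORD_TABLE = {
    "destination setting []": ["destination", "go"],
    "via point setting []": ["via", "stop by"],
    "via point deletion []": ["do not go", "cancel"],
}

def search_keyword_intents(morphemes):
    """Return the keyword-corresponding intentions whose keywords
    appear among the morpheme tokens of the utterance."""
    tokens = set(morphemes)
    return [intent
            for intent, keywords in KEYWORD_TABLE.items()
            for kw in keywords
            if kw in tokens]
```

For an utterance whose morpheme string contains "do not go", this returns the keyword-corresponding intention "via point deletion []", which the weight calculation unit then compares against the activated tree intention.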
- an intention corresponding to a keyword is referred to as a keyword corresponding intention.
- An intention corresponding to the activated intention node of the hierarchical tree 21 is called a hierarchical tree corresponding intention.
- FIG. 17 is an example of the speech recognition result output by the speech recognition unit 3, the keywords included in the speech recognition result, and the keyword correspondence intention searched by the keyword search unit 32.
- The keyword-corresponding intention for the keyword "don't go" of the speech recognition result "Don't go to XX" is "via point deletion []", the keyword-corresponding intention for the keyword "via" of the speech recognition result "Via XX" is "via point setting []", and the keyword-corresponding intention for the keyword "destination" of the speech recognition result "With XX as the destination" is "destination setting []".
- FIG. 18A shows an example of a speech recognition result output by the speech recognition unit 3 and its likelihood.
- FIGS. 18(b) to 18(d) show the intention understanding result candidates output by the intention understanding unit 7 and their scores, the standby weights output by the weight calculation unit 33, and the final scores output by the intention understanding correction unit 12.
- The intention understanding result candidates of the first-ranked speech recognition result "Don't go to XX" are shown in FIG. 18(b), those of the second-ranked speech recognition result "Via XX" in FIG. 18(c), and those of the third-ranked speech recognition result "With XX as the destination" in FIG. 18(d).
- FIG. 19 is a flowchart showing the operation of the intent understanding device 30. Steps ST11 to ST13, ST15, and ST16 in FIG. 19 are the same as the processes in steps ST11 to ST13, ST15, and ST16 in FIG.
- In step ST30, the keyword search unit 32 searches the keyword table 31 for a keyword corresponding to the morphological analysis result, acquires the keyword-corresponding intention associated with the found keyword, and outputs it to the weight calculation unit 33.
- FIG. 20 is a flowchart showing a specific operation of step ST31 of FIG.
- The weight calculation unit 33 compares the intention understanding result candidate of the intention understanding unit 7 with the activated hierarchical-tree-corresponding intention of the hierarchical tree 21 and the keyword-corresponding intention of the keyword search unit 32. If the intention understanding result candidate matches neither the keyword-corresponding intention nor the hierarchical-tree-corresponding intention (step ST32 "NO"), the weight calculation unit 33 sets the standby weight to the third weight c. If the intention understanding result candidate matches the hierarchical-tree-corresponding intention (step ST32 "YES" and step ST34 "YES"), the weight calculation unit 33 sets the standby weight to the fourth weight d (step ST35).
- In the case of step ST34 "YES", the intention understanding result candidate may match both the hierarchical-tree-corresponding intention and the keyword-corresponding intention.
- If the intention understanding result candidate does not match the hierarchical-tree-corresponding intention but matches the keyword-corresponding intention (step ST34 "NO"), the weight calculation unit 33 sets the standby weight to the fifth weight e (step ST36).
- For example, when d = 1.0, e = 0.5, and c = 0.0: if the intention understanding result candidate matches the hierarchical-tree-corresponding intention, the standby weight is 1.0; if it does not match the hierarchical-tree-corresponding intention but matches the keyword-corresponding intention, it is 0.5; and if it matches neither, it is 0.0.
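Under the example values d = 1.0, e = 0.5, and c = 0.0, the rule of steps ST32 to ST36 can be sketched as follows. Note that, per the description, the actual device also treats a parent intention and its child (e.g. "via point deletion []" and "via point deletion [facility=...]") as matching; this exact-match sketch omits that refinement.

```python
# Sketch of the Embodiment 3 standby-weight rule (steps ST32 to ST36):
# d when the candidate matches the activated hierarchical-tree intention,
# e when it matches only a keyword-corresponding intention,
# c when it matches neither.
C, D, E = 0.0, 1.0, 0.5  # third, fourth, and fifth weights

def standby_weight(candidate, tree_intent, keyword_intents):
    if candidate == tree_intent:      # step ST34 "YES"
        return D
    if candidate in keyword_intents:  # step ST34 "NO"
        return E
    return C                          # step ST32 "NO"
```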
- the basic operation of the intent understanding device 30 is the same as that of the intent understanding devices 1 and 20 of the first and second embodiments.
- the difference between the third embodiment and the first and second embodiments is the standby weight calculation method.
- The input speech 2 of the user utterance "With XX as the destination" is recognized by the speech recognition unit 3 (step ST11), decomposed into morphemes by the morphological analysis unit 5 (step ST12), and intention understanding result candidates are generated by the intention understanding unit 7 (step ST13).
- In step ST30, the keyword search unit 32 searches the keyword table 31 for keywords corresponding to the morphological analysis results of the morphological analysis unit 5, and acquires the keyword-corresponding intentions for the found keywords. Since the keyword "don't go" of FIG. 16 exists in the morphological analysis result "Don't go to XX", the keyword-corresponding intention is "via point deletion []".
- the weight calculator 33 calculates a standby weight (step ST31).
- the node # 4 of the hierarchical tree 21 is in the activated state, and the hierarchical tree correspondence intention of the node # 4 is “destination setting []”.
- the hierarchical tree 21 outputs the activated hierarchical tree-corresponding intention “destination setting []” of the node # 4 to the weight calculation unit 33.
- The keyword search unit 32 outputs the keyword-corresponding intention "via point deletion []" to the weight calculation unit 33.
- As described above, the intention understanding device 30 includes the keyword search unit 32, which searches the keyword table 31, in which correspondences between intentions and keywords are defined, for a keyword matching the morpheme string and acquires the keyword-corresponding intention of the found keyword; and the weight calculation unit 33 is configured to calculate the standby weight using the hierarchical-tree-corresponding intention and the keyword-corresponding intention. For this reason, intentions can be corrected for user utterances using the hierarchy of intentions and keywords that are characteristic expressions of intentions, and the control target device can be operated based on an appropriate speech recognition result and intention understanding result.
- In Embodiments 1 to 3 above, the example of Japanese has been described; however, by changing the feature extraction method for intention understanding for each language, the invention can be applied to various languages such as English, German, and Chinese.
- It is also possible to extract slot values such as "$facility$" and "$address$" from the natural-language text of the input speech 2 and then directly execute the intention understanding process.
- In the above embodiments, the text of the speech recognition result is analyzed by the morphological analysis unit 5 in preparation for the intention understanding process; however, the speech recognition result itself may include a morphological analysis result. In that case, the morphological analysis unit 5 and the morphological analysis dictionary 6 may be omitted, and the intention understanding process may be executed directly after the speech recognition process.
- In Embodiments 1 to 3 described above, a learning model based on the maximum entropy method has been assumed as the intention understanding method; however, the intention understanding method is not limited to this.
- In Embodiment 3, the weight calculation unit 33 is configured to calculate the standby weight using the hierarchical-tree-corresponding intention and the keyword-corresponding intention; however, the standby weight may also be calculated using only the keyword-corresponding intention obtained from the morphological analysis result, without using the hierarchical tree 21.
- In Embodiments 1 to 3, the intention understanding process is performed on the plural speech recognition results in descending order of likelihood, and ends when an intention understanding result candidate whose final score satisfies the condition of X or more is found; however, a method of performing the intention understanding process on all the speech recognition results and then selecting the intention understanding result 13 is also possible.
- In Embodiments 1 to 3 above, whether to execute the operation corresponding to the intention understanding result 13 is confirmed with the user (for example, "via point XX ..." in FIG. 3(b)). It is also possible to change whether or not to confirm according to the final score of the intention understanding result 13. In addition, for example, it is possible to change whether or not to confirm according to rank: no confirmation when the intention understanding result candidate of the first-ranked speech recognition result is selected as the intention understanding result 13, but confirmation when a candidate of the second rank or lower is selected.
- It is likewise possible to change whether or not to confirm according to score: no confirmation when the intention understanding result candidate with the highest score before correction by the standby weight is selected as the intention understanding result 13, but confirmation when a candidate with a lower score is selected.
- FIG. 21 shows a modification of the intent understanding device 40.
- The intention understanding device 40 includes a voice input unit 41 that converts voice spoken by the user into a signal and acquires it as the input voice 2; an intention confirmation processing unit 42 that, when the intention understanding correction unit 12 excludes the most plausible intention understanding result candidate (that is, the candidate with the highest score before correction by the standby weight) and selects another intention understanding result candidate as the intention understanding result 13, confirms with the user whether to adopt the intention understanding result 13 and decides whether to adopt it; and a voice output unit 43 that outputs the intention-understanding-result confirmation voice signal generated by the intention confirmation processing unit 42.
- The voice input unit 41, intention confirmation processing unit 42, and voice output unit 43 play the same roles as the voice input unit 101, navigation control unit 102, and voice output unit 103 shown in FIG. 2.
- For example, a voice output such as "Are you sure?" is used.
- the confirmation method to the user may be a screen display in addition to voice output.
- In the above embodiments, the hierarchy of intentions is expressed by the tree structure of the hierarchical tree 21; however, the structure need not be a complete tree, and a graph structure that does not include a loop can also be processed.
- In Embodiments 2 and 3 described above, only the current user utterance is used for the intention understanding process; however, in the case of an utterance in the middle of a hierarchy transition of the hierarchical tree 21, the intention understanding process may be performed using features extracted from a plurality of utterances, including user utterances before the current one. This makes it possible to estimate an intention that is difficult to estimate from the partial information obtained from the individual partial utterances.
- the navigation device 100 in FIG. 2 is taken as an example of the control target device of the intent understanding device.
- the present invention is not limited to the navigation device.
- In the above description, the intention understanding device is built into the control target device, but it may also be externally attached.
- Note that the present invention allows free combination of the embodiments, modification of any component of the embodiments, or omission of any component in the embodiments, within the scope of the invention.
- As described above, the intention understanding device according to the present invention estimates the user's intention from input voice, and is therefore suitable for use in a voice interface of, for example, a car navigation device that is difficult to operate manually.
Description
Compared with manual operation by the user, this method works effectively as a shortcut function because the operation can be performed directly by uttering speech. On the other hand, to execute an operation the user must utter the words the device is waiting for, and as the functions handled by the device increase, so do the words the user has to remember. In general, moreover, few users use a device after fully understanding its instruction manual, and users who have not understood it do not know what to say to perform an operation; in practice, there has been the problem that operations cannot be performed by voice except for the commands of the functions the user happens to remember.
For example, when the user utters "I don't want to listen to music" while music is playing, and the intentions "I want to listen to music" (first candidate) and "I don't want to listen to music" (second candidate) are obtained as results, the first candidate "I want to listen to music" is selected.
Furthermore, when "○○ Center" is already set as the destination of the navigation device and the user utters "Stop by ○× Center" to add a via point, even if the resulting first candidate intention is "Stop by ○○ Center" and the second candidate is "Stop by ○× Center", the first candidate "Stop by ○○ Center" is selected.
Embodiment 1.
As shown in FIG. 1, an intention understanding device 1 according to Embodiment 1 of the present invention includes: a speech recognition unit 3 that recognizes input speech 2 uttered by the user and converts it into text; a speech recognition dictionary 4 used by the speech recognition unit 3 for speech recognition; a morphological analysis unit 5 that decomposes the speech recognition result into morphemes; a morphological analysis dictionary 6 used by the morphological analysis unit 5 for morphological analysis; an intention understanding unit 7 that generates intention understanding result candidates from the morphological analysis result; an intention understanding model 8 used by the intention understanding unit 7 to estimate the user's intention; a setting information storage unit 10 that stores setting information 9 of the control target device; a weight calculation unit 11 that calculates weights using the setting information 9 of the setting information storage unit 10; and an intention understanding correction unit 12 that corrects the intention understanding result candidates using the weights and selects and outputs a final intention understanding result 13 from among them.
The speech recognition dictionary 4, morphological analysis dictionary 6, intention understanding model 8, and setting information storage unit 10 are configured by an HDD (Hard Disk Drive), a DVD (Digital Versatile Disc), a memory, or the like.
Note that the intention understanding device 1 and the navigation control unit 102 may be configured using separate CPUs, or using a single CPU.
In the following description, it is assumed that the intention understanding unit 7 executes an intention understanding method using the maximum entropy method.
When the control target by voice is the navigation device 100, the setting information 9 includes whether a destination and via points are set, the names of the destination or via points if set, and other information such as the type of map being displayed. The setting information storage unit 10 of the intention understanding device 1 stores the setting information 9 output by the navigation control unit 102 of the navigation device 100. In the example of FIG. 3(a), the setting information 9 includes the information "destination: △△" and "via point: ○○".
FIG. 4(a) shows an example of the speech recognition result output by the speech recognition unit 3. The speech recognition result is a list of pairs of a recognition result, such as "with ○○ as the destination", and a likelihood representing its plausibility, arranged in descending order of likelihood.
For example, when the destination of the navigation device 100 is already set to "△△", the user is unlikely to utter an intention to "set the destination to △△" again in the next utterance. Therefore, for this constraint condition, the standby weight of the intention "destination setting [facility=$facility$(=△△)]" is set to "0.0". On the other hand, since the user may change the destination to "?" (a place other than △△), the standby weight of the intention "destination setting [facility=$facility$(=?)]" is set to "1.0". Also, since the user is unlikely to utter an intention to set "○○", the same as the destination, as a via point, the standby weight of the intention "via point setting [facility=$facility$(=○○)]" is set to "0.0". Furthermore, since the user may delete the already-set via point "○○", the standby weight of the intention "via point deletion [facility=$facility$(=○○)]" is set to "1.0".
The weight calculation unit 11 holds information on standby weights defined in advance from the likelihood of each intention occurring, as described above, and selects the standby weight corresponding to an intention based on the setting information 9.
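This constraint-based selection can be sketched roughly as below, with the rules hard-coded from the example above. The function name, settings keys, and string representation of intentions are assumptions for illustration, not the patented implementation.

```python
# Sketch of the Embodiment 1 standby-weight selection: weights are
# predefined per constraint on the device's setting information
# (e.g. a destination that is already set is unlikely to be set again).
def standby_weight(intent, settings):
    dest = settings.get("destination")  # e.g. the already-set destination
    via = settings.get("via_point")     # e.g. the already-set via point
    if intent == f"destination setting [facility={dest}]":
        return 0.0  # re-setting the same destination is unlikely
    if intent == f"via point setting [facility={via}]":
        return 0.0  # re-setting the existing via point is unlikely
    if intent == f"via point deletion [facility={via}]":
        return 1.0  # deleting the existing via point is plausible
    return 1.0      # by default, the intention remains possible
```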
(Score) × (Standby weight) = (Final score)   ... (1)
Here, it is assumed that the intention understanding device 1 is incorporated in the navigation device 100 to be controlled, and that a dialogue starts when the user presses a dialogue start button (not shown). Also, assuming that the setting information 9 shown in FIG. 3(a) is stored in the setting information storage unit 10, the intention understanding process for the dialogue contents of FIG. 3(b) will be described in detail.
FIG. 7 is a block diagram showing the configuration of an intention understanding device 20 according to Embodiment 2. In FIG. 7, parts identical or equivalent to those in FIG. 1 are given the same reference numerals and their description is omitted. The intention understanding device 20 includes a hierarchical tree 21 that expresses intentions hierarchically, and a weight calculation unit 22 that calculates standby weights based on the activated intention among the intentions of the hierarchical tree 21.
The intention "navi" of node #1 in the first layer is an abstract node representing the collection of navigation functions of the navigation control unit 102; nodes #2 to #5 representing individual navigation functions are positioned in the second layer below it. For example, the intention "destination setting []" of node #4 represents a state in which the user wants to set a destination but has not decided on a specific place. When a destination is set, the state transitions from node #4 to node #9 or node #16. The example of FIG. 10 shows a state in which node #4 is activated in accordance with the user utterance "set the destination" shown in FIG. 8.
The hierarchical tree 21 activates intention nodes according to information output by the navigation device 100.
Since the intention "destination setting []" of node #4 of the hierarchical tree 21 was activated by the user utterance "set the destination", the standby weights of the intentions of nodes #9 and #10 in the branch/leaf direction of node #4 are 1.0, and the standby weights of the other intention nodes are 0.5.
The method by which the weight calculation unit 22 calculates the standby weights will be described later.
In step ST20, the weight calculation unit 22 refers to the hierarchical tree 21, calculates the standby weights of the intention understanding result candidates of the intention understanding unit 7, and outputs them to the intention understanding correction unit 12.
The basic operation of the intention understanding device 20 is the same as that of the intention understanding device 1 of Embodiment 1 above. The difference between Embodiment 2 and Embodiment 1 is the method of calculating the standby weights.
Note that the hierarchical tree 21 activates intention nodes based on the intention understanding result 13 output by the intention understanding correction unit 12.
First, in step ST21, information on the activated node #4 is passed from the hierarchical tree 21 to the weight calculation unit 22, and the intention understanding result candidates "via point deletion [facility=$facility$(=○○)]" and "destination setting [facility=$facility$(=○○)]" are passed from the intention understanding unit 7 to the weight calculation unit 22. The weight calculation unit 22 compares the intention of the activated node #4 with the intention understanding result candidates; if a candidate is located in the branch/leaf direction of the activated node #4 (that is, at node #9 or node #10) (step ST22 "YES"), it sets the standby weight to the first weight a (step ST23). On the other hand, if a candidate is located other than in the branch/leaf direction of the activated node #4 (step ST22 "NO"), the weight calculation unit 22 sets the standby weight to the second weight b (step ST24).
The first weight a is a value larger than the second weight b. For example, when a = 1.0 and b = 0.5, the standby weights are as shown in FIG. 9(b).
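The branch/leaf test of steps ST22 to ST24 can be sketched with a toy tree. The node names follow the figures, but the dictionary representation of the hierarchical tree 21 and the function names are assumptions for illustration.

```python
# Sketch of the Embodiment 2 weight rule (steps ST22 to ST24):
# candidates located in the branch/leaf (descendant) direction of the
# activated node receive the first weight a, all others the second
# weight b, with a > b.
A, B = 1.0, 0.5

CHILDREN = {          # node -> child nodes (toy stand-in for the tree)
    "#4": ["#9", "#10"],
    "#9": ["#16"],
}

def descendants(node):
    """All nodes in the branch/leaf direction of `node`."""
    result = []
    for child in CHILDREN.get(node, []):
        result.append(child)
        result.extend(descendants(child))
    return result

def standby_weight(candidate_node, activated_node):
    return A if candidate_node in descendants(activated_node) else B
```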
FIG. 15 is a block diagram showing the configuration of an intention understanding device 30 according to Embodiment 3. In FIG. 15, parts identical or equivalent to those in FIGS. 1 and 5 are given the same reference numerals and their description is omitted. The intention understanding device 30 includes a keyword table 31 that stores keywords corresponding to intentions, a keyword search unit 32 that searches the keyword table 31 for the intentions corresponding to the morphological analysis result, and a weight calculation unit 33 that compares the intention corresponding to a keyword with the activated intention of the hierarchical tree 21 to calculate the standby weight.
Hereinafter, an intention corresponding to a keyword is called a keyword-corresponding intention, and an intention corresponding to an activated intention node of the hierarchical tree 21 is called a hierarchical-tree-corresponding intention.
In step ST30, the keyword search unit 32 searches the keyword table 31 for a keyword corresponding to the morphological analysis result, and acquires the keyword-corresponding intention associated with the found keyword. The keyword search unit 32 outputs the acquired keyword-corresponding intention to the weight calculation unit 33.
When the intention understanding result candidate matches the hierarchical-tree-corresponding intention (step ST32 "YES" and step ST34 "YES"), the weight calculation unit 33 sets the standby weight to the fourth weight d (step ST35). In the case of step ST34 "YES", the candidate may match both the hierarchical-tree-corresponding intention and the keyword-corresponding intention.
When the intention understanding result candidate does not match the hierarchical-tree-corresponding intention but matches only the keyword-corresponding intention (step ST34 "NO"), the weight calculation unit 33 sets the standby weight to the fifth weight e (step ST36).
The basic operation of the intention understanding device 30 is the same as that of the intention understanding devices 1 and 20 of Embodiments 1 and 2 above. The difference between Embodiment 3 and Embodiments 1 and 2 is the method of calculating the standby weights.
For the hierarchical tree 21, FIGS. 10 and 11 are referred to.
First, in step ST32, the hierarchical tree 21 outputs the activated hierarchical-tree-corresponding intention "destination setting []" of node #4 to the weight calculation unit 33. The intention understanding unit 7 outputs to the weight calculation unit 33 the first-ranked intention understanding result candidate "via point deletion [facility=$facility$(=○○)]" for the user utterance "Don't go to ○○". Further, the keyword search unit 32 outputs the keyword-corresponding intention "via point deletion []" to the weight calculation unit 33.
Here, the weight calculation unit 33 judges matching including the parent-child relations of the hierarchical tree 21; since "via point deletion [facility=$facility$(=○○)]" is a child of "via point deletion []", they are judged to match.
As a result, as shown in FIG. 18(c), the first-ranked candidate "via point deletion [facility=$facility$(=○○)]" and the second-ranked candidate "facility search [facility=$facility$(=○○)]" for "via ○○" are each given the standby weight "0.0" (= c); their final scores each become "0.0", again failing to satisfy the condition of X or more.
Therefore, the processing target moves to the third-ranked speech recognition result "with ○○ as the destination", and as shown in FIG. 18(d), the final score of the first-ranked candidate "destination setting [facility=$facility$(=○○)]" satisfies the condition of X or more and is output as the intention understanding result 13. Thus, "○○" is set as the destination, as in Embodiment 2 above.
For example, when words important for identifying an intention, such as "don't go" or "via", appear in a user utterance, the intention understanding unit 7 normally performs intention understanding for the user utterance "I won't go to ○○" using the features "○○, don't go". Instead, by duplicating keywords found in the keyword table 31, as in "○○, don't go, don't go", the intention understanding unit 7 can calculate scores weighted according to the number of occurrences of "don't go" when estimating the intention.
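This keyword-duplication idea can be sketched as follows; the keyword set, repeat factor, and function name are illustrative assumptions.

```python
# Sketch of keyword duplication: when a morpheme is a keyword of the
# keyword table, it is repeated in the feature list so that a
# count-based intent model gives it proportionally more influence.
KEYWORDS = {"do not go", "via", "destination"}  # illustrative subset

def boost_keyword_features(morphemes, repeat=2):
    features = []
    for m in morphemes:
        features.append(m)
        if m in KEYWORDS:
            # duplicate the keyword so it appears `repeat` times in total
            features.extend([m] * (repeat - 1))
    return features
```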
Also, for example, it is possible to change whether or not to confirm according to rank: no confirmation when the intention understanding result candidate of the first-ranked speech recognition result is selected as the intention understanding result 13, but confirmation when a candidate of the second rank or lower is selected.
Also, for example, it is possible to change whether or not to confirm according to the score: no confirmation when the candidate with the highest score before correction by the standby weight is selected as the intention understanding result 13, but confirmation when a candidate with a lower score is selected.
Note that the method of confirmation with the user may be a screen display or the like, in addition to voice output.
In the case of Embodiment 2 above, the features "destination, set" are extracted from the first user utterance "set the destination", and the feature "$facility$(=○○)" is extracted from the second utterance "○○". As a result, intention understanding for the second utterance would normally be performed using only "$facility$(=○○)" (step ST13 of FIG. 13).
On the other hand, when taking into account whether the dialogue is in the middle of a hierarchy transition, the first utterance "set the destination" corresponds to node #4 of the hierarchical tree 21, and the second utterance is highly likely to be in a parent-child relation with node #4; therefore, by performing intention understanding for the second utterance using the three features "destination, set, $facility$(=○○)", a more appropriate intention understanding result can be obtained.
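Merging features across a hierarchy transition can be sketched as below; the function name and the flag indicating a mid-transition utterance are assumptions for illustration.

```python
# Sketch of feature merging across utterances: while the dialogue is
# mid-way through a hierarchy transition, features of the previous
# utterance are combined with those of the current one before
# intention estimation.
def merge_features(previous, current, in_transition):
    return previous + current if in_transition else current
```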
Claims (10)
- An intention understanding device comprising: a speech recognition unit that recognizes one speech uttered by a user in natural language and generates a plurality of speech recognition results; a morphological analysis unit that converts each of the speech recognition results into a morpheme string; an intention understanding unit that estimates the intention of the user's utterance based on the morpheme string and outputs, from one morpheme string, one or more intention understanding result candidates and scores representing their degree of plausibility; a weight calculation unit that calculates a weight for each of the intention understanding result candidates; and an intention understanding correction unit that corrects the scores of the intention understanding result candidates using the weights to calculate final scores, and selects an intention understanding result from among the intention understanding result candidates based on the final scores.
- The intention understanding device according to claim 1, wherein the intention understanding unit generates the intention understanding result candidates in order from the most plausible of the plurality of speech recognition results, and the intention understanding correction unit calculates the final score each time the intention understanding unit generates an intention understanding result candidate, and selects, as the intention understanding result, the intention understanding result candidate whose final score first satisfies a preset condition.
- The intention understanding device according to claim 2, wherein the weight calculation unit calculates the weights using setting information of a control target device that operates based on the intention understanding result selected by the intention understanding correction unit.
- The intention understanding device according to claim 3, wherein the weight calculation unit has information defining constraint conditions and the weights to be used when the constraint conditions are satisfied, and selects the weights by judging, based on the setting information of the control target device, whether the constraint conditions are satisfied.
- The intention understanding device according to claim 2, wherein the weight calculation unit performs weighting such that an intention understanding result candidate corresponding to an intention expected from the flow of the dialogue with the user is more likely to be selected by the intention understanding correction unit.
- The intention understanding device according to claim 5, comprising a hierarchical tree that expresses the user's intentions in a tree structure in which intentions become more abstract toward the root and more specific toward the leaves, wherein the weight calculation unit performs weighting, based on the hierarchical tree, such that intention understanding result candidates located in the branch/leaf direction from the intention corresponding to the immediately previously selected intention understanding result are more likely to be selected.
- The intention understanding device according to claim 6, wherein the intention understanding unit estimates the user's intention using, in addition to the morpheme string generated from the current utterance, a morpheme string generated from an utterance before the current one.
- The intention understanding device according to claim 6, comprising a keyword search unit that searches a keyword table, in which correspondences between intentions and keywords are defined, for a keyword matching the morpheme string and acquires the intention corresponding to the found keyword, wherein the weight calculation unit calculates the weights using the hierarchical tree and the intention acquired by the keyword search unit.
- The intention understanding device according to claim 1, comprising an intention confirmation processing unit that, when the intention understanding correction unit excludes the most plausible intention understanding result candidate and selects another intention understanding result candidate as the intention understanding result, confirms with the user whether to adopt the intention understanding result and decides whether to adopt it.
- An intention understanding method comprising: a speech recognition step of recognizing one speech uttered by a user in natural language and generating a plurality of speech recognition results; a morphological analysis step of converting each of the speech recognition results into a morpheme string; an intention understanding step of estimating the intention of the user's utterance based on the morpheme string and outputting, from one morpheme string, one or more intention understanding result candidates and scores representing their degree of plausibility; a weight calculation step of calculating a weight for each of the intention understanding result candidates; and an intention understanding correction step of correcting the scores of the intention understanding result candidates using the weights to calculate final scores, and selecting an intention understanding result from among the intention understanding result candidates based on the final scores.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201480077480.XA CN106663424B (zh) | 2014-03-31 | 2014-03-31 | 意图理解装置以及方法 |
DE112014006542.0T DE112014006542B4 (de) | 2014-03-31 | 2014-03-31 | Einrichtung und Verfahren zum Verständnis von einer Benutzerintention |
JP2016511184A JPWO2015151157A1 (ja) | 2014-03-31 | 2014-03-31 | 意図理解装置および方法 |
PCT/JP2014/059445 WO2015151157A1 (ja) | 2014-03-31 | 2014-03-31 | 意図理解装置および方法 |
US15/120,539 US10037758B2 (en) | 2014-03-31 | 2014-03-31 | Device and method for understanding user intent |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/059445 WO2015151157A1 (ja) | 2014-03-31 | 2014-03-31 | 意図理解装置および方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015151157A1 true WO2015151157A1 (ja) | 2015-10-08 |
Family
ID=54239528
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/059445 WO2015151157A1 (ja) | 2014-03-31 | 2014-03-31 | 意図理解装置および方法 |
Country Status (5)
Country | Link |
---|---|
US (1) | US10037758B2 (ja) |
JP (1) | JPWO2015151157A1 (ja) |
CN (1) | CN106663424B (ja) |
DE (1) | DE112014006542B4 (ja) |
WO (1) | WO2015151157A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20170090127A (ko) * | 2016-01-28 | 2017-08-07 | 한국전자통신연구원 | 음성 언어 이해 장치 |
WO2018054309A1 (en) * | 2016-09-22 | 2018-03-29 | Zhejiang Geely Holding Group Co., Ltd. | Speech processing method and device |
WO2018118202A1 (en) * | 2016-12-19 | 2018-06-28 | Interactions Llc | Underspecification of intents in a natural language processing system |
CN113516491A (zh) * | 2020-04-09 | 2021-10-19 | 百度在线网络技术(北京)有限公司 | 推广信息展示方法、装置、电子设备及存储介质 |
Families Citing this family (129)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US20120309363A1 (en) | 2011-06-03 | 2012-12-06 | Apple Inc. | Triggering notifications associated with tasks items that represent tasks to perform |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
DE112014000709B4 (de) | 2013-02-07 | 2021-12-30 | Apple Inc. | Verfahren und vorrichtung zum betrieb eines sprachtriggers für einen digitalen assistenten |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
EP3008641A1 (en) | 2013-06-09 | 2016-04-20 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
KR101749009B1 (ko) | 2013-08-06 | 2017-06-19 | 애플 인크. | 원격 디바이스로부터의 활동에 기초한 스마트 응답의 자동 활성화 |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
AU2015266863B2 (en) | 2014-05-30 | 2018-03-15 | Apple Inc. | Multi-command single utterance input method |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10152299B2 (en) | 2015-03-06 | 2018-12-11 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
DE102015205044A1 (de) * | 2015-03-20 | 2016-09-22 | Bayerische Motoren Werke Aktiengesellschaft | Eingabe von Navigationszieldaten in ein Navigationssystem |
US10546001B1 (en) * | 2015-04-15 | 2020-01-28 | Arimo, LLC | Natural language queries based on user defined attributes |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US10200824B2 (en) | 2015-05-27 | 2019-02-05 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US20160378747A1 (en) | 2015-06-29 | 2016-12-29 | Apple Inc. | Virtual assistant for media playback |
JP6594981B2 (ja) * | 2015-07-13 | 2019-10-23 | 帝人株式会社 | 情報処理装置、情報処理方法およびコンピュータプログラム |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10740384B2 (en) | 2015-09-08 | 2020-08-11 | Apple Inc. | Intelligent automated assistant for media search and playback |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10331312B2 (en) | 2015-09-08 | 2019-06-25 | Apple Inc. | Intelligent automated assistant in a media environment |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
WO2017168637A1 (ja) * | 2016-03-30 | 2017-10-05 | 三菱電機株式会社 | 意図推定装置及び意図推定方法 |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
DK201770383A1 (en) | 2017-05-09 | 2018-12-14 | Apple Inc. | USER INTERFACE FOR CORRECTING RECOGNITION ERRORS |
DK180048B1 (en) | 2017-05-11 | 2020-02-04 | Apple Inc. | MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK201770427A1 (en) | 2017-05-12 | 2018-12-20 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
DK201770432A1 (en) * | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
DK201770411A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | MULTI-MODAL INTERFACES |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
DK179560B1 (en) | 2017-05-16 | 2019-02-18 | Apple Inc. | FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US20180336892A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Detecting a trigger of a digital assistant |
CN107170446A (zh) * | 2017-05-19 | 2017-09-15 | Shenzhen Ubtech Technology Co., Ltd. | Semantic processing server and method for semantic processing |
CN107240398B (zh) * | 2017-07-04 | 2020-11-17 | iFLYTEK Co., Ltd. | Intelligent voice interaction method and apparatus |
US10599377B2 (en) | 2017-07-11 | 2020-03-24 | Roku, Inc. | Controlling visual indicators in an audio responsive electronic device, and capturing and providing audio using an API, by native and non-native computing devices and services |
US10455322B2 (en) | 2017-08-18 | 2019-10-22 | Roku, Inc. | Remote control with presence sensor |
US11062702B2 (en) | 2017-08-28 | 2021-07-13 | Roku, Inc. | Media system with multiple digital assistants |
US10777197B2 (en) | 2017-08-28 | 2020-09-15 | Roku, Inc. | Audio responsive device with play/stop and tell me something buttons |
US11062710B2 (en) | 2017-08-28 | 2021-07-13 | Roku, Inc. | Local and cloud speech recognition |
CN110168535B (zh) * | 2017-10-31 | 2021-07-09 | Tencent Technology (Shenzhen) Co., Ltd. | Information processing method, terminal, and computer storage medium |
US10733375B2 (en) * | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US11145298B2 (en) | 2018-02-13 | 2021-10-12 | Roku, Inc. | Trigger word detection with multiple digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US11379706B2 (en) * | 2018-04-13 | 2022-07-05 | International Business Machines Corporation | Dispersed batch interaction with a question answering system |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US11704533B2 (en) * | 2018-05-23 | 2023-07-18 | Ford Global Technologies, Llc | Always listening and active voice assistant and vehicle operation |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
DK179822B1 (da) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US11076039B2 (en) | 2018-06-03 | 2021-07-27 | Apple Inc. | Accelerated task performance |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
CN109634692A (zh) * | 2018-10-23 | 2019-04-16 | NIO Co., Ltd. | In-vehicle dialogue *** and processing method and *** therefor |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
KR20200072907A (ko) * | 2018-12-13 | 2020-06-23 | Hyundai Motor Company | Vehicle equipped with a dialogue system and control method thereof |
CN109710941A (zh) * | 2018-12-29 | 2019-05-03 | Shanghai Dianrong Information Technology Co., Ltd. | Artificial-intelligence-based user intention recognition method and apparatus |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11100933B2 (en) * | 2019-04-17 | 2021-08-24 | Tempus Labs, Inc. | Collaborative artificial intelligence method and system |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
DK201970511A1 (en) | 2019-05-31 | 2021-02-15 | Apple Inc | Voice identification in digital assistant systems |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
DK180129B1 (en) | 2019-05-31 | 2020-06-02 | Apple Inc. | USER ACTIVITY SHORTCUT SUGGESTIONS |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11227599B2 (en) | 2019-06-01 | 2022-01-18 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
CN110472030A (zh) * | 2019-08-08 | 2019-11-19 | NetEase (Hangzhou) Network Co., Ltd. | Human-computer interaction method and apparatus, and electronic device |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
CN110956958A (zh) * | 2019-12-04 | 2020-04-03 | Shenzhen Zhuiyi Technology Co., Ltd. | Search method and apparatus, terminal device, and storage medium |
KR20210081103A (ko) * | 2019-12-23 | 2021-07-01 | LG Electronics Inc. | Artificial intelligence device for recognizing speech containing multiple languages, and method therefor |
US11043220B1 (en) | 2020-05-11 | 2021-06-22 | Apple Inc. | Digital assistant hardware abstraction |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
CN111696558A (zh) * | 2020-06-24 | 2020-09-22 | Shenzhen OneConnect Smart Technology Co., Ltd. | Intelligent outbound-call method and apparatus, computer device, and storage medium |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
CN112002321B (zh) * | 2020-08-11 | 2023-09-19 | Hisense Electronic Technology (Wuhan) Co., Ltd. | Display device, server, and voice interaction method |
JP2022050011A (ja) * | 2020-09-17 | 2022-03-30 | FUJIFILM Business Innovation Corp. | Information processing device and program |
CN113763947B (zh) * | 2021-01-15 | 2024-04-05 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Voice intention recognition method and apparatus, electronic device, and storage medium |
CN112417712A (zh) * | 2021-01-21 | 2021-02-26 | Shenzhen Youjie Zhixin Technology Co., Ltd. | Method and apparatus for determining a target device, computer device, and storage medium |
JP7420109B2 (ja) * | 2021-04-08 | 2024-01-23 | Toyota Motor Corp. | Information output system, server device, and information output method |
US11947548B2 (en) * | 2021-11-29 | 2024-04-02 | Walmart Apollo, Llc | Systems and methods for providing search results based on a primary intent |
CN113870842B (zh) * | 2021-12-02 | 2022-03-15 | Shenzhen Beike Ruisheng Technology Co., Ltd. | Voice control method, apparatus, device, and medium based on weight adjustment |
CN114254622B (zh) * | 2021-12-10 | 2024-06-14 | Mashang Consumer Finance Co., Ltd. | Intention recognition method and apparatus |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008203559A (ja) * | 2007-02-20 | 2008-09-04 | Toshiba Corp | Dialogue apparatus and method |
JP2010145930A (ja) * | 2008-12-22 | 2010-07-01 | Nissan Motor Co Ltd | Speech recognition apparatus and method |
JP2011033680A (ja) * | 2009-07-30 | 2011-02-17 | Sony Corp | Speech processing apparatus and method, and program |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7286984B1 (en) * | 1999-11-05 | 2007-10-23 | At&T Corp. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US20020198714A1 (en) * | 2001-06-26 | 2002-12-26 | Guojun Zhou | Statistical spoken dialog system |
US7228275B1 (en) | 2002-10-21 | 2007-06-05 | Toyota Infotechnology Center Co., Ltd. | Speech recognition system having multiple speech recognizers |
US7751551B2 (en) * | 2005-01-10 | 2010-07-06 | At&T Intellectual Property I, L.P. | System and method for speech-enabled call routing |
US7826945B2 (en) * | 2005-07-01 | 2010-11-02 | You Zhang | Automobile speech-recognition interface |
US8265939B2 (en) * | 2005-08-31 | 2012-09-11 | Nuance Communications, Inc. | Hierarchical methods and apparatus for extracting user intent from spoken utterances |
US20070094022A1 (en) * | 2005-10-20 | 2007-04-26 | Hahn Koo | Method and device for recognizing human intent |
US8112276B2 (en) * | 2005-12-14 | 2012-02-07 | Mitsubishi Electric Corporation | Voice recognition apparatus |
JP2008032834A (ja) * | 2006-07-26 | 2008-02-14 | Toshiba Corp | Speech translation apparatus and method |
JP4791984B2 (ja) * | 2007-02-27 | 2011-10-12 | Toshiba Corp | Apparatus, method, and program for processing input speech |
JP2012047924A (ja) * | 2010-08-26 | 2012-03-08 | Sony Corp | Information processing apparatus, information processing method, and program |
KR101522837B1 (ko) * | 2010-12-16 | 2015-05-26 | Electronics and Telecommunications Research Institute | Dialogue method and system therefor |
JP5710317B2 (ja) * | 2011-03-03 | 2015-04-30 | International Business Machines Corporation | Information processing apparatus, natural language analysis method, program, and recording medium |
CA2747153A1 (en) * | 2011-07-19 | 2013-01-19 | Suleman Kaheer | Natural language processing dialog system for obtaining goods, services or information |
EP3392876A1 (en) | 2011-09-30 | 2018-10-24 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
KR101359718B1 (ko) * | 2012-05-17 | 2014-02-13 | POSTECH Academy-Industry Foundation | Dialogue management system and method |
US8983840B2 (en) * | 2012-06-19 | 2015-03-17 | International Business Machines Corporation | Intent discovery in audio or text-based conversation |
US9053708B2 (en) * | 2012-07-18 | 2015-06-09 | International Business Machines Corporation | System, method and program product for providing automatic speech recognition (ASR) in a shared resource environment |
US9530405B2 (en) * | 2012-11-30 | 2016-12-27 | Mitsubishi Electric Corporation | Intention estimating device and intention estimating method |
CN103021403A (zh) * | 2012-12-31 | 2013-04-03 | VIA Technologies, Inc. | Selection method based on speech recognition, and mobile terminal device and information *** therefor |
KR102261552B1 (ko) * | 2014-06-30 | 2021-06-07 | Samsung Electronics Co., Ltd. | Method for providing voice commands and electronic device supporting the same |
2014
- 2014-03-31 WO PCT/JP2014/059445 patent/WO2015151157A1/ja active Application Filing
- 2014-03-31 JP JP2016511184A patent/JPWO2015151157A1/ja active Pending
- 2014-03-31 DE DE112014006542.0T patent/DE112014006542B4/de active Active
- 2014-03-31 CN CN201480077480.XA patent/CN106663424B/zh active Active
- 2014-03-31 US US15/120,539 patent/US10037758B2/en active Active
Non-Patent Citations (1)
Title |
---|
HIROKI YUASA ET AL.: "Construction and Evaluation of Spoken Dialogue Type Car Interface Using a Situation and the Context", IEICE TECHNICAL REPORT, vol. 103, no. 517, 11 December 2003 (2003-12-11), pages 199 - 204 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20170090127A (ko) * | 2016-01-28 | 2017-08-07 | Electronics and Telecommunications Research Institute | Spoken language understanding apparatus |
KR102267561B1 (ko) | 2016-01-28 | 2021-06-22 | Electronics and Telecommunications Research Institute | Spoken language understanding apparatus and method |
WO2018054309A1 (en) * | 2016-09-22 | 2018-03-29 | Zhejiang Geely Holding Group Co., Ltd. | Speech processing method and device |
US11011170B2 (en) | 2016-09-22 | 2021-05-18 | Zhejiang Geely Holding Group Co., Ltd. | Speech processing method and device |
WO2018118202A1 (en) * | 2016-12-19 | 2018-06-28 | Interactions Llc | Underspecification of intents in a natural language processing system |
US10216832B2 (en) | 2016-12-19 | 2019-02-26 | Interactions Llc | Underspecification of intents in a natural language processing system |
US10796100B2 (en) | 2016-12-19 | 2020-10-06 | Interactions Llc | Underspecification of intents in a natural language processing system |
CN113516491A (zh) * | 2020-04-09 | 2021-10-19 | Baidu Online Network Technology (Beijing) Co., Ltd. | Promotion information display method and apparatus, electronic device, and storage medium |
CN113516491B (zh) * | 2020-04-09 | 2024-04-30 | Baidu Online Network Technology (Beijing) Co., Ltd. | Promotion information display method and apparatus, electronic device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
DE112014006542T5 (de) | 2016-12-15 |
CN106663424B (zh) | 2021-03-05 |
DE112014006542B4 (de) | 2024-02-08 |
CN106663424A (zh) | 2017-05-10 |
JPWO2015151157A1 (ja) | 2017-04-13 |
US20170011742A1 (en) | 2017-01-12 |
US10037758B2 (en) | 2018-07-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015151157A1 (ja) | Intent understanding device and method | |
US10446141B2 (en) | Automatic speech recognition based on user feedback | |
JP6073498B2 (ja) | Dialogue control apparatus and dialogue control method | |
JP4542974B2 (ja) | Speech recognition apparatus, speech recognition method, and speech recognition program | |
WO2016067418A1 (ja) | Dialogue control apparatus and dialogue control method | |
JP2017513047A (ja) | Pronunciation prediction in speech recognition | |
JP2007047412A (ja) | Recognition grammar model creation apparatus, recognition grammar model creation method, and speech recognition apparatus | |
US20170345426A1 (en) | System and methods for robust voice-based human-iot communication | |
JP4634156B2 (ja) | Voice dialogue method and voice dialogue apparatus | |
JP2008243080A (ja) | Apparatus, method, and program for translating speech | |
US20170337922A1 (en) | System and methods for modifying user pronunciation to achieve better recognition results | |
JP2013125144A (ja) | Speech recognition apparatus and program therefor | |
JP2004045900A (ja) | Voice dialogue apparatus and program | |
JP2010169973A (ja) | Foreign language learning support system and program | |
Rudzionis et al. | Web services based hybrid recognizer of Lithuanian voice commands | |
JP2009116075A (ja) | Speech recognition apparatus | |
JP5493537B2 (ja) | Speech recognition apparatus, speech recognition method, and program therefor | |
JP4951422B2 (ja) | Speech recognition apparatus and speech recognition method | |
JP2012255867A (ja) | Speech recognition apparatus | |
JP4930014B2 (ja) | Speech recognition apparatus and speech recognition method | |
US11393451B1 (en) | Linked content in voice user interface | |
JP2005157166A (ja) | Speech recognition apparatus, speech recognition method, and program | |
WO2013125203A1 (ja) | Speech recognition apparatus, speech recognition method, and computer program | |
AU2019100034A4 (en) | Improving automatic speech recognition based on user feedback | |
KR101830210B1 (ko) | Method, apparatus, and computer-readable recording medium for improving a set of at least one semantic unit | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14888136 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016511184 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15120539 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 112014006542 Country of ref document: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14888136 Country of ref document: EP Kind code of ref document: A1 |