WO2015098079A1 - Voice recognition processing device, voice recognition processing method, and display device - Google Patents

Voice recognition processing device, voice recognition processing method, and display device

Info

Publication number
WO2015098079A1
WO2015098079A1 (PCT/JP2014/006367)
Authority
WO
WIPO (PCT)
Prior art keywords
information
voice
unit
command
search
Prior art date
Application number
PCT/JP2014/006367
Other languages
French (fr)
Japanese (ja)
Inventor
智弘 小金井
小沼 知浩
Original Assignee
パナソニックIpマネジメント株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by パナソニックIpマネジメント株式会社 filed Critical パナソニックIpマネジメント株式会社
Priority to EP14874773.6A priority Critical patent/EP3089157B1/en
Priority to JP2015554558A priority patent/JP6244560B2/en
Priority to US15/023,385 priority patent/US9905225B2/en
Priority to CN201480057905.0A priority patent/CN105659318B/en
Publication of WO2015098079A1 publication Critical patent/WO2015098079A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/083 Recognition networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • The present disclosure relates to a speech recognition processing device, a speech recognition processing method, and a display device that operate by recognizing speech uttered by a user.
  • Patent Document 1 discloses a voice input device having a voice recognition function.
  • The voice input device is configured to receive voice uttered by a user, analyze the received voice, recognize the command indicated by the voice (voice recognition), and control the device according to the recognized command. That is, the voice input device of Patent Document 1 can recognize speech uttered freely by the user and can control the device according to the command obtained as the result of the voice recognition.
  • Hypertext displayed on a browser can be selected using the voice recognition function of this voice input device.
  • The user can also use this voice recognition function to perform a search on a website (search site) that provides a search service.
  • This disclosure provides a voice recognition processing device and a voice recognition processing method that improve user operability.
  • The speech recognition processing device includes a voice acquisition unit, a first voice recognition unit, a second voice recognition unit, a selection unit, a storage unit, and a processing unit.
  • The voice acquisition unit is configured to acquire voice uttered by the user and output voice information.
  • The first voice recognition unit is configured to convert the voice information into first information.
  • The second voice recognition unit is configured to convert the voice information into second information.
  • The selection unit is configured to select third information and fourth information from the second information.
  • The storage unit is configured to store the first information, the third information, and the fourth information.
  • The processing unit is configured to execute processing based on the first information, the third information, and the fourth information. If one or two of the first information, the third information, and the fourth information are missing, the processing unit is configured to complement the missing information using the information stored in the storage unit and then execute the processing.
  • The speech recognition processing method includes: a step of acquiring speech uttered by a user and converting it into voice information; a step of converting the voice information into first information; a step of converting the voice information into second information; a step of selecting third information and fourth information from the second information; a step of storing the first information, the third information, and the fourth information in a storage unit; a step of executing processing based on the first information, the third information, and the fourth information; and a step of complementing one or two missing pieces among the first information, the third information, and the fourth information using the information stored in the storage unit. A sketch of this flow follows below.
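To make the complementing behavior concrete, here is a minimal, self-contained Python sketch (an illustration only, not the patent's implementation; all names are hypothetical). First information corresponds to a command, third information to a reserved word, and fourth information to a free word:

```python
class SpeechProcessor:
    """Minimal sketch of the claimed pipeline (hypothetical names)."""

    def __init__(self):
        # storage unit: keeps the most recent value of each piece of information
        self.storage = {"command": None, "reserved_word": None, "free_word": None}

    def process(self, command=None, reserved_word=None, free_word=None):
        request = {"command": command,
                   "reserved_word": reserved_word,
                   "free_word": free_word}
        for key, value in request.items():
            if value is None:
                request[key] = self.storage[key]   # complement missing information
            else:
                self.storage[key] = value          # store newly uttered information
        return request

proc = SpeechProcessor()
print(proc.process(command="search", reserved_word="image", free_word="ABC"))
# {'command': 'search', 'reserved_word': 'image', 'free_word': 'ABC'}
print(proc.process(command="search", reserved_word="moving image"))
# free word "ABC" is complemented from storage
print(proc.process(free_word="XYZ"))
# command "search" and reserved word "moving image" are complemented
```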
  • The display device includes a voice acquisition unit, a first voice recognition unit, a second voice recognition unit, a selection unit, a storage unit, a processing unit, and a display unit.
  • The voice acquisition unit is configured to acquire voice uttered by the user and output voice information.
  • The first voice recognition unit is configured to convert the voice information into first information.
  • The second voice recognition unit is configured to convert the voice information into second information.
  • The selection unit is configured to select third information and fourth information from the second information.
  • The storage unit is configured to store the first information, the third information, and the fourth information.
  • The processing unit is configured to execute processing based on the first information, the third information, and the fourth information.
  • The display unit is configured to display the processing result of the processing unit. If one or two of the first information, the third information, and the fourth information are missing, the processing unit is configured to complement the missing information using the information stored in the storage unit and then execute the processing.
  • The voice recognition processing device can improve operability when the user performs a voice operation.
  • FIG. 1 is a diagram schematically showing a speech recognition processing system according to the first embodiment.
  • FIG. 2 is a block diagram illustrating a configuration example of the speech recognition processing system according to the first embodiment.
  • FIG. 3 is a diagram showing an outline of dictation performed by the speech recognition processing system according to the first embodiment.
  • FIG. 4 is a flowchart showing an operation example of the keyword single search process performed by the speech recognition processing apparatus according to the first embodiment.
  • FIG. 5 is a flowchart showing an operation example of the keyword associative search process performed by the speech recognition processing apparatus according to the first embodiment.
  • FIG. 6 is a flowchart illustrating an operation example of the speech recognition interpretation process performed by the speech recognition processing apparatus according to the first embodiment.
  • FIG. 7 is a diagram schematically illustrating an example of a reserved word table of the speech recognition processing device according to the first embodiment.
  • A television receiver (television) 10 is cited as an example of a display device including a voice recognition processing device, but the display device is by no means limited to the television 10.
  • For example, a PC or a tablet terminal may be used.
  • FIG. 1 schematically shows a speech recognition processing system 11 according to the first embodiment.
  • A speech recognition processing device is built into the television 10, which is an example of a display device.
  • The voice recognition processing system 11 in the present embodiment includes the television 10 and a voice recognition unit 50.
  • The voice recognition processing system 11 may further include at least one of a remote controller 20 and a portable terminal 30.
  • The display unit 140 of the television 10 displays, along with video based on an input video signal, a received broadcast signal, and the like, a voice recognition icon 201 and an indicator 202 indicating the volume of the collected sound. This indicates to the user 700 that operation of the television 10 based on the voice of the user 700 (hereinafter, “voice operation”) is possible, and prompts the user 700 to speak.
  • When the user 700 speaks, the voice is collected by the microphone built into the remote controller 20 or the portable terminal 30 that the user 700 is using and is transferred to the television 10. The voice uttered by the user 700 is then recognized by the voice recognition processing device built into the television 10, and the television 10 is controlled according to the result of the voice recognition.
  • The television 10 may also include a built-in microphone 130.
  • In that case, the voice recognition processing system 11 can be configured without the remote controller 20 and the portable terminal 30.
  • The television 10 is connected to the voice recognition unit 50 via a network 40, and the television 10 and the voice recognition unit 50 can communicate with each other.
  • FIG. 2 is a block diagram showing a configuration example of the speech recognition processing system 11 according to the first embodiment.
  • The television 10 includes a voice recognition processing device 100, the display unit 140, a transmission/reception unit 150, a tuner 160, a storage unit 171, the built-in microphone 130, and a wireless communication unit 180.
  • The built-in microphone 130 is a microphone configured to collect sound coming mainly from the direction facing the display surface of the display unit 140. That is, the sound collection direction of the built-in microphone 130 is set so as to pick up the voice of the user 700 facing the display unit 140 of the television 10, and the microphone can collect the voice uttered by the user 700.
  • The built-in microphone 130 may be provided inside the casing of the television 10, or may be installed outside the casing of the television 10 as shown in the example of FIG. 1.
  • The remote controller 20 is a controller with which the user 700 remotely operates the television 10.
  • In addition to the general configuration necessary for remote operation of the television 10, the remote controller 20 includes a microphone 21 and an input unit 22.
  • The microphone 21 is configured to collect the voice uttered by the user 700 and output a voice signal.
  • The input unit 22 is configured to receive input operations performed manually by the user 700 and output an input signal corresponding to the input operation.
  • The input unit 22 is, for example, a touch pad, but may be a keyboard, buttons, or the like.
  • A voice signal generated from the sound collected by the microphone 21, or an input signal generated when the user 700 operates the input unit 22, is transmitted wirelessly to the television 10 by, for example, infrared rays or radio waves.
  • The display unit 140 is, for example, a liquid crystal display, but may be a plasma display, an organic EL (electroluminescence) display, or the like.
  • The display unit 140 is controlled by a display control unit 108 and displays images based on an externally input video signal, a broadcast signal received by the tuner 160, and the like.
  • The transmission/reception unit 150 is connected to the network 40 and is configured to communicate, through the network 40, with external devices connected to the network 40 (for example, the voice recognition unit 50).
  • The tuner 160 is configured to receive terrestrial or satellite television broadcast signals via an antenna (not shown).
  • The tuner 160 may also be configured to receive television broadcast signals transmitted via a dedicated cable.
  • The storage unit 171 is, for example, a nonvolatile semiconductor memory, but may be a volatile semiconductor memory, a hard disk, or the like.
  • The storage unit 171 stores information (data), programs, and the like used for controlling each unit of the television 10.
  • The mobile terminal 30 is, for example, a smartphone on which software for remotely operating the television 10 can run. Therefore, in the speech recognition processing system 11 of the present embodiment, the mobile terminal 30 on which that software is running can be used for remote operation of the television 10.
  • The portable terminal 30 has a microphone 31 and an input unit 32.
  • The microphone 31 is built into the mobile terminal 30 and, like the microphone 21 provided in the remote controller 20, is configured to collect the voice uttered by the user 700 and output a voice signal.
  • The input unit 32 is configured to receive input operations performed manually by the user 700 and output an input signal corresponding to the input operation.
  • The input unit 32 is, for example, a touch panel, but may be a keyboard, buttons, or the like.
  • The mobile terminal 30 on which the software is running transmits a voice signal based on the sound collected by the microphone 31, or an input signal generated when the user 700 operates the input unit 32, wirelessly to the television 10 by, for example, infrared rays or radio waves.
  • The television 10 and the remote controller 20 or the portable terminal 30 are connected by wireless communication such as a wireless LAN (Local Area Network) or Bluetooth (registered trademark).
  • The network 40 is, for example, the Internet, but may be another network.
  • The voice recognition unit 50 is a server connected to the television 10 via the network 40 (a server on the cloud).
  • The voice recognition unit 50 receives the voice information transmitted from the television 10 and converts the received voice information into a character string.
  • The character string may consist of multiple characters or a single character. The voice recognition unit 50 then transmits character string information representing the converted character string, as the result of the voice recognition, to the television 10 via the network 40.
  • The voice recognition processing device 100 includes a voice acquisition unit 101, a voice processing unit 102, a recognition result acquisition unit 103, an intention interpretation processing unit 104, a word storage processing unit 105, a command processing unit 106, and a search processing unit 107.
  • The storage unit 170 is, for example, a nonvolatile semiconductor memory, but may be a volatile semiconductor memory, a hard disk, or the like.
  • The storage unit 170 is controlled by the word storage processing unit 105, and data can be written to and read from it arbitrarily.
  • The storage unit 170 also stores information referred to by the voice processing unit 102 and other units (for example, the “voice-command” correspondence information described later).
  • The “voice-command” correspondence information is information in which voice information is associated with commands. Note that the storage unit 170 and the storage unit 171 may be configured as a single unit.
  • The voice acquisition unit 101 acquires a voice signal based on the voice uttered by the user 700.
  • The voice acquisition unit 101 may acquire the voice signal based on the voice uttered by the user 700 from the built-in microphone 130 of the television 10, or, via the wireless communication unit 180, from the microphone 21 built into the remote controller 20 or the microphone 31 built into the portable terminal 30.
  • The voice acquisition unit 101 converts the voice signal into voice information that can be used in the various downstream processes and outputs it to the voice processing unit 102. Note that if the voice signal is already a digital signal, the voice acquisition unit 101 may use the voice signal as voice information as it is.
  • The voice processing unit 102 is an example of the “first voice recognition unit”.
  • The voice processing unit 102 is configured to convert voice information into command information, which is an example of the “first information”.
  • The voice processing unit 102 performs “command recognition processing”.
  • The “command recognition process” determines whether a preset command is included in the voice information acquired from the voice acquisition unit 101 and, if one is included, identifies that command.
  • Specifically, the voice processing unit 102 refers, based on the voice information acquired from the voice acquisition unit 101, to the “voice-command” correspondence information stored in advance in the storage unit 170.
  • The “voice-command” correspondence information is a correspondence table in which voice information is associated with commands, that is, with instruction information for the television 10.
  • If the voice processing unit 102 can identify a command included in the voice information acquired from the voice acquisition unit 101 by referring to the “voice-command” correspondence information, it outputs information representing that command (command information), as the result of the voice recognition, to the recognition result acquisition unit 103.
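As a rough illustration (not the patent's code; the names and table contents are assumptions), the command recognition process amounts to a lookup of the utterance against the correspondence table:

```python
# Hypothetical "voice-command" correspondence information: utterance phrases
# associated with instruction commands for the television (illustrative only).
VOICE_COMMAND_TABLE = {
    "search": "CMD_SEARCH",
    "channel up": "CMD_CHANNEL_UP",
    "volume up": "CMD_VOLUME_UP",
    "play": "CMD_PLAY",
    "stop": "CMD_STOP",
}

def recognize_command(voice_text):
    """Command recognition process: return command information if the
    utterance contains a registered phrase, otherwise None."""
    for phrase, command in VOICE_COMMAND_TABLE.items():
        if phrase in voice_text.lower():
            return command
    return None

print(recognize_command("Search ABC image"))  # CMD_SEARCH
print(recognize_command("Good morning"))      # None
```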
  • The voice processing unit 102 also transmits the voice information acquired from the voice acquisition unit 101, from the transmission/reception unit 150, to the voice recognition unit 50 via the network 40.
  • The voice recognition unit 50 is an example of the “second voice recognition unit”.
  • The voice recognition unit 50 is configured to convert voice information into character string information, which is an example of the “second information”, and performs “keyword recognition processing”.
  • Upon receiving the voice information transmitted from the television 10, the voice recognition unit 50 divides the voice information into phrases, in order to distinguish keywords from non-keywords (for example, particles), and converts each phrase into a character string (hereinafter, “dictation”).
  • The voice recognition unit 50 transmits the dictated character string information, as the result of the voice recognition, to the television 10.
  • The voice recognition unit 50 may instead extract from the received voice information only the portion other than the command, or convert only that portion into a character string and send it back. Alternatively, the voice information excluding the command may be transmitted from the television 10 to the voice recognition unit 50.
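The following toy sketch only mimics the shape of the dictation output on text input; the actual unit operates on audio, and the particle list here is an assumption:

```python
# Stand-in for dictation: split an utterance into phrases and label each
# as a keyword or a non-keyword (e.g. a particle).
PARTICLES = {"no", "o", "wa", "ga", "ni", "de"}   # assumed non-keywords

def dictate(utterance):
    phrases = utterance.lower().split()
    return [(p, "keyword" if p not in PARTICLES else "non-keyword")
            for p in phrases]

print(dictate("ABC no image o search"))
# [('abc', 'keyword'), ('no', 'non-keyword'), ('image', 'keyword'),
#  ('o', 'non-keyword'), ('search', 'keyword')]
```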
  • The recognition result acquisition unit 103 acquires the command information resulting from the voice recognition from the voice processing unit 102. It also acquires the character string information resulting from the voice recognition from the voice recognition unit 50 via the network 40 and the transmission/reception unit 150.
  • The intention interpretation processing unit 104 is an example of the “selection unit”.
  • The intention interpretation processing unit 104 is configured to select, from the character string information, reserved word information, which is an example of the “third information”, and free word information, which is an example of the “fourth information”.
  • When the intention interpretation processing unit 104 acquires the command information and the character string information from the recognition result acquisition unit 103, it selects a “free word” and a “reserved word” from the character string information. Based on the selected free word, the reserved word, and the command information, it then performs intention interpretation to identify the intent of the voice operation spoken by the user 700. Details of this operation will be described later.
  • The intention interpretation processing unit 104 outputs the intention-interpreted command information to the command processing unit 106, and outputs free word information representing the free word, reserved word information representing the reserved word, and the command information to the word storage processing unit 105.
  • The intention interpretation processing unit 104 may also output the free word information and reserved word information to the command processing unit 106.
  • The word storage processing unit 105 stores the command information, free word information, and reserved word information output from the intention interpretation processing unit 104 in the storage unit 170.
  • The command processing unit 106 is an example of the “processing unit”.
  • The command processing unit 106 is configured to execute processing based on the command information, reserved word information, and free word information.
  • The command processing unit 106 performs the command processing corresponding to the command information interpreted by the intention interpretation processing unit 104. The command processing unit 106 also performs command processing corresponding to user operations received by an operation receiving unit 110.
  • The command processing unit 106 may also perform new command processing based on one or two of the command information, free word information, and reserved word information stored in the storage unit 170 by the word storage processing unit 105. That is, if one or two of the command information, reserved word information, and free word information are missing, the command processing unit 106 is configured to complement the missing information using the information stored in the storage unit 170 and execute the command processing. Details will be described later.
  • The search processing unit 107 is another example of the “processing unit”. If the command information is a search command, the search processing unit 107 is configured to execute a search process based on the reserved word information and the free word information. If the command information corresponds to a search command associated with a preset application, the search processing unit 107 performs the search with that application based on the free word information and the reserved word information.
  • For example, if the search command is associated with the Internet search application, the search processing unit 107 performs a search with the Internet search application based on the free word information and the reserved word information.
  • Likewise, if the search command is associated with the program guide application, the search processing unit 107 performs a search with the program guide application based on the free word information and the reserved word information.
  • If no specific application is designated, the search processing unit 107 can perform a search, based on the free word information and the reserved word information, across all applications capable of such a search (searchable applications).
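A hedged sketch of how such dispatch by reserved word might look (the application names and the mapping are placeholders, not taken from the patent):

```python
def run_search(reserved_word, free_word):
    """Pick the application from the reserved word, then search it
    with the free word (search processing unit)."""
    if reserved_word in ("image", "web"):
        app = "internet_search_app"
    elif reserved_word == "program":
        app = "program_guide_app"
    else:
        app = "all_searchable_apps"      # fall back to every searchable app
    return f"{app}: search for '{free_word}' in '{reserved_word}'"

print(run_search("image", "ABC"))    # internet_search_app: ...
print(run_search("program", "ABC"))  # program_guide_app: ...
```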
  • If there is missing information, the search processing unit 107 is likewise configured to complement the missing information using the information stored in the storage unit 170 and then execute the search process. If the missing information is the command information and the immediately preceding command process was a search process in the search processing unit 107, the search process is executed again.
  • The display control unit 108 displays the search results of the search processing unit 107 on the display unit 140.
  • Specifically, the display control unit 108 displays on the display unit 140 the keyword search results of the Internet search application, of the program guide application, or of the searchable applications.
  • The operation receiving unit 110 receives, from the remote controller 20 or the mobile terminal 30 via the wireless communication unit 180, an input signal generated by an input operation performed by the user 700 on the input unit 22 of the remote controller 20 or on the input unit 32 of the portable terminal 30. In this way, the operation receiving unit 110 receives the operations (user operations) performed by the user 700.
  • The first starting method is as follows.
  • The user 700 presses a microphone button (not shown), which is one of the input units 22 provided on the remote controller 20, in order to start the voice recognition process.
  • The operation receiving unit 110 receives the notification that the microphone button of the remote controller 20 has been pressed.
  • The television 10 then changes the volume of its speaker (not shown) to a preset volume.
  • This volume is sufficiently low that it does not hinder voice recognition by the microphone 21.
  • When the speaker volume reaches the preset volume, the voice recognition processing device 100 starts the voice recognition process.
  • If the speaker volume is already sufficiently low, the television 10 does not need to perform the above volume adjustment, and the volume remains as it is.
  • The voice recognition process can also be started using the mobile terminal 30 (for example, a smartphone having a touch panel).
  • The user 700 activates software provided in the mobile terminal 30 (software for operating the television 10 by voice) and presses a microphone button displayed on the touch panel while the software is running. This user operation corresponds to pressing the microphone button of the remote controller 20.
  • The speech recognition processing apparatus 100 then starts the speech recognition process.
  • The second starting method is as follows.
  • The user 700 utters, toward the built-in microphone 130 of the television 10, a voice (for example, “voice operation start”) representing a preset command for starting the voice recognition process (a start command).
  • When the voice recognition processing device 100 recognizes that the voice collected by the built-in microphone 130 matches the preset start command, the television 10 changes the speaker volume to the preset volume as described above, and the voice recognition processing by the voice recognition processing device 100 is started.
  • These operations are controlled by a control unit (not shown) that controls each block of the television 10.
  • When the voice recognition process starts, the display control unit 108 displays, on the image display surface of the display unit 140, the voice recognition icon 201 and the indicator 202 indicating the volume of the collected sound, in order to indicate that the voice recognition process has started and to prompt the user 700 to speak.
  • The display control unit 108 may display on the display unit 140 a message indicating that the voice recognition process has started, instead of the voice recognition icon 201.
  • Alternatively, a message indicating that the voice recognition process has started may be output by voice from the speaker.
  • The voice recognition icon 201 and the indicator 202 are not limited to the designs shown in FIG. 1; any design that achieves the desired effect may be used.
  • The speech recognition processing apparatus 100 performs two types of speech recognition processing.
  • One is speech recognition processing for recognizing speech corresponding to preset commands (command recognition processing).
  • The other is speech recognition processing for recognizing keywords other than the preset commands (keyword recognition processing).
  • The command recognition process is performed by the voice processing unit 102 as described above.
  • The voice processing unit 102 compares the voice information based on the voice uttered by the user 700 toward the television 10 with the “voice-command” correspondence information stored in advance in the storage unit 170, and if the voice information includes a command registered in the “voice-command” correspondence information, identifies that command.
  • In the “voice-command” correspondence information, various commands for operating the television 10 are registered; for example, a command for a free-word search operation is also registered.
  • The keyword recognition process is performed using the voice recognition unit 50 connected to the television 10 via the network 40, as described above.
  • The voice recognition unit 50 acquires the voice information from the television 10 via the network 40, divides the acquired voice information into phrases, and separates the information into keywords and non-keywords (for example, particles). In this way, the voice recognition unit 50 performs dictation.
  • When performing dictation, the voice recognition unit 50 uses a database in which voice information is associated with character strings (including single characters).
  • The voice recognition unit 50 separates the acquired voice information into keywords and non-keywords by comparing it with the database, and converts each into a character string.
  • In the present embodiment, the voice recognition unit 50 receives from the television 10 all the voice information acquired by the voice acquisition unit 101, performs dictation on all of it, and transmits the results to the television 10.
  • However, the voice processing unit 102 of the television 10 may be configured to transmit to the voice recognition unit 50 only the voice information other than the commands recognized with the “voice-command” correspondence information.
  • FIG. 3 is a diagram showing an outline of dictation performed by the speech recognition processing system 11 according to the first embodiment.
  • FIG. 3 shows a state in which the web browser is displayed on the display unit 140 of the television 10.
  • When the user 700 performs a keyword search using the Internet search application of the web browser and the voice recognition process is started in the voice recognition processing device 100, an image such as the example shown in FIG. 3 is displayed on the display unit 140.
  • An input field 203 is an area for entering a keyword used for the search on the web browser. When the cursor is displayed in the input field 203, the user 700 can enter a keyword in the input field 203.
  • When the user 700 speaks, a voice signal based on the voice is input to the voice acquisition unit 101 and converted into voice information, and the voice information is transmitted from the television 10 to the voice recognition unit 50 via the network 40. For example, if the user 700 says “ABC”, voice information based on that voice is transmitted from the television 10 to the voice recognition unit 50.
  • The voice recognition unit 50 converts the voice information received from the television 10 into a character string by comparing it with the database, and transmits the character string information, as the result of the voice recognition based on the received voice information, to the television 10 via the network 40. If the received voice information is based on the voice “ABC”, the voice recognition unit 50 converts it into the character string “ABC” by comparison with the database and transmits the corresponding character string information to the television 10.
  • Upon receiving the character string information from the voice recognition unit 50, the television 10 operates the recognition result acquisition unit 103, the intention interpretation processing unit 104, the command processing unit 106, and the display control unit 108 based on the character string information, and displays the character string corresponding to the character string information in the input field 203. For example, upon receiving the character string information corresponding to the character string “ABC” from the voice recognition unit 50, the television 10 displays the character string “ABC” in the input field 203.
  • The web browser displayed on the display unit 140 of the television 10 then performs a keyword search using the character string displayed in the input field 203.
  • FIG. 4 is a flowchart illustrating an operation example of the keyword single search process performed by the speech recognition processing apparatus 100 according to the first embodiment.
  • FIG. 5 is a flowchart illustrating an operation example of the keyword association search process performed by the speech recognition processing apparatus 100 according to the first embodiment.
  • FIG. 6 is a flowchart showing an operation example of the speech recognition interpretation process performed by the speech recognition processing apparatus 100 according to the first embodiment.
  • the flowchart shown in FIG. 6 is a flowchart showing details of the speech recognition / interpretation processing step in each search process shown in FIGS.
  • FIG. 7 is a diagram schematically illustrating an example of a reserved word table of the speech recognition processing apparatus 100 according to the first embodiment.
  • In the speech recognition interpretation process of the keyword single search process shown in FIG. 4 (step S101) and the speech recognition interpretation process of the keyword associative search process shown in FIG. 5 (step S201), substantially the same processing is performed. First, the speech recognition interpretation process will be described with reference to FIG. 6.
  • First, the voice recognition processing of the voice recognition processing device 100 is started by one of the methods described above.
  • The voice of the user 700 is converted into a voice signal by the built-in microphone 130, the microphone 21 of the remote controller 20, or the microphone 31 of the portable terminal 30, and the voice signal is input to the voice acquisition unit 101.
  • The voice acquisition unit 101 acquires the voice signal of the user 700 (step S301).
  • The voice acquisition unit 101 converts the acquired voice signal of the user 700 into voice information that can be used in the various downstream processes. If the user 700 says, for example, “Search ABC image”, the voice acquisition unit 101 outputs voice information based on that voice.
  • Next, the voice processing unit 102 compares the voice information output from the voice acquisition unit 101 with the “voice-command” correspondence information stored in advance in the storage unit 170, and checks whether the voice information output from the voice acquisition unit 101 corresponds to a command registered in the “voice-command” correspondence information (step S302).
  • In this example, the voice information output from the voice acquisition unit 101 includes voice information based on the word “search” uttered by the user 700, and “search” is registered as command information in the “voice-command” correspondence information, so the voice processing unit 102 determines that the “search” command is included in the voice information.
  • The “voice-command” correspondence information includes command information corresponding to voice information such as “search”, “channel up”, “volume up”, “play”, “stop”, “word conversion”, and “character display”.
  • The “voice-command” correspondence information can be updated by adding or deleting command information.
  • The user 700 can add new command information to the “voice-command” correspondence information.
  • New command information can also be added to the “voice-command” correspondence information via the network 40.
  • The speech recognition processing apparatus 100 can thus perform speech recognition processing based on the latest “voice-command” correspondence information.
  • In step S302, the voice processing unit 102 also transmits the voice information output from the voice acquisition unit 101, from the transmission/reception unit 150, to the voice recognition unit 50 via the network 40.
  • The voice recognition unit 50 converts the received voice information into character strings separated into keywords and non-keywords (for example, particles). To do so, the voice recognition unit 50 performs dictation based on the received voice information.
  • In the dictation, the voice recognition unit 50 compares the received voice information with the database in which keywords are associated with character strings. If a keyword registered in the database is included in the received voice information, the character string (including single words) corresponding to that keyword is selected. In this way, the voice recognition unit 50 performs dictation and converts the received voice information into character strings. For example, if the voice recognition unit 50 receives voice information based on the voice “Search ABC image” uttered by the user 700, it converts the voice information by dictation into the character strings “ABC”, “no”, “image”, “o”, and “search”. The voice recognition unit 50 transmits character string information representing each converted character string to the television 10 via the network 40.
  • This database is provided in the voice recognition unit 50, but it may be located elsewhere on the network 40.
  • The database may be configured so that its keyword information is updated regularly or irregularly.
  • The recognition result acquisition unit 103 of the television 10 acquires the command information output as the result of the voice recognition from the voice processing unit 102 and the character string information transmitted as the result of the voice recognition from the voice recognition unit 50, and outputs them to the intention interpretation processing unit 104.
  • Next, the intention interpretation processing unit 104 performs intention interpretation to identify the intent of the voice operation spoken by the user 700, based on the command information and the character string information acquired from the recognition result acquisition unit 103 (step S303).
  • For the intention interpretation, the intention interpretation processing unit 104 performs selection on the character string information.
  • The selection categories are free words, reserved words, and commands. If any piece of character string information overlaps with the command information, the intention interpretation processing unit 104 determines that it is a command and selects it as such. Reserved words are selected from the character string information based on the reserved word table shown as an example in FIG. 7. Free words are then selected by removing from the remaining character string information the character strings that do not correspond to keywords, such as particles.
  • When the intention interpretation processing unit 104 acquires, for example, the character string information “ABC”, “no”, “image”, “o”, and “search” together with the command information representing “search”, it selects “ABC” as a free word, “image” as a reserved word, and “search” as a command.
  • Through this intention interpretation, the speech recognition processing apparatus 100 can operate according to the intent of the user 700 (the intent of the voice operation spoken by the user 700). For example, the speech recognition processing apparatus 100 can execute the command “search” using the free word “ABC” for the reserved word “image”.
  • The intention interpretation processing unit 104 compares the character string information with the reserved word table shown as an example in FIG. 7, and if the character string information includes a term registered in the reserved word table, selects that term from the character string information as a reserved word.
  • A reserved word is a predetermined term such as “image”, “moving image”, “program”, or “Web”, as shown in the example of FIG. 7.
  • However, the reserved words are not limited to these terms.
  • The intention interpretation processing unit 104 may also use character strings such as particles included in the character string information when interpreting the intent. A sketch of this selection step follows below.
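A compact sketch of the selection step (assumed names; the reserved word set mirrors the example of FIG. 7, and the particle list is an assumption):

```python
RESERVED_WORD_TABLE = {"image", "moving image", "program", "web"}  # cf. FIG. 7
PARTICLES = {"no", "o", "wa", "ga", "ni", "de"}                    # assumed

def interpret(phrases, command_info):
    """Sort dictated character strings into a command, reserved words,
    and free words (intention interpretation)."""
    selected = {"command": None, "reserved": [], "free": []}
    for phrase in phrases:
        p = phrase.lower()
        if command_info is not None and p == command_info:
            selected["command"] = p          # overlaps with command information
        elif p in RESERVED_WORD_TABLE:
            selected["reserved"].append(p)
        elif p not in PARTICLES:
            selected["free"].append(p)       # remaining keywords are free words
    return selected

print(interpret(["ABC", "no", "image", "o", "search"], "search"))
# {'command': 'search', 'reserved': ['image'], 'free': ['abc']}
```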
  • The intention interpretation processing unit 104 executes this speech recognition interpretation process in each search process (step S101 shown in FIG. 4 and step S201 shown in FIG. 5).
  • First, the intention interpretation processing unit 104 executes the speech recognition interpretation process shown in FIG. 6 based on the voice uttered by the user 700 (step S101). Since this duplicates the description above, a detailed description of step S101 is omitted.
  • Next, the intention interpretation processing unit 104 determines, based on the processing result of step S101, whether reserved word information is included in the character string information (step S102).
  • If it is determined in step S102 that no reserved word information is included (No), the process proceeds to step S104.
  • If it is determined in step S102 that reserved word information is included (Yes), the word storage processing unit 105 stores the reserved word information in the storage unit 170 (step S103). In the example described above, the reserved word information “image” is stored in the storage unit 170.
  • Next, the voice recognition processing device 100 determines, based on the processing result of step S101, whether free word information is included in the character string information (step S104).
  • If it is determined in step S104 that no free word information is included (No), the process proceeds to step S106.
  • If it is determined in step S104 that free word information is included (Yes), the word storage processing unit 105 stores the free word information in the storage unit 170 (step S105).
  • In the example described above, the free word information “ABC” is stored in the storage unit 170.
  • Similarly, the word storage processing unit 105 stores the command information in the storage unit 170.
  • Next, the command processing unit 106 executes command processing based on the free word information, reserved word information, and command information (step S106).
  • When the command processing unit 106 receives the command information from the intention interpretation processing unit 104 and the free word information and/or reserved word information from the word storage processing unit 105, it executes the instruction (command) based on the command information on each or both of them. Note that the command processing unit 106 may instead receive the free word information and reserved word information from the intention interpretation processing unit 104, and the command information from the word storage processing unit 105.
  • The command processing unit 106 mainly performs command processing other than searches.
  • This command processing includes, for example, changing the channel or the volume of the television 10.
  • Next, the search processing unit 107 executes the search process (step S107).
  • In the example described above, the search processing unit 107 sets the search target content to “image” based on the reserved word information “image” and performs an image search using the free word information “ABC”.
  • The search result of step S107 is displayed on the display unit 140 by the display control unit 108.
  • Thus, the keyword single search process ends.
  • The keyword associative search process is a process that, when the user 700 executes search processes in succession, executes a new search based on the previously input content and the newly input content, without requiring the content entered for the previous search to be entered again.
  • Here, an example in which the input operation is performed by voice uttered by the user 700 is described, but other input means (for example, the input unit 22 of the remote controller 20 or the input unit 32 of the portable terminal 30) may also be used.
  • The keyword associative search process will be described with specific examples.
  • Assume that the user 700 first says “Search ABC image” and that a search of “image” with the free word “ABC” has already been performed.
  • Assume that the user 700 then wants to search “moving image” using the same free word “ABC” that was used for the previous image search.
  • In this case, the user 700 can omit uttering the free word “ABC”, which overlaps with the previous search. That is, the user 700 may simply say “Search moving image”.
  • First, the intention interpretation processing unit 104 executes the speech recognition interpretation process shown in FIG. 6 based on the voice uttered by the user 700 (step S201). Since this duplicates the description above, a detailed description of step S201 is omitted.
  • Voice information based on the voice uttered by the user is transmitted from the voice recognition processing device 100 to the voice recognition unit 50 via the network 40.
  • The voice recognition unit 50 returns character string information based on the received voice information.
  • In this example, the character string information includes reserved word information (for example, “moving image”) and command information (for example, “search”), but does not include free word information.
  • The returned character string information is received by the recognition result acquisition unit 103 and output to the intention interpretation processing unit 104.
  • The voice processing unit 102 of the voice recognition processing apparatus 100 determines that the command “search” is included in the voice information based on the voice uttered by the user 700, and outputs the command information corresponding to the command “search” to the recognition result acquisition unit 103. The recognition result acquisition unit 103 also receives the character string information including the character string “moving image” from the voice recognition unit 50. The intention interpretation processing unit 104 then determines that “moving image” included in the character string information acquired from the recognition result acquisition unit 103 is a reserved word. Since no free word information is included in the character string information, no free word information is output from the intention interpretation processing unit 104.
  • Next, the intention interpretation processing unit 104 determines, based on the processing result of step S201, whether reserved word information is included in the character string information (step S202).
  • If it is determined in step S202 that no reserved word information is included (No), the process proceeds to step S205. The operations from step S205 onward will be described later.
  • If it is determined in step S202 that reserved word information is included (Yes), the word storage processing unit 105 stores the reserved word information (for example, “moving image”) in the storage unit 170 as the new search target content (step S203).
  • The new reserved word information is stored in the storage unit 170, whereby the reserved word information is updated.
  • In the example described above, the previous reserved word information “image” is replaced by the new reserved word information “moving image” (step S204).
  • In this example, no free word information is output from the intention interpretation processing unit 104, so the word storage processing unit 105 reads the free word information (for example, “ABC”) stored in the storage unit 170 and outputs it to the command processing unit 106.
  • The command processing unit 106 receives the command information from the intention interpretation processing unit 104, and the read free word information and the new reserved word information from the word storage processing unit 105. It then performs the command processing corresponding to the command information on the read free word information and the new reserved word information (step S208). As described above, the command processing unit 106 mainly performs command processing other than searches.
  • Next, the search processing unit 107 executes the search process (step S209).
  • In the example described above, the search processing unit 107 sets the search target content to “moving image” based on the new reserved word information “moving image”, and performs a moving-image search using the free word information “ABC” read from the storage unit 170.
  • The search result of step S209 is displayed on the display unit 140 by the display control unit 108.
  • Thus, the keyword associative search process ends.
  • Next, the keyword associative search process in the case where it is determined in step S202 that no reserved word information is included (No) will be described.
  • Assume that the user 700 wants to search “image” using a free word “XYZ” that differs from the free word used for the previous image search.
  • In this case, the user 700 can omit uttering the reserved word “image” and the command “search”, which overlap with the previous search. That is, the user 700 may simply say “XYZ”.
  • Since steps S201 and S202 duplicate the description above, their detailed description is omitted.
  • Voice information (for example, “XYZ”) based on the voice uttered by the user is transmitted from the voice recognition processing device 100 to the voice recognition unit 50 via the network 40.
  • The voice recognition unit 50 returns character string information based on the received voice information.
  • In this example, the character string information includes free word information (for example, “XYZ”), but includes neither reserved word information nor command information.
  • The returned character string information is received by the recognition result acquisition unit 103 and output to the intention interpretation processing unit 104.
  • In this example, no reserved word information is included in the character string information, and no command information is output from the voice processing unit 102. Accordingly, neither reserved word information nor command information is output from the intention interpretation processing unit 104.
  • Therefore, it is determined in step S202 that no reserved word information is included (No).
  • Next, the intention interpretation processing unit 104 determines, based on the processing result of step S201, whether free word information is included in the character string information (step S205).
  • If it is determined in step S205 that no free word information is included (No), the process proceeds to step S208.
  • If it is determined in step S205 that free word information is included (Yes), the word storage processing unit 105 stores the free word information (for example, “XYZ”) in the storage unit 170 as new free word information (step S206).
  • The new free word information is stored in the storage unit 170, whereby the free word information is updated.
  • In the example described above, the previous free word information “ABC” is replaced by the new free word information “XYZ” (step S207).
  • In this example, neither reserved word information nor command information is output from the intention interpretation processing unit 104, so the word storage processing unit 105 reads the reserved word information (for example, “image”) and the command information (for example, “search”) stored in the storage unit 170 and outputs them to the command processing unit 106.
  • The command processing unit 106 receives the reserved word information and command information read from the storage unit 170 by the word storage processing unit 105, together with the new free word information (for example, “XYZ”). It then performs the command processing corresponding to the read command information on the read reserved word information and the new free word information (step S208).
  • Next, the search processing unit 107 executes the search process (step S209).
  • In the example described above, the search processing unit 107 sets the search target content to “image” based on the reserved word information “image” read from the storage unit 170, and performs an image search using the new free word information “XYZ”.
  • The search result of step S209 is displayed on the display unit 140 by the display control unit 108.
  • Thus, the keyword associative search process ends.
  • Note that if it is determined in step S205 that no free word information is included (No), the process proceeds to step S208 and normal command processing or search processing is performed.
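Both associative cases above reduce to the same store-and-complement pattern; a self-contained sketch under assumed names (steps quoted in the comments refer to FIG. 5):

```python
storage = {"command": None, "reserved": None, "free": None}  # storage unit 170

def associative_search(command=None, reserved=None, free=None):
    """Steps S202-S209 in miniature: newly uttered pieces update the
    storage unit; omitted pieces are read back from it."""
    for key, value in (("command", command), ("reserved", reserved), ("free", free)):
        if value is not None:
            storage[key] = value             # steps S203/S206: store new info
    return f"{storage['command']}: '{storage['free']}' in '{storage['reserved']}'"

print(associative_search("search", "image", "ABC"))  # "Search ABC image"
print(associative_search(reserved="moving image"))   # "Search moving image"
print(associative_search(free="XYZ"))                # "XYZ"
# search: 'ABC' in 'image'
# search: 'ABC' in 'moving image'
# search: 'XYZ' in 'moving image'
```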
  • the speech recognition processing apparatus 100 includes the speech acquisition unit 101, the speech processing unit 102 that is an example of the first speech recognition unit, and the speech recognition that is an example of the second speech recognition unit.
  • Unit 50 intention interpretation processing unit 104 which is an example of a selection unit, storage unit 170, command processing unit 106 and search processing unit 107 which are examples of a processing unit.
  • the voice acquisition unit 101 is configured to acquire voice uttered by the user and output voice information.
  • the voice processing unit 102 is configured to convert voice information into command information which is an example of first information.
  • the voice recognition unit 50 is configured to convert voice information into character string information, which is an example of second information.
  • the intention interpretation processing unit 104 is configured to sort reserved word information, which is an example of third information, and free word information, which is an example of fourth information, from character string information.
  • the storage unit 170 is configured to store command information, reserved word information, and free word information.
  • the command processing unit 106 is configured to execute processing based on command information, reserved word information, and free word information. Then, if there is one or two pieces of missing information in command information, reserved word information, and free word information, the command processing unit 106 and the search processing unit 107 use the information stored in the storage unit 170 as the missing information. It is comprised so that it may complement and may perform a process.
  • the search processing unit 107 is configured to execute a search process based on the search command, reserved word information, and free word information.
  • The voice recognition unit 50 may be installed on the network 40, and the voice recognition processing apparatus 100 may include the transmission / reception unit 150 configured to communicate with the voice recognition unit 50 via the network 40.
  • The voice processing unit 102 may be configured to convert the voice information into command information using the "voice-command" correspondence information, in which a plurality of preset commands are associated with corresponding voice information.
  • When the user 700 uses the speech recognition processing device 100 configured as described above and performs voice operations in succession, the user 700 can perform a new operation based on the content of the previous utterance without uttering again the content spoken in the previous voice operation. For example, when the user 700 performs search processing in succession, a new search based on the previous utterance content and the newly uttered content can be performed without uttering again the content that was input by voice in the previous search.
  • For example, suppose the user 700 utters "Search ABC images", searches the "image" category with the free word "ABC", and subsequently wants to search for "ABC videos". In this case, the user 700 may omit the utterance of the free word "ABC", which overlaps with the previous search, and speak only "video". This makes it possible to execute the same search process as when "Search ABC videos" is uttered.
  • Alternatively, suppose the user 700 utters "Search ABC images", searches the "image" category with the free word "ABC", and subsequently wants to search for "XYZ images". In this case, the user 700 may omit the reserved word "image" and the command "search", which overlap with the previous search, and speak only "XYZ". This makes it possible to execute the same search process as when "Search XYZ images" is uttered.
  • In this way, the speech recognition processing device 100 can reduce the complexity of voice operation for the user 700 and improve operability.
  • The first embodiment has been described above as an example of the technique disclosed in the present application.
  • However, the technique in the present disclosure is not limited to this, and can also be applied to embodiments in which changes, replacements, additions, omissions, and the like are made.
  • In the "voice-command" correspondence information, command information corresponding to voice information such as "channel up", "volume up", "play", "stop", "change language", and "character display" may be registered, for example.
  • For example, suppose the speech recognition processing apparatus 100 recognizes the free word "optical disc" and the command information "playback". As a result, the video recorded on the optical disc is played back on the optical disc playback device on which the speech recognition processing device 100 is mounted.
  • When the user 700 subsequently utters "stop", the command information "stop" is recognized by the speech recognition processing device 100, and the optical disc playback device stops the playback of the optical disc.
  • This is because the free word "optical disc" has been stored in the storage unit 170 by the word storage processing unit 105, so the command processing unit 106 executes the processing of the newly input command information on the free word "optical disc" read from the storage unit 170.
  • That is, the user 700 can control the operation of the optical disc playback apparatus by simply saying "stop", without saying "stop the optical disc".
  • In another example, the speech recognition processing apparatus 100 recognizes the free word information "Japanese" and the command information "character display". Thereby, on the television 10 on which the speech recognition processing device 100 is mounted, the command "character display", which displays Japanese subtitles on the display unit 140 of the television 10, is executed. When the user 700 subsequently speaks "English", the free word information "English" is recognized by the voice recognition processing device 100. The television 10 then reads the command information "character display" from the storage unit 170, continues the "character display" operation as it is, and changes the characters displayed on the display unit 140 from "Japanese" to "English". That is, the user 700 can change the displayed characters of the television 10 from "Japanese" to "English" by simply speaking "English", without speaking "English character display".
  • In this way, the voice recognition processing apparatus 100 reads the missing information from the storage unit 170, complements it, and executes the command processing. Therefore, the user 700 does not need to repeatedly speak words that overlap with the previous voice operation, so the complexity of voice operation is reduced and operability is improved.
  • In the above example, a reserved word is not included in the utterance of the user 700, but the command processing unit 106 can still execute the command processing.
  • This is because, when a reserved word or free word is not included in the utterance, the intention interpretation processing unit 104 transmits that fact to the word storage processing unit 105 and the command processing unit 106 (search processing unit 107). Based on the information transmitted from the intention interpretation processing unit 104, the command processing unit 106 (search processing unit 107) can determine whether the command processing should be performed with a combination of free word information, reserved word information, and command information, with a combination of free word information and command information, or with a combination of reserved word information and command information, and can then execute the command processing. Furthermore, the word storage processing unit 105 is prevented from reading unnecessary information from the storage unit 170. In the above example, the reserved word information is not included in the voice information, but since the reserved word information is unnecessary, the word storage processing unit 105 does not read the reserved word information from the storage unit 170.
  • The voice processing unit 102 may operate so as to output the information to the subsequent stage together with the command information.
  • The search target is not limited to "image" or "video"; other categories may also be set as the search target.
  • When the voice uttered by the user 700 includes the command information "search" and a keyword, and the type of the "search" is a search on the Internet, the speech recognition processing apparatus 100 performs a search using the keyword with the Internet search application. For example, if the user 700 says "Search ABC on the Internet", the speech recognition processing apparatus 100 recognizes the voice "Search on the Internet" as a "search" by the Internet search application. Therefore, the user 700 can cause the television 10 to perform an Internet search using the keyword just by uttering the voice.
  • Similarly, if the voice uttered by the user 700 includes the command information "search" and a keyword, and the type of the "search" is a search by the program guide application, the speech recognition processing apparatus 100 performs a search using the keyword in the program guide application. For example, if the user 700 utters "Search ABC in the program guide", the speech recognition processing apparatus 100 recognizes the voice "Search in the program guide" as a "search" by the program guide application. Therefore, the user 700 can cause the television 10 to perform a program guide search using the keyword just by uttering the voice.
  • Alternatively, the speech recognition processing apparatus 100 may perform the "search" with the free word in all applications capable of searching with the free word, and display the search results of all the searched applications on the display unit 140.
  • The voice recognition process can be started by the methods described above. Therefore, once the voice recognition process is started, the user 700 can perform the searches described above even while watching a program on the television 10.
  • The voice recognition unit 50 may be provided in the voice recognition processing device 100.
  • Reserved word information may be read from the storage unit 170 to complement the command processing, or command information may be read from the storage unit 170 to complement the command processing.
  • Likewise, reserved word information and free word information may be read from the storage unit 170 to complement the command processing, or free word information and command information may be read from the storage unit 170 to complement the command processing.
  • Each block shown in FIG. 2 may be configured as an independent circuit block, or may be configured such that a processor executes software programmed to realize the operation of each block.
  • The present disclosure is applicable to devices that execute processing operations instructed by the user.
  • Specifically, the present disclosure is applicable to portable terminal devices, television receivers, personal computers, set-top boxes, video recorders, game machines, smartphones, tablet terminals, and the like.
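
The complementing behavior summarized above can be pictured with a short sketch. This is a minimal illustration under assumed names (the VoiceSession class, the slot names, and the example calls are hypothetical, not taken from the disclosure); it only shows how a missing command, reserved word, or free word is filled in from the most recently stored values, as in the "Search ABC images" followed by "XYZ" example.

    # Minimal sketch of complementing missing information from storage;
    # the class and field names are hypothetical.
    class VoiceSession:
        def __init__(self):
            self.stored = {"command": None, "reserved": None, "free": None}

        def handle(self, command=None, reserved=None, free=None):
            """Complement each missing piece with the stored value, then
            store the completed triple and return it for processing."""
            request = {
                "command": command or self.stored["command"],
                "reserved": reserved or self.stored["reserved"],
                "free": free or self.stored["free"],
            }
            self.stored = dict(request)
            return request

    session = VoiceSession()
    # "Search ABC images": command, reserved word, and free word all present.
    print(session.handle(command="search", reserved="image", free="ABC"))
    # The next utterance is only "XYZ": the command and the reserved word are
    # complemented from storage, so the same search runs as for "Search XYZ images".
    print(session.handle(free="XYZ"))
    # -> {'command': 'search', 'reserved': 'image', 'free': 'XYZ'}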

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention improves the operability of voice operation. To this end, this voice recognition processing device (100) comprises a voice processing unit (102) that converts voice information into command information. A voice recognition unit (50) converts the voice information into character string information. An intention interpretation processing unit (104) selects reserved word information and free word information from the character string information. A storage unit stores the command information, the reserved word information, and the free word information. A search processing unit (107) performs search processing that is based on the command information, the reserved word information, and the free word information. If one or two pieces of information are lacking among the command information, the reserved word information, and the free word information, the search processing unit (107) reads the lacking information from the storage unit (170) and performs search processing.

Description

Speech recognition processing device, speech recognition processing method, and display device

The present disclosure relates to a speech recognition processing device, a speech recognition processing method, and a display device that operate by recognizing speech uttered by a user.

Patent Document 1 discloses a voice input device having a voice recognition function. This voice input device is configured to receive voice uttered by a user, recognize the command indicated by the user's voice by analyzing the received voice (voice recognition), and control a device according to the recognized command. That is, the voice input device of Patent Document 1 can recognize speech arbitrarily uttered by the user and control a device according to the command that results from the voice recognition.

For example, a user of this voice input device can, while operating a browser on a television receiver (hereinafter referred to as "television") or a PC (Personal Computer), select hypertext displayed on the browser by using the voice recognition function of the voice input device. The user can also perform a search on a website that provides a search service (search site) by using this voice recognition function.

Japanese Patent No. 4812941
The present disclosure provides a speech recognition processing device and a speech recognition processing method that improve user operability.

The speech recognition processing device according to the present disclosure includes a voice acquisition unit, a first voice recognition unit, a second voice recognition unit, a selection unit, a storage unit, and a processing unit. The voice acquisition unit is configured to acquire voice uttered by a user and output voice information. The first voice recognition unit is configured to convert the voice information into first information. The second voice recognition unit is configured to convert the voice information into second information. The selection unit is configured to select third information and fourth information from the second information. The storage unit is configured to store the first information, the third information, and the fourth information. The processing unit is configured to execute processing based on the first information, the third information, and the fourth information. If one or two of the first information, the third information, and the fourth information are missing, the processing unit is configured to complement the missing information using the information stored in the storage unit and execute the processing.

The speech recognition processing method according to the present disclosure includes: a step of acquiring voice uttered by a user and converting it into voice information; a step of converting the voice information into first information; a step of converting the voice information into second information; a step of selecting third information and fourth information from the second information; a step of storing the first information, the third information, and the fourth information in a storage unit; a step of executing processing based on the first information, the third information, and the fourth information; and a step of, if one or two of the first information, the third information, and the fourth information are missing, complementing the missing information using the information stored in the storage unit.

The display device according to the present disclosure includes a voice acquisition unit, a first voice recognition unit, a second voice recognition unit, a selection unit, a storage unit, a processing unit, and a display unit. The voice acquisition unit is configured to acquire voice uttered by a user and output voice information. The first voice recognition unit is configured to convert the voice information into first information. The second voice recognition unit is configured to convert the voice information into second information. The selection unit is configured to select third information and fourth information from the second information. The storage unit is configured to store the first information, the third information, and the fourth information. The processing unit is configured to execute processing based on the first information, the third information, and the fourth information. The display unit is configured to display the result of the processing in the processing unit. If one or two of the first information, the third information, and the fourth information are missing, the processing unit is configured to complement the missing information using the information stored in the storage unit and execute the processing.

The speech recognition processing device according to the present disclosure can improve operability when the user performs voice operation.
FIG. 1 is a diagram schematically showing the speech recognition processing system according to the first embodiment.
FIG. 2 is a block diagram showing a configuration example of the speech recognition processing system according to the first embodiment.
FIG. 3 is a diagram showing an outline of dictation performed by the speech recognition processing system according to the first embodiment.
FIG. 4 is a flowchart showing an operation example of the keyword single search process performed by the speech recognition processing device according to the first embodiment.
FIG. 5 is a flowchart showing an operation example of the keyword associative search process performed by the speech recognition processing device according to the first embodiment.
FIG. 6 is a flowchart showing an operation example of the speech recognition interpretation process performed by the speech recognition processing device according to the first embodiment.
FIG. 7 is a diagram schematically showing an example of the reserved word table of the speech recognition processing device according to the first embodiment.
Hereinafter, embodiments will be described in detail with reference to the drawings as appropriate. However, descriptions that are more detailed than necessary may be omitted. For example, detailed descriptions of already well-known matters and redundant descriptions of substantially identical configurations may be omitted. This is to avoid making the following description unnecessarily redundant and to facilitate understanding by those skilled in the art.

Note that the accompanying drawings and the following description are provided so that those skilled in the art can fully understand the present disclosure, and are not intended to limit the subject matter described in the claims.
(Embodiment 1)

The first embodiment will be described below with reference to FIGS. 1 to 7. In the present embodiment, a television receiver (television) 10 is taken as an example of a display device including a speech recognition processing device, but the display device is not limited to the television 10. For example, it may be a PC, a tablet terminal, or the like.
[1-1. Configuration]

FIG. 1 schematically shows the speech recognition processing system 11 according to the first embodiment. In the present embodiment, the speech recognition processing device is built into the television 10, which is an example of a display device.

The speech recognition processing system 11 in the present embodiment includes the television 10 and a voice recognition unit 50. The speech recognition processing system 11 may also include at least one of a remote controller (hereinafter also referred to as a "remote control") 20 and a portable terminal 30.

When the speech recognition processing device is activated on the television 10, the display unit 140 of the television 10 displays a voice recognition icon 201 and an indicator 202 indicating the volume of the collected sound, together with the video based on the input video signal, the received broadcast signal, and the like. This is to indicate to the user 700 that operation of the television 10 based on the voice of the user 700 (hereinafter referred to as "voice operation") has become possible, and to prompt the user 700 to speak.

When the user 700 utters a voice, the voice is collected by the microphone built into the remote controller 20 or the portable terminal 30 used by the user 700 and transferred to the television 10. The voice uttered by the user 700 is then recognized by the speech recognition processing device built into the television 10, and the television 10 is controlled according to the result of the voice recognition.

The television 10 may include a built-in microphone 130. In this case, when the user 700 speaks toward the built-in microphone 130 of the television 10, the voice is collected by the built-in microphone 130 and recognized by the speech recognition processing device. Therefore, the speech recognition processing system 11 can also be configured without the remote controller 20 and the portable terminal 30.

The television 10 is also connected to the voice recognition unit 50 via the network 40, so that the television 10 and the voice recognition unit 50 can communicate with each other.
FIG. 2 is a block diagram showing a configuration example of the speech recognition processing system 11 according to the first embodiment.

The television 10 includes the speech recognition processing device 100, the display unit 140, a transmission / reception unit 150, a tuner 160, a storage unit 171, the built-in microphone 130, and a wireless communication unit 180.

The speech recognition processing device 100 is configured to acquire the voice uttered by the user 700 and analyze the acquired voice. It is then configured to recognize the keyword and command indicated by that voice and to control the television 10 according to the recognition result. The specific configuration of the speech recognition processing device 100 will be described later.

The built-in microphone 130 is a microphone configured to collect sound coming mainly from the direction facing the display surface of the display unit 140. That is, the sound collection direction of the built-in microphone 130 is set so that it can collect the voice uttered by the user 700 facing the display unit 140 of the television 10. The built-in microphone 130 may be provided inside the housing of the television 10, or may be installed outside the housing of the television 10, as shown as an example in FIG. 1.

The remote controller 20 is a controller for the user 700 to remotely operate the television 10. In addition to the general configuration necessary for remote operation of the television 10, the remote controller 20 includes a microphone 21 and an input unit 22. The microphone 21 is configured to collect the voice uttered by the user 700 and output a voice signal. The input unit 22 is configured to receive input operations performed manually by the user 700 and output an input signal corresponding to each input operation. The input unit 22 is, for example, a touch pad, but may be a keyboard, buttons, or the like. The voice signal generated from the sound collected by the microphone 21, or the input signal generated when the user 700 performs an input operation on the input unit 22, is wirelessly transmitted to the television 10 by, for example, infrared rays or radio waves.
The display unit 140 is, for example, a liquid crystal display, but may be a plasma display, an organic EL (ElectroLuminescence) display, or the like. The display unit 140 is controlled by the display control unit 108 and displays images based on externally input video signals, broadcast signals received by the tuner 160, and the like.

The transmission / reception unit 150 is connected to the network 40 and is configured to communicate through the network 40 with external devices (for example, the voice recognition unit 50) connected to the network 40.

The tuner 160 is configured to receive terrestrial and satellite television broadcast signals via an antenna (not shown). The tuner 160 may also be configured to receive television broadcast signals transmitted via a dedicated cable.

The storage unit 171 is, for example, a nonvolatile semiconductor memory, but may be a volatile semiconductor memory, a hard disk, or the like. The storage unit 171 stores information (data), programs, and the like used for controlling each unit of the television 10.

The portable terminal 30 is, for example, a smartphone, and can run software for remotely operating the television 10. Therefore, in the speech recognition processing system 11 of the present embodiment, the portable terminal 30 on which this software is running can be used for remote operation of the television 10. The portable terminal 30 includes a microphone 31 and an input unit 32. The microphone 31 is built into the portable terminal 30 and, like the microphone 21 provided in the remote controller 20, is configured to collect the voice uttered by the user 700 and output a voice signal. The input unit 32 is configured to receive input operations performed manually by the user 700 and output an input signal corresponding to each input operation. The input unit 32 is, for example, a touch panel, but may be a keyboard, buttons, or the like. Like the remote controller 20, the portable terminal 30 on which this software is running wirelessly transmits the voice signal from the sound collected by the microphone 31, or the input signal generated when the user 700 performs an input operation on the input unit 32, to the television 10 by, for example, infrared rays or radio waves.
The television 10 and the remote controller 20 or portable terminal 30 are connected by wireless communication such as a wireless LAN (Local Area Network) or Bluetooth (registered trademark).

The network 40 is, for example, the Internet, but may be another network.

The voice recognition unit 50 is a server connected to the television 10 via the network 40 (a server on the cloud). The voice recognition unit 50 receives the voice information transmitted from the television 10 and converts the received voice information into a character string. Note that this character string may consist of a plurality of characters or a single character. The voice recognition unit 50 then transmits character string information representing the converted character string to the television 10 via the network 40 as the result of the voice recognition.

The speech recognition processing device 100 includes a voice acquisition unit 101, a voice processing unit 102, a recognition result acquisition unit 103, an intention interpretation processing unit 104, a word storage processing unit 105, a command processing unit 106, a search processing unit 107, a display control unit 108, an operation reception unit 110, and a storage unit 170.
The storage unit 170 is, for example, a nonvolatile semiconductor memory, but may be a volatile semiconductor memory, a hard disk, or the like. The storage unit 170 is controlled by the word storage processing unit 105, and data can be written to and read from it as needed. The storage unit 170 also stores information referred to by the voice processing unit 102 (for example, the "voice-command" correspondence information described later). The "voice-command" correspondence information is information in which voice information is associated with commands. Note that the storage unit 170 and the storage unit 171 may be configured as a single unit.

The voice acquisition unit 101 acquires the voice signal of the voice uttered by the user 700. The voice acquisition unit 101 may acquire this voice signal from the built-in microphone 130 of the television 10, or via the wireless communication unit 180 from the microphone 21 built into the remote controller 20 or the microphone 31 built into the portable terminal 30. The voice acquisition unit 101 then converts the voice signal into voice information that can be used in the various downstream processes and outputs it to the voice processing unit 102. Note that if the voice signal is a digital signal, the voice acquisition unit 101 may use the voice signal as it is as the voice information.

The voice processing unit 102 is an example of the "first voice recognition unit". The voice processing unit 102 is configured to convert the voice information into command information, which is an example of the "first information". The voice processing unit 102 performs "command recognition processing", that is, processing that determines whether a preset command is included in the voice information acquired from the voice acquisition unit 101 and, if so, identifies that command. Specifically, the voice processing unit 102 refers, based on the voice information acquired from the voice acquisition unit 101, to the "voice-command" correspondence information stored in advance in the storage unit 170. The "voice-command" correspondence information is a correspondence table in which voice information is associated with commands, which are instruction information for the television 10. There are a plurality of types of commands, and different voice information is associated with each command. If the voice processing unit 102 can identify a command included in the voice information acquired from the voice acquisition unit 101 by referring to the "voice-command" correspondence information, it outputs information representing that command (command information) to the recognition result acquisition unit 103 as the result of the voice recognition.
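
As a rough illustration, the command recognition against the "voice-command" correspondence information can be pictured as a table lookup. The following minimal Python sketch is an assumption for illustration only: the phrase set, the recognize_command helper, and the substring matching rule are hypothetical and are not specified by the disclosure.

    # Hypothetical "voice-command" correspondence information: preset voice
    # phrases mapped to command information (the entries follow examples
    # mentioned in the text, such as "play", "stop", and "search").
    VOICE_COMMAND_TABLE = {
        "play": "PLAYBACK",
        "stop": "STOP",
        "character display": "CHARACTER_DISPLAY",
        "search": "SEARCH",
    }

    def recognize_command(voice_info: str):
        """Return the command information if the voice information contains
        a preset command phrase; otherwise return None (no command)."""
        text = voice_info.lower()
        for phrase, command in VOICE_COMMAND_TABLE.items():
            if phrase in text:
                return command
        return None

    # "Search ABC images" contains the preset phrase "search", so the
    # command information "SEARCH" is output to the recognition result
    # acquisition unit; the rest of the utterance is handled as keywords.
    assert recognize_command("Search ABC images") == "SEARCH"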
The voice processing unit 102 also transmits the voice information acquired from the voice acquisition unit 101 from the transmission / reception unit 150 to the voice recognition unit 50 via the network 40.

The voice recognition unit 50 is an example of the "second voice recognition unit". The voice recognition unit 50 is configured to convert the voice information into character string information, which is an example of the "second information", and performs "keyword recognition processing". When the voice recognition unit 50 receives the voice information transmitted from the television 10, it divides the voice information into phrases in order to distinguish keywords from non-keywords (for example, particles), and converts each phrase into a character string (hereinafter referred to as "dictation"). The voice recognition unit 50 then transmits the information of the dictated character strings (character string information) to the television 10 as the result of the voice recognition. The voice recognition unit 50 may extract the voice information other than commands from the received voice information, or may convert the voice information other than commands into a character string and return it. Alternatively, the television 10 may transmit only the voice information excluding commands to the voice recognition unit 50.
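
The division of roles between the television and the server can be sketched as a simple request/response exchange. Everything below is hypothetical: the endpoint URL, the payload format, and the use of the requests library are illustrative assumptions, since the disclosure does not specify a transport protocol for the network 40.

    import requests  # assumed available; the actual transport is not specified

    def dictate_on_server(voice_info: bytes) -> str:
        """Send the captured voice information to the cloud recognizer and
        return the dictated character string information."""
        response = requests.post(
            "https://speech.example.com/dictation",  # placeholder URL
            data=voice_info,
            headers={"Content-Type": "application/octet-stream"},
            timeout=10,
        )
        response.raise_for_status()
        # The server is assumed to return the dictated string as plain text,
        # e.g. "ABC" for the corresponding utterance.
        return response.text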
The recognition result acquisition unit 103 acquires the command information as the result of the voice recognition from the voice processing unit 102. The recognition result acquisition unit 103 also acquires the character string information as the result of the voice recognition from the voice recognition unit 50 via the network 40 and the transmission / reception unit 150.

The intention interpretation processing unit 104 is an example of the "selection unit". The intention interpretation processing unit 104 is configured to select, from the character string information, reserved word information, which is an example of the "third information", and free word information, which is an example of the "fourth information". When the intention interpretation processing unit 104 acquires the command information and the character string information from the recognition result acquisition unit 103, it selects the "free words" and "reserved words" from the character string information. Then, based on the selected free words and reserved words and the command information, it performs intention interpretation to identify the intention of the voice operation uttered by the user 700. The details of this operation will be described later. The intention interpretation processing unit 104 outputs the interpreted command information to the command processing unit 106. It also outputs the free word information representing the free words, the reserved word information representing the reserved words, and the command information to the word storage processing unit 105. The intention interpretation processing unit 104 may also output the free word information and the reserved word information to the command processing unit 106.
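
One way to picture this selection step, assuming the reserved word table of FIG. 7 simply lists words such as "image" and "video": the dictated phrases are matched against the table, matches become reserved word information, and the remaining keywords become free word information. The whitespace splitting, the table contents, and the helper name below are illustrative assumptions.

    # Hypothetical reserved word table ("image" and "video" appear as
    # examples in the text; FIG. 7 shows the actual table).
    RESERVED_WORDS = {"image", "video"}
    NON_KEYWORDS = {"of", "the", "for"}  # stand-ins for particles and the like

    def select_words(character_string: str):
        """Select reserved word information and free word information
        from the dictated character string information."""
        reserved, free = [], []
        for phrase in character_string.lower().split():
            if phrase in NON_KEYWORDS:
                continue  # non-keywords are not selected
            (reserved if phrase in RESERVED_WORDS else free).append(phrase)
        return reserved, free

    # "search" is handled as command information by the voice processing
    # unit, so for the utterance "Search ABC images" the remaining string
    # yields one reserved word and one free word:
    print(select_words("ABC image"))  # -> (['image'], ['abc'])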
The word storage processing unit 105 stores the command information, the free word information, and the reserved word information output from the intention interpretation processing unit 104 in the storage unit 170.

The command processing unit 106 is an example of the "processing unit". The command processing unit 106 is configured to execute processing based on the command information, the reserved word information, and the free word information. The command processing unit 106 performs the command processing corresponding to the command information interpreted by the intention interpretation processing unit 104. The command processing unit 106 also performs the command processing corresponding to user operations received by the operation reception unit 110.

Furthermore, the command processing unit 106 may perform new command processing based on one or two of the command information, the free word information, and the reserved word information stored in the storage unit 170 by the word storage processing unit 105. That is, if one or two of the command information, the reserved word information, and the free word information are missing, the command processing unit 106 is configured to complement the missing information with the information stored in the storage unit 170 and then execute the command processing. The details will be described later.
The search processing unit 107 is an example of the "processing unit". If the command information is a search command, the search processing unit 107 is configured to execute a search process based on the reserved word information and the free word information. If the command information corresponds to a search command associated with a preset application, the search processing unit 107 performs the search based on the free word information and the reserved word information with that application.

For example, if the command information is a search command associated with an Internet search application, which is one of the preset applications, the search processing unit 107 performs the search based on the free word information and the reserved word information with that Internet search application.

Alternatively, if the command information is a search command associated with a program guide application, which is one of the preset applications, the search processing unit 107 performs the search based on the free word information and the reserved word information with that program guide application.

If the command information is not a search command associated with a preset application, the search processing unit 107 performs the search based on the free word information and the reserved word information with all applications capable of performing such a search (searchable applications).
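
Under assumed command identifiers and application stubs (none of the names below come from the disclosure), this routing can be sketched as follows; the fall-through branch corresponds to searching with all searchable applications.

    # Hypothetical stand-ins for the preset applications.
    def internet_search(free, reserved):
        return ["internet result for %s '%s'" % (reserved, free)]

    def program_guide_search(free, reserved):
        return ["program guide result for %s '%s'" % (reserved, free)]

    SEARCHABLE_APPS = [internet_search, program_guide_search]

    def dispatch_search(command, reserved, free):
        """Route a search command to its associated application, or to all
        searchable applications when none is associated."""
        if command == "search_internet":
            return internet_search(free, reserved)
        if command == "search_program_guide":
            return program_guide_search(free, reserved)
        results = []
        for app_search in SEARCHABLE_APPS:  # search in every searchable app
            results.extend(app_search(free, reserved))
        return results

    # A plain "search" is not tied to a preset application, so every
    # searchable application is queried and the results are merged.
    print(dispatch_search("search", "image", "ABC"))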
Note that if one or two of the reserved word information and the free word information are missing, the search processing unit 107 is configured to complement the missing information with the information stored in the storage unit 170 and execute the search process. If the missing information is the command information and the immediately preceding command processing was a search process in the search processing unit 107, the search process is executed again.

The display control unit 108 displays the result of the search in the search processing unit 107 on the display unit 140. For example, the display control unit 108 displays on the display unit 140 the result of a keyword search in the Internet search application, in the program guide application, or in the searchable applications.

The operation reception unit 110 receives, via the wireless communication unit 180, the input signal generated by an input operation performed by the user 700 on the input unit 22 of the remote controller 20, or the input signal generated by an input operation performed by the user 700 on the input unit 32 of the portable terminal 30, from the remote controller 20 or the portable terminal 30. In this way, the operation reception unit 110 receives the operations performed by the user 700 (user operations).
[1-2. Operation]

Next, the operation of the speech recognition processing device 100 of the television 10 in the present embodiment will be described.

First, the method of starting the speech recognition processing by the speech recognition processing device 100 of the television 10 will be described. There are mainly the following two methods for starting the speech recognition processing.
The first starting method is as follows. To start the speech recognition processing, the user 700 presses a microphone button (not shown), which is one of the input units 22 provided on the remote controller 20. When the user 700 presses the microphone button of the remote controller 20, the operation reception unit 110 of the television 10 receives the fact that the microphone button has been pressed. The television 10 then changes the volume of the speaker (not shown) of the television 10 to a preset volume, which is low enough not to hinder voice recognition through the microphone 21. When the speaker volume reaches the preset volume, the speech recognition processing device 100 starts the speech recognition processing. At this time, if the speaker volume is already at or below the preset volume, the television 10 leaves the volume as it is, since the above volume adjustment is unnecessary.

In this method, a portable terminal 30 (for example, a smartphone with a touch panel) can be used instead of the remote controller 20. In that case, the user 700 launches the software provided on the portable terminal 30 (the software for operating the television 10 by voice) and presses the microphone button displayed on the touch panel while the software is running. This user action corresponds to the user action of pressing the microphone button of the remote controller 20. Thereby, the speech recognition processing device 100 starts the speech recognition processing.

The second starting method is as follows. The user 700 utters, toward the built-in microphone 130 of the television 10, a voice representing a preset command for starting the speech recognition processing (a start command), for example, "start voice operation". When the speech recognition processing device 100 recognizes that the voice collected by the built-in microphone 130 is the preset start command, the television 10 changes the speaker volume to the preset volume in the same manner as described above, and the speech recognition processing by the speech recognition processing device 100 is started.

Note that a combination of the above methods may also be used as the method for starting the speech recognition processing.

Note that these controls in the television 10 are performed by a control unit (not shown) that controls each block of the television 10.
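
As a rough sketch of the two start sequences described above, with hypothetical names throughout (the Television class, the volume constant, and the start command string are not from the disclosure):

    # Minimal sketch of the start sequence; names are hypothetical.
    PRESET_LOW_VOLUME = 5  # low enough not to hinder voice recognition

    class Television:
        def __init__(self, speaker_volume):
            self.speaker_volume = speaker_volume
            self.recognition_active = False

        def on_mic_button_pressed(self):
            """First method: the microphone button on the remote controller."""
            self._start_recognition()

        def on_voice(self, utterance):
            """Second method: a preset start command spoken to the built-in microphone."""
            if utterance == "start voice operation":  # illustrative start command
                self._start_recognition()

        def _start_recognition(self):
            # Lower the speaker volume only if it is above the preset level.
            if self.speaker_volume > PRESET_LOW_VOLUME:
                self.speaker_volume = PRESET_LOW_VOLUME
            self.recognition_active = True  # icon 201 and indicator 202 are then shown

    tv = Television(speaker_volume=20)
    tv.on_voice("start voice operation")
    assert tv.speaker_volume == PRESET_LOW_VOLUME and tv.recognition_active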
When the speech recognition processing by the speech recognition processing device 100 is started, the display control unit 108 displays, on the image display surface of the display unit 140, the voice recognition icon 201, which indicates that the speech recognition processing has started and that voice operation by the user 700 has become possible, and the indicator 202, which indicates the volume of the collected sound, in order to prompt the user 700 to speak.

Note that the display control unit 108 may display on the display unit 140, instead of the voice recognition icon 201, a message indicating that the speech recognition processing has started. Alternatively, a message indicating that the speech recognition processing has started may be output by voice from the speaker.

Note that the voice recognition icon 201 and the indicator 202 are not limited to the designs shown in FIG. 1. Any design may be used as long as the intended effect is obtained.

Next, the speech recognition processing performed by the speech recognition processing device 100 of the television 10 will be described.

In the present embodiment, the speech recognition processing device 100 performs two types of speech recognition processing. One is speech recognition processing for recognizing the voice corresponding to preset commands (command recognition processing). The other is speech recognition processing for recognizing keywords other than the preset commands (keyword recognition processing).

As described above, the command recognition processing is performed by the voice processing unit 102. The voice processing unit 102 compares the voice information based on the voice uttered by the user 700 to the television 10 with the "voice-command" correspondence information stored in advance in the storage unit 170. If the voice information contains a command registered in the "voice-command" correspondence information, that command is identified. Note that various commands for operating the television 10 are registered in the "voice-command" correspondence information; for example, operation commands for free word search are also registered.

As described above, the keyword recognition processing is performed using the voice recognition unit 50 connected to the television 10 via the network 40. The voice recognition unit 50 acquires the voice information from the television 10 via the network 40. The voice recognition unit 50 then divides the acquired voice information into phrases and separates them into keywords and non-keywords (for example, particles). In this way, the voice recognition unit 50 performs the dictation. When performing the dictation, the voice recognition unit 50 uses a database in which voice information is associated with character strings (including single characters). The voice recognition unit 50 separates the acquired voice information into keywords and non-keywords by comparing it with this database, and converts each of them into a character string.

Note that in the present embodiment, the voice recognition unit 50 is configured to receive from the television 10 all of the voice (voice information) acquired by the voice acquisition unit 101, perform the dictation on all of that voice information, and transmit all of the resulting character string information to the television 10. However, the voice processing unit 102 of the television 10 may instead be configured to transmit to the voice recognition unit 50 only the voice information other than the commands recognized through the "voice-command" correspondence information.
Next, the keyword recognition processing will be described with reference to FIG. 3.

FIG. 3 is a diagram showing an outline of the dictation performed by the speech recognition processing system 11 according to the first embodiment.

FIG. 3 shows a state in which a web browser is displayed on the display unit 140 of the television 10. For example, when the user 700 performs a search by keyword (keyword search) with the Internet search application of the web browser and the speech recognition processing is started in the speech recognition processing device 100, the image shown as an example in FIG. 3 is displayed on the display unit 140.

The input field 203 is an area for entering keywords used for the search on the web browser. When the cursor is displayed in the input field 203, the user 700 can enter a keyword into the input field 203.

In this state, when the user 700 speaks toward the remote controller 20, the portable terminal 30, or the built-in microphone 130 of the television 10, the voice signal of that voice is input to the voice acquisition unit 101 and converted into voice information. The voice information is then transmitted from the television 10 to the voice recognition unit 50 via the network 40. For example, if the user 700 says "ABC", voice information based on that voice is transmitted from the television 10 to the voice recognition unit 50.

The voice recognition unit 50 converts the voice information received from the television 10 into a character string by comparing it with the database. The voice recognition unit 50 then transmits the information of that character string (character string information) to the television 10 via the network 40 as the result of the voice recognition of the received voice information. If the received voice information corresponds to the voice "ABC", the voice recognition unit 50 compares the voice information with the database, converts it into the character string "ABC", and transmits that character string information to the television 10.

When the television 10 receives the character string information from the voice recognition unit 50, it operates the recognition result acquisition unit 103, the intention interpretation processing unit 104, the command processing unit 106, and the display control unit 108 based on that character string information, and displays the character string corresponding to the character string information in the input field 203. For example, when the television 10 receives the character string information corresponding to the character string "ABC" from the voice recognition unit 50, it displays the character string "ABC" in the input field 203.

The web browser displayed on the display unit 140 of the television 10 then performs a keyword search with the character string displayed in the input field 203.
 Next, the keyword single search process and the keyword associative search process performed by the voice recognition processing device 100 of the present embodiment will be described with reference to FIGS. 4 to 7.
 FIG. 4 is a flowchart showing an operation example of the keyword single search process performed by the voice recognition processing device 100 according to the first embodiment.
 FIG. 5 is a flowchart showing an operation example of the keyword associative search process performed by the voice recognition processing device 100 according to the first embodiment.
 FIG. 6 is a flowchart showing an operation example of the voice recognition interpretation process performed by the voice recognition processing device 100 according to the first embodiment. The flowchart of FIG. 6 details the voice recognition interpretation step of each search process shown in FIGS. 4 and 5.
 FIG. 7 is a diagram schematically showing an example of the reserved word table of the voice recognition processing device 100 according to the first embodiment.
 In the voice recognition processing device 100 of the present embodiment, the voice recognition interpretation process of the keyword single search process shown in FIG. 4 (step S101) and the voice recognition interpretation process of the keyword associative search process shown in FIG. 5 (step S201) are substantially the same. This voice recognition interpretation process is described first, with reference to FIG. 6.
 As described above, in the television 10, voice recognition processing by the voice recognition processing device 100 is started when the user 700, for example, presses the microphone button of the remote controller 20.
 When the user 700 speaks in this state, the voice of the user 700 is converted into a voice signal by the built-in microphone 130, the microphone 21 of the remote controller 20, or the microphone 31 of the mobile terminal 30, and that voice signal is input to the voice acquisition unit 101. The voice acquisition unit 101 thus acquires the voice signal of the user 700 (step S301).
 The voice acquisition unit 101 converts the acquired voice signal of the user 700 into voice information that can be used in the various downstream processes. If the user 700 says, for example, "Search for images of ABC", the voice acquisition unit 101 outputs voice information based on that voice.
 The voice processing unit 102 compares the voice information output from the voice acquisition unit 101 with the "voice-command" correspondence information stored in advance in the storage unit 170, and checks whether the voice information output from the voice acquisition unit 101 matches any command registered in the "voice-command" correspondence information (step S302).
 For example, if the voice information output from the voice acquisition unit 101 contains voice information based on the word "search" uttered by the user 700, and "search" is registered as command information in the "voice-command" correspondence information, the voice processing unit 102 determines that the voice information contains the "search" command.
 Commands necessary for operating the television 10, the application displayed on the display unit 140, and so on are registered in the "voice-command" correspondence information. This command information includes, for example, command information corresponding to voice information such as "search", "channel up", "volume up", "play", "stop", "word conversion", and "character display".
 The "voice-command" correspondence information can be updated by adding or deleting command information. For example, the user 700 can add new command information to the "voice-command" correspondence information, or new command information can be added via the network 40. This allows the voice recognition processing device 100 to perform voice recognition processing based on the latest "voice-command" correspondence information.
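 By way of illustration only, the command matching of step S302 can be pictured as a simple table lookup. The following Python sketch is not part of the original disclosure; the table contents and the function names are assumptions made for illustration:

    # Illustrative sketch only: the "voice-command" correspondence
    # information modeled as a phrase-to-command table. The contents and
    # names are assumptions, not the actual data structure of the patent.
    VOICE_COMMAND_TABLE = {
        "search": "SEARCH",
        "channel up": "CHANNEL_UP",
        "volume up": "VOLUME_UP",
        "play": "PLAY",
        "stop": "STOP",
    }

    def match_command(voice_text: str) -> str | None:
        """Step S302: return the command registered for a phrase found in the utterance."""
        for phrase, command in VOICE_COMMAND_TABLE.items():
            if phrase in voice_text:
                return command
        return None  # no registered command contained in the utterance

    def register_command(phrase: str, command: str) -> None:
        """The table can be updated by the user or via the network 40."""
        VOICE_COMMAND_TABLE[phrase] = command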
 Also in step S302, the voice processing unit 102 transmits the voice information output from the voice acquisition unit 101 from the transmission/reception unit 150 to the voice recognition unit 50 via the network 40.
 The voice recognition unit 50 converts the received voice information into character strings segmented into keywords and non-keywords (for example, particles). To do so, the voice recognition unit 50 performs dictation based on the received voice information.
 The voice recognition unit 50 compares the received voice information with a database that associates keywords with character strings. If a keyword registered in the database is contained in the received voice information, the character string (including words) corresponding to that keyword is selected. In this way, the voice recognition unit 50 performs dictation and converts the received voice information into character strings. For example, if the voice recognition unit 50 receives voice information based on the utterance "Search for images of ABC" spoken by the user 700, it converts that voice information by dictation into the character strings "ABC", "no" (a particle), "images", "wo" (a particle), and "search". The voice recognition unit 50 transmits character string information representing each converted character string to the television 10 via the network 40.
 This database is provided in the voice recognition unit 50, but it may be located elsewhere on the network 40. The database may also be configured so that its keyword information is updated regularly or irregularly.
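 By way of illustration only, the dictation described above can be pictured as segmentation of the recognized tokens into keyword and non-keyword strings. The following Python sketch uses a toy database; its contents and names are assumptions, not the actual database of the voice recognition unit 50:

    # Illustrative sketch only: dictation as segmentation into keyword and
    # non-keyword (particle) strings.
    KEYWORD_DB = {"ABC", "images", "videos", "search"}  # keyword database

    def dictate(tokens: list[str]) -> list[tuple[str, bool]]:
        """Return (string, is_keyword) pairs for the recognized tokens."""
        return [(t, t in KEYWORD_DB) for t in tokens]

    # The utterance "ABC no images wo search" ("no" and "wo" are particles)
    # yields the keyword strings "ABC", "images", and "search".
    segments = dictate(["ABC", "no", "images", "wo", "search"])
    keywords = [s for s, is_keyword in segments if is_keyword]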
 The recognition result acquisition unit 103 of the television 10 acquires the command information output from the voice processing unit 102 as a result of voice recognition and the character string information transmitted from the voice recognition unit 50 as a result of voice recognition, and outputs them to the intention interpretation processing unit 104.
 Based on the command information and the character string information acquired from the recognition result acquisition unit 103, the intention interpretation processing unit 104 performs intention interpretation to identify the intention of the voice operation uttered by the user 700 (step S303).
 The intention interpretation processing unit 104 sorts the character string information for intention interpretation. The sorting categories are free words, reserved words, and commands. If any character string information duplicates the command information, the intention interpretation processing unit 104 judges it to be a command and sorts it accordingly. Reserved words are sorted out of the character string information based on the reserved word table, an example of which is shown in FIG. 7. Free words are then sorted out by removing character strings that do not correspond to keywords, such as particles, from the remaining character string information.
 For example, when the intention interpretation processing unit 104 acquires the character string information "ABC", "no", "images", "wo", and "search" together with command information representing "search", it sorts "ABC" as a free word, "images" as a reserved word, and "search" as a command. Through this intention interpretation by the intention interpretation processing unit 104, the voice recognition processing device 100 can operate based on the intention of the user 700 (the intention of the voice operation uttered by the user 700). For example, the voice recognition processing device 100 can execute the command "search", using the free word "ABC", for the reserved word "images".
 The intention interpretation processing unit 104 compares the character string information with the reserved word table, an example of which is shown in FIG. 7, and if the character string information contains a term registered in the reserved word table, it sorts that term out of the character string information as a reserved word. The reserved words are predetermined terms such as "images", "videos", "programs", and "Web", as exemplified in FIG. 7. However, the reserved words are in no way limited to these terms.
 The intention interpretation processing unit 104 may also perform intention interpretation using character strings such as particles contained in the character string information.
 In this way, the intention interpretation processing unit 104 executes the voice recognition interpretation process (step S101 in FIG. 4 and step S201 in FIG. 5).
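 By way of illustration only, the sorting performed in the intention interpretation may be pictured as in the following Python sketch; the table contents (cf. FIG. 7) and the return structure are assumptions made for illustration:

    # Illustrative sketch only: sorting character string information into a
    # command, a reserved word, and a free word (step S303).
    RESERVED_WORDS = {"images", "videos", "programs", "Web"}  # cf. FIG. 7
    COMMANDS = {"search", "play", "stop"}
    PARTICLES = {"no", "wo"}

    def interpret(strings: list[str], command_info: str | None) -> dict:
        result = {"command": command_info, "reserved": None, "free": None}
        for s in strings:
            if s in COMMANDS:              # duplicates the command information
                result["command"] = s
            elif s in RESERVED_WORDS:      # hit in the reserved word table
                result["reserved"] = s
            elif s not in PARTICLES:       # remaining keyword is a free word
                result["free"] = s
        return result

    # interpret(["ABC", "no", "images", "wo", "search"], "search")
    # -> {"command": "search", "reserved": "images", "free": "ABC"}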
 Next, the keyword single search process of the present embodiment is described with reference to FIG. 4.
 The intention interpretation processing unit 104 executes the voice recognition interpretation process shown in FIG. 6 based on the voice uttered by the user 700 (step S101). As this duplicates the description above, a detailed description of step S101 is omitted.
 Based on the result of step S101, the intention interpretation processing unit 104 determines whether the character string information contains reserved word information (step S102).
 When it is determined in step S102 that no reserved word information is contained (No), the process proceeds to step S104.
 When it is determined in step S102 that reserved word information is contained (Yes), the word storage processing unit 105 stores that reserved word information in the storage unit 170 (step S103). In the example above, the reserved word information "images" is stored in the storage unit 170.
 Based on the result of step S101, the voice recognition processing device 100 determines whether the character string information contains free word information (step S104).
 When it is determined in step S104 that no free word information is contained (No), the process proceeds to step S106.
 When it is determined in step S104 that free word information is contained (Yes), the word storage processing unit 105 stores that free word information in the storage unit 170 (step S105). In the example above, the free word information "ABC" is stored in the storage unit 170.
 The word storage processing unit 105 also stores the command information in the storage unit 170.
 The command processing unit 106 executes command processing based on the free word information, the reserved word information, and the command information (step S106).
 When the command processing unit 106 receives command information from the intention interpretation processing unit 104 and free word information and/or reserved word information from the word storage processing unit 105, it executes the instruction (command) based on the command information on the free word information, the reserved word information, or both. The command processing unit 106 may instead receive the free word information and the reserved word information from the intention interpretation processing unit 104, and may receive the command information from the word storage processing unit 105.
 The command processing unit 106 mainly performs command processing other than search, for example changing the channel or the volume of the television 10.
 If the command information contains "search", the search processing unit 107 executes search processing (step S107). In the example above, the search processing unit 107 sets the search target content to "images" based on the reserved word information "images" and performs an image search with the free word information "ABC".
 The search result of step S107 is displayed on the display unit 140 by the display control unit 108. This completes the keyword single search process.
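 By way of illustration only, the flow of FIG. 4 can be condensed into the following Python sketch, which reuses the interpret() sketch above and uses a dict in place of the storage unit 170; the helper names are assumptions made for illustration:

    # Illustrative sketch only: the keyword single search flow (steps S101-S107).
    storage: dict[str, str] = {}  # stands in for the storage unit 170

    def run_search(target: str | None, keyword: str | None) -> None:
        print(f"searching {target} for {keyword}")  # stub for the search processing unit 107

    def run_command(command: str | None) -> None:
        print(f"executing {command}")               # stub for the command processing unit 106

    def single_search(strings: list[str], command_info: str | None) -> None:
        intent = interpret(strings, command_info)    # step S101
        if intent["reserved"] is not None:           # steps S102-S103
            storage["reserved"] = intent["reserved"]
        if intent["free"] is not None:               # steps S104-S105
            storage["free"] = intent["free"]
        if intent["command"] is not None:
            storage["command"] = intent["command"]
        if intent["command"] == "search":            # step S107
            run_search(storage.get("reserved"), storage.get("free"))
        else:                                        # step S106
            run_command(intent["command"])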
 Next, the keyword associative search process of the present embodiment is described with reference to FIG. 5.
 The keyword associative search process is a process in which, when the user 700 executes search processes in succession, a new search is executed based on both the previously entered content and the newly entered content, without the content entered for the previous search having to be entered again. In the present embodiment, an example is described in which the input operation is performed by the voice uttered by the user 700, but the input operation may instead use the input unit 22 of the remote controller 20 (for example, a touch pad) or the input unit 32 of the mobile terminal 30 (for example, a touch panel).
 The keyword associative search process is described below with a specific example. Assume that the user 700 first says "Search for images of ABC" and a search for "images" with the free word "ABC" has already been performed.
 Next, the user 700 performs a new search for "videos" with the same free word "ABC" that was used in the immediately preceding image search. In this case, in the present embodiment, the user 700 can omit uttering the free word "ABC", which duplicates the previous search. That is, the user 700 only has to say "Search for videos".
 The intention interpretation processing unit 104 executes the voice recognition interpretation process shown in FIG. 6 based on the voice uttered by the user 700 (step S201). As this duplicates the description above, a detailed description of step S201 is omitted.
 The voice information based on the voice uttered by the user (for example, "Search for videos") is transmitted from the voice recognition processing device 100 to the voice recognition unit 50 via the network 40. The voice recognition unit 50 returns character string information based on the received voice information. This character string information contains reserved word information (for example, "videos") and command information (for example, "search"), but no free word information. The returned character string information is received by the recognition result acquisition unit 103 and output to the intention interpretation processing unit 104.
 In this operation example, the voice processing unit 102 of the voice recognition processing device 100 determines that the voice information based on the voice uttered by the user 700 contains the command "search", and outputs the command information corresponding to the command "search" to the recognition result acquisition unit 103. The recognition result acquisition unit 103 also receives the character string information containing the character string "videos" from the voice recognition unit 50. The intention interpretation processing unit 104 then judges "videos", contained in the character string information acquired from the recognition result acquisition unit 103, to be a reserved word. Since the character string information contains no free word information, no free word information is output from the intention interpretation processing unit 104.
 Based on the result of step S201, the intention interpretation processing unit 104 determines whether the character string information contains reserved word information (step S202).
 When it is determined in step S202 that no reserved word information is contained (No), the process proceeds to step S205. The operations from step S205 onward are described later.
 When it is determined in step S202 that reserved word information is contained (Yes), the word storage processing unit 105 stores that reserved word information (for example, "videos") in the storage unit 170 as the new search target content (step S203).
 By storing the new reserved word information in the storage unit 170, the reserved word information is updated. In the example above, the previous reserved word information "images" is replaced with the new reserved word information "videos" (step S204).
 In this operation example, since no free word information is output from the intention interpretation processing unit 104, the word storage processing unit 105 reads the free word information stored in the storage unit 170 (for example, "ABC") and outputs it to the command processing unit 106. The command processing unit 106 receives the command information from the intention interpretation processing unit 104, and receives the read free word information and the new reserved word information from the word storage processing unit 105. It then performs command processing corresponding to the command information on the read free word information and the new reserved word information (step S208). As described above, the command processing unit 106 mainly performs command processing other than search.
 If the command information contains "search", the search processing unit 107 executes search processing (step S209). In the example above, the search processing unit 107 sets the search target content to "videos" based on the new reserved word information "videos" and performs a video search with the free word information "ABC" read from the storage unit 170.
 The search result of step S209 is displayed on the display unit 140 by the display control unit 108. This completes the keyword associative search process.
 Next, the keyword associative search process when it is determined in step S202 that no reserved word information is contained (No) is described.
 This case is also described with a specific example. Assume again that the user 700 first says "Search for images of ABC" and a search for "images" with the free word "ABC" has already been performed.
 Next, the user 700 performs a search for "images" with a free word "XYZ" that differs from the free word used for the immediately preceding image search. In this case, in the present embodiment, the user 700 can omit uttering the reserved word "images" and the command "search", which duplicate the previous search. That is, the user 700 only has to say "XYZ".
 As they duplicate the description above, detailed descriptions of steps S201 and S202 are omitted.
 The voice information based on the voice uttered by the user (for example, "XYZ") is transmitted from the voice recognition processing device 100 to the voice recognition unit 50 via the network 40. The voice recognition unit 50 returns character string information based on the received voice information. This character string information contains free word information (for example, "XYZ"), but no reserved word information and no command information. The returned character string information is received by the recognition result acquisition unit 103 and output to the intention interpretation processing unit 104.
 Thus, in this operation example, the character string information contains no reserved word information, and no command information is output from the voice processing unit 102. Accordingly, neither reserved word information nor command information is output from the intention interpretation processing unit 104.
 As a result, it is determined in step S202 that no reserved word information is contained (No). Based on the result of step S201, the intention interpretation processing unit 104 determines whether the character string information contains free word information (step S205).
 When it is determined in step S205 that no free word information is contained (No), the process proceeds to step S208.
 When it is determined in step S205 that free word information is contained (Yes), the word storage processing unit 105 stores that free word information (for example, "XYZ") in the storage unit 170 as the new free word information (step S206).
 By storing the new free word information in the storage unit 170, the free word information is updated. In the example above, the previous free word information "ABC" is replaced with the new free word information "XYZ" (step S207).
 In this operation example, since neither reserved word information nor command information is output from the intention interpretation processing unit 104, the word storage processing unit 105 reads the reserved word information (for example, "images") and the command information (for example, "search") stored in the storage unit 170 and outputs them to the command processing unit 106. The command processing unit 106 receives the reserved word information and the command information read from the storage unit 170 by the word storage processing unit 105, together with the new free word information (for example, "XYZ"). It then performs command processing corresponding to the read command information on the read reserved word information and the new free word information (step S208).
 If the command information read from the storage unit 170 contains "search", the search processing unit 107 executes search processing (step S209). In the example above, the search processing unit 107 sets the search target content to "images" based on the reserved word information "images" read from the storage unit 170 and performs an image search with the new free word information "XYZ".
 The search result of step S209 is displayed on the display unit 140 by the display control unit 108. This completes the keyword associative search process.
 When it is determined in step S205 that no free word information is contained (No), the search processing unit 107 proceeds to step S208 and performs normal command processing or search processing.
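 By way of illustration only, the flow of FIG. 5 can be condensed as follows, reusing the helpers of the preceding sketches: whichever of the command, reserved word, and free word is missing from the new utterance is complemented from the stored information:

    # Illustrative sketch only: the keyword associative search flow (FIG. 5).
    def associative_search(strings: list[str], command_info: str | None) -> None:
        intent = interpret(strings, command_info)   # step S201
        for kind in ("reserved", "free", "command"):
            if intent[kind] is not None:
                storage[kind] = intent[kind]        # update (steps S203-S204, S206-S207)
            else:
                intent[kind] = storage.get(kind)    # complement from the storage unit 170
        if intent["command"] == "search":           # step S209
            run_search(intent["reserved"], intent["free"])
        else:                                       # step S208
            run_command(intent["command"])

    # After "Search for images of ABC":
    #   "Search for videos" reuses the stored free word "ABC";
    #   "XYZ" alone reuses the stored reserved word "images" and command "search".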
 [1-3. Effects and the like]
 As described above, in the present embodiment, the voice recognition processing device 100 includes the voice acquisition unit 101; the voice processing unit 102, which is an example of a first voice recognition unit; the voice recognition unit 50, which is an example of a second voice recognition unit; the intention interpretation processing unit 104, which is an example of a sorting unit; the storage unit 170; and the command processing unit 106 and the search processing unit 107, which are examples of a processing unit. The voice acquisition unit 101 is configured to acquire the voice uttered by the user and output voice information. The voice processing unit 102 is configured to convert the voice information into command information, which is an example of first information. The voice recognition unit 50 is configured to convert the voice information into character string information, which is an example of second information. The intention interpretation processing unit 104 is configured to sort reserved word information, which is an example of third information, and free word information, which is an example of fourth information, from the character string information. The storage unit 170 is configured to store the command information, the reserved word information, and the free word information. The command processing unit 106 is configured to execute processing based on the command information, the reserved word information, and the free word information. And if one or two of the command information, the reserved word information, and the free word information are missing, the command processing unit 106 and the search processing unit 107 are configured to complement the missing information with information stored in the storage unit 170 and execute the processing.
 When the first information is a search command, the search processing unit 107 is configured to execute search processing based on the search command, the reserved word information, and the free word information.
 The voice recognition unit 50 may be installed on the network 40, and the voice recognition processing device 100 may include the transmission/reception unit 150 configured to communicate with the voice recognition unit 50 via the network 40.
 The voice processing unit 102 may be configured to convert the voice information into command information using preset "voice-command" correspondence information that associates a plurality of pieces of command information with voice information.
 When performing voice operations in succession, the user 700 of the voice recognition processing device 100 configured in this way can perform a new operation based on both the content of the previous utterance and the content of the new utterance, without uttering again the content uttered in the previous voice operation. For example, when performing search processes in succession, the user 700 can perform a new search based on the content of the previous utterance and the content of the new utterance, without uttering again the content entered by voice operation in the previous search.
 As a specific example, when the user 700 says "Search for images of ABC" to search "images" with the free word "ABC" and then continues with a search corresponding to "Search for videos of ABC", the user 700 can omit the free word "ABC", which duplicates the previous search, and simply say "Search for videos". This executes the same search process as if "Search for videos of ABC" had been uttered.
 Alternatively, when the user 700 says "Search for images of ABC" to search "images" with the free word "ABC" and then continues with a search corresponding to "Search for images of XYZ", the user 700 can omit the reserved word "images" and the command "search", which duplicate the previous search, and simply say "XYZ". This executes the same search process as if "Search for images of XYZ" had been uttered.
 In this way, the voice recognition processing device 100 of the present embodiment reduces the complexity of voice operation by the user 700 and improves operability.
 (Other embodiments)
 As described above, the first embodiment has been described as an illustration of the technique disclosed in the present application. However, the technique of the present disclosure is not limited to this and can also be applied to embodiments in which changes, replacements, additions, omissions, and the like have been made. It is also possible to combine the components described in the first embodiment above to form a new embodiment.
 Other embodiments are therefore illustrated below.
 The first embodiment described an operation example in which the command information is "search"; examples of other commands are described here. Command information corresponding to voice information such as "channel up", "volume up", "play", "stop", "language change", and "character display" may, for example, be registered in the "voice-command" correspondence information.
 For example, suppose the user utters "Play the optical disc". In that case, the voice recognition processing device 100 recognizes the free word "optical disc" and the command information "play", and an optical disc playback device equipped with the voice recognition processing device 100 plays the video recorded on the optical disc. When the user 700 subsequently says "Stop", the command information "stop" is recognized by the voice recognition processing device 100, and the optical disc playback device stops playback of the optical disc. This is because the free word "optical disc" has been stored in the storage unit 170 by the word storage processing unit 105, so the command processing unit 106 executes the processing of the newly input command information "stop" on the free word "optical disc" read from the storage unit 170. That is, the user 700 can control the operation of the optical disc playback device by simply saying "Stop", without saying "Stop the optical disc".
 In another example, suppose the user 700 utters "Japanese character display". In that case, the voice recognition processing device 100 recognizes the free word information "Japanese" and the command information "character display", and the television 10 equipped with the voice recognition processing device 100 executes the command "character display", which displays Japanese subtitles on the display unit 140 of the television 10. When the user 700 subsequently says "English", the free word information "English" is recognized by the voice recognition processing device 100. The television 10 then reads the command information "character display" from the storage unit 170, continues the "character display" operation as it is, and changes the characters displayed on the display unit 140 from "Japanese" to "English". That is, the user 700 can change the display characters of the television 10 from "Japanese" to "English" by simply saying "English", without saying "English character display".
 In this way, if any information is missing from the voice information, the voice recognition processing device 100 reads it from the storage unit 170 to complement it and executes the command processing. The user 700 therefore does not have to repeatedly utter words that duplicate the previous voice operation, which reduces the complexity of voice operation and improves operability.
 In the two examples given here, the utterance of the user 700 contains no reserved word, but the command processing unit 106 can still execute the command processing. In this way, for command information that can be executed even without a reserved word or a free word, the intention interpretation processing unit 104 notifies the word storage processing unit 105 and the command processing unit 106 (search processing unit 107) that the reserved word or free word need not be present. Based on the information sent from the intention interpretation processing unit 104, the command processing unit 106 (search processing unit 107) can therefore determine whether the command processing should be performed with the combination of free word information, reserved word information, and command information, with the combination of free word information and command information, or with the combination of reserved word information and command information, and execute the command processing accordingly. The word storage processing unit 105 is also prevented from reading unnecessary information from the storage unit 170: in the examples above, the voice information contains no reserved word information, but since no reserved word information is needed, the word storage processing unit 105 does not read reserved word information from the storage unit 170.
 Information indicating whether the command processing requires both a reserved word and a free word, or only one of them, may be registered in advance in the "voice-command" correspondence information in association with each piece of command information. The voice processing unit 102 may then operate so as to output that information to the subsequent stages together with the command information.
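 By way of illustration only, such requirement information might be registered as in the following sketch; the flag values shown are assumptions made for illustration, since the disclosure only states that such information may be registered in advance:

    # Illustrative sketch only: per-command word requirements as they might
    # be registered in the "voice-command" correspondence information.
    COMMAND_REQUIREMENTS = {
        "search": {"free", "reserved"},   # search can use both word types
        "play": {"free"},                 # "play" acts on a free word only
        "stop": {"free"},
        "character display": {"free"},
    }

    def words_to_read(command: str, intent: dict) -> set[str]:
        """Word types missing from the utterance that should be read from
        the storage unit 170; anything not required need not be read."""
        needed = COMMAND_REQUIREMENTS.get(command, set())
        return {kind for kind in needed if intent.get(kind) is None}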
 The present embodiment described operation examples of searching for "images" and "videos", but the search target is in no way limited to "images" and "videos"; program guides, recorded programs, and the like may also be search targets.
 Although not specifically mentioned in the present embodiment, when the voice uttered by the user 700 contains the command information "search" and a keyword, and the type of that "search" is a search by the Internet search application, the voice recognition processing device 100 performs a search with that keyword in the Internet search application. For example, if the user 700 says "Search the Internet for ABC", the voice recognition processing device 100 recognizes the voice "search the Internet" as a "search" by the Internet search application. The user 700 can therefore have the television 10 perform an Internet search with that keyword just by uttering that voice.
 Similarly, when the voice uttered by the user 700 contains the command information "search" and a keyword, and the type of that "search" is a search by the program guide application, the voice recognition processing device 100 performs a search with that keyword in the program guide application. For example, if the user 700 says "Search the program guide for ABC", the voice recognition processing device 100 recognizes the voice "search the program guide" as a "search" by the program guide application. The user 700 can therefore have the television 10 perform a program guide search with that keyword just by uttering that voice.
 When the voice uttered by the user 700 contains the command information "search" and a free word but no reserved word information, the voice recognition processing device 100 may perform a "search" with that free word in all of the applicable applications, and display the search results from all of the searched applications on the display unit 140.
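 By way of illustration only, the selection of the application that performs the "search" might be pictured as follows; the phrases and application names are assumptions made for illustration:

    # Illustrative sketch only: mapping phrases in the utterance to the
    # application in which the "search" should run.
    SEARCH_APPS = {
        "the Internet": "internet_search_app",
        "the program guide": "program_guide_app",
    }

    def pick_search_apps(utterance: str) -> list[str]:
        """Applications in which the "search" command should be executed."""
        apps = [app for phrase, app in SEARCH_APPS.items() if phrase in utterance]
        return apps or list(SEARCH_APPS.values())  # no app named: search them all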
 In the television 10, voice recognition processing can be started by the method described above. Therefore, once voice recognition processing is started, the user 700 can perform the searches described above even while watching a program on the television 10.
 Although the present embodiment described an example in which the voice recognition unit 50 is arranged on the network 40, the voice recognition unit 50 may instead be provided in the voice recognition processing device 100.
 The present embodiment described an operation example of reading free word information from the storage unit 170 to complement command processing and an operation example of reading reserved word information and command information from the storage unit 170 to complement command processing, but the present disclosure is in no way limited to these configurations. For example, reserved word information alone may be read from the storage unit 170 to complement command processing, or command information alone may be read from the storage unit 170 to complement command processing. Alternatively, reserved word information and free word information may be read from the storage unit 170 to complement command processing, or free word information and command information may be read from the storage unit 170 to complement command processing.
 Each block shown in FIG. 2 may be configured as an independent circuit block, or may be configured so that a processor executes software programmed to realize the operation of each block.
 The present disclosure is applicable to devices that execute processing operations instructed by a user. Specifically, the present disclosure is applicable to mobile terminal devices, television receivers, personal computers, set-top boxes, video recorders, game machines, smartphones, tablet terminals, and the like.
DESCRIPTION OF SYMBOLS
10 television receiver
11 voice recognition processing system
20 remote controller
21, 31 microphone
22, 32 input unit
30 mobile terminal
40 network
50 voice recognition unit
100 voice recognition processing device
101 voice acquisition unit
102 voice processing unit
103 recognition result acquisition unit
104 intention interpretation processing unit
105 word storage processing unit
106 command processing unit
107 search processing unit
108 display control unit
110 operation reception unit
130 built-in microphone
140 display unit
150 transmission/reception unit
160 tuner
170, 171 storage unit
180 wireless communication unit
201 voice recognition icon
202 indicator
700 user

Claims (6)

  1. A voice recognition processing device comprising:
    a voice acquisition unit configured to acquire a voice uttered by a user and output voice information;
    a first voice recognition unit configured to convert the voice information into first information;
    a second voice recognition unit configured to convert the voice information into second information;
    a sorting unit configured to sort third information and fourth information from the second information;
    a storage unit configured to store the first information, the third information, and the fourth information; and
    a processing unit configured to execute processing based on the first information, the third information, and the fourth information,
    wherein, if one or two pieces of information among the first information, the third information, and the fourth information are missing, the processing unit is configured to complement the missing information with information stored in the storage unit and execute the processing.
  2. The voice recognition processing device according to claim 1,
    wherein, when the first information is a search command, the processing unit is configured to execute a search process based on the search command.
  3. The voice recognition processing device according to claim 1,
    wherein the second voice recognition unit is installed on a network, and
    the voice recognition processing device comprises a transmission/reception unit configured to communicate with the second voice recognition unit via the network.
  4. The voice recognition processing device according to claim 1,
    wherein the first voice recognition unit is configured to convert the voice information into the first information using preset information that associates a plurality of pieces of first information with voice information.
  5. A voice recognition processing method comprising:
    acquiring a voice uttered by a user and converting the voice into voice information;
    converting the voice information into first information;
    converting the voice information into second information;
    sorting third information and fourth information from the second information;
    storing the first information, the third information, and the fourth information in a storage unit;
    executing processing based on the first information, the third information, and the fourth information; and
    complementing, if one or two pieces of information among the first information, the third information, and the fourth information are missing, the missing information with information stored in the storage unit.
  6. A display device comprising:
    a voice acquisition unit configured to acquire a voice uttered by a user and output voice information;
    a first voice recognition unit configured to convert the voice information into first information;
    a second voice recognition unit configured to convert the voice information into second information;
    a sorting unit configured to sort third information and fourth information from the second information;
    a storage unit configured to store the first information, the third information, and the fourth information;
    a processing unit configured to execute processing based on the first information, the third information, and the fourth information; and
    a display unit configured to display a processing result of the processing unit,
    wherein, if one or two pieces of information among the first information, the third information, and the fourth information are missing, the processing unit is configured to complement the missing information with information stored in the storage unit and execute the processing.
PCT/JP2014/006367 2013-12-26 2014-12-22 Voice recognition processing device, voice recognition processing method, and display device WO2015098079A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP14874773.6A EP3089157B1 (en) 2013-12-26 2014-12-22 Voice recognition processing device, voice recognition processing method, and display device
JP2015554558A JP6244560B2 (en) 2013-12-26 2014-12-22 Speech recognition processing device, speech recognition processing method, and display device
US15/023,385 US9905225B2 (en) 2013-12-26 2014-12-22 Voice recognition processing device, voice recognition processing method, and display device
CN201480057905.0A CN105659318B (en) 2013-12-26 2014-12-22 Voice recognition processing unit, voice recognition processing method and display device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2013-268669 2013-12-26
JP2013268669 2013-12-26

Publications (1)

Publication Number Publication Date
WO2015098079A1 true WO2015098079A1 (en) 2015-07-02

Family

ID=53477977

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/006367 WO2015098079A1 (en) 2013-12-26 2014-12-22 Voice recognition processing device, voice recognition processing method, and display device

Country Status (5)

Country Link
US (1) US9905225B2 (en)
EP (1) EP3089157B1 (en)
JP (1) JP6244560B2 (en)
CN (1) CN105659318B (en)
WO (1) WO2015098079A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147784A (en) * 2018-09-10 2019-01-04 百度在线网络技术(北京)有限公司 Voice interactive method, equipment and storage medium
JP2019049985A (en) * 2016-03-04 2019-03-28 株式会社リコー Voice control of interactive whiteboard appliance
JP2022009571A (en) * 2017-06-13 2022-01-14 グーグル エルエルシー Establishment of audio-based network session with unregistered resource

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014103568A1 (en) * 2012-12-28 2014-07-03 ソニー株式会社 Information processing device, information processing method and program
KR20160090584A (en) * 2015-01-22 2016-08-01 엘지전자 주식회사 Display device and method for controlling the same
US9898250B1 (en) * 2016-02-12 2018-02-20 Amazon Technologies, Inc. Controlling distributed audio outputs to enable voice output
US9858927B2 (en) * 2016-02-12 2018-01-02 Amazon Technologies, Inc Processing spoken commands to control distributed audio outputs
US10409552B1 (en) * 2016-09-19 2019-09-10 Amazon Technologies, Inc. Speech-based audio indicators
JP7044633B2 (en) * 2017-12-28 2022-03-30 シャープ株式会社 Operation support device, operation support system, and operation support method
JP7227093B2 (en) * 2019-07-05 2023-02-21 Tvs Regza株式会社 How to select electronic devices, programs and search services
US10972802B1 (en) * 2019-09-26 2021-04-06 Dish Network L.L.C. Methods and systems for implementing an elastic cloud based voice search using a third-party search provider

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001249685A (en) * 2000-03-03 2001-09-14 Alpine Electronics Inc Speech dialog device
JP2005059185A (en) * 2003-08-19 2005-03-10 Sony Corp Robot device and method of controlling the same
JP2007226642A (en) * 2006-02-24 2007-09-06 Honda Motor Co Ltd Voice recognition equipment controller
JP4812941B2 (en) 1999-01-06 2011-11-09 Koninklijke Philips Electronics N.V. Voice input device having a period of interest
JP2012501480A (en) * 2008-08-29 2012-01-19 Multimodal Technologies, Inc. Hybrid speech recognition
JP2013205523A (en) * 2012-03-27 2013-10-07 Yahoo Japan Corp Response generation apparatus, response generation method and response generation program

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000356999A (en) 1999-06-16 2000-12-26 Ishikawajima Harima Heavy Ind Co Ltd Device and method for inputting command by voice
CN1320499C (en) * 2001-07-05 2007-06-06 Koninklijke Philips Electronics N.V. Method of providing an account information and method of and device for transcribing of dictations
FR2833103B1 (en) * 2001-12-05 2004-07-09 France Telecom NOISE SPEECH DETECTION SYSTEM
JP4849662B2 (en) * 2005-10-21 2012-01-11 Universal Entertainment Corporation Conversation control device
JP2008076811A (en) * 2006-09-22 2008-04-03 Honda Motor Co Ltd Voice recognition device, voice recognition method and voice recognition program
US20090144056A1 (en) * 2007-11-29 2009-06-04 Netta Aizenbud-Reshef Method and computer program product for generating recognition error correction information
US8099289B2 (en) * 2008-02-13 2012-01-17 Sensory, Inc. Voice interface and search for electronic devices including bluetooth headsets and remote systems
US8751229B2 (en) * 2008-11-21 2014-06-10 At&T Intellectual Property I, L.P. System and method for handling missing speech data
KR20130125067A (en) * 2012-05-08 2013-11-18 Samsung Electronics Co., Ltd. Electronic apparatus and method for controlling electronic apparatus thereof
KR101914708B1 (en) * 2012-06-15 2019-01-14 Samsung Electronics Co., Ltd. Server and method for controlling the same
CN102833633B (en) * 2012-09-04 2016-01-20 Shenzhen Skyworth-RGB Electronic Co., Ltd. Television voice control system and method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4812941B2 (en) 1999-01-06 2011-11-09 Koninklijke Philips Electronics N.V. Voice input device having a period of interest
JP2001249685A (en) * 2000-03-03 2001-09-14 Alpine Electronics Inc Speech dialog device
JP2005059185A (en) * 2003-08-19 2005-03-10 Sony Corp Robot device and method of controlling the same
JP2007226642A (en) * 2006-02-24 2007-09-06 Honda Motor Co Ltd Voice recognition equipment controller
JP2012501480A (en) * 2008-08-29 2012-01-19 Multimodal Technologies, Inc. Hybrid speech recognition
JP2013205523A (en) * 2012-03-27 2013-10-07 Yahoo Japan Corp Response generation apparatus, response generation method and response generation program

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019049985A (en) * 2016-03-04 2019-03-28 Ricoh Company, Ltd. Voice control of interactive whiteboard appliance
JP2022009571A (en) * 2017-06-13 2022-01-14 Google LLC Establishment of audio-based network session with unregistered resource
JP7339310B2 (en) 2017-06-13 2023-09-05 Google LLC Establishing audio-based network sessions with unregistered resources
CN109147784A (en) * 2018-09-10 2019-01-04 Baidu Online Network Technology (Beijing) Co., Ltd. Voice interactive method, equipment and storage medium
JP2019185062A (en) * 2018-09-10 2019-10-24 Baidu Online Network Technology (Beijing) Co., Ltd. Voice interaction method, terminal apparatus, and computer readable recording medium
CN109147784B (en) * 2018-09-10 2021-06-08 Baidu Online Network Technology (Beijing) Co., Ltd. Voice interaction method, device and storage medium
US11176938B2 (en) 2018-09-10 2021-11-16 Baidu Online Network Technology (Beijing) Co., Ltd. Method, device and storage medium for controlling game execution using voice intelligent interactive system
JP7433000B2 (en) 2018-09-10 2024-02-19 Baidu Online Network Technology (Beijing) Co., Ltd. Voice interaction methods, terminal equipment and computer readable storage media

Also Published As

Publication number Publication date
CN105659318A (en) 2016-06-08
US20160210966A1 (en) 2016-07-21
JP6244560B2 (en) 2017-12-13
US9905225B2 (en) 2018-02-27
EP3089157B1 (en) 2020-09-16
EP3089157A1 (en) 2016-11-02
CN105659318B (en) 2019-08-30
EP3089157A4 (en) 2017-01-18
JPWO2015098079A1 (en) 2017-03-23

Similar Documents

Publication Publication Date Title
JP6244560B2 (en) Speech recognition processing device, speech recognition processing method, and display device
USRE49493E1 (en) Display apparatus, electronic device, interactive system, and controlling methods thereof
JP6375521B2 (en) Voice search device, voice search method, and display device
US10586536B2 (en) Display device and operating method therefor
JP5746111B2 (en) Electronic device and control method thereof
JP5819269B2 (en) Electronic device and control method thereof
JP6111030B2 (en) Electronic device and control method thereof
JP6603754B2 (en) Information processing device
US9880808B2 (en) Display apparatus and method of controlling a display apparatus in a voice recognition system
WO2015098109A1 (en) Speech recognition processing device, speech recognition processing method and display device
JP2013037689A (en) Electronic equipment and control method thereof
KR20130018464A (en) Electronic apparatus and method for controlling electronic apparatus thereof
JP2014532933A (en) Electronic device and control method thereof
JP2014138421A (en) Video processing apparatus, control method for the same, and video processing system
KR20150089145A (en) Display apparatus for performing a voice control and method therefor
KR20140089836A (en) Interactive server, display apparatus and controlling method thereof
CN108111922B (en) Electronic device and method for updating channel map thereof
KR102089593B1 (en) Display apparatus, Method for controlling display apparatus and Method for controlling display apparatus in Voice recognition system thereof
KR20190099676A (en) The system and an apparatus for providing contents based on a user utterance
KR102124396B1 (en) Display apparatus, Method for controlling display apparatus and Method for controlling display apparatus in Voice recognition system thereof
KR102051480B1 (en) Display apparatus, Method for controlling display apparatus and Method for controlling display apparatus in Voice recognition system thereof
KR102045539B1 (en) Display apparatus, Method for controlling display apparatus and Method for controlling display apparatus in Voice recognition system thereof
JP2008096577A (en) Voice operation system for AV device
KR20190048334A (en) Electronic apparatus, voice recognition method and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14874773

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2014874773

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2014874773

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2015554558

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 15023385

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE