WO2022156246A1 - Voice command processing circuit, receiving device, server, voice command accumulation system and accumulation method - Google Patents

Voice command processing circuit, receiving device, server, voice command accumulation system and accumulation method

Info

Publication number
WO2022156246A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
server
command
voice command
local
Prior art date
Application number
PCT/CN2021/118683
Other languages
English (en)
French (fr)
Inventor
石丸大
入江祐司
Original Assignee
海信视像科技股份有限公司
东芝视频解决方案株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 海信视像科技股份有限公司, 东芝视频解决方案株式会社 filed Critical 海信视像科技股份有限公司
Priority to CN202180006240.0A priority Critical patent/CN114667566A/zh
Publication of WO2022156246A1 publication Critical patent/WO2022156246A1/zh
Priority to US18/356,485 priority patent/US20240021199A1/en

Classifications

    • G06F3/16: Sound input; sound output
    • G10L15/00: Speech recognition
    • G10L15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H04N21/432: Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N21/439: Processing of audio elementary streams

Definitions

  • the embodiments of the present application relate to a voice command processing circuit, a receiving device, a server, a voice command accumulation system, a voice command accumulation method, and a non-volatile storage medium.
  • Patent Document 1 Japanese Patent Publication No. 2015-535952
  • Patent Document 2 Japanese Patent Publication No. 2019-15952
  • the purpose of the present application is to provide a voice command processing circuit, a receiving device, a server, a system, a method, and a computer-readable non-volatile storage medium that can increase the voice commands that can be processed locally.
  • the voice command processing circuit of the embodiment of the present application performs voice recognition on voice data, outputs the recognition result, and determines whether a voice command corresponding to the recognition result exists in a database in which voice commands used to control the device are registered.
  • in the database, the information of each voice command is associated with the information of a local command, that is, the internal control command of the device that the voice command executes, and the information of the database is acquired from the server based on the judgment result of the judgment mechanism.
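The local-first lookup with server fallback described above can be sketched as follows. This is an illustration only: all function and variable names (`recognize`, `query_server`, `local_db`, `volume_up`) are assumptions, not identifiers from the present application.

```python
# Sketch of the judgment flow: the recognition result is first looked up in the
# local database; on a miss, the server is consulted and the returned command
# information is accumulated locally for future use.

def recognize(voice_data):
    # Stand-in for the voice recognition mechanism: assume it yields lowercase text.
    return voice_data.strip().lower()

def query_server(text):
    # Stand-in for the server round trip; returns a local command or None.
    server_side = {"volume up": "volume_up", "turn up the volume": "volume_up"}
    return server_side.get(text)

def handle_voice(voice_data, local_db):
    text = recognize(voice_data)
    if text in local_db:              # hit: execute locally, no network needed
        return local_db[text]
    command = query_server(text)      # miss: ask the server
    if command is not None:
        local_db[text] = command      # accumulate for next time
    return command

db = {"increase the volume": "volume_up"}
print(handle_voice("Volume Up", db))  # resolved by the server, then cached
print("volume up" in db)
```

After the first server round trip, the same phrasing is answered from the local database without contacting the server.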
  • FIG. 1 is a functional block diagram showing a configuration example of a system according to the embodiment
  • FIG. 2 is a functional block diagram showing a configuration example of a receiving apparatus according to the embodiment
  • FIG. 3 is a functional block diagram showing a configuration example of a voice command processing unit according to the embodiment.
  • FIG. 4 is a functional block diagram showing a configuration example of the server apparatus according to the embodiment.
  • FIG. 5 is a diagram showing an example of a voice command that can be processed by the voice command processing unit according to the first embodiment
  • FIG. 6 is a flowchart showing an example of the processing operation of the voice signal by the voice command processing unit according to the first embodiment
  • FIG. 7 is a diagram showing an example of a database in a local voice command database unit of the receiving device according to the first embodiment
  • FIG. 8 is a flowchart showing an example of the processing operation of creating local voice data by the voice command processing unit according to the first embodiment
  • FIG. 10 is a flowchart showing an example of the processing operation of voice data by the server device according to the first embodiment
  • FIG. 13 is a diagram showing an example of a voice command that can be processed by the voice command processing unit of the first embodiment
  • FIG. 16 is a flowchart showing an example of processing operations when the server device according to the third embodiment selects from a plurality of server commands and transmits a server command to a voice command processing unit;
  • FIG. 17 is a functional block diagram showing a configuration example of a system according to a modification.
  • FIG. 1 is a functional block diagram showing a configuration example of a system according to an embodiment of the present application.
  • the receiving device 1 is a receiving device for viewing digital content, for example a television receiver (also referred to as a TV device or TV set) that can receive and present digital broadcasts such as 2K terrestrial broadcasts and 4K/8K satellite broadcasts.
  • Digital content acquired from digital broadcasting is sometimes referred to as a broadcast program.
  • the receiving device 1 may include digital signal processing means such as a CPU, a memory, and a DSP (Digital Signal Processor), and can perform control using voice recognition technology. For example, when a user issues a command by voice, the voice is picked up by a sound-collecting function such as a microphone of the receiving device 1 (hereinafter also simply referred to as the microphone), the voice command processing unit 2 extracts the command using voice recognition technology or the like, and the extracted command is used to control various functions of the receiving device 1.
  • the receiving device 1 in the embodiment of the present application may also be controlled by a remote controller 10 (hereinafter also simply referred to as the remote controller).
  • a microphone attached to the remote controller 10 receives the user's voice, and the remote controller 10 transmits the user's voice to the receiving device 1 as voice data.
  • the receiving device 1 extracts commands from the received voice data, for example using voice recognition technology, and controls various functions of the receiving device 1.
  • the receiving device 1 in this embodiment outputs a control signal generated based on the extracted command to the recording and playback unit 19 to control the recording and playback unit 19.
  • the receiving device 1 has a communication function for connecting to a network 5 such as the Internet, and can exchange data with various servers (may include servers constructed using the cloud) connected to the network 5 .
  • digital content can also be acquired from a content server device (not shown) connected to the network 5 .
  • the digital content acquired from the content server apparatus may be called network content.
  • the voice command processing unit 2 may also include a digital signal processing mechanism such as a CPU, a memory, and a DSP, and may have functions such as voice recognition technology.
  • the voice command processing unit 2 can control the internal functions of the receiving apparatus 1 by extracting commands from the voice uttered by the user.
  • the so-called voice command is a command input by the user to the receiving device 1 by voice in order to control the receiving device 1 . If the voice command is associated with an internal command for controlling the function of the receiving device 1 (hereinafter, also referred to as a local command in some cases), the receiving device 1 can control the function of the receiving device 1 by receiving the voice command.
  • a voice command such as “increase the volume” for increasing the volume output from the speaker of the receiving device 1
  • the local command of the receiving device 1 is, for example, volume_up
  • the receiving device 1 executes volume_up, and the volume of the speaker of the receiving device 1 increases.
  • as the voice command for increasing the volume of the speaker, not only “increase the volume” but also various phrasings such as “turn up the volume”, “volume up”, and “raise the volume” are conceivable. Since the voice command processing unit 2 of the present embodiment associates such variations with the same local command (volume_up), natural language processing can also be used.
  • although FIG. 1 shows an example in which only one receiving device 1 is connected to the network 5, a plurality of receiving devices 1 may be connected to the network 5.
  • the server device 3 is a server installed on the network 5 capable of performing speech recognition, and includes, for example, a computer including a CPU, a memory, and the like, and may include a digital signal processing mechanism such as a DSP.
  • the server device 3 can also be constructed as a cloud server.
  • the server apparatus 3 is equipped with a speech recognition technology.
  • the server device 3 is capable of voice recognition: it receives, via the network 5, voice data, that is, digital data of the user's voice picked up by the microphone of the receiving device 1 or the like, estimates or recognizes the speech uttered by the user, and outputs the recognized speech as text data (also referred to as recognized voice data in some cases).
  • speech recognition is a well-known technology, so a detailed explanation is omitted.
  • the server device 3 can perform natural language processing and can extract the local command of the receiving device 1 from the meaning of phrases such as “turn up the volume”, “volume up”, and “raise the volume”. That is, by utilizing natural language processing in the server device 3, the user is not limited to specific fixed voice commands and can use arbitrary wording as a voice command. For example, by uttering phrases such as “turn up the volume”, “volume up”, or “raise the volume”, the user can execute a local command (volume_up) of the receiving device 1 via the server device 3 and increase the speaker volume.
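The present application does not specify the natural language processing algorithm used by the server device 3. As one hedged illustration, free-form phrasings could be matched against registered command phrases by string similarity, here using Python's standard `difflib`; all phrases, command names, and the cutoff value are invented for the example.

```python
# Toy stand-in for the server-side natural language processing: map free-form
# phrasings to a registered local command by fuzzy string matching. The real
# server device 3 would use full natural language processing.
import difflib

KNOWN_PHRASES = {
    "volume up": "volume_up",
    "turn up the volume": "volume_up",
    "volume down": "volume_down",
    "change the channel": "channel_next",
}

def extract_local_command(text, cutoff=0.6):
    # Find the registered phrase most similar to the input text, if any.
    matches = difflib.get_close_matches(text.lower(), KNOWN_PHRASES, n=1, cutoff=cutoff)
    return KNOWN_PHRASES[matches[0]] if matches else None

print(extract_local_command("turn up the volume please"))
```

With this approach, small wording variations still resolve to the same local command, while unrelated utterances fall below the cutoff and return no command.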
  • the reception device 1 may be provided with the function of the server device 3. However, since the performance of natural language processing can be improved by utilizing large-capacity data such as big data, it is desirable to provide this function in the server device 3 constructed using the cloud or the like.
  • the server apparatus 3 can acquire various information of the reception apparatus 1 in addition to information such as a local command of the reception apparatus 1 .
  • the network 5 is a network capable of communicating with the receiving apparatus 1, the server apparatus 3, and the like, and is, for example, the Internet.
  • the network 5 is not limited to the Internet, and may be a network including a plurality of different networks irrespective of wired or wireless as long as each device can communicate.
  • the remote controller 10 is a remote controller for remotely controlling the receiving apparatus 1 .
  • the remote controller 10 in the present embodiment may include, for example, a voice-collecting function such as a microphone capable of receiving voices uttered by the user.
  • the remote controller 10 may have an interface function such as Bluetooth (registered trademark) and WiFi (registered trademark) for externally transmitting the received voice data.
  • FIG. 2 is a functional block diagram showing an example of the configuration of the receiving apparatus according to the embodiment.
  • the tuner 11 receives radio waves in a desired frequency band from an antenna, cable broadcasting, or the like, obtains a broadcast signal (digital data) through demodulation processing, or the like, and outputs it.
  • the broadcast signal reception processing unit 12 processes the broadcast signal received from the tuner 11 according to the standard of digital broadcasting, and acquires and outputs content data such as images, audio, and characters.
  • the standards of digital broadcasting include, for example, the MPEG-2 TS method adopted in 2K digital broadcasting and the MPEG Media Transport (MMT) method adopted in 4K/8K digital broadcasting.
  • the processing according to the standard of digital broadcasting includes demultiplexing processing that separates the digital data input from the tuner 11 into content data such as video, audio, and text, decoding processing of error correction codes, and decryption of encrypted data.
  • the communication unit 13 is connected to the network 5 and communicates with various servers and devices on the network 5 . Specifically, digital data is exchanged by transmitting and receiving processing corresponding to a predetermined communication protocol such as TCP/IP and UDP/IP, for example.
  • the content processing unit 14 receives, for example, content data provided by a content server (not shown) connected to the network 5 via the communication unit 13 .
  • the content processing unit 14 performs decoding processing and the like, corresponding to the encoding processing performed by the content server, on the data received via the communication unit 13, and acquires and outputs content data such as images, audio, and characters. More specifically, the content processing unit 14 may perform, as decoding processing, demultiplexing processing (separation processing), error correction code decoding processing, and decoding processing of coded content data (image, character, audio, etc.), for example.
  • the display control unit 15 adjusts the output timing, display method, and the like of the content data output from the broadcast signal reception processing unit 12, the content processing unit 14, and the recording and playback unit 19, and outputs the data.
  • the data output from the recording and playback unit 19 may be input to the display control unit 15 after demultiplexing processing (separation processing), error correction code decoding processing, decoding processing of coded content data (images, characters, audio, etc.), and the like.
  • the presentation unit 16 is, for example, a display for displaying images and characters, a speaker for outputting sound, or the like.
  • the presentation unit 16 outputs the content data output from the display control unit 15 as images, characters, sounds, and the like.
  • the user views digital content provided by a broadcast signal or a content server (not shown) through the images, characters, sounds, and the like output from the presentation unit 16.
  • the control unit 17 controls each function of the receiving device 1. Specifically, the control unit 17 receives various command signals from the interface unit 18, the voice command processing unit 2, and the like, and outputs control signals for controlling each function of the receiving device 1 based on the received command signals. For example, when the user specifies from the remote controller 10 whether to watch the content of the broadcast signal or the content from the content server, the control unit 17 receives the command signal from the remote controller via the interface unit 18 and controls the functions of the receiving device 1 so as to perform the user-specified action. In addition, in FIG. 2, data may also be exchanged directly between functional modules whose connections to the control unit 17 are not specifically shown.
  • the interface unit 18 is an interface for receiving a command signal from the remote controller 10 or the like, or for outputting a control signal from the control unit 17 or the like to an external device.
  • the interface unit 18 receives a command signal from a switch (not shown) of the receiving device 1 , the remote controller 10 , and the like, and outputs the command signal to the control unit 17 of the receiving device 1 .
  • besides the remote controller 10, the interface unit 18 may also have an interface for receiving a command signal from a terminal such as a smartphone (not shown).
  • the interface unit 18 has an interface for connecting to an external device, and may be, for example, an interface for connecting the receiving device 1 to an external recording and playback device.
  • the interface unit 18 in the present embodiment includes, for example, a microphone for receiving audio from the outside of the receiving device 1 .
  • the interface unit 18 may output the voice received by the microphone as digitized voice digital data (also referred to as voice data in some cases) by analog/digital conversion (A/D conversion) or the like.
  • the recording and playback unit 19 is, for example, a recorder such as an HDD recorder, and can record and play back content data such as audio and video received from a broadcast signal, the Internet, or the like.
  • although the recording and playback unit 19 shown in FIG. 1 is an example built into the receiving device 1, it may also be an external device connected to the receiving device 1 that can record and play back content data, for example a set-top box, a sound player, a PC, or the like.
  • the data storage unit 101 is, for example, a memory, or may be a database for storing various data.
  • the data storage unit 101 stores viewing information of the receiving apparatus 1 , analysis results obtained from the viewing information, model numbers, various functions, and other information specific to the receiving apparatus 1 (referred to as receiving apparatus data in some cases).
  • the voice command processing unit 2 outputs the voice data received from the interface unit 18 to the server device 3 via the communication unit 13 , and receives information related to the local command data from the server device 3 .
  • the voice command processing unit 2 of the present embodiment generates a control signal based on the information related to the local command data acquired from the server device 3, and outputs the generated control signal to the control unit 17 and the like.
  • FIG. 3 is a functional block diagram showing a configuration example of a voice command processing unit according to the embodiment.
  • the speech recognition unit 21 performs speech recognition on the speech data input from the interface unit 18 and outputs text data.
  • a method such as a hidden Markov model (HMM) is generally used.
  • the above-mentioned two methods can be applied.
  • when converting speech into a character string, the speech recognition unit 21 can detect arbitrary character strings, and when the character string recognition method is specified, the recognition target character strings can be changed or added at any time.
  • the determination unit 22 confirms whether or not the text data output by the speech recognition unit 21 is stored in the local speech command database unit 27 .
  • when the text data is stored there, the determination unit 22 regards the confirmed text data as a local voice command, and outputs to the control unit 17 a control signal and the like for executing the local command associated with that voice command.
  • the local voice command is a voice command associated with the local command of the receiving apparatus 1 and stored in the local voice command database unit 27 .
  • a wake-up voice for activating voice recognition or the like may be pre-configured in the receiving apparatus 1 as a local voice command.
  • based on the control signal from the determination unit 22, the local command processing unit 23 outputs to the control unit 17 the local command associated with the local voice command, the local command associated with the server command information acquired by the server data acquisition unit 24, and the like.
  • the server data acquisition unit 24 requests the server device 3 for server command information, and receives the server command information from the server device 3 .
  • the server command information is information for generating a local voice command, and includes the local command of the receiving device 1 selected by the server device 3 based on input voice data or a voice command obtained by performing voice recognition on the voice data.
  • the server command database unit 25 is, for example, a memory, and may be a database that stores server command information and the like received from the server device 3 .
  • the local voice command generation unit 26 generates information of the local voice command based on the server command information stored in the server command database unit 25 .
  • the local command processing unit 23 may consider the frequency of use of the voice command, the command processing priority, and the like.
  • the usage frequency of the voice command may be, for example, a value that is counted every time the voice recognition unit 21 receives or recognizes a voice command registered in the server command database unit 25 or the like.
  • the high frequency filter 261 is a filter used when the local voice command generation unit 26 generates a local voice command based on the server command information. Specifically, the high frequency filter 261 counts the acquisition frequency (frequency of use) for each voice command every time the voice recognition unit 21 receives a voice command registered in the server command database unit 25 or the like, for example. The high frequency filter 261 stores the count information in the server command database unit 25 or the local voice command database unit 27 or the like. The high frequency filter 261 extracts information of at least one local voice command from the data in the server command database unit 25 based on the counted usage frequency. The voice command extracted by the high frequency filter 261 is regarded as a local voice command, associated with the local command, and stored in the local voice command database unit 27 .
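A minimal sketch of such frequency-based promotion follows. The threshold `PROMOTE_AT`, the class name, and the dictionary standing in for the local voice command database unit 27 are all assumptions; the present application does not fix concrete values or data structures.

```python
# Sketch of the high frequency filter 261: count how often each server-resolved
# voice command is used, and promote frequent ones into the local database so
# they can be executed without a server round trip.
from collections import Counter

PROMOTE_AT = 3  # assumed threshold; the application does not specify a value

class HighFrequencyFilter:
    def __init__(self):
        self.counts = Counter()   # per-voice-command usage frequency
        self.local_db = {}        # stand-in for local voice command database unit 27

    def record_use(self, voice_command, local_command):
        self.counts[voice_command] += 1
        if self.counts[voice_command] >= PROMOTE_AT:
            # frequent enough: store locally, associated with its local command
            self.local_db[voice_command] = local_command

f = HighFrequencyFilter()
for _ in range(3):
    f.record_use("volume up", "volume_up")
print("volume up" in f.local_db)
```

Rarely used commands stay server-side, so the local database holds only the commands that are worth answering locally.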
  • the local voice command database unit 27 is, for example, a memory, and may be a database that stores information including the local voice commands output by the local voice command generation unit 26, associated local commands, and the like.
  • FIG. 4 is a functional block diagram showing a configuration example of a server apparatus according to an embodiment of the present application.
  • the communication unit 31 is an interface for data communication with devices on the network 5 such as the receiving device 1, and supports protocols such as TCP/IP and UDP/IP, for example.
  • the control unit 32 controls various functions in the server device 3 .
  • the control unit 32 receives various data such as control signals from external devices via the communication unit 31, analyzes and processes them as necessary, and outputs them to each functional module inside the server device 3.
  • the control unit 32 also receives various data from each functional module inside the server device 3, modularizes and formats the data as necessary, and outputs it to the communication unit 31.
  • the text conversion unit 33 performs voice recognition on, for example, voice data uttered by the user, and outputs the recognized voice as text data (referred to as recognized voice data in some cases).
  • the same function as the voice recognition unit 21 of the receiving apparatus 1 may be used.
  • the natural language processing unit 34 performs natural language processing on the text data input from the text conversion unit 33, and generates or selects a server command (equivalent to a local command) corresponding to the processing represented by the text data.
  • specifically, the natural language processing unit 34 analyzes the structure and meaning of the text data and, for example, extracts data similar to the text data from data groups such as the voice commands and the local commands of the receiving device 1 stored in the server command data storage unit 382 of the server device 3 or the like.
  • the server command generation unit 35 creates server command information in which the text data (corresponding to a voice command) output by the text conversion unit 33 is associated with the local command of the receiving device 1 that the natural language processing unit 34 extracted for that text data.
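For illustration only, the server command information that associates recognized text with a local command could be represented as a small record; the field names here are invented and not taken from the present application.

```python
# Sketch of the server command information generated by the server command
# generation unit 35: recognized text (voice command) bundled with the local
# command extracted for it. Field names are hypothetical.
from dataclasses import dataclass, asdict

@dataclass
class ServerCommandInfo:
    voice_command: str  # text data from the text conversion unit 33
    local_command: str  # command extracted by the natural language processing unit 34

info = ServerCommandInfo(voice_command="turn up the volume",
                         local_command="volume_up")
print(asdict(info))
```

Sent back to the receiving device, such a record carries exactly the association that the local voice command database unit 27 later stores.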
  • the local command of the receiving apparatus 1 extracted by the natural language processing unit 34 is sometimes referred to as a server command.
  • when the input text data is a voice command that causes a word or short sentence to be output as voice from the speaker of the receiving device 1, the response voice generation unit 36 may generate, for example, voice data of that word or short sentence.
  • for this purpose, processing such as speech synthesis may be provided.
  • when the server command generation unit 35 extracts a “local command of the receiving device 1 for outputting voice from the speaker”, it may generate server command information including the extracted local command and the “voice data of the short sentence” generated by the response voice generation unit 36.
  • the receiving device 1 may output the “voice data of the short sentence” from the speaker of the presentation unit 16 and present it to the user as voice.
  • the receiving device 1 may store the received "local command of the receiving device 1 for outputting the voice from the speaker” in association with the received "voice data of a short sentence" in the local voice command database unit 27 . That is, the "speech data of the short sentence" as the voice information is stored in the database in association with the local command.
  • when the voice command processing unit 2 receives a voice command from the user, it executes the local command “output short sentence 1 as voice from the speaker” associated with that voice command in the local voice command database unit 27.
  • the speaker of the presentation unit 16 then outputs the “voice data of the short sentence” associated with the local command as short sentence 1.
  • a function of speech synthesis may be provided on the side of the receiving device 1 .
  • the server command generation unit 35 transmits the extracted "local command of the receiving device 1 for outputting the voice from the speaker” to the receiving device 1 together with the text data of the phrase output as the voice.
  • the receiving device 1 generates speech data through speech synthesis or the like according to the text data of the received short sentence, and performs processing corresponding to the received local command at the same time. For example, when receiving the text data "Hello" of the short sentence together with the local instruction "output the received short sentence from the speaker", the receiving apparatus 1 generates voice data of "Hello" and outputs it from the speaker.
  • the receiving apparatus 1 may also store the text data of the received short sentence in the local voice command database unit 27 together with the local command.
  • when the voice command processing unit 2 receives a voice command from the user, it executes the local command “output short sentence 1 as voice from the speaker” associated with that voice command in the local voice command database unit 27, converts the “text data of the short sentence” associated with the local command into voice data by speech synthesis or the like, and outputs it as voice from the speaker of the presentation unit 16.
  • the server command generation unit 35 may transmit the extracted “local command of the receiving device 1 for outputting voice from the speaker” to the receiving device 1 together with the text data of the short sentence to be output as voice and the voice data thereof.
  • the receiving device 1 can then either process the received voice data according to the local command (server command), or convert the text data into voice data through speech synthesis or the like and process that.
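The two delivery options just described can be sketched as follows. The command name `speak_from_speaker` and the `synthesize` function are placeholders for illustration; the present application does not prescribe names or an implementation.

```python
# The server may send the local command together with response text and/or
# ready-made voice data; the device plays the voice data if present, and
# otherwise synthesizes speech from the text on the device side.

def synthesize(text):
    # Placeholder for on-device speech synthesis: pretend text becomes audio.
    return f"<audio:{text}>"

def handle_response(local_command, text=None, voice_data=None):
    if local_command != "speak_from_speaker":  # hypothetical command name
        return None
    if voice_data is not None:                 # server-side synthesis path
        return voice_data
    if text is not None:                       # device-side synthesis path
        return synthesize(text)
    return None

print(handle_response("speak_from_speaker", text="Hello"))
print(handle_response("speak_from_speaker", voice_data="<audio:Hello>"))
```

Sending only text keeps the payload small but requires synthesis on the device; sending voice data shifts that work to the server.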
  • the unique data storage unit 37 is, for example, a memory, or may be a database for storing data on the receiving apparatus 1 .
  • the unique data storage unit 37 may store the data of the plurality of receiving devices 1 separately for each receiving device 1.
  • the data stored in the unique data storage unit 37 may be acquired from the reception device 1 via the network 5 .
  • the specific information of the receiving device 1 transmitted from the receiving device 1 is stored in the receiving device data storage unit 371 , and for example, the following data is stored.
  • the channel information currently displayed by the receiving device 1, which may also include a distinction among broadcast programs, video playback, external input, the network 5, and the like
  • the local command data storage unit 372 stores local command information inherently provided in the receiving apparatus 1 .
  • the information of the local command may be acquired from the receiving device 1 via the network 5 , and stored in the local command data storage unit 372 for each receiving device 1 .
  • the administrator of the server apparatus 3 may directly input the information of the local commands to the server apparatus 3 .
  • the server device 3 may acquire information of local commands from the product information server via the network 5 .
  • the common data storage unit 38 may be a database of data that can be commonly used by the plurality of reception apparatuses 1 connected to the network 5 .
  • the common information data storage unit 381 may be a database of data that can be acquired from an external device or the like connected to the network 5 .
  • for example, it is information of a program table that can be viewed through digital broadcasting.
  • the server apparatus 3 may acquire the program table from the reception apparatus 1 via the network 5 .
  • the server command data storage unit 382 may be a database in which server command information generated by the server command generation unit 35 is stored.
  • the server command generation unit 35 may use the database of the server command data storage unit 382 as reference data when generating the server command information.
  • a voice command obtained by applying the voice recognition of an external device such as the server device 3 to voice data received from a user is accumulated in the receiving device 1, and the accumulated voice command (local voice command) is then used to execute the local command of the receiving device 1.
  • FIG. 5 is a diagram showing an example of a voice command that can be processed by the voice command processing unit according to the first embodiment.
  • each row shows a voice command that can be used by the receiving device 1 and the local command that can be executed by the voice command on the left.
  • a plurality of voice commands can be associated with one local command.
  • the voice commands of No2, No3, and No4 in FIG. 5 are associated with the local command "power_on”, and a plurality of voice commands can be used for the local command "power_on” of the receiving apparatus 1 .
  • the voice commands No. 5 to No. 8 are associated with the local command "volume_up”, which is an example of executing the command processing "increase the volume of the TV" in the receiving apparatus 1 by issuing the voice commands No. 5 to No. 8 by the user.
  • FIG. 6 is a flowchart showing an example of the processing operation of the voice signal by the voice command processing unit according to the first embodiment.
  • the voice data is input to the voice command processing unit 2 through the microphone of the interface unit 18 (step S101).
  • the voice data is input to the voice recognition unit 21, and converted into text data by voice recognition (step S102).
  • the text data is input to the determination unit 22, and the determination unit 22 confirms whether there is a local voice command corresponding to the input text data in the local voice command database unit 27 (step S103).
  • when there is a corresponding local voice command (YES in step S103), the determination unit 22 outputs the local command associated with the local voice command to the control unit 17.
  • the control unit 17 executes the input local command (step S104).
  • in step S103, complete agreement between the text data input to the determination unit 22 and a local voice command in the local voice command database unit 27 may be treated as the condition for YES; alternatively, a match with some differences may also be treated as YES.
  • the condition in step S103 may be set by the user.
  • when determining that there is no local voice command corresponding to the text data (NO in step S103), the determination unit 22 transmits a voice command recognition request to the server device 3 via the server data acquisition unit 24 (step S105).
  • the server data acquisition unit 24 receives server command information from the server device 3 (step S106).
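  • The lookup-then-fallback flow of steps S101 to S106 can be sketched as follows. This is a minimal illustration only, not the disclosed implementation; the function names, the dictionary-based database, and the `ask_server` callback are all hypothetical stand-ins for the determination unit 22, the local voice command database unit 27, and the server request path.

```python
# Hypothetical sketch of steps S101-S106: try the local voice command
# database first; on a miss, fall back to a server recognition request.

def handle_voice_text(text, local_db, ask_server):
    """local_db maps recognized text to a local command string;
    ask_server(text) stands in for the voice command recognition request."""
    command = local_db.get(text)             # step S103: local lookup
    if command is not None:
        return command, "local"              # step S104: execute locally
    server_info = ask_server(text)           # step S105: request to server
    return server_info["command"], "server"  # step S106: use server command

# Usage: "turn on the power" is already local; "want to watch TV" is not.
local_db = {"turn on the power": "power_on"}
fake_server = lambda text: {"command": "power_on", "voice_command": text}

print(handle_voice_text("turn on the power", local_db, fake_server))
print(handle_voice_text("want to watch TV", local_db, fake_server))
```

  The second call would, in the actual flow, also trigger accumulation of the returned server command information (steps S107 to S109).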
  • FIG. 7 is a diagram showing an example of the database in the local voice command database unit of the receiving device according to the first embodiment. FIG. 7(a) shows, in each row, a voice command received by the receiving device 1, the local command of the receiving device 1 that can be executed by the voice command on the left, and the command processing executed in the receiving device 1 according to the local command on the left.
  • the rightmost flag (Flag) is the flag information assigned by the server device 3 to the voice command in the same row.
  • the Flag in FIG. 7(a) shows the server device's judgment, based on given conditions, of whether the voice command in the same row is valid (OK) or invalid (NG). For example, for voice commands judged invalid, such as No. 5 and No. 10 of FIG. 7(a), the server device 3 may return to the receiving device 1 a local command (server command) equivalent to a retry, that is, a local command (server command) presenting a response message such as "please say it one more time".
  • the receiving apparatus 1 may execute processing according to the received server command, or wait for a user's command.
  • the server command information received from the server device 3 in step S106 may be one line of the voice command shown in FIG. 7( a ), or may be multiple lines.
  • the server data acquisition unit 24 receives server command information including only No. 3 in FIG. 7( a ) as one line of a voice command.
  • the server data acquisition unit 24 outputs the local command "power_on" included in the server command information to the control unit 17 to execute the local command "power_on”. Also, at the same time, the server data acquisition unit 24 outputs server command information including only No. 3 to the server command database unit 25 .
  • the server command database unit 25 stores the input server command information in the database (step S107).
  • the local voice command generation unit 26 confirms whether the voice command included in the server command information stored in the server command database unit 25 has already been stored in the local voice command database unit 27 (step S108); if it has not been stored, the voice command included in the server command information is stored in the local voice command database unit 27 as a local voice command (NO in step S108, step S109).
  • (b) of FIG. 7 shows local voice command data in which one voice command is extracted for each local command on the basis of frequency of use.
  • (b) of FIG. 7 shows an example in which "want to watch TV" (No. 3) is selected as the local voice command for the local command "power_on", and "volume up" (No. 2) is selected as the local voice command for the local command "volume_up".
  • the database of the local voice command database unit 27 can also be created based on the frequency of use of voice commands from the database stored in the server command database unit 25 .
  • FIG. 8 is a flowchart showing an example of processing operations for creating local voice data by the voice command processing unit according to the first embodiment. It is assumed that the data of FIG. 7( a ) has been stored in the server instruction database unit 25 .
  • the voice data is input to the voice command processing unit 2 through the microphone of the interface unit 18 (step S121).
  • the voice data is input to the voice recognition unit 21, and converted into text data by voice recognition (step S122).
  • the text data is input to the high frequency filter 261, and the high frequency filter 261 checks whether or not there is a voice command corresponding to the input text data in the server command database unit 25 (step S123).
  • the high frequency filter 261 finds a voice command corresponding to the text data in the server command database unit 25, the count of the voice command is incremented by 1 as the usage frequency (step S124).
  • FIG. 9 is an example of local voice data stored in the voice command processing unit of the first embodiment, and shows an example of data to which frequency of use is assigned to each voice command. For example, it is shown that the frequency of use of the voice command "turn on the power" of No. 1 is 5 times, and the frequency of use of the voice command of No. 8 "volume up" is 45 times.
  • the high-frequency filter 261 selects local voice commands for each local command from the voice commands accumulated in the server command database unit 25 based on the frequency of use (step S125 ).
  • the voice command extracted by the high frequency filter 261 is stored in the local voice command database unit 27 as a local voice command (step S126).
  • the local voice command may be stored in the local voice command database unit 27 as shown in (b) of FIG. 7 .
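  • The frequency counting and selection performed by the high-frequency filter 261 (steps S123 to S126) can be sketched as follows. This is an illustrative sketch under the assumption that usage counts and the voice-command-to-local-command mapping are plain dictionaries; the function name is hypothetical. The sample counts mirror the FIG. 9 example ("volume up" used 45 times).

```python
from collections import Counter

# Hypothetical sketch of the high-frequency filter (261): count how often
# each accumulated voice command is used, then keep the most frequently
# used voice command for each local command as the local voice command.

def select_local_voice_commands(usage_counts, command_map):
    """usage_counts: voice-command text -> frequency of use.
    command_map: voice-command text -> associated local command."""
    best = {}  # local command -> (frequency, voice command)
    for text, freq in usage_counts.items():
        local_cmd = command_map[text]
        if local_cmd not in best or freq > best[local_cmd][0]:
            best[local_cmd] = (freq, text)
    return {cmd: text for cmd, (freq, text) in best.items()}

counts = Counter({"turn on the power": 5, "want to watch TV": 20,
                  "volume up": 45, "increase the volume": 12})
cmds = {"turn on the power": "power_on", "want to watch TV": "power_on",
        "volume up": "volume_up", "increase the volume": "volume_up"}
print(select_local_voice_commands(counts, cmds))
# {'power_on': 'want to watch TV', 'volume_up': 'volume up'}
```

  The selected entries correspond to the one-voice-command-per-local-command database of FIG. 7(b).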
  • FIG. 10 is a flowchart showing an example of the processing operation of voice data by the server device according to the first embodiment, and shows an example of the processing performed by the server device 3 between step S105 and step S106 of FIG. 6, which are processing of the voice command processing unit 2.
  • the voice command processing unit 2 transmits a voice command recognition request together with the voice data (step S105 in FIG. 6 ).
  • when the control unit 32 of the server device 3 receives the voice command recognition request, it outputs the simultaneously received voice data to the text conversion unit 33 (step S151).
  • the text conversion unit 33 performs speech recognition on the speech data, converts it into text data, and outputs it to the natural language processing unit 34 (step S152).
  • the natural language processing unit 34 performs natural language processing on the input text data, and checks whether or not a local command corresponding to the processing represented by the text data is stored in the local command data storage unit 372 (step S153).
  • FIG. 11 is an example of a database stored in the server device according to the first embodiment, and is an example of data related to the local command of the receiving device 1 stored in the local command data storage unit 372 of the server device 3 .
  • the “local command” of the receiving apparatus 1 and the “command processing” executed by the command may be stored in each row.
  • the natural language processing unit 34 compares the meaning extracted from the input text data with the data in FIG. 11 , and selects a local command close to the meaning of the input text data (step S154 ).
  • the server command generation unit 35 sets Flag to a value indicating "OK" (for example, 1), and creates server command information including the Flag (step S155).
  • the server command generation unit 35 transmits the server command information from the communication unit 31 to the reception device 1 (step S156).
  • the voice command processing unit 2 receives the server command information (step S106 in FIG. 6).
  • even when the voice command processing unit 2 cannot itself respond to a received voice command, it can execute the command by acquiring the server command information from the server device 3.
  • furthermore, by accumulating the server command information in its own memory or the like, the voice command processing unit 2 can process the same voice command without going through the server device 3 when it is received again.
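  • The server-side flow of FIG. 10 (steps S151 to S156) can be sketched as follows. This is a hypothetical illustration: the table is a toy subset in the spirit of FIG. 11, and the word-overlap matcher merely stands in for the natural language processing unit 34; none of these names come from the actual implementation.

```python
# Hypothetical sketch of steps S151-S156: recognized text is matched
# against the stored local-command table, and server command information
# including a Flag is assembled for transmission to the receiving device.

LOCAL_COMMAND_TABLE = {          # illustrative subset in the spirit of FIG. 11
    "power_on": "turn on the TV power",
    "volume_up": "increase the TV volume",
}

def build_server_command_info(recognized_text, match):
    """match(text, table) plays the role of the natural language processing
    unit 34 and returns the closest local command name, or None."""
    command = match(recognized_text, LOCAL_COMMAND_TABLE)
    if command is None:          # no match: return a retry-equivalent command
        return {"flag": "NG", "command": "retry",
                "response": "please say it one more time"}
    return {"flag": "OK", "command": command,
            "voice_command": recognized_text}

# A toy matcher: pick the command whose description shares the most words.
def toy_match(text, table):
    words = set(text.lower().split())
    best, best_score = None, 0
    for cmd, desc in table.items():
        score = len(words & set(desc.split()))
        if score > best_score:
            best, best_score = cmd, score
    return best

print(build_server_command_info("increase the volume", toy_match))
```

  A real implementation would use proper speech recognition and natural language processing in place of the toy matcher.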
  • FIG. 12 is an example of a database used by the voice command processing unit of the first embodiment to process voice commands received from a plurality of users, and is an example of a database when a plurality of users use one receiving device 1 .
  • This database may also be stored in the server instruction data storage unit 382 .
  • when the high-frequency filter 261 is used in generating local voice commands in the voice command processing unit 2, if users are not distinguished, only the voice commands of the user who watches TV most frequently may end up being registered as local voice commands.
  • FIG. 12( a ) is an example of a database of voice commands for local commands when the reception device 1 can recognize the user who issued the voice command.
  • the voice commands are databased for each recognized user, the frequency of use is counted for each voice command, and the high-frequency filter 261 is applied for each user, making it possible to generate a database of local voice commands that takes each user's frequency of use into account.
  • FIG. 12( b ) is an example of a database in the case of combining the voice commands of all users in the voice commands of FIG. 12( a ), and is the same database as the example shown in FIG. 9 .
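  • The relationship between the per-user database of FIG. 12(a) and the combined database of FIG. 12(b) can be sketched as follows. This is an illustrative sketch only; the class name, user labels, and counts are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of FIG. 12: usage is counted per recognized user
# (FIG. 12(a)), and merging the per-user counters yields the combined
# all-user database (FIG. 12(b)).

class PerUserUsage:
    def __init__(self):
        self.by_user = defaultdict(Counter)

    def record(self, user, voice_command):
        self.by_user[user][voice_command] += 1

    def combined(self):
        total = Counter()
        for counts in self.by_user.values():
            total += counts
        return total

usage = PerUserUsage()
for _ in range(3):
    usage.record("user A", "volume up")
usage.record("user B", "volume up")
usage.record("user B", "turn on the power")
print(usage.combined()["volume up"])         # 4
print(usage.by_user["user A"]["volume up"])  # 3
```

  Applying the high-frequency filter 261 to `by_user[user]` gives per-user local voice commands; applying it to `combined()` gives the undistinguished database.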
  • FIG. 13 is a diagram showing an example of a voice command that can be processed by the voice command processing unit according to the first embodiment, and is an example of a local voice command that can be complemented by the voice command processing unit 2 .
  • Each row shows the "execution date" of the voice command, the "voice command" executed on that date, the "server command" processed according to the voice command on the left (equivalent to a local command of the receiving device 1), the "command processing" executed according to the server command on the left, and "cacheability", which indicates whether the server command on the left can be cached.
  • when the server command for a voice command is always a fixed response, information indicating that caching is to be performed may be set in the "cacheability" field.
  • when the server command for the voice command is a response limited to that moment (for example, one depending on the date and time), such as the response to "please tell me the name of the program you are watching now", information indicating that the server command is not to be cached may be set.
  • the "cache-enabled” information may be set to "Flag" in the database shown in FIG. 7 , and in this case, when the server device 3 determines that the server instructs "caching", the Flag may be set to If it is True, if it is judged as "do not cache", set Flag to false.
  • the row of No. 1 is an example in which, when the user issues the voice command "What day is today?" on the execution date "January 8", the voice command processing unit 2 of the receiving device 1 receives, in response to its voice command recognition request, the server command "voice response "January 8"" from the server device 3.
  • the voice command processing unit 2 outputs the received server command (also a local command) to the control unit 17, the control unit 17 executes the command processing "voice output "January 8" from the speaker", and the speaker of the display unit 16 outputs the sound "January 8".
  • the server command "voice response "January 8"" may be regarded as information that cannot be cached, or information that has no meaning to be cached, such as whether or not the cacheability of the row of No. 1 is set to "NG".
  • in this case, the server device 3 may create a server command in which the variable part is parameterized, in the form "voice response "$Month month $Date day"" as in the row of No. 2 (referred to as a variable server command).
  • the variation of the server command may be performed by the server device 3 or by the voice command processing unit 2 .
  • in the voice command processing unit 2, for example, when the server command of row No. 1 is received, the server command "voice response "January 8"" may be stored in the server command database unit 25, and the local voice command generation unit 26 may associate "voice response "$Month month $Date day"" as the local command for the local voice command "What day is today?".
  • thereafter, based on the associated local command "voice response "$Month month $Date day"" and date information obtained from broadcast signals or the like, the voice command processing unit 2 can, for example, output the voice response "February 18" from the speaker of the display unit 16, or display it on the display unit 16.
  • the receiving device 1 or the voice command processing unit 2 may be capable of generating voice such as synthesized voice.
  • the variable server commands of rows No. 2 and No. 3 do not depend on the execution date; therefore, their "cacheability" items may both be set to "OK" to enable caching.
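  • The substitution performed on a variable server command can be sketched as follows, using Python's standard `$`-placeholder templating. This is an illustrative sketch; the simplified `"$Month $Date"` form and the function name are assumptions, not the disclosed format.

```python
from string import Template

# Hypothetical sketch of a "variable server command": the date-dependent
# part of the cached server command is parameterized (row No. 2 of
# FIG. 13) and filled in at execution time from locally available data.

def render_variable_command(template_text, **values):
    return Template(template_text).substitute(**values)

# Cached as the local command for "What day is today?"
cached = 'voice response "$Month $Date"'

# At execution time the values come from the broadcast signal, clock, etc.
print(render_variable_command(cached, Month="February", Date="18"))
# voice response "February 18"
```

  Because the template itself does not depend on the execution date, it can be cached and reused offline, which is exactly why rows No. 2 and No. 3 can be marked "OK" for cacheability.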
  • although FIG. 13 shows examples of local commands that depend on the date, the invention is not limited to these examples; local commands that depend on, for example, the time, the season, or the surrounding context may likewise be complemented by the voice command processing unit 2.
  • speech recognition performed by a cloud server or the like has a function of absorbing fluctuations in the user's utterance, for example among variant phrasings such as "increase the volume", "turn up the volume", "volume up", and "raise the volume", all of which are voice commands for realizing volume-up processing.
  • the combination of frequently used utterances (voice commands) and corresponding processing (local commands) may be determined by using the high-frequency filter 261 based on the frequency of use of the voice commands.
  • it is not always necessary to distinguish each user as shown in (a) of FIG. 12; when users are distinguished, the high-frequency filter 261 is applied to the voice commands accumulated for each user, whereby user identification is also reflected.
  • the receiving device 1 or the voice command processing unit 2 can detect frequently used utterances at high speed; it thereby becomes possible to obtain, without using natural language processing, results equivalent to natural language processing, and to perform the target processing autonomously.
  • it is no longer necessary to go through the server device 3, which makes it possible to shorten the processing time of speech recognition and the like in the receiving device 1 or the voice command processing unit 2. Furthermore, the utterance content (local voice commands) set in the receiving device 1 or the voice command processing unit 2 of the present embodiment can also be used offline thereafter.
  • in the second embodiment, the server command generated by the server device 3 for one recognized (or received) voice command is associated with a plurality of local commands.
  • the local voice command generation unit 26 determines the processing of the local command related to one voice command based on the priority set in the condition setting unit 262 .
  • FIG. 14 is an example of server command information stored in the voice command processing unit of the second embodiment, and shows the voice command "I want to see giraffes" received by the server device 3, the server command "output program K" generated or acquired by the server command generation unit 35 in response to that voice command, and the command processing of the four local commands that can be executed in the receiving device 1 for the server command "output program K". Furthermore, the frequency and the priority are shown in the same row for each command processing.
  • the local voice command generation unit 26 determines command processing for the server command "output program K" based on the priority.
  • the local voice command generation unit 26 may store the command processings in the local voice command database unit 27 in association with the voice command so that they are executed in order of priority. For example, in FIG. 14, since the priorities are set in the order of rows No. 4, No. 2, No. 3, No. 1, command processing is executed in that order. More specifically, when the user utters "I want to see a giraffe", the voice command processing unit first executes the command processing "display broadcast program K" of row No. 4. If broadcast program K is being broadcast at the time of execution, it can be displayed; if it is not being broadcast, it cannot be displayed.
  • in this way, depending on conditions, the command processing associated with a voice command may or may not be executable. When the command processing of row No. 4 cannot be executed, the command processing of row No. 2, which has the next priority, is executed.
  • command processing is continuously executed in order of priority in consideration of conditions, environments, and the like. Conditions such as priority for command processing may also be set by the user from the remote controller.
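  • The try-in-priority-order behavior described above can be sketched as follows. This is an illustrative sketch only: the candidate texts, the priority values, and the `can_execute` condition check are hypothetical stand-ins for the command processings of FIG. 14 and the runtime conditions (such as whether a program is currently being broadcast).

```python
# Hypothetical sketch of priority-based command processing: the candidate
# command processings for one voice command are tried in priority order,
# and the first one whose conditions are satisfied is executed.

def execute_by_priority(candidates, can_execute):
    """candidates: list of (priority, command_processing), where a lower
    value means higher priority. can_execute(cmd) checks conditions such
    as 'the program is currently being broadcast'."""
    for _, cmd in sorted(candidates):
        if can_execute(cmd):
            return cmd
    return None  # nothing executable under the current conditions

candidates = [(4, "record broadcast program K"),
              (2, "display a web page about giraffes"),
              (3, "play back recorded program K"),
              (1, "display broadcast program K")]

# Suppose program K is not currently being broadcast:
broadcastable = lambda cmd: cmd != "display broadcast program K"
print(execute_by_priority(candidates, broadcastable))
```

  Here the highest-priority processing fails its condition, so the next-priority processing is executed instead, mirroring the No. 4 then No. 2 fallback described above.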
  • the voice command issued by the user can be associated with a plurality of local commands (command processing) according to the conditions of the receiving device 1 and various functional units in the receiving device 1 .
  • the command processing can be executed in order of priority, so that more appropriate command processing can be performed for the voice command issued by the user.
  • one command process with the highest priority may be associated with one voice command. How to use the priority for association may be set by the user from a remote controller or the like, or information regarding association may be downloaded from a server (not shown) connected to the network 5 .
  • the frequency shown in FIG. 14 may be the frequency of use of the command processing; alternatively, for example, the control unit 17 or the like may calculate the frequency of the command processing in advance, and the local voice command generation unit 26 may determine the priority based on that frequency.
  • in the third embodiment, the server device 3 generates a plurality of server commands for one voice command.
  • FIG. 15 is an example of the database stored in the voice command processing unit according to the third embodiment, and is an example of data when the server device 3 generates three server commands in response to the voice command “How is the weather now?”.
  • the command processing, frequency, and expired of the server command are shown in each row for each server command.
  • the frequency may be the frequency of use of the server command, and may be determined on the side of the receiving device 1 or on the side of the server device 3 .
  • when determined on the server device 3 side, the database of the server command data storage unit 382 may be used, and information from a plurality of receiving devices 1 may be used for the determination.
  • by providing the server device 3 with the frequency of use of server commands (equivalent to local commands) counted on the receiving device 1 side, the server device 3 can determine the frequency based on frequency information from a plurality of receiving devices 1.
  • the frequency information of the reception apparatuses 1 may be used separately, and a server command or a local command may be determined for each reception apparatus 1 .
  • the local voice command generation unit 26 basically determines the command processing executed by the reception apparatus 1 in the order of the frequency, using the frequency as the priority.
  • however, a condition such as "expired" is also considered; "expired" indicates the expiration date of the command processing. For example, the expired value "2021/1/2 0:00" of No. 1 in FIG. 15 indicates that the server command and command processing of No. 1 are "valid until 0:00 on January 2, 2021".
  • the server command of No. 1 "voice response "clearly turns cloudy”" is a command depending on the time and date, and therefore is an example of a condition given expired.
  • "expired" may be set in the "Flag" field of the database shown in FIG. 7; in this case, the server device 3 may determine the expiration date "expired" of the server command, setting Flag to True when the server command is within its expiration date and to False when it has expired.
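  • The expiry check behind this Flag can be sketched as follows. This is a minimal illustration assuming the expiration date is held as a datetime; the function name is hypothetical, and the example uses the "valid until 0:00 on January 2, 2021" value of row No. 1 in FIG. 15.

```python
from datetime import datetime

# Hypothetical sketch of the "expired" condition: a cached server command
# is usable (Flag = True) only while the current time is before its
# expiration date, and invalid (Flag = False) afterwards.

def flag_for(server_command_expiry, now):
    """Returns the Flag value for a cached server command."""
    return now < server_command_expiry

expiry = datetime(2021, 1, 2, 0, 0)  # "valid until 0:00 on January 2, 2021"
print(flag_for(expiry, datetime(2021, 1, 1, 18, 30)))  # still valid -> True
print(flag_for(expiry, datetime(2021, 1, 2, 9, 0)))    # expired -> False
```

  A receiving device consulting its cache would fall back to the server (or to fresh broadcast/network data) whenever this check yields False.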
  • alternatively, regardless of "expired", upon receiving the voice command the voice command processing unit 2 may refer to the latest weather information obtained from the broadcast signal or from a server (not shown) on the network 5, and cause the speaker of the display unit 16 to output that latest weather information by voice.
  • FIG. 16 is a flowchart showing an example of the processing operation in which the server device according to the third embodiment selects one server command from a plurality of server commands and transmits it to the voice command processing unit; it is an example in which the server device 3 uses information obtained from an external device such as the receiving device 1 to select a server command from the plurality of server commands and output it to the voice command processing unit.
  • when the control unit 32 of the server device 3 receives the voice command recognition request transmitted from the voice command processing unit 2, it outputs the simultaneously received voice data to the text conversion unit 33 (step S251).
  • the text conversion unit 33 performs speech recognition on the speech data, converts it into text data, and outputs it to the natural language processing unit 34 (step S252).
  • the natural language processing unit 34 performs natural language processing on the input text data, and checks whether the local command data storage unit 372 and the common data storage unit 38 store local command information corresponding to the processing represented by the text data. (step S253).
  • the server command generation unit 35 acquires the information of the local command confirmed by the natural language processing unit 34 (step S254).
  • the server command generation unit 35 generates a server command based on the acquired local command information.
  • the server command generation unit 35 acquires the unique information of the receiving apparatus 1 from the unique data storage unit 37 (Yes in step S255, S256).
  • the server command generation unit 35 selects a server command to be transmitted to the reception device 1 from a plurality of server commands based on the unique information of the reception device 1 (step S257 ).
  • for example, the server command of No. 1 in FIG. 15 may be excluded from selection when a check of the unique information of the receiving device 1 shows conditions such as "voice output prohibited" or "speaker disabled".
  • the server command of No. 2 in FIG. 15 may not be selected based on the fact that “there is no weather program scheduled to be broadcast within one hour” from the program information.
  • the server command generation unit 35 generates server command information including the selected server command and, as needed, the response voice generated by the response voice generation unit 36, and outputs it to the voice command processing unit 2 via the communication unit 31 (step S258).
  • in this way, the server device 3 can use the data of the unique data storage unit 37, the common data storage unit 38, and the like to select from among the plurality of server commands, and provide server command information including the selected command to the voice command processing unit 2.
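  • The selection of step S257 can be sketched as follows. This is an illustrative sketch: the condition fields (`needs_speaker`, `needs_program_within_1h`), the candidate texts, and the first-usable-wins rule are hypothetical simplifications of the checks against the unique data storage unit 37 and the common data storage unit 38.

```python
# Hypothetical sketch of step S257: the server filters candidate server
# commands using the unique information of the receiving device and the
# common data (e.g. program information), then picks the first remaining one.

def select_server_command(candidates, device_info, program_info):
    """candidates: list of dicts with 'command', 'needs_speaker', and
    'needs_program_within_1h' keys (illustrative condition fields)."""
    usable = []
    for cand in candidates:
        if cand["needs_speaker"] and not device_info.get("speaker_enabled"):
            continue  # e.g. "speaker disabled" excludes voice responses
        if cand["needs_program_within_1h"] and not program_info.get("weather_program_within_1h"):
            continue  # e.g. no weather program scheduled within one hour
        usable.append(cand)
    return usable[0]["command"] if usable else None

candidates = [
    {"command": "voice response (weather)", "needs_speaker": True,
     "needs_program_within_1h": False},
    {"command": "reserve weather program", "needs_speaker": False,
     "needs_program_within_1h": True},
    {"command": "display weather site", "needs_speaker": False,
     "needs_program_within_1h": False},
]
device = {"speaker_enabled": False}
programs = {"weather_program_within_1h": False}
print(select_server_command(candidates, device, programs))
```

  With the speaker disabled and no weather program scheduled, only the display-based command survives the filtering, matching the exclusion examples given above for the commands of No. 1 and No. 2 of FIG. 15.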
  • the voice command processing unit 2 registers the voice command acquired from the server command information provided by the server device 3, together with the server command (equivalent to a local command) associated with it, in the local voice command database unit 27; thereby, through the voice command issued by the user, the receiving device 1 executes command processing that takes the data of the unique data storage unit 37 and the common data storage unit 38 into consideration.
  • since the server device 3 generates the server command information in consideration of the data of the unique data storage unit 37, the common data storage unit 38, and the like, the receiving device 1 does not need to hold information such as program names and broadcast station names in advance, and the information of the unique data storage unit 37 and the common data storage unit 38 can still be reflected in the voice commands issued by the user.
  • the user can not only use voice commands in a form close to ordinary language (natural language), but also, merely by using the receiving device 1 of the present embodiment, have the command processing of those voice commands matched to the conditions of the user and of the user's receiving device 1.
  • for example, when the server device 3 confirms, based on the program information, that the program is "scheduled to be broadcast on ch5 of digital broadcasting at 17:00 in the future, or scheduled to be distributed by the content server on the network 5", and at the same time confirms, based on the information unique to the receiving device, that "connection to the network 5 is not possible", it transmits the server command "reservation for viewing: Saturday 17:05ch" to the receiving device 1.
  • the voice command processing unit 2 can either use the received server command as a local command and cause the control unit 17 to execute it, or associate it with the local voice command "Want to watch program A" and store it in the local voice command database unit 27.
  • in the embodiments above, the configuration in which the receiving device 1 includes the voice command processing unit 2 was shown. In the present modification, other possible configurations will be described.
  • FIG. 17 is a functional block diagram showing a configuration example of a system according to a modification.
  • (a) of FIG. 17 is an example of a case where the voice command processing device 2A including the voice command processing unit 2 enables the receiving device 1A to be controlled by voice commands.
  • the reception apparatus 1A corresponds to the reception apparatus in which the voice command processing unit 2 is removed from the reception apparatus 1 , but may be the same reception apparatus as the reception apparatus 1 .
  • the voice command processing device 2A includes the functions of the voice command processing unit 2 and a microphone, and may be a computer including a CPU and a memory.
  • the voice command processing device 2A may include A/D conversion for processing the audio signal output from the microphone, digital signal processing means such as DSP, and the like.
  • the voice command processing device 2A may be provided with a communication means (corresponding to the communication unit 13 in FIG. 2 ), not shown, for communicating with the server device 3 .
  • the local command output by the local command processing unit 23 of the voice command processing unit 2 may be input to the control unit 17 of the receiving device 1A via the network 5 .
  • the user issues a voice command to a microphone (not shown) of the voice command processing device 2A.
  • the voice received by the microphone is converted into voice data by A/D conversion or the like, and then the voice data is input to the voice command processing unit 2 .
  • the same processing as the voice command processing of the above-described embodiment can be performed, and the same effect can be obtained.
  • the reception device 1A can be remotely operated from the voice command processing device 2A via the network 5 .
  • by placing the server command database unit 25 and the local voice command database unit 27 of the voice command processing unit 2 in a cloud server, not only the receiving device 1A of a specific user but also the receiving devices 1A of other users can perform the same voice command processing (sharing of the voice command processing device 2A), and the voice command processing device 2A can also be easily moved (made portable).
  • (b) of FIG. 17 is an example of a case where the remote controller 10A including the voice command processing unit 2 controls the reception device 1A with a voice command.
  • the remote controller 10A is a remote controller including a voice command processing unit 2 in the remote controller 10 .
  • the remote controller 10A includes a function of a microphone, and may include a computer including a CPU and a memory, and a digital signal processing mechanism such as an A/D converter for processing a voice signal output from the microphone, a DSP, and the like.
  • the remote controller 10A may include a communication mechanism (corresponding to the communication unit 13 in FIG. 2 ), not shown, for communicating with the server device 3 .
  • when the remote controller 10A includes a communication means such as Bluetooth capable of communicating with the receiving device 1A, the remote controller 10A may be connected to the network 5 via the receiving device 1A to communicate with the server device 3.
  • the local command output by the local command processing unit 23 of the voice command processing unit 2 may be input to the control unit 17 of the receiving device 1A via a communication means such as Bluetooth, or may be output to the receiving device 1A as an ordinary remote control signal using infrared rays or the like from the remote controller 10A.
  • the user issues a voice command to a microphone (not shown) of the remote controller 10A.
  • the voice received by the microphone is converted into voice data by A/D conversion or the like, and the voice data is then input to the voice command processing unit 2 .
  • the same processing as the voice command processing of the above-described embodiment can be performed, and the same effect can be obtained.
  • the server command database unit 25 of the voice command processing unit 2, the local voice command database unit 27, and the like may be installed in the reception device 1A, a cloud server not shown, or the like.
  • a voice command processing circuit, a receiving device, a server, a system, a method, and a computer-readable non-volatile storage medium capable of adding locally processable voice commands can be provided.
  • the names, definitions, types, etc. of the condition parameters displayed on the analysis screens and the like shown in the drawings, and of their options, values, evaluation indexes, etc., are shown as examples in the present embodiment and are not limited to the examples shown.
  • Embodiments of the present disclosure further provide a computer-readable non-volatile storage medium, where computer instructions are stored in the storage medium, and when the computer instructions are executed by a processor, the voice data processing in the above-mentioned embodiments is implemented.
  • the device of the present application is also applicable when the control logic is expressed as a program including instructions for causing a computer to execute it, and when it is expressed as a computer-readable non-volatile storage medium in which the above-mentioned instructions are written.
  • the names and terms used are not limited, and other expressions are included in the present application as long as they have substantially the same content and the same spirit.


Abstract

Provided are a voice command processing circuit, a receiving device, a server, and a voice command accumulation system and accumulation method capable of increasing the number of voice commands that can be processed locally. The voice command processing circuit performs speech recognition on voice data and outputs a recognition result, and determines whether a voice command corresponding to the recognition result exists in a database in which information on voice commands for controlling a device is associated with information on local commands, i.e., the control commands inside the device that the voice commands execute; based on the determination result of the determination means, information for the database is acquired from a server.

Description

Voice command processing circuit, receiving device, server, and voice command accumulation system and accumulation method
This application claims priority to Japanese Patent Application No. 2021-008062, filed with the Japan Patent Office on January 21, 2021 and entitled "Voice command processing circuit, receiving device, server, system, method, and program", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to a voice command processing circuit, a receiving device, a server, a voice command accumulation system, a voice command accumulation method, and a non-volatile storage medium.
Background Art
In recent years, home appliances that can be remotely controlled by voice commands spoken by a person, using speech recognition technology, have become widespread. In television receivers for digital broadcasting, relatively simple speech recognition, such as recognition of specific utterance patterns, is performed inside the receiver (locally), while complex, arbitrary utterances requiring grammatical understanding or natural language processing are handled by additionally using speech recognition on an external server such as a cloud server, thereby realizing advanced speech recognition.
Prior Art Documents
Patent Documents
Patent Document 1: Japanese PCT National Publication No. 2015-535952
Patent Document 2: Japanese PCT National Publication No. 2019-15952
Summary of the Invention
However, to allow users to issue voice commands freely in a form closer to natural language, an external server with advanced functions such as natural language processing is usually required.
An object of the present application is to provide a voice command processing circuit, receiving device, server, system, method, and computer-readable non-volatile storage medium capable of increasing the number of voice commands that can be processed locally.
A voice command processing circuit according to an embodiment of the present application performs speech recognition on voice data and outputs a recognition result, determines whether a voice command corresponding to the recognition result exists in a database in which information on voice commands for controlling a device is associated with information on local commands (the control commands inside the device that the voice commands execute), and acquires information for the database from a server based on the determination result of the determination means.
Brief Description of the Drawings
Fig. 1 is a functional block diagram showing a configuration example of the system of the embodiment;
Fig. 2 is a functional block diagram showing a configuration example of the receiving device of the embodiment;
Fig. 3 is a functional block diagram showing a configuration example of the voice command processing unit of the embodiment;
Fig. 4 is a functional block diagram showing a configuration example of the server device of the embodiment;
Fig. 5 is a diagram showing examples of voice commands that the voice command processing unit of the first embodiment can process;
Fig. 6 is a flowchart showing an example of the processing of a voice signal by the voice command processing unit of the first embodiment;
Fig. 7 is a diagram showing an example of the database in the local voice command database unit of the receiving device of the first embodiment;
Fig. 8 is a flowchart showing an example of the processing by which the voice command processing unit of the first embodiment creates local voice data;
Fig. 9 is an example of the local voice data stored in the voice command processing unit of the first embodiment;
Fig. 10 is a flowchart showing an example of the processing of voice data by the server device of the first embodiment;
Fig. 11 is an example of a database stored in the server device of the first embodiment;
Fig. 12 is an example of a database used by the voice command processing unit of the first embodiment to process voice commands received from multiple users;
Fig. 13 is a diagram showing examples of voice commands that the voice command processing unit of the first embodiment can process;
Fig. 14 is an example of server command information stored in the voice command processing unit of the second embodiment;
Fig. 15 is an example of a database stored in the voice command processing unit of the third embodiment;
Fig. 16 is a flowchart showing an example of the processing when the server device of the third embodiment selects from multiple server commands and sends a server command to the voice command processing unit;
Fig. 17 is a functional block diagram showing a configuration example of a system according to a modification.
Description of Reference Numerals
1…receiving device, 2…voice command processing unit, 3…server device, 5…network, 10…remote controller, 11…tuner, 12…broadcast signal reception processing unit, 13…communication unit, 14…content processing unit, 15…presentation control unit, 16…presentation unit, 17…control unit, 18…interface unit, 19…recording/playback unit, 21…speech recognition unit, 22…determination unit, 23…local command processing unit, 24…server data acquisition unit, 25…server command database unit, 26…local command generation unit, 27…local voice command database unit, 31…communication unit, 32…control unit, 33…text conversion unit, 34…natural language processing unit, 35…server command generation unit, 36…response voice generation unit, 37…device-specific data storage unit, 38…common data storage unit, 101…data storage unit, 261…high-frequency filter, 262…condition setting unit, 371…receiving device data storage unit, 372…local command data storage unit, 381…common information data storage unit, 382…server command data storage unit.
Detailed Description
Embodiments are described below with reference to the drawings.
Fig. 1 is a functional block diagram showing a configuration example of a system according to an embodiment of the present application.
The receiving device 1 is a device for viewing digital content, for example a television receiver (also called a television device, television receiving device, or broadcast signal receiving device) capable of receiving and presenting digital broadcasts such as 2K or 4K/8K terrestrial and satellite broadcasts. Digital content obtained from digital broadcasting is sometimes called a broadcast program.
The receiving device 1 may include digital signal processing means such as a CPU, memory, and DSP (Digital Signal Processor), and can perform control using speech recognition technology. For example, when a user issues a command by voice, the voice is picked up by a sound collection function such as the microphone of the receiving device 1; the voice command processing unit 2 extracts the command using speech recognition or the like, and the extracted command is used to control the various functions of the receiving device 1. The receiving device 1 of the embodiment of the present application can also be controlled from the remote controller 10. Specifically, in addition to ordinary remote control functions such as switching the power on and off, a microphone attached to the remote controller 10 picks up the user's voice, and the remote controller 10 sends the user's voice to the receiving device 1 as voice data. The receiving device 1 extracts a command from the received voice data, for example by speech recognition, and controls its various functions. The receiving device 1 of this embodiment outputs a control signal generated based on the extracted command to the recording/playback unit 19 to control the recording/playback unit 19.
The receiving device 1 also has a communication function for connecting to a network 5 such as the Internet, and can exchange data with various servers connected to the network 5 (which may include servers built in the cloud). For example, it can also obtain digital content from a content server device, not shown, connected to the network 5. Digital content obtained from a content server device is sometimes called network content.
The voice command processing unit 2 may include digital signal processing means such as a CPU, memory, and DSP, and has functions such as speech recognition. Through the voice command processing unit 2, a command can be extracted from the user's utterance to control internal functions of the receiving device 1. A voice command is a command that the user inputs to the receiving device 1 by voice in order to control it. If a voice command is associated with an internal command for controlling a function of the receiving device 1 (hereinafter sometimes called a local command), the receiving device 1 can control that function upon receiving the voice command. For example, if the voice command "turn up the volume", intended to raise the speaker output volume of the receiving device 1, is associated with a local command of the receiving device 1 (say, volume_up), then when the user says "turn up the volume" to the receiving device 1, the receiving device 1 executes volume_up and the speaker volume of the receiving device 1 increases. Here, voice commands for raising the speaker volume are not limited to "turn up the volume"; many variations are conceivable, such as "raise the sound", "volume up", or "raise the volume". Because the voice command processing unit 2 of this embodiment associates such variations with the same local command (volume_up), natural language processing can also be used.
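As a rough illustration of the association described above (a hypothetical sketch, not the patent's implementation; the table contents and function name are assumptions), several voice-command phrasings mapping to a single local command can be expressed as a simple lookup table:

```python
# Hypothetical sketch: several voice-command phrasings map to one local command.
LOCAL_COMMAND_TABLE = {
    "turn up the volume": "volume_up",
    "raise the sound": "volume_up",
    "volume up": "volume_up",
    "raise the volume": "volume_up",
    "power on": "power_on",
}

def resolve_local_command(utterance_text):
    """Return the local command associated with a recognized utterance, or None."""
    return LOCAL_COMMAND_TABLE.get(utterance_text.strip().lower())
```

In this sketch an utterance that is not registered simply yields no local command, which corresponds to the case where server-side recognition would be needed.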
Fig. 1 shows an example in which only one receiving device 1 is connected to the network 5, but multiple receiving devices 1 may be connected to the network 5. The multiple receiving devices 1 need not each have the same functions, and there is no restriction on their manufacturers.
The server device 3 is a server on the network 5 capable of speech recognition; it includes, for example, a computer with a CPU and memory, and may include digital signal processing means such as a DSP. The server device 3 may be built as a cloud server. The server device 3 has speech recognition capability: it receives, via the network 5, voice data (the digital data of the user's voice picked up by the microphone of the receiving device 1 or the like), estimates or recognizes the utterance made by the user, and outputs the recognized speech as text data (sometimes called recognized voice data). Speech recognition is a common technology, and a detailed description is omitted.
The server device 3 can also perform natural language processing, extracting from utterances such as "raise the sound", "volume up", or "raise the volume" the local command of the receiving device 1 that matches the meaning of the utterance. That is, by using natural language processing in the server device 3, the user can use not only specific fixed phrases but arbitrary language as voice commands. For example, by saying "raise the sound", "volume up", or "raise the volume", the user can, via the server device 3, cause the local command (volume_up) of the receiving device 1 to be executed and the speaker volume to be raised. The receiving device 1 could also be given the functions of the server device 3; however, since natural language processing improves through the use of large-volume data such as big data, it is desirable to provide this function in a server device 3 built in the cloud or the like.
In addition to information such as the local commands of the receiving device 1, the server device 3 can also acquire various other information about the receiving device 1.
The network 5 connects the receiving device 1, the server device 3, and so on so that they can communicate, and is, for example, the Internet. The network 5 is not limited to the Internet; as long as the devices can communicate, it may comprise multiple different networks, wired or wireless.
The remote controller 10 is a remote controller for remotely controlling the receiving device 1. The remote controller 10 of this embodiment may have, for example, a sound collection function such as a microphone capable of picking up the user's utterances. The remote controller 10 may also have interface functions, such as Bluetooth (BlueTooth) (registered trademark) or WiFi (registered trademark), for transmitting the picked-up voice data externally.
Fig. 2 is a functional block diagram showing a configuration example of the receiving device of the embodiment. The tuner 11 receives radio waves of a desired band from an antenna, cable broadcasting, or the like, obtains a broadcast signal (digital data) through demodulation processing and the like, and outputs it.
The broadcast signal reception processing unit 12 processes the broadcast signal received from the tuner 11 according to the digital broadcasting standard, and acquires and outputs content data such as video, audio, and text. For example, the digital broadcasting standard may be the MPEG2-TS scheme used in 2K digital broadcasting or the MPEG Media Transport scheme (MMT scheme) used in 4K/8K digital broadcasting, and both may be supported by multiple tuners. The processing according to the digital broadcasting standard includes demultiplexing the digital data stream input from the tuner 11 into content data such as video, audio, and text; error-correction decoding; decryption processing of encrypted data; and decoding of the coding applied to each type of content data (video coding, audio coding, text coding, etc.).
The communication unit 13 connects to the network 5 and communicates with the various servers and devices on the network 5. Specifically, digital data is exchanged through transmission/reception processing according to predetermined communication protocols such as TCP/IP and UDP/IP.
The content processing unit 14 receives, for example via the communication unit 13, content data provided by a content server (not shown) connected to the network 5. For the data received via the communication unit 13, the content processing unit 14 performs decoding corresponding to the coding applied by the content server, acquires content data such as video, audio, and text, and outputs it. More specifically, as the decoding, the content processing unit 14 may perform, for example, demultiplexing (separation), error-correction decoding, decoding of coded content data (video, text, audio, etc.), and so on.
The presentation control unit 15 adjusts the output timing, display method, and so on of the content data output by the broadcast signal reception processing unit 12, the content processing unit 14, and the recording/playback unit 19, and outputs it. Depending on the data recorded in the recording/playback unit 19, the data output from the recording/playback unit 19 may be input to the presentation control unit 15 after demultiplexing (separation), error-correction decoding, decoding of coded content data (video, text, audio, etc.), and the like have been performed.
The presentation unit 16 is, for example, a display that shows video and text, or a speaker that outputs sound. The presentation unit 16 outputs the content data from the presentation control unit 15 as video, text, audio, and so on. By viewing the video, text, audio, and so on output by the presentation unit 16, the user views the digital content provided by the broadcast signal or by a content server (not shown).
The control unit 17 controls the functions of the receiving device 1. Specifically, the control unit 17 receives various command signals from the interface unit 18, the voice command processing unit 2, and so on, and based on the received command signals outputs control signals for controlling the functions of the receiving device 1. For example, when the user specifies from the remote controller 10 whether to view broadcast-signal content or content from a content server, the control unit 17 receives the command signal from the remote controller via the interface unit 18 and controls the functions of the receiving device 1 so that the operation specified by the user is performed. In Fig. 2, data may also be exchanged between functional blocks that are not explicitly connected to the control unit 17.
The interface unit 18 is an interface for receiving command signals from the remote controller 10 and the like, and for outputting control signals from the control unit 17 and the like to external devices. For example, the interface unit 18 receives command signals from switches (not shown) of the receiving device 1, from the remote controller 10, and so on, and outputs the command signals to the control unit 17 of the receiving device 1. Instead of the remote controller 10, it may have an interface that receives command signals from a terminal such as a smartphone (not shown). The interface unit 18 also has interfaces for connecting external devices, for example an interface for connecting the receiving device 1 to an external recording/playback device.
The interface unit 18 of this embodiment also includes, for example, a microphone for picking up voice from outside the receiving device 1. The interface unit 18 may output the voice picked up by the microphone as digitized voice data (sometimes simply called voice data) through analog-to-digital (A/D) conversion or the like.
The recording/playback unit 19 is, for example, a disc recorder or HDD recorder, and can record and play back content data such as audio and video received from broadcast signals, the Internet, and so on. Although Fig. 1 shows the recording/playback unit 19 built into the receiving device 1, it may be an external device connected to the receiving device 1, for example a set-top box, audio player, PC, or the like capable of recording and playing back content data.
The data storage unit 101 is, for example, a memory, and may be a database for storing various data. The data storage unit 101 stores information specific to the receiving device 1 (sometimes called receiving device data), such as the viewing information of the receiving device 1, analysis results obtained from the viewing information, the model, and various functional capabilities.
The voice command processing unit 2 outputs the voice data received from the interface unit 18 to the server device 3 via the communication unit 13, and receives information about local command data from the server device 3. The voice command processing unit 2 of this embodiment generates a control signal based on the information about local command data acquired from the server device 3, and outputs the generated control signal to the control unit 17 and the like.
Fig. 3 is a functional block diagram showing a configuration example of the voice command processing unit of the embodiment.
The speech recognition unit 21 performs speech recognition on the voice data input from the interface unit 18 and outputs text data. Speech recognition commonly uses methods such as the hidden Markov model (HMM); there are two approaches: a specific-string recognition approach that applies an HMM to a text "character string" as a whole, and a transcription approach that applies an HMM to each single character of a sentence. Both approaches can be applied in this embodiment. With the transcription approach, the speech recognition unit 21 can detect arbitrary character strings; with the specific-string recognition approach, it can change or add to the character strings it recognizes at any time.
The determination unit 22 checks whether the text data output by the speech recognition unit 21 is stored in the local voice command database unit 27. When the determination unit 22 confirms that data of a voice command corresponding to the text data (local voice command data) exists, it treats the confirmed local voice command as the voice command, and outputs to the control unit 17 a control signal or the like for executing the local command associated with the voice command. A local voice command is a voice command stored in the local voice command database unit 27 in association with a local command of the receiving device 1. For example, a wake word or the like for activating speech recognition may be preconfigured in the receiving device 1 as a local voice command.
The local command processing unit 23, based on the control signal of the determination unit 22, outputs to the control unit 17 the local command associated with a local voice command, the local command associated with server command information acquired from the server data acquisition unit 24, and so on.
The server data acquisition unit 24 requests server command information from the server device 3 and receives the server command information from the server device 3. Server command information is information for generating local voice commands, and includes the local command of the receiving device 1 that the server device 3 selected based on the input voice data or on the voice command obtained by performing speech recognition on that voice data.
The server command database unit 25 is, for example, a memory, and may be a database storing the server command information and the like received from the server device 3.
The local voice command generation unit 26 generates local voice command information from the server command information stored in the server command database unit 25. When generating local voice commands, the usage frequency of voice commands, command-processing priority, and so on may also be considered. The usage frequency of a voice command may be, for example, a value counted each time the speech recognition unit 21 receives or recognizes a voice command registered in the server command database unit 25 or the like.
The high-frequency filter 261 is a filter used when the local voice command generation unit 26 generates local voice commands from server command information. Specifically, for example, each time the speech recognition unit 21 receives a voice command registered in the server command database unit 25 or the like, the high-frequency filter 261 counts the acquisition frequency (usage frequency) for each voice command. The high-frequency filter 261 stores the count information in the server command database unit 25, the local voice command database unit 27, or the like. Based on the counted usage frequency, the high-frequency filter 261 extracts information for at least one local voice command from the data of the server command database unit 25. The voice commands extracted by the high-frequency filter 261 are stored, as local voice commands associated with local commands, in the local voice command database unit 27.
The local voice command database unit 27 is, for example, a memory, and may be a database storing information including the local voice commands output by the local voice command generation unit 26 and the local commands associated with them.
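The counting and extraction performed by the high-frequency filter 261 described above can be sketched roughly as follows (a hypothetical illustration under assumed data structures; the class and method names are not from the patent):

```python
from collections import defaultdict

class HighFrequencyFilter:
    """Counts how often each voice command is observed and extracts, per local
    command, the most frequently used phrasing as the local voice command."""

    def __init__(self):
        self.counts = defaultdict(int)   # voice command -> usage count
        self.command_map = {}            # voice command -> local command

    def register(self, voice_command, local_command):
        # Corresponds to a row of the server command database.
        self.command_map[voice_command] = local_command

    def observe(self, voice_command):
        # Increment the usage frequency each time a registered command is seen.
        if voice_command in self.command_map:
            self.counts[voice_command] += 1

    def extract(self):
        """Return {local command: most frequent voice command}."""
        best = {}
        for vc, lc in self.command_map.items():
            n = self.counts[vc]
            if lc not in best or n > best[lc][1]:
                best[lc] = (vc, n)
        return {lc: vc for lc, (vc, n) in best.items()}
```

A filter like this could select, for instance, "I want to watch TV" over "power on" once the former has been uttered more often, mirroring the per-local-command selection of Fig. 7(b).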
Fig. 4 is a functional block diagram showing a configuration example of the server device of the embodiment of the present application.
The communication unit 31 is an interface for data communication with devices on the network 5, such as the receiving device 1 and the server device 3, and supports protocols such as TCP/IP and UDP/IP.
The control unit 32 controls the various functions in the server device 3. It receives various data such as control signals from external devices via the communication unit 31, analyzes and processes them as needed, and outputs them to the functional blocks inside the server device 3. It also receives various data from the functional blocks inside the server device 3, modularizes and formats the data as needed, and outputs it to the communication unit 31.
The text conversion unit 33 performs speech recognition on, for example, the voice data uttered by the user, and outputs the recognized speech as text data (sometimes called recognized voice data). It may have the same function as the speech recognition unit 21 of the receiving device 1.
The natural language processing unit 34 performs natural language processing on the text data input from the text conversion unit 33, and generates or selects a server command (corresponding to a local command) equivalent to the processing represented by the text data. In natural language processing, the structure and meaning of the sentence in the text data are analyzed; for example, data similar to the text data is extracted from the data sets of voice commands, local commands of the receiving device 1, and so on stored in the server command data storage unit 382 of the server device 3 and the like.
The server command generation unit 35 creates server command information that associates the text data output by the text conversion unit 33 (corresponding to a voice command) with the local command of the receiving device 1 extracted by the natural language processing unit 34 for that text command. The local command of the receiving device 1 extracted by the natural language processing unit 34 is sometimes called a server command.
When the input text command is a voice command that causes a sentence or phrase to be output as sound from the speaker of the receiving device 1, the response voice generation unit 36 may generate, for example, voice data of that sentence or phrase; processing such as speech synthesis may be provided to generate the voice data. For example, when the server command generation unit 35 has extracted "the local command of the receiving device 1 for outputting speech from the speaker", it may generate server command information including the extracted local command and the "voice data of the sentence" generated by the response voice generation unit 36. On receiving the server command information generated by the server command generation unit 35, the receiving device 1 may output the "voice data of the sentence" from the speaker of the presentation unit 16 and present it to the user as speech. The receiving device 1 may also store the received "local command of the receiving device 1 for outputting speech from the speaker" in the local voice command database unit 27 in association with the received "voice data of the sentence"; that is, the "voice data of the sentence" is stored in the database as voice information associated with the local command. Thus, when the voice command processing unit 2 receives the voice command from the user, it executes the local command associated with the voice command in the local voice command database unit 27, "output sentence 1 as speech from the speaker", and can output the "voice data of the sentence" for sentence 1, associated with the local command, from the speaker of the presentation unit 16.
Alternatively, the speech synthesis function may be provided on the receiving device 1 side. In that case, the server command generation unit 35 sends the extracted "local command of the receiving device 1 for outputting speech from the speaker" to the receiving device 1 together with the text data of the sentence to be output as speech. The receiving device 1 generates voice data from the received text data of the sentence by speech synthesis or the like, while performing the processing corresponding to the received local command. For example, when the receiving device 1 receives the text data "hello" of a sentence together with the local command "output the received sentence from the speaker", it generates voice data for "hello" and outputs it from the speaker. The receiving device 1 may also save the received text data of the sentence together with the local command in the local voice command database unit 27. Thus, when the voice command processing unit 2 receives the voice command from the user, it executes the local command associated with the voice command in the local voice command database unit 27, "output sentence 1 as speech from the speaker", converts the "text data of the sentence" associated with the local command into voice data by speech synthesis or the like, and can output it as speech from the speaker of the presentation unit 16.
When both the receiving device 1 and the server device 3 have the speech synthesis function, the server command generation unit 35 may send to the receiving device 1 the extracted "local command of the receiving device 1 for outputting speech from the speaker" together with both the text data of the sentence to be output as speech and its voice data. The receiving device 1 may then either process the voice data according to the local command (server command), or convert the text data into voice data by speech synthesis or the like and process that.
The device-specific data storage unit 37 is, for example, a memory, and may be a database for storing data about the receiving device 1. When multiple receiving devices 1 are connected to the network 5 and the multiple receiving devices 1 share the server device 3, the data of the multiple receiving devices 1 may be stored in the device-specific data storage unit 37 per receiving device 1. The data stored in the device-specific data storage unit 37 may be acquired from the receiving devices 1 via the network 5.
The receiving device data storage unit 371 stores the device-specific information of the receiving device 1 sent from the receiving device 1, for example data such as the following:
· Model and various functional capabilities of the receiving device 1 (recording function, etc.)
· Channel information currently displayed by the receiving device 1 (possibly including the distinction between broadcast programs, external inputs such as recorded playback, content from the network 5, and so on)
· Information on the broadcast stations the receiving device 1 can receive (channel numbers, broadcast station names, etc.)
· Recording reservation information for programs the receiving device 1 can record
· Information on content already recorded by the receiving device 1
The local command data storage unit 372 stores information on the local commands that the receiving device 1 natively has. The local command information may be acquired from each receiving device 1 via the network 5 and stored in the local command data storage unit 372 per receiving device 1. When multiple receiving devices 1 are the same product, they have the same local commands, so the administrator of the server device 3 may also input the local command information directly into the server device 3. When a product information server (not shown) publishing the product information of the receiving devices 1 connected to the network 5 is available, the server device 3 may also acquire the local command information from the product information server via the network 5.
The common data storage unit 38 may be a database of data usable in common by the multiple receiving devices 1 connected to the network 5.
The common information data storage unit 381 may be a database of data obtainable from external devices and the like connected to the network 5, for example information such as the program guide viewable through digital broadcasting. When the receiving device 1 can obtain the program guide and the like from the broadcast signal, the server device 3 may also acquire the program guide from the receiving device 1 via the network 5.
The server command data storage unit 382 may be a database storing the server command information generated by the server command generation unit 35. The server command generation unit 35 may also use the database of the server command data storage unit 382 as reference data when generating server command information.
First Embodiment
This embodiment describes an example in which voice commands, obtained by applying the speech recognition of an external device such as the server device 3 to voice data received from the user, are accumulated in the receiving device 1, and local commands of the receiving device 1 are executed by the accumulated voice commands (local voice commands).
Fig. 5 is a diagram showing examples of voice commands that the voice command processing unit of the first embodiment can process; each row shows a voice command usable in the receiving device 1, the local command executable by the voice command on its left, and the command processing executed in the receiving device 1 by the local command on its left.
For example, in the row of No. 1, when the voice command processing unit 2 recognizes the voice command "power on", the local command "power_on" is input to the control unit 17, the control unit 17 executes "power_on", and the command processing "turn on the television power" is performed. Thus, when the user says "power on", the power of the television (receiving device 1) turns ON.
In this embodiment, multiple voice commands can be associated with one local command. For example, the voice commands of Nos. 2, 3, and 4 in Fig. 5 are associated with the local command "power_on", so multiple voice commands can be used for the local command "power_on" of the receiving device 1. The voice commands of Nos. 5 to 8 are associated with the local command "volume_up": when the user utters any of the voice commands of Nos. 5 to 8, the receiving device 1 performs the command processing "raise the television volume".
The operation of this embodiment is described below with reference to the drawings.
Fig. 6 is a flowchart showing an example of the processing of a voice signal by the voice command processing unit of the first embodiment.
When the user utters a voice command, the voice data is input to the voice command processing unit 2 through the microphone of the interface unit 18 (step S101). The voice data is input to the speech recognition unit 21 and converted into text data by speech recognition (step S102). The text data is input to the determination unit 22, which checks whether a local voice command corresponding to the input text data exists in the local voice command database unit 27 (step S103). When the determination unit 22 determines that such a local voice command exists in the local voice command database unit 27, it outputs the local command associated with that local voice command to the control unit 17 (YES in step S103). The control unit 17 executes the input local command (step S104). In step S103, the condition for determining YES may be that the text data input to the determination unit 22 exactly matches a local voice command in the local voice command database unit 27, or a match may be recognized even with some differences. The condition in step S103 may also be settable by the user.
On the other hand, when the determination unit 22 determines that no local voice command corresponding to the text data exists, a voice command recognition request is output from the server data acquisition unit 24 to the server device 3 together with the voice data from which the text data was obtained (step S105). The server data acquisition unit 24 receives server command information from the server device 3 (step S106).
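The local-first dispatch of steps S101 to S109 might be sketched like this (a hypothetical illustration; the dictionary shape of the server response and all names are assumptions, and the server call is stubbed):

```python
def handle_voice_text(text, local_db, ask_server):
    """Try the local voice command database first; fall back to the server.

    text       -- recognition result (text data) from the speech recognizer
    local_db   -- dict: voice command -> local command (local voice command DB)
    ask_server -- callable sending a recognition request, returning a dict
                  with a validity flag and the selected local command
    """
    if text in local_db:                        # step S103: found locally
        return {"local_command": local_db[text], "source": "local"}
    info = ask_server(text)                     # steps S105/S106: ask the server
    if info.get("flag") == "OK":
        local_db[text] = info["local_command"]  # steps S107-S109: accumulate
        return {"local_command": info["local_command"], "source": "server"}
    return {"local_command": None, "source": "server"}
```

On a second utterance of the same phrase the accumulated entry is hit locally, which is the effect the embodiment aims for.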
Fig. 7 is a diagram showing an example of the database in the local voice command database unit of the receiving device of the first embodiment. Fig. 7(a) shows, in each row, a voice command received by the receiving device 1, the local command of the receiving device 1 executable by the voice command on its left, and the command processing executed in the receiving device 1 by the local command on its left. The rightmost flag (Flag) is flag information assigned by the server device 3 to the voice command in the same row. For example, the Flag in Fig. 7(a) indicates valid (OK) or invalid (NG) based on a condition judgment by the server device for the voice command in the same row. For example, Nos. 5 and 10 of Fig. 7(a) show voice commands that could not be associated with a local command in the server device 3, and are set to Flag=NG. The conditions for assigning the Flag are not limited to the above and are arbitrary, and the Flag value need not be a binary value such as OK/NG. Furthermore, when the server side cannot recognize the input voice command (no corresponding local command is found), as in Nos. 5 and 10, the server device 3 may return to the receiving device 1 a local command (server command) corresponding to a retry, or a local command (server command) presenting a response message such as "please say that again". The receiving device 1 may perform processing, or wait for the user's command, according to the received server command.
Returning to Fig. 6, the server command information received from the server device 3 in step S106 may be one row of the voice commands shown in Fig. 7(a), or multiple rows.
For example, suppose the server data acquisition unit 24 receives, as one row of voice command, server command information containing only No. 3 of Fig. 7(a). The server data acquisition unit 24 outputs the local command "power_on" contained in the server command information to the control unit 17 and causes the local command "power_on" to be executed. At the same time, the server data acquisition unit 24 outputs the server command information containing only No. 3 to the server command database unit 25. The server command database unit 25 stores the input server command information in the database (step S107). The local voice command generation unit 26 checks whether the voice command contained in the server command information stored in the server command database unit 25 is already stored in the local voice command database unit 27; if it is not confirmed, it stores the voice command contained in the server command information as a local voice command in the local voice command database unit 27 (NO in step S108, step S109).
Fig. 7(b) shows the local voice command data when one command is extracted per local command on the basis of frequency. Fig. 7(b) shows an example in which "I want to watch TV" was selected as the local voice command for the local command "power_on" of No. 3, and "volume up" was selected as the local voice command for the local command "volume_up" of No. 2.
The database of the local voice command database unit 27 can also be created from the database stored in the server command database unit 25, using the usage frequency of the voice commands.
Fig. 8 is a flowchart showing an example of the processing by which the voice command processing unit of the first embodiment creates local voice data. Assume that the data of Fig. 7(a) is already stored in the server command database unit 25. When the user utters a voice command, the voice data is input to the voice command processing unit 2 through the microphone of the interface unit 18 (step S121). The voice data is input to the speech recognition unit 21 and converted into text data by speech recognition (step S122). The text data is input to the high-frequency filter 261, which checks whether a voice command corresponding to the input text data exists in the server command database unit 25 (step S123). When the high-frequency filter 261 finds a voice command corresponding to the text data in the server command database unit 25, it increments the count of the usage frequency for that voice command by one (step S124).
Fig. 9 is an example of the local voice data stored in the voice command processing unit of the first embodiment, showing data in which a usage frequency is assigned to each voice command. For example, it shows that the usage frequency of voice command No. 1, "power on", is 5, and that of No. 8, "volume up", is 45.
Returning to Fig. 8, the high-frequency filter 261 selects, per local command, a local voice command from the voice commands accumulated in the server command database unit 25, on the basis of usage frequency (step S125). The voice commands extracted by the high-frequency filter 261 are stored as local voice commands in the local voice command database unit 27 (step S126). The local voice commands may be stored in the local voice command database unit 27 as in Fig. 7(b).
Through the above steps, the server command information obtained by applying external (server device 3) speech recognition to the voice data received from the user can be accumulated in the receiving device 1, and the local commands of the receiving device 1 can be executed by the voice commands (local voice commands) extracted from the accumulated server command information.
An operation example of the server device 3 in this embodiment is shown below.
Fig. 10 is a flowchart showing an example of the processing of voice data by the server device of the first embodiment; it shows the processing of the server device 3 between steps S105 and S106 of Fig. 6, which are processing of the voice command processing unit 2.
The voice command processing unit 2 sends a voice command recognition request together with the voice data (step S105 of Fig. 6). On receiving the voice command recognition request, the control unit 32 of the server device 3 outputs the simultaneously received voice data to the text conversion unit 33 (step S151). The text conversion unit 33 performs speech recognition on the voice data, converts it into text data, and outputs it to the natural language processing unit 34 (step S152). The natural language processing unit 34 performs natural language processing on the input text data, and checks whether a local command corresponding to the processing represented by the text data is stored in the local command data storage unit 372 (step S153).
Fig. 11 is an example of a database stored in the server device of the first embodiment: an example of the data relating to the local commands of the receiving device 1 stored in the local command data storage unit 372 of the server device 3. As in Fig. 11, each row may store a "local command" of the receiving device 1 and the "command processing" that the command executes.
Returning to Fig. 10, the natural language processing unit 34 compares the meaning and so on extracted from the input text data with the data of Fig. 11, and selects the local command close in meaning to the input text data (step S154). When a local command corresponding to the text data is found, the server command generation unit 35 sets the Flag to a value indicating "OK", for example 1, and creates server command information including the Flag (step S155). The server command generation unit 35 sends the server command information from the communication unit 31 to the receiving device 1 (step S156). In the receiving device 1, the voice command processing unit 2 receives the server command information (step S106 of Fig. 6).
Through the above steps, even when the voice command processing unit 2 cannot handle a received voice command, it can execute the voice command by acquiring server command information from the server device 3. Moreover, by accumulating server command information in its own memory or the like, the voice command processing unit 2 can use the same voice command, when it is received again, without going through the server device 3.
Fig. 12 is an example of a database used by the voice command processing unit of the first embodiment to process voice commands received from multiple users: an example of the database when multiple users use one receiving device 1. This database may be stored in the server command data storage unit 382.
When the high-frequency filter 261 is used in generating local voice commands in the voice command processing unit 2, if users are not identified, only the voice commands of users who watch television frequently may end up being registered as local voice commands.
Fig. 12(a) is an example of a database of voice commands for local commands when the receiving device 1 can identify the user who uttered the voice command. By building a database of voice commands per identified user as in this example, counting the usage frequency of each voice command, and applying the high-frequency filter 261 per user, local voice commands that take usage frequency into account can be generated per user. Fig. 12(b) is an example of a database in which the voice commands of all users in Fig. 12(a) are merged; it is a database similar to the example shown in Fig. 9.
Fig. 13 is a diagram showing examples of voice commands that the voice command processing unit of the first embodiment can process: examples of local voice commands that can be completed in the voice command processing unit 2. Each row shows the "execution date" of a voice command, the "voice command" executed on the execution date on its left, the "server command" processed from the voice command on its left (corresponding to a local command of the receiving device 1), the "command processing" performed by the server command on its left, and "cacheable", information indicating whether the server command on its left can be cached.
When the server command for a voice command is always a fixed response, information indicating that caching is performed may be set in the "cacheable" information. On the other hand, when the server command for a voice command is a response limited to the moment (for example, dependent on the time and date), such as "tell me the name of the program I am watching now", information indicating that the server command is not cached may be set. The "cacheable" information may also serve as the "Flag" of the database shown in Fig. 7; in that case, the server device 3 may set the Flag to True when it judges that the server command is "to be cached", and to false when it judges "not to be cached".
The row of No. 1 is an example of the following case: when the user uttered the voice command "what is today's date?" on the execution date "January 8", the voice command processing unit 2 in the receiving device 1 received, in response to the voice command recognition request, the server command 'voice response "January 8"' from the server device 3. When the voice command processing unit 2 outputs the received server command (which is also a local command) to the control unit 17, the control unit 17 executes the command processing 'output "January 8" as speech from the speaker', and the sound "January 8" is output from the speaker of the presentation unit 16.
However, if the execution date changes, the response content of the server command 'voice response "January 8"' changes. That is, as indicated by "cacheable" being "NG" in the row of No. 1, the server command 'voice response "January 8"' may be regarded as information that cannot be cached, or that is meaningless to cache.
Therefore, as in the row of No. 2, the server device 3 creates a server command in which the parts subject to change are made variables, in the form 'voice response "$Month月$Date日"' (called a variable-ized server command). The variable-ization of server commands may be performed either by the server device 3 or by the voice command processing unit 2. When performed by the voice command processing unit 2, for example when the server command of the row of No. 1 is received, the server command 'voice response "January 8"' may be stored in the server command database unit 25, and the local voice command generation unit 26 may associate 'voice response "$Month月$Date日"' as the local command for the local voice command "what is today's date?". Then, as in the row of No. 3, when the user utters the voice command "what is today's date?" on the execution date "February 18", the voice command processing unit 2 can, based on the associated local command 'voice response "$Month月$Date日"' and on date information obtained from the broadcast signal or the like, make the voice response "February 18" from the speaker of the presentation unit 16, or have the display show it. The receiving device 1 or the voice command processing unit 2 may also be capable of generating speech such as synthesized speech.
The variable-ized server commands of the rows of No. 2 and No. 3 do not depend on the execution date, so the item "cacheable" may be set to "OK" for both, making them cacheable. Fig. 13 shows an example of date-dependent local commands, but this is not limiting; for example, local commands that depend on the time and date, the season, the preceding and following context, and so on can likewise be completed in the voice command processing unit 2.
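The variable-ized server command described above can be thought of as simple template substitution at execution time. A minimal sketch (a hypothetical illustration; the "$Month/$Date" placeholder form is an assumption modeled on the figure's template):

```python
import string

def render_response(template, month, date):
    """Fill a variable-ized server command template with current date fields."""
    return string.Template(template).substitute(Month=month, Date=date)
```

With this, the same cached command can answer "what is today's date?" on any day, e.g. `render_response("$Month/$Date", 2, 18)` for an execution date of February 18; in practice the month and date would come from the broadcast signal or a clock.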
Through the above steps, by associating voice commands, recognized by applying the speech recognition of the server device 3 (cloud server, etc.) to voice data received from the user, with local commands, the local commands of the receiving device 1 can be executed by voice commands that the receiving device 1 previously could not handle.
In general, the speech recognition performed by a cloud server or the like has the role of absorbing the variation in users' utterances, such as "turn up the volume", "raise the sound", "volume up", and "raise the volume", as voice commands for realizing volume-up processing. In practice, however, a single user shows little variation in utterance and mostly speaks with a fixed expression. In such cases, by using the high-frequency filter 261 based on the usage frequency of voice commands, the combinations of frequently used utterances (voice commands) and the corresponding processing (local commands) can be identified, and multiple voice commands can be set as local voice commands for one local command, so that local voice commands can in some cases be set per user. In this case, distinguishing per user as in Fig. 12(a) is not necessary; in some cases user identification is also achieved by accumulating the voice commands received per receiving device 1 as shown in Fig. 9 and applying the high-frequency filter 261 to the accumulated voice commands. Moreover, by continually setting and accumulating local voice commands, their association information with local commands, and so on in the receiving device 1 or the voice command processing unit 2, the receiving device 1 or the voice command processing unit 2 can detect frequently used utterances at high speed, can perform processing equivalent to natural language processing without using natural language processing, and can autonomously carry out the intended processing. This removes the need to go through the server device 3, and can also shorten the processing time of speech recognition and the like in the receiving device 1 or the voice command processing unit 2. Furthermore, the utterance content (local voice commands) set in the receiving device 1 or the voice command processing unit 2 of this embodiment can thereafter also be used offline.
Second Embodiment
This embodiment shows an example in which the server command generated by the server device 3 for one recognized (or received) voice command is associated with multiple local commands. Specifically, the local voice command generation unit 26 determines the processing of the local commands associated with one voice command based on the priority set in the condition setting unit 262.
Fig. 14 is an example of the server command information stored in the voice command processing unit of the second embodiment, showing the voice command "I want to see giraffes" received by the server device 3, the server command "output program K" generated or acquired by the server command generation unit 35 for the voice command "I want to see giraffes", and the command processing of four local commands that can be performed in the receiving device 1 for the server command "output program K". Further, for each command processing, its frequency and priority are shown in the same row.
The local voice command generation unit 26 determines the command processing for the server command "output program K" based on the priority.
The local voice command generation unit 26 may store the command processes in the local voice command database unit 27, in association with the voice command, so that they are executed in priority order. For example, in Fig. 14, the priorities are set from high to low in the order of the rows No. 4, No. 2, No. 3, No. 1, so the command processes are executed in the order of the rows No. 4, No. 2, No. 3, No. 1. More specifically, when the user says "I want to see giraffes", the voice command processing unit first executes the command processing of the row of No. 4, "display broadcast program K". If broadcast program K is being broadcast at the time of execution, "display broadcast program K" is possible; if broadcast program K is not being broadcast, "display broadcast program K" is not possible. Thus, depending on the conditions, a command processing associated with a voice command can or cannot be executed. When the command processing of the row of No. 4 cannot be executed, the command processing of the row of No. 2, which has the next priority, is executed. Thereafter, likewise, the command processes are executed one after another in priority order, in consideration of conditions, environment, and so on. Conditions such as the priority for command processes may also be set by the user from the remote controller.
Through the above steps, a voice command uttered by the user can be associated with multiple local commands (command processes) according to the conditions of the receiving device 1, the various functional units inside the receiving device 1, and so on. Moreover, by giving priorities to the associated command processes, for example so that the command processes can be executed in priority order, more appropriate command processing can be performed for the voice command uttered by the user. Alternatively, instead of executing multiple command processes in priority order, the single command processing with the highest priority may be associated with one voice command. How the priority is used for the association may be settable by the user from the remote controller or the like, or the association information may be downloaded from a server (not shown) connected to the network 5. The frequency shown in Fig. 14 may be the usage frequency of the command processing, or, for example, the control unit 17 or the like may calculate the frequency of the command processes in advance, and the local voice command generation unit 26 may determine the priority based on that frequency.
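The priority-ordered fallback above can be sketched as follows (a hypothetical illustration; representing each command processing as a callable that reports whether it could run under current conditions is an assumption):

```python
def execute_by_priority(handlers):
    """Try associated command processes in descending priority until one succeeds.

    handlers -- list of (priority, callable) pairs; each callable returns True
                when its command processing could actually be executed under
                the current conditions (e.g. the program is being broadcast).
    Returns the handler that succeeded, or None if none could run.
    """
    for _, handler in sorted(handlers, key=lambda pair: pair[0], reverse=True):
        if handler():
            return handler
    return None
```

In the Fig. 14 example, "display broadcast program K" would be tried first and, if the program is not currently on air, the next-priority processing would be attempted.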
Third Embodiment
This embodiment shows an example of the case where the server device 3 generates multiple server commands for one voice command.
Fig. 15 is an example of the database stored in the voice command processing unit of the third embodiment: an example of the data when the server device 3 generated three server commands for the voice command "how is the weather now?". In Fig. 15, for each server command, each row shows the command processing of the server command, its frequency, and its validity limit (expired).
The frequency may be the usage frequency of the server command, and may be determined either on the receiving device 1 side or on the server device 3 side. When determined on the server device 3 side, it may be determined, for example, using the database of the server command data storage unit 382 and information from multiple receiving devices 1. Also, by providing the server device 3 with the usage frequency of the server commands (corresponding to local commands) counted on the receiving device 1 side, the server device 3 can determine the frequency based on the frequency information from multiple receiving devices 1. Rather than using the frequency information from multiple receiving devices 1 together, the frequency of each receiving device 1 may be used separately, and the server command or local command may be determined per receiving device 1.
In this example, the magnitude of the frequency is used as the priority, and the local voice command generation unit 26 basically determines the command processing executed by the receiving device 1 in order of frequency; however, the condition "expired" is also considered. "expired" indicates the validity period of the command processing; for example, the expired value "2021/1/2 0:00" of No. 1 in Fig. 15 indicates that the server command and command processing of No. 1 are "valid until 0:00 on January 2, 2021". The server command of No. 1, 'voice response "sunny, later cloudy"', is a command that depends on the time and date, and is thus an example given the expired condition. "expired" may also serve as the "Flag" of the database shown in Fig. 7; in that case, the server device 3 may judge the validity period "expired" of a server command, setting the Flag to True when the server command is within its validity period and to false when the server command has passed its validity period.
In this example, when the user utters the voice command "how is the weather now?" before "2021/1/2 0:00", the command processing of No. 1 is executed in the receiving device 1. However, when the user utters the voice command "how is the weather now?" after "2021/1/2 0:00", the command processing of No. 3, which has the next-highest frequency, is executed. The methods of using the priority and so on shown in the second embodiment can also be applied. Moreover, in the command processing of No. 1, the part "sunny, later cloudy" can be variable-ized as shown in the first embodiment. When variable-ized, the voice command processing unit 2 may, on receiving the voice command "how is the weather now?" from the user, refer to the latest weather information from the broadcast signal, a server (not shown) on the network 5, or the like, regardless of expired, and have the latest weather information output as speech from the speaker of the presentation unit 16.
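The frequency-plus-expiry selection just described can be sketched as follows (a hypothetical illustration; the dictionary keys mirror the columns of Fig. 15 but are assumptions):

```python
from datetime import datetime

def pick_server_command(candidates, now):
    """Pick the usable cached server command with the highest frequency.

    candidates -- list of dicts with keys 'command', 'frequency', and an
                  optional 'expired' datetime (validity limit, None = no limit)
    now        -- current time, used to discard expired entries
    """
    valid = [c for c in candidates
             if c.get("expired") is None or now < c["expired"]]
    if not valid:
        return None
    return max(valid, key=lambda c: c["frequency"])["command"]
```

Before the expiry the highest-frequency cached response wins; afterwards the selection falls through to the next usable candidate, as in the No. 1 / No. 3 example.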
Fig. 16 is a flowchart showing an example of the processing when the server device of the third embodiment selects from multiple server commands and sends a server command to the voice command processing unit: an example in which the server device 3 selects a server command from multiple server commands using information obtained from external devices such as the receiving device 1, and outputs it to the voice command processing unit.
On receiving the voice command recognition request sent by the voice command processing unit 2, the control unit 32 of the server device 3 outputs the simultaneously received voice data to the text conversion unit 33 (step S251). The text conversion unit 33 performs speech recognition on the voice data, converts it into text data, and outputs it to the natural language processing unit 34 (step S252). The natural language processing unit 34 performs natural language processing on the input text data, and checks whether information on a local command corresponding to the processing represented by the text data is stored in the local command data storage unit 372 or the common data storage unit 38 (step S253). The server command generation unit 35 acquires the information on the local command confirmed by the natural language processing unit 34 (step S254). The server command generation unit 35 generates server commands based on the acquired local command information. When multiple server commands are generated, the server command generation unit 35 acquires the device-specific information of the receiving device 1 from the device-specific data storage unit 37 (YES in step S255, S256). Based on the device-specific information of the receiving device 1, the server command generation unit 35 selects, from the multiple server commands, the server command to be sent to the receiving device 1 (step S257). For example, the server command of No. 1 in Fig. 15 may be left unselected after confirming from the device-specific information of the receiving device 1 a state such as "voice output prohibited" or "speaker disabled". Furthermore, not only the device-specific information of the receiving device 1 but also data of the common data storage unit 38, such as program information, may be used; for example, the server command of No. 2 in Fig. 15 may be left unselected after confirming from the program information that "no weather program is scheduled for broadcast within one hour".
The server command generation unit 35 creates server command information including the selected server command and, as needed, the response voice created by the response voice generation unit 36, and outputs it to the voice command processing unit 2 via the communication unit 31 (step S258).
Through the above steps, when the server device 3 has confirmed multiple corresponding local commands for an input voice command, the server device 3 can select from the multiple server commands using the data of the device-specific data storage unit 37, the common data storage unit 38, and so on, and provide server command information containing them to the voice command processing unit 2. By registering the voice command obtained from the server command information provided by the server device 3, and the server command (corresponding to a local command) associated with it, in the local voice command database unit 27, the voice command processing unit 2 causes command processing that takes the data of the device-specific data storage unit 37 and the common data storage unit 38 into account to be executed in the receiving device 1 in response to the voice command uttered by the user.
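The device-aware selection in step S257 can be sketched roughly as follows (a hypothetical illustration; modeling device-specific information as a set of available capabilities is an assumption):

```python
def select_for_device(server_commands, device_capabilities):
    """Filter candidate server commands by the receiving device's capabilities.

    server_commands     -- list of dicts with 'command' and 'needs' (the set of
                           capabilities the command requires, e.g. {'speaker'})
    device_capabilities -- set of capabilities the device currently offers
    Returns the commands the device can actually execute, in original order.
    """
    return [c["command"] for c in server_commands
            if c["needs"] <= device_capabilities]
```

A voice-response command requiring a speaker would thus be dropped for a device whose device-specific information reports "speaker disabled", matching the Fig. 15 example.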
According to this embodiment, since the server device 3 generates the server command information in consideration of the data of the device-specific data storage unit 37, the common data storage unit 38, and so on, there is no need to build information such as program names and broadcast station names into the receiving device 1 side in advance; the information of the device-specific data storage unit 37 and the common data storage unit 38 can be taken into account for the voice commands uttered by the user. Thus, merely by using the receiving device 1 of this embodiment, the user can not only use voice commands in a form close to ordinary language (natural language), but the command processing of the voice commands is also set to match the user and the situation of the user's receiving device 1.
For example, when the user says "I want to watch program A", and the server device 3 confirms from the program information that "it is scheduled to be broadcast on channel 5 of digital broadcasting at 17:00 next Saturday, or scheduled to be distributed by a content server on the network 5", and at the same time confirms from the device-specific information of the receiving device that "connection to the network 5 is not possible", it sends the server command "reserve viewing: Saturday 17:00, channel 5" to the receiving device 1. On the receiving device 1 side, the voice command processing unit 2 may have the control unit 17 execute the received server command as a local command, or may store it in the local voice command database unit 27 in association with the local voice command "I want to watch program A".
Modifications
The embodiments shown above describe configurations in which the receiving device 1 includes the voice command processing unit 2. This modification describes other possible configurations.
Fig. 17 is a functional block diagram showing a configuration example of the system of the modification.
Fig. 17(a) is an example of a case where the receiving device 1A can be controlled by voice commands through a voice command processing device 2A that includes the voice command processing unit 2.
The receiving device 1A corresponds to the receiving device 1 with the voice command processing unit 2 removed, but it may also be the same receiving device as the receiving device 1.
The voice command processing device 2A includes the functions of the voice command processing unit 2 and a microphone, and may be a computer with a CPU and memory. The voice command processing device 2A may include digital signal processing means such as A/D conversion and a DSP for processing the sound signal output by the microphone, and may include a communication means (not shown, corresponding to the communication unit 13 of Fig. 2) for communicating with the server device 3. The local command output by the local command processing unit 23 of the voice command processing unit 2 may be input to the control unit 17 of the receiving device 1A via the network 5.
In the modification of Fig. 17(a), the user utters a voice command to the microphone (not shown) of the voice command processing device 2A. The voice picked up by the microphone is converted into voice data by A/D conversion or the like, and the voice data is then input to the voice command processing unit 2. By thereafter performing in the voice command processing unit 2 the same processing operations as in the flowchart shown in Fig. 6, the same processing as the voice command processing of the above embodiments can be performed, and the same effects can be obtained.
According to the modification of Fig. 17(a), the receiving device 1A can be remotely operated from the voice command processing device 2A via the network 5. Moreover, by placing databases such as the server command database unit 25 and the local voice command database unit 27 of the voice command processing unit 2 in a cloud server, not only the receiving device 1A of one particular user but also the receiving devices 1A of other users can perform the same voice command processing (sharing of the voice command processing device 2A), and the voice command processing device 2A can also be moved easily (made portable).
Fig. 17(b) is an example of a case where the receiving device 1A is controlled by voice commands through a remote controller 10A that includes the voice command processing unit 2.
The remote controller 10A is the remote controller 10 provided with the voice command processing unit 2. The remote controller 10A includes the function of a microphone, and may include a computer with a CPU and memory, and digital signal processing means such as A/D conversion and a DSP for processing the voice signal output by the microphone. The remote controller 10A may include a communication means (not shown, corresponding to the communication unit 13 of Fig. 2) for communicating with the server device 3. When the remote controller 10A has a communication means such as Bluetooth (BlueTooth) capable of communicating with the receiving device 1A, it may also connect to the network 5 via the receiving device 1A and communicate with the server device 3. The local command output by the local command processing unit 23 of the voice command processing unit 2 may be input to the control unit 17 of the receiving device 1A via a communication means such as Bluetooth (BlueTooth), or may be output to the receiving device 1A as an ordinary remote control signal using infrared rays or the like from the remote controller 10A.
In the modification of Fig. 17(b), the user utters a voice command to the microphone (not shown) of the remote controller 10A. The voice picked up by the microphone is converted into voice data by A/D conversion or the like, and the voice data is then input to the voice command processing unit 2. By thereafter performing in the voice command processing unit 2 the same processing operations as in the flowchart shown in Fig. 6, the same processing as the voice command processing of the above embodiments can be performed, and the same effects can be obtained.
According to the modification of Fig. 17(b), by uttering voice commands to the remote controller 10A in the user's hand, the effects of the above embodiments can be obtained simply. Databases such as the server command database unit 25 and the local voice command database unit 27 of the voice command processing unit 2 may also be placed in the receiving device 1A, a cloud server (not shown), or the like. According to at least one of the embodiments described above, a voice command processing circuit, receiving device, server, system, method, and computer-readable non-volatile storage medium capable of increasing the number of voice commands that can be processed locally can be provided.
The names, definitions, types, and so on of the condition parameters displayed on the analysis screens and the like shown in the drawings, and of their options, values, evaluation indexes, and so on, are shown as examples in this embodiment and are not limited to the examples shown in this embodiment.
Embodiments of the present disclosure also provide a computer-readable non-volatile storage medium storing computer instructions which, when executed by a processor, implement the voice data processing of the above embodiments.
Several embodiments of the present application have been described, but these embodiments are presented as examples and are not intended to limit the scope of the application. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the application. These embodiments and their modifications are included in the scope and gist of the application, and are also included in the technical solutions recited in the claims and their equivalent scope. Furthermore, among the constituent elements of the technical solutions, cases where a constituent element is expressed in divided form, where multiple elements are expressed together, or where they are expressed in combination are all within the scope of the present application. Multiple embodiments may also be combined, and examples constituted by such combinations are also within the scope of the application.
Also, for clarity of explanation, the drawings may show the width, thickness, shape, and so on of each part schematically compared with the actual form. In the block diagrams, data and signals may in some cases also be exchanged between blocks that are not connected, or in directions not indicated by arrows even where lines are drawn. The processing shown in the flowcharts may be realized by software (a program or the like) running on hardware such as an IC chip or digital signal processor (Digital Signal Processor, DSP), on a computer including a microcomputer, or by a combination of hardware and software. Also, the device of the present application is applicable when the embodiments are expressed as control logic, as a program including instructions to be executed by a computer, or as a computer-readable non-volatile storage medium on which the above instructions are recorded. The names and terms used are not limiting; other expressions, as long as they have substantially the same content and the same gist, are included in the present application.

Claims (15)

  1. A voice command processing circuit, comprising:
    a voice data receiving means that acquires voice data;
    a speech recognition means that performs speech recognition on the voice data and outputs a recognition result;
    a determination means that determines whether a voice command corresponding to the recognition result exists in a database, wherein in the database, information on the voice commands for controlling a device is associated with information on local commands, the local commands being the control commands inside the device that the voice commands execute; and
    a server data receiving means that acquires information for the database from a server based on the determination result of the determination means.
  2. The voice command processing circuit according to claim 1, wherein,
    when the determination means determines that no voice command corresponding to the recognition result exists in the database,
    the server data receiving means outputs to the server a speech recognition request for causing the server to recognize the voice data, together with the voice data, and receives server command information including a server recognition result and a local command associated with the server recognition result, wherein the server recognition result is a result of the speech recognition of the voice data performed by the server.
  3. The voice command processing circuit according to claim 2, wherein
    the voice command processing circuit comprises a local command processing means that outputs the information of the local command based on the determination result of the determination means.
  4. The voice command processing circuit according to claim 3, wherein
    the voice command processing circuit comprises a database operation means that stores the information of the local command and the server recognition result in the database, or takes data out of the database.
  5. The voice command processing circuit according to claim 4, wherein
    the voice command processing circuit comprises a server information operation means that stores the server command information in a server information database, or takes data out of the server information database.
  6. The voice command processing circuit according to claim 5, wherein
    the voice command processing circuit comprises an extraction means that, when multiple server recognition results are associated with one local command in the server information database, selects at least one server recognition result from the multiple server recognition results based on a previously given extraction condition, and
    the database operation means stores the at least one server recognition result selected by the extraction means in the database in association with the local command.
  7. The voice command processing circuit according to claim 6, wherein
    the voice command processing circuit comprises a voice command reception counting means that counts the number of receptions of voice commands corresponding to the server recognition results stored in the server information database, and
    the extraction condition is determined based on the number of receptions of the voice commands.
  8. The voice command processing circuit according to claim 3, wherein,
    when the determination means determines that a voice command corresponding to the recognition result exists in the database,
    the local command processing means outputs the information of the local command associated with the voice command existing in the database.
  9. A receiving device, comprising:
    a receiving means that receives digital content from digital broadcast signals or a network;
    a presentation means that presents the digital content to a user;
    a sound collection means that picks up the voice uttered by the user and outputs voice data;
    the voice command processing circuit according to claim 3 or claim 8; and
    a control means that causes a controlled object to operate based on the information of the local command output by the voice command processing circuit.
  10. The receiving device according to claim 9, wherein the receiving device comprises:
    a specific information storage means that stores the specific information of the receiving device itself; and
    a communication means that performs data communication with a server,
    wherein the communication means outputs the specific information to the server.
  11. A server, comprising:
    a communication means that receives voice data and a request to perform speech recognition on the voice data;
    a receiving device data storage means that stores information on local commands, which are the control commands inside a receiving device;
    a speech recognition processing means that performs speech recognition on the voice data according to the speech recognition request and outputs a recognition result; and
    a local command determination means that determines, by natural language processing and from the receiving device data storage means, the local command corresponding to the recognition result,
    wherein the communication means outputs server data information including the determined local command and the recognition result.
  12. The server according to claim 11, wherein
    the communication means receives specific information from a receiving device having the specific information, and
    the voice command determination means determines the local command corresponding to the recognition result based on the specific information.
  13. A voice command accumulation system, comprising:
    the receiving device according to claim 9; and
    the server according to claim 11.
  14. A voice command accumulation method, comprising:
    performing speech recognition on voice data and outputting a recognition result;
    determining whether a voice command corresponding to the recognition result exists in a database, wherein in the database, information on the voice commands for controlling a device is associated with information on local commands, the local commands being the control commands inside the device that the voice commands execute; and
    acquiring information for the database from a server based on the determination result of the determining.
  15. A computer-readable non-volatile storage medium, the storage medium storing a program or computer instructions that cause a computer to accumulate voice commands into a database, wherein the program or computer instructions cause the computer to execute:
    performing speech recognition on voice data and outputting a recognition result;
    determining whether a voice command corresponding to the recognition result exists in a database, wherein in the database, information on the voice commands for controlling a device is associated with information on local commands, the local commands being the control commands inside the device that the voice commands execute; and
    acquiring information for the database from a server based on the determination result of the determining.
PCT/CN2021/118683 2021-01-21 2021-09-16 语音指令处理电路、接收装置、服务器、语音指令的累积***和累积方法 WO2022156246A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180006240.0A CN114667566A (zh) 2021-01-21 2021-09-16 语音指令处理电路、接收装置、服务器、语音指令的累积***和累积方法
US18/356,485 US20240021199A1 (en) 2021-01-21 2023-07-21 Receiving device and method for voice command processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-008062 2021-01-21
JP2021008062A JP2022112292A (ja) 2021-01-21 2021-01-21 音声コマンド処理回路、受信装置、サーバ、システム、方法およびプログラム

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/356,485 Continuation US20240021199A1 (en) 2021-01-21 2023-07-21 Receiving device and method for voice command processing

Publications (1)

Publication Number Publication Date
WO2022156246A1 true WO2022156246A1 (zh) 2022-07-28

Family

ID=82548497

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118683 WO2022156246A1 (zh) 2021-01-21 2021-09-16 语音指令处理电路、接收装置、服务器、语音指令的累积***和累积方法

Country Status (2)

Country Link
JP (1) JP2022112292A (zh)
WO (1) WO2022156246A1 (zh)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7831431B2 (en) * 2006-10-31 2010-11-09 Honda Motor Co., Ltd. Voice recognition updates via remote broadcast signal
CN103956168A (zh) * 2014-03-29 2014-07-30 深圳创维数字技术股份有限公司 一种语音识别方法、装置及终端
CN104538034A (zh) * 2014-12-31 2015-04-22 深圳雷柏科技股份有限公司 一种语音识别方法及***
CN104575494A (zh) * 2013-10-16 2015-04-29 中兴通讯股份有限公司 一种语音处理的方法和终端
CN107993654A (zh) * 2017-11-24 2018-05-04 珠海格力电器股份有限公司 一种语音指令识别方法及***
CN108183844A (zh) * 2018-02-06 2018-06-19 四川虹美智能科技有限公司 一种智能家电语音控制方法、装置及***
CN108447478A (zh) * 2018-01-31 2018-08-24 捷开通讯(深圳)有限公司 一种终端设备的语音控制方法、终端设备及装置
CN108873713A (zh) * 2018-06-25 2018-11-23 广州市锐尚展柜制作有限公司 一种应用于智能家居中的人机交互方法及***
CN109102807A (zh) * 2018-10-18 2018-12-28 珠海格力电器股份有限公司 个性化语音数据库创建***、语音识别控制***与终端
CN109922371A (zh) * 2019-03-11 2019-06-21 青岛海信电器股份有限公司 自然语言处理方法、设备及存储介质
CN111105798A (zh) * 2018-10-29 2020-05-05 宁波方太厨具有限公司 基于语音识别的设备控制方法


Also Published As

Publication number Publication date
JP2022112292A (ja) 2022-08-02


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920624

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21920624

Country of ref document: EP

Kind code of ref document: A1