WO2017140153A1 - Voice control method and apparatus - Google Patents

Voice control method and apparatus

Info

Publication number
WO2017140153A1
WO2017140153A1 PCT/CN2016/107321 CN2016107321W WO2017140153A1 WO 2017140153 A1 WO2017140153 A1 WO 2017140153A1 CN 2016107321 W CN2016107321 W CN 2016107321W WO 2017140153 A1 WO2017140153 A1 WO 2017140153A1
Authority
WO
WIPO (PCT)
Prior art keywords
control signal
voice
voice control
preset
voice command
Prior art date
Application number
PCT/CN2016/107321
Other languages
English (en)
French (fr)
Inventor
袁文华
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017140153A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Definitions

  • the present invention relates to the field of intelligent control technologies, and in particular, to a voice control method and apparatus.
  • with the development of technology, by using Miracast technology a user no longer needs to look for cables and converters of various specifications, or to confirm the correct connector for linking devices, in order to share the screen content of a transmitting end device (smart phone, tablet computer, notebook, desktop computer, etc.) with other receiving end devices (television, projector, etc.).
  • when the screen content of the transmitting end device is shared with the receiving end device, the User Input Back Channel (UIBC) function of the Miracast protocol allows the receiving end device to control the screen content shared by the transmitting end device.
  • UIBC defines how the control signals of the receiving end device are sent back to the transmitting end device.
  • the control signals of the receiving end device defined by UIBC are of two kinds: one is generic mouse and keyboard signals, and the other is peripheral signals, which are delivered over Universal Serial Bus (USB), Wireless Fidelity (WIFI), Bluetooth and the like.
  • when the receiving end device controls the screen content shared by the transmitting end device, the receiving end device must first generate these control signals and then transmit them to the transmitting end device; however, all of the above control signals have to be generated by directly operating the receiving end device or a peripheral, so when it is inconvenient to operate the receiving end device or a peripheral directly, the receiving end device cannot generate the control signals and therefore cannot control the transmitting end device.
  • the main object of the present invention is to provide a voice control method and apparatus, aiming to solve the problem that the transmitting end device cannot be controlled when it is inconvenient to directly operate the receiving end device or a peripheral.
  • to achieve the above object, an embodiment of the present invention provides a voice control method, and the voice control method includes the following steps:
  • the receiving end device acquires a voice control signal for controlling the transmitting end device
  • the receiving end device matches the obtained voice control signal with a preset voice command
  • if the matching is successful, the receiving end device generates a UIBC message from the successfully matched preset voice command according to the UIBC protocol, and transmits the UIBC message to the transmitting end device, so that the transmitting end device controls its own behavior according to the received UIBC message.
  • the storing manner of the preset voice command is a voice waveform
  • the step of the receiving end device matching the acquired voice control signal with the preset voice command includes:
  • the receiving end device performs corresponding transformation on the obtained voice control signal and the preset voice command to obtain a sound spectrum map or a feature vector of the voice control signal and the preset voice command;
  • the receiving end device matches the sound spectrum map or feature vector of the voice control signal with the sound spectrum map or feature vector of the preset voice command.
  • the storing manner of the preset voice instruction is a sound spectrum map or a feature vector
  • the step of the receiving end device matching the acquired voice control signal with the preset voice command includes:
  • the receiving end device performs corresponding transformation on the obtained voice control signal to obtain a sound spectrum map or a feature vector of the voice control signal;
  • the receiving end device matches the sound spectrum map or feature vector of the voice control signal with the preset voice command.
  • the step of the receiving end device generating the UIBC message from the successfully matched preset voice command according to the UIBC protocol includes:
  • the receiving end device acquires a mapping relationship between the preset voice command and a control signal segment in the UIBC message;
  • the receiving end device determines, according to the mapping relationship, the control signal segment corresponding to the successfully matched preset voice command;
  • the receiving end device generates a UIBC message from the control signal segment corresponding to the preset voice command according to the UIBC protocol.
  • optionally, after the step of matching the acquired voice control signal with the preset voice command, the voice control method further includes:
  • if the matching fails, the receiving end device prompts the user to re-issue the voice control signal.
  • an embodiment of the present invention further provides a voice control apparatus, where the voice control apparatus includes:
  • an obtaining module configured to acquire a voice control signal for controlling the transmitting end device;
  • a matching module configured to match the acquired voice control signal with a preset voice command;
  • a generating module configured to, if the matching is successful, generate a UIBC message from the successfully matched preset voice command according to the UIBC protocol, and transmit the UIBC message to the transmitting end device, so that the transmitting end device controls its own behavior according to the received UIBC message.
  • the storage form of the preset voice instruction is a voice waveform
  • the matching module includes:
  • the processing unit is configured to perform corresponding transformation on the acquired voice control signal and the preset voice command to obtain a sound spectrum map or a feature vector of the voice control signal and the preset voice command;
  • a matching unit configured to match a spectrogram or a feature vector of the voice control signal with a spectrogram or a feature vector of the preset voice command.
  • the storage mode of the preset voice instruction is a sound spectrum map or a feature vector
  • the processing unit is further configured to perform corresponding transformation on the acquired voice control signal to obtain the sound spectrum map or feature vector of the voice control signal;
  • the matching unit is further configured to match a spectrogram or a feature vector of the voice control signal with the preset voice command.
  • the generating module includes:
  • An obtaining unit configured to acquire a mapping relationship between the preset voice command and a control signal segment in the UIBC message
  • a determining unit configured to determine, according to the mapping relationship, a control signal segment corresponding to the preset voice command that is successfully matched
  • a generating unit configured to generate a UIBC message by using a control signal segment corresponding to the preset voice instruction according to the UIBC protocol.
  • the voice control device further includes:
  • a prompting module configured to prompt re-issuing of the voice control signal if the matching fails.
  • Another embodiment of the present invention provides a computer storage medium, where the computer storage medium stores execution instructions for performing one or a combination of the steps in the foregoing method embodiments.
  • in the embodiments of the present invention, when it is inconvenient for the user to operate the receiving end device or a peripheral connected to the receiving end device, the user issues a voice control signal; after receiving the voice control signal issued by the user, the receiving end device matches the voice control signal with the preset voice commands to determine the preset voice command corresponding to the voice control signal, generates a UIBC message from that preset voice command, and transmits the UIBC message to the transmitting end device, thereby controlling the transmitting end device.
  • this solves the problem that the transmitting end device cannot be controlled when it is inconvenient to directly operate the receiving end device or a peripheral, and makes control of the transmitting end device by the receiving end device more convenient.
  • FIG. 1 is a schematic flow chart of a first embodiment of a voice control method according to the present invention
  • FIG. 2 is a schematic flow chart showing the steps of matching the voice control signal obtained in FIG. 1 with a preset voice command;
  • FIG. 3 is a flow chart showing the steps of another embodiment of matching the voice control signal obtained in FIG. 1 with a preset voice command;
  • FIG. 4 is a schematic flowchart of a step of generating a UIBC message by using a preset voice command that is successfully matched according to the UIBC protocol in FIG. 1;
  • FIG. 5 is a schematic diagram of functional modules of a first embodiment of a voice control apparatus according to the present invention.
  • FIG. 6 is a schematic diagram of a refinement function module of the matching module in FIG. 5;
  • FIG. 7 is a schematic diagram of a refinement function module of the generation module in FIG. 5.
  • the present invention provides a voice control method.
  • the application scenario of the present invention is that the user can wirelessly project the content of the display screen of a Miracast-certified transmitting end device, such as a mobile phone, tablet computer or notebook, onto a receiving end device supporting Miracast technology, and the content the user sees on the receiving end device is exactly the same as the content on the transmitting end device.
  • both the receiving end device and the transmitting end device have the UIBC function.
  • the UIBC function refers to that the user can control the transmitting device through the receiving device.
  • the function includes two types: one is hardware-independent, such as mouse clicks, key presses, touch input, and zooming in and out; the other is human interface device control (HIDC), including infrared, USB, Bluetooth, WIFI, joystick, remote control, and so on.
  • FIG. 1 is a schematic flowchart diagram of a first embodiment of a voice control method according to the present invention.
  • the voice control method includes:
  • Step S10 The receiving end device acquires a voice control signal for controlling the transmitting end device.
  • the receiving end device in this embodiment is described by taking a wireless projection system supporting Miracast technology and having a UIBC function as an example.
  • the wireless projection system adds a voice input module compared with the existing wireless projection system.
  • the present invention can also be applied to other receiving end devices supporting the Miracast technology and having the UIBC function according to the core idea of the present invention.
  • when the wireless projection system plays the screen content of the transmitting end device, the transmitting end device is described by taking a Miracast-certified notebook with the UIBC function as an example.
  • for example, if the content displayed on the current screen of the notebook is a PPT document, the content displayed on the screen of the wireless projection system is also that PPT document; when the user wants to view the next page of the PPT document and needs to perform a page-turning operation, the user can issue a page-turning voice control signal towards the wireless projection system.
  • as another example, if the content displayed on the current screen of the notebook is a movie, the content displayed on the screen of the wireless projection system is also that movie; when the user wants to pause the movie that is currently playing, the user can issue a pause voice control signal towards the wireless projection system.
  • when acquiring the voice control signal for controlling the notebook, the wireless projection system may optionally receive, through a microphone or another voice receiving device, a voice control signal issued by the user or by another voice playback device, and then feed the received voice control signal into the voice input module of the wireless projection system; alternatively, the voice input module may directly receive the voice control signal issued by the user or by another voice playback device, so that the voice input module can perform corresponding processing on the voice control signal, for example filtering and matching, as in the illustrative capture-and-filter sketch below.
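  • The patent does not specify how the voice input module captures or pre-processes audio. The following Python sketch is a rough, non-authoritative illustration of the microphone capture and filtering step described above; the `sounddevice` and `scipy` packages, the 16 kHz sample rate, the 2-second clip length and the 300-3400 Hz speech band are all assumptions of this sketch, not details from the patent.

```python
import numpy as np
import sounddevice as sd                    # assumed third-party capture library
from scipy.signal import butter, lfilter    # assumed SciPy DSP helpers

SAMPLE_RATE = 16000          # 16 kHz is a common rate for speech processing
RECORD_SECONDS = 2.0         # length of one voice control signal (assumption)

def record_voice_signal() -> np.ndarray:
    """Record a mono clip from the default microphone of the receiving device."""
    frames = int(SAMPLE_RATE * RECORD_SECONDS)
    audio = sd.rec(frames, samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()                                # block until the recording finishes
    return audio[:, 0]

def bandpass_filter(signal: np.ndarray, low_hz=300.0, high_hz=3400.0) -> np.ndarray:
    """Simple Butterworth band-pass, standing in for the 'filtering processing'."""
    nyquist = SAMPLE_RATE / 2.0
    b, a = butter(4, [low_hz / nyquist, high_hz / nyquist], btype="band")
    return lfilter(b, a, signal)

if __name__ == "__main__":
    voice_control_signal = bandpass_filter(record_voice_signal())
    print("captured", voice_control_signal.shape[0], "samples")
```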
  • Step S20 The receiving end device matches the obtained voice control signal with a preset voice command.
  • in this embodiment, the wireless projection system stores the preset voice commands in advance, and the preset voice commands include voice commands such as "slide up", "page turn" and "pause"; after the voice control signal issued by the user is acquired, the acquired voice control signal is matched with the preset voice commands one by one, until the voice control signal is successfully matched with one of the preset voice commands, or until matching of the voice control signal with all of the preset voice commands has failed.
  • optionally, when the voice control signal is matched with the preset voice commands, it is matched against each preset voice command in turn, in the order in which the preset voice commands are stored.
  • for example, suppose the voice control signal issued by the user is a "page turn" signal, the preset voice commands stored in the wireless projection system are the "slide up", "page turn" and "pause" voice commands, and their storage order is "slide up", then "page turn", then "pause". When the "page turn" signal is matched with the preset voice commands, the "page turn" signal is first matched with the "slide up" voice command; if that matching fails, the "page turn" signal is then matched with the "page turn" voice command; if that matching succeeds, the voice control signal is determined to be the "page turn" voice command; if it fails, matching continues with the remaining preset voice commands until a match succeeds or all of the preset voice commands have failed to match, as in the matching-loop sketch below.
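  • As a minimal sketch of the sequential matching described above (assuming a hypothetical `similarity` scoring function and threshold, neither of which the patent defines), the loop below walks the preset voice commands in their storage order and stops at the first successful match:

```python
from typing import Callable, Optional

# Preset voice commands in their storage order (from the example above).
PRESET_COMMANDS = ["slide up", "page turn", "pause"]

def match_command(voice_signal,
                  similarity: Callable[[object, str], float],
                  threshold: float = 0.8) -> Optional[str]:
    """Compare the incoming voice control signal with each preset command in
    storage order; return the first command whose similarity reaches the
    threshold, or None if every preset command fails to match."""
    for command in PRESET_COMMANDS:
        if similarity(voice_signal, command) >= threshold:
            return command          # matching succeeded
    return None                     # matching failed for all preset commands

# Usage idea: a matched command triggers UIBC message generation, a miss
# triggers the "please re-issue the voice control signal" prompt.
```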
  • Step S30: if the matching is successful, the receiving end device generates a UIBC message from the successfully matched preset voice command according to the UIBC protocol, and transmits the UIBC message to the transmitting end device, so that the transmitting end device controls its own behavior according to the received UIBC message.
  • when the voice control signal is successfully matched with a preset voice command, a UIBC message is generated from the successfully matched preset voice command according to the UIBC protocol.
  • the UIBC protocol, which is also known as the Wi-Fi Display protocol, defines the format of the UIBC message corresponding to each kind of control command.
  • because the UIBC protocol only defines UIBC message formats for generic input information and for human interface device class (HIDC) information, and neither generic input information nor HIDC information includes information input as a voice signal, when a UIBC message is generated from the preset voice command, the preset voice command should first be converted into generic input information or HIDC information, and the UIBC message is then generated from the parameters corresponding to that generic input information or HIDC information; the UIBC message is then transmitted to the transmitting end device.
  • after receiving the UIBC message, the transmitting end device controls its own behavior according to the content of the UIBC message; for example, if the content of the message is to pause the video currently being played by the transmitting end device, the transmitting end device pauses the currently playing video immediately after receiving the message, as in the dispatch sketch below.
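  • On the transmitting end, the patent only states that behavior is controlled according to the content of the received UIBC message. The sketch below is a hypothetical dispatch routine for already-decoded input events; the field names, action values and the decoding itself are assumptions of this sketch, not the Wi-Fi Display wire format:

```python
from dataclasses import dataclass

@dataclass
class DecodedUibcInput:
    """Hypothetical decoded form of a received UIBC input event."""
    input_category: str   # e.g. "GENERIC" or "HIDC"
    action: str           # e.g. "PAUSE", "PAGE_DOWN"

def handle_uibc_input(msg: DecodedUibcInput, player, presenter) -> None:
    """Dispatch a decoded input event to whatever is currently on screen."""
    if msg.action == "PAUSE":
        player.pause()               # pause the currently playing video
    elif msg.action == "PAGE_DOWN":
        presenter.next_page()        # turn to the next PPT page
    else:
        print("unsupported action:", msg.action)
```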
  • if the matching of the voice control signal with the preset voice commands fails, the user is prompted to re-issue the voice control signal; after receiving the prompt, the user knows that the issued voice control signal failed to control the behavior of the transmitting end, and can then re-issue the voice control signal, or use a voice playback device in which the voice control signal has been pre-recorded to issue the voice control signal again.
  • in this embodiment, when it is inconvenient for the user to operate the receiving end device or a peripheral connected to the receiving end device, the user issues a voice control signal; after receiving the voice control signal issued by the user, the receiving end device matches the voice control signal with the preset voice commands to determine the preset voice command corresponding to the voice control signal, generates a UIBC message from that preset voice command, and transmits the UIBC message to the transmitting end device, thereby controlling the transmitting end device; this solves the problem that the transmitting end device cannot be controlled when it is inconvenient to directly operate the receiving end device or a peripheral, and makes control of the transmitting end device by the receiving end device more convenient.
  • optionally, based on the first embodiment, a second embodiment of the voice control method of the present invention is provided; referring to FIG. 2, when the preset voice command is stored as a voice waveform, the step S20 includes:
  • Step S21 The receiving end device performs corresponding transformation on the acquired voice control signal and the preset voice command to obtain a sound spectrum map or a feature vector of the voice control signal and the preset voice command.
  • Step S22 the receiving end device matches the sound spectrum map or the feature vector of the voice control signal with the sound spectrum map or the feature vector of the preset voice command.
  • when the preset voice command is stored as a voice waveform, different voice waveforms may correspond to the same voice command, and the voice waveforms produced by different users for the same voice command may differ greatly, so it is difficult to match successfully by directly comparing the voice waveform of the voice control signal issued by the user with the voice waveform of a preset voice command; the preset voice command and the voice control signal therefore need to be processed accordingly.
  • optionally, the acquired voice control signal and the preset voice command may each be transformed to obtain the spectrogram or feature vector of the voice control signal and of the preset voice command.
  • when the spectrograms of the voice control signal and the preset voice command are required, the corresponding transformation mainly includes pre-emphasis, framing, windowing, fast Fourier transform and gray-level mapping; after the foregoing processing, the spectrograms of the voice control signal and the preset voice command are obtained, as in the sketch below.
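  • A minimal sketch of the spectrogram transformation chain named above (pre-emphasis, framing, windowing, FFT, gray-level mapping), assuming 25 ms frames with a 10 ms hop and a Hamming window; none of these parameters are fixed by the patent:

```python
import numpy as np

def spectrogram(signal: np.ndarray,
                frame_len: int = 400,      # 25 ms at 16 kHz (assumption)
                hop: int = 160,            # 10 ms hop (assumption)
                pre_emphasis: float = 0.97) -> np.ndarray:
    """Pre-emphasis -> framing -> Hamming window -> FFT -> gray-level mapping."""
    # 1. Pre-emphasis to boost high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    assert len(emphasized) >= frame_len, "clip must span at least one frame"
    # 2. Split into overlapping frames.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    # 3. Windowing.
    frames = frames * np.hamming(frame_len)
    # 4. Power spectrum of each frame (FFT).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 5. Map log power to 0-255 gray levels so the result can be treated
    #    as an image-like "sound spectrum map".
    log_power = 10.0 * np.log10(power + 1e-10)
    gray = 255.0 * (log_power - log_power.min()) / (np.ptp(log_power) + 1e-10)
    return gray.astype(np.uint8)
```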
  • when the feature vectors of the voice control signal and the preset voice command are required, taking the Mel Frequency Cepstrum Coefficients (MFCC) of the voice control signal and the preset voice command as an example of the feature vector, the corresponding transformation mainly includes pre-emphasis, framing, windowing, fast Fourier transform, filtering with a triangular band-pass filter bank, computing the logarithmic energy of each filter bank output, applying a discrete cosine transform (DCT) to obtain the MFCC coefficients, spectral weighting, Cepstrum Mean Subtraction (CMS), and extraction of dynamic difference parameters (including first-order and second-order differences), as in the extraction sketch below.
  • after the spectrograms or feature vectors of the voice control signal and the preset voice commands are obtained, the spectrogram or feature vector of the voice control signal is matched with the spectrogram or feature vector of each preset voice command in turn, until a match succeeds or matching against all of the preset voice commands has failed.
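  • A compact sketch of the MFCC feature extraction, assuming the third-party `librosa` library; librosa's internals broadly follow the framing / windowing / FFT / triangular mel filter bank / log energy / DCT chain listed above, although its default parameters need not match whatever implementation the patent envisages:

```python
import numpy as np
import librosa   # assumed third-party audio feature library

def mfcc_features(signal: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """MFCC + cepstral mean subtraction + first/second order deltas."""
    # Pre-emphasis, then 13 static MFCCs per frame; librosa internally performs
    # framing, windowing, FFT, mel (triangular) filtering, log energy and DCT.
    y = librosa.effects.preemphasis(signal.astype(float))
    mfcc = librosa.feature.mfcc(y=y, sr=sample_rate, n_mfcc=13)
    # Cepstral mean subtraction (CMS) over time, per coefficient.
    mfcc = mfcc - mfcc.mean(axis=1, keepdims=True)
    # Dynamic difference parameters: first- and second-order deltas.
    delta1 = librosa.feature.delta(mfcc, order=1)
    delta2 = librosa.feature.delta(mfcc, order=2)
    # One feature vector per frame: [static | delta | delta-delta].
    return np.vstack([mfcc, delta1, delta2]).T
```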
  • the voice control signal and the preset voice command are transformed into corresponding sound spectrum maps or feature vectors, and then the obtained sound spectrum map or feature vector is matched, thereby improving the accuracy of the voice recognition.
  • a third embodiment of the voice control method of the present invention is proposed based on the first embodiment.
  • referring to FIG. 3, when the preset voice command is stored as a spectrogram or a feature vector, the step S20 includes:
  • Step S23 the receiving end device performs corresponding transformation on the acquired voice control signal to obtain a sound spectrum map or a feature vector of the voice control signal;
  • Step S24 The receiving end device matches the sound spectrum map or the feature vector of the voice control signal with the preset voice command.
  • when the preset voice command is stored as a spectrogram or a feature vector, the spectrogram or feature vector of a voice signal can directly characterize the voice command, so when the acquired voice control signal is matched with the preset voice commands, only the acquired voice control signal needs to be transformed to obtain its spectrogram or feature vector; the specific transformation process has been described in the foregoing embodiment and is not repeated here.
  • when the spectrogram or feature vector of the voice control signal has been obtained, the spectrogram or feature vector is matched with each of the preset voice commands in turn, until a match succeeds or matching against all of the preset voice commands has failed, for example as in the template-matching sketch below.
  • the voice control signal and the preset voice command are transformed into corresponding sound spectrum maps or feature vectors, and then the obtained sound spectrum map or feature vector is matched, thereby improving the accuracy of the voice recognition.
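  • The patent does not name a matching algorithm for comparing feature-vector sequences. Dynamic time warping (DTW) over MFCC frames is one conventional choice for isolated-command template matching, so the sketch below uses it; the distance threshold is a placeholder, not a value from the patent:

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between two feature sequences
    (frames x dims); smaller means more similar."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])    # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return float(cost[n, m] / (n + m))                 # length-normalised

def match_against_templates(features: np.ndarray,
                            templates: dict,
                            max_distance: float = 25.0):
    """Compare the incoming command's features with each stored template,
    in storage order, and return the first sufficiently close command."""
    for command, template in templates.items():        # dict preserves order
        if dtw_distance(features, template) <= max_distance:
            return command
    return None
```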
  • the fourth embodiment of the voice control method of the present invention is provided based on any of the foregoing embodiments.
  • the step of generating a UIBC message by using the preset voice command that is successfully matched according to the UIBC protocol includes:
  • Step S31 The receiving end device acquires a mapping relationship between the preset voice command and a control signal segment in the UIBC packet.
  • Step S32 The receiving end device determines, according to the mapping relationship, a control signal segment corresponding to the preset voice command that is successfully matched.
  • Step S33 The receiving end device generates a UIBC message by using a control signal segment corresponding to the preset voice command according to the UIBC protocol.
  • in this embodiment, there is a mapping table between the preset voice commands and the control signal segments in the UIBC message, that is, different preset voice commands correspond to different control signal segments.
  • a control signal segment is the data segment corresponding to controlling a behavior of the transmitting end device.
  • the mapping relationship may be a mapping between the preset voice command and generic input information, or a correspondence between the preset voice command and human interface device class (HIDC) information; that is, the voice control command is converted into a corresponding generic input control signal or HIDC control signal, and the UIBC message is then generated from the generic input information or the HIDC information according to the UIBC protocol.
  • optionally, the content of the control signal segment of a preset control command may be defined in the UIBC protocol itself, in which case the corresponding UIBC message can be generated directly according to the UIBC protocol when the UIBC message is generated from the preset voice command; a simplified mapping-and-packing sketch follows.
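  • The sketch below illustrates steps S31-S33 with a hypothetical mapping table and a simplified two-field header; the byte values and the header layout are placeholders invented for this sketch and are not the control signal segments or packet format actually defined by the UIBC / Wi-Fi Display specification:

```python
import struct

# Hypothetical mapping table from preset voice commands to generic-input
# control signal segments; real key codes would come from the Wi-Fi Display
# specification, not from this sketch.
COMMAND_TO_SEGMENT = {
    "page turn": b"\x03\x00\x02\x00\x51",   # e.g. "key down" + a page-down code
    "pause":     b"\x03\x00\x02\x00\x20",   # e.g. "key down" + a pause/space code
    "slide up":  b"\x03\x00\x02\x00\x26",   # e.g. "key down" + an up-arrow code
}

def build_uibc_message(command: str) -> bytes:
    """Look up the control signal segment for a matched preset voice command
    (steps S31/S32) and wrap it in a simplified category + length header
    standing in for the packet structure defined by the protocol (step S33)."""
    segment = COMMAND_TO_SEGMENT[command]
    category = 0x00                                # 0x00: generic input (assumption)
    header = struct.pack("!BH", category, len(segment))
    return header + segment

# The resulting bytes would then be sent to the transmitting end device over
# the connection negotiated by the Miracast/UIBC session.
```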
  • the preset voice command is converted into an instruction defined in the UIBC protocol, so that the preset voice command can control the behavior of the transmitting device, and the feedback form of the UIBC function is added.
  • the present invention further provides a voice control apparatus.
  • FIG. 5 is a schematic diagram of functional modules of a first embodiment of a voice control apparatus according to the present invention.
  • the voice control device includes: an obtaining module 10, a matching module 20, a generating module 30, and a prompting module 40.
  • the obtaining module 10 is configured to acquire a voice control signal for controlling the device at the transmitting end;
  • the receiving end device in this embodiment is described by taking a wireless projection system supporting Miracast technology and having a UIBC function as an example.
  • the wireless projection system adds a voice input module compared with the existing wireless projection system.
  • the present invention can also be applied to other receiving end devices supporting the Miracast technology and having the UIBC function according to the core idea of the present invention.
  • when the wireless projection system plays the screen content of the transmitting end device, the transmitting end device is described by taking a Miracast-certified notebook with the UIBC function as an example.
  • for example, if the content displayed on the current screen of the notebook is a PPT document, the content displayed on the screen of the wireless projection system is also that PPT document; when the user wants to view the next page of the PPT document and needs to perform a page-turning operation, the user can issue a page-turning voice control signal towards the wireless projection system.
  • as another example, if the content displayed on the current screen of the notebook is a movie, the content displayed on the screen of the wireless projection system is also that movie; when the user wants to pause the movie that is currently playing, the user can issue a pause voice control signal towards the wireless projection system.
  • when acquiring the voice control signal for controlling the notebook, the wireless projection system may optionally receive, through a microphone or another voice receiving device, a voice control signal issued by the user or by another voice playback device, and then feed the received voice control signal into the voice input module of the wireless projection system; alternatively, the voice input module may directly receive the voice control signal, so that the voice input module can perform corresponding processing on the voice control signal, for example filtering and matching.
  • the matching module 20 is configured to match the acquired voice control signal with a preset voice command.
  • in this embodiment, the wireless projection system stores the preset voice commands in advance, and the preset voice commands include voice commands such as "slide up", "page turn" and "pause"; after the voice control signal issued by the user is acquired, the acquired voice control signal is matched with the preset voice commands one by one, until the voice control signal is successfully matched with one of the preset voice commands, or until matching of the voice control signal with all of the preset voice commands has failed.
  • optionally, when the voice control signal is matched with the preset voice commands, it is matched against each preset voice command in turn, in the order in which the preset voice commands are stored.
  • for example, suppose the voice control signal issued by the user is a "page turn" signal, the preset voice commands stored in the wireless projection system are the "slide up", "page turn" and "pause" voice commands, and their storage order is "slide up", then "page turn", then "pause". When the "page turn" signal is matched with the preset voice commands, the "page turn" signal is first matched with the "slide up" voice command; if that matching fails, the "page turn" signal is then matched with the "page turn" voice command; if that matching succeeds, the voice control signal is determined to be the "page turn" voice command; if it fails, matching continues with the remaining preset voice commands until a match succeeds or all of the preset voice commands have failed to match.
  • the generating module 30 is configured to, if the matching is successful, generate a UIBC message from the successfully matched preset voice command according to the UIBC protocol, and transmit the UIBC message to the transmitting end device, so that the transmitting end device controls its own behavior according to the received UIBC message.
  • when the voice control signal is successfully matched with a preset voice command, a UIBC message is generated from the successfully matched preset voice command according to the UIBC protocol.
  • the UIBC protocol, which is also known as the Wi-Fi Display protocol, defines the format of the UIBC message corresponding to each control command; because the UIBC protocol only defines UIBC message formats for generic input information and for human interface device class (HIDC) information, and neither of these includes information input as a voice signal, when a UIBC message is generated from the preset voice command, the preset voice command should first be converted into generic input information or HIDC information, and the UIBC message is then generated from the parameters corresponding to that information and transmitted to the transmitting end device.
  • after receiving the UIBC message, the transmitting end device controls its own behavior according to the content of the UIBC message; for example, if the content of the message is to pause the video currently being played by the transmitting end device, the transmitting end device pauses the currently playing video immediately after receiving the message.
  • the prompting module is configured to prompt the user to re-issue the voice control signal if the matching fails.
  • when matching of the voice control signal with the preset voice commands fails, the user is prompted to re-issue the voice control signal; after receiving the prompt, the user knows that the issued voice control signal failed to control the behavior of the transmitting end, and can then re-issue the voice control signal, or use a voice playback device in which the voice control signal has been pre-recorded to issue the voice control signal again.
  • in this embodiment, when it is inconvenient for the user to operate the receiving end device or a peripheral connected to the receiving end device, the user issues a voice control signal; after receiving the voice control signal issued by the user, the receiving end device matches the voice control signal with the preset voice commands to determine the preset voice command corresponding to the voice control signal, generates a UIBC message from that preset voice command, and transmits the UIBC message to the transmitting end device, thereby controlling the transmitting end device; this solves the problem that the transmitting end device cannot be controlled when it is inconvenient to directly operate the receiving end device or a peripheral, and makes control of the transmitting end device by the receiving end device more convenient.
  • the matching module 20 includes a processing unit 21 and a matching unit 22.
  • the processing unit 21 is configured to perform a corresponding transformation on the acquired voice control signal and the preset voice command, to obtain the spectrogram or feature vector of the voice control signal and of the preset voice command;
  • the matching unit 22 is configured to match a sound spectrum map or a feature vector of the voice control signal with a sound spectrum map or a feature vector of the preset voice command.
  • when the preset voice command is stored as a voice waveform, different voice waveforms may correspond to the same voice command, and the voice waveforms produced by different users for the same voice command may differ greatly, so it is difficult to match successfully by directly comparing the voice waveform of the voice control signal issued by the user with the voice waveform of a preset voice command; the preset voice command and the voice control signal therefore need to be processed accordingly.
  • optionally, the acquired voice control signal and the preset voice command may each be transformed to obtain the spectrogram or feature vector of the voice control signal and of the preset voice command.
  • when the spectrograms are required, the corresponding transformation mainly includes pre-emphasis, framing, windowing, fast Fourier transform and gray-level mapping; after the foregoing processing, the spectrograms of the voice control signal and the preset voice command are obtained.
  • when the feature vectors are required, taking the Mel Frequency Cepstrum Coefficients (MFCC) as an example, the corresponding transformation mainly includes pre-emphasis, framing, windowing, fast Fourier transform, filtering with a triangular band-pass filter bank, computing the logarithmic energy of each filter bank output, applying a discrete cosine transform (DCT) to obtain the MFCC coefficients, spectral weighting, Cepstrum Mean Subtraction (CMS), and extraction of dynamic difference parameters (including first-order and second-order differences).
  • after the spectrograms or feature vectors of the voice control signal and the preset voice commands are obtained, the spectrogram or feature vector of the voice control signal is matched with that of each preset voice command in turn, until a match succeeds or matching against all of the preset voice commands has failed.
  • the processing unit 21 is further configured to perform corresponding transformation on the acquired voice control signal to obtain a sound spectrum map or a feature vector of the voice control signal;
  • the matching unit 22 is further configured to match a spectrogram or a feature vector of the voice control signal with the preset voice command.
  • when the preset voice command is stored as a spectrogram or a feature vector, the spectrogram or feature vector of a voice signal can directly characterize the voice command, so when the acquired voice control signal is matched with the preset voice commands, only the acquired voice control signal needs to be transformed to obtain its spectrogram or feature vector; the specific transformation process has been described in the foregoing embodiment and is not repeated here.
  • when the spectrogram or feature vector of the voice control signal has been acquired, the spectrogram or feature vector is matched with each of the preset voice commands in turn, until a match succeeds or matching against all of the preset voice commands has failed.
  • the voice control signal and the preset voice command are transformed into corresponding sound spectrum maps or feature vectors, and then the obtained sound spectrum map or feature vector is matched, thereby improving the accuracy of the voice recognition.
  • the generating module 30 includes: an obtaining unit 31, a determining unit 32, and a generating unit 33.
  • the acquiring unit 31 is configured to acquire a mapping relationship between the preset voice command and a control signal segment in the UIBC message;
  • the determining unit 32 is configured to determine, according to the mapping relationship, a control signal segment corresponding to the preset voice command that is successfully matched;
  • the generating unit 33 is configured to generate a UIBC message by using a control signal segment corresponding to the preset voice command according to the UIBC protocol.
  • the preset voice command and the control signal segment in the UIBC message have a mapping table, that is, different preset voice commands correspond to different control signal segments.
  • the control signal segment is a data segment corresponding to controlling the behavior of the transmitting device.
  • the mapping relationship may be a mapping between the preset voice command and generic input information, or a correspondence between the preset voice command and human interface device class (HIDC) information; that is, the voice control command is converted into a corresponding generic input control signal or HIDC control signal, and the UIBC message is then generated from the generic input information or the HIDC information according to the UIBC protocol.
  • optionally, the content of the control signal segment of a preset control command may be defined in the UIBC protocol itself, in which case the corresponding UIBC message can be generated directly according to the UIBC protocol when the UIBC message is generated from the preset voice command.
  • the preset voice command is converted into an instruction defined in the UIBC protocol, so that the preset voice command can control the behavior of the transmitting device, and the feedback form of the UIBC function is added.
  • the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to perform the methods described in the various embodiments of the present invention.
  • as described above, the voice control method and apparatus provided by the embodiments of the present invention have the following beneficial effect: they solve the problem that the transmitting end device cannot be controlled when it is inconvenient to directly operate the receiving end device or a peripheral, making control of the transmitting end device by the receiving end device more convenient.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A voice control method and apparatus. The voice control method comprises the following steps: a receiving end device acquires a voice control signal for controlling a transmitting end device (S10); the receiving end device matches the acquired voice control signal with preset voice commands (S20); if the matching is successful, the receiving end device generates a UIBC message from the successfully matched preset voice command according to the UIBC protocol, and transmits the UIBC message to the transmitting end device, so that the transmitting end device controls its own behavior according to the received UIBC message (S30). The method solves the problem that the transmitting end device cannot be controlled when it is inconvenient to directly operate the receiving end device or a peripheral.

Description

语音控制方法及装置 技术领域
本发明涉及智能控制技术领域,尤其涉及一种语音控制方法及装置。
背景技术
随着技术的发展,利用Miracast技术,使用者不再需要寻找各种规格的线材与转换器,亦毋须确认用于连接设备的正确接头,就能将传送端设备(智能手机、平板电脑、笔记本电脑、台式机等)的画面内容分享给其他接收端设备(电视机、投影仪等)。在将传送端设备的画面内容分享给接收端设备时,通过Miracast协议的用户输入反向信道(User Input Back Channel,简称为UIBC)功能,接收端设备可以实现对传送端设备分享的画面内容进行控制,其中,UIBC定义如何将接收端设备的控制信号回送到传送端设备,且UIBC定义的接收端设备信号的控制信号有两种,一种是通用鼠标、键盘信号,另一种是外设信号,外设信号由通用串行总线(Universal Serial Bus,简称为USB)、无线保真(Wireless Fidelity,简称为WIFI)、蓝牙等传入。当接收端设备对传送端设备分享的画面内容进行控制时,接收端设备首先需要生成这些控制信号,然后将它们传送给传送端设备,才能实现对传送端设备的控制。然而上述控制信号都需要直接操作接收端设备或外设来生成,当不方便直接操作接收端设备或外设时,接收端设备就无法生成控制信号,也就无法将控制信号传送到传送端设备并实现对传送端设备的控制。
发明内容
本发明的主要目的在于提供一种语音控制方法及装置,旨在解决当不方便直接操作接收端设备或外设时,而无法实现对传送端设备的控制的问题。
为实现上述目的,本发明实施例中提供了一种语音控制方法,所述语 音控制方法包括以下步骤:
接收端设备获取用于控制传送端设备的语音控制信号;
接收端设备将获取的所述语音控制信号与预设语音指令进行匹配;
若匹配成功,则接收端设备根据UIBC协议基于匹配成功的所述预设语音指令生成UIBC报文,并将所述UIBC报文传送至传送端设备,以供传送端设备根据接收到的所述UIBC报文控制自身的行为。
可选地,所述预设语音指令的存储形式为语音波形,所述接收端设备将获取的所述语音控制信号与预设语音指令进行匹配的步骤包括:
接收端设备将获取的所述语音控制信号及所述预设语音指令进行相应变换,以得到所述语音控制信号及所述预设语音指令的声谱图或特征向量;
接收端设备将所述语音控制信号的声谱图或特征向量与所述预设语音指令的声谱图或特征向量进行匹配。
可选地,所述预设语音指令的存储形式为声谱图或特征向量,所述接收端设备将获取的所述语音控制信号与预设语音指令进行匹配的步骤包括:
接收端设备将获取的所述语音控制信号进行相应变换,以得到所述语音控制信号的声谱图或特征向量;
接收端设备将所述语音控制信号的声谱图或特征向量与所述预设语音指令进行匹配。
可选地,所述接收端设备根据UIBC协议基于匹配成功的所述预设语音指令生成UIBC报文的步骤包括:
接收端设备获取所述预设语音指令与所述UIBC报文中控制信号段的映射关系;
接收端设备根据所述映射关系确定匹配成功的所述预设语音指令对应的控制信号段;
接收端设备根据UIBC协议将所述预设语音指令对应的控制信号段生成UIBC报文。
可选地,所述接收端设备将获取的所述语音控制信号与预设语音指令进行匹配的步骤之后,所述语音控制方法还包括:
若匹配失败,则接收端设备提示用户重新发出语音控制信号。
此外,为实现上述目的,本发明实施例中还提供了一种语音控制装置,所述语音控制装置包括:
获取模块,设置为获取用于控制传送端设备的语音控制信号;
匹配模块,设置为将获取的所述语音控制信号与预设语音指令进行匹配;
生成模块,设置为若匹配成功,则根据UIBC协议基于匹配成功的所述预设语音指令生成UIBC报文,并将所述UIBC报文传送至传送端设备,以供传送端设备根据接收到的所述UIBC报文控制自身的行为。
可选地,所述预设语音指令的存储形式为语音波形,所述匹配模块包括:
处理单元,设置为将获取的所述语音控制信号及所述预设语音指令进行相应变换,以得到所述语音控制信号及所述预设语音指令的声谱图或特征向量;
匹配单元,设置为将所述语音控制信号的声谱图或特征向量与所述预设语音指令的声谱图或特征向量进行匹配。
可选地,所述预设语音指令的存储形式为声谱图或特征向量,所述处理单元,还设置为将获取的所述语音控制信号进行相应变换,以得到所述语音控制信号的声谱图或特征向量;
所述匹配单元,还设置为将所述语音控制信号的声谱图或特征向量与所述预设语音指令进行匹配。
可选地,所述生成模块包括:
获取单元,设置为获取所述预设语音指令与所述UIBC报文中控制信号段的映射关系;
确定单元,设置为根据所述映射关系确定匹配成功的所述预设语音指令对应的控制信号段;
生成单元,设置为根据UIBC协议将所述预设语音指令对应的控制信号段生成UIBC报文。
可选地,所述语音控制装置还包括:
提示模块,设置为若匹配失败,则提示重新发出语音控制信号。
本发明另一实施例提供了一种计算机存储介质,所述计算机存储介质存储有执行指令,所述执行指令用于执行上述方法实施例中的步骤之一或其组合。
在本发明实施例中,通过在用户不方便操作接收端设备或与接收端设备连接的外设设备时,发出语音控制信号,接收端设备在接收到用户发出的语音控制信号后,将该语音控制信号与预设语音指令进行匹配,从而确定该语音控制信号对应的预设语音指令,接收端设备在确定语音控制信号对应的预设语音指令后,将所述预设语音指令生成UIBC报文,并将所述UIBC报文传送至传送端设备,从而实现对传送端设备的控制,解决了当不方便直接操作接收端设备或外设时,而无法实现对传送端设备的控制的问题,使得接收端设备对传送端设备的控制更加方便。
附图说明
图1为本发明语音控制方法的第一实施例的流程示意图;
图2为图1中将获取的语音控制信号与预设语音指令进行匹配的步骤细化流程示意图;
图3为图1中将获取的语音控制信号与预设语音指令进行匹配的另一实施例的步骤流程示意图;
图4为图1中根据UIBC协议将匹配成功的预设语音指令生成UIBC报文的步骤细化流程示意图;
图5为本发明语音控制装置的第一实施例的功能模块示意图;
图6为图5中匹配模块的细化功能模块示意图;
图7为图5中生成模块的细化功能模块示意图。
本发明目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。
具体实施方式
应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。
基于上述问题,本发明提供一种语音控制方法。
本发明的应用场景为用户能把通过Miracast认证的手机、平板电脑或笔记本等接收端设备的显示屏的内容以无线方式投射到支持Miracast技术的接收端设备上,用户在接收端设备上看到的内容与传送端设备上的内容将会是一模一样的。本发明中接收端设备及传送端设备都具有UIBC功能。所述UIBC功能指的是用户可以通过接收端设备实现对传送端设备的控制。该功能包括两种类型,一种是硬件无关型,如鼠标点击、按键点击、touch点击、放大缩小等,另一种是HIDC人机接口设备控制:包括红外线、USB、蓝牙、WIFI、游戏杆、遥控器等。
参照图1,图1为本发明语音控制方法的第一实施例的流程示意图。
在本实施例中,所述语音控制方法包括:
步骤S10,接收端设备获取用于控制传送端设备的语音控制信号;
本实施例中的接收端设备以支持Miracast技术且具有UIBC功能的无线投影***为例进行说明,所述无线投影***同现有的无线投影***相比增设了语音输入模块。具体实施中也可以根据本发明核心思想将本发明应用到其他的支持Miracast技术且具有UIBC功能的接收端设备中。
无线投影***在播放接收端设备的屏幕内容时,所述接收端设备以通过Miracast认证且具有UIBC功能的笔记本为例,比如,笔记本当前屏幕显示的内容为一个PPT文档,则所述无线投影***的屏幕显示的内容也为该PPT文档,若用户需要观看下一页PPT文档,需要进行翻页操作,则用户可以对着所述无线投影***发出翻页的语音控制信号。又如,笔记本当前屏幕显示的内容为一个电影,则所述无线投影***的屏幕显示的内容也为该电影,此时,用户需要对当前播放的电影内容进行暂停,需要进行暂停操作,则用户可以对着所述无线投影***发出暂停的语音控制信号。所述无线投影***在获取用于控制笔记本的语音控制信号时,可选的,可以通过麦克风或者其他语音接收装置接收用户或其他语音播放设备发出的语音控制信号,然后将接收到的语音控制信号输入至所述无线投影***的语音输入模块,或者直接通过所述语音输入模块接收用户或其他语音播放设备发出的语音控制信号,以供所述语音输入模块对所述语音控制信号进行相应的处理,例如,滤波处理,匹配处理等。
步骤S20,接收端设备将获取的所述语音控制信号与预设语音指令进行匹配;
在本实施例中,所述无线投影***预先存有所述预设语音指令,所述预设语音指令包括诸如“上滑”、“翻页”、“暂停”等语音指令,在获取到用户发出的语音控制信号后,将获取的所述语音控制信号与所述预设语音指令一一进行匹配,直到所述语音控制信号与所述预设语音指令中的某个指令匹配成功为止,或者直到所述语音控制信号与所述预设语音指令中的所有指令进行匹配失败为止。可选的,所述语音控制信号在与所述预设语音指令进行匹配时,按照各个预设语音指令的存储顺序依次与所述语音控制信号进行匹配。例如,用户发出的语音控制信号为“翻页”信号,无线 投影***中存储的预设语音指令有“上滑”、“翻页”、“暂停”语音指令,且各个所述预设语音指令的存储顺序依次为“上滑”、“翻页”、“暂停”语音指令,则在将所述“翻页”信号与所述预设语音指令进行匹配时,首先将所述“翻页”信号与“上滑”语音指令进行匹配,若匹配失败,则继续将所述“翻页”信号与“翻页”语音指令进行匹配,若匹配成功,则确定所述语音控制信号为所述“翻页”语音指令,若匹配失败,则继续与所述预设语音指令的其他语音指令进行匹配,直到匹配成功为止或者直到所述预设语音指令都匹配失败为止。
步骤S30,若匹配成功,则接收端设备根据UIBC协议基于匹配成功的所述预设语音指令生成UIBC报文,并将所述UIBC报文传送至传送端设备,以供传送端设备根据接收到的所述UIBC报文控制自身的行为。
在将所述语音控制信号与预设的语音指令匹配成功时,根据UIBC协议将匹配成功的所述预设语音指令生成UIBC报文。所述UIBC协议又也称为Wifi-display协议,该协议定义了各种控制指令对应的UIBC报文的格式。由于所述UIBC协议只定义了通用输入的信息和人机接口设备类(HIDC)的信息对应的UIBC报文的格式,而所述通用输入的信息和人机接口设备类(HIDC)的信息都不包括语音信号输入的信息,故在将所述预设语音指令生成UIBC报文时,首先应将所述预设语音指令转换为通用输入的信息或者人机接口设备类的信息,然后根据所述通用输入的信息或人机接口设备类(HIDC)的信息对应的参数生成UIBC报文,并将所述UIBC报文传送至传送端设备,传送端设备在接收到所述UIBC报文后,根据所述UIBC报文中的内容控制自身的行为,例如所述报文中的内容为对所述传送端设备当前播放的视频进行暂停处理,则所述传送端设备在接收到该报文后立即对当前播放的视频进行暂停播放。在将所述语音控制信号与预设的语音指令匹配失败时,则提示重新发出语音控制信号,用户在收到该提示后,可知道发出的语音控制信号控制发送端的行为失败,然后可以重新发出语音控制信号,或重新使用语音播放设备发出语音控制信号,所述语音播放设备中预先录有所述语音控制信号。
本实施例通过在用户不方便操作接收端设备或与接收端设备连接的外设设备时,发出语音控制信号,接收端设备在接收到用户发出的语音控制信号后,将该语音控制信号与预设语音指令进行匹配,从而确定该语音控制信号对应的预设语音指令,接收端设备在确定语音控制信号对应的预设语音指令后,将所述预设语音指令生成UIBC报文,并将所述UIBC报文传送至传送端设备,从而实现对传送端设备的控制,解决了当不方便直接操作接收端设备或外设时,而无法实现对传送端设备的控制的问题,使得接收端设备对传送端设备的控制更加方便。
可选地,基于第一实施例提出本发明语音控制方法的第二实施例,参照图2,在所述预设语音指令的存储形式为语音波形时,所述步骤S20包括:
步骤S21,接收端设备将获取的所述语音控制信号及所述预设语音指令进行相应变换,以得到所述语音控制信号及所述预设语音指令的声谱图或特征向量;
步骤S22,接收端设备将所述语音控制信号的声谱图或特征向量与所述预设语音指令的声谱图或特征向量进行匹配。
当所述预设语音指令的存储形式为语音波形时,由于不同的语音波形对应的语音指令可能相同,而不同的用户发出的相同的语音指令所对应的语音波形的差别可能很大,故直接通过将用户发出的语音控制信号所对应的语音波形与预设语音指令对应的语音波形进行匹配时,很难匹配成功,需要对所述预设语音指令及所述语音控制信号进行相应的处理。可选的,可以对获取的所述语音控制信号及所述预设语音指令进行相应变换,以得到所述语音控制信号及所述预设语音指令的声谱图或特征向量。当需要得到所述语音控制信号及所述预设语音指令的声谱图时,所述相应变换主要包括预加重处理、分帧处理、加窗处理、快速傅里叶变换处理及灰度级映射处理,经过上述处理过程后,得到所述语音控制信号及所述预设语音指 令的声谱图。当需要得到所述语音控制信号及所述预设语音指令的特征向量时,所述特征向量以所述语音控制信号及所述预设语音指令的梅尔频率倒谱系数(Mel Frequency Cepstrum Coefficient,简称为MFCC)为例,所述相应变换主要包括预加重处理、分帧处理、加窗处理、快速傅里叶变换处理、三角带通滤波器进行滤波处理、计算每个滤波器组输出的对数能量、经离散余弦变换(DCT)得到MFCC系数、谱加权处理、倒谱均值减(Cepstrum Mean Subtraction,简称为CMS)处理及动态差分参数的提取(包括一阶差分和二阶差分)。在获得所述语音控制信号及所述预设语音指令的声谱图或者特征向量后,将所述语音控制信号的声谱图或者特征向量依次与所述预设语音指令的声谱图或者特征向量进行匹配,直到匹配成功为止或者与所有的预设语音指令都匹配失败为止。
本实施例通过将语音控制信号及所述预设语音指令变换为相应的声谱图或者特征向量,然后将得到的声谱图或者特征向量进行匹配,从而提高了语音识别的准确性。
可选地,基于第一实施例提出本发明语音控制方法的第三实施例,参照图3,在所述预设语音指令的存储形式为声谱图或特征向量时,所述步骤S20包括:
步骤S23,接收端设备将获取的所述语音控制信号进行相应变换,以得到所述语音控制信号的声谱图或特征向量;
步骤S24,接收端设备将所述语音控制信号的声谱图或特征向量与所述预设语音指令进行匹配。
当所述预设语音指令的存储形式为声谱图或特征向量时,由于语音信号的声谱图或者特征向量能够直接表征语音指令的特性,故在将获取的所述语音控制信号与预设语音指令进行匹配时,只需要将获取的所述语音控制信号进行相应的变换,从而得到所述语音控制信号的声谱图或特征向量,具体的变换过程在上述实施例中已描述,此处不再赘述。当获取到所 述语音控制信号的声谱图或者特征向量时,将所述声谱图或者特征向量与各个所述语音指令依次进行匹配,直到匹配成功为止或者与所有的预设语音指令都匹配失败为止。
本实施例通过将语音控制信号及所述预设语音指令变换为相应的声谱图或者特征向量,然后将得到的声谱图或者特征向量进行匹配,从而提高了语音识别的准确性。
可选地,基于上述任一实施例提出本发明语音控制方法的第四实施例,参照图4,所述根据UIBC协议将匹配成功的所述预设语音指令生成UIBC报文的步骤包括:
步骤S31,接收端设备获取所述预设语音指令与所述UIBC报文中控制信号段的映射关系;
步骤S32,接收端设备根据所述映射关系确定匹配成功的所述预设语音指令对应的控制信号段;
步骤S33,接收端设备根据UIBC协议将所述预设语音指令对应的控制信号段生成UIBC报文。
在本实施例中,所述预设语音指令与所述UIBC报文中控制信号段存在一个映射表,即不同的预设语音指令对应不同的控制信号段。所述控制信号段为控制传送端设备行为对应的数据段。所述映射关系可以为所述预设语音指令与通用输入的信息的映射关系,或者所述预设语音指令与人机接口设备类(HIDC)的信息的对应关系,即将所述语音控制指令转换为相应的通用输入控制信号或者人机接口设备类(HIDC)控制信号,然后根据UIBC协议将所述通用输入的信息或者人机接口设备类(HIDC)的信息生成UIBC报文,可选地,可以在所述UIBC协议中定义所述预设控制指令的控制信号段内容,在将所述预设语音指令生成UIBC报文时,则在将预设的语音指令生成UIBC报文时即可直接根据所述UIBC协议生成对应的UIBC报文。
本实施例通过将所述预设语音指令转换为UIBC协议中定义的指令,从而使得所述预设语音指令能控制所述传送端设备的行为,增加了UIBC功能的反馈形式。
本发明进一步提供一种语音控制装置。
参照图5,图5为本发明语音控制装置的第一实施例的功能模块示意图。
在本实施例中,所述语音控制装置包括:获取模块10、匹配模块20、生成模块30及提示模块40。
所述获取模块10,设置为获取用于控制传送端设备的语音控制信号;
本实施例中的接收端设备以支持Miracast技术且具有UIBC功能的无线投影***为例进行说明,所述无线投影***同现有的无线投影***相比增设了语音输入模块。具体实施中也可以根据本发明核心思想将本发明应用到其他的支持Miracast技术且具有UIBC功能的接收端设备中。
无线投影***在播放接收端设备的屏幕内容时,所述接收端设备以通过Miracast认证且具有UIBC功能的笔记本为例,比如,笔记本当前屏幕显示的内容为一个PPT文档,则所述无线投影***的屏幕显示的内容也为该PPT文档,此时,用户需要观看下一页PPT文档,需要进行翻页操作,则用户可以对着所述无线投影***发出翻页的语音控制信号。又如,笔记本当前屏幕显示的内容为一个电影,则所述无线投影***的屏幕显示的内容也为该电影,此时,用户需要对当前播放的电影内容进行暂停,需要进行暂停操作,则用户可以对着所述无线投影***发出暂停的语音控制信号。所述无线投影***在获取用于控制笔记本语音控制信号时,可选的,可以通过麦克风或者其他语音接收装置接收用户或其他语音播放设备发出的语音控制信号,然后将接收到的语音控制信号输入至所述无线投影***的语音输入模块,或者直接通过所述语音输入模块接收用户或其他语音播放设备发出的语音控制信号,以供所述语音输入模块对所述语音控制信 号进行相应的处理,例如,滤波处理,匹配处理等。
所述匹配模块20,设置为将获取的所述语音控制信号与预设语音指令进行匹配;
在本实施例中,所述无线投影***预先存有所述预设语音指令,所述预设语音指令包括诸如“上滑”、“翻页”、“暂停”等语音指令,在获取到用户发出的语音控制信号后,将获取的所述语音控制信号与所述预设语音指令一一进行匹配,直到所述语音控制信号与所述预设语音指令中的某个指令匹配成功为止,或者直到所述语音控制信号与所述预设语音指令中的所有指令进行匹配失败为止。可选的,所述语音控制信号在与所述预设语音指令进行匹配时,按照各个预设语音指令的存储顺序依次与所述语音控制信号进行匹配。例如,用户发出的语音控制信号为“翻页”信号,无线投影***中存储的预设语音指令有“上滑”、“翻页”、“暂停”语音指令,且各个所述预设语音指令的存储顺序依次为“上滑”、“翻页”、“暂停”语音指令,则在将所述“翻页”信号与所述预设语音指令进行匹配时,首先将所述“翻页”信号与“上滑”语音指令进行匹配,若匹配失败,则继续将所述“翻页”信号与“翻页”语音指令进行匹配,若匹配成功,则确定所述语音控制信号为所述“翻页”语音指令,若匹配失败,则继续与所述预设语音指令的其他语音指令进行匹配,直到匹配成功为止或者直到所述预设语音指令都匹配失败为止。
所述生成模块30,设置为若匹配成功,则根据UIBC协议基于匹配成功的所述预设语音指令生成UIBC报文,并将所述UIBC报文传送至传送端设备,以供传送端设备根据接收到的所述UIBC报文控制自身的行为。
在将所述语音控制信号与预设的语音指令匹配成功时,根据UIBC协议将匹配成功的所述预设语音指令生成UIBC报文。所述UIBC协议协议又也称为Wifi-display协议,该协议定义了各个控制指令对应的UIBC报文的格式。由于所述UIBC协议只定义了通用输入的信息和人机接口设备类(HIDC)的信息对应的UIBC报文的格式,而所述通用输入的信息和 人机接口设备类(HIDC)的信息都不包括语音信号输入的信息,故在将所述预设语音指令生成UIBC报文时,首先应将所述预设语音指令转换为通用输入的信息或者人机接口设备类的信息,然后根据所述通用输入的信息或人机接口设备类(HIDC)的信息对应的参数生成UIBC报文,并将所述UIBC报文传送至传送端设备,传送端设备在接收到所述UIBC报文后,根据所述UIBC报文中的内容控制自身的行为,例如所述报文中的内容为对所述传送端设备当前播放的视频进行暂停处理,则所述传送端设备在接收到该报文后立即对当前播放的视频进行暂停播放。
所述提示模块,设置为若匹配失败,则提示用户重新发出语音控制信号。
在将所述语音控制信号与预设的语音指令匹配失败时,则提示用户重新发出语音控制信号,用户在收到该提示后,即可知道发出的语音控制信号控制发送端的行为失败,然后可以重新发出语音控制信号,或重新使用语音播放设备发出语音控制信号,所述语音播放设备中预先录有所述语音控制信号。
本实施例通过在用户不方便操作接收端设备或与接收端设备连接的外设设备时,发出语音控制信号,接收端设备在接收到用户发出的语音控制信号后,将该语音控制信号与预设语音指令进行匹配,从而确定该语音控制信号对应的预设语音指令,接收端设备在确定语音控制信号对应的预设语音指令后,将所述预设语音指令生成UIBC报文,并将所述UIBC报文传送至传送端设备,从而实现对传送端设备的控制,解决了当不方便直接操作接收端设备或外设时,而无法实现对传送端设备的控制的问题,使得接收端设备对传送端设备的控制更加方便。
可选地,基于第一实施例提出本发明语音控制装置的第二实施例,参照图6,所述匹配模块20包括处理单元21及匹配单元22。
所述处理单元21,设置为将获取的所述语音控制信号及所述预设语音 指令进行相应变换,以得到所述语音控制信号及所述预设语音指令的声谱图或特征向量;
所述匹配单元22,设置为将所述语音控制信号的声谱图或特征向量与所述预设语音指令的声谱图或特征向量进行匹配。
当所述预设语音指令的存储形式为语音波形时,由于不同的语音波形对应的语音指令可能相同,而不同的用户发出的相同的语音指令所对应的语音波形的差别可能很大,故直接通过将用户发出的语音控制信号所对应的语音波形与预设语音指令对应的语音波形进行匹配时,很难匹配成功,需要对所述预设语音指令及所述语音控制信号进行相应的处理。可选的,可以对获取的所述语音控制信号及所述预设语音指令进行相应变换,以得到所述语音控制信号及所述预设语音指令的声谱图或特征向量。当需要得到所述语音控制信号及所述预设语音指令的声谱图时,所述相应变换主要包括预加重处理、分帧处理、加窗处理、快速傅里叶变换处理及灰度级映射处理,经过上述处理过程后,得到所述语音控制信号及所述预设语音指令的声谱图。当需要得到所述语音控制信号及所述预设语音指令的特征向量时,所述特征向量以所述语音控制信号及所述预设语音指令的梅尔频率倒谱系数(Mel Frequency Cepstrum Coefficient,简称为MFCC)为例,所述相应变换主要包括预加重处理、分帧处理、加窗处理、快速傅里叶变换处理、三角带通滤波器进行滤波处理、计算每个滤波器组输出的对数能量、经离散余弦变换(DCT)得到MFCC系数、谱加权处理、倒谱均值减(Cepstrum Mean Subtraction,简称为CMS)处理及动态差分参数的提取(包括一阶差分和二阶差分)。在获得所述语音控制信号及所述预设语音指令的声谱图或者特征向量后,将所述语音控制信号的声谱图或者特征向量依次与所述预设语音指令的声谱图或者特征向量进行匹配,直到匹配成功为止或者与所有的预设语音指令都匹配失败为止。
可选地,所述处理单元21,还设置为将获取的所述语音控制信号进行相应变换,以得到所述语音控制信号的声谱图或特征向量;
所述匹配单元22,还设置为将所述语音控制信号的声谱图或特征向量与所述预设语音指令进行匹配。
当所述预设语音指令的存储形式为声谱图或特征向量时,由于语音信号的声谱图或者特征向量能够直接表征语音指令的特性,故在将获取的所述语音控制信号与预设语音指令进行匹配时,只需要将获取的所述语音控制信号进行相应的变换,从而得到所述语音控制信号的声谱图或特征向量,具体的变换过程在上述实施例中已描述,此处不再赘述。当获取到所述语音控制信号的声谱图或者特征向量时,将所述声谱图或者特征向量与各个所述语音指令依次进行匹配,直到匹配成功为止或者与所有的预设语音指令都匹配失败为止。
本实施例通过将语音控制信号及所述预设语音指令变换为相应的声谱图或者特征向量,然后将得到的声谱图或者特征向量进行匹配,从而提高了语音识别的准确性。
可选地,基于上述任一实施例提出本发明语音控制装置的第三实施例,参照图7,所述生成模块30包括:获取单元31、确定单元32及生成单元33。
所述获取单元31,设置为获取所述预设语音指令与所述UIBC报文中控制信号段的映射关系;
所述确定单元32,设置为根据所述映射关系确定匹配成功的所述预设语音指令对应的控制信号段;
所述生成单元33,设置为根据UIBC协议将所述预设语音指令对应的控制信号段生成UIBC报文。
在本实施例中,所述预设语音指令与所述UIBC报文中控制信号段存在一个映射表,即不同的预设语音指令对应不同的控制信号段。所述控制信号段为控制传送端设备行为对应的数据段。所述映射关系可以为所述预设语音指令与通用输入的信息的映射关系,或者所述预设语音指令与人机 接口设备类(HIDC)的信息的对应关系,即将所述语音控制指令转换为相应的通用输入控制信号或者人机接口设备类(HIDC)控制信号,然后根据UIBC协议将所述通用输入的信息或者人机接口设备类(HIDC)的信息生成UIBC报文,可选地,可以在所述UIBC协议中定义所述预设控制指令的控制信号段内容,在将所述预设语音指令生成UIBC报文时,则在将预设的语音指令生成UIBC报文时即可直接根据所述UIBC协议生成对应的UIBC报文。
本实施例通过将所述预设语音指令转换为UIBC协议中定义的指令,从而使得所述预设语音指令能控制所述传送端设备的行为,增加了UIBC功能的反馈形式。
上述本发明实施例序号仅仅为了描述,不代表实施例的优劣。通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本发明各个实施例所述的方法。
以上仅为本发明的优选实施例,并非因此限制本发明的专利范围,凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本发明的专利保护范围内。
工业实用性
如上所述,本发明实施例提供的一种语音控制方法及装置具有以下有益效果:解决了当不方便直接操作接收端设备或外设时,而无法实现对传送端设备的控制的问题,使得接收端设备对传送端设备的控制更加方便。

Claims (10)

  1. A voice control method, the voice control method comprising the following steps:
    a receiving end device acquiring a voice control signal for controlling a transmitting end device;
    the receiving end device matching the acquired voice control signal with a preset voice command;
    if the matching is successful, the receiving end device generating a UIBC message, according to the UIBC protocol, based on the successfully matched preset voice command, and transmitting the UIBC message to the transmitting end device, so that the transmitting end device controls its own behavior according to the received UIBC message.
  2. The voice control method according to claim 1, wherein the preset voice command is stored in the form of a voice waveform, and the step of the receiving end device matching the acquired voice control signal with the preset voice command comprises:
    the receiving end device performing a corresponding transformation on the acquired voice control signal and the preset voice command, to obtain spectrograms or feature vectors of the voice control signal and the preset voice command;
    the receiving end device matching the spectrogram or feature vector of the voice control signal with the spectrogram or feature vector of the preset voice command.
  3. The voice control method according to claim 1, wherein the preset voice command is stored in the form of a spectrogram or a feature vector, and the step of the receiving end device matching the acquired voice control signal with the preset voice command comprises:
    the receiving end device performing a corresponding transformation on the acquired voice control signal, to obtain a spectrogram or feature vector of the voice control signal;
    the receiving end device matching the spectrogram or feature vector of the voice control signal with the preset voice command.
  4. The voice control method according to claim 1, wherein the step of the receiving end device generating a UIBC message, according to the UIBC protocol, based on the successfully matched preset voice command comprises:
    the receiving end device acquiring a mapping relationship between the preset voice command and a control signal segment in the UIBC message;
    the receiving end device determining, according to the mapping relationship, the control signal segment corresponding to the successfully matched preset voice command;
    the receiving end device generating a UIBC message from the control signal segment corresponding to the preset voice command according to the UIBC protocol.
  5. The voice control method according to any one of claims 1 to 4, wherein after the step of the receiving end device matching the acquired voice control signal with the preset voice command, the voice control method further comprises:
    if the matching fails, the receiving end device prompting the user to re-issue a voice control signal.
  6. A voice control apparatus, the voice control apparatus comprising:
    an obtaining module, configured to acquire a voice control signal for controlling a transmitting end device;
    a matching module, configured to match the acquired voice control signal with a preset voice command;
    a generating module, configured to, if the matching is successful, generate a UIBC message, according to the UIBC protocol, based on the successfully matched preset voice command, and transmit the UIBC message to the transmitting end device, so that the transmitting end device controls its own behavior according to the received UIBC message.
  7. The voice control apparatus according to claim 6, wherein the preset voice command is stored in the form of a voice waveform, and the matching module comprises:
    a processing unit, configured to perform a corresponding transformation on the acquired voice control signal and the preset voice command, to obtain spectrograms or feature vectors of the voice control signal and the preset voice command;
    a matching unit, configured to match the spectrogram or feature vector of the voice control signal with the spectrogram or feature vector of the preset voice command.
  8. The voice control apparatus according to claim 7, wherein the preset voice command is stored in the form of a spectrogram or a feature vector, and the processing unit is further configured to perform a corresponding transformation on the acquired voice control signal, to obtain a spectrogram or feature vector of the voice control signal;
    the matching unit is further configured to match the spectrogram or feature vector of the voice control signal with the preset voice command.
  9. The voice control apparatus according to claim 6, wherein the generating module comprises:
    an acquiring unit, configured to acquire a mapping relationship between the preset voice command and a control signal segment in the UIBC message;
    a determining unit, configured to determine, according to the mapping relationship, the control signal segment corresponding to the successfully matched preset voice command;
    a generating unit, configured to generate a UIBC message from the control signal segment corresponding to the preset voice command according to the UIBC protocol.
  10. The voice control apparatus according to any one of claims 6 to 9, wherein the voice control apparatus further comprises:
    a prompting module, configured to prompt re-issuing of a voice control signal if the matching fails.
PCT/CN2016/107321 2016-02-17 2016-11-25 Voice control method and apparatus WO2017140153A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610089414.5 2016-02-17
CN201610089414.5A CN107093424A (zh) 2016-02-17 2016-02-17 Voice control method and apparatus

Publications (1)

Publication Number Publication Date
WO2017140153A1 true WO2017140153A1 (zh) 2017-08-24

Family

ID=59624726

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/107321 WO2017140153A1 (zh) 2016-02-17 2016-11-25 Voice control method and apparatus

Country Status (2)

Country Link
CN (1) CN107093424A (zh)
WO (1) WO2017140153A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871507A (zh) * 2017-12-26 2018-04-03 安徽声讯信息技术有限公司 一种语音控制ppt翻页方法及***
CN111949188A (zh) * 2020-08-12 2020-11-17 上海众链科技有限公司 用于智能终端的操作控制映射***、方法及计算机可读存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111845751B (zh) * 2020-07-28 2021-02-09 盐城工业职业技术学院 一种可切换控制多个农用拖拉机的控制终端

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719369A (zh) * 2009-12-02 2010-06-02 中兴通讯股份有限公司 投影仪的控制方法、装置以及终端
CN102339193A (zh) * 2010-07-21 2012-02-01 Tcl集团股份有限公司 一种声控会议演讲的方法及***
CN103209246A (zh) * 2012-01-16 2013-07-17 三星电子(中国)研发中心 一种通过蓝牙耳机控制手持设备的方法及手持设备
CN103530032A (zh) * 2012-07-06 2014-01-22 Lg电子株式会社 移动终端、图像显示装置及使用其的用户接口提供方法
CN104135540A (zh) * 2014-08-15 2014-11-05 南京奇幻通信科技有限公司 基于智能终端的远程语音控制技术的方法、智能终端和pc
CN104284246A (zh) * 2013-07-08 2015-01-14 华为终端有限公司 一种传输数据的方法及终端
CN104882141A (zh) * 2015-03-03 2015-09-02 盐城工学院 一种基于时延神经网络和隐马尔可夫模型的串口语音控制投影***
CN105325049A (zh) * 2013-07-19 2016-02-10 三星电子株式会社 用于通信的方法和装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8966131B2 (en) * 2012-01-06 2015-02-24 Qualcomm Incorporated System method for bi-directional tunneling via user input back channel (UIBC) for wireless displays
CN104202461A (zh) * 2014-08-11 2014-12-10 苏州易动智能科技有限公司 一种连接智能手机功能同步化的汽车音响***
CN204362241U (zh) * 2015-01-31 2015-05-27 深圳市芯晶彩科技有限公司 屏幕共享装置及***
CN105161106A (zh) * 2015-08-20 2015-12-16 深圳Tcl数字技术有限公司 智能终端的语音控制方法、装置及电视机***

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719369A (zh) * 2009-12-02 2010-06-02 中兴通讯股份有限公司 投影仪的控制方法、装置以及终端
CN102339193A (zh) * 2010-07-21 2012-02-01 Tcl集团股份有限公司 一种声控会议演讲的方法及***
CN103209246A (zh) * 2012-01-16 2013-07-17 三星电子(中国)研发中心 一种通过蓝牙耳机控制手持设备的方法及手持设备
CN103530032A (zh) * 2012-07-06 2014-01-22 Lg电子株式会社 移动终端、图像显示装置及使用其的用户接口提供方法
CN104284246A (zh) * 2013-07-08 2015-01-14 华为终端有限公司 一种传输数据的方法及终端
CN105325049A (zh) * 2013-07-19 2016-02-10 三星电子株式会社 用于通信的方法和装置
CN104135540A (zh) * 2014-08-15 2014-11-05 南京奇幻通信科技有限公司 基于智能终端的远程语音控制技术的方法、智能终端和pc
CN104882141A (zh) * 2015-03-03 2015-09-02 盐城工学院 一种基于时延神经网络和隐马尔可夫模型的串口语音控制投影***

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871507A (zh) * 2017-12-26 2018-04-03 安徽声讯信息技术有限公司 一种语音控制ppt翻页方法及***
CN111949188A (zh) * 2020-08-12 2020-11-17 上海众链科技有限公司 用于智能终端的操作控制映射***、方法及计算机可读存储介质

Also Published As

Publication number Publication date
CN107093424A (zh) 2017-08-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16890378

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16890378

Country of ref document: EP

Kind code of ref document: A1