WO2017012511A1 - Voice control method and device, and projector apparatus - Google Patents

Info

Publication number
WO2017012511A1
WO2017012511A1 PCT/CN2016/090170 CN2016090170W WO2017012511A1 WO 2017012511 A1 WO2017012511 A1 WO 2017012511A1 CN 2016090170 W CN2016090170 W CN 2016090170W WO 2017012511 A1 WO2017012511 A1 WO 2017012511A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
instruction
command
stored
wake
Prior art date
Application number
PCT/CN2016/090170
Other languages
French (fr)
Chinese (zh)
Inventor
朱渊
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017012511A1

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00: Programme-control systems
    • G05B19/02: Programme-control systems electric
    • G05B19/04: Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems

Definitions

  • This document relates to, but is not limited to, the field of communications, and in particular to a voice control method, a voice control device, and a projector device.
  • A projector is a device that projects images or video onto a screen. Through different interfaces it can be connected to a computer, a Video Compact Disc (VCD) player, a Digital Video Disc (DVD) player, a game console, and so on, to play the corresponding video signal. Projectors are widely used in homes, offices, schools, and entertainment venues. By application environment they fall into the following categories: home theater projectors, portable business projectors, education and conference projectors, mainstream engineering projectors, professional theater projectors, and measurement projectors.
  • A common feature of these projectors is that they are operated manually with a remote control. Manual operation is cumbersome, which leads to a poor user experience and little enjoyment.
  • This document provides a voice control method, a voice control device, and a projector device that address the problem in the related art that manually operating a projector is cumbersome and results in a poor user experience.
  • This document provides a voice control method that includes:
  • determining that a projector device enters a voice recognition state, wherein the voice recognition state is a state in which an operation is performed according to a voice instruction; receiving an input voice instruction; and identifying the received voice instruction and performing the operation corresponding to the voice instruction.
  • Optionally, determining that the projector device enters the voice recognition state comprises: determining that the projector device enters the voice recognition state by receiving a wake-up command, wherein the wake-up command includes one or more of the following:
  • a touch signal along a predetermined track, a voice signal, and a key signal.
  • Optionally, identifying the received voice instruction and performing the operation corresponding to the voice instruction includes: determining whether an instruction matching the voice instruction is stored in advance, and performing the operation corresponding to the voice instruction if it is.
  • Optionally, before identifying the received voice instruction and performing the operation corresponding to the voice instruction, the method further includes: acquiring the file name of a pre-stored file and/or the application name of a pre-installed application.
  • The acquired file name and/or application name is stored to a specified location. When voice recognition identifies a file name stored in the specified location, the file corresponding to that file name is invoked; when voice recognition identifies an application name stored in the specified location, the application corresponding to that application name is invoked.
  • Optionally, receiving the input voice instruction includes:
  • the projector device receives the voice instruction through a peripheral device, wherein the peripheral device includes one or more of the following:
  • wired earphones, Bluetooth headsets.
  • A voice control device comprising:
  • a determining module configured to determine that the projector device enters a voice recognition state, wherein the voice recognition state is a state in which an operation is performed according to a voice instruction;
  • a receiving module configured to receive an input voice instruction;
  • an execution module configured to identify the received voice instruction and perform the operation corresponding to the voice instruction.
  • The determining module includes:
  • a determining unit configured to determine that the projector device enters the voice recognition state by receiving a wake-up command, wherein the wake-up command includes one or more of the following:
  • a touch signal along a predetermined track, a voice signal, and a key signal.
  • The execution module includes:
  • a determination unit configured to determine whether an instruction matching the voice instruction is stored in advance;
  • an execution unit configured to perform the operation corresponding to the voice instruction when the determination unit determines that an instruction matching the voice instruction is stored in advance.
  • The foregoing device further includes:
  • an obtaining module configured to obtain the file name of a pre-stored file and/or the application name of a pre-installed application;
  • a storage module configured to store the file name and/or the application name to a specified location.
  • The execution unit is further configured to invoke the file corresponding to the recognized file name when voice recognition identifies a file name stored in the specified location, and to invoke the application corresponding to the recognized application name when voice recognition identifies an application name stored in the specified location.
  • Receiving the input voice instruction by the receiving module includes:
  • receiving the voice instruction through a peripheral device supported by the projector device, wherein the peripheral device includes one or more of the following:
  • wired earphones, Bluetooth headsets.
  • A projector device comprising at least a low-power wake-up chip, a speech engine, and a standard stream component, wherein
  • the low-power wake-up chip is configured to enter a voice recognition state according to a wake-up instruction, wherein the voice recognition state is a state in which an operation is performed according to a voice instruction;
  • the speech engine is configured to receive an input voice instruction;
  • the standard stream component is configured to identify the received voice instruction and perform the operation corresponding to the voice instruction.
  • By determining that the projector device enters a voice recognition state (a state in which operations are performed according to voice instructions), receiving an input voice instruction, and identifying the received voice instruction and performing the corresponding operation, the scheme addresses the problem in the related art that manually operating a projector is cumbersome and results in a poor user experience, thereby reducing the complexity of operating the projector and improving the user experience.
  • FIG. 1 is a flow chart of a voice control method according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing the structure of a voice control apparatus according to an embodiment of the present invention.
  • FIG. 3 is a structural block diagram of a determining module 22 in a voice control apparatus according to an embodiment of the present invention.
  • FIG. 4 is a block diagram showing the structure of an execution module 26 in a voice control apparatus according to an embodiment of the present invention.
  • FIG. 5 is a block diagram showing another structure of a voice control apparatus according to an embodiment of the present invention.
  • FIG. 6 is a block diagram showing the structure of a voice-activated projector system according to an embodiment of the present invention.
  • FIG. 7 is a low power consumption wake-up flowchart of a voice-activated projector system in accordance with an embodiment of the present invention.
  • FIG. 8 is a diagram showing the operational state of a voice-activated projector system in accordance with an embodiment of the present invention.
  • FIG. 1 is a flow chart of a voice control method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step S102: determining that the projector device enters a voice recognition state, wherein the voice recognition state is a state in which an operation is performed according to a voice instruction;
  • Step S104: receiving an input voice instruction;
  • Step S106: identifying the received voice instruction and performing the operation corresponding to the voice instruction.
  • Through the above steps, the projector device can be operated by voice instructions, which avoids the cumbersome steps of manual operation. This addresses the problem in the related art that manually operating a projector is cumbersome and results in a poor user experience, reducing the complexity of operating the projector and improving the user experience.
  • Determining that the projector device enters the voice recognition state comprises: determining that the projector device enters the voice recognition state by receiving a wake-up command, wherein the wake-up command includes one or more of the following:
  • a touch signal along a predetermined track, a voice signal, and a key signal.
  • Identifying the received voice instruction and performing the operation corresponding to the voice instruction includes: determining whether an instruction matching the voice instruction is stored in advance, and, if such an instruction is stored in advance, performing the operation corresponding to the voice instruction. If no matching instruction is stored, a prompt such as "unable to recognize the command" may be fed back.
  • Performing the operation corresponding to the voice instruction may include: executing the pre-stored instruction that matches the voice instruction.
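The matching step described above can be sketched as a table lookup. This is a minimal illustration, not the patent's implementation: the table contents, the handler functions, and the name `execute_voice_instruction` are all assumptions, and the input is taken to be text already produced by a recognizer.

```python
from typing import Callable, Dict


def open_projection() -> str:
    # Hypothetical operation; a real device would drive the projector here.
    return "projection opened"


def close_projection() -> str:
    # Hypothetical operation.
    return "projection closed"


# Assumed table of pre-stored instructions: recognized phrase -> operation.
PRESTORED: Dict[str, Callable[[], str]] = {
    "open the projection": open_projection,
    "close the projection": close_projection,
}

UNRECOGNIZED_PROMPT = "unable to recognize the command"


def execute_voice_instruction(recognized_text: str) -> str:
    """Run the pre-stored instruction that matches the text, or return the prompt."""
    # Normalize case and whitespace so that trivial variations still match.
    key = " ".join(recognized_text.lower().split())
    handler = PRESTORED.get(key)
    if handler is None:
        return UNRECOGNIZED_PROMPT
    return handler()
```

Keeping the unmatched case as an ordinary return value rather than an exception mirrors the text, where an unmatched instruction leads to a prompt and not to a failure.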
  • Before identifying the received voice instruction and performing the operation corresponding to the voice instruction, the method further includes: acquiring the file name of a pre-stored file and/or the application name of a pre-installed application, and storing the file name and/or the application name, wherein the file name is used to invoke the file corresponding to the file name according to the voice instruction, and the application name is used to invoke the application corresponding to the application name according to the voice instruction.
  • the purpose of storing the above file name and application name is to conveniently call the corresponding file and application according to the voice instruction.
  • When a new file is stored or a new application is installed, the file name of the newly stored file and the application name of the newly installed application are stored as well.
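The name store described above might look like the following sketch. The class `NameRegistry`, its methods, and the example path and application identifier are hypothetical; the text only says that names are kept at a "specified location" and updated when a file is stored or an application is installed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class NameRegistry:
    """Stand-in for the 'specified location' that holds callable names."""

    files: Dict[str, str] = field(default_factory=dict)  # spoken name -> file path
    apps: Dict[str, str] = field(default_factory=dict)   # spoken name -> application id

    def register_file(self, name: str, path: str) -> None:
        # Called for pre-stored files and again whenever a new file is stored.
        self.files[name.lower()] = path

    def register_app(self, name: str, app_id: str) -> None:
        # Called for pre-installed applications and again on each new install.
        self.apps[name.lower()] = app_id

    def resolve(self, spoken_name: str) -> Optional[Tuple[str, str]]:
        """Return ("file", path) or ("app", app_id) for a recognized name, else None."""
        key = spoken_name.lower()
        if key in self.files:
            return ("file", self.files[key])
        if key in self.apps:
            return ("app", self.apps[key])
        return None
```

Checking files before applications is an arbitrary choice made for the sketch; the source does not say how a name shared by a file and an application would be resolved.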
  • The projector device supports receiving the above voice instruction through a peripheral device, wherein the peripheral device includes one or more of the following:
  • wired earphones, Bluetooth headsets.
  • The method according to the above embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the more common implementation.
  • The part of the embodiments of the present invention that is essential, or that contributes over the related art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk).
  • The software product includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method described in the embodiments of the present invention.
  • A voice control device is also provided. It is used to implement the above embodiments and optional embodiments; what has already been described is not repeated.
  • As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a block diagram showing the structure of a voice control apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes a determining module 22, a receiving module 24, and an execution module 26. The apparatus is described below.
  • The determining module 22 is configured to determine that the projector device enters a voice recognition state, wherein the voice recognition state is a state in which an operation is performed according to a voice instruction; the receiving module 24 is connected to the determining module 22 and is configured to receive the input voice instruction; the execution module 26 is connected to the receiving module 24 and is configured to identify the received voice instruction and perform the operation that matches the voice instruction.
  • FIG. 3 is a block diagram showing the structure of the determining module 22 in the voice control apparatus according to the embodiment of the present invention.
  • The determining module 22 includes a determining unit 32, which is described below.
  • The determining unit 32 is configured to determine that the projector device enters the voice recognition state by receiving a wake-up command, wherein the wake-up command includes one or more of the following:
  • a touch signal along a predetermined track, a voice signal, and a key signal.
  • FIG. 4 is a structural block diagram of the execution module 26 in a voice control apparatus according to an embodiment of the present invention. As shown in FIG. 4, the execution module 26 includes a determination unit 42 and an execution unit 44. The execution module 26 is described below.
  • The determination unit 42 is configured to determine whether an instruction matching the voice instruction is stored in advance; the execution unit 44 is connected to the determination unit 42 and is configured to perform the operation corresponding to the voice instruction when the determination unit 42 determines that an instruction matching the voice instruction is stored in advance.
  • FIG. 5 is another structural block diagram of a voice control apparatus according to an embodiment of the present invention. As shown in FIG. 5, the apparatus includes an obtaining module 52 and a storage module 54 in addition to all the modules shown in FIG. The apparatus is described below.
  • The obtaining module 52 is configured to obtain the file name of a pre-stored file and/or the application name of a pre-installed application;
  • the storage module 54 is connected to the obtaining module 52 and the execution module 26, and is configured to store the file name and/or the application name to the specified location.
  • The execution unit is further configured to invoke the file corresponding to the recognized file name when voice recognition identifies a file name stored in the specified location, and to invoke the application corresponding to the recognized application name when voice recognition identifies an application name stored in the specified location.
  • The projector device described above supports receiving voice instructions through a peripheral device, wherein the peripheral device includes one or more of the following:
  • wired earphones, Bluetooth headsets.
  • This embodiment further provides a projector device that includes at least a low-power wake-up chip, a speech engine, and a standard stream component. The low-power wake-up chip is configured to enter a voice recognition state according to a wake-up instruction, wherein the voice recognition state is a state in which an operation is performed according to a voice instruction; the speech engine is configured to receive an input voice instruction; the standard stream component is configured to identify the received voice instruction and perform the operation corresponding to the voice instruction.
  • The low-power wake-up chip may be connected to the speech engine, and the speech engine may be connected to the standard stream component; the low-power wake-up chip and the standard stream component may or may not be connected to each other.
  • The related technologies may include the following aspects:
  • Speech recognition technology, also known as Automatic Speech Recognition (ASR).
  • Speech recognition technology aims to convert the lexical content of human speech into computer-readable input such as keystrokes, binary codes, or character sequences. It differs from speaker identification and speaker verification, which attempt to identify or verify the speaker rather than the lexical content of the speech.
  • Speech recognition technology lets a machine transform a speech signal into the corresponding text or command through a process of recognition and understanding. It mainly covers three aspects: feature extraction, pattern matching criteria, and model training.
  • Speech recognition tasks can be roughly divided into three categories: isolated word recognition, keyword recognition (or keyword spotting), and continuous speech recognition.
  • The task of isolated word recognition is to identify isolated words known in advance, such as "boot" and "shutdown".
  • The task of continuous speech recognition is to identify arbitrary continuous speech, such as a sentence or a paragraph.
  • Keyword spotting also operates on continuous speech, but it does not recognize every word; it only detects where one or more known keywords appear, such as detecting the two words "computer" and "world" in a passage.
  • Isolated word recognition may be adopted here: the voice instructions that need to be supported are written in advance into a grammar file, which the engine compiles to generate the corresponding recognition scope.
  • Only the instructions predefined in the grammar are supported for the user.
  • Low-power digital signal processor (DSP) voice wake-up technology refers to technology in which, after the wireless access point (Access Point, AP for short) of a terminal (e.g., a mobile phone) has gone to sleep (that is, the Central Processing Unit (CPU) has stopped working), a dedicated DSP processing unit can, through a specific trigger mode, wake up the CPU and return it to the working state. It targets voice control scenarios that completely free the hands. Building on maximum power saving while the phone system sleeps, it develops the operation of waking the phone by voice. This work can lay the groundwork for operating a phone entirely through "speech + auditory response" instead of "finger + visual touch" input, achieving a complete voice-based intelligent human-computer interaction experience.
  • Speech interruption refers to a special speech recognition technology for recognition under steady-state background sound. With this function, a user of the speech recognition system does not have to wait for the "click" prompt tone before speaking; the user can interrupt the prompt tone by voice and go directly into speech recognition (this process is called barge-in).
  • The key to speech interruption is the speech endpoint detection function.
  • The purpose of endpoint detection is to distinguish speech from non-speech in the signal stream in a complex application environment, and to determine where the speech signal starts and ends.
  • The signal stream in the related art carries a certain amount of background sound, whereas the speech recognition model is trained on speech signals, so pattern matching is meaningful only between a speech signal and the speech model. Detecting the speech signal in the signal stream is therefore a necessary pre-processing step for speech recognition.
  • endpoint detection can have two processes:
  • parameters such as energy, zero-crossing rate, entropy, pitch, and their derived parameters are used to determine the speech/non-speech signal in the signal stream.
  • endpoint detection is to:
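The speech/non-speech decision from frame-level parameters can be illustrated with two of the parameters named above, short-time energy and zero-crossing rate. This is a generic textbook-style sketch, not the patent's algorithm: the function names, the 160-sample frame length, and both thresholds are assumptions chosen for the demonstration signal.

```python
import math
from typing import List, Optional, Tuple


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(
    samples: List[float],
    frame_len: int = 160,          # assumed: 20 ms at 8 kHz
    energy_thresh: float = 0.01,   # assumed threshold
    zcr_thresh: float = 0.25,      # assumed threshold
) -> Optional[Tuple[int, int]]:
    """Return (start_sample, end_sample) of the detected speech region, or None."""
    speech_starts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = frame_energy(frame)
        loud = energy > energy_thresh
        # Weak but rapidly alternating frames are also treated as speech here.
        weak_but_busy = energy > energy_thresh / 4 and zero_crossing_rate(frame) > zcr_thresh
        if loud or weak_but_busy:
            speech_starts.append(start)
    if not speech_starts:
        return None
    return speech_starts[0], speech_starts[-1] + frame_len


# Demonstration signal: silence, a 200 Hz tone standing in for speech, silence.
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
signal = silence + tone + silence
endpoints = detect_endpoints(signal)
```

Fixed thresholds only work for a clean signal like this one; with real background sound they would typically have to adapt to the measured noise level.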
  • The user wakes up the projector with a wake-up command, and the projector enters the voice recognition state.
  • The wake-up command supports custom recording and training.
  • The user can also wake up the projector manually, for example by long-pressing the home button on the device to enter the voice recognition state.
  • The user can speak any preset voice command to tell the projector what to do next, such as: open the projection, close the projection, play ****, open *** (where *** is the file name of a video, the name of a PPT document, the application name of an installed application, etc.).
  • File names and application names can be loaded into the grammar automatically; an application is loaded into the grammar automatically as soon as it is installed in the system.
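Loading names into the grammar can be pictured as expanding slot templates. The sketch below assumes a grammar represented as a plain set of phrases; the template strings and the function `build_grammar` are illustrative and do not reflect any particular engine's grammar file format.

```python
from typing import Iterable, List, Set

# Assumed fixed commands and slot templates, modeled on the examples in the text.
FIXED_COMMANDS: List[str] = ["open the projection", "close the projection"]
SLOT_TEMPLATES: List[str] = ["play {name}", "open {name}"]


def build_grammar(names: Iterable[str]) -> Set[str]:
    """Expand every template with every known file or application name."""
    phrases = set(FIXED_COMMANDS)
    for name in names:
        for template in SLOT_TEMPLATES:
            phrases.add(template.format(name=name.lower()))
    return phrases


# Rebuilding after an install picks up the new name automatically.
grammar_before = build_grammar(["Holiday Video"])
grammar_after = build_grammar(["Holiday Video", "Budget"])
```

Rebuilding the whole phrase set on each change is the simplest approach; an engine that compiles grammars would more likely update only the affected slot.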
  • When the user inputs an instruction that is not preset in the projector, the projector prompts the user that the input is wrong and enters the re-input flow.
  • The user can control video playback through the voice interruption technology; voice commands can be input at any time during video playback.
  • Video control commands the user can say include: increase the volume, turn down the volume, pause, resume playing, and exit playback.
  • The user can control PPT playback through the voice interruption technology; voice commands can be input at any time during a PPT presentation.
  • PPT control commands the user can say include: previous page, next page, first page, last page, exit full screen, full screen playback, etc.
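One way to keep the video and PPT command sets apart is to scope them by playback context, as in this sketch. The context keys `"video"` and `"ppt"` and the function `accept_command` are assumptions; the phrases are the ones listed above.

```python
from typing import Dict, Optional, Set

# Command phrases taken from the text, grouped by an assumed playback context.
CONTEXT_COMMANDS: Dict[str, Set[str]] = {
    "video": {
        "increase the volume", "turn down the volume",
        "pause", "resume playing", "exit playback",
    },
    "ppt": {
        "previous page", "next page", "first page", "last page",
        "exit full screen", "full screen playback",
    },
}


def accept_command(context: str, phrase: str) -> Optional[str]:
    """Return the normalized phrase if it is valid in this context, else None."""
    normalized = " ".join(phrase.lower().split())
    if normalized in CONTEXT_COMMANDS.get(context, set()):
        return normalized
    return None
```

Restricting recognition to the active context keeps the candidate set small, which suits the isolated-word approach described earlier.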
  • When peripherals such as wired headsets and Bluetooth headsets are connected to the projector, they can be used as voice input devices to control it. For example, a user standing far from the projector can control it by voice through a Bluetooth headset.
  • Throughout the process, the projector shows User Interface (UI) prompts on the projection screen, and a voice or prompt tone tells the user when to start inputting an instruction, when input has ended, when the input is wrong, and so on.
  • FIG. 6 is a block diagram showing the structure of a voice-activated projector system according to an embodiment of the present invention. As shown in FIG. 6, the system consists of three parts: a low-power wake-up chip module (the Low-power Wakeup DSP Chip in FIG. 6, corresponding to the low-power wake-up chip described above), a recognition and broadcast engine module (the Voice Engine in FIG. 6, corresponding to the speech engine described above), and a standard stream component module (the Standard Flow Component in FIG. 6, corresponding to the standard stream component described above).
  • the main functions of each module are as follows:
  • the low-power wake-up chip module belongs to a hardware device and is configured to monitor a user's wake-up operation while the projector is sleeping;
  • the recognition and broadcast engine module is a core module for voice recognition and voice announcement, and is responsible for recognizing the collected audio.
  • FIG. 7 is a low power consumption wake-up flowchart of a voice-activated projector system according to an embodiment of the present invention. As shown in FIG. 7, the flow includes the following steps S702 to S712:
  • Step S702: the user speaks a wake-up word;
  • Step S704: the low-power wake-up chip continuously monitors the user's voice input while the projector is sleeping;
  • Step S706: when the user's voice input matches the pre-trained wake-up word, the low-power wake-up chip wakes up the CPU and reports a wake-up event to the driver layer;
  • Step S708: the framework layer then notifies the application layer by means of a message;
  • Step S710: the application layer starts the voice recognition process.
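The chain of notifications in these steps can be modeled as a small toy class. Everything here is an assumption made for illustration: the class `WakeupFlow`, its methods, and the text comparison that stands in for the chip's acoustic matching. The last step is read as the application layer bringing up the recognition flow, which is an interpretation of the translated text.

```python
from typing import List


class WakeupFlow:
    """Toy model of the wake-up chain; layer names follow the text, the API is invented."""

    def __init__(self, trained_wake_word: str) -> None:
        self.trained_wake_word = trained_wake_word.strip().lower()
        self.cpu_awake = False
        self.recognition_started = False
        self.events: List[str] = []

    def chip_hears(self, utterance: str) -> bool:
        # S704/S706: the chip listens while the projector sleeps. A real chip
        # matches audio features; comparing text is a simplification.
        if utterance.strip().lower() != self.trained_wake_word:
            return False
        self.cpu_awake = True
        self.events.append("chip -> driver layer: wake-up event")
        self._framework_notifies_application()
        return True

    def _framework_notifies_application(self) -> None:
        # S708: the framework layer passes the event on as a message.
        self.events.append("framework layer -> application layer: message")
        self._application_handles_wakeup()

    def _application_handles_wakeup(self) -> None:
        # S710: the application layer brings up the voice recognition flow.
        self.recognition_started = True
        self.events.append("application layer: voice recognition flow")
```

The CPU stays asleep for every utterance that does not match, which is the point of delegating the monitoring to a dedicated low-power chip.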
  • The low-power wake-up chip makes it possible to free the user's hands completely and to make the voice control process a closed loop. Because the low-power wake-up chip is a hardware component that may not be fitted on some projector models, the system supports leaving this module out on low-end projectors; the user can then wake the device by other means, such as a peripheral device or the projector's buttons.
  • FIG. 8 is a diagram showing an operation state of a voice-activated projector system according to an embodiment of the present invention, which is described below with reference to FIG. 8.
  • Once the device has been initialized and woken up, it enters the recording state and waits for the user to input a voice command.
  • Two things can happen at this point: either no sound is detected and the recognition process times out, or the projector records a sound and enters the subsequent recognition state.
  • In the recognition state, if the user is recognized to have spoken a valid instruction, it is dispatched to the corresponding standard stream component for processing; if the instruction cannot be recognized, the user is prompted that the input is wrong and may re-enter it or exit.
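The recording and recognition states described above can be sketched as a small state machine. The state names, the `step` function, and the use of `None` to represent "no sound" are assumptions for illustration; FIG. 8 itself is not reproduced in this text.

```python
from enum import Enum, auto
from typing import Optional, Set


class State(Enum):
    RECORDING = auto()    # waiting for the user to speak
    RECOGNIZING = auto()  # sound captured, matching against known commands
    DISPATCHED = auto()   # handed to the standard stream component
    TIMED_OUT = auto()    # no sound arrived
    INPUT_ERROR = auto()  # unrecognizable instruction, user is prompted


def step(state: State, audio: Optional[str], known: Set[str]) -> State:
    """Advance one transition; `audio` is None when nothing was recorded."""
    if state is State.RECORDING:
        return State.TIMED_OUT if audio is None else State.RECOGNIZING
    if state is State.RECOGNIZING:
        return State.DISPATCHED if audio in known else State.INPUT_ERROR
    if state is State.INPUT_ERROR:
        # The user chose to re-enter the instruction.
        return State.RECORDING
    return state


def run_once(audio: Optional[str], known: Set[str]) -> State:
    """Run from the recording state until a terminal or prompt state is reached."""
    state = step(State.RECORDING, audio, known)
    if state is State.RECOGNIZING:
        state = step(state, audio, known)
    return state
```

For example, `run_once("pause", {"pause"})` ends in `State.DISPATCHED`, while `run_once(None, {"pause"})` ends in `State.TIMED_OUT`.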
  • Recording interruption is a special recognition mode used under steady-state background sound, for example voice control during video playback. In this mode recording stays on continuously to detect the user's voice input, and the steady-state background sound is denoised. If a speech input matching a preset dynamic command is detected, the engine returns the recognition result and informs the standard stream component to perform the corresponding operation, then continues to detect the next voice input. Recording interruption does not stop until the user exits video playback.
  • A voice-activated projector system is proposed to solve the above problem.
  • The system combines hardware and software so that the user can wake up the projector by voice and issue voice commands.
  • The whole process can be a closed loop: every step is completed by voice control with no manual operation, which frees the user's hands and improves both the efficiency and the enjoyment of using the projector.
  • The system supports trimming: features and hardware configurations can be tailored as needed.
  • The above modules may be implemented by software or hardware.
  • This may be done in, but is not limited to, the following ways: the modules are all located in the same processor, or the modules are located in multiple processors respectively.
  • the embodiment of the invention further provides a storage medium.
  • the foregoing storage medium may be configured to store program code for performing the following steps:
  • The foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
  • the processor performs steps S1-S3 according to the stored program code in the storage medium.
  • A program may instruct the related hardware (e.g., a processor) to carry out the steps, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
  • all or part of the steps of the above embodiments may also be implemented using one or more integrated circuits.
  • The modules/units in the above embodiments may be implemented in the form of hardware, for example by an integrated circuit that implements the corresponding function, or in the form of software function modules, for example by a processor executing program instructions stored in a memory to achieve the corresponding function.
  • This application is not limited to any specific combination of hardware and software.
  • By determining that the projector device enters a voice recognition state (a state in which operations are performed according to voice instructions), receiving an input voice instruction, and identifying the received voice instruction and performing the corresponding operation, the scheme addresses the problem in the related art that manually operating a projector is cumbersome and results in a poor user experience, thereby reducing the complexity of operating the projector and improving the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Disclosed are a voice control method and device, and a projector apparatus. The method comprises: determining that a projector apparatus enters into a voice recognition state, wherein the voice recognition state is a state in which an operation is executed according to a voice instruction (S102); receiving an input voice instruction (S104); and recognizing the received voice instruction and executing an operation corresponding to the voice instruction (S106).

Description

语音控制方法、装置及投影仪设备Voice control method, device and projector device 技术领域Technical field
This document relates to, but is not limited to, the field of communications, and in particular to a voice control method, a voice control device, and a projector device.
Background
A projector is a device that can project images or video onto a screen. It can be connected through different interfaces to a computer, a Video Compact Disc (VCD), a Digital Video Disc (DVD), a game console, and the like, in order to play the corresponding video signal. Projectors are widely used in homes, offices, schools, and entertainment venues. According to the application environment, projectors can be divided into the following categories: home theater projectors, portable business projectors, education and conference projectors, mainstream engineering projectors, professional theater projectors, and measurement projectors.
These projectors share a common feature: operating them requires a hand-held remote control, and manual operation is cumbersome, which results in a poor user experience and a lack of fun.
No effective solution has yet been proposed for the problem in the related art that manual operation of a projector is cumbersome and results in a poor user experience.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
This document provides a voice control method, a voice control device, and a projector device, which can solve the problem in the related art that manual operation of a projector is cumbersome and results in a poor user experience.
This document provides a voice control method, including:
determining that a projector device enters a voice recognition state, wherein the voice recognition state is a state in which operations are performed according to voice instructions;
receiving an input voice instruction; and
recognizing the received voice instruction and performing an operation corresponding to the voice instruction.
Optionally, in the above method, determining that the projector device enters the voice recognition state includes:
determining that the projector device enters the voice recognition state by receiving a wake-up instruction, wherein the wake-up instruction includes one or more of the following:
a touch signal of a predetermined trajectory, a voice signal, and a key signal.
Optionally, in the above method, recognizing the received voice instruction and performing the operation corresponding to the voice instruction includes:
determining whether an instruction matching the voice instruction is stored in advance; and
when it is determined that an instruction matching the voice instruction is stored in advance, performing the operation corresponding to the voice instruction.
Optionally, in the above method, before recognizing the received voice instruction and performing the operation corresponding to the voice instruction, the method further includes:
acquiring the file name of a pre-stored file and/or the application name of a pre-installed application; and
storing the acquired file name and/or application name to a specified location, wherein, when a file name stored in the specified location is recognized by voice, the file corresponding to the recognized file name is invoked, and when an application name stored in the specified location is recognized by voice, the application corresponding to the recognized application name is invoked.
Optionally, in the above method, receiving the input voice instruction includes:
the projector device receiving the voice instruction through a peripheral device, wherein the peripheral device includes one or more of the following:
a wired headset and a Bluetooth headset.
This document also discloses a voice control device, including:
a determining module, configured to determine that a projector device enters a voice recognition state, wherein the voice recognition state is a state in which operations are performed according to voice instructions;
a receiving module, configured to receive an input voice instruction; and
an execution module, configured to recognize the received voice instruction and perform an operation corresponding to the voice instruction.
Optionally, in the above device, the determining module includes:
a determining unit, configured to determine that the projector device enters the voice recognition state by receiving a wake-up instruction, wherein the wake-up instruction includes one or more of the following:
a touch signal of a predetermined trajectory, a voice signal, and a key signal.
Optionally, in the above device, the execution module includes:
a judging unit, configured to determine whether an instruction matching the voice instruction is stored in advance; and
an execution unit, configured to perform the operation corresponding to the voice instruction when the judging unit determines that an instruction matching the voice instruction is stored in advance.
Optionally, the above device further includes:
an acquiring module, configured to acquire the file name of a pre-stored file and/or the application name of a pre-installed application; and
a storage module, configured to store the file name and/or the application name to a specified location,
wherein the execution unit is further configured to invoke the file corresponding to the recognized file name when a file name stored in the specified location is recognized by voice, and to invoke the application corresponding to the recognized application name when an application name stored in the specified location is recognized by voice.
Optionally, in the above device, the receiving module receiving the input voice instruction includes:
receiving the voice instruction through a peripheral device supported by the projector device, wherein the peripheral device includes one or more of the following:
a wired headset and a Bluetooth headset.
This document also discloses a projector device, including at least a low-power wake-up chip, a voice engine, and a standard flow component, wherein:
the low-power wake-up chip is configured to enter a voice recognition state according to a wake-up instruction, wherein the voice recognition state is a state in which operations are performed according to voice instructions;
the voice engine is configured to receive an input voice instruction; and
the standard flow component is configured to recognize the received voice instruction and perform an operation corresponding to the voice instruction.
With the technical solution provided herein, it is determined that the projector device enters a voice recognition state, wherein the voice recognition state is a state in which operations are performed according to voice instructions; an input voice instruction is received; and the received voice instruction is recognized and the operation corresponding to it is performed. This solves the problem in the related art that manual operation of a projector is cumbersome and results in a poor user experience, and achieves the effect of reducing the complexity of operating the projector and improving the user experience.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief description of the drawings
FIG. 1 is a flowchart of a voice control method according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a voice control device according to an embodiment of the present invention;
FIG. 3 is a structural block diagram of the determining module 22 in the voice control device according to an embodiment of the present invention;
FIG. 4 is a structural block diagram of the execution module 26 in the voice control device according to an embodiment of the present invention;
FIG. 5 is another structural block diagram of a voice control device according to an embodiment of the present invention;
FIG. 6 is a structural block diagram of a voice-controlled projector system according to an embodiment of the present invention;
FIG. 7 is a low-power wake-up flowchart of the voice-controlled projector system according to an embodiment of the present invention;
FIG. 8 is a working state diagram of the voice-controlled projector system according to an embodiment of the present invention.
Embodiments of the invention
The embodiments herein are described in detail below with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with each other arbitrarily.
It should be noted that the terms "first", "second", and the like in this document and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
This embodiment provides a voice control method. FIG. 1 is a flowchart of the voice control method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
Step S102: determine that the projector device enters a voice recognition state, wherein the voice recognition state is a state in which operations are performed according to voice instructions;
Step S104: receive an input voice instruction;
Step S106: recognize the received voice instruction and perform the operation corresponding to the voice instruction.
Through the above steps, the projector device can be operated by voice instructions, which avoids the cumbersome steps of manual operation. This solves the problem in the related art that manual operation of a projector is cumbersome and results in a poor user experience, and achieves the effect of reducing the complexity of operating the projector and improving the user experience.
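Steps S102 to S106 can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the patented implementation: the class and member names (`Projector`, `State`, `WakeSignal`, `wake`, `handle`) are hypothetical, and speech recognition itself is assumed to have already produced the instruction text.

```python
from enum import Enum, auto


class State(Enum):
    SLEEPING = auto()     # not yet in the voice recognition state
    RECOGNIZING = auto()  # operations are performed according to voice instructions


class WakeSignal(Enum):
    TOUCH_TRAJECTORY = auto()  # touch signal of a predetermined trajectory
    VOICE = auto()             # voice signal
    KEY = auto()               # key signal


class Projector:
    """Hypothetical model of the S102 to S106 flow; all names are illustrative."""

    def __init__(self, operations):
        # operations maps recognized instruction text to a callable
        self.state = State.SLEEPING
        self.operations = operations

    def wake(self, signal):
        # S102: any supported wake-up signal moves the device into the
        # voice recognition state
        if isinstance(signal, WakeSignal):
            self.state = State.RECOGNIZING
        return self.state

    def handle(self, instruction):
        # S104: receive the instruction; S106: look it up and execute it
        if self.state is not State.RECOGNIZING:
            return None  # instructions are ignored until the device is woken
        operation = self.operations.get(instruction)
        return operation() if operation is not None else None


projector = Projector({"open projection": lambda: "projection on"})
ignored = projector.handle("open projection")  # still sleeping, nothing happens
projector.wake(WakeSignal.KEY)
result = projector.handle("open projection")
print(ignored, result)
```

The sketch keeps the wake-up decision separate from instruction handling, which mirrors the split between the determining step and the receive/execute steps in the text.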
In an optional embodiment, determining that the projector device enters the voice recognition state includes: determining that the projector device enters the voice recognition state by receiving a wake-up instruction, wherein the wake-up instruction includes one or more of the following:
a touch signal of a predetermined trajectory, a voice signal, and a key signal.
In an optional embodiment, recognizing the received voice instruction and performing the operation corresponding to the voice instruction includes: determining whether an instruction matching the voice instruction is stored in advance; and, when it is determined that such an instruction is stored in advance, performing the operation corresponding to the voice instruction. If no instruction matching the voice instruction is stored, prompt information may be fed back, for example the prompt "Unable to recognize this instruction".
Performing the operation corresponding to the voice instruction may include: executing the pre-stored instruction that matches the voice instruction.
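The match-then-execute logic, including the fallback prompt, might look like the sketch below. It is illustrative only: the function name `execute_voice_instruction`, the dictionary of stored instructions, and the case and whitespace normalization are assumptions that do not appear in the source.

```python
UNRECOGNIZED_PROMPT = "Unable to recognize this instruction"


def execute_voice_instruction(recognized_text, stored_instructions):
    """Run the pre-stored instruction matching the text, or return the prompt.

    stored_instructions maps instruction text to a zero-argument callable.
    Lower-casing and collapsing whitespace is an assumed normalization step.
    """
    key = " ".join(recognized_text.lower().split())
    action = stored_instructions.get(key)
    if action is None:
        # No matching instruction is stored: feed back prompt information
        return UNRECOGNIZED_PROMPT
    return action()


stored = {
    "open projection": lambda: "projection opened",
    "close projection": lambda: "projection closed",
}
print(execute_voice_instruction("  Open   Projection ", stored))
print(execute_voice_instruction("make coffee", stored))
```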
In an optional embodiment, before recognizing the received voice instruction and performing the operation corresponding to the voice instruction, the method further includes: acquiring the file name of a pre-stored file and/or the application name of a pre-installed application; and storing the file name and/or application name, wherein the file name is used to invoke, according to a voice instruction, the file corresponding to the file name, and the application name is used to invoke, according to a voice instruction, the application corresponding to the application name. The purpose of storing the file names and application names is to make it convenient to invoke the corresponding files and applications by voice instruction. When a new file is stored or a new application is installed, the file name of the newly stored file and the application name of the newly installed application are stored as well.
In an optional embodiment, the projector device supports receiving the voice instruction through a peripheral device, wherein the peripheral device includes one or more of the following:
a wired headset and a Bluetooth headset.
From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the more common implementation. Based on this understanding, the part of the embodiments of the present invention that is essential, or that contributes to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the method described in the embodiments of the present invention.
This embodiment also provides a voice control device, which is used to implement the above embodiments and optional implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 2 is a structural block diagram of a voice control device according to an embodiment of the present invention. As shown in FIG. 2, the device includes a determining module 22, a receiving module 24, and an execution module 26. The device is described below.
The determining module 22 is configured to determine that the projector device enters a voice recognition state, wherein the voice recognition state is a state in which operations are performed according to voice instructions; the receiving module 24 is connected to the determining module 22 and configured to receive an input voice instruction; the execution module 26 is connected to the receiving module 24 and configured to recognize the received voice instruction and perform the operation matching the voice instruction.
FIG. 3 is a structural block diagram of the determining module 22 in the voice control device according to an embodiment of the present invention. As shown in FIG. 3, the determining module 22 includes a determining unit 32. The determining module 22 is described below.
The determining unit 32 is configured to determine that the projector device enters the voice recognition state by receiving a wake-up instruction, wherein the wake-up instruction includes one or more of the following:
a touch signal of a predetermined trajectory, a voice signal, and a key signal.
FIG. 4 is a structural block diagram of the execution module 26 in the voice control device according to an embodiment of the present invention. As shown in FIG. 4, the execution module 26 includes a judging unit 42 and an execution unit 44. The execution module 26 is described below:
The judging unit 42 is configured to determine whether an instruction matching the voice instruction is stored in advance; the execution unit 44 is connected to the judging unit 42 and configured to perform the operation corresponding to the voice instruction when the judging unit 42 determines that an instruction matching the voice instruction is stored in advance.
FIG. 5 is another structural block diagram of a voice control device according to an embodiment of the present invention. As shown in FIG. 5, in addition to all the modules shown in FIG. 2, the device includes an acquiring module 52 and a storage module 54. The device is described below:
The acquiring module 52 is configured to acquire the file name of a pre-stored file and/or the application name of a pre-installed application; the storage module 54 is connected to the acquiring module 52 and the execution module 26 and configured to store the file name and/or the application name to a specified location. In this case, the execution unit is further configured to invoke the file corresponding to the recognized file name when a file name stored in the specified location is recognized by voice, and to invoke the application corresponding to the recognized application name when an application name stored in the specified location is recognized by voice.
Optionally, the projector device supports receiving voice instructions through a peripheral device, wherein the peripheral device includes one or more of the following:
a wired headset and a Bluetooth headset.
This embodiment also provides a projector device, which includes at least a low-power wake-up chip, a voice engine, and a standard flow component. The low-power wake-up chip is configured to enter a voice recognition state according to a wake-up instruction, wherein the voice recognition state is a state in which operations are performed according to voice instructions; the voice engine is configured to receive an input voice instruction; and the standard flow component is configured to recognize the received voice instruction and perform the operation corresponding to the voice instruction. The low-power wake-up chip may be connected to the voice engine, and the voice engine may be connected to the standard flow component; the low-power wake-up chip and the standard flow component may or may not be connected to each other.
In the embodiments of the present invention, the technologies involved may include the following aspects:
1. Speech recognition technology:
As a hot topic in the related art, speech recognition technology has penetrated many fields and has opened up a progression of human-computer interaction modes from "keyboard interaction" and "touch interaction" to "voice interaction", making it possible for people to free their hands and work more efficiently.
Speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the lexical content of human speech into computer-readable input, such as key presses, binary codes, or character sequences. It differs from speaker recognition and speaker verification, which attempt to identify or verify the speaker who produced the speech rather than the lexical content it contains. Speech recognition is the technology that lets a machine convert a speech signal into the corresponding text or command through a process of recognition and understanding. It mainly involves three aspects: feature extraction, pattern matching criteria, and model training.
Depending on what is to be recognized, speech recognition tasks can be roughly divided into three categories: isolated word recognition, keyword recognition (also called keyword spotting), and continuous speech recognition. The task of isolated word recognition is to recognize isolated words known in advance, such as "power on" and "power off". The task of continuous speech recognition is to recognize arbitrary continuous speech, such as a sentence or a passage. Keyword spotting operates on a continuous speech stream but does not recognize all the words; it only detects where one or more known keywords occur, for example detecting the words "computer" and "world" in a passage.
Optionally, the embodiments of the present invention may use isolated-word speech recognition: the voice instructions that need to be supported are edited in advance into a grammar file, which the engine compiles to generate the corresponding recognition scope. In use, only the instructions predefined in the grammar are supported.
2. Low-power wake-up:
Low-power Digital Signal Processor (DSP) voice wake-up refers to a technology in which, after the wireless access point (Access Point, AP) of a terminal (for example, a mobile phone) has gone to sleep (that is, the Central Processing Unit (CPU) has stopped working), a processing unit specific to the DSP, triggered in a specific way, wakes up the CPU so that it re-enters the working state. The technology targets voice control scenarios in which the hands are completely freed, and develops the voice wake-up operation on the basis of maximum power saving while the phone system is asleep. This development work can pave the way for phone operation in which "voice + auditory response" fully replaces "finger + visual touch" as the input mode, achieving a fully voice-driven, intelligent human-computer interaction experience.
3. Voice interruption:
Voice interruption refers to a special speech recognition technology that performs speech recognition against a steady-state background sound. With this function, a user of a speech recognition system does not have to wait for the "beep" before speaking; instead, the user can interrupt the prompt tone by voice at any time and go directly into speech recognition (this process is called barge-in).
The key to voice interruption is speech endpoint detection. The purpose of endpoint detection is to distinguish speech from non-speech in the signal stream of a complex application environment and to determine where speech begins and ends. Signal streams in the related art all contain some background sound, whereas speech recognition models are trained on speech signals, so pattern matching is meaningful only between a speech signal and a speech model. Detecting the speech signal in the signal stream is therefore a necessary preprocessing step for speech recognition.
Endpoint detection may involve two processes:
a) Based on the characteristics of the speech signal, parameters such as energy, zero-crossing rate, entropy, and pitch, together with their derived parameters, are used to distinguish speech from non-speech in the signal stream.
b) After a speech signal is detected in the signal stream, it is determined whether this point is the start or the end of an utterance. In commercial speech systems, variable signal backgrounds and natural conversational style make pauses (non-speech) within a sentence more likely; in particular, there is always a silent gap before a plosive initial consonant. This start/end decision is therefore especially important.
In addition, endpoint detection also serves to:
a) Reduce the amount of data processed by the recognizer: it can greatly reduce the amount of signal transmitted and the computational load on the recognizer, which is important for real-time recognition of voice dialogue.
b) Reject non-speech signals: recognizing a non-speech signal is not only a waste of resources but may also change the state of the dialogue and confuse the user.
c) Provide the starting point of speech, which is required in systems that need the barge-in function. When endpoint detection finds the starting point of speech, the system stops playing the prompt tone, which completes the interruption function.
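The two endpoint-detection processes described above can be illustrated with a frame-based detector that uses only energy and zero-crossing rate. This is a simplified sketch, not the detector used in the patent: the thresholds, the frame length of 160 samples, the "hangover" rule for tolerating short pauses, and all function names are assumptions chosen for illustration.

```python
import math


def frame_features(frame):
    """Return (short-time energy, zero-crossing rate) for one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / max(len(frame) - 1, 1)
    return energy, zcr


def is_speech(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Process a): classify one frame as speech or non-speech."""
    energy, zcr = frame_features(frame)
    # Loud frames count as speech; quieter frames count only if they also
    # have a high zero-crossing rate (an assumed heuristic for weak,
    # noise-like consonants).
    return energy > energy_threshold or (
        energy > energy_threshold / 4 and zcr > zcr_threshold
    )


def detect_endpoints(samples, frame_len=160, hangover_frames=5):
    """Process b): return (start_frame, end_frame) of the first utterance.

    The utterance ends only after hangover_frames consecutive non-speech
    frames, so a short pause inside a sentence does not end it.
    Returns None when no speech frame is found.
    """
    start = None
    end = None
    silent_run = 0
    for i in range(len(samples) // frame_len):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if is_speech(frame):
            if start is None:
                start = i  # a barge-in system would stop the prompt tone here
            end = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover_frames:
                break
    return None if start is None else (start, end)


def tone(n_frames, frame_len=160, freq=200.0, rate=8000.0, amp=0.5):
    """Synthetic stand-in for speech: a sine tone lasting n_frames frames."""
    return [amp * math.sin(2 * math.pi * freq * n / rate)
            for n in range(n_frames * frame_len)]


def silence(n_frames, frame_len=160):
    return [0.0] * (n_frames * frame_len)


# Silence, "speech", a short pause, more "speech", then trailing silence.
signal = silence(5) + tone(10) + silence(2) + tone(5) + silence(10)
print(detect_endpoints(signal))
```

The two-frame pause in the synthetic signal is shorter than the hangover, so both tone bursts are reported as one utterance, which is the behavior the text asks for around in-sentence pauses.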
The technical solution of the system is as follows:
When the device is asleep, the user wakes up the projector with a wake-up instruction, and the projector enters the voice recognition state. The wake-up instruction supports custom recording and training.
The user may also wake up the projector manually, for example by pressing and holding the home key, to enter the voice recognition state.
The user can then speak any of the preset voice instructions to tell the projector what to do next, for example: open projection, close projection, play ****, open *** (where *** is the file name of a video, the name of a PPT document, the application name of an installed application, or the like). Once a file is copied to the projector's storage, its file name is automatically loaded into the speakable grammar; likewise, once an application is installed on the system, its application name is automatically loaded into the speakable grammar. In other words, the file name of a pre-stored file and/or the application name of a pre-installed application is acquired first, and the acquired file name and/or application name is then stored to a specified location (that is, loaded into the speakable grammar). When a file name stored in the specified location (that is, a file name loaded into the speakable grammar) is recognized by voice, the file corresponding to the recognized file name is invoked; when an application name stored in the specified location (that is, an application name loaded into the speakable grammar) is recognized by voice, the application corresponding to the recognized application name is invoked.
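Loading file names and application names into the speakable grammar could be sketched as follows. The phrase templates ("play ...", "open ..."), the action labels, the stripping of file extensions, and the function name `build_grammar` are all assumptions made for this example; the source does not specify the grammar format.

```python
import os

# Fixed instructions taken from the examples in the text
FIXED_INSTRUCTIONS = ["open projection", "close projection"]


def build_grammar(file_names, app_names):
    """Map each speakable phrase to a hypothetical (action, target) pair."""
    grammar = {phrase: ("builtin", phrase) for phrase in FIXED_INSTRUCTIONS}
    for file_name in file_names:
        # Assumption: users say the name without its extension
        stem, _extension = os.path.splitext(file_name)
        spoken = stem.lower()
        grammar["play " + spoken] = ("open_file", file_name)
        grammar["open " + spoken] = ("open_file", file_name)
    for app_name in app_names:
        grammar["open " + app_name.lower()] = ("launch_app", app_name)
    return grammar


grammar = build_grammar(["Holiday.mp4", "Quarterly Review.pptx"], ["Gallery"])
for phrase in sorted(grammar):
    print(phrase, "->", grammar[phrase])
```

Rebuilding the mapping whenever a file is copied or an application is installed would keep the grammar in step with the device contents, as the text describes. A file and an application that share a spoken name would collide in this sketch, with the application winning.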
If the user speaks an instruction that is not preset in the projector, the projector notifies the user of the input error and enters the flow for re-entering an instruction.
Once a video starts playing, the user can control video playback by voice throughout, using the voice interruption technology; that is, a voice instruction can be given at any time during playback. Video control instructions that the user can speak include: volume up, volume down, pause, resume, exit playback, and so on.
Once a PPT presentation starts, the user can control the presentation by voice throughout, using the voice interruption technology; that is, a voice instruction can be given at any time during the presentation. PPT control instructions that the user can speak include: previous page, next page, first page, last page, exit full screen, full-screen playback, and so on.
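One way to picture the context-dependent instruction sets is a lookup keyed by the current activity. The sketch below is hypothetical: the context labels `"video"` and `"ppt"`, the English phrasing of the instructions, and the function name `accept_command` are assumptions; the instruction lists follow the examples in the text.

```python
VIDEO_COMMANDS = {"volume up", "volume down", "pause", "resume", "exit playback"}
PPT_COMMANDS = {
    "previous page", "next page", "first page", "last page",
    "exit full screen", "full screen",
}

COMMANDS_BY_CONTEXT = {"video": VIDEO_COMMANDS, "ppt": PPT_COMMANDS}


def accept_command(context, spoken):
    """Return the normalized command if it is valid in this context, else None."""
    allowed = COMMANDS_BY_CONTEXT.get(context, set())
    command = " ".join(spoken.lower().split())
    return command if command in allowed else None


print(accept_command("video", "Pause"))
print(accept_command("ppt", "next page"))
print(accept_command("ppt", "pause"))  # not a PPT instruction in this sketch
```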
Voice control of the projector through peripheral devices is supported. Once a peripheral device such as a wired headset or a Bluetooth headset is connected to the projector, it can serve as the voice input device for controlling the projector. For example, the user can stand at a distance from the projector and control it by voice through a Bluetooth headset.
Throughout the flow, the projector shows User Interface (UI) prompts on the projection screen, and a voice or prompt tone tells the user when to start speaking an instruction, when input has ended, when an input error has occurred, and so on.
以下将结合附图对本发明实施例的的方案进行较为详尽的说明。The solution of the embodiment of the present invention will be described in detail below with reference to the accompanying drawings.
FIG. 6 is a structural block diagram of a voice-controlled projector system according to an embodiment of the present invention. As shown in FIG. 6, the system consists mainly of three parts: a low-power wake-up chip module (the Low-power Wakeup DSP Chip in FIG. 6, i.e. the low-power wake-up chip described above), a recognition and announcement engine module (the Voice Engine in FIG. 6, i.e. the speech engine described above), and a standard stream component module (the Standard Flow Component in FIG. 6, i.e. the standard stream component described above). The main functions of each module are as follows:
The low-power wake-up chip module is a hardware device configured to monitor for the user's wake-up operation while the projector is asleep. The recognition and announcement engine module is the core module for speech recognition and voice announcement; it is responsible for recognizing the collected audio and synthesizing the announcement content as speech. The standard stream component module implements each function point, such as voice control of video playback and voice control for opening applications; each function point exists in the form of a stream and has its own life cycle.
FIG. 7 is a flowchart of low-power wake-up of the voice-controlled projector system according to an embodiment of the present invention. As shown in FIG. 7, the flow includes the following steps S702 to S712:
Step S702: the user speaks the wake-up word;
Step S704: the low-power wake-up chip continuously monitors the user's voice input while the projector is asleep;
Step S706: when the user's voice input matches the preset, pre-trained wake-up word, the low-power wake-up chip wakes the CPU and reports a wake-up event to the driver layer;
Step S708: the framework layer then notifies the application layer by means of a message;
Step S710: the application layer invokes the speech recognition flow;
Step S712: end.
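The hand-off in steps S702 to S712 can be sketched as a chain of layers, each notifying the next. This is a simplified illustration: the class names, the `"WAKE_UP"` message, and the text comparison standing in for the chip's acoustic matching are all assumptions; a real wake-up chip matches audio against a trained model, not strings.

```python
class Application:
    """Application layer: starts the recognition flow on wake-up (S710)."""

    def __init__(self):
        self.recognition_started = False

    def on_message(self, message):
        if message == "WAKE_UP":
            self.recognition_started = True


class Framework:
    """Framework layer: forwards driver events to the application (S708)."""

    def __init__(self, app):
        self.app = app

    def on_driver_event(self, event):
        self.app.on_message(event)


class Driver:
    """Driver layer: receives the wake-up event reported by the chip (S706)."""

    def __init__(self, framework):
        self.framework = framework

    def report(self, event):
        self.framework.on_driver_event(event)


class WakeupChip:
    """Stand-in for the low-power wake-up chip (S704 and S706)."""

    def __init__(self, wake_word, driver):
        self.wake_word = wake_word.strip().lower()
        self.driver = driver

    def hear(self, utterance):
        # Assumption: plain text comparison replaces acoustic matching.
        if utterance.strip().lower() == self.wake_word:
            self.driver.report("WAKE_UP")
            return True
        return False
```

Wiring the objects as `WakeupChip(word, Driver(Framework(Application())))` reproduces the order chip, driver layer, framework layer, application layer.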
The low-power wake-up chip makes it possible to free the user's hands completely and to make the voice control flow a closed loop. Because the low-power wake-up chip is a hardware component that cannot be fitted to some projector models, the system supports omitting this module on low-configuration projectors; the user can then wake the projector in other ways, such as through a peripheral device or a projector button.
FIG. 8 is a working state diagram of the voice-controlled projector system according to an embodiment of the present invention, which is described below with reference to FIG. 8:
After the device has finished initializing and has been woken up, it enters the recording state and waits for the user to input a voice instruction. Two things can happen at this point: either the user says nothing and the recognition flow ends on timeout, or the user speaks, the speech is recorded by the projector, and the device enters the subsequent recognition state. In the recognition state, if a correct instruction is recognized, it is dispatched to the corresponding standard stream component for processing; if the instruction is unrecognizable, the user is prompted that the input is wrong and may re-enter it or exit.
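The states described above can be traced with a small state machine. The sketch below is an assumption-laden illustration: the state names and the `run_session` function are invented for the example, and silence is modelled as `None` rather than as an actual recording timeout.

```python
from enum import Enum, auto


class State(Enum):
    RECORDING = auto()     # waiting for the user to speak
    RECOGNIZING = auto()   # speech was captured and is being recognized
    DISPATCHED = auto()    # instruction handed to a standard stream component
    PROMPT_ERROR = auto()  # unrecognizable instruction, user is prompted
    TIMED_OUT = auto()     # no speech, recognition flow ends


def run_session(utterance, known_commands):
    """Return the states visited for one utterance (None models silence)."""
    trace = [State.RECORDING]
    if utterance is None:
        trace.append(State.TIMED_OUT)
        return trace
    trace.append(State.RECOGNIZING)
    if utterance.strip().lower() in known_commands:
        trace.append(State.DISPATCHED)
    else:
        trace.append(State.PROMPT_ERROR)
    return trace
```

From `PROMPT_ERROR` a fuller model would loop back to `RECORDING` or exit, depending on the user's choice; that loop is left out to keep the sketch short.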
Recording barge-in is a special recognition mode used under steady-state background sound, such as voice control during video playback. In this mode, recording stays on continuously to detect the user's voice input, and noise cancellation is applied against the steady-state background sound. If a voice input matching a preset dynamic instruction is detected, the engine returns the recognition result and notifies the standard stream component to perform the corresponding operation, and then continues to detect the next voice input. Recording barge-in does not stop until the user exits video playback.
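The continuous detection loop of recording barge-in can be sketched as below. This is only an outline under stated assumptions: utterances arrive as already-recognized text, noise cancellation is not modelled, and the function name `barge_in_loop` and the `"exit playback"` phrase are hypothetical.

```python
def barge_in_loop(utterances, commands, exit_command="exit playback"):
    """Process recognized utterances until the exit command is heard.

    utterances: iterable of recognized text (stand-in for live audio).
    commands:   dict mapping spoken phrase -> action label.
    Returns the list of actions that were dispatched, in order.
    """
    handled = []
    for utterance in utterances:
        text = utterance.strip().lower()
        if text not in commands:
            continue  # background sound or unknown speech: keep listening
        handled.append(commands[text])
        if text == exit_command:
            break  # barge-in ends only when the user exits playback
    return handled
```

Unmatched input is simply skipped here, which reflects that the loop keeps listening rather than ending the session as the one-shot recognition flow does.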
In the embodiments of the present invention, a voice-controlled projector system is proposed to address the problems that manual operation of a projector device is cumbersome, the user experience is poor, and the device lacks appeal. Through cooperating hardware and software, the system lets the user wake the projector by voice and issue voice instructions. The whole flow can operate as a closed loop: every step is completed by voice control and no manual operation is required, which frees the user's hands and greatly improves the efficiency and enjoyment of using the projector. The system can also be trimmed: functions and hardware configuration can be cut down as needed.
It should be noted that the above modules may be implemented in software or in hardware. In the latter case, this may be done in the following ways, without limitation: the modules are all located in the same processor, or the modules are located in multiple processors respectively.
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: determine that the projector device has entered a voice recognition state, where the voice recognition state is a state in which operations are performed according to voice instructions;
S2: receive an input voice instruction;
S3: recognize the received voice instruction and perform the operation corresponding to it.
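Steps S1 to S3 can be illustrated with a small controller that refuses instructions until it is in the recognition state. The class `VoiceController`, its method names, and the `"input error"` return value are assumptions for the example and do not come from this document.

```python
class VoiceController:
    """Hypothetical controller following steps S1 to S3."""

    def __init__(self, handlers):
        # handlers: dict mapping instruction text -> zero-argument callable
        self.handlers = handlers
        self.listening = False

    def wake(self):
        # S1: the device enters the voice recognition state.
        self.listening = True

    def handle(self, instruction):
        # S2: receive the instruction (only in the recognition state).
        if not self.listening:
            return None
        # S3: recognize it and perform the corresponding operation.
        handler = self.handlers.get(instruction.strip().lower())
        if handler is None:
            return "input error"
        return handler()
```

For example, `VoiceController({"open projection": lambda: "projection on"})` ignores input until `wake()` has been called, then maps the spoken text to its handler.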
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Optionally, in this embodiment, the processor performs steps S1 to S3 according to the program code stored in the storage medium.
A person of ordinary skill in the art will understand that all or some of the steps of the above methods may be completed by a program instructing the relevant hardware (for example, a processor), and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, the modules/units in the above embodiments may be implemented in hardware, for example by integrated circuits that realize their respective functions, or as software function modules, for example by a processor executing program instructions stored in a memory. This application is not limited to any specific combination of hardware and software.
Industrial applicability
According to the embodiments of the present invention, it is determined that the projector device has entered a voice recognition state, where the voice recognition state is a state in which operations are performed according to voice instructions; an input voice instruction is received; and the received voice instruction is recognized and the operation corresponding to it is performed. This solves the problem in the related art that manually operating a projector is cumbersome and results in a poor user experience, and achieves the effect of reducing the complexity of operating the projector and improving the user experience.

Claims (11)

  1. A voice control method, comprising:
    determining that a projector device has entered a voice recognition state, wherein the voice recognition state is a state in which operations are performed according to voice instructions;
    receiving an input voice instruction;
    recognizing the received voice instruction, and performing an operation corresponding to the voice instruction.
  2. The method according to claim 1, wherein determining that the projector device has entered the voice recognition state comprises:
    determining that the projector device has entered the voice recognition state by receiving a wake-up instruction, wherein the wake-up instruction comprises one or more of the following:
    a touch signal of a predetermined trajectory, a voice signal, a key signal.
  3. The method according to claim 1, wherein recognizing the received voice instruction and performing the operation corresponding to the voice instruction comprises:
    determining whether an instruction matching the voice instruction is stored in advance;
    when the determination result is that an instruction matching the voice instruction is stored in advance, performing the operation corresponding to the voice instruction.
  4. The method according to claim 1, wherein, before recognizing the received voice instruction and performing the operation corresponding to the voice instruction, the method further comprises:
    obtaining file names of pre-stored files and/or application names of pre-installed applications;
    storing the obtained file names and/or application names at a specified location, wherein, when a file name stored at the specified location is recognized by speech recognition, the file corresponding to the recognized file name is invoked, and when an application name stored at the specified location is recognized by speech recognition, the application corresponding to the recognized application name is invoked.
  5. The method according to any one of claims 1 to 4, wherein receiving the input voice instruction comprises:
    receiving, by the projector device, the voice instruction through a peripheral device, wherein the peripheral device comprises one or more of the following:
    a wired headset, a Bluetooth headset.
  6. A voice control device, comprising:
    a determining module, configured to determine that a projector device has entered a voice recognition state, wherein the voice recognition state is a state in which operations are performed according to voice instructions;
    a receiving module, configured to receive an input voice instruction;
    an execution module, configured to recognize the received voice instruction and perform an operation corresponding to the voice instruction.
  7. The device according to claim 6, wherein the determining module comprises:
    a determining unit, configured to determine that the projector device has entered the voice recognition state by receiving a wake-up instruction, wherein the wake-up instruction comprises one or more of the following:
    a touch signal of a predetermined trajectory, a voice signal, a key signal.
  8. The device according to claim 6, wherein the execution module comprises:
    a judging unit, configured to determine whether an instruction matching the voice instruction is stored in advance;
    an execution unit, configured to perform the operation corresponding to the voice instruction when the determination result of the judging unit is that an instruction matching the voice instruction is stored in advance.
  9. The device according to claim 6, further comprising:
    an obtaining module, configured to obtain file names of pre-stored files and/or application names of pre-installed applications;
    a storage module, configured to store the file names and/or the application names at a specified location,
    wherein the execution unit is further configured to invoke, when a file name stored at the specified location is recognized by speech recognition, the file corresponding to the recognized file name, and to invoke, when an application name stored at the specified location is recognized by speech recognition, the application corresponding to the recognized application name.
  10. The device according to any one of claims 6 to 9, wherein the receiving module receiving the input voice instruction comprises:
    receiving the voice instruction through a peripheral device supported by the projector device, wherein the peripheral device comprises one or more of the following:
    a wired headset, a Bluetooth headset.
  11. A projector device, comprising at least: a low-power wake-up chip, a speech engine, and a standard stream component, wherein
    the low-power wake-up chip is configured to enter a voice recognition state according to a wake-up instruction, wherein the voice recognition state is a state in which operations are performed according to voice instructions;
    the speech engine is configured to receive an input voice instruction;
    the standard stream component is configured to recognize the received voice instruction and perform an operation corresponding to the voice instruction.
PCT/CN2016/090170 2015-07-17 2016-07-15 Voice control method and device, and projector apparatus WO2017012511A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510424654.1A CN106356059A (en) 2015-07-17 2015-07-17 Voice control method, device and projector
CN201510424654.1 2015-07-17

Publications (1)

Publication Number Publication Date
WO2017012511A1 true WO2017012511A1 (en) 2017-01-26

Family

ID=57833698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/090170 WO2017012511A1 (en) 2015-07-17 2016-07-15 Voice control method and device, and projector apparatus

Country Status (2)

Country Link
CN (1) CN106356059A (en)
WO (1) WO2017012511A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110908718A (en) * 2018-09-14 2020-03-24 上海擎感智能科技有限公司 Face recognition activated voice navigation method, system, storage medium and equipment

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847285B (en) * 2017-03-31 2020-05-05 上海思依暄机器人科技股份有限公司 Robot and voice recognition method thereof
WO2018176387A1 (en) * 2017-03-31 2018-10-04 深圳市红昌机电设备有限公司 Voice control method and system for winding-type coil winder
CN107180631A (en) * 2017-05-24 2017-09-19 刘平舟 Voice interaction method and device
CN107360327B (en) 2017-07-19 2021-05-07 腾讯科技(深圳)有限公司 Speech recognition method, apparatus and storage medium
CN107680592B (en) * 2017-09-30 2020-09-22 惠州Tcl移动通信有限公司 Mobile terminal voice recognition method, mobile terminal and storage medium
CN107920240A (en) * 2017-12-27 2018-04-17 兴天通讯技术有限公司 A kind of smart projector of achievable speech control
CN108319171B (en) * 2018-02-09 2020-08-07 广景视睿科技(深圳)有限公司 Dynamic projection method and device based on voice control and dynamic projection system
CN108566634B (en) * 2018-03-30 2021-06-25 深圳市冠旭电子股份有限公司 Method and device for reducing continuous awakening delay of Bluetooth sound box and Bluetooth sound box
CN110505431A (en) * 2018-05-17 2019-11-26 视联动力信息技术股份有限公司 A kind of control method and device of terminal
CN108920128B (en) * 2018-07-12 2021-10-08 思必驰科技股份有限公司 Operation method and system of presentation
CN109375460B (en) * 2018-12-27 2021-03-23 成都极米科技股份有限公司 Control method of intelligent projector and intelligent projector
CN110322873B (en) 2019-07-02 2022-03-01 百度在线网络技术(北京)有限公司 Voice skill quitting method, device, equipment and storage medium
CN110517697A (en) * 2019-08-20 2019-11-29 中信银行股份有限公司 Prompt tone intelligence cutting-off device for interactive voice response
CN110992960A (en) * 2019-12-18 2020-04-10 Oppo广东移动通信有限公司 Control method, control device, electronic equipment and storage medium
CN113160806A (en) * 2020-01-07 2021-07-23 京东方科技集团股份有限公司 Projection system and control method thereof
CN111467198B (en) * 2020-04-28 2022-12-09 天赋光彩医疗科技(苏州)有限公司 Eyesight improving and consciousness restoring instrument
CN113763944B (en) * 2020-09-29 2024-06-04 浙江思考者科技有限公司 AI video cloud interaction system based on pseudo person logic knowledge base
CN112530430A (en) * 2020-11-30 2021-03-19 北京百度网讯科技有限公司 Vehicle-mounted operating system control method and device, earphone, terminal and storage medium
CN113127105B (en) * 2021-03-18 2022-06-10 福建马恒达信息科技有限公司 Excel automatic voice tool calling method
CN113157350B (en) * 2021-03-18 2022-06-07 福建马恒达信息科技有限公司 Office auxiliary system and method based on voice recognition
CN114097660A (en) * 2021-11-08 2022-03-01 广州回味源蛋类食品有限公司 Duck egg screening device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6230137B1 (en) * 1997-06-06 2001-05-08 Bsh Bosch Und Siemens Hausgeraete Gmbh Household appliance, in particular an electrically operated household appliance
CN101740028A (en) * 2009-11-20 2010-06-16 四川长虹电器股份有限公司 Voice control system of household appliance
CN103885350A (en) * 2014-03-19 2014-06-25 四川长虹电器股份有限公司 Method and device for voice control over household appliances
CN104216351A (en) * 2014-02-10 2014-12-17 美的集团股份有限公司 Household appliance voice control method and system
CN104538030A (en) * 2014-12-11 2015-04-22 科大讯飞股份有限公司 Control system and method for controlling household appliances through voice

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101648077A (en) * 2008-08-11 2010-02-17 巍世科技有限公司 Voice command game control device and method thereof
CN103971683A (en) * 2013-01-24 2014-08-06 上海果壳电子有限公司 Voice control method and system and handheld device
CN104599669A (en) * 2014-12-31 2015-05-06 乐视致新电子科技(天津)有限公司 Voice control method and device

Also Published As

Publication number Publication date
CN106356059A (en) 2017-01-25

Similar Documents

Publication Publication Date Title
WO2017012511A1 (en) Voice control method and device, and projector apparatus
JP6926241B2 (en) Hot word recognition speech synthesis
TWI576825B (en) A voice recognition system of a robot system and method thereof
TWI525532B (en) Set the name of the person to wake up the name for voice manipulation
US9466286B1 (en) Transitioning an electronic device between device states
WO2017071182A1 (en) Voice wakeup method, apparatus and system
US9047857B1 (en) Voice commands for transitioning between device states
WO2020029500A1 (en) Voice command customization method, device, apparatus, and computer storage medium
CN112201246B (en) Intelligent control method and device based on voice, electronic equipment and storage medium
CN110914828B (en) Speech translation method and device
WO2019007245A1 (en) Processing method, control method and recognition method, and apparatus and electronic device therefor
US9293134B1 (en) Source-specific speech interactions
JP2023015054A (en) Dynamic and/or context-specific hot word for calling automation assistant
US20210241768A1 (en) Portable audio device with voice capabilities
CN104247280A (en) Voice-controlled communication connections
KR20140089863A (en) Display apparatus, Method for controlling display apparatus and Method for controlling display apparatus in Voice recognition system thereof
JP2015520409A (en) Embedded system for building space-saving speech recognition with user-definable constraints
WO2017096843A1 (en) Headset device control method and device
CN106971723A (en) Method of speech processing and device, the device for speech processes
KR20220027251A (en) Key phrase detection with audio watermarking
CN109817220A (en) Audio recognition method, apparatus and system
WO2016078214A1 (en) Terminal processing method, device and computer storage medium
KR20200052638A (en) Electronic apparatus and method for voice recognition
JP7173049B2 (en) Information processing device, information processing system, information processing method, and program
WO2019239656A1 (en) Information processing device and information processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16827202

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16827202

Country of ref document: EP

Kind code of ref document: A1