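WO2020102991A1 - Method, apparatus, storage medium, and electronic device for waking up a device - Google Patents
Method, apparatus, storage medium, and electronic device for waking up a device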
- Publication number
- WO2020102991A1 (application PCT/CN2018/116493)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- electronic device
- wake
- recognition
- voice
- voice information
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L17/00—Speaker identification or verification techniques
Definitions
- The present application belongs to the technical field of electronic devices, and in particular relates to a method, an apparatus, a storage medium, and an electronic device for waking up a device.
- Wake-up word recognition and voiceprint recognition can be used to wake up electronic devices, where wake-up word recognition detects whether the user's voice contains a preset word or phrase used to wake up the electronic device. For example, after both the wake-up word recognition and the voiceprint recognition of the user's voice are verified, the electronic device can light up the display screen and unlock it.
- However, when the related art uses wake-up word recognition and voiceprint recognition to wake up an electronic device, it tends to waste the electronic device's power.
- Embodiments of the present application provide a method, an apparatus, a storage medium, and an electronic device for waking up a device, which can reduce waste of power consumption of the electronic device.
- this embodiment provides a method for waking up a device, including:
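- if the first wake-up word recognition and the first voiceprint recognition are both verified, the application processor of the electronic device performs a second wake-up word recognition and a second voiceprint recognition on the voice information;
- if the second wake-up word recognition and the second voiceprint recognition are both verified, the electronic device is woken up.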
- this embodiment provides an apparatus for waking up a device, including:
- an acquisition module, configured to acquire voice information;
- a first recognition module, configured to perform a first wake-up word recognition and a first voiceprint recognition on the voice information through the digital signal processor of the electronic device;
- a second recognition module, configured to perform a second wake-up word recognition and a second voiceprint recognition on the voice information through the application processor of the electronic device if the first wake-up word recognition and the first voiceprint recognition are both verified;
- a wake-up module, configured to wake up the electronic device if both the second wake-up word recognition and the second voiceprint recognition are verified.
- this embodiment provides a storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to execute the method for waking up the device provided in this embodiment.
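- this embodiment provides an electronic device including a memory and a processor, where the processor, by invoking a computer program stored in the memory, is configured to execute:
- if the first wake-up word recognition and the first voiceprint recognition are both verified, the application processor of the electronic device performs a second wake-up word recognition and a second voiceprint recognition on the voice information;
- if the second wake-up word recognition and the second voiceprint recognition are both verified, the electronic device is woken up.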
- In this embodiment, the electronic device may perform the first voiceprint recognition on the voice information while performing the first wake-up word recognition on the acquired voice information of the user. Only when both the first wake-up word recognition and the first voiceprint recognition are verified is the electronic device triggered to perform wake-up word recognition and voiceprint recognition again, and the wake-up operation is performed only when the second wake-up word recognition and the second voiceprint recognition are both verified. If the first voiceprint recognition fails, the electronic device does not perform the second wake-up word recognition and voiceprint recognition. This effectively avoids the waste of power that occurs in the related art when a non-owner user speaks the wake-up word and passing the first wake-up word recognition alone triggers the second wake-up word recognition and voiceprint recognition.
- FIG. 1 is a schematic flowchart of a method for waking up a device according to an embodiment of the present application.
- FIG. 2 is another schematic flowchart of a method for waking up a device according to an embodiment of the present application.
- FIGS. 3 to 6 are schematic diagrams of scenarios of a method for waking up a device provided by an embodiment of the present application.
- FIG. 7 is a schematic structural diagram of an apparatus for awakening a device according to an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 9 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
- the execution subject of the embodiments of the present application may be an electronic device such as a smart phone or a tablet computer.
- FIG. 1 is a schematic flowchart of a method for waking up a device according to an embodiment of the present application.
- the flowchart may include:
- Wake-up word recognition and voiceprint recognition can be used to wake up electronic devices, where wake-up word recognition detects whether the user's voice contains a preset word or phrase used to wake up the electronic device. For example, after both the wake-up word recognition and the voiceprint recognition of the user's voice are verified, the electronic device can light up the display screen and unlock it.
- However, when the related art uses wake-up word recognition and voiceprint recognition to wake up an electronic device, it tends to waste the electronic device's power.
- When the related art wakes up an electronic device, it first performs wake-up word recognition on the acquired voice to detect whether the voice contains the wake-up word preset by the owner. If the voice contains the preset wake-up word, the electronic device performs wake-up word recognition again together with voiceprint recognition. If voiceprint recognition detects that the current user is the owner, and the second wake-up word recognition determines that the voice contains the preset wake-up word, the electronic device is woken up, that is, it lights up the screen and unlocks. However, if a non-owner user knows the preset wake-up word and speaks it to the electronic device, the electronic device still performs voiceprint recognition and the second wake-up word recognition. Since the current user is not the owner, the electronic device is not woken up. In other words, the voiceprint recognition and the second wake-up word recognition performed in this case are unnecessary and waste the electronic device's power.
- the electronic device may first obtain the voice information A spoken by the current user.
- the digital signal processor of the electronic device performs the first wake-up word recognition and the first voiceprint recognition on the voice information.
- the electronic device may perform the first wake word recognition and the first voiceprint recognition on the voice information A through its digital signal processor (DSP).
- the digital signal processor is a microprocessor that processes signals, and it is the core component of the voice encoder and modem.
- the digital signal processor has the advantages of small size, low power consumption and fast operation speed.
- Wake-up word recognition means that the electronic device detects whether a given voice contains preset vocabulary used to wake up the electronic device. For example, suppose the wake-up word preset by the owner for waking up the electronic device is "Xiaoou wakes up". When the user's voice is obtained, the electronic device can first convert the voice into text and then detect whether the converted text contains the wake-up word "Xiaoou wakes up"; that is, the electronic device can detect whether the user's voice contains the wake-up word. If the user's voice contains the wake-up word "Xiaoou wakes up", the electronic device may determine that the wake-up word recognition verification has passed. If the user's voice does not contain the wake-up word "Xiaoou wakes up", the electronic device may determine that the wake-up word recognition verification has failed.
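The text-matching step described above can be sketched as follows. This is only a minimal illustration: it assumes the voice has already been converted to text by some speech recognizer (not shown), and the function name `contains_wake_word` and the whitespace and case normalization are hypothetical choices that the application does not specify.

```python
def _normalize(text: str) -> str:
    # Assumption: matching ignores case and repeated whitespace.
    return " ".join(text.lower().split())


def contains_wake_word(transcript: str, wake_word: str = "Xiaoou wakes up") -> bool:
    """Return True if the converted text contains the preset wake-up word."""
    return _normalize(wake_word) in _normalize(transcript)
```

A transcript that passes this check corresponds to "wake-up word recognition verification has passed" in the description; any other transcript corresponds to a failed verification.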
- Voiceprint recognition is a biometric technology that uses voice to identify speakers. For example, if the voiceprint feature information of the owner is pre-stored in the electronic device, then after acquiring the voice information of the current user, the electronic device can extract the voiceprint feature information of the current user from it. After that, the electronic device can match the voiceprint feature information of the current user with the prestored voiceprint feature information of the owner. If the two match successfully, the electronic device can determine that the current user is the owner, that is, the voiceprint recognition verification is passed. If the two do not match, the electronic device can determine that the current user is not the owner, that is, the voiceprint recognition verification fails.
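The matching of the current user's voiceprint features against the owner's pre-stored features can be sketched as below. The application does not say how features are compared, so cosine similarity, the function names, and the threshold value of 0.8 are all assumptions made purely for illustration; feature extraction itself is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector cannot match anything
    return dot / (norm_a * norm_b)


def is_owner(current_features, owner_features, threshold=0.8):
    """Voiceprint verification passes when the similarity reaches the threshold.

    Assumption: the threshold is a hypothetical tuning value.
    """
    return cosine_similarity(current_features, owner_features) >= threshold
```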
- After performing the first wake-up word recognition and the first voiceprint recognition on the voice information A acquired in 101, the electronic device can detect whether both the first wake-up word recognition and the first voiceprint recognition are verified.
- If both the first wake-up word recognition and the first voiceprint recognition are verified, the flow proceeds to 103.
- For example, the electronic device detects that the voice spoken by the current user contains the preset wake-up word "Xiaoou wakes up", and determines through the first voiceprint recognition that the current user is the owner. In this case, the flow proceeds to 103.
- Otherwise, the electronic device can perform other operations. For example, when the first voiceprint recognition is not verified, the electronic device can perform other operations without triggering the second wake-up word recognition and the second voiceprint recognition.
- the application processor of the electronic device performs the second wake-up word recognition and the second voiceprint recognition on the voice information.
- If the electronic device, using the digital signal processor, verifies both the first wake-up word recognition and the first voiceprint recognition of the current user's voice information A, the electronic device can be triggered to use its application processor (AP) to perform a second wake-up word recognition and a second voiceprint recognition on the voice information A.
- the second wake-up word recognition is also to detect whether the current user's voice information A contains a preset wake-up word.
- the second wake-up word recognition also detects whether the current user's voice information A contains the wake-up word "Xiaoou wakes up" preset by the owner.
- the second voiceprint recognition is also to detect whether the current user is the owner.
- Because the resources and computing power available to the application processor are greater than those available to the digital signal processor, the recognition result of the application processor is more accurate.
- the power consumption of the application processor is also greater than that of the digital signal processor.
- After performing the second wake-up word recognition and the second voiceprint recognition on the voice information A acquired in 101, the electronic device can detect whether both the second wake-up word recognition and the second voiceprint recognition are verified.
- If both the second wake-up word recognition and the second voiceprint recognition are verified, the flow proceeds to 104.
- For example, the electronic device detects that the voice spoken by the current user does contain the preset wake-up word "Xiaoou wakes up", and determines through the second voiceprint recognition that the current user is indeed the owner. In this case, the flow proceeds to 104.
- Otherwise, the electronic device can perform other operations.
- the electronic device can perform the wake-up operation. For example, the electronic device can light up the display screen and unlock the display screen.
- In this embodiment, the electronic device may perform the first voiceprint recognition on the voice information while performing the first wake-up word recognition on the acquired voice information of the user. Only when both the first wake-up word recognition and the first voiceprint recognition are verified is the electronic device triggered to perform wake-up word recognition and voiceprint recognition again, and the wake-up operation is performed only when the second wake-up word recognition and the second voiceprint recognition are both verified.
- If the first voiceprint recognition fails, the electronic device does not perform the second wake-up word recognition and voiceprint recognition. This effectively avoids the waste of power that occurs in the related art when a non-owner user speaks the wake-up word and passing the first wake-up word recognition alone triggers the second wake-up word recognition and voiceprint recognition.
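The two-stage gating described for FIG. 1 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Recognizer` class, the `try_wake` function, and the `log` list used to record which stages ran are hypothetical names, and the actual recognition functions are passed in as placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Recognizer:
    """One recognition stage (hypothetical wrapper around the two checks)."""
    name: str
    has_wake_word: Callable[[bytes], bool]
    is_owner: Callable[[bytes], bool]

    def verify(self, voice: bytes) -> bool:
        # Both wake-up word recognition and voiceprint recognition must pass.
        return self.has_wake_word(voice) and self.is_owner(voice)


def try_wake(voice: bytes, dsp: Recognizer, ap: Recognizer,
             log: Optional[List[str]] = None) -> bool:
    """Return True if the device should be woken up.

    The application-processor stage runs only after the digital signal
    processor stage has verified both the wake-up word and the voiceprint.
    """
    if log is None:
        log = []
    log.append(dsp.name)
    if not dsp.verify(voice):
        return False  # second stage is never triggered
    log.append(ap.name)
    return ap.verify(voice)
```

With this structure, a non-owner who speaks the wake-up word is rejected in the first stage, so the more power-hungry second stage never runs for that utterance.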
- FIG. 2 is another schematic flowchart of a method for waking up a device according to an embodiment of the present application.
- the process may include:
- the electronic device acquires a sound signal of the surrounding environment.
- an electronic device can collect sound signals of the surrounding environment through its microphone.
- the electronic device performs voice activity detection on the sound signal, wherein, when performing voice activity detection, the digital signal processor is in a preset low frequency mode.
- the electronic device can perform voice activity detection (Voice Activity Detection) on the sound signal through its digital signal processor DSP.
- the digital signal processor of the electronic device may be in a preset low frequency mode.
- Voice activity detection distinguishes the speech signal from the background noise signal within the sound signal. That is, in this embodiment, voice activity detection mainly detects whether a user's voice is present in the sound signal.
- The digital signal processor of the electronic device has two working modes: a preset low-frequency mode and a preset high-frequency mode. In the low-frequency mode, the clock frequency of the digital signal processor is lower, it executes fewer instructions per second, its processing capability is lower, and its power consumption is correspondingly lower. In the high-frequency mode, the clock frequency is higher, the digital signal processor executes more instructions per second, its processing capability is stronger, and its power consumption is correspondingly higher.
- the electronic device can control the digital signal processor to perform voice activity detection in a preset low-frequency mode.
- the electronic device may use an energy plus model algorithm to perform voice activity detection on the collected sound signals of the surrounding environment.
- the electronic device determines whether there is a user voice in the sound signal.
- the electronic device may determine whether there is a user voice in the collected sound signal according to the voice activity detection.
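A simple way to picture the energy part of voice activity detection is the frame-energy check below. This is a sketch under stated assumptions: the application mentions an "energy plus model" algorithm without giving details, so only a plain energy threshold is shown, and the frame length, threshold, and minimum number of consecutive active frames are hypothetical values.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def has_user_voice(samples, frame_len=160, energy_threshold=0.01,
                   min_active_frames=3):
    """Return True if enough consecutive frames exceed the energy threshold.

    Assumption: samples are floats in [-1.0, 1.0]; a trailing partial
    frame is ignored.
    """
    active = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy(frame) > energy_threshold:
            active += 1
            if active >= min_active_frames:
                return True
        else:
            active = 0  # the run of active frames was interrupted
    return False
```

Requiring several consecutive active frames is one common way to avoid reacting to a single click or pop; a model-based stage, as hinted at in the description, would further separate speech from other loud sounds.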
- the electronic device may perform other operations.
- the electronic device obtains the user's voice information.
- For example, the electronic device determines through voice activity detection that a user voice is present in the surrounding sound signal. The electronic device can then obtain the corresponding voice information from the user voice. For example, the electronic device obtains the user's voice information A.
- the electronic device performs the first wake-up word recognition and the first voiceprint recognition on the voice information, wherein, when the first wake-up word recognition and the first voiceprint recognition are performed, the digital signal processor is in a preset high-frequency mode.
- The electronic device can control its digital signal processor to enter the preset high-frequency mode and, in the high-frequency mode, perform wake-up word recognition and voiceprint recognition on the voice information A through the digital signal processor. That is, the electronic device performs the first wake-up word recognition and the first voiceprint recognition on the voice information A.
- For example, suppose the wake-up word preset by the owner for waking up the electronic device is "Xiaoou wakes up". When the user's voice information A is obtained, the electronic device can first convert the voice information A into text and then detect whether the converted text contains the wake-up word "Xiaoou wakes up"; that is, the electronic device can detect whether the user's voice contains the wake-up word. If the user's voice contains the wake-up word "Xiaoou wakes up", the electronic device may determine that the wake-up word recognition verification has passed. If the user's voice does not contain the wake-up word "Xiaoou wakes up", the electronic device may determine that the wake-up word recognition verification has failed.
- Voiceprint recognition is used to determine whether the current user is the owner.
- the electronic device may match the voiceprint feature information of the current user with the pre-stored voiceprint feature information of the owner. If the matching is successful, the electronic device can determine that the current user is the owner. If the two do not match, the electronic device can determine that the current user is not the owner.
- If both the first wake-up word recognition and the first voiceprint recognition are verified, the flow proceeds to 206.
- For example, the electronic device detects that the voice spoken by the current user contains the preset wake-up word "Xiaoou wakes up", and determines through the first voiceprint recognition that the current user is the owner. In this case, the flow proceeds to 206.
- Otherwise, the electronic device can perform other operations. For example, when the first voiceprint recognition is not verified, that is, the electronic device detects through voiceprint recognition that the current user is not the owner, the electronic device can perform other operations without triggering the second wake-up word recognition and the second voiceprint recognition.
- the electronic device performs the second wakeword recognition and the second voiceprint recognition on the voice information through the application processor.
- If the electronic device verifies both the wake-up word recognition and the voiceprint recognition of the current user's voice information A through the digital signal processor, the electronic device can be triggered to use its application processor (AP) to perform wake-up word recognition and voiceprint recognition on the voice information A again. That is, the electronic device can perform the second wake-up word recognition and the second voiceprint recognition on the current user's voice information A.
- the second wake-up word recognition is also to detect whether the current user's voice information A contains a preset wake-up word.
- the second wake-up word recognition also detects whether the current user's voice information A contains the wake-up word "Xiaoou wakes up" preset by the owner.
- The second voiceprint recognition likewise detects whether the current user is the owner. Because the computing power of the application processor is stronger than that of the digital signal processor, the results of wake-up word recognition and voiceprint recognition performed on voice information A by the application processor are more accurate than those of the digital signal processor, which makes it possible to further confirm whether the current user's voice information A contains the preset wake-up word and whether the current user is the owner.
- The resources and computing power available to the application processor are greater than those available to the digital signal processor, and the power consumption of the application processor is correspondingly greater than that of the digital signal processor.
- After performing the second wake-up word recognition and the second voiceprint recognition on the voice information A acquired in 204, the electronic device can detect whether both the second wake-up word recognition and the second voiceprint recognition are verified.
- If both the second wake-up word recognition and the second voiceprint recognition are verified, the flow proceeds to 207.
- For example, the electronic device detects that the voice spoken by the current user does contain the preset wake-up word "Xiaoou wakes up", and determines through the second voiceprint recognition that the current user is indeed the owner. In this case, the flow proceeds to 207.
- Otherwise, the electronic device can perform other operations.
- the electronic device performs a wake-up operation.
- the electronic device can be woken up, that is, the electronic device can perform the wake-up operation. For example, the electronic device can light up the display screen and unlock the display screen.
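The mode switching in the FIG. 2 flow can be sketched as below. This is an illustrative sketch only: the `Dsp` class, the `DspMode` names, and the returned status strings are hypothetical, the detection and recognition functions are placeholders supplied by the caller, and the application does not state when the digital signal processor returns to the low-frequency mode, so that transition is left out.

```python
from enum import Enum


class DspMode(Enum):
    LOW_FREQUENCY = "low"
    HIGH_FREQUENCY = "high"


class Dsp:
    """Tracks the working mode of a (simulated) digital signal processor."""

    def __init__(self):
        self.mode = DspMode.LOW_FREQUENCY
        self.mode_history = [self.mode]

    def set_mode(self, mode):
        if mode is not self.mode:
            self.mode = mode
            self.mode_history.append(mode)


def run_wake_flow(sound, dsp, detect_voice, first_check, second_check):
    """Walk through the flow and report where it stopped."""
    dsp.set_mode(DspMode.LOW_FREQUENCY)   # voice activity detection runs in low-frequency mode
    if not detect_voice(sound):
        return "no user voice"
    dsp.set_mode(DspMode.HIGH_FREQUENCY)  # first recognition runs in high-frequency mode
    if not first_check(sound):
        return "first recognition failed"
    if not second_check(sound):           # second recognition runs on the application processor
        return "second recognition failed"
    return "woken"
```

If no user voice is detected, the simulated processor never leaves the low-frequency mode, which mirrors the power-saving intent of the flow.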
- The electronic device performing the first wake-up word recognition and the first voiceprint recognition on the voice information through the digital signal processor may include:
- through the digital signal processor, the electronic device performs the first wake-up word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor uses a first model to perform the first wake-up word recognition and the first voiceprint recognition.
- The electronic device performing the second wake-up word recognition and the second voiceprint recognition on the voice information through the application processor may include: through the application processor, the electronic device performs the second wake-up word recognition and the second voiceprint recognition on the voice information, wherein the application processor uses a second model to perform the second wake-up word recognition and the second voiceprint recognition.
- the number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is smaller than the operating power consumption of the second model.
- the first model and the second model may be used to perform wake word recognition and voiceprint recognition on the voice information A of the current user.
- For example, the first model uses three parameters, namely A, B, and C, for the wake-up word recognition and voiceprint recognition of the current user's voice information A.
- The second model uses six parameters, namely A, B, C, D, E, and F, for the wake-up word recognition and voiceprint recognition of the current user's voice information A.
- both the first model and the second model can perform wake word recognition and voiceprint recognition on the current user's voice information A.
- the recognition results of the second model for wake word recognition and voiceprint recognition will be more accurate than the recognition results of the first model.
- When performing the first wake-up word recognition and the first voiceprint recognition on the current user's voice information A through the digital signal processor, the electronic device can use the first model. Since the application processor can use more resources, the electronic device can use the second model when performing the second wake-up word recognition and the second voiceprint recognition on the current user's voice information A, so that the recognition result of the second model further confirms whether the voice information A contains the preset wake-up word and whether the current user is the owner.
- the electronic device performs the first wake word recognition and the first voiceprint recognition on the voice information, which may include:
- the electronic device uses the Gaussian mixture model to perform the first wake word recognition and the first voiceprint recognition on the voice information.
- the electronic device may use a Gaussian mixture model to perform wake-up word recognition and voiceprint recognition on the voice information A.
- the electronic device may also use other algorithm models to perform the first wake word recognition and the first voiceprint recognition on the current user's voice information A, which is not specifically limited in this embodiment.
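To make the Gaussian mixture model option more concrete, the sketch below scores feature frames against a diagonal-covariance mixture and accepts them when the average log-likelihood reaches a threshold. Everything here is an assumption for illustration: the application does not describe the features, the mixture structure, or the decision rule, and the function names, the `gmm` tuple layout, and the threshold are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # log-sum-exp over components for numerical stability
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))


def gmm_accepts(frames, gmm, threshold):
    """Accept when the average per-frame log-likelihood reaches the threshold.

    Assumption: gmm is a (weights, means, variances) tuple trained elsewhere.
    """
    weights, means, variances = gmm
    total = sum(gmm_log_likelihood(f, weights, means, variances) for f in frames)
    return total / len(frames) >= threshold
```

A mixture of this kind has few parameters and cheap arithmetic, which is consistent with running the first recognition on the lower-power digital signal processor.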
- the electronic device performs the second wake word recognition and the second voiceprint recognition on the voice information, which may include:
- the electronic device uses the deep neural network algorithm model to perform the second wake word recognition and the second voiceprint recognition on the speech information.
- When performing wake-up word recognition and voiceprint recognition on the current user's voice information A through the application processor, the electronic device can use a deep neural network (DNN) algorithm model to perform the wake-up word recognition and voiceprint recognition on the voice information A.
- the electronic device may also use other algorithm models to perform the second wake word recognition and the second voiceprint recognition on the current user's voice information A, which is not specifically limited in this embodiment.
- the electronic device may first perform voice activity detection on the collected sound signal of the surrounding environment to detect whether the sound signal includes user voice.
- the voice activity detection is performed when the digital signal processor is in a low-frequency mode. Therefore, this embodiment can perform voice activity detection with extremely low power consumption.
- the electronic device can control the digital signal processor to enter the high-frequency mode, and perform the first wake word recognition and the first voiceprint recognition on the user's voice. Only when the first wake-up word recognition and the first voiceprint recognition are both verified, the electronic device will trigger the application processor to perform the second wake-up word recognition and the second voiceprint recognition.
- The power consumption of the application processor is generally 50 to 100 times that of the digital signal processor. Therefore, this embodiment triggers the application processor to perform wake-up word recognition and voiceprint recognition only when the digital signal processor has determined that the current user is the owner and has spoken the wake-up word, which effectively avoids the waste of power caused in the related art by performing voiceprint recognition only on the application processor side.
- In addition, because both wake-up word recognition and voiceprint recognition must be verified in this embodiment, accidental wake-ups of the electronic device are reduced and the user's experience of waking up the electronic device is improved.
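The effect of the 50 to 100 times power ratio can be illustrated with a small calculation. The sketch below is a rough model under stated assumptions, not a measurement: it treats one first-stage run on the digital signal processor as one unit of energy regardless of whether voiceprint recognition is included, charges each application-processor run at the given ratio, and assumes every utterance contains the wake-up word. The function and parameter names are hypothetical.

```python
def total_energy(owner_utterances, other_utterances, ap_to_dsp_ratio,
                 first_stage_checks_voiceprint):
    """Energy in units of one digital-signal-processor recognition run.

    Assumption: every utterance contains the wake-up word, so only the
    voiceprint check can stop the second stage from being triggered.
    """
    utterances = owner_utterances + other_utterances
    dsp_energy = utterances * 1
    if first_stage_checks_voiceprint:
        ap_runs = owner_utterances  # non-owners are rejected in the first stage
    else:
        ap_runs = utterances        # everyone who says the wake-up word reaches the second stage
    return dsp_energy + ap_runs * ap_to_dsp_ratio
```

For example, with 20 owner utterances, 80 non-owner utterances, and a ratio of 50, this model gives 5100 units when the first stage checks only the wake-up word and 1100 units when it also checks the voiceprint. The saving grows with the share of non-owner utterances and with the power ratio.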
- FIG. 3 to FIG. 6 are schematic diagrams of scenarios of a method for waking up a device according to an embodiment of the present application.
- the owner of the electronic device is A.
- The electronic device prompts the owner to speak the voice to be used as the wake-up word. For example, if the owner user A utters the voice "Xiaoou wakes up", the electronic device can collect the voice through its microphone. After that, the electronic device can perform voice recognition and voiceprint recognition on the voice, thereby obtaining the wake-up word "Xiaoou wakes up" used to wake up the electronic device and the voiceprint features of the owner user A.
- The electronic device can collect sound signals of the surrounding environment through its microphone. After a sound signal is collected, the electronic device can control its digital signal processor to enter the preset low-frequency mode and perform voice activity detection on the collected sound signal through the digital signal processor in the low-frequency mode, to detect whether the sound signal includes a user voice.
- For example, the current user utters the voice "Xiaoou wakes up" (voice B) to the electronic device.
- After the microphone of the electronic device collects voice B, the electronic device performs voice activity detection on it through the digital signal processor in the low-frequency mode and detects that it contains the user's voice.
- the electronic device can obtain the current user's voice information from the sound signal. After that, the electronic device can control its digital signal processor to enter a preset high-frequency mode, and perform wake-up word recognition and voiceprint recognition on the acquired voice information of the current user in the high-frequency mode. That is, the electronic device performs the first wake word recognition and the first voiceprint recognition on the current user's voice information.
- the electronic device may perform the first wake-up word recognition on the current user's voice B through the digital signal processor in the high-frequency mode to detect whether voice B is the preset wake-up word "Xiaoou wakes up".
- the electronic device can extract the voiceprint feature of the current user from voice B, and match it against the preset voiceprint feature of the owner user A to detect whether the current user is the owner user A.
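The matching step above can be illustrated with a minimal sketch. The text does not say how voiceprint features are compared, so the cosine-similarity measure, the `threshold` value, and the three-dimensional feature vectors below are all assumptions chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector cannot match anything
    return dot / (norm_a * norm_b)


def is_owner(current_feature, owner_feature, threshold=0.8):
    """Hypothetical matcher: accept when the similarity reaches the threshold."""
    return cosine_similarity(current_feature, owner_feature) >= threshold


# Made-up feature vectors; real voiceprint features would be much longer.
owner_a = [0.9, 0.1, 0.4]        # preset voiceprint of owner user A
same_speaker = [0.85, 0.15, 0.38]
other_speaker = [0.1, 0.9, 0.2]

print(is_owner(same_speaker, owner_a))   # close to the enrolled voiceprint
print(is_owner(other_speaker, owner_a))  # far from the enrolled voiceprint
```

In practice the threshold would be tuned to balance false accepts against false rejects.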
- the electronic device can perform wake-up word recognition and voiceprint recognition again on the current user's voice information through its application processor. That is, the electronic device performs the second wake-up word recognition and the second voiceprint recognition on the current user's voice information.
- the second wake-up word recognition also detects whether the current user's voice information contains the preset wake-up word.
- the second voiceprint recognition also detects whether the current user is the owner user A.
- the electronic device can perform a wake-up operation, for example, the electronic device can light up the display screen and unlock it.
- the electronic device detects that the current user is the owner user A through the first voiceprint recognition, and detects that voice B is the preset wake-up word "Xiaoou wakes up" through the first wake-up word recognition. Moreover, the electronic device further confirms that the current user is the owner user A through the second voiceprint recognition, and further confirms that voice B is indeed the preset wake-up word "Xiaoou wakes up" through the second wake-up word recognition. In this case, the electronic device can light up the display screen and unlock it.
- the electronic device can perform other operations without performing the second voiceprint recognition and the second wakeword recognition.
- the non-owner user B speaks to the electronic device the voice C "Xiaoou wakes up", which contains the wake-up word.
- the electronic device will not perform the second wake word recognition and the second voiceprint recognition on the speech C.
- in the related art, since the electronic device performs voiceprint recognition only together with the second wake-up word recognition, it does not perform voiceprint recognition when performing the first wake-up word recognition. Therefore, although voice C is spoken by user B, once the electronic device detects that voice C contains the wake-up word "Xiaoou wakes up", it triggers the second wake-up word recognition and voiceprint recognition on voice C. During voiceprint recognition, the electronic device detects that the current user is not the owner user A, so it does not perform the wake-up operation. Therefore, in the related art, the second wake-up word recognition and voiceprint recognition performed by the electronic device actually waste power.
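The power argument above can be made concrete by counting how often the application processor is triggered under each scheme. The sketch below assumes idealized recognizers that never make mistakes; the `Utterance` type and the sample list are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    has_wake_word: bool  # the speech contains the preset wake-up word
    from_owner: bool     # the speech was spoken by the owner user A


def ap_runs_related_art(utterances):
    """Related art: the first stage checks only the wake-up word."""
    return sum(1 for u in utterances if u.has_wake_word)


def ap_runs_with_first_voiceprint(utterances):
    """Scheme described here: the first stage also checks the voiceprint."""
    return sum(1 for u in utterances if u.has_wake_word and u.from_owner)


utterances = [
    Utterance(has_wake_word=True, from_owner=True),    # owner A, voice B
    Utterance(has_wake_word=True, from_owner=False),   # non-owner B, voice C
    Utterance(has_wake_word=True, from_owner=False),   # another non-owner attempt
    Utterance(has_wake_word=False, from_owner=True),   # owner, unrelated speech
]

print(ap_runs_related_art(utterances))            # 3 application-processor runs
print(ap_runs_with_first_voiceprint(utterances))  # 1 application-processor run
```

Both schemes wake the device only for the first utterance; the difference is the two application-processor runs spent on non-owner speech in the related art.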
- FIG. 7 is a schematic structural diagram of an apparatus for waking up a device according to an embodiment of the present application.
- the apparatus 300 for waking up the device may include: an acquisition module 301, a first recognition module 302, a second recognition module 303, and a wake-up module 304.
- the acquisition module 301 is used to obtain voice information.
- the first recognition module 302 is used to perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device.
- the second recognition module 303 is configured to, if both the first wake word recognition and the first voiceprint recognition are verified, perform the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device.
- the wake-up module 304 is configured to wake up the electronic device if both the second wake-up word recognition and the second voiceprint recognition are verified.
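The cooperation of the four modules can be sketched as follows. This is an illustrative model rather than the apparatus itself: the `WakeUpApparatus` class, the toy `"speaker:text"` encoding of voice information, and the stub checks are assumptions made so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List

Check = Callable[[str], bool]


@dataclass
class WakeUpApparatus:
    dsp_wake_word: Check   # first wake word recognition (digital signal processor)
    dsp_voiceprint: Check  # first voiceprint recognition (digital signal processor)
    ap_wake_word: Check    # second wake word recognition (application processor)
    ap_voiceprint: Check   # second voiceprint recognition (application processor)

    def run(self, voice_info: str) -> List[str]:
        trace = ["acquire"]  # acquisition module
        # First recognition module: both checks must pass.
        if not (self.dsp_wake_word(voice_info) and self.dsp_voiceprint(voice_info)):
            trace.append("dsp_rejected")
            return trace
        trace.append("dsp_passed")
        # Second recognition module: reached only after the first stage passes.
        if not (self.ap_wake_word(voice_info) and self.ap_voiceprint(voice_info)):
            trace.append("ap_rejected")
            return trace
        trace.append("ap_passed")
        trace.append("wake_up")  # wake-up module
        return trace


# Toy stand-ins for real recognizers: voice information is "speaker:text".
def says_wake_word(voice_info: str) -> bool:
    return voice_info.split(":", 1)[1] == "Xiaoou wakes up"


def spoken_by_owner(voice_info: str) -> bool:
    return voice_info.split(":", 1)[0] == "A"


apparatus = WakeUpApparatus(says_wake_word, spoken_by_owner,
                            says_wake_word, spoken_by_owner)
print(apparatus.run("A:Xiaoou wakes up"))  # ['acquire', 'dsp_passed', 'ap_passed', 'wake_up']
print(apparatus.run("B:Xiaoou wakes up"))  # ['acquire', 'dsp_rejected']
```

Passing the four checks in as callables keeps the gating logic separate from whichever models the two processors actually run.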
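- the acquisition module 301 can be used to:
- acquire a sound signal of the surrounding environment, and perform voice activity detection on the sound signal;
- determine, according to the voice activity detection, whether there is user voice in the sound signal;
- if there is user voice, obtain the voice information.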
- the acquisition module 301 can be used to: perform voice activity detection on the sound signal through the digital signal processor of the electronic device.
- the acquisition module 301 can be used to: perform voice activity detection on the sound signal through the digital signal processor of the electronic device, wherein, when performing the voice activity detection, the digital signal processor is in a preset low-frequency mode.
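The text does not specify which voice activity detection algorithm the digital signal processor runs. As one cheap possibility, the sketch below uses a short-time energy threshold; the frame length, threshold, minimum number of active frames, and sample values are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full frame of the signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def contains_voice(samples, frame_len=4, energy_threshold=0.01, min_active_frames=2):
    """Assumed rule: report voice when enough frames exceed the energy threshold."""
    energies = frame_energies(samples, frame_len)
    active = sum(1 for e in energies if e > energy_threshold)
    return active >= min_active_frames


# Made-up sample values: near-silent background, then louder speech-like frames.
silence = [0.001, -0.002, 0.001, 0.0] * 4
speech = silence[:4] + [0.3, -0.4, 0.35, -0.25] * 3

print(contains_voice(silence))  # False: no frame exceeds the threshold
print(contains_voice(speech))   # True: three frames exceed the threshold
```

A check this light is the kind of work that could plausibly run in a low-frequency mode, with the heavier recognition steps started only once it reports voice.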
- the first recognition module 302 may be used to: perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, wherein, when the first wake word recognition and the first voiceprint recognition are performed, the digital signal processor is in a preset high-frequency mode.
- the first recognition module 302 may be used to: perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, wherein the digital signal processor uses the first model to perform the first wake word recognition and the first voiceprint recognition;
- the second recognition module 303 may be used to: perform a second wake word recognition and a second voiceprint recognition on the voice information through the application processor of the electronic device, wherein the application processor uses the second model to perform the second wake word recognition and the second voiceprint recognition.
- the number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is smaller than the operating power consumption of the second model.
- the first recognition module 302 may be used to: through the digital signal processor of the electronic device, use a Gaussian mixture model to perform the first wake word recognition and the first voiceprint recognition on the speech information.
- the second recognition module 303 may be used to: through the application processor of the electronic device, use a deep neural network algorithm model to perform a second wake word recognition and a second voiceprint recognition on the speech information.
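The parameter-count relationship between the two models can be checked with simple arithmetic. The text gives no model sizes, so the values below (16 Gaussian components with diagonal covariances over 13-dimensional features, and a fully connected network with layers 13, 128, 128, 2) are hypothetical.

```python
def gmm_param_count(num_components, feature_dim):
    """Diagonal-covariance GMM: each component has 1 weight, D means, D variances."""
    return num_components * (1 + 2 * feature_dim)


def dnn_param_count(layer_sizes):
    """Fully connected network: a weight matrix plus a bias vector per layer."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


first_model = gmm_param_count(num_components=16, feature_dim=13)
second_model = dnn_param_count([13, 128, 128, 2])

print(first_model)   # 16 * (1 + 26) = 432
print(second_model)  # 1792 + 16512 + 258 = 18562
print(first_model < second_model)
```

With these assumed sizes the first model has roughly 40 times fewer parameters than the second, which fits the stated design of a cheap first stage and a costlier second stage.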
- An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed on a computer, the computer is caused to perform the process in the method for waking up a device as provided in this embodiment.
- An embodiment of the present application further provides an electronic device, including a memory and a processor, and the processor is used to execute a process in the method for waking up the device provided by this embodiment by calling a computer program stored in the memory.
- the aforementioned electronic device may be a mobile terminal such as a tablet computer or a smart phone.
- FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device 400 may include a display screen 401, a memory 402, a processor 403, a microphone 404, and other components.
- the structure shown in FIG. 8 does not constitute a limitation on the electronic device, which may include more or fewer components than those illustrated, combine certain components, or arrange the components differently.
- the display screen 401 can be used to display information such as images and text.
- the memory 402 may be used to store application programs and data.
- the application program stored in the memory 402 contains executable code.
- the application program can form various functional modules.
- the processor 403 executes application programs stored in the memory 402 to execute various functional applications and data processing.
- the processor 403 is the control center of the electronic device. It uses various interfaces and lines to connect the various parts of the entire electronic device, performs the various functions of the electronic device, and processes data, so as to monitor the electronic device as a whole.
- the microphone 404 may be used to collect user's voice information.
- the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 runs the application programs stored in the memory 402, thereby executing:
- the electronic device 500 may include a display screen 501, a memory 502, a processor 503, a microphone 504, a speaker 505, a battery 506, and other components.
- the display screen 501 can be used to display information such as images and text.
- the memory 502 may be used to store application programs and data.
- the application program stored in the memory 502 contains executable code.
- the application program can form various functional modules.
- the processor 503 executes application programs stored in the memory 502 to execute various functional applications and data processing.
- the processor 503 is the control center of the electronic device. It uses various interfaces and lines to connect the various parts of the entire electronic device, performs the various functions of the electronic device, and processes data, so as to monitor the electronic device as a whole.
- the input unit 504 may be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
- the output unit 505 may be used to display information input by the user or provided to the user and various graphical user interfaces of the electronic device. These graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof.
- the output unit may include a display panel.
- the processor 503 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 503 runs the application programs stored in the memory 502, thereby executing:
- the processor 503 may further execute: acquiring a sound signal of the surrounding environment, and performing voice activity detection on the sound signal; and determining, according to the voice activity detection, whether there is user voice in the sound signal.
- the processor 503 may execute: acquiring voice information if there is a user voice.
- when performing the voice activity detection on the sound signal, the processor 503 may execute: performing voice activity detection on the sound signal through the digital signal processor of the electronic device.
- when performing voice activity detection on the sound signal through the digital signal processor of the electronic device, the processor 503 may execute: performing voice activity detection on the sound signal through the digital signal processor of the electronic device, wherein, when performing the voice activity detection, the digital signal processor is in a preset low-frequency mode.
- when performing the first wake-up word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, the processor 503 may execute: performing the first wake-up word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, wherein, when performing the first wake-up word recognition and the first voiceprint recognition, the digital signal processor is in the preset high-frequency mode.
- when performing the first wake-up word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, the processor 503 may execute: performing the first wake-up word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, wherein the digital signal processor uses the first model to perform the first wake-up word recognition and the first voiceprint recognition.
- when performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, the processor 503 may execute: performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, wherein the application processor uses a second model to perform the second wake word recognition and the second voiceprint recognition.
- the number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is smaller than the operating power consumption of the second model.
- when performing the first wake-up word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, the processor 503 may execute: through the digital signal processor of the electronic device, using a Gaussian mixture model to perform the first wake word recognition and the first voiceprint recognition on the speech information.
- when performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, the processor 503 may execute: through the application processor of the electronic device, using a deep neural network algorithm model to perform the second wake word recognition and the second voiceprint recognition on the speech information.
- the apparatus for waking up a device provided in the embodiments of the present application and the method for waking up a device in the above embodiments belong to the same concept, and any method provided in the embodiments of the method for waking up a device may be run on the apparatus for waking up a device.
- the computer program may be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor, and its execution may include the processes of the embodiments of the method for waking up a device as described.
- the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), and so on.
- each functional module may be integrated into one processing chip, or each module may exist alone physically, or two or more modules may be integrated into one module.
- the above integrated modules may be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, magnetic disk, or optical disk.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Telephone Function (AREA)
Claims (20)
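- A method for waking up a device, comprising: acquiring voice information; performing, through a digital signal processor of an electronic device, a first wake-up word recognition and a first voiceprint recognition on the voice information; if both the first wake-up word recognition and the first voiceprint recognition pass verification, performing, through an application processor of the electronic device, a second wake-up word recognition and a second voiceprint recognition on the voice information; and if both the second wake-up word recognition and the second voiceprint recognition pass verification, waking up the electronic device.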
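- The method for waking up a device according to claim 1, wherein before the acquiring voice information, the method further comprises: acquiring a sound signal of the surrounding environment, and performing voice activity detection on the sound signal; and determining, according to the voice activity detection, whether user voice exists in the sound signal; and wherein the acquiring voice information comprises: acquiring the voice information if user voice exists.
- The method for waking up a device according to claim 2, wherein the performing voice activity detection on the sound signal comprises: performing voice activity detection on the sound signal through the digital signal processor of the electronic device.
- The method for waking up a device according to claim 3, wherein the performing voice activity detection on the sound signal through the digital signal processor of the electronic device comprises: performing voice activity detection on the sound signal through the digital signal processor of the electronic device, wherein the digital signal processor is in a preset low-frequency mode when the voice activity detection is performed.
- The method for waking up a device according to claim 4, wherein the performing, through the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information comprises: performing, through the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor is in a preset high-frequency mode when the first wake-up word recognition and the first voiceprint recognition are performed.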
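- The method for waking up a device according to claim 1, wherein the performing, through the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information comprises: performing, through the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor uses a first model to perform the first wake-up word recognition and the first voiceprint recognition; the performing, through the application processor of the electronic device, the second wake-up word recognition and the second voiceprint recognition on the voice information comprises: performing, through the application processor of the electronic device, the second wake-up word recognition and the second voiceprint recognition on the voice information, wherein the application processor uses a second model to perform the second wake-up word recognition and the second voiceprint recognition; and wherein the number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is smaller than the operating power consumption of the second model.
- The method for waking up a device according to claim 1, wherein the performing, through the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information comprises: using, through the digital signal processor of the electronic device, a Gaussian mixture model to perform the first wake-up word recognition and the first voiceprint recognition on the voice information.
- The method for waking up a device according to claim 1, wherein the performing, through the application processor of the electronic device, the second wake-up word recognition and the second voiceprint recognition on the voice information comprises: using, through the application processor of the electronic device, a deep neural network algorithm model to perform the second wake-up word recognition and the second voiceprint recognition on the voice information.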
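- An apparatus for waking up a device, comprising: an acquisition module, configured to acquire voice information; a first recognition module, configured to perform, through a digital signal processor of an electronic device, a first wake-up word recognition and a first voiceprint recognition on the voice information; a second recognition module, configured to, if both the first wake-up word recognition and the first voiceprint recognition pass verification, perform, through an application processor of the electronic device, a second wake-up word recognition and a second voiceprint recognition on the voice information; and a wake-up module, configured to wake up the electronic device if both the second wake-up word recognition and the second voiceprint recognition pass verification.
- The apparatus for waking up a device according to claim 9, wherein the acquisition module is further configured to: acquire a sound signal of the surrounding environment, and perform voice activity detection on the sound signal; determine, according to the voice activity detection, whether user voice exists in the sound signal; and acquire the voice information if user voice exists.
- The apparatus for waking up a device according to claim 10, wherein the acquisition module is configured to: perform voice activity detection on the sound signal through the digital signal processor of the electronic device.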
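- A storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to perform the method according to any one of claims 1 to 8.
- An electronic device, comprising a memory and a processor, wherein the processor, by calling a computer program stored in the memory, is configured to execute: acquiring voice information; performing, through a digital signal processor of the electronic device, a first wake-up word recognition and a first voiceprint recognition on the voice information; if both the first wake-up word recognition and the first voiceprint recognition pass verification, performing, through an application processor of the electronic device, a second wake-up word recognition and a second voiceprint recognition on the voice information; and if both the second wake-up word recognition and the second voiceprint recognition pass verification, waking up the electronic device.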
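- The electronic device according to claim 13, wherein the processor is configured to execute: acquiring a sound signal of the surrounding environment, and performing voice activity detection on the sound signal; determining, according to the voice activity detection, whether user voice exists in the sound signal; and acquiring the voice information if user voice exists.
- The electronic device according to claim 14, wherein the processor is configured to execute: performing voice activity detection on the sound signal through the digital signal processor of the electronic device.
- The electronic device according to claim 15, wherein the processor is configured to execute: performing voice activity detection on the sound signal through the digital signal processor of the electronic device, wherein the digital signal processor is in a preset low-frequency mode when the voice activity detection is performed.
- The electronic device according to claim 16, wherein the processor is configured to execute: performing, through the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor is in a preset high-frequency mode when the first wake-up word recognition and the first voiceprint recognition are performed.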
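- The electronic device according to claim 13, wherein the processor is configured to execute: performing, through the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor uses a first model to perform the first wake-up word recognition and the first voiceprint recognition; and performing, through the application processor of the electronic device, the second wake-up word recognition and the second voiceprint recognition on the voice information, wherein the application processor uses a second model to perform the second wake-up word recognition and the second voiceprint recognition; wherein the number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is smaller than the operating power consumption of the second model.
- The electronic device according to claim 13, wherein the processor is configured to execute: using, through the digital signal processor of the electronic device, a Gaussian mixture model to perform the first wake-up word recognition and the first voiceprint recognition on the voice information.
- The electronic device according to claim 13, wherein the processor is configured to execute: using, through the application processor of the electronic device, a deep neural network algorithm model to perform the second wake-up word recognition and the second voiceprint recognition on the voice information.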
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
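PCT/CN2018/116493 WO2020102991A1 (zh) | 2018-11-20 | 2018-11-20 | Method and apparatus for waking up a device, storage medium, and electronic device |
CN201880097795.9A CN112740321A (zh) | 2018-11-20 | 2018-11-20 | Method and apparatus for waking up a device, storage medium, and electronic device |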
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/116493 WO2020102991A1 (zh) | 2018-11-20 | 2018-11-20 | Method and apparatus for waking up a device, storage medium, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020102991A1 (zh) | 2020-05-28 |
Family
ID=70773736
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/116493 WO2020102991A1 (zh) | 2018-11-20 | 2018-11-20 | Method and apparatus for waking up a device, storage medium, and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112740321A (zh) |
WO (1) | WO2020102991A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111951793A (zh) * | 2020-08-13 | 2020-11-17 | 北京声智科技有限公司 | Wake-up word recognition method, apparatus, and storage medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117524228A (zh) * | 2024-01-08 | 2024-02-06 | 腾讯科技(深圳)有限公司 | Voice data processing method, apparatus, device, and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2509291B1 (en) * | 2011-04-06 | 2015-06-03 | BlackBerry Limited | System and method for locating a misplaced mobile device |
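CN106815507A (zh) * | 2015-11-30 | 2017-06-09 | 中兴通讯股份有限公司 | Voice wake-up implementation method, apparatus, and terminal |
CN107886957A (zh) * | 2017-11-17 | 2018-04-06 | 广州势必可赢网络科技有限公司 | Voice wake-up method and apparatus combined with voiceprint recognition |
CN108831477A (zh) * | 2018-06-14 | 2018-11-16 | 出门问问信息科技有限公司 | Speech recognition method, apparatus, device, and storage medium |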
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9613626B2 (en) * | 2015-02-06 | 2017-04-04 | Fortemedia, Inc. | Audio device for recognizing key phrases and method thereof |
CN107767863B (zh) * | 2016-08-22 | 2021-05-04 | 科大讯飞股份有限公司 | Voice wake-up method, *** and intelligent terminal |
CN106448663B (zh) * | 2016-10-17 | 2020-10-23 | 海信集团有限公司 | Voice wake-up method and voice interaction apparatus |
US10403279B2 (en) * | 2016-12-21 | 2019-09-03 | Avnera Corporation | Low-power, always-listening, voice command detection and capture |
US10311870B2 (en) * | 2017-05-10 | 2019-06-04 | Ecobee Inc. | Computerized device with voice command input capability |
-
2018
- 2018-11-20 CN CN201880097795.9A patent/CN112740321A/zh active Pending
- 2018-11-20 WO PCT/CN2018/116493 patent/WO2020102991A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN112740321A (zh) | 2021-04-30 |
Similar Documents
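Publication | Publication Date | Title |
---|---|---|
WO2021093449A1 (zh) | | Artificial-intelligence-based wake-up word detection method, apparatus, device, and medium |
CN107767863B (zh) | | Voice wake-up method, *** and intelligent terminal |
US9779725B2 (en) | | Voice wakeup detecting device and method |
WO2021159688A1 (zh) | | Voiceprint recognition method, apparatus, storage medium, and electronic apparatus |
US20170256270A1 (en) | | Voice Recognition Accuracy in High Noise Conditions |
US20220215853A1 (en) | | Audio signal processing method, model training method, and related apparatus |
US10147444B2 (en) | | Electronic apparatus and voice trigger method therefor |
CN109272991B (zh) | | Voice interaction method, apparatus, device, and computer-readable storage medium |
CN106448663A (zh) | | Voice wake-up method and voice interaction apparatus |
CN110211599B (zh) | | Application wake-up method, apparatus, storage medium, and electronic device |
CN110825446B (zh) | | Parameter configuration method, apparatus, storage medium, and electronic device |
CN110223687B (zh) | | Instruction execution method, apparatus, storage medium, and electronic device |
CN110544468B (zh) | | Application wake-up method, apparatus, storage medium, and electronic device |
CN112700782A (zh) | | Voice processing method and electronic device |
US9633655B1 (en) | | Voice sensing and keyword analysis |
CN111508493B (zh) | | Voice wake-up method, apparatus, electronic device, and storage medium |
CN111312222A (zh) | | Wake-up and speech recognition model training method and apparatus |
US20220292134A1 (en) | | Device operation based on dynamic classifier |
US11437022B2 (en) | | Performing speaker change detection and speaker recognition on a trigger phrase |
WO2019228135A1 (zh) | | Matching threshold adjustment method, apparatus, storage medium, and electronic device |
WO2020102991A1 (zh) | | Method and apparatus for waking up a device, storage medium, and electronic device |
WO2021169711A1 (zh) | | Instruction execution method, apparatus, storage medium, and electronic device |
TW201717192A (zh) | | Electronic device and method for waking it up through voice recognition |
CN110164431A (zh) | | Audio data processing method and apparatus, and storage medium |
US11205433B2 (en) | | Method and apparatus for activating speech recognition |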
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18940840; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18940840; Country of ref document: EP; Kind code of ref document: A1 |
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.09.2021) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18940840; Country of ref document: EP; Kind code of ref document: A1 |