WO2020102991A1 - Method, apparatus, storage medium, and electronic device for waking up a device - Google Patents

Method, apparatus, storage medium, and electronic device for waking up a device

Info

Publication number
WO2020102991A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
wake
recognition
voice
voice information
Prior art date
Application number
PCT/CN2018/116493
Other languages
English (en)
French (fr)
Inventor
陈岩
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司, Oppo广东移动通信有限公司 filed Critical 深圳市欢太科技有限公司
Priority to PCT/CN2018/116493 priority Critical patent/WO2020102991A1/zh
Priority to CN201880097795.9A priority patent/CN112740321A/zh
Publication of WO2020102991A1 publication Critical patent/WO2020102991A1/zh

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques

Definitions

  • The present application belongs to the technical field of electronic devices, and in particular relates to a method, an apparatus, a storage medium, and an electronic device for waking up a device.
  • Wake-up word recognition and voiceprint recognition can be used to wake up an electronic device, where wake-up word recognition detects whether the user's voice contains the preset vocabulary used to wake up the electronic device. For example, after both the wake-up word recognition and the voiceprint recognition of the user's voice are verified, the electronic device can light up and unlock its display screen.
  • However, when the related art uses wake-up word recognition and voiceprint recognition to wake up an electronic device, it tends to waste the device's power.
  • Embodiments of the present application provide a method, an apparatus, a storage medium, and an electronic device for waking up a device, which can reduce the power wasted by the electronic device.
  • This embodiment provides a method for waking up a device, including:
  • performing, by the application processor of the electronic device, a second wake-up word recognition and a second voiceprint recognition on the voice information;
  • waking up the electronic device.
  • This embodiment provides an apparatus for waking up a device, including:
  • an acquisition module, configured to acquire voice information;
  • a first recognition module, configured to perform a first wake-up word recognition and a first voiceprint recognition on the voice information through the digital signal processor of the electronic device;
  • a second recognition module, configured to perform a second wake-up word recognition and a second voiceprint recognition on the voice information through the application processor of the electronic device if both the first wake-up word recognition and the first voiceprint recognition are verified;
  • a wake-up module, configured to wake up the electronic device if both the second wake-up word recognition and the second voiceprint recognition are verified.
  • This embodiment provides a storage medium on which a computer program is stored, where, when the computer program is executed on a computer, the computer is caused to execute the method for waking up a device provided in this embodiment.
  • This embodiment provides an electronic device including a memory and a processor, where the processor, by invoking a computer program stored in the memory, is configured to execute:
  • performing, by the application processor of the electronic device, a second wake-up word recognition and a second voiceprint recognition on the voice information;
  • waking up the electronic device.
  • In this embodiment, the electronic device may perform the first voiceprint recognition on the acquired voice information of the user while performing the first wake-up word recognition on it. Only when both the first wake-up word recognition and the first voiceprint recognition are verified is the electronic device triggered to perform wake-up word recognition and voiceprint recognition a second time, and the wake-up operation is performed only when both second-pass recognitions are also verified. If the first voiceprint recognition fails, the electronic device does not perform the second wake-up word recognition and voiceprint recognition. This avoids the waste of power in the related art, where a non-owner user who speaks the wake-up word passes the first wake-up word recognition and thereby triggers the second wake-up word recognition and voiceprint recognition.
  • FIG. 1 is a schematic flowchart of a method for waking up a device according to an embodiment of the present application.
  • FIG. 2 is another schematic flowchart of a method for waking up a device according to an embodiment of the present application.
  • FIGS. 3 to 6 are schematic diagrams of scenarios of a method for waking up a device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an apparatus for awakening a device according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 9 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The execution subject of the embodiments of the present application may be an electronic device such as a smartphone or a tablet computer.
  • FIG. 1 is a schematic flowchart of a method for waking up a device according to an embodiment of the present application.
  • The flow may include:
  • Wake-up word recognition and voiceprint recognition can be used to wake up an electronic device, where wake-up word recognition detects whether the user's voice contains the preset vocabulary used to wake up the electronic device. For example, after both the wake-up word recognition and the voiceprint recognition of the user's voice are verified, the electronic device can light up and unlock its display screen.
  • However, when the related art uses wake-up word recognition and voiceprint recognition to wake up an electronic device, it tends to waste the device's power.
  • When the related art wakes up an electronic device, it first performs wake-up word recognition on the acquired speech to detect whether the speech contains the wake-up word preset by the owner. If it does, the electronic device performs wake-up word recognition again, together with voiceprint recognition. If voiceprint recognition detects that the current user is the owner, and the second wake-up word recognition determines that the speech contains the preset wake-up word, the electronic device is woken up, that is, it lights up and unlocks. However, if a non-owner user knows the preset wake-up word and speaks it to the electronic device, the electronic device also performs voiceprint recognition and the second wake-up word recognition, yet because the current user is not the owner, the electronic device is not woken up. In other words, the voiceprint recognition and the second wake-up word recognition are unnecessary in this case and waste the electronic device's power.
  • The electronic device may first obtain the voice information A spoken by the current user.
  • The digital signal processor of the electronic device performs the first wake-up word recognition and the first voiceprint recognition on the voice information.
  • The electronic device may perform the first wake-up word recognition and the first voiceprint recognition on the voice information A through its digital signal processor (DSP).
  • The digital signal processor is a microprocessor that processes signals, and it is a core component of voice encoders and modems.
  • The digital signal processor has the advantages of small size, low power consumption, and fast operation.
  • Wake-up word recognition means that the electronic device detects whether a piece of speech contains the preset vocabulary used to wake up the electronic device. For example, suppose the wake-up word preset by the owner for waking up the electronic device is "Xiaoou wakes up". When the user's voice is obtained, the electronic device can first convert it into text and detect whether the converted text contains the wake-up word "Xiaoou wakes up". If it does, the electronic device may determine that the wake-up word recognition is verified. If it does not, the electronic device may determine that the wake-up word recognition fails.
  • Voiceprint recognition is a biometric technology that identifies a speaker by voice. For example, if the owner's voiceprint feature information is pre-stored in the electronic device, then after acquiring the current user's voice information, the electronic device can extract the current user's voiceprint feature information from it and match it against the pre-stored voiceprint feature information of the owner. If the two match, the electronic device can determine that the current user is the owner, that is, the voiceprint recognition is verified. If the two do not match, the electronic device can determine that the current user is not the owner, that is, the voiceprint recognition fails.
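  • The two checks described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the speech has already been converted to text and that voiceprint features are plain numeric vectors. The function names, the cosine-similarity matching rule, and the 0.8 threshold are all hypothetical choices made for this sketch.

```python
import math


def wake_word_verified(transcript: str, wake_word: str) -> bool:
    """Pass if the text converted from the user's voice contains the wake-up word."""
    return wake_word.lower() in transcript.lower()


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two feature vectors (assumed matching rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def voiceprint_verified(current, owner, threshold: float = 0.8) -> bool:
    """Pass if the current user's features match the owner's pre-stored features.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return cosine_similarity(current, owner) >= threshold
```

  • A call such as `wake_word_verified("hello Xiaoou wakes up", "Xiaoou wakes up")` passes, while `voiceprint_verified([1.0, 0.0], [0.0, 1.0])` fails because the two vectors are orthogonal.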
  • After performing the first wake-up word recognition and the first voiceprint recognition on the voice information A acquired in 101, the electronic device can detect whether both the first wake-up word recognition and the first voiceprint recognition are verified.
  • If both the first wake-up word recognition and the first voiceprint recognition are verified, the flow enters 103.
  • For example, the electronic device detects that the voice spoken by the current user contains the preset wake-up word "Xiaoou wakes up", and determines through the first voiceprint recognition that the current user is the owner. In this case, the flow can enter 103.
  • If the first wake-up word recognition or the first voiceprint recognition is not verified, the electronic device can perform other operations. For example, when the first voiceprint recognition is not verified, the electronic device can perform other operations without triggering the second wake-up word recognition and the second voiceprint recognition.
  • The application processor of the electronic device performs the second wake-up word recognition and the second voiceprint recognition on the voice information.
  • If the electronic device, using the digital signal processor, verifies both the first wake-up word recognition and the first voiceprint recognition of the current user's voice information A, the electronic device can be triggered to use its application processor (AP) to perform a second wake-up word recognition and a second voiceprint recognition on the voice information A.
  • The second wake-up word recognition likewise detects whether the current user's voice information A contains the preset wake-up word.
  • For example, the second wake-up word recognition detects whether the current user's voice information A contains the wake-up word "Xiaoou wakes up" preset by the owner.
  • The second voiceprint recognition likewise detects whether the current user is the owner.
  • Because the application processor has more resources and computing power available than the digital signal processor, its recognition results are more accurate.
  • However, the power consumption of the application processor is also greater than that of the digital signal processor.
  • After performing the second wake-up word recognition and the second voiceprint recognition on the voice information A obtained in 101, the electronic device can detect whether both the second wake-up word recognition and the second voiceprint recognition are verified.
  • If both the second wake-up word recognition and the second voiceprint recognition are verified, the flow enters 104.
  • For example, the electronic device detects that the voice spoken by the current user does contain the preset wake-up word "Xiaoou wakes up", and determines through the second voiceprint recognition that the current user is indeed the owner. In this case, the flow can enter 104.
  • If the second wake-up word recognition or the second voiceprint recognition is not verified, the electronic device can perform other operations.
  • The electronic device can perform the wake-up operation. For example, the electronic device can light up and unlock its display screen.
  • In this embodiment, the electronic device may perform the first voiceprint recognition on the acquired voice information of the user while performing the first wake-up word recognition on it. Only when both the first wake-up word recognition and the first voiceprint recognition are verified is the electronic device triggered to perform wake-up word recognition and voiceprint recognition a second time, and the wake-up operation is performed only when both second-pass recognitions are also verified.
  • If the first voiceprint recognition fails, the electronic device does not perform the second wake-up word recognition and voiceprint recognition. This avoids the waste of power in the related art, where a non-owner user who speaks the wake-up word passes the first wake-up word recognition and thereby triggers the second wake-up word recognition and voiceprint recognition.
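  • The gating described above can be sketched as a small function. This is an illustrative sketch only: `try_wake`, the `Recognizer` type, and the stand-in recognizers are hypothetical names, and the real recognizers would run on the digital signal processor and the application processor rather than as Python callables.

```python
from typing import Callable

# A recognizer takes the captured voice information and reports pass/fail.
Recognizer = Callable[[bytes], bool]


def try_wake(voice: bytes,
             dsp_wake_word: Recognizer, dsp_voiceprint: Recognizer,
             ap_wake_word: Recognizer, ap_voiceprint: Recognizer) -> bool:
    """Return True only if both the first (DSP) and second (AP) passes are verified."""
    # First pass: wake-up word and voiceprint on the digital signal processor.
    if not (dsp_wake_word(voice) and dsp_voiceprint(voice)):
        return False  # the application processor is never triggered
    # Second pass: the same two checks on the application processor.
    return ap_wake_word(voice) and ap_voiceprint(voice)


# Hypothetical stand-in that records every application-processor invocation.
ap_calls = []


def ap_check(voice: bytes) -> bool:
    ap_calls.append(voice)
    return True


# A non-owner speaks the wake-up word: the first voiceprint check fails.
woke = try_wake(b"voice-from-non-owner",
                dsp_wake_word=lambda v: True,
                dsp_voiceprint=lambda v: False,
                ap_wake_word=ap_check,
                ap_voiceprint=ap_check)
print(woke, len(ap_calls))  # prints "False 0"
```

  • In the usage at the end of the sketch, the second pass is skipped entirely for the non-owner, which is the power saving the embodiment describes.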
  • FIG. 2 is another schematic flowchart of a method for waking up a device according to an embodiment of the present application.
  • The flow may include:
  • The electronic device acquires a sound signal of the surrounding environment.
  • For example, the electronic device can collect sound signals of the surrounding environment through its microphone.
  • The electronic device performs voice activity detection on the sound signal, where the digital signal processor is in a preset low-frequency mode when performing voice activity detection.
  • The electronic device can perform voice activity detection (VAD) on the sound signal through its digital signal processor (DSP).
  • At this time, the digital signal processor of the electronic device may be in a preset low-frequency mode.
  • Voice activity detection distinguishes the speech signal from the background noise signal in the sound signal. That is, in this embodiment, voice activity detection mainly detects whether the user's voice is present in the sound signal.
  • The digital signal processor of the electronic device has two working modes: a preset low-frequency mode and a preset high-frequency mode. In the low-frequency mode, the clock frequency of the digital signal processor is lower, it executes fewer instructions per second, its processing capacity is lower, and its power consumption is correspondingly lower. In the high-frequency mode, the clock frequency is higher, the digital signal processor executes more instructions per second, its processing capacity is stronger, and its power consumption is correspondingly higher.
  • The electronic device can control the digital signal processor to perform voice activity detection in the preset low-frequency mode.
  • The electronic device may use an energy-plus-model algorithm to perform voice activity detection on the collected sound signals of the surrounding environment.
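  • As a rough illustration of the energy part of such a detector only (the model part is not described in the text and is omitted here), the following sketch flags user voice when several consecutive frames exceed an energy threshold. The frame length, threshold, and minimum run length are hypothetical values chosen for the example, not parameters from the application.

```python
def frame_energy(frame) -> float:
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def detect_voice(samples, frame_len: int = 160,
                 energy_threshold: float = 0.01,
                 min_active_frames: int = 3) -> bool:
    """Energy-only voice activity check (all parameter values are assumptions).

    Reports voice once `min_active_frames` consecutive frames exceed
    `energy_threshold`; a quiet frame resets the run.
    """
    active = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy(frame) > energy_threshold:
            active += 1
            if active >= min_active_frames:
                return True
        else:
            active = 0
    return False
```

  • Requiring a run of consecutive active frames is one simple way to ignore short clicks; a production detector would typically combine this with a statistical model, as the text suggests.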
  • The electronic device determines whether there is user voice in the sound signal.
  • The electronic device may determine, according to the voice activity detection, whether there is user voice in the collected sound signal.
  • If there is no user voice in the sound signal, the electronic device may perform other operations.
  • The electronic device obtains the user's voice information.
  • For example, the electronic device determines through voice activity detection that there is user voice in the surrounding sound signal. The electronic device can then obtain the corresponding voice information from the user voice, for example the user's voice information A.
  • The electronic device performs the first wake-up word recognition and the first voiceprint recognition on the voice information, where the digital signal processor is in a preset high-frequency mode when performing the first wake-up word recognition and the first voiceprint recognition.
  • The electronic device can control its digital signal processor to enter the preset high-frequency mode, and the digital signal processor in the high-frequency mode performs wake-up word recognition and voiceprint recognition on the voice information A. That is, the electronic device performs the first wake-up word recognition and the first voiceprint recognition on the voice information A.
  • For example, suppose the wake-up word preset by the owner for waking up the electronic device is "Xiaoou wakes up". When the user's voice information A is obtained, the electronic device can first convert it into text and detect whether the converted text contains the wake-up word "Xiaoou wakes up". If it does, the electronic device may determine that the wake-up word recognition is verified. If it does not, the electronic device may determine that the wake-up word recognition fails.
  • Voiceprint recognition is used to determine whether the current user is the owner.
  • The electronic device may match the voiceprint feature information of the current user against the pre-stored voiceprint feature information of the owner. If the two match, the electronic device can determine that the current user is the owner. If the two do not match, the electronic device can determine that the current user is not the owner.
  • If both the first wake-up word recognition and the first voiceprint recognition are verified, the flow proceeds to 206.
  • For example, the electronic device detects that the voice spoken by the current user contains the preset wake-up word "Xiaoou wakes up", and determines through the first voiceprint recognition that the current user is the owner. In this case, the flow can enter 206.
  • If the first wake-up word recognition or the first voiceprint recognition is not verified, the electronic device can perform other operations. For example, when the first voiceprint recognition is not verified, that is, the electronic device detects through voiceprint recognition that the current user is not the owner, the electronic device can perform other operations without triggering the second wake-up word recognition and the second voiceprint recognition.
  • The electronic device performs the second wake-up word recognition and the second voiceprint recognition on the voice information through the application processor.
  • If the electronic device, through the digital signal processor, verifies both the wake-up word recognition and the voiceprint recognition of the current user's voice information A, the electronic device can be triggered to use its application processor (AP) to perform wake-up word recognition and voiceprint recognition on the voice information A again. That is, the electronic device can perform the second wake-up word recognition and the second voiceprint recognition on the current user's voice information A.
  • The second wake-up word recognition likewise detects whether the current user's voice information A contains the preset wake-up word.
  • For example, the second wake-up word recognition detects whether the current user's voice information A contains the wake-up word "Xiaoou wakes up" preset by the owner.
  • The second voiceprint recognition likewise detects whether the current user is the owner. Because the computing power of the application processor is stronger than that of the digital signal processor, the results of wake-up word recognition and voiceprint recognition performed on the voice information A by the application processor are more accurate than those obtained by the digital signal processor. This makes it possible to further confirm whether the voice information A contains the preset wake-up word and whether the current user is the owner.
  • The application processor has more resources and computing power available than the digital signal processor, and its power consumption is also greater than that of the digital signal processor.
  • After performing the second wake-up word recognition and the second voiceprint recognition on the voice information A acquired in 204, the electronic device can detect whether both the second wake-up word recognition and the second voiceprint recognition are verified.
  • If both the second wake-up word recognition and the second voiceprint recognition are verified, the flow proceeds to 207.
  • For example, the electronic device detects that the voice spoken by the current user does contain the preset wake-up word "Xiaoou wakes up", and determines through the second voiceprint recognition that the current user is indeed the owner. In this case, the flow can enter 207.
  • If the second wake-up word recognition or the second voiceprint recognition is not verified, the electronic device can perform other operations.
  • The electronic device performs a wake-up operation.
  • The electronic device can be woken up, that is, it can perform the wake-up operation. For example, the electronic device can light up and unlock its display screen.
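  • The flow of FIG. 2 described above, including the switch between the two digital signal processor modes, can be sketched as a small state holder. The class, the `DspMode` enum, and the `trace` list are hypothetical constructs for illustration; the three callables stand in for voice activity detection, the first pass on the digital signal processor, and the second pass on the application processor.

```python
from enum import Enum


class DspMode(Enum):
    LOW_FREQUENCY = "low"
    HIGH_FREQUENCY = "high"


class Device:
    """Toy model of the wake-up flow; not the application's implementation."""

    def __init__(self, vad, dsp_verify, ap_verify):
        self.vad = vad                # voice activity detection
        self.dsp_verify = dsp_verify  # first wake-up word + voiceprint pass
        self.ap_verify = ap_verify    # second wake-up word + voiceprint pass
        self.dsp_mode = DspMode.LOW_FREQUENCY
        self.awake = False
        self.trace = []               # records which stages actually ran

    def handle_sound(self, sound) -> bool:
        # Voice activity detection runs with the DSP in low-frequency mode.
        self.dsp_mode = DspMode.LOW_FREQUENCY
        self.trace.append("vad@low")
        if not self.vad(sound):
            return False
        # The first pass runs with the DSP in high-frequency mode.
        self.dsp_mode = DspMode.HIGH_FREQUENCY
        self.trace.append("dsp@high")
        if not self.dsp_verify(sound):
            return False
        # Only now is the application processor triggered (206 in the text).
        self.trace.append("ap")
        if not self.ap_verify(sound):
            return False
        # Wake-up operation (207 in the text), e.g. light up and unlock.
        self.awake = True
        self.trace.append("wake")
        return True
```

  • Inspecting `trace` after a call shows which stages ran; for a non-owner who fails the first pass it stops at `"dsp@high"`, so the application processor stage never appears.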
  • The electronic device performing the first wake-up word recognition and the first voiceprint recognition on the voice information through the digital signal processor may include:
  • through the digital signal processor, the electronic device performs the first wake-up word recognition and the first voiceprint recognition on the voice information, where the digital signal processor uses a first model to perform the first wake-up word recognition and the first voiceprint recognition.
  • The electronic device performing the second wake-up word recognition and the second voiceprint recognition on the voice information through the application processor may include: through the application processor, the electronic device performs the second wake-up word recognition and the second voiceprint recognition on the voice information, where the application processor uses a second model to perform the second wake-up word recognition and the second voiceprint recognition.
  • The number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is smaller than that of the second model.
  • Both the first model and the second model may be used to perform wake-up word recognition and voiceprint recognition on the voice information A of the current user.
  • For example, the first model uses three parameters, A, B, and C, for the wake-up word recognition and voiceprint recognition of the current user's voice information A.
  • The second model uses six parameters, A, B, C, D, E, and F, for the wake-up word recognition and voiceprint recognition of the current user's voice information A.
  • Both the first model and the second model can perform wake-up word recognition and voiceprint recognition on the current user's voice information A.
  • However, the recognition results of the second model for wake-up word recognition and voiceprint recognition are more accurate than those of the first model.
  • The electronic device can use the first model when performing the first wake-up word recognition and the first voiceprint recognition on the current user's voice information A. Since the application processor can use more resources, the electronic device can use the second model when performing the second wake-up word recognition and the second voiceprint recognition on the voice information A, so as to further confirm, through the recognition result of the second model, whether the voice information A contains the preset wake-up word and whether the current user is the owner.
  • The electronic device performing the first wake-up word recognition and the first voiceprint recognition on the voice information may include:
  • the electronic device uses a Gaussian mixture model to perform the first wake-up word recognition and the first voiceprint recognition on the voice information.
  • For example, the electronic device may use a Gaussian mixture model to perform wake-up word recognition and voiceprint recognition on the voice information A.
  • The electronic device may also use other algorithm models to perform the first wake-up word recognition and the first voiceprint recognition on the current user's voice information A, which is not specifically limited in this embodiment.
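  • For readers unfamiliar with Gaussian mixture scoring, the sketch below shows one common way such a model can be used for verification: compute the average log-likelihood of the observed features under a mixture trained for the owner and compare it with a threshold. The text does not specify how the model is applied, so the scalar features, the function names, and the threshold decision rule are all assumptions of this sketch.

```python
import math


def gmm_log_likelihood(x: float, weights, means, variances) -> float:
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_pdf = -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        log_terms.append(math.log(w) + log_pdf)
    # Log-sum-exp for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def average_log_likelihood(features, weights, means, variances) -> float:
    """Mean per-feature log-likelihood under the mixture."""
    total = sum(gmm_log_likelihood(x, weights, means, variances) for x in features)
    return total / len(features)


def gmm_accepts(features, weights, means, variances, threshold: float) -> bool:
    """Hypothetical decision rule: accept if the average score reaches the threshold."""
    return average_log_likelihood(features, weights, means, variances) >= threshold
```

  • Real systems score multi-dimensional acoustic features rather than scalars; the one-dimensional case is used here only to keep the arithmetic easy to follow.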
  • The electronic device performing the second wake-up word recognition and the second voiceprint recognition on the voice information may include:
  • the electronic device uses a deep neural network algorithm model to perform the second wake-up word recognition and the second voiceprint recognition on the voice information.
  • For example, when performing wake-up word recognition and voiceprint recognition on the current user's voice information A through the application processor, the electronic device can use a deep neural network (DNN) algorithm model to perform wake-up word recognition and voiceprint recognition on the voice information A.
  • The electronic device may also use other algorithm models to perform the second wake-up word recognition and the second voiceprint recognition on the current user's voice information A, which is not specifically limited in this embodiment.
  • In this embodiment, the electronic device may first perform voice activity detection on the collected sound signal of the surrounding environment to detect whether the sound signal includes user voice.
  • The voice activity detection is performed while the digital signal processor is in the low-frequency mode. Therefore, this embodiment can perform voice activity detection with extremely low power consumption.
  • When user voice is detected, the electronic device can control the digital signal processor to enter the high-frequency mode and perform the first wake-up word recognition and the first voiceprint recognition on the user's voice. Only when both the first wake-up word recognition and the first voiceprint recognition are verified does the electronic device trigger the application processor to perform the second wake-up word recognition and the second voiceprint recognition.
  • The power consumption of the application processor is generally 50 to 100 times that of the digital signal processor. Therefore, this embodiment triggers the application processor to perform wake-up word recognition and voiceprint recognition only when the digital signal processor has determined that the current user is the owner and has spoken the wake-up word. This effectively avoids the waste of power caused in the related art by performing voiceprint recognition only on the application processor side.
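  • The saving can be illustrated with a back-of-the-envelope calculation. The sketch below relies on simplifying assumptions that are not from the application: each recognition pass takes the same time (so energy scales with power), one pass on the digital signal processor costs 1 unit whether or not it includes voiceprint recognition, and every utterance comes from a non-owner who speaks the correct wake-up word. Only the 50 to 100 times ratio comes from the text.

```python
def relative_energy(non_owner_utterances: int,
                    ap_to_dsp_ratio: float,
                    dsp_pass_cost: float = 1.0,
                    gate_with_voiceprint: bool = True) -> float:
    """Relative energy spent on non-owner utterances that contain the wake-up word.

    gate_with_voiceprint=True models this embodiment (the first pass rejects the
    non-owner, so the application processor never runs); False models the
    related art (every such utterance also triggers the application processor).
    """
    dsp_total = non_owner_utterances * dsp_pass_cost
    if gate_with_voiceprint:
        return dsp_total
    ap_total = non_owner_utterances * ap_to_dsp_ratio * dsp_pass_cost
    return dsp_total + ap_total


# Ten non-owner attempts at the lower bound of the stated ratio (50x).
print(relative_energy(10, 50, gate_with_voiceprint=False))  # prints 510.0
print(relative_energy(10, 50, gate_with_voiceprint=True))   # prints 10.0
```

  • Under these assumptions the related-art flow spends 51 times as much energy on the ten rejected attempts; at the upper bound of 100 times the ratio grows to 101.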
  • In addition, because the electronic device is woken up in this embodiment only when both wake-up word recognition and voiceprint recognition are verified, accidental wake-ups of the electronic device can be reduced and the user's experience of waking up the electronic device can be improved.
  • FIG. 3 to FIG. 6 are schematic diagrams of scenarios of a method for waking up a device according to an embodiment of the present application.
  • For example, the owner of the electronic device is user A.
  • The electronic device prompts the owner to speak the wake-up word. For example, if owner user A says "Xiaoou wakes up", the electronic device can collect the voice through its microphone. The electronic device can then perform voice recognition and voiceprint recognition on the voice, thereby obtaining the wake-up word "Xiaoou wakes up" used to wake up the electronic device and the voiceprint features of owner user A.
  • The electronic device can collect sound signals of the surrounding environment through its microphone. After a sound signal is collected, the electronic device can control its digital signal processor to enter the preset low-frequency mode and, through the digital signal processor in the low-frequency mode, perform voice activity detection on the collected sound signal to detect whether it includes user voice.
  • For example, the current user says "Xiaoou wakes up" (voice B) to the electronic device.
  • After the microphone of the electronic device collects voice B, the digital signal processor in the low-frequency mode performs voice activity detection on it and detects that it contains user voice.
  • The electronic device can then obtain the current user's voice information from the sound signal. After that, the electronic device can control its digital signal processor to enter the preset high-frequency mode and perform wake-up word recognition and voiceprint recognition on the acquired voice information of the current user in the high-frequency mode. That is, the electronic device performs the first wake-up word recognition and the first voiceprint recognition on the current user's voice information.
  • For example, the electronic device may perform the first wake-up word recognition on the current user's voice B through the digital signal processor in the high-frequency mode to detect whether voice B is the preset wake-up word "Xiaoou wakes up".
  • In addition, the electronic device can extract the voiceprint features of the current user from voice B and match them against the pre-stored voiceprint features of owner user A to detect whether the current user is owner user A.
  • The electronic device can then perform wake-up word recognition and voiceprint recognition on the current user's voice information again through its application processor. That is, the electronic device performs the second wake-up word recognition and the second voiceprint recognition on the current user's voice information.
  • The second wake-up word recognition likewise detects whether the current user's voice information contains the preset wake-up word.
  • The second voiceprint recognition likewise detects whether the current user is owner user A.
  • The electronic device can perform a wake-up operation, for example, light up and unlock its display screen.
  • For example, the electronic device detects through the first voiceprint recognition that the current user is owner user A, and detects through the first wake-up word recognition that voice B is the preset wake-up word "Xiaoou wakes up". Moreover, the electronic device further confirms through the second voiceprint recognition that the current user is owner user A, and further confirms through the second wake-up word recognition that voice B is indeed the preset wake-up word "Xiaoou wakes up". In this case, the electronic device can light up and unlock its display screen.
  • If the first wake-up word recognition or the first voiceprint recognition is not verified, the electronic device can perform other operations without performing the second voiceprint recognition and the second wake-up word recognition.
  • For example, non-owner user B says voice C, "Xiaoou wakes up", which contains the wake-up word, to the electronic device.
  • In this case, the electronic device will not perform the second wake-up word recognition and the second voiceprint recognition on voice C.
  • In the related art, by contrast, the electronic device performs voiceprint recognition only together with the second wake-up word recognition, and not with the first wake-up word recognition. Therefore, although voice C is spoken by user B, the electronic device, on detecting that voice C contains the wake-up word "Xiaoou wakes up", triggers the second wake-up word recognition and voiceprint recognition of voice C. During voiceprint recognition, the electronic device detects that the current user is not owner user A, so it does not perform the wake-up operation. The second wake-up word recognition and voiceprint recognition performed in the related art therefore waste the electronic device's power.
  • FIG. 7 is a schematic structural diagram of an apparatus for waking up a device according to an embodiment of the present application.
  • The apparatus 300 for waking up a device may include: an acquisition module 301, a first recognition module 302, a second recognition module 303, and a wake-up module 304.
  • The acquisition module 301 is used to acquire voice information.
  • The first recognition module 302 is used to perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device.
  • The second recognition module 303 is used to perform, if both the first wake word recognition and the first voiceprint recognition pass verification, the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device.
  • The wake-up module 304 is used to wake up the electronic device if both the second wake word recognition and the second voiceprint recognition pass verification.
  • In one embodiment, the acquisition module 301 can be used to: acquire a sound signal of the surrounding environment and perform voice activity detection on the sound signal; determine, based on the voice activity detection, whether the sound signal contains a user's voice; and, if it does, acquire the voice information.
  • In one embodiment, the acquisition module 301 can be used to: perform voice activity detection on the sound signal through the digital signal processor of the electronic device.
  • In one embodiment, the acquisition module 301 can be used to: perform voice activity detection on the sound signal through the digital signal processor of the electronic device, wherein the digital signal processor is in a preset low-frequency mode while performing the voice activity detection.
  • In one embodiment, the first recognition module 302 can be used to: perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, wherein the digital signal processor is in a preset high-frequency mode while performing the first wake word recognition and the first voiceprint recognition.
  • In one embodiment, the first recognition module 302 can be used to: perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, wherein the digital signal processor uses a first model to perform the first wake word recognition and the first voiceprint recognition.
  • The second recognition module 303 can be used to: perform the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, wherein the application processor uses a second model to perform the second wake word recognition and the second voiceprint recognition.
  • The number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is lower than that of the second model.
  • In one embodiment, the first recognition module 302 can be used to: perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, using a Gaussian mixture model.
  • In one embodiment, the second recognition module 303 can be used to: perform the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, using a deep neural network algorithm model.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed on a computer, the computer is caused to perform the flow of the method for waking up a device provided in this embodiment.
  • An embodiment of the present application further provides an electronic device including a memory and a processor. By calling a computer program stored in the memory, the processor executes the flow of the method for waking up a device provided in this embodiment.
  • The aforementioned electronic device may be a mobile terminal such as a tablet computer or a smartphone.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The electronic device 400 may include a display screen 401, a memory 402, a processor 403, a microphone 404, and other components.
  • Those skilled in the art will understand that the structure shown in FIG. 8 does not limit the electronic device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
  • The display screen 401 can be used to display information such as images and text.
  • The memory 402 may be used to store application programs and data.
  • The application programs stored in the memory 402 contain executable code.
  • The application programs can form various functional modules.
  • The processor 403 runs the application programs stored in the memory 402 to execute various functional applications and data processing.
  • The processor 403 is the control center of the electronic device. It uses various interfaces and lines to connect all parts of the electronic device, and it performs the device's functions and processes data by running or executing the application programs stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the electronic device as a whole.
  • The microphone 404 may be used to collect the user's voice information.
  • In this embodiment, the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and runs the application programs stored in the memory 402, thereby executing: acquiring voice information; performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device; if both pass verification, performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device; and, if both of those pass verification, waking up the electronic device.
  • Referring to FIG. 9, the electronic device 500 may include a display screen 501, a memory 502, a processor 503, a microphone 504, a speaker 505, a battery 506, and other components.
  • The display screen 501 can be used to display information such as images and text.
  • The memory 502 may be used to store application programs and data.
  • The application programs stored in the memory 502 contain executable code.
  • The application programs can form various functional modules.
  • The processor 503 runs the application programs stored in the memory 502 to execute various functional applications and data processing.
  • The processor 503 is the control center of the electronic device. It uses various interfaces and lines to connect all parts of the electronic device, and it performs the device's functions and processes data by running or executing the application programs stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the electronic device as a whole.
  • The input unit 504 may be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
  • The output unit 505 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the electronic device. These graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof.
  • The output unit may include a display panel.
  • In this embodiment, the processor 503 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and runs the application programs stored in the memory 502, thereby executing: acquiring voice information; performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device; if both pass verification, performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device; and, if both of those pass verification, waking up the electronic device.
  • In one embodiment, before acquiring the voice information, the processor 503 may further execute: acquiring a sound signal of the surrounding environment and performing voice activity detection on the sound signal; and determining, based on the voice activity detection, whether the sound signal contains a user's voice.
  • Then, when acquiring the voice information, the processor 503 may execute: acquiring the voice information if a user's voice is present.
  • In one embodiment, when performing voice activity detection on the sound signal, the processor 503 may execute: performing voice activity detection on the sound signal through the digital signal processor of the electronic device.
  • In one embodiment, when performing voice activity detection on the sound signal through the digital signal processor of the electronic device, the processor 503 may execute: performing voice activity detection on the sound signal through the digital signal processor, wherein the digital signal processor is in a preset low-frequency mode while performing the voice activity detection.
  • In one embodiment, when performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, the processor 503 may execute: performing the first wake word recognition and the first voiceprint recognition through the digital signal processor, wherein the digital signal processor is in a preset high-frequency mode while performing them.
  • In one embodiment, when performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, the processor 503 may execute: performing the first wake word recognition and the first voiceprint recognition through the digital signal processor, wherein the digital signal processor uses a first model to perform them.
  • Then, when performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, the processor 503 may execute: performing the second wake word recognition and the second voiceprint recognition through the application processor, wherein the application processor uses a second model to perform them.
  • The number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is lower than that of the second model.
  • In one embodiment, when performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, the processor 503 may execute: performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor, using a Gaussian mixture model.
  • In one embodiment, when performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, the processor 503 may execute: performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor, using a deep neural network algorithm model.
  • The apparatus for waking up a device provided in the embodiments of the present application is based on the same concept as the method for waking up a device in the above embodiments. Any of the methods provided in the method embodiments can be run on the apparatus; for the specific implementation, see the method embodiments.
  • Those of ordinary skill in the art will understand that all or part of the flow of the method for waking up a device can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, for example in a memory, and executed by at least one processor; its execution may include the flow of the embodiments of the method for waking up a device.
  • The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), and so on.
  • For the apparatus for waking up a device, each functional module may be integrated into one processing chip, each module may exist physically on its own, or two or more modules may be integrated into one module.
  • The above integrated modules may be implemented in the form of hardware or of software functional modules. If an integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Telephone Function (AREA)

Abstract

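This embodiment provides a method for waking up a device. The method includes: acquiring voice information; performing, through a digital signal processor, a first wake word recognition and a first voiceprint recognition on the voice information; if both the first wake word recognition and the first voiceprint recognition pass verification, performing, through an application processor, a second wake word recognition and a second voiceprint recognition on the voice information; and, if both the second wake word recognition and the second voiceprint recognition pass verification, waking up the electronic device.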

Description

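Method, apparatus, storage medium, and electronic device for waking up a device
Technical Field
The present application belongs to the technical field of electronic devices, and in particular relates to a method, apparatus, storage medium, and electronic device for waking up a device.
Background
As electronic devices become increasingly intelligent, the related art can use wake word recognition and voiceprint recognition to wake up an electronic device, where wake word recognition detects whether a user's voice contains a preset word or phrase for waking up the device. For example, after both the wake word recognition and the voiceprint recognition performed on the user's voice pass verification, the electronic device can light up its display screen and unlock it. However, when the related art wakes up an electronic device in this way, it tends to waste the device's power.
Summary
Embodiments of the present application provide a method, apparatus, storage medium, and electronic device for waking up a device, which can reduce wasted power consumption of the electronic device.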
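In a first aspect, this embodiment provides a method for waking up a device, including:
acquiring voice information;
performing, through a digital signal processor of an electronic device, a first wake word recognition and a first voiceprint recognition on the voice information;
if both the first wake word recognition and the first voiceprint recognition pass verification, performing, through an application processor of the electronic device, a second wake word recognition and a second voiceprint recognition on the voice information;
if both the second wake word recognition and the second voiceprint recognition pass verification, waking up the electronic device.
In a second aspect, this embodiment provides an apparatus for waking up a device, including:
an acquisition module, used to acquire voice information;
a first recognition module, used to perform, through a digital signal processor of an electronic device, a first wake word recognition and a first voiceprint recognition on the voice information;
a second recognition module, used to perform, if both the first wake word recognition and the first voiceprint recognition pass verification, a second wake word recognition and a second voiceprint recognition on the voice information through an application processor of the electronic device;
a wake-up module, used to wake up the electronic device if both the second wake word recognition and the second voiceprint recognition pass verification.
In a third aspect, this embodiment provides a storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to perform the method for waking up a device provided in this embodiment.
In a fourth aspect, this embodiment provides an electronic device including a memory and a processor, wherein the processor, by calling a computer program stored in the memory, is used to execute:
acquiring voice information;
performing, through a digital signal processor of the electronic device, a first wake word recognition and a first voiceprint recognition on the voice information;
if both the first wake word recognition and the first voiceprint recognition pass verification, performing, through an application processor of the electronic device, a second wake word recognition and a second voiceprint recognition on the voice information;
if both the second wake word recognition and the second voiceprint recognition pass verification, waking up the electronic device.
In this embodiment, the electronic device can perform the first voiceprint recognition on the acquired voice information at the same time as the first wake word recognition. Only when both the first wake word recognition and the first voiceprint recognition pass verification is the electronic device triggered to perform wake word recognition and voiceprint recognition again, and it performs the wake-up operation only when the second wake word recognition and the second voiceprint recognition also pass. If the first voiceprint recognition fails, the electronic device does not perform the second wake word recognition or voiceprint recognition. This avoids the power wasted in the related art when a non-owner user says the wake word, passes the first wake word recognition, and thereby triggers the second wake word recognition and voiceprint recognition.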
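Brief Description of the Drawings
The technical solutions of the present application and their beneficial effects will become apparent from the following detailed description of specific embodiments, taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic flowchart of a method for waking up a device according to an embodiment of the present application.
FIG. 2 is another schematic flowchart of a method for waking up a device according to an embodiment of the present application.
FIG. 3 to FIG. 6 are schematic diagrams of scenarios of a method for waking up a device according to an embodiment of the present application.
FIG. 7 is a schematic structural diagram of an apparatus for waking up a device according to an embodiment of the present application.
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
FIG. 9 is another schematic structural diagram of an electronic device according to an embodiment of the present application.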
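Detailed Description
Please refer to the drawings, in which like reference numerals denote like components. The principles of the present application are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be regarded as limiting other specific embodiments not detailed herein.
It can be understood that the execution subject of the embodiments of the present application may be an electronic device such as a smartphone or a tablet computer.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for waking up a device according to an embodiment of the present application. The flow may include:
In 101, voice information is acquired.
As electronic devices become increasingly intelligent, the related art can use wake word recognition and voiceprint recognition to wake up an electronic device, where wake word recognition detects whether a user's voice contains a preset word or phrase for waking up the device. For example, after both the wake word recognition and the voiceprint recognition performed on the user's voice pass verification, the electronic device can light up its display screen and unlock it. However, when the related art wakes up an electronic device in this way, it tends to waste the device's power.
For example, when waking up an electronic device, the related art first performs wake word recognition on the acquired voice to detect whether it contains the wake word preset by the owner. If the voice contains the preset wake word, the electronic device performs wake word recognition once more and also performs voiceprint recognition. If voiceprint recognition detects that the current user is the owner, and the second wake word recognition confirms that the voice contains the preset wake word, the electronic device is woken up, that is, it lights up its screen and unlocks. However, if a non-owner user knows the preset wake word and says it to the electronic device, the device likewise performs voiceprint recognition and the second wake word recognition. Because the current user is not the owner, the device is not woken up. In other words, the voiceprint recognition and the second wake word recognition were in fact unnecessary, which wastes the device's power.
In 101 of this embodiment, for example, the electronic device may first acquire voice information A spoken by the current user.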
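In 102, a first wake word recognition and a first voiceprint recognition are performed on the voice information through the digital signal processor of the electronic device.
For example, after acquiring the current user's voice information A, the electronic device can perform the first wake word recognition and the first voiceprint recognition on voice information A through its digital signal processor (Digital Signal Processor, DSP).
It should be noted that a digital signal processor is a microprocessor for processing signals and is the core component of speech codecs and modems. A digital signal processor has the advantages of small size, low power consumption, and high computing speed.
Wake word recognition means that the electronic device detects whether a piece of speech contains the preset word or phrase for waking up the device. For example, suppose the wake word preset by the owner is "Xiao Ou, wake up" (小欧醒过来). After acquiring the user's speech, the electronic device can first convert it into text and check whether the resulting text contains the wake word "Xiao Ou, wake up"; that is, the device can detect whether the user's speech contains that wake word. If it does, the electronic device can determine that wake word recognition passes verification. If it does not, the electronic device can determine that wake word recognition fails verification.
Voiceprint recognition is a biometric technology that identifies a speaker from their voice. For example, the electronic device stores the owner's voiceprint feature information in advance. After acquiring the current user's voice information, the device can extract the current user's voiceprint feature information from it and match it against the stored voiceprint feature information of the owner. If the two match, the electronic device can determine that the current user is the owner, that is, voiceprint recognition passes verification. If they do not match, the electronic device can determine that the current user is not the owner, that is, voiceprint recognition fails verification.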
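The two checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text says the speech is converted to text and searched for the wake word, and that voiceprint features are "matched", but it does not say how. Cosine similarity, the `threshold` value, and all function names below are hypothetical choices, not the patent's method.

```python
import math

WAKE_WORD = "小欧醒过来"  # example wake word from the text ("Xiao Ou, wake up")


def wake_word_passes(transcript: str, wake_word: str = WAKE_WORD) -> bool:
    """Wake word check: does the transcribed text contain the preset phrase?"""
    return wake_word in transcript


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def voiceprint_passes(current, enrolled, threshold: float = 0.8) -> bool:
    """Voiceprint check: assumed here to be similarity against the enrolled owner features."""
    return cosine_similarity(current, enrolled) >= threshold
```

In a real device the feature vectors would come from an acoustic front end, and the threshold would be tuned against false-accept and false-reject rates; neither is specified in the source.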
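After performing the first wake word recognition and the first voiceprint recognition on voice information A acquired in 101, the electronic device can check whether both of them pass verification.
If both the first wake word recognition and the first voiceprint recognition pass verification, the flow proceeds to 103. For example, if the electronic device detects that the speech spoken by the current user contains the preset wake word "Xiao Ou, wake up", and determines through the first voiceprint recognition that the current user is the owner, the flow can proceed to 103.
If the first wake word recognition and the first voiceprint recognition do not both pass verification, the electronic device can perform other operations. For example, when the first voiceprint recognition fails verification, the electronic device can perform other operations and does not trigger the second wake word recognition or the second voiceprint recognition.
In 103, if both the first wake word recognition and the first voiceprint recognition pass verification, a second wake word recognition and a second voiceprint recognition are performed on the voice information through the application processor of the electronic device.
For example, if the first wake word recognition and the first voiceprint recognition that the electronic device performed on the current user's voice information A through the digital signal processor both pass verification, the electronic device can be triggered to perform the second wake word recognition and the second voiceprint recognition on voice information A using its application processor (Application Processor, AP).
It should be noted that the second wake word recognition likewise detects whether the current user's voice information A contains the preset wake word, for example the owner's preset wake word "Xiao Ou, wake up". The second voiceprint recognition likewise detects whether the current user is the owner. In addition, because the application processor has access to more resources and greater computing power than the digital signal processor, its recognition results are more accurate. The application processor also consumes more power than the digital signal processor.
After performing the second wake word recognition and the second voiceprint recognition on voice information A acquired in 101, the electronic device can check whether both of them pass verification.
If both the second wake word recognition and the second voiceprint recognition pass verification, the flow proceeds to 104. For example, if the electronic device detects that the speech spoken by the current user does contain the preset wake word "Xiao Ou, wake up", and determines through the second voiceprint recognition that the current user is indeed the owner, the flow can proceed to 104.
If the second wake word recognition and the second voiceprint recognition do not both pass verification, the electronic device can perform other operations.
In 104, if both the second wake word recognition and the second voiceprint recognition pass verification, the electronic device is woken up.
For example, if the second wake word recognition and the second voiceprint recognition that the electronic device performed on the current user's voice information A through the application processor both pass verification, the electronic device can perform the wake-up operation, for example lighting up the display screen and unlocking it.
It can be understood that, in this embodiment, the electronic device can perform the first voiceprint recognition on the acquired voice information at the same time as the first wake word recognition. Only when both the first wake word recognition and the first voiceprint recognition pass verification is the electronic device triggered to perform wake word recognition and voiceprint recognition again, and it performs the wake-up operation only when the second wake word recognition and the second voiceprint recognition also pass. If the first voiceprint recognition fails, the electronic device does not perform the second wake word recognition or voiceprint recognition. This avoids the power wasted in the related art when a non-owner user says the wake word, passes the first wake word recognition, and thereby triggers the second wake word recognition and voiceprint recognition.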
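The gating logic of steps 101 to 104 can be sketched as a two-stage cascade. The sketch below is a hedged illustration, not the patent's implementation: `Utterance`, `Stage`, and `try_wake` are hypothetical names, and the recognizers are passed in as plain callables so that the control flow is the only thing being shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Utterance:
    """Toy stand-in for acquired voice information (step 101)."""
    speaker: str
    text: str


Check = Callable[[Utterance], bool]


@dataclass
class Stage:
    """One verification stage (DSP or AP) holding a wake word check and a voiceprint check."""
    wake_word_ok: Check
    voiceprint_ok: Check
    runs: int = 0  # how many times this stage was invoked

    def verify(self, voice: Utterance) -> bool:
        self.runs += 1
        # Both checks are evaluated, as in the text; the stage passes only if both pass.
        word = self.wake_word_ok(voice)
        owner = self.voiceprint_ok(voice)
        return word and owner


def try_wake(voice: Utterance, dsp: Stage, ap: Stage) -> bool:
    """Return True if the device should wake up."""
    if not dsp.verify(voice):   # step 102: first pass on the DSP
        return False            # the AP is never triggered
    if not ap.verify(voice):    # step 103: second pass on the AP
        return False
    return True                 # step 104: wake up
```

The `runs` counter makes the power argument visible: for a non-owner who says the wake word, the AP stage's counter stays at zero.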
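Referring to FIG. 2, FIG. 2 is another schematic flowchart of a method for waking up a device according to an embodiment of the present application. The flow may include:
In 201, the electronic device acquires a sound signal of the surrounding environment.
For example, the electronic device can collect the sound signal of the surrounding environment through its microphone.
In 202, the electronic device performs voice activity detection on the sound signal through the digital signal processor, wherein the digital signal processor is in a preset low-frequency mode while performing the voice activity detection.
For example, after collecting the sound signal of the surrounding environment, the electronic device can perform voice activity detection (Voice Activity Detection, VAD) on the sound signal through its digital signal processor (DSP). While performing voice activity detection, the digital signal processor can be in the preset low-frequency mode.
It should be noted that voice activity detection distinguishes speech signals from background noise in a sound signal. In other words, in this embodiment, voice activity detection mainly serves to detect whether the sound signal contains a user's voice (speech).
In this embodiment, the digital signal processor of the electronic device has two working modes: a preset low-frequency mode and a preset high-frequency mode. In the low-frequency mode, the clock frequency of the digital signal processor is lower, it executes fewer instructions per second, its processing capability is lower, and its power consumption is correspondingly lower. In the high-frequency mode, the clock frequency is higher, it executes more instructions per second, its processing capability is stronger, and its power consumption is correspondingly somewhat higher.
In this embodiment, the electronic device can control the digital signal processor to perform voice activity detection in the preset low-frequency mode.
In one embodiment, the electronic device can use an energy-plus-model algorithm to perform voice activity detection on the collected sound signal of the surrounding environment.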
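The text names an "energy-plus-model" algorithm for voice activity detection without describing it. As a rough illustration of the energy part only, the sketch below flags a signal as containing voice when several consecutive frames exceed an energy threshold. The frame length, threshold, and minimum run length are all assumed values, and the "model" part of the algorithm is not represented.

```python
def frame_energy(frame) -> float:
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def has_voice(samples, frame_len: int = 160, energy_threshold: float = 0.01,
              min_active_frames: int = 3) -> bool:
    """Energy-only VAD sketch: True if enough consecutive frames are loud.

    Requiring a run of active frames is an assumed guard against short clicks;
    a trailing partial frame is ignored.
    """
    active = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy(frame) >= energy_threshold:
            active += 1
            if active >= min_active_frames:
                return True
        else:
            active = 0  # the run of loud frames was interrupted
    return False
```

A check this cheap is the kind of work that fits a DSP running in the low-frequency mode; anything it lets through is then examined by the heavier recognizers.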
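In 203, the electronic device determines, based on the voice activity detection, whether the sound signal contains a user's voice.
For example, while performing voice activity detection on the collected sound signal, the electronic device can determine from that detection whether the collected sound signal contains a user's voice.
If voice activity detection determines that the collected sound signal contains a user's voice, the flow proceeds to 204.
If voice activity detection determines that the collected sound signal contains no user's voice and only background noise, the electronic device can perform other operations.
In 204, if a user's voice is present, the electronic device acquires the user's voice information.
For example, if the electronic device determines through voice activity detection that the surrounding sound signal contains a user's voice, it can obtain the corresponding voice information from that voice. For example, the electronic device acquires the user's voice information A.
In 205, the electronic device performs a first wake word recognition and a first voiceprint recognition on the voice information through the digital signal processor, wherein the digital signal processor is in a preset high-frequency mode while performing the first wake word recognition and the first voiceprint recognition.
For example, after acquiring the user's voice information A, the electronic device can switch its digital signal processor to the preset high-frequency mode and, in that mode, have the digital signal processor perform wake word recognition and voiceprint recognition on voice information A. That is, the electronic device performs the first wake word recognition and the first voiceprint recognition on voice information A.
It should be noted that, for example, the wake word preset by the owner is "Xiao Ou, wake up". After acquiring the user's voice information A, the electronic device can first convert it into text and check whether the resulting text contains the wake word "Xiao Ou, wake up"; that is, the device can detect whether the user's speech contains that wake word. If it does, the electronic device can determine that wake word recognition passes verification. If it does not, the electronic device can determine that wake word recognition fails verification.
Voiceprint recognition is used to determine whether the current user is the owner. When performing voiceprint recognition, the electronic device can match the current user's voiceprint feature information against the stored voiceprint feature information of the owner. If the two match, the electronic device can determine that the current user is the owner. If they do not match, the electronic device can determine that the current user is not the owner.
If both the first wake word recognition and the first voiceprint recognition pass verification, the flow proceeds to 206. For example, if the electronic device detects that the speech spoken by the current user contains the preset wake word "Xiao Ou, wake up", and determines through the first voiceprint recognition that the current user is the owner, the flow can proceed to 206.
If the first wake word recognition and the first voiceprint recognition do not both pass verification, the electronic device can perform other operations. For example, when the first voiceprint recognition fails verification, that is, the electronic device detects through voiceprint recognition that the current user is not the owner, the electronic device can perform other operations and does not trigger the second wake word recognition or the second voiceprint recognition.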
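In 206, if both the first wake word recognition and the first voiceprint recognition pass verification, the electronic device performs a second wake word recognition and a second voiceprint recognition on the voice information through the application processor.
For example, if the wake word recognition and voiceprint recognition that the electronic device performed on the current user's voice information A through the digital signal processor both pass verification, the electronic device can be triggered to perform wake word recognition and voiceprint recognition on voice information A again using its application processor (Application Processor, AP). That is, the electronic device can perform the second wake word recognition and the second voiceprint recognition on the current user's voice information A.
It should be noted that the second wake word recognition likewise detects whether the current user's voice information A contains the preset wake word, for example the owner's preset wake word "Xiao Ou, wake up". The second voiceprint recognition likewise detects whether the current user is the owner. Because the application processor has greater computing power than the digital signal processor, the results of wake word recognition and voiceprint recognition performed by the application processor on voice information A are more accurate than those of the digital signal processor. They therefore serve to further confirm whether voice information A contains the preset wake word and whether the current user is the owner. In addition, the application processor has access to more resources and computing power than the digital signal processor, and it also consumes more power.
After performing the second wake word recognition and the second voiceprint recognition on voice information A acquired in 204, the electronic device can check whether both of them pass verification.
If both the second wake word recognition and the second voiceprint recognition pass verification, the flow proceeds to 207. For example, if the electronic device detects that the speech spoken by the current user does contain the preset wake word "Xiao Ou, wake up", and determines through the second voiceprint recognition that the current user is indeed the owner, the flow can proceed to 207.
If the second wake word recognition and the second voiceprint recognition do not both pass verification, the electronic device can perform other operations.
In 207, if both the second wake word recognition and the second voiceprint recognition pass verification, the electronic device performs the wake-up operation.
For example, if the second wake word recognition and the second voiceprint recognition that the electronic device performed on the current user's voice information A through the application processor both pass verification, the electronic device can be woken up, that is, it can perform the wake-up operation, for example lighting up the display screen and unlocking it.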
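The flow of steps 201 to 207, including the switch between the DSP's low-frequency and high-frequency modes, can be sketched as follows. This is a simplified model under stated assumptions: `Dsp`, `DspMode`, `wake_flow`, and the returned status strings are hypothetical, the detectors are passed in as callables, and the whole sound signal is treated as the voice information in step 204.

```python
from enum import Enum


class DspMode(Enum):
    LOW = "low"    # lower clock frequency, lower power
    HIGH = "high"  # higher clock frequency, more processing capability


class Dsp:
    """Toy DSP that only tracks which mode it has been switched to."""

    def __init__(self):
        self.mode = DspMode.LOW
        self.mode_log = []

    def set_mode(self, mode: DspMode) -> None:
        self.mode = mode
        self.mode_log.append(mode)


def wake_flow(sound, dsp: Dsp, vad, first_pass, second_pass) -> str:
    """Run steps 202-207 on one captured sound signal and report where it ended."""
    dsp.set_mode(DspMode.LOW)      # 202: VAD runs in the low-frequency mode
    if not vad(sound):             # 203: no user voice, only background noise
        return "idle"
    voice = sound                  # 204: simplification, see lead-in
    dsp.set_mode(DspMode.HIGH)     # 205: first recognition in the high-frequency mode
    if not first_pass(voice):
        return "rejected_on_dsp"   # the application processor is not triggered
    if not second_pass(voice):     # 206: second recognition on the AP
        return "rejected_on_ap"
    return "awake"                 # 207: perform the wake-up operation
```

The mode log shows that the DSP only leaves the low-frequency mode once voice activity has been detected.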
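In one embodiment, step 205 of this embodiment, in which the electronic device performs the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor, may include:
performing, by the electronic device through the digital signal processor, the first wake word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor uses a first model to perform the first wake word recognition and the first voiceprint recognition.
Step 206, in which the electronic device performs the second wake word recognition and the second voiceprint recognition on the voice information through the application processor, may include: performing, by the electronic device through the application processor, the second wake word recognition and the second voiceprint recognition on the voice information, wherein the application processor uses a second model to perform the second wake word recognition and the second voiceprint recognition.
The number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is lower than that of the second model.
For example, this embodiment can use a first model and a second model to perform wake word recognition and voiceprint recognition on the current user's voice information A, where the first model uses fewer parameters than the second model so that it consumes less power when running. For example, the first model uses 3 parameters, A, B, and C, when performing wake word recognition and voiceprint recognition on voice information A, while the second model uses 6 parameters, A, B, C, D, E, and F. It can be understood that, in this embodiment, both models can perform wake word recognition and voiceprint recognition on voice information A. However, because the second model uses more parameters, its recognition results are more accurate than those of the first model.
In this embodiment, because the resources available to the digital signal processor are limited, the electronic device can use the first model for the first wake word recognition and the first voiceprint recognition on the current user's voice information A. Because the application processor has more resources available, the electronic device can use the second model for the second wake word recognition and the second voiceprint recognition, so that the second model's results further confirm whether voice information A contains the preset wake word and whether the current user is the owner.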
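In one embodiment, step 205 of this embodiment, in which the electronic device performs the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor, may include:
performing, by the electronic device through the digital signal processor, the first wake word recognition and the first voiceprint recognition on the voice information using a Gaussian mixture model.
For example, when performing wake word recognition and voiceprint recognition on the current user's voice information A through the digital signal processor, the electronic device can use a Gaussian mixture model.
Of course, in other embodiments, the electronic device can also use other algorithm models to perform the first wake word recognition and the first voiceprint recognition on the current user's voice information A; this embodiment does not specifically limit this.
In one embodiment, step 206 of this embodiment, in which the electronic device performs the second wake word recognition and the second voiceprint recognition on the voice information through the application processor, may include:
performing, by the electronic device through the application processor, the second wake word recognition and the second voiceprint recognition on the voice information using a deep neural network algorithm model.
For example, when performing wake word recognition and voiceprint recognition on the current user's voice information A through the application processor, the electronic device can use a deep neural network (Deep Neural Network, DNN) algorithm model.
Of course, in other embodiments, the electronic device can also use other algorithm models to perform the second wake word recognition and the second voiceprint recognition on the current user's voice information A; this embodiment does not specifically limit this.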
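To make the "fewer parameters on the DSP, more on the AP" relationship concrete, the sketch below counts the stored parameters of a small diagonal-covariance Gaussian mixture model and of a small fully connected network. The model sizes are invented for illustration; the source gives no dimensions, and the helper names are hypothetical.

```python
def gmm_param_count(components: int, dim: int) -> int:
    """Stored parameters of a diagonal-covariance GMM.

    Per component: 1 mixture weight, `dim` means, and `dim` variances.
    """
    return components * (2 * dim + 1)


def dnn_param_count(layer_sizes) -> int:
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Assumed sizes: 16 Gaussians over 13-dimensional features for the first model,
# and a 13-128-128-2 network for the second model.
first_model_params = gmm_param_count(16, 13)               # 432
second_model_params = dnn_param_count([13, 128, 128, 2])   # 18562
```

With these assumed sizes the first model is roughly 40 times smaller than the second, which matches the direction of the relationship described above, though not any figure stated in the source.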
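It can be understood that, in this embodiment, the electronic device can first perform voice activity detection on the collected sound signal of the surrounding environment to detect whether it contains a user's voice. Because the voice activity detection is performed with the digital signal processor in the low-frequency mode, this embodiment can perform it at very low power consumption.
If voice activity detection determines that the sound signal of the surrounding environment contains a user's voice, the electronic device can switch the digital signal processor to the high-frequency mode and perform the first wake word recognition and the first voiceprint recognition on the user's speech. Only when both the first wake word recognition and the first voiceprint recognition pass verification does the electronic device trigger the application processor to perform the second wake word recognition and the second voiceprint recognition. The power consumption of the application processor is generally 50 to 100 times that of the digital signal processor. This embodiment therefore triggers the application processor to perform speech recognition and voiceprint recognition only after the digital signal processor side has determined that the current user is the owner and has said the wake word, which avoids the power wasted in the related art, where voiceprint recognition is performed only on the application processor side.
In addition, because both wake word recognition and voiceprint recognition are verified twice in this embodiment, false wake-ups of the electronic device are reduced and the user's experience of waking up the device is improved.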
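The power argument can be made concrete with a back-of-the-envelope model. The only figure taken from the text is that the application processor draws roughly 50 to 100 times the power of the digital signal processor. Everything else below is an assumption for illustration: each recognition pass is assumed to take the same time (so energy scales with power), the extra DSP voiceprint pass is assumed to cost as much as the DSP wake word pass, and the function and parameter names are hypothetical.

```python
def related_art_energy(n_wake_utterances: int, dsp_cost: float = 1.0,
                       ap_ratio: float = 50.0) -> float:
    """Related art: every utterance containing the wake word also triggers the AP."""
    return n_wake_utterances * (dsp_cost + ap_ratio * dsp_cost)


def two_stage_energy(n_wake_utterances: int, owner_fraction: float,
                     dsp_cost: float = 1.0, ap_ratio: float = 50.0,
                     dsp_voiceprint_extra: float = 1.0) -> float:
    """This scheme: the DSP also checks the voiceprint; only owner utterances reach the AP."""
    dsp_total = n_wake_utterances * (dsp_cost + dsp_voiceprint_extra)
    ap_total = n_wake_utterances * owner_fraction * ap_ratio * dsp_cost
    return dsp_total + ap_total
```

For 100 utterances containing the wake word, of which a quarter come from the owner, this model gives 5100 energy units for the related art and 1450 for the two-stage scheme. The saving grows with the share of non-owner utterances and vanishes when every utterance comes from the owner, in which case the extra DSP voiceprint pass makes the two-stage scheme slightly more expensive.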
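Referring to FIG. 3 to FIG. 6, FIG. 3 to FIG. 6 are schematic diagrams of scenarios of a method for waking up a device according to an embodiment of the present application.
For example, the owner of the electronic device is user A. As shown in FIG. 3, the electronic device prompts the owner to speak the phrase to be used as the wake word. For example, when owner user A says "Xiao Ou, wake up", the electronic device can capture that speech through its microphone. The electronic device can then perform speech recognition and voiceprint recognition on the speech to obtain the wake word for this device, "Xiao Ou, wake up", and the voiceprint feature of owner user A.
In the screen-off, locked state, the electronic device can collect sound signals of the surrounding environment through its microphone. After a sound signal is collected, the electronic device can switch its digital signal processor to the preset low-frequency mode and, with the digital signal processor in that mode, perform voice activity detection on the collected sound signal to detect whether it contains a user's voice.
For example, as shown in FIG. 4, the current user says "Xiao Ou, wake up" (voice B) to the electronic device. After the microphone of the electronic device captures voice B, the device performs voice activity detection on voice B through the digital signal processor in the low-frequency mode and detects that it contains a user's voice.
If voice activity detection determines that the collected sound signal contains a user's voice, the electronic device can obtain the current user's voice information from the sound signal. After that, the electronic device can switch its digital signal processor to the preset high-frequency mode and perform wake word recognition and voiceprint recognition on the acquired voice information in that mode. That is, the electronic device performs the first wake word recognition and the first voiceprint recognition on the current user's voice information.
For example, the electronic device can perform the first wake word recognition on the current user's voice B through the digital signal processor in the high-frequency mode to detect whether voice B is the preset wake word "Xiao Ou, wake up". In addition, the electronic device can extract the current user's voiceprint feature from voice B and match it against the preset voiceprint feature of owner user A to detect whether the current user is owner user A.
If both the first wake word recognition and the first voiceprint recognition pass verification, the electronic device can perform wake word recognition and voiceprint recognition on the current user's voice information again through its application processor. That is, the electronic device performs the second wake word recognition and the second voiceprint recognition on the current user's voice information. The second wake word recognition likewise detects whether the current user's voice information contains the preset wake word. The second voiceprint recognition likewise detects whether the current user is owner user A.
If the second wake word recognition and the second voiceprint recognition also pass verification, the electronic device can perform the wake-up operation, for example lighting up the display screen and unlocking.
For example, the electronic device detects through the first voiceprint recognition that the current user is owner user A, and detects through the first wake word recognition that voice B is the preset wake word "Xiao Ou, wake up". The electronic device then further confirms through the second voiceprint recognition that the current user is owner user A, and further confirms through the second wake word recognition that voice B is indeed the preset wake word "Xiao Ou, wake up". In this case, the electronic device can light up the display screen and unlock it.
A schematic diagram of the above flow is shown in FIG. 5.
If the first wake word recognition and the first voiceprint recognition do not both pass verification, the electronic device can perform other operations and does not perform the second voiceprint recognition or the second wake word recognition.
For example, as shown in FIG. 6, non-owner user B says voice C, "Xiao Ou, wake up", which contains the wake word, to the electronic device. However, when the first voiceprint recognition is performed on voice C spoken by user B, it cannot pass verification because user B is not owner user A. The electronic device therefore does not perform the second wake word recognition or the second voiceprint recognition on voice C.
In the related art, by contrast, the electronic device performs voiceprint recognition only together with the second wake word recognition, not together with the first. Therefore, although voice C is spoken by user B, the electronic device triggers the second wake word recognition and voiceprint recognition on voice C as soon as it detects that voice C contains the wake word "Xiao Ou, wake up". During voiceprint recognition, the electronic device detects that the current user is not owner user A, so it does not perform the wake-up operation. In the related art, the second wake word recognition and the voiceprint recognition therefore waste the electronic device's power.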
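Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an apparatus for waking up a device according to an embodiment of the present application. The apparatus 300 for waking up a device may include: an acquisition module 301, a first recognition module 302, a second recognition module 303, and a wake-up module 304.
The acquisition module 301 is used to acquire voice information.
The first recognition module 302 is used to perform, through the digital signal processor of the electronic device, a first wake word recognition and a first voiceprint recognition on the voice information.
The second recognition module 303 is used to perform, if both the first wake word recognition and the first voiceprint recognition pass verification, a second wake word recognition and a second voiceprint recognition on the voice information through the application processor of the electronic device.
The wake-up module 304 is used to wake up the electronic device if both the second wake word recognition and the second voiceprint recognition pass verification.
In one embodiment, the acquisition module 301 can be used to:
acquire a sound signal of the surrounding environment and perform voice activity detection on the sound signal;
determine, based on the voice activity detection, whether the sound signal contains a user's voice;
acquire the voice information if a user's voice is present.
In one embodiment, the acquisition module 301 can be used to:
perform voice activity detection on the sound signal through the digital signal processor of the electronic device.
In one embodiment, the acquisition module 301 can be used to:
perform voice activity detection on the sound signal through the digital signal processor of the electronic device, wherein the digital signal processor is in a preset low-frequency mode while performing the voice activity detection.
In one embodiment, the first recognition module 302 can be used to:
perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, wherein the digital signal processor is in a preset high-frequency mode while performing the first wake word recognition and the first voiceprint recognition.
In one embodiment, the first recognition module 302 can be used to: perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, wherein the digital signal processor uses a first model to perform the first wake word recognition and the first voiceprint recognition;
the second recognition module 303 can be used to: perform the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, wherein the application processor uses a second model to perform the second wake word recognition and the second voiceprint recognition.
The number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is lower than that of the second model.
In one embodiment, the first recognition module 302 can be used to: perform the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, using a Gaussian mixture model.
In one embodiment, the second recognition module 303 can be used to: perform the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, using a deep neural network algorithm model.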
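An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed on a computer, the computer is caused to perform the flow of the method for waking up a device provided in this embodiment.
An embodiment of the present application further provides an electronic device including a memory and a processor. By calling a computer program stored in the memory, the processor executes the flow of the method for waking up a device provided in this embodiment.
For example, the above electronic device may be a mobile terminal such as a tablet computer or a smartphone. Referring to FIG. 8, FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
The electronic device 400 may include a display screen 401, a memory 402, a processor 403, a microphone 404, and other components. Those skilled in the art will understand that the structure shown in FIG. 8 does not limit the electronic device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
The display screen 401 can be used to display information such as images and text.
The memory 402 may be used to store application programs and data. The application programs stored in the memory 402 contain executable code and can form various functional modules. The processor 403 runs the application programs stored in the memory 402 to execute various functional applications and data processing.
The processor 403 is the control center of the electronic device. It uses various interfaces and lines to connect all parts of the electronic device, and it performs the device's functions and processes data by running or executing the application programs stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the electronic device as a whole.
The microphone 404 may be used to collect the user's voice information.
In this embodiment, the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and runs the application programs stored in the memory 402, thereby executing:
acquiring voice information; performing, through the digital signal processor of the electronic device, a first wake word recognition and a first voiceprint recognition on the voice information; if both the first wake word recognition and the first voiceprint recognition pass verification, performing, through the application processor of the electronic device, a second wake word recognition and a second voiceprint recognition on the voice information; and, if both the second wake word recognition and the second voiceprint recognition pass verification, waking up the electronic device.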
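Referring to FIG. 9, the electronic device 500 may include a display screen 501, a memory 502, a processor 503, a microphone 504, a speaker 505, a battery 506, and other components.
The display screen 501 can be used to display information such as images and text.
The memory 502 may be used to store application programs and data. The application programs stored in the memory 502 contain executable code and can form various functional modules. The processor 503 runs the application programs stored in the memory 502 to execute various functional applications and data processing.
The processor 503 is the control center of the electronic device. It uses various interfaces and lines to connect all parts of the electronic device, and it performs the device's functions and processes data by running or executing the application programs stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the electronic device as a whole.
The input unit 504 may be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The output unit 505 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the electronic device. These graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The output unit may include a display panel.
In this embodiment, the processor 503 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and runs the application programs stored in the memory 502, thereby executing:
acquiring voice information; performing, through the digital signal processor of the electronic device, a first wake word recognition and a first voiceprint recognition on the voice information; if both the first wake word recognition and the first voiceprint recognition pass verification, performing, through the application processor of the electronic device, a second wake word recognition and a second voiceprint recognition on the voice information; and, if both the second wake word recognition and the second voiceprint recognition pass verification, waking up the electronic device.
In one embodiment, before acquiring the voice information, the processor 503 may further execute: acquiring a sound signal of the surrounding environment and performing voice activity detection on the sound signal; and determining, based on the voice activity detection, whether the sound signal contains a user's voice.
Then, when acquiring the voice information, the processor 503 may execute: acquiring the voice information if a user's voice is present.
In one embodiment, when performing voice activity detection on the sound signal, the processor 503 may execute: performing voice activity detection on the sound signal through the digital signal processor of the electronic device.
In one embodiment, when performing voice activity detection on the sound signal through the digital signal processor of the electronic device, the processor 503 may execute: performing voice activity detection on the sound signal through the digital signal processor, wherein the digital signal processor is in a preset low-frequency mode while performing the voice activity detection.
In one embodiment, when performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, the processor 503 may execute: performing the first wake word recognition and the first voiceprint recognition through the digital signal processor, wherein the digital signal processor is in a preset high-frequency mode while performing them.
In one embodiment, when performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, the processor 503 may execute: performing the first wake word recognition and the first voiceprint recognition through the digital signal processor, wherein the digital signal processor uses a first model to perform them.
Then, when performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, the processor 503 may execute: performing the second wake word recognition and the second voiceprint recognition through the application processor, wherein the application processor uses a second model to perform them. The number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is lower than that of the second model.
In one embodiment, when performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor of the electronic device, the processor 503 may execute: performing the first wake word recognition and the first voiceprint recognition on the voice information through the digital signal processor, using a Gaussian mixture model.
In one embodiment, when performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor of the electronic device, the processor 503 may execute: performing the second wake word recognition and the second voiceprint recognition on the voice information through the application processor, using a deep neural network algorithm model.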
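In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed in a given embodiment, refer to the detailed description of the method for waking up a device above; they are not repeated here.
The apparatus for waking up a device provided in the embodiments of the present application is based on the same concept as the method for waking up a device in the above embodiments. Any of the methods provided in the method embodiments can be run on the apparatus; for the specific implementation, see the method embodiments, which are not repeated here.
It should be noted that, for the method for waking up a device described in the embodiments of the present application, those of ordinary skill in the art will understand that all or part of the flow of the method can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, for example in a memory, and executed by at least one processor; its execution may include the flow of the embodiments of the method for waking up a device. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), and so on.
For the apparatus for waking up a device in the embodiments of the present application, each functional module may be integrated into one processing chip, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or of software functional modules. If an integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The method, apparatus, storage medium, and electronic device for waking up a device provided in the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present application. Those skilled in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.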

Claims (20)

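  1. A method for waking up a device, comprising:
    acquiring voice information;
    performing, through a digital signal processor of an electronic device, a first wake word recognition and a first voiceprint recognition on the voice information;
    if both the first wake word recognition and the first voiceprint recognition pass verification, performing, through an application processor of the electronic device, a second wake word recognition and a second voiceprint recognition on the voice information;
    if both the second wake word recognition and the second voiceprint recognition pass verification, waking up the electronic device.
  2. The method for waking up a device according to claim 1, further comprising, before the acquiring of the voice information:
    acquiring a sound signal of the surrounding environment and performing voice activity detection on the sound signal;
    determining, based on the voice activity detection, whether the sound signal contains a user's voice;
    wherein the acquiring of the voice information comprises: acquiring the voice information if a user's voice is present.
  3. The method for waking up a device according to claim 2, wherein the performing of voice activity detection on the sound signal comprises:
    performing voice activity detection on the sound signal through the digital signal processor of the electronic device.
  4. The method for waking up a device according to claim 3, wherein the performing of voice activity detection on the sound signal through the digital signal processor of the electronic device comprises:
    performing voice activity detection on the sound signal through the digital signal processor of the electronic device, wherein the digital signal processor is in a preset low-frequency mode while performing the voice activity detection.
  5. The method for waking up a device according to claim 4, wherein the performing, through the digital signal processor of the electronic device, of the first wake word recognition and the first voiceprint recognition on the voice information comprises:
    performing, through the digital signal processor of the electronic device, the first wake word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor is in a preset high-frequency mode while performing the first wake word recognition and the first voiceprint recognition.
  6. The method for waking up a device according to claim 1, wherein the performing, through the digital signal processor of the electronic device, of the first wake word recognition and the first voiceprint recognition on the voice information comprises:
    performing, through the digital signal processor of the electronic device, the first wake word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor uses a first model to perform the first wake word recognition and the first voiceprint recognition;
    the performing, through the application processor of the electronic device, of the second wake word recognition and the second voiceprint recognition on the voice information comprises: performing, through the application processor of the electronic device, the second wake word recognition and the second voiceprint recognition on the voice information, wherein the application processor uses a second model to perform the second wake word recognition and the second voiceprint recognition;
    wherein the number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is lower than that of the second model.
  7. The method for waking up a device according to claim 1, wherein the performing, through the digital signal processor of the electronic device, of the first wake word recognition and the first voiceprint recognition on the voice information comprises:
    performing, through the digital signal processor of the electronic device, the first wake word recognition and the first voiceprint recognition on the voice information using a Gaussian mixture model.
  8. The method for waking up a device according to claim 1, wherein the performing, through the application processor of the electronic device, of the second wake word recognition and the second voiceprint recognition on the voice information comprises:
    performing, through the application processor of the electronic device, the second wake word recognition and the second voiceprint recognition on the voice information using a deep neural network algorithm model.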
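  9. An apparatus for waking up a device, comprising:
    an acquisition module, used to acquire voice information;
    a first recognition module, used to perform, through a digital signal processor of an electronic device, a first wake word recognition and a first voiceprint recognition on the voice information;
    a second recognition module, used to perform, if both the first wake word recognition and the first voiceprint recognition pass verification, a second wake word recognition and a second voiceprint recognition on the voice information through an application processor of the electronic device;
    a wake-up module, used to wake up the electronic device if both the second wake word recognition and the second voiceprint recognition pass verification.
  10. The apparatus for waking up a device according to claim 9, wherein the acquisition module is further used to:
    acquire a sound signal of the surrounding environment and perform voice activity detection on the sound signal;
    determine, based on the voice activity detection, whether the sound signal contains a user's voice;
    acquire the voice information if a user's voice is present.
  11. The apparatus for waking up a device according to claim 10, wherein the acquisition module is used to:
    perform voice activity detection on the sound signal through the digital signal processor of the electronic device.
  12. A storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to perform the method according to any one of claims 1 to 8.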
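  13. An electronic device, comprising a memory and a processor, wherein the processor is configured, by invoking a computer program stored in the memory, to perform:
    acquiring voice information;
    performing, by a digital signal processor of the electronic device, a first wake-up word recognition and a first voiceprint recognition on the voice information;
    if both the first wake-up word recognition and the first voiceprint recognition pass verification, performing, by an application processor of the electronic device, a second wake-up word recognition and a second voiceprint recognition on the voice information; and
    if both the second wake-up word recognition and the second voiceprint recognition pass verification, waking up the electronic device.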
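  14. The electronic device according to claim 13, wherein the processor is configured to perform:
    acquiring a sound signal of the surrounding environment, and performing voice activity detection on the sound signal;
    determining, according to the voice activity detection, whether a user's voice is present in the sound signal; and
    acquiring the voice information if a user's voice is present.
  15. The electronic device according to claim 14, wherein the processor is configured to perform:
    performing the voice activity detection on the sound signal by the digital signal processor of the electronic device.
  16. The electronic device according to claim 15, wherein the processor is configured to perform:
    performing the voice activity detection on the sound signal by the digital signal processor of the electronic device, wherein the digital signal processor is in a preset low-frequency mode while the voice activity detection is performed.
  17. The electronic device according to claim 16, wherein the processor is configured to perform:
    performing, by the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor is in a preset high-frequency mode while the first wake-up word recognition and the first voiceprint recognition are performed.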
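  18. The electronic device according to claim 13, wherein the processor is configured to perform:
    performing, by the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information, wherein the digital signal processor uses a first model to perform the first wake-up word recognition and the first voiceprint recognition;
    performing, by the application processor of the electronic device, the second wake-up word recognition and the second voiceprint recognition on the voice information, wherein the application processor uses a second model to perform the second wake-up word recognition and the second voiceprint recognition;
    wherein the number of parameters used by the first model is smaller than the number of parameters used by the second model, so that the operating power consumption of the first model is lower than the operating power consumption of the second model.
  19. The electronic device according to claim 13, wherein the processor is configured to perform:
    performing, by the digital signal processor of the electronic device, the first wake-up word recognition and the first voiceprint recognition on the voice information using a Gaussian mixture model.
  20. The electronic device according to claim 13, wherein the processor is configured to perform:
    performing, by the application processor of the electronic device, the second wake-up word recognition and the second voiceprint recognition on the voice information using a deep neural network algorithm model.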
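
As a rough illustration of the flow recited in claims 1 and 2, the following Python sketch models only the gating order: voice activity detection first, then the digital signal processor stage, then the application processor stage. The names `Stage`, `try_wake`, and `Verifier` are hypothetical and do not come from the application. The actual wake-up word and voiceprint models (for example the Gaussian mixture model and deep neural network of claims 7 and 8) are stood in for by plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: a check that inspects captured audio and returns pass/fail.
Verifier = Callable[[bytes], bool]


@dataclass
class Stage:
    """One recognition stage: a wake-up word check plus a voiceprint check."""
    name: str
    wake_word_ok: Verifier
    voiceprint_ok: Verifier

    def verify(self, voice: bytes) -> bool:
        # The stage passes only if both checks pass.
        return self.wake_word_ok(voice) and self.voiceprint_ok(voice)


def try_wake(sound: bytes, has_voice: Verifier, dsp: Stage, ap: Stage,
             log: List[str]) -> bool:
    """Return True if the device should be woken up.

    Each step runs only if the previous one passed, so the (assumed more
    expensive) application processor stage is reached only after the
    digital signal processor stage has accepted the audio.
    """
    if not has_voice(sound):          # voice activity detection gate
        log.append("vad:reject")
        return False
    log.append("vad:pass")

    if not dsp.verify(sound):         # first recognition, on the DSP
        log.append("dsp:reject")
        return False
    log.append("dsp:pass")

    if not ap.verify(sound):          # second recognition, on the AP
        log.append("ap:reject")
        return False
    log.append("ap:pass")
    return True


if __name__ == "__main__":
    always = lambda _voice: True      # stub verifier that accepts everything
    never = lambda _voice: False      # stub verifier that rejects everything
    events: List[str] = []
    woke = try_wake(b"audio", always,
                    Stage("dsp", always, always),
                    Stage("ap", always, never),
                    events)
    print(woke, events)
```

In this sketch a rejection at any step returns immediately, which mirrors the conditional wording of the claims: the second recognition is attempted only when both first-stage checks pass.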
PCT/CN2018/116493 2018-11-20 2018-11-20 唤醒设备的方法、装置、存储介质及电子设备 WO2020102991A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/116493 WO2020102991A1 (zh) 2018-11-20 2018-11-20 唤醒设备的方法、装置、存储介质及电子设备
CN201880097795.9A CN112740321A (zh) 2018-11-20 2018-11-20 唤醒设备的方法、装置、存储介质及电子设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/116493 WO2020102991A1 (zh) 2018-11-20 2018-11-20 唤醒设备的方法、装置、存储介质及电子设备

Publications (1)

Publication Number Publication Date
WO2020102991A1 true WO2020102991A1 (zh) 2020-05-28

Family

ID=70773736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116493 WO2020102991A1 (zh) 2018-11-20 2018-11-20 唤醒设备的方法、装置、存储介质及电子设备

Country Status (2)

Country Link
CN (1) CN112740321A (zh)
WO (1) WO2020102991A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951793A (zh) * 2020-08-13 2020-11-17 北京声智科技有限公司 唤醒词识别的方法、装置及存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117524228A (zh) * 2024-01-08 2024-02-06 腾讯科技(深圳)有限公司 语音数据处理方法、装置、设备及介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2509291B1 (en) * 2011-04-06 2015-06-03 BlackBerry Limited System and method for locating a misplaced mobile device
CN106815507A (zh) * 2015-11-30 2017-06-09 中兴通讯股份有限公司 语音唤醒实现方法、装置及终端
CN107886957A (zh) * 2017-11-17 2018-04-06 广州势必可赢网络科技有限公司 一种结合声纹识别的语音唤醒方法及装置
CN108831477A (zh) * 2018-06-14 2018-11-16 出门问问信息科技有限公司 一种语音识别方法、装置、设备及存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9613626B2 (en) * 2015-02-06 2017-04-04 Fortemedia, Inc. Audio device for recognizing key phrases and method thereof
CN107767863B (zh) * 2016-08-22 2021-05-04 科大讯飞股份有限公司 语音唤醒方法、系统及智能终端
CN106448663B (zh) * 2016-10-17 2020-10-23 海信集团有限公司 语音唤醒方法及语音交互装置
US10403279B2 (en) * 2016-12-21 2019-09-03 Avnera Corporation Low-power, always-listening, voice command detection and capture
US10311870B2 (en) * 2017-05-10 2019-06-04 Ecobee Inc. Computerized device with voice command input capability

Also Published As

Publication number Publication date
CN112740321A (zh) 2021-04-30

Similar Documents

Publication Publication Date Title
WO2021093449A1 (zh) 基于人工智能的唤醒词检测方法、装置、设备及介质
CN107767863B (zh) 语音唤醒方法、系统及智能终端
US9779725B2 (en) Voice wakeup detecting device and method
WO2021159688A1 (zh) 声纹识别方法、装置、存储介质、电子装置
US20170256270A1 (en) Voice Recognition Accuracy in High Noise Conditions
US20220215853A1 (en) Audio signal processing method, model training method, and related apparatus
US10147444B2 (en) Electronic apparatus and voice trigger method therefor
CN109272991B (zh) 语音交互的方法、装置、设备和计算机可读存储介质
CN106448663A (zh) 语音唤醒方法及语音交互装置
CN110211599B (zh) 应用唤醒方法、装置、存储介质及电子设备
CN110825446B (zh) 参数配置方法、装置、存储介质及电子设备
CN110223687B (zh) 指令执行方法、装置、存储介质及电子设备
CN110544468B (zh) 应用唤醒方法、装置、存储介质及电子设备
CN112700782A (zh) 语音处理方法和电子设备
US9633655B1 (en) Voice sensing and keyword analysis
CN111508493B (zh) 语音唤醒方法、装置、电子设备及存储介质
CN111312222A (zh) 一种唤醒、语音识别模型训练方法及装置
US20220292134A1 (en) Device operation based on dynamic classifier
US11437022B2 (en) Performing speaker change detection and speaker recognition on a trigger phrase
WO2019228135A1 (zh) 匹配阈值的调整方法、装置、存储介质及电子设备
WO2020102991A1 (zh) 唤醒设备的方法、装置、存储介质及电子设备
WO2021169711A1 (zh) 指令执行方法、装置、存储介质及电子设备
TW201717192A (zh) 電子裝置及其透過語音辨識喚醒的方法
CN110164431A (zh) 一种音频数据处理方法及装置、存储介质
US11205433B2 (en) Method and apparatus for activating speech recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18940840

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18940840

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.09.2021)
