WO2023202442A1 - Method for waking up a device, electronic device, and storage medium - Google Patents

Info

Publication number
WO2023202442A1
WO2023202442A1 PCT/CN2023/087805 CN2023087805W WO2023202442A1 WO 2023202442 A1 WO2023202442 A1 WO 2023202442A1 CN 2023087805 W CN2023087805 W CN 2023087805W WO 2023202442 A1 WO2023202442 A1 WO 2023202442A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
audio signal
wake
recognition
user
Prior art date
Application number
PCT/CN2023/087805
Other languages
English (en)
French (fr)
Inventor
方策
郭峰
覃尧钧
陈一丹
张时
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023202442A1

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • the embodiments of the present application relate to the field of terminal technology, and in particular, to a method for waking up a device, an electronic device, and a storage medium.
  • a user speaks a wake word to wake up an electronic device.
  • the electronic device receives the audio signal input by the user, and after parsing the wake-up word, the device can be woken up.
  • the electronic device may be awakened by mistake or may not be successfully awakened, which reduces the accuracy of the electronic device being awakened.
  • Embodiments of the present application provide a method for waking up a device, an electronic device, and a storage medium, which improves the accuracy of waking up the electronic device.
  • a first aspect provides a method for waking up a device, including: when it is determined that a received audio signal includes a wake-up word, performing machine sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result, where the recognition result indicates waking up the electronic device, not waking up the electronic device, or being uncertain whether to wake up the electronic device; and if the recognition result indicates that it is uncertain whether to wake up the electronic device, outputting prompt information to the user, where the prompt information is used to guide the user to wake up the electronic device.
  • the method for waking up the device improves the accuracy of determining the user's identity by performing machine sound recognition and/or voiceprint recognition on the audio signal, and avoids false wake-up of the device. Moreover, when it is uncertain whether to wake up the electronic device, prompt information is output to the user and human-computer interaction is performed with the user, so that the user further confirms whether to wake up the electronic device. This prevents the electronic device from being woken up by mistake or failing to wake up, and improves the accuracy of waking up the electronic device.
  • the prompt information is used to guide the user to perform voice interaction with the electronic device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to perform a preset action within the shooting range of the shooting device on the target device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to operate in the target interface displayed on the target device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to press a target physical key on the target device to determine whether to wake up the electronic device.
  • the interaction between the electronic device and the user can take various forms, improving user experience.
  • the target device is an electronic device, or a first device that communicates with the electronic device.
  • the electronic device can be used to complete human-computer interaction with the user to confirm whether to wake up the electronic device, or the first device can be used to complete human-computer interaction with the user to confirm whether to wake up the electronic device.
  • the human-computer interaction method is more flexible.
  • the user accounts of the electronic device and the first device are the same.
  • the user accounts of the electronic device and the first device are the same, and the electronic device can more easily discover the first device, so that it can complete the interaction with the user through the first device and further confirm whether to wake up the electronic device.
  • outputting prompt information to the user includes: outputting voice prompt information to the user; or displaying a target interface to the user, and the target interface includes prompt information.
  • the electronic device outputs the prompt information directly, so the interaction with the user can be carried out by the electronic device itself.
  • outputting prompt information to the user includes: sending indication information to the first device, where the indication information is used to instruct the first device to output the prompt information to the user.
  • the first device outputs the prompt information, so the interaction with the user can be carried out through the first device.
  • the first device includes a mobile phone and/or a watch.
  • performing machine sound recognition and/or voiceprint recognition on the audio signal to obtain the recognition result includes: performing machine sound recognition and voiceprint recognition on the audio signal; if it is determined that the audio signal is a machine sound, the recognition result indicates not waking up the electronic device; if it is determined that the audio signal is not a machine sound and the voiceprint recognition succeeds, the recognition result indicates waking up the electronic device; if it is determined that the audio signal is not a machine sound and the voiceprint recognition fails, the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • machine sound recognition and voiceprint recognition are performed on audio signals, which improves the accuracy of determining user identity.
  • the recognition result indicates three results: waking up the electronic device, not waking up the electronic device, or not sure whether to wake up the electronic device. By adding the determination result of uncertainty about whether to wake up the electronic device, it can be further confirmed whether to wake up the electronic device, thereby improving the accuracy of waking up the electronic device.
  • performing machine sound recognition and voiceprint recognition on the audio signal includes: performing machine sound recognition on the audio signal; if it is determined that the audio signal is not a machine sound, performing voiceprint recognition on the audio signal.
  • machine sound recognition is performed first. If it is determined that the sound is not a machine sound, voiceprint recognition is then performed, which improves processing efficiency.
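The ordering described above (machine sound recognition first, voiceprint recognition only for non-machine audio) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: `WakeDecision`, `recognize`, and the two callables `is_machine_sound` and `voiceprint_matches` are hypothetical names standing in for whatever recognizers a real device uses.

```python
from enum import Enum
from typing import Callable


class WakeDecision(Enum):
    """The three recognition results named in the text (names are illustrative)."""
    WAKE = "wake"
    DO_NOT_WAKE = "do_not_wake"
    UNCERTAIN = "uncertain"


def recognize(
    audio: bytes,
    is_machine_sound: Callable[[bytes], bool],
    voiceprint_matches: Callable[[bytes], bool],
) -> WakeDecision:
    # Machine sound recognition runs first; a machine sound never wakes the device.
    if is_machine_sound(audio):
        return WakeDecision.DO_NOT_WAKE
    # Voiceprint recognition is only reached for audio judged not to be a machine sound.
    if voiceprint_matches(audio):
        return WakeDecision.WAKE
    # Not a machine sound, but the voiceprint did not match: defer to user confirmation.
    return WakeDecision.UNCERTAIN
```

Because the voiceprint check is skipped whenever the audio is classified as a machine sound, the more expensive step only runs when it can affect the outcome, which is one reading of the efficiency benefit the text mentions.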
  • performing machine sound recognition and voiceprint recognition on the audio signal includes: inputting the audio signal into a voiceprint authentication model to obtain a first result and the voiceprint feature information of the audio signal, where the first result is used to indicate whether the audio signal is a machine sound; if the first result indicates that the audio signal is not a machine sound, performing voiceprint recognition on the audio signal based on the voiceprint feature information of the audio signal and the voiceprint template library.
  • machine sound recognition and voiceprint recognition are both realized through the voiceprint authentication model, and the two are coupled through parameter sharing in the neural network model.
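One way to picture the parameter sharing described above is a single shared layer whose output serves both as the voiceprint feature and as the input to a machine-sound classification head. The sketch below is a toy, pure-Python illustration under that assumption; the class name `SharedVoiceprintModel`, the single `tanh` layer, and the weights are all hypothetical and do not reflect the actual model structure shown in Figure 5.

```python
import math
from typing import List, Sequence, Tuple


class SharedVoiceprintModel:
    """Toy two-output model: one shared layer feeds both outputs (illustrative only)."""

    def __init__(
        self,
        shared_weights: Sequence[Sequence[float]],
        head_weights: Sequence[float],
        head_bias: float = 0.0,
    ) -> None:
        self.shared_weights = shared_weights  # parameters used by both tasks
        self.head_weights = head_weights      # parameters of the machine-sound head only
        self.head_bias = head_bias

    def forward(self, features: Sequence[float]) -> Tuple[bool, List[float]]:
        # Shared layer: its output doubles as the voiceprint feature information.
        embedding = [
            math.tanh(sum(w * x for w, x in zip(row, features)))
            for row in self.shared_weights
        ]
        # Machine-sound head: a linear score on top of the shared embedding.
        logit = sum(w * e for w, e in zip(self.head_weights, embedding)) + self.head_bias
        is_machine_sound = logit > 0.0  # the "first result"
        return is_machine_sound, embedding
```

A caller would run `forward` once per audio signal and pass the returned embedding on to voiceprint matching only when the first result is `False`.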
  • performing machine sound recognition and/or voiceprint recognition on the audio signal to obtain the recognition result includes: performing machine sound recognition on the audio signal; if it is determined that the audio signal is a machine sound, the recognition result indicates not waking up the electronic device; if it is determined that the audio signal is not a machine sound, the recognition result indicates waking up the electronic device; if it is uncertain whether the audio signal is a machine sound, the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • machine sound recognition of audio signals is realized to determine whether to wake up the electronic device.
  • performing machine sound recognition and/or voiceprint recognition on the audio signal to obtain the recognition result includes: performing voiceprint recognition on the audio signal; if it is determined that the voiceprint recognition succeeds, the recognition result indicates waking up the electronic device; if it is determined that the voiceprint recognition fails, the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • voiceprint recognition of audio signals is realized to determine whether to wake up the electronic device.
  • the method further includes: obtaining response information input by the user according to the prompt information; and determining whether to wake up the electronic device according to the response information.
  • the prompt information is used to guide the user to perform a preset action within the shooting range of the shooting device on the electronic device, and the method further includes: starting the shooting device on the electronic device.
  • the electronic device stores a voiceprint template library, and the method further includes: if it is determined based on the response information that the electronic device is to be woken up, updating the voiceprint template library according to the audio signal.
  • if the electronic device interacts with the user and determines, based on the response information input by the user, to wake up the electronic device, this means that the audio signal is able to wake up the electronic device. Therefore, updating the voiceprint template library based on the audio signal increases the probability that subsequent voiceprint recognition succeeds and improves the accuracy of waking up the electronic device.
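The confirm-then-update behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions: `VoiceprintTemplateLibrary`, `confirm_and_update`, and the representation of voiceprint features as a list of floats are hypothetical, and the update policy (simply appending the new features) is one possible choice, not something the text specifies.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class VoiceprintTemplateLibrary:
    """Hypothetical in-memory template library; each template is a feature vector."""
    templates: List[List[float]] = field(default_factory=list)

    def update(self, features: Sequence[float]) -> None:
        # Assumed policy: store the confirmed audio's features as a new template.
        self.templates.append(list(features))


def confirm_and_update(
    user_confirmed: bool,
    features: Sequence[float],
    library: VoiceprintTemplateLibrary,
) -> bool:
    """Return True if the device should wake; update the library only on confirmation."""
    if user_confirmed:
        # The user confirmed the wake-up, so this audio is treated as a valid voiceprint.
        library.update(features)
        return True
    # No confirmation: do not wake and leave the library untouched.
    return False
```

Updating only after explicit confirmation keeps unverified audio, such as a recording played back by another device, out of the template library.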
  • a second aspect provides a device for waking up a device, including: a counterfeit identification module, configured to perform machine sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result when it is determined that the received audio signal includes a wake-up word;
  • the recognition result is used to indicate waking up the electronic device, not waking up the electronic device, or being uncertain whether to wake up the electronic device;
  • an output module, configured to output prompt information to the user if the recognition result indicates that it is uncertain whether to wake up the electronic device; the prompt information is used to guide the user to wake up the electronic device.
  • the prompt information is used to guide the user to perform voice interaction with the electronic device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to perform a preset action within the shooting range of the shooting device on the target device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to operate in the target interface displayed on the target device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to press a target physical key on the target device to determine whether to wake up the electronic device.
  • the target device is an electronic device, or a first device that communicates with the electronic device.
  • the user accounts of the electronic device and the first device are the same.
  • the output module is used to: output voice prompt information to the user; or display a target interface to the user, and the target interface includes the prompt information.
  • a transmission module is further included, configured to send indication information to the first device, where the indication information is used to instruct the first device to output prompt information to the user.
  • the first device includes a mobile phone and/or a watch.
  • the counterfeit identification module is used to: perform machine sound recognition and voiceprint recognition on the audio signal; if it is determined that the audio signal is a machine sound, the recognition result indicates not waking up the electronic device; if it is determined that the audio signal is not a machine sound and the voiceprint recognition succeeds, the recognition result indicates waking up the electronic device; if it is determined that the audio signal is not a machine sound and the voiceprint recognition fails, the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • the counterfeit identification module is used to: perform machine sound recognition on the audio signal; if it is determined that the audio signal is not a machine sound, perform voiceprint recognition on the audio signal.
  • the counterfeit identification module is used to: input the audio signal into the voiceprint authentication model to obtain a first result and the voiceprint feature information of the audio signal, where the first result is used to indicate whether the audio signal is a machine sound; if the first result indicates that the audio signal is not a machine sound, perform voiceprint recognition on the audio signal based on the voiceprint feature information of the audio signal and the voiceprint template library.
  • the counterfeit identification module is used to: perform machine sound recognition on the audio signal; if it is determined that the audio signal is a machine sound, the recognition result indicates not waking up the electronic device; if it is determined that the audio signal is not a machine sound, the recognition result indicates waking up the electronic device; if it is uncertain whether the audio signal is a machine sound, the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • the counterfeit identification module is used to: perform voiceprint recognition on the audio signal; if it is determined that the voiceprint recognition succeeds, the recognition result indicates waking up the electronic device; if it is determined that the voiceprint recognition fails, the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • a possible implementation also includes a confirmation module, which is used to: obtain response information input by the user according to the prompt information; and determine whether to wake up the electronic device according to the response information.
  • the prompt information is used to guide the user to perform a preset action within the shooting range of the shooting device on the electronic device, and the confirmation module is also used to: before obtaining the response information input by the user according to the prompt information, start the shooting device on the electronic device.
  • the electronic device stores a voiceprint template library and further includes an update module.
  • the update module is configured to: if the electronic device is determined to be awakened based on the response information, update the voiceprint template library according to the audio signal.
  • a possible implementation also includes a wake word module, and the wake word module is used to: obtain the audio signal and determine whether the audio signal includes the wake word.
  • a third aspect provides an electronic device, including a processor.
  • the processor is configured to be coupled to a memory, read instructions in the memory, and cause the electronic device to execute the method provided in the first aspect according to the instructions.
  • a fourth aspect provides a program, which when executed by a processor is used to perform the method provided in the first aspect.
  • a fifth aspect provides a computer-readable storage medium. Instructions are stored in the computer-readable storage medium. When the instructions are run on a computer or processor, the method provided in the first aspect is implemented.
  • a sixth aspect provides a program product, the program product comprising a computer program, the computer program being stored in a readable storage medium; at least one processor of a device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the device implements the method provided in the first aspect.
  • FIGS. 1A to 1D are schematic diagrams of a set of application scenarios for waking up electronic devices provided by embodiments of the present application;
  • Figure 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Figure 3 is another structural schematic diagram of an electronic device provided by an embodiment of the present application.
  • Figure 4 is a flow chart of a method for waking up a device provided by an embodiment of the present application
  • Figure 5 is a schematic structural diagram of a voiceprint authentication model provided by an embodiment of the present application.
  • Figures 6A to 6G are schematic diagrams of a set of application scenarios for outputting prompt information provided by embodiments of the present application.
  • Figure 7 is a schematic structural diagram of a device for waking up a device provided by an embodiment of the present application.
  • FIG. 8 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the method for waking up a device is suitable for the scenario where the electronic device is woken up by the user.
  • the embodiments of this application do not limit the name and type of the electronic device.
  • electronic devices can also be called Internet of Things (IoT) devices, terminals, mobile terminals, terminal devices, smart devices or user devices, etc.
  • examples of electronic devices include smart speakers, smart home appliances, mobile phones, etc.
  • in the following description, a speaker is taken as an example of the electronic device.
  • FIGS. 1A to 1D are schematic diagrams of a set of application scenarios for waking up electronic devices provided by embodiments of the present application, but FIGS. 1A to 1D do not limit the application scenarios.
  • the user can speak the speaker's wake-up word to wake up the speaker.
  • the speaker receives the audio signal input by the user and analyzes the audio signal. After parsing the wake-up word, the speaker can be woken up.
  • the wake-up word is preset information for waking up the electronic device.
  • the name and specific content of the wake-up word are not limited in the embodiments of this application. For example, wake words can also be called keywords.
  • the TV is playing a program
  • the character in the picture speaks a wake-up word or a sentence with a wake-up word.
  • the speaker receives the audio signal and analyzes the audio signal. In this scenario, no user wants to wake up the speaker. However, the speaker may parse the wake word after receiving the audio signal, causing the speaker to wake up by mistake.
  • the mobile phone records the wake word spoken by the user. Subsequently, the phone plays the user's recording near the speaker. Correspondingly, the speaker receives the audio signal and analyzes the audio signal. In this scenario, the user does not wake up the speaker. However, the speaker will parse the wake-up word after receiving the audio signal, which may cause the speaker to wake up accidentally.
  • the environment where the speaker is located is noisy, and the user is far away from the speaker.
  • the user can speak the speaker's wake word to wake up the speaker.
  • the speaker receives the audio signal and analyzes the audio signal.
  • the noise in the environment interferes with the user's voice, which may cause the speaker to fail to wake up.
  • the electronic devices can perform voiceprint recognition on input audio signals to avoid accidentally waking up the device.
  • Voiceprint refers to the sound wave spectrum that carries speech information, which is specific and stable.
  • Voiceprint recognition is a type of biometric technology that can identify the speaker's identity through his or her voice.
  • the electronic device may include a wake-up word module 21 and a voiceprint recognition module 22 .
  • Methods for waking up a device can include:
  • the wake-up word module 21 is used to obtain the audio signal and determine whether the audio signal includes the wake-up word. If the audio signal includes the wake-up word, the electronic device is controlled to enter the wake-up activation state and obtain the voiceprint feature information of the audio signal.
  • the voiceprint recognition module 22 is used to perform voiceprint recognition on the audio signal based on the voiceprint feature information of the audio signal. If the voiceprint recognition is successful, the device will be woken up; if the voiceprint recognition fails, the device will not be woken up.
  • one implementation method is: establishing and storing a voiceprint template library, which includes at least one piece of voiceprint template information; matching the voiceprint feature information of the audio signal against the at least one piece of voiceprint template information to obtain at least one matching value; comparing the target matching value, which is the largest of the at least one matching value, with a preset matching value; if the target matching value is greater than the preset matching value, it is determined that the voiceprint recognition is successful; if the target matching value is less than the preset matching value, it is determined that the voiceprint recognition has failed.
  • if the target matching value is equal to the preset matching value, it can be determined either that the voiceprint recognition is successful or that the voiceprint recognition has failed.
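The template-matching procedure above (match against every template, take the largest matching value, compare it with a preset value) can be sketched as follows. The text does not say how a matching value is computed; cosine similarity is used here purely as an assumption, and the function names, the default threshold of 0.8, and the `accept_on_equal` flag are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed matching function; the source does not specify one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def voiceprint_recognized(
    features: Sequence[float],
    templates: Sequence[Sequence[float]],
    preset_matching_value: float = 0.8,
    accept_on_equal: bool = True,
) -> bool:
    # With no registered templates there is nothing to match against.
    if not templates:
        return False
    # Target matching value: the largest of the per-template matching values.
    target = max(cosine_similarity(features, t) for t in templates)
    if target > preset_matching_value:
        return True
    if target < preset_matching_value:
        return False
    # Equal to the preset value: either outcome is permitted, so it is configurable.
    return accept_on_equal
```

The `accept_on_equal` flag reflects the statement that the equal case may be treated as either success or failure.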
  • users need to register voiceprint information with electronic devices in advance to ensure that voiceprint recognition can be successful.
  • another implementation method is to use a pre-trained voiceprint recognition model to perform voiceprint recognition on audio signals.
  • the accuracy of determining the user's identity is improved, and false awakening of the device is avoided in some scenarios.
  • the speaker performs voiceprint recognition on the audio signal played by the TV.
  • the voiceprint recognition will fail and the speaker will not wake up.
  • electronic devices may still be awakened by mistake or fail to be awakened.
  • the voiceprint template library includes voiceprint template information corresponding to the recording.
  • the speaker performs voiceprint recognition on the audio signal played by the mobile phone.
  • the voiceprint recognition will be successful, causing the speaker to be accidentally awakened.
  • the noise in the environment interferes with the user's voice.
  • the speaker performs voiceprint recognition on the audio signal input by the user.
  • the voiceprint recognition may fail, causing the speaker to fail to wake up successfully.
  • the electronic device may include: a wake-up word module 21 and a counterfeit identification module 31.
  • the wake-up word module 21 can refer to the description in Figure 2 and will not be described again here.
  • the counterfeit identification module 31 is used to perform machine sound recognition and/or voiceprint recognition on audio signals to obtain recognition results. The recognition result is used to indicate waking up the electronic device, not waking up the electronic device, or being uncertain whether to wake up the electronic device.
  • if the recognition result indicates waking up the electronic device, the electronic device is woken up; if the recognition result indicates not waking up the electronic device, the electronic device is not woken up; if the recognition result indicates that it is uncertain whether to wake up the electronic device, prompt information is output to the user to further confirm whether to wake up the electronic device.
  • the method for waking up a device improves the accuracy of determining the user's identity by performing machine sound recognition and/or voiceprint recognition on the audio signal, and avoids false wake-up of the device. Moreover, when it is uncertain whether to wake up the electronic device, prompt information is output to the user and human-computer interaction is performed with the user, so that the user further confirms whether to wake up the electronic device. This prevents the electronic device from being woken up by mistake or failing to wake up, and improves the accuracy of waking up the electronic device.
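How a device might act on the three possible recognition results can be sketched as below. This is an illustrative dispatch only: `RecognitionResult`, `handle_recognition_result`, and the `prompt_user` callable (which stands in for any of the interaction forms, such as voice, gesture, interface, or key press, and returns whether the user confirmed) are hypothetical names.

```python
from enum import Enum
from typing import Callable


class RecognitionResult(Enum):
    """The three recognition results named in the text (names are illustrative)."""
    WAKE = "wake"
    DO_NOT_WAKE = "do_not_wake"
    UNCERTAIN = "uncertain"


def handle_recognition_result(
    result: RecognitionResult,
    prompt_user: Callable[[], bool],
) -> bool:
    """Return True if the electronic device should be woken up."""
    if result is RecognitionResult.WAKE:
        return True
    if result is RecognitionResult.DO_NOT_WAKE:
        return False
    # Uncertain: output prompt information and let the user's response decide.
    return prompt_user()
```

The user is only interrupted in the uncertain case, so clear-cut results are handled without any extra interaction.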
  • Figure 4 is a flow chart of a method for waking up a device provided by an embodiment of the present application.
  • the execution subject may be an electronic device.
  • the method for waking up a device provided by this embodiment may include:
  • S401 Receive audio signals.
  • pre-processing may include but is not limited to at least one of the following: noise reduction processing, filtering processing, dereverberation processing, parametric equalization adjustment processing, volume adjustment processing or gain processing.
  • the electronic device is controlled to enter the wake-up activation state, and S404 is subsequently executed.
  • the recognition result is used to indicate waking up the electronic device, not waking up the electronic device, or not being sure whether to wake up the electronic device. Execute one of S405 to S407 according to different recognition results.
  • machine sound recognition is used to determine whether the audio signal is a machine sound, or whether the audio signal is a human voice.
  • Machine sound may also be called machine voice, mechanical sound, electronic sound, etc.
  • the embodiments of this application do not limit the specific names and reasons for their formation.
  • a machine sound is formed by playing a voice including a wake-up word through the device, or a machine sound is formed due to environmental noise or background noise of the device.
  • Voiceprint recognition can determine a user's identity. By performing machine sound recognition and/or voiceprint recognition on audio signals, the accuracy of determining user identity is improved.
  • the recognition result indicates three results: waking up the electronic device, not waking up the electronic device, or not sure whether to wake up the electronic device. By adding the determination result of uncertainty about whether to wake up the electronic device, it can be further confirmed whether to wake up the electronic device, thereby improving the accuracy of waking up the electronic device.
  • performing machine sound recognition and/or voiceprint recognition on audio signals may include:
  • only the audio signal is subjected to machine sound recognition.
  • the results of machine sound recognition can include two situations.
  • the results of machine sound recognition include two categories: determining that the audio signal is a machine sound, or determining that the audio signal is not a machine sound.
  • the recognition result indicates that the electronic device is not to be awakened to avoid false awakening of the electronic device.
  • the recognition result indicates to wake up the electronic device or indicates that it is uncertain whether to wake up the electronic device. Considering the accuracy of machine sound recognition, when it is determined that the audio signal is not a machine sound, the recognition result can indicate that it is uncertain whether to wake up the electronic device. Subsequently, it can be further confirmed whether to wake up the electronic device, thereby improving the accuracy of waking up the electronic device.
  • the results of machine sound recognition include three categories: determining that the audio signal is a machine sound, determining that the audio signal is not a machine sound, or not being sure whether the audio signal is a machine sound.
  • the recognition result indicates not to wake up the electronic device.
  • the recognition result indicates waking up the electronic device.
  • the recognition result indicates that it is uncertain whether to wake up the electronic device. Subsequently, you can further confirm whether to wake up the electronic device and improve the accuracy of waking up the electronic device.
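The three-category variant of machine sound recognition can be illustrated with a score and two thresholds. The text does not describe how the "not sure" category is produced; the sketch below assumes a recognizer that emits a machine-sound score in [0, 1] and uses two hypothetical thresholds, `low` and `high`, to carve out an uncertain band. All names and values are illustrative.

```python
from enum import Enum


class WakeDecision(Enum):
    WAKE = "wake"
    DO_NOT_WAKE = "do_not_wake"
    UNCERTAIN = "uncertain"


def decide_from_machine_score(
    machine_score: float,
    low: float = 0.3,
    high: float = 0.7,
) -> WakeDecision:
    """Map an assumed machine-sound score to one of the three recognition results."""
    if machine_score >= high:
        # Confidently a machine sound: do not wake the device.
        return WakeDecision.DO_NOT_WAKE
    if machine_score <= low:
        # Confidently not a machine sound: wake the device.
        return WakeDecision.WAKE
    # In between: not sure whether it is a machine sound, so confirm with the user.
    return WakeDecision.UNCERTAIN
```

Widening the band between `low` and `high` sends more cases to user confirmation, trading convenience for fewer false wake-ups and missed wake-ups.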
  • performing machine sound recognition and/or voiceprint recognition on the audio signal may include:
  • voiceprint recognition is only performed on audio signals.
  • the results of voiceprint recognition can include two categories: successful voiceprint recognition, or failed voiceprint recognition.
  • the recognition result indicates waking up the electronic device.
  • the recognition result indicates that it is uncertain whether to wake up the electronic device. Subsequently, you can further confirm whether to wake up the electronic device and improve the accuracy of waking up the electronic device.
  • performing machine sound recognition and/or voiceprint recognition on the audio signal may include:
  • machine sound recognition and voiceprint recognition are performed on the audio signal.
  • the results of machine sound recognition and voiceprint recognition are explained below.
  • the results of machine sound recognition include two categories: determining that the audio signal is a machine sound, or determining that the audio signal is not a machine sound.
  • the results of voiceprint recognition include two categories: successful voiceprint recognition, or failed voiceprint recognition.
  • the recognition result indicates not to wake up the electronic device.
  • the recognition result indicates waking up the electronic device.
  • the recognition result indicates that it is uncertain whether to wake up the electronic device. Whether to wake up the electronic device can then be further confirmed, which improves the accuracy of waking up the electronic device.
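The two-category case above can be sketched as a small decision function. This is only an illustration: the conditions follow the combination stated in the summary and in the speaker examples (machine sound: do not wake; not machine sound and voiceprint success: wake; not machine sound and voiceprint failure: uncertain), while the names `WakeDecision` and `decide` are hypothetical and do not come from the application.

```python
from enum import Enum


class WakeDecision(Enum):
    """The three possible recognition results (hypothetical names)."""
    WAKE = "wake"
    DO_NOT_WAKE = "do_not_wake"
    UNCERTAIN = "uncertain"


def decide(is_machine_sound: bool, voiceprint_ok: bool) -> WakeDecision:
    """Combine the two binary recognizer outputs into one recognition result."""
    if is_machine_sound:
        # Machine sound (TV audio, a played-back recording): never wake.
        return WakeDecision.DO_NOT_WAKE
    if voiceprint_ok:
        # Live voice that matches a stored voiceprint: wake directly.
        return WakeDecision.WAKE
    # Live voice but no voiceprint match: ask the user to confirm.
    return WakeDecision.UNCERTAIN
```

Keeping "uncertain" as its own value, rather than folding it into "do not wake", is what lets the later prompt step run only for the ambiguous case.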
  • the results of machine sound recognition include three categories: determining that the audio signal is a machine sound, determining that the audio signal is not a machine sound, or not being sure whether the audio signal is a machine sound.
  • the results of voiceprint recognition include two categories: successful voiceprint recognition, or failed voiceprint recognition.
  • the recognition result indicates not to wake up the electronic device.
  • the recognition result indicates waking up the electronic device.
  • the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • the recognition result indicates waking up the electronic device.
  • the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • machine sound recognition and voiceprint recognition are performed on the audio signal.
  • machine sound recognition and voiceprint recognition can be performed on the audio signal respectively.
  • Machine sound recognition and voiceprint recognition are independent of each other and uncoupled. This embodiment does not limit the execution order of machine sound recognition and voiceprint recognition.
  • Machine sound recognition and voiceprint recognition can be performed sequentially or simultaneously. For example, machine sound recognition is first performed on the audio signal; if it is determined that the audio signal is not a machine sound, voiceprint recognition is then performed on the audio signal. For another example, voiceprint recognition is first performed on the audio signal, and then machine sound recognition is performed on the audio signal.
  • the audio signal is subjected to machine sound recognition and voiceprint recognition.
  • machine sound recognition and voiceprint recognition can be related to each other in a coupled manner.
  • machine sound recognition and voiceprint recognition both need the acoustic features of the audio signal, and parameter sharing in the neural network model is used to couple machine sound recognition with voiceprint recognition.
  • machine sound recognition and voiceprint recognition of audio signals may include:
  • the audio signal is input into the voiceprint authentication model to obtain the first result and the voiceprint feature information of the audio signal.
  • the voiceprint authentication model is a network model trained with the acoustic features of the audio signal as input and the first result and the voiceprint feature information of the audio signal as output.
  • the first result is used to indicate whether the audio signal is machine sound.
  • voiceprint recognition is performed on the audio signal based on the voiceprint feature information of the audio signal and the voiceprint template library.
  • the voiceprint authentication model is a pre-trained network model.
  • FIG. 5 is a schematic structural diagram of a voiceprint authentication model provided by an embodiment of the present application.
  • the input of the voiceprint authentication model is the acoustic feature of the audio signal
  • the output is the first result and the voiceprint feature information of the audio signal.
  • the acoustic characteristics of audio signals are physical quantities that can reflect the characteristics of audio signals.
  • it can be Mel-Frequency Cepstral Coefficients (MFCC) of a preset dimension.
  • the default dimension can be 39 dimensions.
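The text does not say how the 39 dimensions are composed. One common convention, assumed here purely for illustration, is 13 static coefficients per frame plus their first-order (delta) and second-order (delta-delta) differences. The sketch below shows only that stacking step on made-up numbers; it does not compute real MFCCs, and the function names are hypothetical.

```python
def deltas(frames):
    """Central differences along time; edge frames are repeated at the borders."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_39(static):
    """Concatenate static, delta and delta-delta coefficients for every frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Hypothetical input: 5 frames with 13 static coefficients each.
static = [[float(t * 13 + k) for k in range(13)] for t in range(5)]
features = stack_39(static)  # 5 frames, 13 + 13 + 13 = 39 values per frame
```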
  • the voiceprint authentication model shown in FIG. 5 can be understood as including three parts: the voiceprint recognition model on the left, the speech forgery detection model on the right, and, at the bottom, the common part shared by the voiceprint recognition model and the speech forgery detection model. The voiceprint recognition model outputs the voiceprint feature information of the audio signal, and the speech forgery detection model outputs the first result, which indicates whether the audio signal is a machine sound.
  • the common part can include at least two Time Delay Neural Network (TDNN) modules.
  • the TDNN module is the time-delay neural network layer in the x-vector framework.
  • x-vector is the mainstream model framework in the field of voiceprint recognition. It can accept input features of any length and map them into fixed-length feature expressions.
  • in the voiceprint recognition model on the left, after the TDNN modules shared with the speech forgery detection model on the right, the data passes through non-shared TDNN layers of the same structure to complete frame-level feature extraction of the audio signal. The output of the frame-level feature extraction layer then passes through the statistical pooling layer, which maps the features of the audio signal from the frame level to the sentence level. The output of the statistical pooling layer is then input to the sentence-level feature extraction layer.
  • the sentence-level feature extraction layers are all composed of deep neural networks (Deep Neural Networks, DNN). This embodiment does not limit the number of DNN layers.
  • the output of the penultimate DNN layer is extracted as the voiceprint feature information of the audio signal.
  • in the speech forgery detection model on the right, after the TDNN modules shared with the voiceprint recognition model on the left, the data passes through non-shared TDNN layers of the same structure to complete frame-level feature extraction of the audio signal. The output of the frame-level feature extraction layer then passes through the statistical pooling layer, which maps the features of the audio signal from the frame level to the sentence level. The output of the statistical pooling layer is then input to the sentence-level feature extraction layer to complete feature extraction, and the fake speech discrimination layer judges whether the audio signal is a machine sound and outputs the first result.
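The statistical pooling step in both branches is what turns a variable number of frames into one fixed-length vector. The text does not state which statistics are used; the sketch below assumes the mean and standard deviation over time, as is usual in x-vector style systems, and uses a hypothetical function name.

```python
import math


def statistics_pooling(frames):
    """Map a (num_frames x dim) sequence to a single vector of length 2 * dim."""
    n = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames.
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    # Per-dimension (population) standard deviation over all frames.
    stds = [
        math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n)
        for k in range(dim)
    ]
    return means + stds


# Two hypothetical utterances of different length, 2 features per frame.
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pooled_short = statistics_pooling(short_utt)
pooled_long = statistics_pooling(long_utt)
```

Both results have four values regardless of how many frames went in, which is the frame-level to sentence-level mapping the description refers to.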
  • voiceprint recognition and machine voice recognition share some hidden layers.
  • the learning of the two models is controlled through joint training until the model converges, which yields the ability to simultaneously extract voiceprint information from the audio signal and determine whether the audio signal is a machine sound.
  • the prompt information can be in any of the following forms: audio, video, text, animation or display interface.
  • the prompt information is output to the user, and the electronic device can directly output the prompt information to the user. Direct interaction between electronic devices and users is achieved.
  • outputting prompt information to the user may include: transmitting instruction information with the first device, where the instruction information is used to instruct the first device to output the prompt information to the user.
  • the electronic device and the first device can communicate and transmit instruction information, and the first device outputs prompt information to the user based on the instruction information.
  • the electronic device can interact with the user through the first device.
  • the first device may be a wearable device, a mobile phone, a smart large screen, etc.
  • the wearable device may be a smart watch.
  • the user accounts of the electronic device and the first device are the same.
  • the electronic device is a speaker
  • the first device is a mobile phone with the same user account as the speaker. Since the user accounts of the electronic device and the first device are the same, the electronic device can more easily discover the first device, thereby completing the interaction with the user through the first device and further confirming whether to wake up the electronic device.
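How the speaker discovers the phone is not specified. As a minimal sketch, assuming the electronic device already holds a list of nearby devices with their logged-in accounts, selecting the first device could look like the following. The `Device` fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    account: str      # logged-in user account (assumed to be known)
    reachable: bool   # whether the device can currently be contacted


def pick_first_device(own_account: str, nearby: List[Device]) -> Optional[Device]:
    """Return the first reachable device that shares the user account, if any."""
    for device in nearby:
        if device.reachable and device.account == own_account:
            return device
    # No match: the electronic device would have to prompt the user itself.
    return None


nearby = [
    Device("guest-phone", "guest", True),
    Device("watch", "alice", False),
    Device("phone", "alice", True),
]
chosen = pick_first_device("alice", nearby)
```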
  • the speaker receives the audio signal input by the user. It can be determined that the audio signal is not a machine sound, and the voiceprint recognition is successful. The recognition result indicates that the electronic device is awakened, and the speaker is awakened.
  • the speaker receives the audio signal played by the TV and can determine that the audio signal is a machine sound.
  • the recognition result indicates not to wake up the electronic device, thus avoiding the speaker from being accidentally awakened.
  • the phone plays a recording of the user speaking the wake word.
  • the speaker receives the audio signal and can determine that the audio signal is machine sound.
  • the recognition result indicates not to wake up the electronic device, thus avoiding the speaker from being woken up by mistake.
  • the speaker receives the audio signal input by the user. Assuming that the speaker determines that the audio signal is not a machine sound but the voiceprint recognition fails, the recognition result indicates that it is uncertain whether to wake up the electronic device, and the speaker outputs prompt information to the user to guide the user to confirm whether to wake up the speaker, thereby improving the accuracy of waking up the speaker.
  • the method for waking up the device improves the accuracy of determining the user's identity by performing machine sound recognition and/or voiceprint recognition on the audio signal, and avoids false wake-up of the device. Moreover, when it is uncertain whether to wake up the electronic device, prompt information is output to the user and human-computer interaction is performed with the user, so that the user further confirms whether to wake up the electronic device. This prevents the electronic device from being accidentally woken up or failing to wake up, and improves the accuracy of waking up the electronic device.
  • the method for waking up the device may also include:
  • the response information can be in any of the following forms: audio, video, the user's touch operation on the display screen, the user's operation on the components on the device, or the user's body operation, for example, a preset gesture.
  • the electronic device determines whether to wake up based on the response information, which improves the accuracy of waking up the electronic device.
  • the method for waking up the device may also include:
  • the voiceprint template library is updated according to the audio signal.
  • if the electronic device interacts with the user and determines to wake up the electronic device based on the response information input by the user, the audio signal was indeed intended to wake up the electronic device. Therefore, the voiceprint template library is updated according to the audio signal, and the voiceprint template information corresponding to the audio signal is added or updated in the voiceprint template library. In this way, when the user subsequently wakes up the electronic device by voice, the electronic device can perform voiceprint recognition on the audio signal based on the updated voiceprint template library, which increases the probability of successful voiceprint recognition and thus improves the accuracy of waking up the electronic device.
  • the user does not need to register voiceprint information with the electronic device in advance, which simplifies the process of registering the voiceprint for the user.
  • the voiceprint template library in the electronic device does not have the voiceprint template information of user A.
  • the electronic device is used for the first time after leaving the factory, or user A has never woken up the electronic device.
  • the electronic device receives the audio signal input by user A. Assume that the electronic device performs machine sound recognition and voiceprint recognition on the audio signal, determines that the audio signal is not a machine sound, and the voiceprint recognition fails.
  • the electronic device outputs prompt information to user A so that user A can further confirm whether to wake up the electronic device.
  • User A inputs response information to the electronic device according to the prompt information.
  • the electronic device determines to wake up the electronic device according to the response information, updates the voiceprint template library according to the audio signal input by user A, and adds user A's voiceprint template information to the voiceprint template library. In this way, the registration of user A's voiceprint information is completed, and user A does not need to register the voiceprint information with the electronic device in advance. Similarly, assume that the electronic device performs only voiceprint recognition on audio signals and that voiceprint recognition fails for the audio signal input by user A. If the electronic device determines to wake up the electronic device according to the response information, it can update the voiceprint template library according to the audio signal input by user A.
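The user A scenario can be sketched with a small in-memory template library. The application does not say how voiceprint feature information is compared with templates; cosine similarity against a fixed threshold is assumed here only for illustration, and the class name, threshold value, and vectors are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VoiceprintTemplateLibrary:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed value, not taken from the application
        self.templates: Dict[str, List[float]] = {}

    def match(self, embedding: List[float]) -> Optional[str]:
        """Return the best template id scoring at or above the threshold, else None."""
        best_id, best_score = None, self.threshold
        for template_id, template in self.templates.items():
            score = cosine(embedding, template)
            if score >= best_score:
                best_id, best_score = template_id, score
        return best_id

    def update(self, template_id: str, embedding: List[float]) -> None:
        """Add or overwrite a template; called only after the user confirms wake-up."""
        self.templates[template_id] = list(embedding)


library = VoiceprintTemplateLibrary()
user_a = [0.9, 0.1, 0.4]
first_attempt = library.match(user_a)       # empty library: recognition fails
library.update("user_a", user_a)            # user A confirmed via the prompt
second_attempt = library.match([0.88, 0.12, 0.41])  # later utterance by user A
```

The first attempt fails and leads to the prompt; after confirmation the template is stored, so a later, slightly different utterance from the same speaker matches without any separate registration step.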
  • the prompt information is used to guide the user to perform voice interaction with the electronic device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to perform a preset action within the shooting range of the shooting device on the target device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to operate in the target interface displayed on the target device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to operate the target physical button on the target device to determine whether to wake up the electronic device.
  • the target device may be an electronic device, or a first device that communicates with the electronic device.
  • the interaction between the electronic device and the user can take many forms, for example, voice interaction, or the user performing a preset operation or a preset body movement. This embodiment does not limit the content of the voice interaction, the content or layout of the target interface, or the target physical buttons.
  • the electronic device is a speaker
  • the target device is an electronic device
  • the target physical buttons can be the play button, pause button, previous track button, next track button, or volume button on the speaker.
  • the electronic device is a speaker
  • the target device is a mobile phone with the same user account as the speaker
  • the target physical button can be the volume button on the mobile phone.
  • the electronic device or the first device may output prompt information to the user.
  • the electronic device outputs prompt information to the user, which may include:
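  • when the prompt information is used to guide the user to perform a preset action within the shooting range of the shooting device on the electronic device, before obtaining the response information input by the user according to the prompt information, the method can also include starting the shooting device on the electronic device.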
  • FIGS. 6A to 6G do not limit the prompt information and the implementation manner of outputting the prompt information to the user.
  • the electronic device is a speaker
  • the first device is a mobile phone
  • the wake-up word is XXX.
  • the prompt information is audio
  • the speaker outputs the prompt information to the user
  • the response information is also audio
  • the speaker performs voice interaction with the user.
  • the speaker outputs the audio "Please say XXX again."
  • if the user wants to wake up the speaker, the user speaks the wake-up word "XXX" to confirm.
  • the speaker outputs the audio "Are you calling me?"
  • if the user does not want to wake up the speaker, the user can say "no" or not respond, which confirms that the speaker is not to be woken up.
  • the speaker displays a target interface, and the target interface includes prompt information, and the prompt information is used to guide the user to perform voice interaction with the electronic device.
  • the speaker displays a target interface 51, and the target interface 51 includes the text "Please say XXX again."
  • if the user wants to wake up the speaker, the user speaks the wake-up word "XXX" to confirm.
  • the prompt information is audio
  • the speaker outputs the prompt information to the user
  • the response information is a preset user action.
  • the speaker has a camera 52.
  • the speaker outputs the audio "If you are calling me, please raise your right arm."
  • if the user wants to wake up the speaker, the user can raise the right arm toward the camera 52 on the speaker.
  • the speaker's camera 52 captures the user's action of raising the right arm, and the speaker can determine to wake up.
  • the audio output by the speaker can also be "If you are calling me, please wink at me."
  • the speaker displays a target interface
  • the target interface includes prompt information
  • the prompt information is used to guide the user to perform operations in the target interface.
  • as shown in FIG. 6E, the speaker displays a target interface 53.
  • the target interface 53 includes the text "Wake up the speaker", a "Yes" button and a "No" button. If the user does not want to wake up the speaker, the user can click the "No" button to confirm that the speaker is not to be woken up.
  • the mobile phone displays a target interface, and the target interface includes prompt information, and the prompt information is used to guide the user to perform operations in the target interface.
  • the speaker recognizes a mobile phone with the same user account in the same area, and the speaker transmits instruction information to the mobile phone to instruct the mobile phone to output prompt information to the user.
  • the mobile phone displays the target interface 54 according to the instruction information.
  • the target interface 54 includes the text "Wake up the speaker", a "Yes" button and a "No" button. If the user wants to wake up the speaker, the user can click the "Yes" button to confirm.
  • the mobile phone obtains the response information input by the user according to the prompt information and transmits the response information to the speaker.
  • the speaker determines to wake up the speaker based on the response information.
  • the prompt information is audio
  • the speaker outputs the prompt information to the user to guide the user to operate a target physical button on the electronic device.
  • the speaker outputs the audio "If you are calling me, please press the pause button."
  • if the user wants to wake up the speaker, the user can press the pause button on the speaker.
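The interaction examples above all end with the device turning some response information into a yes or no. A minimal sketch of that final step is shown below, assuming the response has already been reduced to a kind and a value by the device's speech, camera, or input handling. The kind names, value strings, and function name are hypothetical; "XXX" is the placeholder wake-up word from the examples.

```python
from typing import Optional

WAKE_WORD = "XXX"  # placeholder wake-up word used in the examples


def confirm_wake(kind: str, value: Optional[str]) -> bool:
    """Return True only when the response clearly confirms the wake-up."""
    if value is None:
        # No response at all is treated as "do not wake".
        return False
    if kind == "voice":
        # Prompt: "Please say XXX again." Only the wake-up word confirms.
        return value.strip().lower() == WAKE_WORD.lower()
    if kind == "gesture":
        # Prompt: "If you are calling me, please raise your right arm."
        return value == "raise_right_arm"
    if kind == "ui_button":
        # Target interface with "Yes" and "No" buttons.
        return value == "Yes"
    if kind == "physical_button":
        # Prompt: "If you are calling me, please press the pause button."
        return value == "pause"
    # Unknown response types never wake the device.
    return False
```

Defaulting to "do not wake" for missing or unrecognized responses matches the example in which the user simply does not answer.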
  • the electronic device includes corresponding hardware and/or software modules that perform each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving the hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions in conjunction with the embodiments for each specific application, but such implementations should not be considered to be beyond the scope of this application.
  • Embodiments of the present application can divide the electronic device into functional modules according to the above method examples. For example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one module. It should be noted that the division of modules in the embodiment of the present application is schematic and is only a logical function division. In actual implementation, there may be other division methods. It should be noted that the names of the modules in the embodiments of this application are schematic, and there are no restrictions on the names of the modules during actual implementation.
  • FIG. 7 is a schematic structural diagram of a device for waking up a device provided by an embodiment of the present application.
  • the device for waking up the device can be applied to the electronic device.
  • the device for waking up the device provided in this embodiment may include:
  • the counterfeit identification module 31 is configured to, when it is determined that the received audio signal includes a wake-up word, perform machine sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result; the recognition result is used to indicate waking up the electronic device, not waking up the electronic device, or being uncertain whether to wake up the electronic device;
  • the output module 71 is configured to output prompt information to the user if the recognition result indicates that it is uncertain whether to wake up the electronic device; the prompt information is used to guide the user to wake up the electronic device.
  • the prompt information is used to guide the user to perform voice interaction with the electronic device to determine whether to wake up the electronic device; or,
  • the prompt information is used to guide the user to perform a preset action within the shooting range of the shooting device on the target device to determine whether to wake up the electronic device; or,
  • the prompt information is used to guide the user to operate in the target interface displayed by the target device to determine whether to wake up the electronic device; or,
  • the prompt information is used to guide the user to operate a target physical button on the target device to determine whether to wake up the electronic device.
  • the target device is the electronic device, or a first device that communicates with the electronic device.
  • the user accounts of the electronic device and the first device are the same.
  • the output module 71 is used for:
  • a transmission module is also included for:
  • Instruction information is transmitted with the first device, where the instruction information is used to instruct the first device to output the prompt information to the user.
  • the first device includes a mobile phone and/or a watch.
  • the counterfeit identification module 31 is used for:
  • the recognition result indicates not to wake up the electronic device
  • the recognition result indicates waking up the electronic device
  • the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • the counterfeit identification module 31 is used for:
  • voiceprint recognition is performed on the audio signal.
  • the counterfeit identification module 31 is used for:
  • voiceprint recognition is performed on the audio signal according to the voiceprint feature information of the audio signal and a voiceprint template library.
  • the counterfeit identification module 31 is used for:
  • the recognition result indicates not to wake up the electronic device
  • the recognition result indicates waking up the electronic device
  • the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • the counterfeit identification module 31 is used for:
  • the recognition result indicates waking up the electronic device
  • the recognition result indicates that it is uncertain whether to wake up the electronic device.
  • a confirmation module is also included, and the confirmation module is used for:
  • the prompt information is used to guide the user to perform preset actions within the shooting range of the shooting device on the electronic device
  • the confirmation module is also used to:
  • the shooting device on the electronic device is started.
  • the electronic device stores a voiceprint template library, and also includes an update module, where the update module is used to:
  • the voiceprint template library is updated according to the audio signal.
  • a wake-up word module 21 is also included, and the wake-up word module 21 is used for:
  • the device for waking up a device provided in this embodiment is used to execute the method for waking up a device provided in the method embodiment of this application.
  • the technical principles and technical effects are similar and will not be described again here.
  • FIG. 8 shows another structure of an electronic device provided by an embodiment of the present application.
  • the electronic device includes: processor 801, receiver 802, transmitter 803, memory 804 and bus 805.
  • the processor 801 includes one or more processing cores.
  • the processor 801 executes various functional applications and information processing by running software programs and modules.
  • the memory 804 can be used to store at least one program instruction, and the processor 801 is used to execute at least one program instruction to implement the technical solutions of the above embodiments.
  • the receiver 802 and the transmitter 803 can be implemented as a communication component, and the communication component can be a baseband chip.
  • the memory 804 is connected to the processor 801 through a bus 805.
  • the implementation principles and technical effects are similar to the above-mentioned method-related embodiments, and will not be described again here.
  • FIG. 8 only shows one memory and processor. In real electronic devices, there can be multiple processors and memories.
  • the memory may also be called a storage medium or a storage device, which is not limited in the embodiments of the present application.
  • the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, which may implement or execute each method, step and logical block diagram disclosed in the embodiments of this application.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc. The steps of the methods disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware processor for execution, or can be executed by a combination of hardware and software modules in the processor.
  • the memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or it may be a volatile memory, such as random-access memory (RAM).
  • the memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory in the embodiment of the present application can also be a circuit or any other device capable of realizing a storage function, used to store program instructions and/or data.
  • the methods provided by the embodiments of this application can be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user equipment, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • the available media can be magnetic media (for example, floppy disks, hard disks, or tapes), optical media (for example, digital video discs (DVD)), or semiconductor media (for example, SSDs).
  • Embodiments of the present application provide a computer program product.
  • when the computer program product is run on a device, it causes the device to execute the technical solutions in the above embodiments.
  • the implementation principles and technical effects are similar to the above-mentioned related embodiments and will not be described again here.
  • Embodiments of the present application provide a computer-readable storage medium on which program instructions are stored.
  • when the program instructions are executed by a device, they cause the device to execute the technical solutions of the above embodiments.
  • the implementation principles and technical effects are similar to the above-mentioned related embodiments and will not be described again here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephone Function (AREA)

Abstract

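Embodiments of this application relate to the field of terminal technologies and provide a method for waking up a device, an electronic device, and a storage medium. The method for waking up a device includes: when it is determined that a received audio signal includes a wake-up word, performing machine sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result, where the recognition result is used to indicate waking up the electronic device, not waking up the electronic device, or being uncertain whether to wake up the electronic device; and if the recognition result indicates that it is uncertain whether to wake up the electronic device, outputting prompt information to a user, where the prompt information is used to guide the user to wake up the electronic device. When the electronic device is uncertain whether it should be woken up, it outputs prompt information to the user and performs human-computer interaction with the user, so that the user can further confirm whether to wake up the electronic device, which improves the accuracy of waking up the electronic device.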

Description

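Method for waking up a device, electronic device, and storage medium
This application claims priority to Chinese Patent Application No. 202210403454.8, filed with the China National Intellectual Property Administration on April 18, 2022 and entitled "Method for waking up a device, electronic device, and storage medium", which is incorporated herein by reference in its entirety.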
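Technical Field
Embodiments of this application relate to the field of terminal technologies, and in particular, to a method for waking up a device, an electronic device, and a storage medium.
Background
With the development of electronic and terminal technologies, electronic devices come in more and more types, their functions become more and more powerful, and human-computer interaction keeps improving. Many electronic devices can enter a sleep state when not working and can enter a working state after being woken up by the user.
Usually, the user speaks a wake-up word to wake up the electronic device. Correspondingly, the electronic device receives the audio signal input by the user and, after parsing out the wake-up word, can wake up the device.
However, due to factors such as machine sound interference or environmental noise interference, the electronic device may be falsely woken up or may fail to be woken up, which reduces the accuracy of waking up the electronic device.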
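Summary
Embodiments of this application provide a method for waking up a device, an electronic device, and a storage medium, which improve the accuracy of waking up the electronic device.
According to a first aspect, a method for waking up a device is provided, including: when it is determined that a received audio signal includes a wake-up word, performing machine sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result, where the recognition result is used to indicate waking up the electronic device, not waking up the electronic device, or being uncertain whether to wake up the electronic device; and if the recognition result indicates that it is uncertain whether to wake up the electronic device, outputting prompt information to a user, where the prompt information is used to guide the user to wake up the electronic device.
The method for waking up a device provided in the first aspect improves the accuracy of determining the user's identity by performing machine sound recognition and/or voiceprint recognition on the audio signal, and avoids false wake-up of the device. Moreover, when it is uncertain whether to wake up the electronic device, prompt information is output to the user and human-computer interaction is performed with the user, so that the user further confirms whether to wake up the electronic device. This prevents the electronic device from being falsely woken up or failing to wake up, and improves the accuracy of waking up the electronic device.
In a possible implementation, the prompt information is used to guide the user to perform voice interaction with the electronic device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to perform a preset action within the shooting range of the shooting device on the target device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to operate in the target interface displayed by the target device to determine whether to wake up the electronic device; or, the prompt information is used to guide the user to operate a target physical button on the target device to determine whether to wake up the electronic device.
With this implementation, interaction between the electronic device and the user can take multiple forms, which gives the user more flexibility in further confirming whether to wake up the electronic device.
In a possible implementation, the target device is the electronic device, or a first device that communicates with the electronic device.
It can be seen that human-computer interaction with the user can be completed through the electronic device to confirm whether to wake up the electronic device, or through the first device to confirm whether to wake up the electronic device. The human-computer interaction is therefore more flexible.
In a possible implementation, the user accounts of the electronic device and the first device are the same.
With this implementation, because the user accounts of the electronic device and the first device are the same, the electronic device can discover the first device more easily, complete the interaction with the user through the first device, and further confirm whether to wake up the electronic device.
In a possible implementation, outputting prompt information to the user includes: outputting voice prompt information to the user; or displaying a target interface to the user, where the target interface includes the prompt information.
With this implementation, the electronic device outputs the prompt information directly, and the electronic device can interact with the user through the first device.
In a possible implementation, outputting prompt information to the user includes: transmitting instruction information with the first device, where the instruction information is used to instruct the first device to output the prompt information to the user.
With this implementation, the first device outputs the prompt information, and interaction with the user can be achieved through the first device.
In a possible implementation, the first device includes a mobile phone and/or a watch.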
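In a possible implementation, performing machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result includes: performing machine-sound recognition and voiceprint recognition on the audio signal; if the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device; if the audio signal is determined not to be a machine sound and voiceprint recognition succeeds, the recognition result indicates waking up the electronic device; and if the audio signal is determined not to be a machine sound and voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.
With this implementation, performing both machine-sound recognition and voiceprint recognition on the audio signal makes user identification more accurate. The recognition result has three possible values: wake up the electronic device, do not wake up the electronic device, or uncertain whether to wake up the electronic device. Adding the uncertain outcome allows the wake-up decision to be confirmed in a later step, which improves wake-up accuracy.
In a possible implementation, performing machine-sound recognition and voiceprint recognition on the audio signal includes: performing machine-sound recognition on the audio signal; and if the audio signal is determined not to be a machine sound, performing voiceprint recognition on the audio signal.
In this implementation, machine-sound recognition is performed first, and voiceprint recognition is performed only if the signal is determined not to be a machine sound, which improves processing efficiency.
In a possible implementation, performing machine-sound recognition and voiceprint recognition on the audio signal includes: inputting the audio signal into a voiceprint anti-spoofing model to obtain a first result and voiceprint feature information of the audio signal, where the first result indicates whether the audio signal is a machine sound; and if the first result indicates that the audio signal is not a machine sound, performing voiceprint recognition on the audio signal based on the voiceprint feature information of the audio signal and a voiceprint template library.
In this implementation, the voiceprint anti-spoofing model performs both machine-sound recognition and voiceprint recognition, and the two tasks are coupled through parameter sharing in the neural network model.
In a possible implementation, performing machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result includes: performing machine-sound recognition on the audio signal; if the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device; if the audio signal is determined not to be a machine sound, the recognition result indicates waking up the electronic device; and if it is uncertain whether the audio signal is a machine sound, the recognition result indicates uncertainty about whether to wake up the electronic device.
With this implementation, whether to wake up the electronic device is determined by performing machine-sound recognition on the audio signal.
In a possible implementation, performing machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result includes: performing voiceprint recognition on the audio signal; if voiceprint recognition succeeds, the recognition result indicates waking up the electronic device; and if voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.
With this implementation, whether to wake up the electronic device is determined by performing voiceprint recognition on the audio signal.
In a possible implementation, the method further includes: obtaining response information input by the user according to the prompt information; and determining, based on the response information, whether to wake up the electronic device.
In a possible implementation, the prompt information guides the user to perform a preset action within the shooting range of a camera device on the electronic device, and before the response information input by the user according to the prompt information is obtained, the method further includes: starting the camera device on the electronic device.
In a possible implementation, the electronic device stores a voiceprint template library, and the method further includes: if it is determined based on the response information to wake up the electronic device, updating the voiceprint template library based on the audio signal.
With this implementation, if the electronic device, after interacting with the user, determines from the user's response information that it should wake up, the audio signal is one that is allowed to wake up the device. Updating the voiceprint template library based on that audio signal therefore raises the probability that later voiceprint recognition succeeds and improves wake-up accuracy.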
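According to a second aspect, an apparatus for waking up a device is provided, including: an anti-spoofing module, configured to, when it is determined that a received audio signal includes a wake-up word, perform machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result, where the recognition result indicates waking up the electronic device, not waking up the electronic device, or uncertainty about whether to wake up the electronic device; and an output module, configured to output prompt information to a user if the recognition result indicates uncertainty about whether to wake up the electronic device, where the prompt information guides the user to wake up the electronic device.
In a possible implementation, the prompt information guides the user to interact with the electronic device by voice to determine whether to wake up the electronic device; or the prompt information guides the user to perform a preset action within the shooting range of a camera device on a target device to determine whether to wake up the electronic device; or the prompt information guides the user to operate in a target interface displayed by the target device to determine whether to wake up the electronic device; or the prompt information guides the user to operate a target physical button on the target device to determine whether to wake up the electronic device.
In a possible implementation, the target device is the electronic device, or is a first device that communicates with the electronic device.
In a possible implementation, the electronic device and the first device have the same user account.
In a possible implementation, the output module is configured to: output voice prompt information to the user; or display a target interface to the user, where the target interface includes the prompt information.
In a possible implementation, the apparatus further includes a transmission module, configured to: transmit indication information to the first device, where the indication information instructs the first device to output the prompt information to the user.
In a possible implementation, the first device includes a mobile phone and/or a watch.
In a possible implementation, the anti-spoofing module is configured to: perform machine-sound recognition and voiceprint recognition on the audio signal; if the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device; if the audio signal is determined not to be a machine sound and voiceprint recognition succeeds, the recognition result indicates waking up the electronic device; and if the audio signal is determined not to be a machine sound and voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.
In a possible implementation, the anti-spoofing module is configured to: perform machine-sound recognition on the audio signal; and if the audio signal is determined not to be a machine sound, perform voiceprint recognition on the audio signal.
In a possible implementation, the anti-spoofing module is configured to: input the audio signal into a voiceprint anti-spoofing model to obtain a first result and voiceprint feature information of the audio signal, where the first result indicates whether the audio signal is a machine sound; and if the first result indicates that the audio signal is not a machine sound, perform voiceprint recognition on the audio signal based on the voiceprint feature information of the audio signal and a voiceprint template library.
In a possible implementation, the anti-spoofing module is configured to: perform machine-sound recognition on the audio signal; if the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device; if the audio signal is determined not to be a machine sound, the recognition result indicates waking up the electronic device; and if it is uncertain whether the audio signal is a machine sound, the recognition result indicates uncertainty about whether to wake up the electronic device.
In a possible implementation, the anti-spoofing module is configured to: perform voiceprint recognition on the audio signal; if voiceprint recognition succeeds, the recognition result indicates waking up the electronic device; and if voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.
In a possible implementation, the apparatus further includes a confirmation module, configured to: obtain response information input by the user according to the prompt information; and determine, based on the response information, whether to wake up the electronic device.
In a possible implementation, the prompt information guides the user to perform a preset action within the shooting range of a camera device on the electronic device, and the confirmation module is further configured to: start the camera device on the electronic device before obtaining the response information input by the user according to the prompt information.
In a possible implementation, the electronic device stores a voiceprint template library, and the apparatus further includes an update module, configured to: if it is determined based on the response information to wake up the electronic device, update the voiceprint template library based on the audio signal.
In a possible implementation, the apparatus further includes a wake-up word module, configured to: obtain an audio signal and determine whether the audio signal includes a wake-up word.
According to a third aspect, an electronic device is provided, including a processor configured to be coupled to a memory, read instructions in the memory, and, according to the instructions, cause the electronic device to perform the method provided in the first aspect.
According to a fourth aspect, a program is provided, which, when executed by a processor, performs the method provided in the first aspect.
According to a fifth aspect, a computer-readable storage medium is provided, which stores instructions that, when run on a computer or a processor, implement the method provided in the first aspect.
According to a sixth aspect, a program product is provided. The program product includes a computer program stored in a readable storage medium; at least one processor of a device can read the computer program from the readable storage medium, and the at least one processor executes the computer program to cause the device to implement the method provided in the first aspect.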
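Brief Description of the Drawings
FIG. 1A to FIG. 1D are schematic diagrams of a group of application scenarios for waking up an electronic device according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 3 is another schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 4 is a flowchart of a method for waking up a device according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of a voiceprint anti-spoofing model according to an embodiment of this application;
FIG. 6A to FIG. 6G are schematic diagrams of a group of application scenarios for outputting prompt information according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of an apparatus for waking up a device according to an embodiment of this application;
FIG. 8 is yet another schematic structural diagram of an electronic device according to an embodiment of this application.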
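Detailed Description
The embodiments of this application are described below with reference to the accompanying drawings.
The method for waking up a device provided in the embodiments of this application applies to scenarios in which an electronic device is woken up by a user. The embodiments of this application do not limit the name or type of the electronic device. For example, the electronic device may also be called an internet of things (IoT) device, a terminal, a mobile terminal, a terminal device, a smart device, or user equipment. Current examples of electronic devices include smart speakers, smart home appliances, and mobile phones.
For ease of description, the embodiments of this application use a speaker as the example electronic device.
For example, FIG. 1A to FIG. 1D are schematic diagrams of a group of application scenarios for waking up an electronic device according to an embodiment of this application; FIG. 1A to FIG. 1D do not limit the application scenarios.
In one example, as shown in FIG. 1A, the user speaks the speaker's wake-up word to wake up the speaker. The speaker receives the audio signal input by the user and parses it. After the wake-up word is parsed out, the speaker can be woken up. The wake-up word is preset information used to wake up the electronic device; the embodiments of this application do not limit its name or specific content. For example, the wake-up word may also be called a keyword.
In another example, as shown in FIG. 1B, a television is playing a program, and a character on screen speaks the wake-up word or a sentence containing it. The speaker receives the audio signal and parses it. In this scenario, no user intends to wake up the speaker. However, the speaker may parse out the wake-up word from the received audio signal and be woken up by mistake.
In yet another example, as shown in FIG. 1C, a mobile phone records the user speaking the wake-up word. Later, the phone plays the recording near the speaker. The speaker receives the audio signal and parses it. In this scenario, the user is not waking up the speaker. However, the speaker will parse out the wake-up word from the received audio signal and may be woken up by mistake.
In yet another example, as shown in FIG. 1D, the environment around the speaker is noisy and the user is far from the speaker. The user speaks the wake-up word to wake up the speaker. The speaker receives the audio signal and parses it. In this scenario, the ambient noise interferes with the user's voice, and the speaker may fail to wake up.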
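In the related art, an electronic device can perform voiceprint recognition on the input audio signal to avoid being woken up by mistake. A voiceprint is the sound-wave spectrum that carries speech information; it is specific to a speaker and stable over time. Voiceprint recognition is a biometric technique that identifies a speaker by voice. In one implementation, as shown in FIG. 2, the electronic device may include a wake-up word module 21 and a voiceprint recognition module 22.
The method for waking up the device may include the following:
The wake-up word module 21 is configured to obtain an audio signal and determine whether the audio signal includes a wake-up word. If the audio signal includes the wake-up word, the module controls the electronic device to enter a wake-up activation state and obtains the voiceprint feature information of the audio signal.
The voiceprint recognition module 22 is configured to perform voiceprint recognition on the audio signal based on its voiceprint feature information. If voiceprint recognition succeeds, the device is woken up; if voiceprint recognition fails, the device is not woken up.
The embodiments of this application do not limit how the electronic device performs voiceprint recognition; existing voiceprint recognition techniques may be used. Optionally, one implementation is: build and store a voiceprint template library that includes at least one piece of voiceprint template information; match the voiceprint feature information of the audio signal against the at least one piece of voiceprint template information to obtain at least one match value; compare the target match value, which is the largest of the match values, with a preset match value; if the target match value is greater than the preset match value, determine that voiceprint recognition succeeds; if the target match value is less than the preset match value, determine that voiceprint recognition fails. When the target match value equals the preset match value, either success or failure may be determined. Usually, the user needs to register voiceprint information with the electronic device in advance to ensure that voiceprint recognition can succeed. Optionally, another implementation is to use a pre-trained voiceprint recognition model to perform voiceprint recognition on the audio signal.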
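The template-matching variant described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not say how a match value is computed, so cosine similarity is used here as a stand-in, and the names `cosine_similarity`, `voiceprint_recognized`, and the default threshold `0.8` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Match value between two feature vectors (assumed metric, not from the source)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def voiceprint_recognized(feature, template_library, preset_match_value=0.8):
    """Return True if the largest match value exceeds the preset match value.

    An empty library (no registered voiceprint) always fails. The tie case
    (target == preset) is treated as failure here; the text allows either choice.
    """
    if not template_library:
        return False
    target_match_value = max(cosine_similarity(feature, t) for t in template_library)
    return target_match_value > preset_match_value
```

For instance, `voiceprint_recognized([1.0, 0.0], [[1.0, 0.1], [0.0, 1.0]])` succeeds because the first template is nearly parallel to the input feature, while a library holding only `[0.0, 1.0]` fails.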
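With the implementation shown in FIG. 2, performing voiceprint recognition on the audio signal makes user identification more accurate and avoids false wake-ups in some scenarios. For example, in the scenario shown in FIG. 1B, the speaker performs voiceprint recognition on the audio played by the television, voiceprint recognition fails, and the speaker is not woken up. In other scenarios, however, the electronic device may still be woken up by mistake or may fail to wake up. For example, in the scenario shown in FIG. 1C, if a recording of the user speaking the wake-up word, played by the phone, was used to register with the speaker, the voiceprint template library includes the voiceprint template information corresponding to that recording. Later, when the phone plays the recording of the user speaking the wake-up word, the speaker performs voiceprint recognition on the audio played by the phone, voiceprint recognition succeeds, and the speaker is woken up by mistake. As another example, in the scenario shown in FIG. 1D, ambient noise interferes with the user's voice; when the speaker performs voiceprint recognition on the audio signal input by the user, voiceprint recognition may fail, so the speaker cannot be woken up.
An embodiment of this application provides a method for waking up a device. As shown in FIG. 3, the electronic device may include a wake-up word module 21 and an anti-spoofing module 31. For the wake-up word module 21, refer to the description of FIG. 2; details are not repeated here. The anti-spoofing module 31 is configured to perform machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result. The recognition result indicates waking up the electronic device, not waking up the electronic device, or uncertainty about whether to wake up the electronic device. If the recognition result indicates waking up the electronic device, the device is woken up; if it indicates not waking up the electronic device, the device is not woken up; if it indicates uncertainty, prompt information is output to the user so that the user can further confirm whether to wake up the electronic device.
In the method provided in the embodiments of this application, performing machine-sound recognition and/or voiceprint recognition on the audio signal makes user identification more accurate and avoids false wake-ups. In addition, when it is uncertain whether to wake up the electronic device, the device outputs prompt information and interacts with the user so that the user further confirms whether to wake up the device. This avoids both false wake-ups and failed wake-ups and improves wake-up accuracy.
The technical solutions of this application are described in detail below through specific embodiments. The following embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments.
The terms "first", "second", "third", "fourth", and so on (if any) in the embodiments of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
FIG. 4 is a flowchart of a method for waking up a device according to an embodiment of this application. The method provided in this embodiment may be performed by an electronic device. As shown in FIG. 4, the method provided in this embodiment may include: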
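S401: Receive an audio signal.
S402: Pre-process the audio signal.
Optionally, the pre-processing may include, but is not limited to, at least one of the following: noise reduction, filtering, dereverberation, parametric equalization adjustment, volume adjustment, or gain processing.
S403: Determine whether the audio signal includes a wake-up word.
If the audio signal includes the wake-up word, control the electronic device to enter a wake-up activation state and then perform S404.
If the audio signal does not include the wake-up word, end processing.
S404: Perform machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result.
The recognition result indicates waking up the electronic device, not waking up the electronic device, or uncertainty about whether to wake up the electronic device. Depending on the recognition result, one of S405 to S407 is performed.
Specifically, machine-sound recognition determines whether the audio signal is a machine sound, or whether the audio signal is a human voice. A machine sound may also be called a machine voice, mechanical sound, electronic sound, and so on; the embodiments of this application do not limit its specific name or cause. For example, a machine sound may be produced when a device plays speech that includes the wake-up word, or may result from ambient noise or a device's background noise. Voiceprint recognition can determine the identity of the user. Performing machine-sound recognition and/or voiceprint recognition on the audio signal makes user identification more accurate. The recognition result has three possible values: wake up the electronic device, do not wake up the electronic device, or uncertain whether to wake up the electronic device. Adding the uncertain outcome allows the wake-up decision to be confirmed in a later step, which improves wake-up accuracy.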
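Optionally, in a first implementation, performing machine-sound recognition and/or voiceprint recognition on the audio signal may include:
performing machine-sound recognition on the audio signal.
In this implementation, only machine-sound recognition is performed on the audio signal. The outcome of machine-sound recognition may fall into one of two cases.
In the first case, machine-sound recognition has two possible outcomes: the audio signal is determined to be a machine sound, or the audio signal is determined not to be a machine sound.
If the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device, which avoids a false wake-up.
If the audio signal is determined not to be a machine sound, the recognition result indicates either waking up the electronic device or uncertainty about whether to wake up the electronic device. Given the limited accuracy of machine-sound recognition, when the audio signal is determined not to be a machine sound, the recognition result may be set to uncertain so that the wake-up decision can be confirmed in a later step, which improves wake-up accuracy.
In the second case, machine-sound recognition has three possible outcomes: the audio signal is determined to be a machine sound, the audio signal is determined not to be a machine sound, or it is uncertain whether the audio signal is a machine sound.
If the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device.
If the audio signal is determined not to be a machine sound, the recognition result indicates waking up the electronic device.
If it is uncertain whether the audio signal is a machine sound, the recognition result indicates uncertainty about whether to wake up the electronic device. The wake-up decision can then be confirmed in a later step, which improves wake-up accuracy.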
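The three-outcome case of the first implementation can be illustrated with a short Python sketch. The source does not describe how the three machine-sound outcomes are produced; the sketch assumes a hypothetical detector score in [0, 1] and two hypothetical thresholds (`low`, `high`), and all names are illustrative.

```python
from enum import Enum


class MachineSound(Enum):
    YES = "is machine sound"
    NO = "is not machine sound"
    UNSURE = "uncertain"


class Result(Enum):
    WAKE = "wake up"
    DO_NOT_WAKE = "do not wake up"
    UNCERTAIN = "uncertain whether to wake up"


def classify_machine_sound(score, low=0.3, high=0.7):
    """Map a hypothetical machine-sound score to three outcomes (assumed thresholds)."""
    if score >= high:
        return MachineSound.YES
    if score <= low:
        return MachineSound.NO
    return MachineSound.UNSURE


def result_from_machine_sound_only(outcome):
    """Recognition result when only machine-sound recognition is performed."""
    mapping = {
        MachineSound.YES: Result.DO_NOT_WAKE,
        MachineSound.NO: Result.WAKE,
        MachineSound.UNSURE: Result.UNCERTAIN,
    }
    return mapping[outcome]
```

A score in the middle band, for example `0.5`, yields `Result.UNCERTAIN`, which is the outcome that triggers the prompt in S407.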
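Optionally, in a second implementation, performing machine-sound recognition and/or voiceprint recognition on the audio signal may include:
performing voiceprint recognition on the audio signal.
In this implementation, only voiceprint recognition is performed on the audio signal. Voiceprint recognition has two possible outcomes: success or failure.
If voiceprint recognition succeeds, the recognition result indicates waking up the electronic device.
If voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device. The wake-up decision can then be confirmed in a later step, which improves wake-up accuracy.
Optionally, in a third implementation, performing machine-sound recognition and/or voiceprint recognition on the audio signal may include:
performing machine-sound recognition and voiceprint recognition on the audio signal.
In this implementation, both machine-sound recognition and voiceprint recognition are performed on the audio signal. The outcomes of the two are described below.
In the first case, machine-sound recognition has two possible outcomes: the audio signal is determined to be a machine sound, or the audio signal is determined not to be a machine sound. Voiceprint recognition has two possible outcomes: success or failure.
If the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device.
If the audio signal is determined not to be a machine sound and voiceprint recognition succeeds, the recognition result indicates waking up the electronic device.
If the audio signal is determined not to be a machine sound and voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device. The wake-up decision can then be confirmed in a later step, which improves wake-up accuracy.
In the second case, machine-sound recognition has three possible outcomes: the audio signal is determined to be a machine sound, the audio signal is determined not to be a machine sound, or it is uncertain whether the audio signal is a machine sound. Voiceprint recognition has two possible outcomes: success or failure.
If the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device.
If the audio signal is determined not to be a machine sound and voiceprint recognition succeeds, the recognition result indicates waking up the electronic device.
If the audio signal is determined not to be a machine sound and voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.
If it is uncertain whether the audio signal is a machine sound and voiceprint recognition succeeds, the recognition result indicates waking up the electronic device.
If it is uncertain whether the audio signal is a machine sound and voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.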
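The decision rules of the third implementation (second case) amount to a small truth table. The following Python sketch encodes exactly the five rules listed above; the enum and function names are illustrative and not taken from the source.

```python
from enum import Enum


class MachineSound(Enum):
    YES = 1      # determined to be a machine sound
    NO = 2       # determined not to be a machine sound
    UNSURE = 3   # uncertain whether it is a machine sound


class Result(Enum):
    WAKE = "wake up"
    DO_NOT_WAKE = "do not wake up"
    UNCERTAIN = "uncertain whether to wake up"


def recognition_result(machine_sound, voiceprint_ok):
    """Combine machine-sound recognition and voiceprint recognition.

    A machine sound is rejected regardless of the voiceprint outcome.
    Otherwise (not a machine sound, or unsure), the voiceprint outcome decides
    between waking up and asking the user.
    """
    if machine_sound is MachineSound.YES:
        return Result.DO_NOT_WAKE
    return Result.WAKE if voiceprint_ok else Result.UNCERTAIN
```

Because a machine sound is rejected before the voiceprint outcome is consulted, the same function also covers the sequential variant in which voiceprint recognition runs only when the signal is not determined to be a machine sound.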
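Optionally, in one implementation of performing machine-sound recognition and voiceprint recognition on the audio signal, the two may be performed separately, so that machine-sound recognition and voiceprint recognition are independent and uncoupled. This embodiment does not limit the order in which they are executed. They may be performed one after the other or at the same time. For example, machine-sound recognition is performed on the audio signal first, and voiceprint recognition is performed only if the audio signal is determined not to be a machine sound. As another example, voiceprint recognition is performed on the audio signal first, followed by machine-sound recognition.
Optionally, in another implementation, machine-sound recognition and voiceprint recognition may be related to each other and coupled. Typically, both machine-sound recognition and voiceprint recognition need the acoustic features of the audio signal, and the coupling between them is achieved through parameter sharing in a neural network model.
Optionally, performing machine-sound recognition and voiceprint recognition on the audio signal may include:
inputting the audio signal into a voiceprint anti-spoofing model to obtain a first result and the voiceprint feature information of the audio signal. The voiceprint anti-spoofing model is a network model trained with the acoustic features of audio signals as input and with the first result and the voiceprint feature information of the audio signals as output. The first result indicates whether the audio signal is a machine sound.
If the first result indicates that the audio signal is not a machine sound, voiceprint recognition is performed on the audio signal based on its voiceprint feature information and the voiceprint template library.
The voiceprint anti-spoofing model is a pre-trained network model. For example, FIG. 5 is a schematic structural diagram of a voiceprint anti-spoofing model according to an embodiment of this application. As shown in FIG. 5, the input of the voiceprint anti-spoofing model is the acoustic features of the audio signal, and the output is the first result and the voiceprint feature information of the audio signal. The acoustic features of an audio signal are physical quantities that reflect the characteristics of the signal, for example, Mel-frequency cepstral coefficients (MFCC) of a preset dimension. Optionally, the preset dimension may be 39. The voiceprint anti-spoofing model shown in FIG. 5 can be understood as having three parts: the voiceprint recognition model on the left, the speech anti-spoofing model on the right, and, at the bottom, the common part shared by the voiceprint recognition model and the speech anti-spoofing model. The voiceprint recognition model outputs the voiceprint feature information of the audio signal, and the speech anti-spoofing model outputs the first result, which indicates whether the audio signal is a machine sound.
The common part may include at least two time delay neural network (TDNN) modules. A TDNN module is the time delay neural network layer of the x-vector framework. x-vector is a mainstream model framework in the voiceprint recognition field that accepts input features of arbitrary length and maps them to a fixed-length feature representation.
For the voiceprint recognition model on the left, after the TDNN modules shared with the speech anti-spoofing model on the right, the signal passes through TDNN layers that have the same structure but are not shared, which completes frame-level feature extraction for the audio signal. The output of the frame-level feature extraction layers then passes through a statistics pooling layer, which maps the features of the audio signal from frame level to utterance level. The output of the statistics pooling layer is then fed to the utterance-level feature extraction layers. The utterance-level feature extraction layers all consist of deep neural network (DNN) layers; this embodiment does not limit the number of DNN layers. Finally, the output of the second-to-last DNN layer is extracted as the voiceprint feature information of the audio signal.
For the speech anti-spoofing model on the right, after the TDNN modules shared with the voiceprint recognition model on the left, the signal passes through TDNN layers that have the same structure but are not shared, which completes frame-level feature extraction for the audio signal. The output of the frame-level feature extraction layers then passes through a statistics pooling layer, which maps the features of the audio signal from frame level to utterance level. The output of the statistics pooling layer is then fed to the utterance-level feature extraction layers to complete feature extraction, and the spoofed-speech discrimination layer decides whether the signal is a machine sound and outputs the first result.
It can be seen that, in the voiceprint anti-spoofing model provided in this embodiment, voiceprint recognition and machine-sound recognition share some hidden layers. Joint training controls the learning of the two models until the overall model converges, so that it can both extract the voiceprint information of the audio signal and judge whether the audio signal is a machine sound.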
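The step that turns a variable number of frame-level vectors into one fixed-length utterance-level vector is the statistics pooling layer. The source does not specify which statistics are pooled; the sketch below assumes the common x-vector choice of concatenating the per-dimension mean and standard deviation, and the function name is illustrative. It is written in plain Python for readability rather than as a trainable layer.

```python
import math


def statistics_pooling(frames):
    """Map frame-level features (a list of equal-length vectors) to one vector.

    Assumption: the pooled statistics are the per-dimension mean and
    (population) standard deviation, concatenated as [means..., stds...].
    The output length is 2 * dim regardless of how many frames are given.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds
```

For two 2-dimensional frames `[1.0, 2.0]` and `[3.0, 2.0]`, the pooled vector is `[2.0, 2.0, 1.0, 0.0]`; a 5-frame or 500-frame utterance would also produce a 4-element vector, which is what lets the later utterance-level layers have a fixed input size.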
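S405: If the recognition result indicates waking up the electronic device, wake up the electronic device.
S406: If the recognition result indicates not waking up the electronic device, do not wake up the electronic device.
S407: If the recognition result indicates uncertainty about whether to wake up the electronic device, output prompt information to the user, where the prompt information guides the user to wake up the electronic device.
The prompt information may take any of the following forms: audio, video, text, animation, or a display interface.
Outputting prompt information to the user enables human-computer interaction that guides the user to confirm whether to wake up the electronic device. This further confirms whether the electronic device should be woken up, avoids both false wake-ups and failed wake-ups, and improves wake-up accuracy.
Optionally, the electronic device may output the prompt information to the user directly, which enables direct interaction between the electronic device and the user.
Optionally, outputting prompt information to the user may include: transmitting indication information to a first device, where the indication information instructs the first device to output the prompt information to the user. In this implementation, the electronic device and the first device can communicate and exchange the indication information, and the first device outputs the prompt information to the user according to the indication information. The electronic device can thus interact with the user through the first device.
This embodiment does not limit the type or name of the first device. For example, the first device may be a wearable device, a mobile phone, or a smart large-screen device, and the wearable device may be a smart watch.
Optionally, the electronic device and the first device have the same user account. For example, the electronic device is a speaker and the first device is a mobile phone that has the same user account as the speaker. Because the electronic device and the first device have the same user account, the electronic device can discover the first device more easily and can therefore interact with the user through the first device to further confirm whether to wake up the electronic device.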
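One way to picture the choice between prompting directly and prompting through a first device is the Python sketch below. It is a hypothetical illustration: the source does not define a discovery or messaging API, so the `Device` record, its fields, and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Device:
    name: str
    account: str      # user account the device is logged in to
    reachable: bool   # whether the electronic device can currently communicate with it


def pick_first_device(own_account: str, nearby: List[Device]) -> Optional[Device]:
    """Return a reachable device that shares the electronic device's user account."""
    for device in nearby:
        if device.reachable and device.account == own_account:
            return device
    return None


def route_prompt(own_account: str, nearby: List[Device], prompt: str) -> Tuple[str, str]:
    """Decide who outputs the prompt: a same-account first device, or the device itself."""
    target = pick_first_device(own_account, nearby)
    if target is None:
        return ("self", prompt)
    return (target.name, prompt)
```

With a speaker logged in as `"alice"`, a reachable phone on the same account would receive the indication information, while a phone on a different account would be skipped and the speaker would prompt the user itself.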
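The effect of the speaker using the method provided in this embodiment is described below with reference to FIG. 1A to FIG. 1D. Assume that the speaker performs both machine-sound recognition and voiceprint recognition on the audio signal.
In the scenario shown in FIG. 1A, the speaker receives the audio signal input by the user and can determine that the audio signal is not a machine sound and that voiceprint recognition succeeds. The recognition result indicates waking up the electronic device, and the speaker is woken up.
In the scenario shown in FIG. 1B, the speaker receives the audio signal played by the television and can determine that the audio signal is a machine sound. The recognition result indicates not waking up the electronic device, which prevents the speaker from being woken up by mistake.
In the scenario shown in FIG. 1C, the phone plays the recording of the user speaking the wake-up word. The speaker receives the audio signal and can determine that it is a machine sound. The recognition result indicates not waking up the electronic device, which prevents the speaker from being woken up by mistake.
In the scenario shown in FIG. 1D, the speaker receives the audio signal input by the user. Assume that the speaker determines that the audio signal is not a machine sound but voiceprint recognition fails. The recognition result then indicates uncertainty about whether to wake up the electronic device, and the speaker outputs prompt information that guides the user to confirm whether to wake up the speaker, which improves the speaker's wake-up accuracy.
It can be seen that, in the method provided in this embodiment, performing machine-sound recognition and/or voiceprint recognition on the audio signal makes user identification more accurate and avoids false wake-ups. In addition, when it is uncertain whether to wake up the electronic device, the device outputs prompt information and interacts with the user so that the user further confirms whether to wake up the device. This avoids both false wake-ups and failed wake-ups and improves wake-up accuracy.
Optionally, the method provided in this embodiment may further include:
S408: Obtain response information input by the user according to the prompt information.
S409: Determine, based on the response information, whether to wake up the electronic device.
The response information may take any of the following forms: audio, video, a touch operation performed by the user on a display screen, an operation performed by the user on a component of the device, or a body movement of the user such as a preset gesture.
It can be seen that, through human-computer interaction between the electronic device and the user, the user confirms by means of the response information whether to wake up the electronic device, and the electronic device makes the final wake-up decision based on the response information, which improves wake-up accuracy.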
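Optionally, if the electronic device stores a voiceprint template library, the method provided in this embodiment may further include:
if it is determined based on the response information to wake up the electronic device, updating the voiceprint template library based on the audio signal.
Specifically, if the electronic device, after interacting with the user, determines from the user's response information that it should wake up, the audio signal was indeed intended to wake up the electronic device. The voiceprint template library is therefore updated based on the audio signal, by adding or updating the voiceprint template information corresponding to the audio signal. When the user later wakes up the electronic device by voice, the device performs voiceprint recognition against the updated voiceprint template library, which raises the probability that voiceprint recognition succeeds and thus improves wake-up accuracy.
Moreover, with the method provided in this embodiment, the user does not need to register voiceprint information with the electronic device in advance, which simplifies voiceprint registration.
A specific example follows. The voiceprint template library in the electronic device contains no voiceprint template information for user A, for instance because the device is being used for the first time after leaving the factory, or because user A has never woken up the device. When user A wakes up the electronic device, the device receives the audio signal input by user A. Assume that the electronic device performs machine-sound recognition and voiceprint recognition on the audio signal and determines that the audio signal is not a machine sound and that voiceprint recognition fails. The electronic device then outputs prompt information to user A so that user A can further confirm whether to wake up the device. User A inputs response information to the electronic device according to the prompt information; the electronic device determines from the response information that it should wake up and updates the voiceprint template library based on the audio signal input by user A, adding user A's voiceprint template information to the library. This completes the registration of user A's voiceprint information without requiring user A to register in advance. Similarly, if the electronic device performs only voiceprint recognition on the audio signal, voiceprint recognition fails for the audio signal input by user A; once the electronic device determines from the response information that it should wake up, it can update the voiceprint template library based on the audio signal input by user A.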
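The user A example can be replayed in a few lines of Python. This is a sketch under assumptions: voiceprint features are taken to be unit-length vectors so that a dot product can serve as the match value, the threshold `0.8` is hypothetical, the machine-sound check is omitted for brevity, and `ask_user` stands in for the whole prompt-and-response exchange of S407 to S409.

```python
def best_match(feature, library):
    """Largest dot product between the feature and any stored template (0.0 if empty)."""
    return max(
        (sum(x * y for x, y in zip(feature, template)) for template in library),
        default=0.0,
    )


def wake_with_enrollment(feature, library, ask_user, threshold=0.8):
    """Wake on a voiceprint match; otherwise ask the user and enroll on confirmation.

    `ask_user` is a hypothetical callable returning True if the user confirms
    the wake-up through the prompt. On confirmation the feature is added to the
    template library, so no separate registration step is needed.
    """
    if best_match(feature, library) > threshold:
        return True
    if ask_user():
        library.append(list(feature))
        return True
    return False
```

On first use the library is empty, so recognition fails and the user is asked; after one confirmed wake-up, the same voice matches the stored template and the device wakes without prompting again.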
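The following describes, for S407, how the prompt information may be implemented and how the electronic device may output it to the user.
Optionally, the prompt information guides the user to interact with the electronic device by voice to determine whether to wake up the electronic device; or the prompt information guides the user to perform a preset action within the shooting range of a camera device on a target device to determine whether to wake up the electronic device; or the prompt information guides the user to operate in a target interface displayed by the target device to determine whether to wake up the electronic device; or the prompt information guides the user to operate a target physical button on the target device to determine whether to wake up the electronic device. The target device may be the electronic device, or a first device that communicates with the electronic device.
Specifically, the electronic device and the user can interact in multiple forms, for example by voice, or by the user performing a preset operation or a preset body movement. This embodiment does not limit the content of the voice interaction, the content or layout of the target interface, or the target physical button. For example, if the electronic device is a speaker and the target device is the electronic device itself, the target physical button may be the play button, pause button, previous-track button, next-track button, or volume button on the speaker. As another example, if the electronic device is a speaker and the target device is a mobile phone that has the same user account as the speaker, the target physical button may be the volume button on the phone.
The multiple forms of interaction between the electronic device and the user give the user more flexibility in further confirming whether to wake up the electronic device.
Either the electronic device or the first device may output the prompt information to the user. Taking the electronic device as the executing entity, optionally, the electronic device outputting prompt information to the user may include:
outputting voice prompt information to the user; or
displaying a target interface to the user, where the target interface includes the prompt information.
Optionally, if the prompt information guides the user to perform a preset action within the shooting range of a camera device on the electronic device, then before the response information input by the user according to the prompt information is obtained, the method may further include:
starting the camera device on the electronic device.
Examples are given below with reference to FIG. 6A to FIG. 6G; these figures do not limit the prompt information or the way it is output to the user. The electronic device is a speaker, the first device is a mobile phone, and the wake-up word is XXX.
Optionally, in one example, the prompt information is audio output by the speaker, the response information is also audio, and the speaker and the user interact by voice. For example, as shown in FIG. 6A, the speaker outputs the audio "Please say XXX again". The user, who does want to wake up the speaker, says the wake-up word "XXX" to confirm the wake-up. As another example, as shown in FIG. 6B, the speaker outputs the audio "Excuse me, are you calling me?". The user, who does not want to wake up the speaker, can say "No" or not answer, confirming that the speaker should not be woken up.
Optionally, in another example, the speaker displays a target interface that includes the prompt information, and the prompt information guides the user to interact with the electronic device by voice. For example, as shown in FIG. 6C, the speaker displays a target interface 51 that includes the text "Please say XXX again". The user, who does want to wake up the speaker, says the wake-up word "XXX" to confirm the wake-up.
Optionally, in yet another example, the prompt information is audio output by the speaker, and the response information is a preset user action. As shown in FIG. 6D, the speaker has a camera 52. The speaker outputs the audio "If you are calling me, please raise your right arm". The user, who does want to wake up the speaker, can raise the right arm toward the camera 52 on the speaker. When the camera 52 captures the user raising the right arm, the speaker can determine that it should wake up. As another example, the audio output by the speaker may instead be "If you are calling me, please blink at me".
Optionally, in yet another example, the speaker displays a target interface that includes the prompt information, and the prompt information guides the user to operate in the target interface. As shown in FIG. 6E, the speaker displays a target interface 53 that includes the text "Wake up the speaker", a "Yes" button, and a "No" button. If the user does not want to wake up the speaker, the user can tap the "No" button to confirm that the speaker should not be woken up.
Optionally, in yet another example, the mobile phone displays a target interface that includes the prompt information, and the prompt information guides the user to operate in the target interface. As shown in FIG. 6F, the speaker identifies a mobile phone in the same area that has the same user account and transmits indication information to the phone to instruct it to output the prompt information to the user. According to the indication information, the phone displays a target interface 54 that includes the text "Wake up the speaker", a "Yes" button, and a "No" button. If the user wants to wake up the speaker, the user can tap the "Yes" button to confirm the wake-up. The phone then obtains the response information input by the user according to the prompt information and transmits it to the speaker. The speaker determines from the response information that it should wake up.
Optionally, in yet another example, the prompt information is audio output by the speaker that guides the user to operate a target physical button on the electronic device. As shown in FIG. 6G, the speaker outputs the audio "If you are calling me, please press the pause button". The user, who does want to wake up the speaker, can press the pause button on the speaker.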
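The examples of FIG. 6A to FIG. 6G share one pattern: each prompt expects a specific response, and anything else (including silence) leaves the device asleep. The Python sketch below illustrates S409 under that reading. The `(kind, value)` encoding of response information and the value strings are hypothetical; the source only lists the forms a response may take.

```python
def confirm_wake(response, wake_word="XXX"):
    """Decide from the response information whether to wake up the device.

    `response` is None when the user does not answer, or a (kind, value)
    pair using an illustrative encoding:
      ("voice", transcript)    e.g. the user repeats the wake-up word (FIG. 6A/6C)
      ("gesture", name)        e.g. "raise_right_arm" seen by the camera (FIG. 6D)
      ("tap", "yes" | "no")    a button in the target interface (FIG. 6E/6F)
      ("key", name)            a physical button such as "pause" (FIG. 6G)
    """
    if response is None:
        return False
    kind, value = response
    if kind == "voice":
        return wake_word.lower() in value.lower()
    if kind == "gesture":
        return value == "raise_right_arm"
    if kind == "tap":
        return value == "yes"
    if kind == "key":
        return value == "pause"
    return False
```

Treating a missing or unrecognized response as "do not wake up" matches FIG. 6B, where the user may simply not answer.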
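It can be understood that, to implement the foregoing functions, the electronic device includes hardware and/or software modules corresponding to each function. With reference to the algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementations should not be considered beyond the scope of this application.
In the embodiments of this application, the electronic device may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one module. It should be noted that the module division in the embodiments of this application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation. It should also be noted that the module names in the embodiments of this application are illustrative and are not limited in an actual implementation.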
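For example, FIG. 7 is a schematic structural diagram of an apparatus for waking up a device according to an embodiment of this application. Optionally, the apparatus may be applied to an electronic device. As shown in FIG. 7, the apparatus provided in this embodiment may include:
an anti-spoofing module 31, configured to, when it is determined that a received audio signal includes a wake-up word, perform machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result, where the recognition result indicates waking up the electronic device, not waking up the electronic device, or uncertainty about whether to wake up the electronic device; and
an output module 71, configured to output prompt information to a user if the recognition result indicates uncertainty about whether to wake up the electronic device, where the prompt information guides the user to wake up the electronic device.
Optionally, the prompt information guides the user to interact with the electronic device by voice to determine whether to wake up the electronic device; or
the prompt information guides the user to perform a preset action within the shooting range of a camera device on a target device to determine whether to wake up the electronic device; or
the prompt information guides the user to operate in a target interface displayed by the target device to determine whether to wake up the electronic device; or
the prompt information guides the user to operate a target physical button on the target device to determine whether to wake up the electronic device.
Optionally, the target device is the electronic device, or is a first device that communicates with the electronic device.
Optionally, the electronic device and the first device have the same user account.
Optionally, the output module 71 is configured to:
output voice prompt information to the user; or
display a target interface to the user, where the target interface includes the prompt information.
Optionally, the apparatus further includes a transmission module, configured to:
transmit indication information to the first device, where the indication information instructs the first device to output the prompt information to the user.
Optionally, the first device includes a mobile phone and/or a watch.
Optionally, the anti-spoofing module 31 is configured to:
perform machine-sound recognition and voiceprint recognition on the audio signal;
if the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device;
if the audio signal is determined not to be a machine sound and voiceprint recognition succeeds, the recognition result indicates waking up the electronic device;
if the audio signal is determined not to be a machine sound and voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.
Optionally, the anti-spoofing module 31 is configured to:
perform machine-sound recognition on the audio signal;
if the audio signal is determined not to be a machine sound, perform voiceprint recognition on the audio signal.
Optionally, the anti-spoofing module 31 is configured to:
input the audio signal into a voiceprint anti-spoofing model to obtain a first result and voiceprint feature information of the audio signal, where the first result indicates whether the audio signal is a machine sound;
if the first result indicates that the audio signal is not a machine sound, perform voiceprint recognition on the audio signal based on the voiceprint feature information of the audio signal and a voiceprint template library.
Optionally, the anti-spoofing module 31 is configured to:
perform machine-sound recognition on the audio signal;
if the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device;
if the audio signal is determined not to be a machine sound, the recognition result indicates waking up the electronic device;
if it is uncertain whether the audio signal is a machine sound, the recognition result indicates uncertainty about whether to wake up the electronic device.
Optionally, the anti-spoofing module 31 is configured to:
perform voiceprint recognition on the audio signal;
if voiceprint recognition succeeds, the recognition result indicates waking up the electronic device;
if voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.
Optionally, the apparatus further includes a confirmation module, configured to:
obtain response information input by the user according to the prompt information;
determine, based on the response information, whether to wake up the electronic device.
Optionally, the prompt information guides the user to perform a preset action within the shooting range of a camera device on the electronic device, and the confirmation module is further configured to:
start the camera device on the electronic device before obtaining the response information input by the user according to the prompt information.
Optionally, the electronic device stores a voiceprint template library, and the apparatus further includes an update module, configured to:
if it is determined based on the response information to wake up the electronic device, update the voiceprint template library based on the audio signal.
Optionally, the apparatus further includes a wake-up word module 21, configured to:
obtain an audio signal and determine whether the audio signal includes a wake-up word.
The apparatus provided in this embodiment is configured to perform the method for waking up a device provided in the method embodiments of this application. Its technical principles and technical effects are similar and are not repeated here.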
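Refer to FIG. 8, which shows another structure of the electronic device provided in an embodiment of this application. The electronic device includes a processor 801, a receiver 802, a transmitter 803, a memory 804, and a bus 805. The processor 801 includes one or more processing cores and runs software programs and modules to execute various functional applications and perform information processing. The memory 804 may be configured to store at least one program instruction, and the processor 801 is configured to execute the at least one program instruction to implement the technical solutions of the foregoing embodiments. The receiver 802 and the transmitter 803 may be implemented as one communication component, which may be a baseband chip. The memory 804 is connected to the processor 801 through the bus 805. The implementation principles and technical effects are similar to those of the foregoing method embodiments and are not repeated here.
A person skilled in the art can understand that, for ease of description, FIG. 8 shows only one memory and one processor. An actual electronic device may have multiple processors and memories. A memory may also be called a storage medium, a storage device, or the like; the embodiments of this application do not limit this.
In the embodiments of this application, the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
In the embodiments of this application, the memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as random-access memory (RAM). The memory may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory in the embodiments of this application may also be a circuit or any other apparatus capable of implementing a storage function, and is configured to store program instructions and/or data. The methods provided in the embodiments of this application may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are wholly or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, an SSD).
An embodiment of this application provides a computer program product that, when run on a device, causes the device to perform the technical solutions in the foregoing embodiments. The implementation principles and technical effects are similar to those of the foregoing related embodiments and are not repeated here.
An embodiment of this application provides a computer-readable storage medium storing program instructions that, when executed by a device, cause the device to perform the technical solutions of the foregoing embodiments. The implementation principles and technical effects are similar to those of the foregoing related embodiments and are not repeated here.
In conclusion, the foregoing embodiments are intended only to describe the technical solutions of this application, not to limit them. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.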

Claims (17)

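  1. A method for waking up a device, comprising:
    when it is determined that a received audio signal includes a wake-up word, performing machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result, wherein the recognition result indicates waking up an electronic device, not waking up the electronic device, or uncertainty about whether to wake up the electronic device; and
    if the recognition result indicates uncertainty about whether to wake up the electronic device, outputting prompt information to a user, wherein the prompt information guides the user to wake up the electronic device.
  2. The method according to claim 1, wherein
    the prompt information guides the user to interact with the electronic device by voice to determine whether to wake up the electronic device; or
    the prompt information guides the user to perform a preset action within the shooting range of a camera device on a target device to determine whether to wake up the electronic device; or
    the prompt information guides the user to operate in a target interface displayed by the target device to determine whether to wake up the electronic device; or
    the prompt information guides the user to operate a target physical button on the target device to determine whether to wake up the electronic device.
  3. The method according to claim 2, wherein the target device is the electronic device, or is a first device that communicates with the electronic device.
  4. The method according to claim 3, wherein the electronic device and the first device have the same user account.
  5. The method according to claim 1, wherein outputting prompt information to the user comprises:
    outputting voice prompt information to the user; or
    displaying a target interface to the user, wherein the target interface includes the prompt information.
  6. The method according to claim 1, wherein outputting prompt information to the user comprises:
    transmitting indication information to a first device, wherein the indication information instructs the first device to output the prompt information to the user.
  7. The method according to claim 3, 4, or 6, wherein the first device includes a mobile phone and/or a watch.
  8. The method according to any one of claims 1 to 7, wherein performing machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result comprises:
    performing machine-sound recognition and voiceprint recognition on the audio signal;
    if the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device;
    if the audio signal is determined not to be a machine sound and voiceprint recognition succeeds, the recognition result indicates waking up the electronic device;
    if the audio signal is determined not to be a machine sound and voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.
  9. The method according to claim 8, wherein performing machine-sound recognition and voiceprint recognition on the audio signal comprises:
    performing machine-sound recognition on the audio signal;
    if the audio signal is determined not to be a machine sound, performing voiceprint recognition on the audio signal.
  10. The method according to claim 8, wherein performing machine-sound recognition and voiceprint recognition on the audio signal comprises:
    inputting the audio signal into a voiceprint anti-spoofing model to obtain a first result and voiceprint feature information of the audio signal, wherein the first result indicates whether the audio signal is a machine sound;
    if the first result indicates that the audio signal is not a machine sound, performing voiceprint recognition on the audio signal based on the voiceprint feature information of the audio signal and a voiceprint template library.
  11. The method according to any one of claims 1 to 7, wherein performing machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result comprises:
    performing machine-sound recognition on the audio signal;
    if the audio signal is determined to be a machine sound, the recognition result indicates not waking up the electronic device;
    if the audio signal is determined not to be a machine sound, the recognition result indicates waking up the electronic device;
    if it is uncertain whether the audio signal is a machine sound, the recognition result indicates uncertainty about whether to wake up the electronic device.
  12. The method according to any one of claims 1 to 7, wherein performing machine-sound recognition and/or voiceprint recognition on the audio signal to obtain a recognition result comprises:
    performing voiceprint recognition on the audio signal;
    if voiceprint recognition succeeds, the recognition result indicates waking up the electronic device;
    if voiceprint recognition fails, the recognition result indicates uncertainty about whether to wake up the electronic device.
  13. The method according to any one of claims 1 to 12, wherein the method further comprises:
    obtaining response information input by the user according to the prompt information;
    determining, based on the response information, whether to wake up the electronic device.
  14. The method according to claim 13, wherein the prompt information guides the user to perform a preset action within the shooting range of a camera device on the electronic device, and before the obtaining of the response information input by the user according to the prompt information, the method further comprises:
    starting the camera device on the electronic device.
  15. The method according to claim 13, wherein the electronic device stores a voiceprint template library, and the method further comprises:
    if it is determined based on the response information to wake up the electronic device, updating the voiceprint template library based on the audio signal.
  16. An electronic device, comprising a processor, wherein the processor is configured to be coupled to a memory, read instructions in the memory, and, according to the instructions, cause the electronic device to perform the method according to any one of claims 1 to 15.
  17. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions that, when run on a device, cause the device to perform the method according to any one of claims 1 to 15.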
PCT/CN2023/087805 2022-04-18 2023-04-12 Method for waking up a device, electronic device, and storage medium WO2023202442A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210403454.8A CN116959438A (zh) 2022-04-18 2022-04-18 Method for waking up a device, electronic device, and storage medium
CN202210403454.8 2022-04-18

Publications (1)

Publication Number Publication Date
WO2023202442A1 true WO2023202442A1 (zh) 2023-10-26

Family

ID=88419127

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/087805 WO2023202442A1 (zh) 2022-04-18 2023-04-12 Method for waking up a device, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN116959438A (zh)
WO (1) WO2023202442A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117354623A (zh) * 2023-12-04 2024-01-05 深圳市冠旭电子股份有限公司 Photographing control method and apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
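CN1399247A (zh) * 2001-07-19 2003-02-26 三星电子株式会社 Electronic device capable of preventing errors in speech recognition and improving the speech recognition rate
CN102404330A (zh) * 2011-11-30 2012-04-04 上海博泰悦臻电子设备制造有限公司 Method and *** for managing multiple users online simultaneously
CN110459204A (zh) * 2018-05-02 2019-11-15 Oppo广东移动通信有限公司 Speech recognition method and apparatus, storage medium, and electronic device
WO2020019176A1 (zh) * 2018-07-24 2020-01-30 华为技术有限公司 Method for a terminal to update the wake-up voice of a voice assistant, and terminal
CN112700782A (zh) * 2020-12-25 2021-04-23 维沃移动通信有限公司 Speech processing method and electronic device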

Also Published As

Publication number Publication date
CN116959438A (zh) 2023-10-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23791097

Country of ref document: EP

Kind code of ref document: A1