CN114694647A - Response method and device for awakening voice audio, storage medium and electronic device - Google Patents

Info

Publication number
CN114694647A
Authority
CN
China
Prior art keywords
volume value
voice
audio
volume
awakening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210152710.0A
Other languages
Chinese (zh)
Inventor
周斌道
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd and Haier Smart Home Co Ltd
Priority to CN202210152710.0A
Publication of CN114694647A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application discloses a response method and a response device for awakening voice audio, a storage medium and an electronic device, wherein the method comprises the following steps: under the condition that the awakening voice audio is judged to be effective, determining a first volume value of the awakening voice audio; calculating a second volume value corresponding to the first volume value according to the first linear relation, and controlling the voice interaction equipment to play a response voice audio corresponding to the awakening voice audio by the second volume value; wherein the difference value between the first volume value and the second volume value is smaller than a first preset threshold value; by adopting the technical scheme, the problems that in the related technology, the voice interaction equipment cannot automatically adjust the voice interaction volume, cannot meet the actual needs of a user, and causes poor user experience and the like are solved.

Description

Response method and device for awakening voice audio, storage medium and electronic device
Technical Field
The present application relates to the field of communications, and in particular, to a response method and apparatus for waking up a voice audio, a storage medium, and an electronic apparatus.
Background
Along with the popularization of intelligent voice products and the increase of the usage amount, the interactive experience requirements of users on the voice products are continuously improved.
In the related art, a voice interaction device can only use a fixed voice interaction volume that the user must set by voice or by key; it cannot adjust the volume dynamically as the surrounding environment changes. For example, when the ambient noise is loud, the user may not hear the sound emitted by the voice module of the voice interaction device; in the quiet of the night, the voice module may be too loud and disturb other people. After waking up the device, the user may also move around. When the user walks to a place far from the device to do something else, for example to prepare ingredients after opening the menu function, the user may not hear the content played by the voice module, so the user experience is poor.
No effective solution has yet been proposed for the problems in the related art that an intelligent voice product cannot automatically adjust the voice interaction volume, cannot meet the actual needs of the user, and therefore provides a poor user experience.
Disclosure of Invention
The embodiment of the application provides a response method and device for awakening voice audio, a storage medium and an electronic device, and aims to at least solve the problem that in the related technology, voice interaction equipment cannot automatically adjust voice interaction volume, cannot meet actual needs of a user, and is poor in user experience.
According to an embodiment of the present application, there is provided a response method for waking up a voice audio, including: under the condition that the awakening voice audio is judged to be effective, determining a first volume value of the awakening voice audio; calculating a second volume value corresponding to the first volume value according to the first linear relation, and controlling the voice interaction equipment to play a response voice audio corresponding to the awakening voice audio by the second volume value; wherein, the difference value between the first volume value and the second volume value is smaller than a first preset threshold value.
In one exemplary embodiment, determining a first volume value of the wake-up voice audio comprises: determining a target volume value calculation method from a plurality of volume value calculation methods, wherein the plurality of volume value calculation methods include at least one of: calculating the root mean square RMS of all sampling points in the awakening voice audio and calculating the average value AVG of all sampling points in the awakening voice audio; and calculating the energy value of the awakening voice audio by the target volume value calculation method to obtain a first volume value of the awakening voice audio.
In an exemplary embodiment, after determining a second volume value corresponding to the first volume value, the method further comprises: storing the second volume value into an audio database, wherein the audio database is used for storing all volume values of response audio emitted by the voice interaction equipment; periodically traversing all volume values of the response audio stored in the audio database according to a first preset period to determine a third volume value with the highest occurrence frequency in all volume values; determining the third volume value as a default volume value of the voice interaction device.
In an exemplary embodiment, after determining the third volume value as a default volume value of the voice interaction device, the method further comprises: and controlling the voice interaction equipment to respond to the awakening voice audio based on the response voice of the default volume value under the condition that the awakening voice audio received next time is invalid.
In an exemplary embodiment, after controlling the voice interaction device to respond to the wake-up voice audio based on a response voice, the method further includes: acquiring at least two interactive audios sent by a first object to the voice interaction equipment according to a second preset period; determining a distance change of the first object from the voice interaction device according to comparing volume values of the at least two interaction audios; and adjusting the volume value of the current interactive audio of the voice interactive equipment according to the distance change.
In one exemplary embodiment, determining a distance variation of the first object from the voice interaction device according to volume values of the at least two interaction audios includes: comparing the received current interactive audio volume value with the received last interactive audio volume value; determining that the first object is far away from the voice interaction device if the received current interaction audio volume value is smaller than the received last interaction audio volume value; determining that the first object is close to the voice interaction device if the received current interaction audio volume value is greater than the received last interaction audio volume value.
In an exemplary embodiment, adjusting the volume value of the current interaction audio of the voice interaction device according to the distance variation includes: if the distance change indicates that the first object is far away from the voice interaction device, increasing the volume value of the current interaction audio of the voice interaction device to a fourth volume value; and under the condition that the distance change indicates that the first object is close to the voice interaction equipment, reducing the volume value of the current interaction audio of the voice interaction equipment to a fifth volume value, wherein the fourth volume value and the fifth volume value are both calculated through a second linear relation, and the second linear relation is used for indicating the functional relation between the distance change and the volume value.
According to another embodiment of the present application, there is also provided a response device for waking up a voice audio, including: a determining module, configured to determine a first volume value of the wake-up voice audio when the wake-up voice audio is judged to be valid; and a calculating module, configured to calculate a second volume value corresponding to the first volume value according to the first linear relation and to control the voice interaction device to play a response voice audio corresponding to the wake-up voice audio at the second volume value; wherein the difference between the first volume value and the second volume value is smaller than a first preset threshold.
According to another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium in which a computer program is stored, where the computer program is configured to execute the above response method for waking up a voice audio when run.
According to another aspect of the embodiments of the present application, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above response method for waking up a voice audio through the computer program.
In the embodiments of the present application, when the wake-up voice audio is judged to be valid, a first volume value of the wake-up voice audio is determined; a second volume value corresponding to the first volume value is calculated according to the first linear relation, and the voice interaction device is controlled to play a response voice audio corresponding to the wake-up voice audio at the second volume value, wherein the difference between the first volume value and the second volume value is smaller than a first preset threshold. This technical solution solves the problems in the related art that the voice interaction device cannot automatically adjust the voice interaction volume, cannot meet the actual needs of the user, and provides a poor user experience; it meets the user's needs for voice interaction volume in different environments and thereby improves the user experience.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a block diagram of a hardware structure of a voice module of an alternative response method for waking up a voice audio according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating an alternative wake-up voice audio response method according to an embodiment of the present application;
FIG. 3 is a flow chart of an alternative method of responding to wake-up voice audio according to an embodiment of the present application;
FIG. 4 is a flow chart of an alternative method of responding to wake-up voice audio according to an embodiment of the present application;
FIG. 5 is a block diagram of an alternative responding apparatus for waking up voice audio according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The method provided by the embodiment of the application can be executed in a voice interaction device or a similar operation system. Taking the example of operating on the voice interaction device, fig. 1 is a hardware structure block diagram of the voice interaction device of the response method for waking up the voice audio according to the embodiment of the present application. As shown in fig. 1, the voice interaction device may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing system such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and in an exemplary embodiment, may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and is not intended to limit the structure of the voice interaction device. For example, the voice interaction device may also include more or fewer components than shown in FIG. 1, or have a different configuration with equivalent functionality to that shown in FIG. 1 or more functionality than that shown in FIG. 1.
The memory 104 can be used to store a computer program, for example, a software program and a module of an application software, such as a computer program corresponding to the method for processing the wake-up voice audio in the embodiment of the present application, and the processor 102 executes the computer program stored in the memory 104 to execute various functional applications and data processing, i.e., to implement the method described above. The memory 104 may include high speed random access memory and may also include non-volatile memory, such as one or more magnetic storage systems, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the voice interaction device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the voice interaction device. In one example, the transmission device 106 includes a network interface controller (NIC) that can connect to other network devices through a base station so as to communicate with the Internet.
In this embodiment, a response method for waking up a voice audio is provided, and is applied to the above-mentioned voice interaction device, and fig. 2 is a flowchart of an optional response method for waking up a voice audio according to an embodiment of the present application, where the flowchart includes the following steps:
step S202, under the condition that the awakening voice audio is judged to be effective, determining a first volume value of the awakening voice audio;
step S204, calculating a second volume value corresponding to the first volume value according to the first linear relation, and controlling the voice interaction equipment to play a response voice audio corresponding to the awakening voice audio by the second volume value; wherein, the difference value between the first volume value and the second volume value is smaller than a first preset threshold value.
Through the above steps, when the wake-up voice audio is judged to be valid, a first volume value of the wake-up voice audio is determined; a second volume value corresponding to the first volume value is calculated according to the first linear relation, and the voice interaction device is controlled to play a response voice audio corresponding to the wake-up voice audio at the second volume value, wherein the difference between the first volume value and the second volume value is smaller than a first preset threshold. This technical solution solves the problems in the related art that the voice interaction device cannot automatically adjust the voice interaction volume, cannot meet the actual needs of the user, and provides a poor user experience; it meets the user's needs for voice interaction volume in different environments and thereby improves the user experience.
It should be noted that determining that the wake-up voice audio is valid may be understood as determining that the wake-up was not falsely triggered by the user, or as determining that the volume value of the wake-up voice audio is a valid value; this is not limited in this application.
It should be noted that, if the difference between the first volume value and the second volume value is smaller than the first preset threshold, it can be understood that the voice interaction device responds to the wake-up voice audio with a volume similar to the volume of the wake-up voice audio of the user.
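The mapping in step S204 can be illustrated with a minimal Python sketch. The application does not give the coefficients of the first linear relation or the value of the first preset threshold, so `slope`, `offset`, and `max_gap` below are assumed values, and the clamping step is only one possible way of keeping the difference under the threshold.

```python
import math


def response_volume(first_volume: float,
                    slope: float = 1.0,      # assumed coefficient of the linear relation
                    offset: float = 2.0,     # assumed intercept of the linear relation
                    max_gap: float = 5.0) -> float:  # assumed first preset threshold
    """Return a second volume value derived linearly from the wake-up volume."""
    second_volume = slope * first_volume + offset
    gap = second_volume - first_volume
    # Assumption: if the linear result drifts too far from the wake-up volume,
    # pull it back so that |first - second| stays strictly below max_gap.
    if abs(gap) >= max_gap:
        second_volume = first_volume + math.copysign(max_gap * 0.99, gap)
    return second_volume
```

With these assumed defaults, a wake-up volume of 60 gives a response volume of 62, which stays close to the volume the user spoke at.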
For better understanding of the technical solutions of the alternative embodiments of the present invention, the terms involved in the embodiments of the present invention are defined as follows:
Lombard effect: in a strongly noisy environment, a speaker involuntarily changes how he or she speaks, for example by raising the pitch and intensity of the voice and lengthening the utterance, so that the listener can hear the speech.
It should be noted that, based on the Lombard effect, when the surrounding noise is loud, the user naturally raises the volume of the wake-up voice audio, and the voice module raises its volume accordingly; in a quiet environment the user speaks more softly, and the voice module lowers its volume accordingly.
In one exemplary embodiment, determining a first volume value of the wake-up voice audio comprises: determining a target volume value calculation method from a plurality of volume value calculation methods, wherein the plurality of volume value calculation methods include at least one of: calculating Root Mean Square (RMS) of all sampling points in the awakening voice audio and calculating Average Value (AVG) of all sampling points in the awakening voice audio; and calculating the energy value of the awakening voice audio by the target volume value calculation method to obtain a first volume value of the awakening voice audio.
In order to calculate the volume value of the wake-up voice audio more accurately so as to calculate the volume of the response voice required by the current user according to the volume of the user, the following target volume value calculation methods are proposed in the application: calculating decibel values of user sounds by a method of calculating Root Mean Square (RMS) energy values of all sampling points of the awakening voice audio; or calculating the decibel value of the voice of the user by calculating the Average Value (AVG) of all sampling points in the awakening voice audio. When the fluctuation range of the sampling point data in the obtained awakening voice audio exceeds a preset threshold value, a method of calculating an Average Value (AVG) is adopted; and when the fluctuation range of the sampling point data does not exceed the preset threshold, calculating the energy value of the awakening voice audio by adopting a method of calculating Root Mean Square (RMS) to obtain a first volume value of the awakening voice audio.
It should be noted that, the above method for calculating the target volume value may further include a method for calculating peaks of all sampling points of the wake-up voice audio, which is not limited in this application.
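The selection between the RMS and AVG methods can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the samples are assumed to be signed 16-bit PCM values, the "fluctuation range" is taken to be the difference between the largest and smallest sample, the average is taken over absolute values (a plain mean of signed audio samples would be close to zero), and the decibel value is expressed relative to full scale. The threshold value is hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root mean square of all sampling points."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def avg_abs(samples: Sequence[float]) -> float:
    """Average of the absolute values of all sampling points (assumed form of AVG)."""
    return sum(abs(s) for s in samples) / len(samples)


def first_volume_db(samples: Sequence[float],
                    fluctuation_threshold: float = 20000.0,  # hypothetical preset threshold
                    full_scale: float = 32768.0) -> float:   # assumes 16-bit PCM
    """Pick AVG or RMS by fluctuation range and return a level in dB re full scale."""
    if not samples:
        raise ValueError("wake-up audio contains no samples")
    fluctuation = max(samples) - min(samples)
    # Large fluctuation: use AVG; otherwise use RMS, as described in the text.
    level = avg_abs(samples) if fluctuation > fluctuation_threshold else rms(samples)
    if level == 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(level / full_scale)
```

A peak-based method, which the text also allows, would replace `level` with `max(abs(s) for s in samples)`.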
Based on the above process, after determining the second volume value corresponding to the first volume value, the method further includes: storing the second volume value into an audio database, wherein the audio database is used for storing all volume values of response audio sent by the voice interaction equipment; periodically traversing all volume values of the response audio stored in the audio database according to a first preset period to determine a third volume value with the highest occurrence frequency in all volume values; determining the third volume value as a default volume value for the voice interaction device.
To give the user a better experience, a default volume value is set for the voice interaction device. A fixed default volume value, however, cannot meet the user's needs most of the time. Therefore, the volume value of the response audio for each wake-up voice audio is stored in the audio database, and the stored volume values are traversed periodically according to a preset period to find the volume value that occurs most often. This value is taken as the third volume value, and the default volume value of the voice interaction device is updated to it, so that the default volume value stays at the value that best fits the user's needs.
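The default-volume update amounts to finding the most frequent stored value. A minimal sketch, assuming that volume values are discrete levels and that the audio database can be modelled as an in-memory list; the class name `VolumeHistory` and its methods are hypothetical, and the periodic trigger (for example a timer calling `refresh_default` once per first preset period) is left out.

```python
from collections import Counter
from typing import List


class VolumeHistory:
    """In-memory stand-in for the audio database of response volume values."""

    def __init__(self, default_volume: int) -> None:
        self.default_volume = default_volume
        self._volumes: List[int] = []

    def record(self, second_volume: int) -> None:
        """Store the volume value used for one response audio."""
        self._volumes.append(second_volume)

    def refresh_default(self) -> int:
        """Set the default to the most frequent stored value (the third volume value)."""
        if self._volumes:
            self.default_volume = Counter(self._volumes).most_common(1)[0][0]
        return self.default_volume
```

If no response has been recorded yet, the sketch keeps the existing default rather than guessing a value.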
Further, after determining the third volume value as a default volume value for the voice interaction device, the method further comprises: and controlling the voice interaction equipment to respond to the awakening voice audio based on the response voice of the default volume value under the condition that the awakening voice audio received next time is invalid.
When the received wake-up voice audio is judged to be invalid, that is, when the device has been woken up by mistake, the voice interaction device still needs to give a voice response, but the volume of the response voice cannot be adjusted according to the volume value of the wake-up voice audio. The voice interaction device therefore responds to the wake-up voice audio at the default volume value that has been updated based on the user's needs.
Based on the above process, after controlling the voice interaction device to respond to the wake-up voice audio based on the response voice, the method further includes: acquiring at least two interactive audios sent by a first object to the voice interaction equipment according to a second preset period; determining a distance change of the first object from the voice interaction device according to comparing volume values of the at least two interaction audios; and adjusting the volume value of the current interactive audio of the voice interactive equipment according to the distance change.
In practice, a user may interact with the voice interaction device while doing other things, so the user's position may change continuously. To keep the interaction volume from being too low or too high as the user moves away from or toward the device, the voice interaction device collects at least two interactive audios sent by the user according to a second preset period, determines the change in the user's distance from the device by comparing the volume values of the at least two interactive audios, and adjusts the volume value of its current interactive audio according to that distance change.
It should be noted that, in the interaction process, if the difference between the user volume value and the wake-up volume value exceeds a certain range, the playing volume of the interactive audio of the voice device is reset.
Further, determining a distance change of the first object from the voice interaction device according to volume values of the at least two interaction audios comprises: comparing the received current interactive audio volume value with the received last interactive audio volume value; determining that the first object is far away from the voice interaction device if the received current interaction audio volume value is smaller than the received last interaction audio volume value; determining that the first object is close to the voice interaction device if the received current interaction audio volume value is greater than the received last interaction audio volume value.
For example, a user who is using the menu function may be preparing ingredients in the kitchen, far from the voice interaction device. If the voice interaction device keeps interacting at the volume value calculated from the wake-up voice audio, the user may not hear the content played by the voice module, which makes for a poor experience. The voice interaction device can therefore obtain the volume value of the interactive audio that the user sends from the kitchen and compare it with the volume value of the wake-up voice audio: if the current volume value is smaller than that of the wake-up voice audio, the user is judged to be far away from the voice interaction device; if it is larger, the user is judged to be close to the voice interaction device. In the kitchen case, the voice interaction device automatically increases the interaction volume so that the user can clearly hear its response voice.
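The comparison described above can be sketched in a few lines of Python. The enum and function names are hypothetical, and the `tolerance` parameter is an added assumption for ignoring small fluctuations; with its default of zero the function follows the comparison in the text exactly. The reference volume may be either the previous interactive audio or the wake-up voice audio, as in the kitchen example.

```python
from enum import Enum


class DistanceChange(Enum):
    AWAY = "away"
    CLOSER = "closer"
    UNCHANGED = "unchanged"


def distance_change(reference_volume: float,
                    current_volume: float,
                    tolerance: float = 0.0) -> DistanceChange:  # tolerance is an assumption
    """Infer whether the user moved away or closer from two received volume values."""
    if current_volume < reference_volume - tolerance:
        return DistanceChange.AWAY      # user sounds quieter, so is judged farther away
    if current_volume > reference_volume + tolerance:
        return DistanceChange.CLOSER    # user sounds louder, so is judged closer
    return DistanceChange.UNCHANGED
```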
Based on the above process, adjusting the volume value of the current interactive audio of the voice interaction device according to the distance change includes: if the distance change indicates that the first object is far away from the voice interaction device, increasing the volume value of the current interaction audio of the voice interaction device to a fourth volume value; and under the condition that the distance change indicates that the first object is close to the voice interaction equipment, reducing the volume value of the current interaction audio of the voice interaction equipment to a fifth volume value, wherein the fourth volume value and the fifth volume value are both calculated through a second linear relation, and the second linear relation is used for indicating the functional relation between the distance change and the volume value.
After judging that the user is far away from or close to the voice interaction device, automatically and correspondingly adjusting the volume value of the voice interaction device, and when judging that the user is far away from the voice interaction device, increasing the volume value of the current interaction audio of the voice interaction device to a fourth volume value; and when the user is judged to be close to the voice interaction equipment, reducing the volume value of the current interaction audio of the voice interaction equipment to a fifth volume value, and calculating a fourth volume value and the fifth volume value by a second linear relation, wherein the second linear relation is used for indicating the functional relation between the distance change and the volume value.
It should be noted that, if the user is far away from the voice module in the interaction process, the voice module will detect that the volume of the user is relatively reduced, and then will automatically increase the volume; on the contrary, if the user is close to the voice module, the voice module detects that the volume of the user is relatively increased, and then the volume is automatically reduced, so that the user can hear the sound of the voice module in different environments.
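One possible form of the second linear relation is sketched below. The application only states that the relation is linear in the distance change, so this sketch makes several assumptions: the drop in the user's received volume is used as a proxy for the distance change, `gain` is a hypothetical coefficient, and the result is clamped to an assumed device range of 0 to 100.

```python
def adjusted_volume(device_volume: float,
                    previous_user_volume: float,
                    current_user_volume: float,
                    gain: float = 0.5,           # assumed slope of the second linear relation
                    min_volume: float = 0.0,     # assumed device range
                    max_volume: float = 100.0) -> float:
    """Raise the device volume when the user sounds quieter, lower it when louder."""
    # A positive drop means the user is judged to have moved away from the device.
    drop = previous_user_volume - current_user_volume
    new_volume = device_volume + gain * drop
    return max(min_volume, min(max_volume, new_volume))
```

In this sketch a positive `drop` yields the raised (fourth) volume value and a negative `drop` yields the lowered (fifth) volume value.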
Fig. 3 is a flowchart of an optional response method for wake-up voice audio according to an embodiment of the present application. As shown in Fig. 3, the method includes the following steps:
Step S302: the voice interaction device receives the user's wake-up voice;
Step S304: determine whether the wake-up voice is a valid wake-up, to guard against accidental wake-ups by the user; if it is a valid wake-up, execute step S306; if not, execute step S318;
Step S306: calculate the energy of the wake-up voice audio;
Step S308: save the calculated energy value;
Step S310: determine whether the audio energy value is valid, that is, neither too small nor too large; if so, execute step S312; if not, execute step S318;
Step S312: calculate the volume value that the voice interaction device should set for this audio energy value;
Step S314: set the volume of the voice module of the voice interaction device to the value calculated in step S312;
Step S316: play the reply at the set volume value;
Step S318: set the volume of the voice module of the voice interaction device to the default value, then execute step S316.
Through the above steps, when the voice interaction device receives the user's wake-up voice, it first determines whether the wake-up is valid, which guards against false wake-ups. If the wake-up is not valid, the interaction volume is set to the default value and the reply is played at that volume. If the wake-up is valid, the energy value of the wake-up voice audio is calculated and stored, and the device checks whether it is a valid energy value, that is, one that is neither too large nor too small and therefore lies within the normal range. If the energy value is not valid, the volume is set to the default value and the reply is played at the default volume; if it is valid, the device calculates the volume value corresponding to the energy value, sets its volume to the calculated value, and plays the reply at that volume. This technical solution addresses the problems in the related art that a voice interaction device cannot automatically adjust its interaction volume, cannot meet the user's actual needs, and gives a poor user experience; it meets the user's requirements for interaction volume in different environments and improves the user experience.
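The control flow of steps S302 to S318 can be sketched as follows. This is only an illustrative sketch, not the implementation described in the application: the default volume, the valid energy range, the mean-square energy measure, the energy-to-volume mapping, and all function names are assumptions introduced here.

```python
from typing import Callable, Sequence

# Hypothetical constants; the application does not give concrete values.
DEFAULT_VOLUME = 50                  # default playback volume on a 0-100 scale
MIN_ENERGY, MAX_ENERGY = 1e-4, 1.0   # assumed "normal range" for the energy value


def wake_energy(samples: Sequence[float]) -> float:
    """S306: mean squared amplitude, one possible energy measure (assumption)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def energy_to_volume(energy: float) -> int:
    """S312: hypothetical linear map from energy to a 0-100 volume setting."""
    return round(100 * energy / MAX_ENERGY)


def respond_to_wake(samples: Sequence[float],
                    is_valid_wake: Callable[[Sequence[float]], bool]) -> int:
    """Return the volume at which the reply would be played (S316)."""
    if not is_valid_wake(samples):                  # S304 fails -> S318
        return DEFAULT_VOLUME
    energy = wake_energy(samples)                   # S306 (S308 would store it)
    if not (MIN_ENERGY <= energy <= MAX_ENERGY):    # S310 fails -> S318
        return DEFAULT_VOLUME
    return energy_to_volume(energy)                 # S312, S314
```

Both failure branches fall back to the same default volume, which mirrors how steps S304 and S310 each lead to step S318 in the flow above.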
This embodiment provides another optional response method for wake-up voice audio. Fig. 4 is a flowchart of this method according to an embodiment of the present application. As shown in Fig. 4, the method includes the following steps:
Step S402: the voice interaction device starts voice interaction with the user;
Step S404: calculate the energy value of the interaction audio;
Step S406: determine whether the energy value exceeds the fluctuation range; if not, execute step S416; if so, execute step S408;
Step S408: determine whether the energy value of the interaction audio is greater than the energy value of the wake-up voice audio; if so, execute step S410; if not, execute step S412;
Step S410: turn down the interaction volume value of the voice interaction device accordingly;
Step S412: turn up the interaction volume value of the voice interaction device accordingly;
Step S414: set the interaction volume of the voice interaction device to the obtained volume value;
Step S416: play the interaction audio at the set interaction volume.
In this embodiment, the voice interaction device captures changes in the energy of the user's interaction audio during the interaction. It first determines whether the energy value exceeds the fluctuation range; if so, it compares the interaction audio energy with the wake-up voice audio energy to infer how the distance between the user and the voice interaction device has changed, and adjusts its interaction volume according to that change, giving the user a better experience. This technical solution addresses the problems in the related art that a voice interaction device cannot automatically adjust its interaction volume, cannot meet the user's actual needs, and gives a poor user experience; it meets the user's requirements for interaction volume in different environments and improves the user experience.
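Steps S406 to S414 can be illustrated with the short sketch below. It rests on several assumptions that the application does not state: the fluctuation range is modelled as a relative tolerance around the wake-up energy, the volume changes by a fixed step, and the result is clamped to a 0-100 scale. The function and parameter names are hypothetical.

```python
def adjust_interaction_volume(current_volume: int,
                              interaction_energy: float,
                              wake_energy: float,
                              tolerance: float = 0.2,
                              step: int = 10) -> int:
    """Return the interaction volume to set before playback (S416)."""
    # S406: energy still inside the assumed fluctuation range -> keep the volume.
    if abs(interaction_energy - wake_energy) <= tolerance * wake_energy:
        return current_volume
    if interaction_energy > wake_energy:
        # S408 -> S410: the user sounds louder (likely closer), so turn down.
        new_volume = current_volume - step
    else:
        # S408 -> S412: the user sounds quieter (likely farther), so turn up.
        new_volume = current_volume + step
    # S414: clamp to the assumed 0-100 volume scale.
    return max(0, min(100, new_volume))
```

For example, with a wake-up energy of 0.25, an interaction energy of 0.1 raises a volume of 50 to 60, while an interaction energy of 0.5 lowers it to 40.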
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although the former is the better implementation in many cases. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product that is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for causing a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods of the embodiments of the present application.
Fig. 5 is a block diagram of a response device for wake-up voice audio according to an embodiment of the present application. As shown in Fig. 5, the device includes:
a determining module 52, configured to determine a first volume value of a wake-up voice audio if the wake-up voice audio is determined to be a valid wake-up voice audio;
a calculating module 54, configured to calculate a second volume value corresponding to the first volume value according to a first linear relation, and to control the voice interaction device to play a response voice audio corresponding to the wake-up voice audio at the second volume value, wherein the difference between the first volume value and the second volume value is smaller than a first preset threshold.
With this device, when the wake-up voice audio is determined to be a valid wake-up voice audio, a first volume value of the wake-up voice audio is determined; a second volume value corresponding to the first volume value is calculated according to the first linear relation, and the voice interaction device is controlled to play the response voice audio corresponding to the wake-up voice audio at the second volume value, wherein the difference between the first volume value and the second volume value is smaller than a first preset threshold. This technical solution addresses the problems in the related art that a voice interaction device cannot automatically adjust its interaction volume, cannot meet the user's actual needs, and gives a poor user experience; it meets the user's requirements for interaction volume in different environments and improves the user experience.
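As a rough illustration of the calculating module, the sketch below maps a first volume value to a second volume value through a linear function and checks the threshold condition. The coefficients, the threshold, and the choice to raise an error when the condition is violated are all assumptions; this passage does not specify the form of the first linear relation.

```python
def second_volume(first_volume: float,
                  slope: float = 1.0,
                  offset: float = 2.0,
                  max_difference: float = 5.0) -> float:
    """Hypothetical first linear relation: second = slope * first + offset."""
    second = slope * first_volume + offset
    # The response volume must stay within the first preset threshold
    # of the user's own volume; reject coefficients that break this.
    if abs(second - first_volume) >= max_difference:
        raise ValueError("difference between volume values exceeds the threshold")
    return second
```

With the assumed defaults, a first volume value of 60.0 yields a second volume value of 62.0, which differs from the first by less than the threshold of 5.0.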
In an exemplary embodiment, the determining module is further configured to determine the first volume value of the wake-up voice audio as follows: determine a target volume value calculation method from a plurality of volume value calculation methods, wherein the plurality of volume value calculation methods include at least one of: calculating the root mean square (RMS) of all sampling points in the wake-up voice audio, and calculating the average value (AVG) of all sampling points in the wake-up voice audio; and calculate the energy value of the wake-up voice audio by the target volume value calculation method to obtain the first volume value of the wake-up voice audio.
To calculate the volume value of the wake-up voice audio more accurately, so that the response volume the current user needs can be derived from the user's own volume, the present application proposes the following target volume value calculation methods: calculating the decibel value of the user's voice from the root mean square (RMS) energy value of all sampling points of the wake-up voice audio, or calculating the decibel value of the user's voice from the average value (AVG) of all sampling points in the wake-up voice audio. When the fluctuation range of the sampling point data in the acquired wake-up voice audio exceeds a preset threshold, the average value (AVG) method is used; when the fluctuation range of the sampling point data does not exceed the preset threshold, the root mean square (RMS) method is used to calculate the energy value of the wake-up voice audio and obtain the first volume value of the wake-up voice audio.
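The selection between the two calculation methods can be sketched as follows. Several details are assumptions made for illustration: the fluctuation range is taken as the difference between the largest and smallest sample, AVG is taken as the mean of absolute sample values (a plain mean of signed audio samples would sit near zero), and the decibel conversion uses a full-scale reference of 1.0.

```python
import math
from typing import Sequence, Tuple


def rms(samples: Sequence[float]) -> float:
    """Root mean square of all sampling points."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def avg(samples: Sequence[float]) -> float:
    """Average of all sampling points (assumption: of their absolute values)."""
    return sum(abs(s) for s in samples) / len(samples)


def first_volume(samples: Sequence[float],
                 fluctuation_threshold: float) -> Tuple[str, float]:
    """Pick AVG when the samples fluctuate beyond the threshold, else RMS."""
    if not samples:
        raise ValueError("wake-up audio contains no samples")
    fluctuation = max(samples) - min(samples)   # assumed fluctuation measure
    if fluctuation > fluctuation_threshold:
        return "AVG", avg(samples)
    return "RMS", rms(samples)


def to_decibels(level: float, reference: float = 1.0) -> float:
    """Convert a linear level to decibels relative to an assumed reference."""
    if level <= 0:
        return float("-inf")
    return 20.0 * math.log10(level / reference)
```

For the samples `[1.0, 0.0, 0.0, 0.0]`, a threshold of 0.5 selects AVG and gives 0.25, while a threshold of 2.0 selects RMS and gives 0.5.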
Based on the above process, the determining module is further configured to: store the second volume value in an audio database, where the audio database is configured to store the volume values of all response audio played by the voice interaction device; periodically traverse all the volume values of the response audio stored in the audio database according to a first preset period, to determine a third volume value that occurs most frequently among all the volume values; and determine the third volume value as the default volume value of the voice interaction device.
To give users a better experience, a default volume value is set for the voice interaction device. If the default volume value were fixed, however, it would fail to meet users' needs much of the time. Therefore, the volume value of the response audio for each wake-up voice audio is stored in the audio database, the stored volume values are traversed periodically according to a preset period to find the volume value that occurs most frequently, this value is taken as the third volume value, and the default volume value of the voice interaction device is updated to the third volume value, so that the default volume value always remains the one best suited to the user's needs.
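Finding the most frequent stored volume value amounts to computing the mode of the response volume history. The sketch below uses an in-memory list in place of the audio database, which is an assumption made for brevity; the function name and the fallback parameter are hypothetical.

```python
from collections import Counter
from typing import Sequence


def most_frequent_volume(history: Sequence[int], fallback: int) -> int:
    """Return the volume value that occurs most often in the stored history.

    `history` stands in for the audio database; `fallback` is the current
    default volume, kept when no response volume has been stored yet.
    """
    if not history:
        return fallback
    # most_common(1) returns a list holding the single (value, count) pair
    # with the highest count.
    return Counter(history).most_common(1)[0][0]
```

A scheduler running once per first preset period would call this function and write the result back as the new default volume value.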
In addition, the determining module is further configured to control the voice interaction device to respond to the wake-up voice audio with a response voice at the default volume value when the next received wake-up voice audio is an invalid wake-up voice audio.
When the received wake-up voice audio is judged to be an invalid wake-up voice audio, that is, in the case of a false wake-up, the voice interaction device still needs to give the user a voice response, but the volume value of the response voice cannot be adjusted according to the volume value of the wake-up voice audio. The voice interaction device therefore updates the default volume value based on the user's needs and responds to the wake-up voice audio with a response voice at the updated default volume value.
Based on the above process, the determining module is further configured to: collect, according to a second preset period, at least two interaction audios sent by a first object to the voice interaction device; determine the distance change of the first object relative to the voice interaction device by comparing the volume values of the at least two interaction audios; and adjust the volume value of the current interaction audio of the voice interaction device according to the distance change.
In practice, a user may interact with the voice interaction device while doing other things, so the user's position may change constantly. To keep an interaction volume that is too low or too high from spoiling the experience when the user moves away from or toward the device, the voice interaction device collects, according to a second preset period, at least two interaction audios sent by the user, determines the user's distance change relative to the device by comparing the volume values of these interaction audios, and adjusts the volume value of its current interaction audio according to the obtained distance change.
Further, the determining module is further configured to: compare the volume value of the currently received interaction audio with the volume value of the previously received interaction audio; determine that the first object has moved away from the voice interaction device if the current volume value is smaller than the previous volume value; and determine that the first object has moved closer to the voice interaction device if the current volume value is greater than the previous volume value.
For example, when using the recipe function, the user may be preparing ingredients in the kitchen and therefore be far from the voice interaction device. If the device interacted with the user at the volume value calculated from the wake-up voice audio, the user might not hear what the voice module plays, which would make for a poor experience. The voice interaction device can therefore obtain the volume value of the interaction audio the user sends from the kitchen and compare it with the volume value of the wake-up voice audio: if the current volume value is smaller, the user is judged to have moved away from the device; if it is greater, the user is judged to have moved closer. In the kitchen case the user has moved away, so the voice interaction device automatically increases the interaction volume and the user can clearly hear the device's response voice in the kitchen.
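The comparison described above reduces to a three-way classification of two volume values. The sketch below is a minimal illustration; the enum and function names are hypothetical, and treating equal volume values as "unchanged" is an assumption, since the text only covers the smaller and greater cases.

```python
from enum import Enum


class DistanceChange(Enum):
    FARTHER = "farther"
    CLOSER = "closer"
    UNCHANGED = "unchanged"   # assumption: the text does not cover equal values


def distance_change(previous_volume: float, current_volume: float) -> DistanceChange:
    """Infer the user's movement from two received interaction volume values."""
    if current_volume < previous_volume:
        return DistanceChange.FARTHER    # user sounds quieter than before
    if current_volume > previous_volume:
        return DistanceChange.CLOSER     # user sounds louder than before
    return DistanceChange.UNCHANGED
```

In the kitchen example, the reference value passed as `previous_volume` would be the volume value of the wake-up voice audio.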
Based on the above process, the determining module is further configured to: increase the volume value of the current interaction audio of the voice interaction device to a fourth volume value if the distance change indicates that the first object has moved away from the voice interaction device; and reduce the volume value of the current interaction audio of the voice interaction device to a fifth volume value if the distance change indicates that the first object has moved closer to the voice interaction device, wherein the fourth volume value and the fifth volume value are both calculated through a second linear relation, and the second linear relation indicates the functional relation between the distance change and the volume value.
After determining whether the user has moved away from or toward the voice interaction device, the device adjusts its volume value automatically. When the user is judged to have moved away, the volume value of the current interaction audio of the voice interaction device is increased to the fourth volume value; when the user is judged to have moved closer, it is reduced to the fifth volume value. Both values are calculated through the second linear relation, which indicates the functional relation between the distance change and the volume value.
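One way to picture the second linear relation is sketched below. Because the application does not give its coefficients or how distance is measured, this sketch makes loud assumptions: the drop in the user's received level stands in for the distance change, the device volume moves in proportion to that drop through a hypothetical `gain`, and the result is clamped to a 0-100 scale.

```python
def adjusted_volume(device_volume: float,
                    previous_user_level: float,
                    current_user_level: float,
                    gain: float = 2.0) -> float:
    """Hypothetical second linear relation between distance change and volume.

    A positive level drop (user sounds quieter, so likely farther away)
    raises the device volume toward the "fourth volume value"; a negative
    drop lowers it toward the "fifth volume value".
    """
    level_drop = previous_user_level - current_user_level  # proxy for distance change
    new_volume = device_volume + gain * level_drop
    return max(0.0, min(100.0, new_volume))                # assumed 0-100 scale
```

With the assumed gain of 2.0, a device volume of 50 becomes 62 when the user's level falls from 60 to 54, and 38 when it rises from 54 to 60.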
Embodiments of the present application also provide a storage medium including a stored program, where the program performs any one of the methods described above when executed.
Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1, determining a first volume value of the wake-up voice audio if the wake-up voice audio is determined to be a valid wake-up voice audio;
S2, calculating a second volume value corresponding to the first volume value according to the first linear relation, and controlling the voice interaction device to play a response voice audio corresponding to the wake-up voice audio at the second volume value, wherein the difference between the first volume value and the second volume value is smaller than a first preset threshold.
Embodiments of the present application further provide an electronic device, comprising a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of a computer program:
S1, determining a first volume value of the wake-up voice audio if the wake-up voice audio is determined to be a valid wake-up voice audio;
S2, calculating a second volume value corresponding to the first volume value according to the first linear relation, and controlling the voice interaction device to play a response voice audio corresponding to the wake-up voice audio at the second volume value, wherein the difference between the first volume value and the second volume value is smaller than a first preset threshold.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented with general-purpose computing devices, either centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be executed in an order different from the one given here. They may also be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A response method for wake-up voice audio, comprising:
determining a first volume value of the wake-up voice audio if the wake-up voice audio is determined to be a valid wake-up voice audio;
calculating a second volume value corresponding to the first volume value according to a first linear relation, and controlling the voice interaction device to play a response voice audio corresponding to the wake-up voice audio at the second volume value, wherein the difference between the first volume value and the second volume value is smaller than a first preset threshold.
2. The response method for wake-up voice audio according to claim 1, wherein determining the first volume value of the wake-up voice audio comprises:
determining a target volume value calculation method from a plurality of volume value calculation methods, wherein the plurality of volume value calculation methods include at least one of: calculating the root mean square (RMS) of all sampling points in the wake-up voice audio, and calculating the average value (AVG) of all sampling points in the wake-up voice audio;
calculating the energy value of the wake-up voice audio by the target volume value calculation method to obtain the first volume value of the wake-up voice audio.
3. The response method for wake-up voice audio according to claim 1, wherein after determining the second volume value corresponding to the first volume value, the method further comprises:
storing the second volume value in an audio database, wherein the audio database is used for storing the volume values of all response audio played by the voice interaction device;
periodically traversing all the volume values of the response audio stored in the audio database according to a first preset period, to determine a third volume value that occurs most frequently among all the volume values;
determining the third volume value as a default volume value of the voice interaction device.
4. The response method for wake-up voice audio according to claim 3, wherein after determining the third volume value as the default volume value of the voice interaction device, the method further comprises:
controlling the voice interaction device to respond to the wake-up voice audio with a response voice at the default volume value if the next received wake-up voice audio is an invalid wake-up voice audio.
5. The response method for wake-up voice audio according to claim 1, wherein after controlling the voice interaction device to respond to the wake-up voice audio with a response voice, the method further comprises:
collecting, according to a second preset period, at least two interaction audios sent by a first object to the voice interaction device;
determining a distance change of the first object relative to the voice interaction device by comparing the volume values of the at least two interaction audios;
adjusting the volume value of the current interaction audio of the voice interaction device according to the distance change.
6. The response method for wake-up voice audio according to claim 5, wherein determining the distance change of the first object relative to the voice interaction device by comparing the volume values of the at least two interaction audios comprises:
comparing the volume value of the currently received interaction audio with the volume value of the previously received interaction audio;
determining that the first object has moved away from the voice interaction device if the volume value of the currently received interaction audio is smaller than that of the previously received interaction audio;
determining that the first object has moved closer to the voice interaction device if the volume value of the currently received interaction audio is greater than that of the previously received interaction audio.
7. The response method for wake-up voice audio according to claim 5, wherein adjusting the volume value of the current interaction audio of the voice interaction device according to the distance change comprises:
increasing the volume value of the current interaction audio of the voice interaction device to a fourth volume value if the distance change indicates that the first object has moved away from the voice interaction device;
reducing the volume value of the current interaction audio of the voice interaction device to a fifth volume value if the distance change indicates that the first object has moved closer to the voice interaction device, wherein the fourth volume value and the fifth volume value are both calculated through a second linear relation, and the second linear relation indicates the functional relation between the distance change and the volume value.
8. A response device for wake-up voice audio, comprising:
a determining module, configured to determine a first volume value of the wake-up voice audio if the wake-up voice audio is determined to be a valid wake-up voice audio;
a calculating module, configured to calculate a second volume value corresponding to the first volume value according to a first linear relation, and to control the voice interaction device to play a response voice audio corresponding to the wake-up voice audio at the second volume value, wherein the difference between the first volume value and the second volume value is smaller than a first preset threshold.
9. A computer-readable storage medium, comprising a stored program, wherein the program, when run, performs the method of any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the method of any one of claims 1 to 7 by means of the computer program.
CN202210152710.0A 2022-02-18 2022-02-18 Response method and device for awakening voice audio, storage medium and electronic device Pending CN114694647A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210152710.0A CN114694647A (en) 2022-02-18 2022-02-18 Response method and device for awakening voice audio, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210152710.0A CN114694647A (en) 2022-02-18 2022-02-18 Response method and device for awakening voice audio, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN114694647A true CN114694647A (en) 2022-07-01

Family

ID=82136682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210152710.0A Pending CN114694647A (en) 2022-02-18 2022-02-18 Response method and device for awakening voice audio, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN114694647A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108091330A (en) * 2017-12-13 2018-05-29 北京小米移动软件有限公司 Output sound intensity adjusting method, device, electronic equipment and storage medium
CN108200486A (en) * 2017-12-25 2018-06-22 出门问问信息科技有限公司 The dynamic adjusting method and device of a kind of volume
CN109361969A (en) * 2018-10-29 2019-02-19 歌尔科技有限公司 A kind of audio frequency apparatus and its volume adjusting method, device, equipment, medium
CN110806849A (en) * 2019-10-30 2020-02-18 歌尔科技有限公司 Intelligent device, volume adjusting method thereof and computer-readable storage medium
CN111090412A (en) * 2019-12-18 2020-05-01 北京声智科技有限公司 Volume adjusting method and device and audio equipment
US20210366262A1 (en) * 2018-01-23 2021-11-25 Sony Corporation Reminder method and apparatus and electronic device

Similar Documents

Publication Publication Date Title
CN108847219B (en) Awakening word preset confidence threshold adjusting method and system
CN106126167B (en) A kind of sound effect treatment method and terminal device
CN110782891B (en) Audio processing method and device, computing equipment and storage medium
CN109089156B (en) Sound effect adjusting method and device and terminal
CN107155133B (en) Volume adjusting method, audio playing terminal and computer readable storage medium
CN108335700B (en) Voice adjusting method and device, voice interaction equipment and storage medium
CN109408663A (en) The playback method and device for music of sleeping
CN107908388A (en) Method for controlling volume and device, computer installation and computer-readable recording medium
CN110806849A (en) Intelligent device, volume adjusting method thereof and computer-readable storage medium
CN101795323A (en) Electronic alarm operation method, electronic alarm and mobile communication terminal
CN111968644A (en) Intelligent device awakening method and device and electronic device
CN110677774B (en) Volume self-adaptive adjusting method and device, computer equipment and storage medium
CN110767225B (en) Voice interaction method, device and system
CN113531844B (en) Control method and system for noise reduction of air conditioner, electronic equipment and storage medium
CN108932947B (en) Voice control method and household appliance
CN112837686A (en) Wake-up response operation execution method and device, storage medium and electronic device
CN108597520A (en) A kind of control method of Intelligent socket and Intelligent socket
CN109150675A (en) A kind of exchange method and device of household electrical appliance
CN107526569B (en) A kind of volume adjusting method, device, storage medium and mobile terminal
CN106375809B (en) Volume adjusting method and device and storage medium
CN108810614A (en) Method for regulation of sound volume, system and readable storage medium storing program for executing
CN108234738A (en) volume adjusting method, device, terminal and storage medium
CN114694647A (en) Response method and device for awakening voice audio, storage medium and electronic device
CN115985323B (en) Voice wakeup method and device, electronic equipment and readable storage medium
CN112837694B (en) Equipment awakening method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination