CN115171703A - Distributed voice awakening method and device, storage medium and electronic device - Google Patents

Distributed voice awakening method and device, storage medium and electronic device

Info

Publication number
CN115171703A
Authority
CN
China
Prior art keywords
group
devices
equipment
noise
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210603410.XA
Other languages
Chinese (zh)
Other versions
CN115171703B (en)
Inventor
邓邱伟
郝斌
王迪
张丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Qingdao Haier Intelligent Home Appliance Technology Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Qingdao Haier Intelligent Home Appliance Technology Co Ltd
Haier Smart Home Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd, Qingdao Haier Intelligent Home Appliance Technology Co Ltd, Haier Smart Home Co Ltd
Priority to CN202210603410.XA (CN115171703B)
Publication of CN115171703A
Priority to PCT/CN2023/085259 (WO2023231552A1)
Application granted
Publication of CN115171703B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The application discloses a distributed voice wake-up method and apparatus, a storage medium, and an electronic device, relating to the technical field of smart homes. The method comprises: when a first group of devices receives a first wake-up audio, acquiring the original signal that each device in the first group of devices generates from the first wake-up audio to obtain a first group of original signals, and acquiring the feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information; determining, according to the first group of feedback information, the devices in the first group whose interactive function has been woken up to obtain a second group of devices, and determining, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals; determining a target noise cancellation mode from a preset group of noise cancellation modes according to the number of devices in the second group of devices; and performing noise cancellation on the second group of original signals using the target noise cancellation mode to obtain a group of noise-reduced signals, and determining a target device in the second group of devices according to the group of noise-reduced signals.

Description

Distributed voice awakening method and device, storage medium and electronic device
Technical Field
The application relates to the technical field of smart homes, in particular to a distributed voice awakening method and device, a storage medium and an electronic device.
Background
In the related art, with the development of artificial intelligence technology, more and more intelligent voice devices are entering ordinary households. When several devices with wake-up modules are deployed in the same scene and a user speaks a wake-up word, smart devices such as a television, an air conditioner, and a refrigerator all answer "I'm here" at the same time. One approach is to network all the devices using Internet of Things technology and let an intelligent perception algorithm judge dimensions such as the distance and orientation between the user and the smart speaker, so that after the user speaks the wake-up word only one device responds and interacts with the user while the other devices stay quiet. In quiet scenes, the amplitude or energy of the signal can be used as the decision criterion: because sound waves attenuate with distance, the signal amplitude at a nearby device is greater than at a distant one. In a complex scene, however, a device may be playing audio on its own when the user speaks the wake-up word. For the playing device, the echo is self-noise; for the other devices, the echo is external noise. In that case, using signal amplitude or energy alone as the criterion cannot accurately determine which device should respond.
For the problem in the related art that, in a complex scene, the responding device cannot be determined accurately and quickly from among multiple devices, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the application provide a distributed voice wake-up method and apparatus, a storage medium, and an electronic device, so as to at least solve the problem in the related art that, in a complex scene, the responding device cannot be determined accurately and quickly from among multiple devices.
According to an embodiment of the present application, a distributed voice wake-up method is provided, including: when it is determined that a first group of devices has received a first wake-up audio, acquiring the original signal generated by each device in the first group of devices from the first wake-up audio to obtain a first group of original signals, and acquiring the feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the first wake-up audio after receiving it; determining, according to the first group of feedback information, the devices in the first group that have woken up the interactive function to obtain a second group of devices, and determining, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals; determining a target noise cancellation mode from a preset group of noise cancellation modes according to the number of devices in the second group of devices; performing noise cancellation on the second group of original signals using the target noise cancellation mode to obtain a group of noise-reduced signals; and determining a target device in the second group of devices according to the group of noise-reduced signals, controlling the target device to play a second audio corresponding to the first wake-up audio, and controlling the devices in the second group of devices other than the target device to stay muted.
In an exemplary embodiment, determining a target noise cancellation mode from a preset group of noise cancellation modes according to the number of devices in the second group of devices comprises: when the number of devices in the second group of devices is greater than or equal to a first preset threshold, determining a first noise cancellation mode from the group of noise cancellation modes, wherein the first noise cancellation mode filters the self-noise signal out of the second group of original signals through a preset first adaptive filter and filters the external noise signal out of the second group of original signals through a second adaptive filter, the second adaptive filter being a filter generated according to beamforming between the devices of the second group; and when the number of devices in the second group of devices is less than the first preset threshold, determining a second noise cancellation mode from the group of noise cancellation modes, wherein the second noise cancellation mode filters the self-noise signal out of the second group of original signals through the first adaptive filter and filters out of the second group of original signals the external noise signal determined by sound source separation between the devices of the second group.
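The threshold rule above can be sketched in a few lines of Python. This is only an illustration: the names `NoiseCancellationMode` and `select_mode` and the default threshold of 3 are assumptions, since the text only speaks of a "first preset threshold".

```python
from enum import Enum


class NoiseCancellationMode(Enum):
    # First mode: self-noise via the first adaptive filter, external noise
    # via a second adaptive filter derived from beamforming.
    BEAMFORMING = "first_mode"
    # Second mode: self-noise via the first adaptive filter, external noise
    # identified by sound source separation between devices.
    SOURCE_SEPARATION = "second_mode"


def select_mode(num_awake_devices: int, first_threshold: int = 3) -> NoiseCancellationMode:
    """Pick a mode from the number of devices in the second group.

    first_threshold is a hypothetical value chosen for illustration.
    """
    if num_awake_devices < 1:
        raise ValueError("the second group must contain at least one device")
    if num_awake_devices >= first_threshold:
        return NoiseCancellationMode.BEAMFORMING
    return NoiseCancellationMode.SOURCE_SEPARATION
```

With the assumed threshold, three or more awake devices select the beamforming-based mode and one or two select the source-separation mode.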
In an exemplary embodiment, after determining the first noise cancellation mode from the group of noise cancellation modes, the method further comprises: computing a first energy value of the target noise-reduced signal corresponding to the second group of devices after processing by the first noise cancellation mode; determining a second energy value corresponding to the external noise signal filtered out of the second group of original signals by the first noise cancellation mode; when the difference between the first energy value and the second energy value is lower than a second preset threshold, determining that the target object and the second group of devices are located at the same angle; and adding an estimated signal to the second group of original signals of the target object that have been processed by the first noise cancellation mode, wherein the estimated signal is a preset signal used to compensate for signal cancellation.
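A minimal sketch of this energy check follows. It assumes energy is the sum of squared samples and that the "difference" is taken as an absolute difference; neither detail is stated in the text, and the function names are hypothetical.

```python
import numpy as np


def signal_energy(x) -> float:
    """Energy as the sum of squared samples (an assumed definition)."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.dot(x, x))


def compensate_if_same_angle(denoised, removed_noise, estimated_signal, second_threshold):
    """Add the preset estimated signal when the two energies are close.

    Returns (signal, same_angle). The absolute difference is an assumption.
    """
    denoised = np.asarray(denoised, dtype=np.float64)
    first_energy = signal_energy(denoised)        # noise-reduced signal
    second_energy = signal_energy(removed_noise)  # filtered-out external noise
    same_angle = abs(first_energy - second_energy) < second_threshold
    if same_angle:
        # Speaker and noise source share a direction, so the beamformer may
        # have cancelled part of the speech; add the preset compensation.
        return denoised + np.asarray(estimated_signal, dtype=np.float64), True
    return denoised, False
```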
In an exemplary embodiment, when the number of devices in the second group of devices is greater than or equal to the first preset threshold, before determining the first noise cancellation mode from the group of noise cancellation modes, the method further comprises: determining the position information of each device of the second group of devices in the target area to obtain a group of position information; determining the relative position between every two devices in the second group of devices from the group of position information; and determining a second adaptive filter for each device in the second group of devices based on the relative positions.
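As a rough illustration of the pairwise relative-position step, the sketch below assumes each device position is a 2-D coordinate in metres and expresses the relative position as a distance and an azimuth. The coordinate format and the function name are assumptions; the text does not specify how position information is represented.

```python
import math
from itertools import combinations


def pairwise_relative_positions(positions):
    """Map each device pair (a, b) to (distance, azimuth of b seen from a).

    positions: dict of device id -> (x, y); a hypothetical representation.
    Azimuth is in degrees, measured counter-clockwise from the +x axis.
    """
    result = {}
    for a, b in combinations(sorted(positions), 2):
        dx = positions[b][0] - positions[a][0]
        dy = positions[b][1] - positions[a][1]
        result[(a, b)] = (math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)))
    return result
```

Each pair appears once, keyed in sorted order of the device identifiers.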
In an exemplary embodiment, determining the relative position between every two devices in the second group of devices from the group of position information comprises: having each device in the second group of devices enter a calibration mode in turn, and determining the relative direction between each device and the other devices according to the group of position information; performing beamforming based on the relative direction to obtain a first estimated external noise and a second estimated external noise between every two devices in the second group of devices; and, when the first estimated external noise and the second estimated external noise are the same, determining the relative position between every two devices in the second group of devices.
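The comparison of the two estimated external noises can be sketched as below. The text requires the estimates to be "the same"; because sampled signals rarely match exactly, this sketch assumes a relative tolerance. The tolerance value and the function name are hypothetical, and the beamforming that produces the estimates is outside the sketch.

```python
import numpy as np


def estimates_agree(first_estimate, second_estimate, rel_tol=0.05):
    """Return True when two external-noise estimates match within rel_tol.

    rel_tol is an assumed tolerance on the normalised difference.
    """
    a = np.asarray(first_estimate, dtype=np.float64)
    b = np.asarray(second_estimate, dtype=np.float64)
    if a.shape != b.shape:
        return False
    reference = max(np.linalg.norm(a), np.linalg.norm(b))
    if reference == 0.0:
        return True  # both estimates are silent
    return bool(np.linalg.norm(a - b) / reference <= rel_tol)
```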
In an exemplary embodiment, when the number of devices in the second group of devices is less than the first preset threshold, before determining the second noise cancellation mode from the group of noise cancellation modes, the method further comprises: decomposing the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm; calculating a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal; and determining the sub-signal whose energy value, of the third energy value and the fourth energy value, is closer to a target energy value as the echo signal corresponding to the second group of original signals, and determining, based on the echo signal, the external noise signal to be filtered out of the second group of original signals.
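The echo-identification rule can be illustrated as follows. The sketch takes the two sub-signals as given, because the "target algorithm" that produces them is not specified here, and it assumes energy is the sum of squared samples. The function names and the way the target energy value is obtained are assumptions.

```python
import numpy as np


def signal_energy(x) -> float:
    """Energy as the sum of squared samples (an assumed definition)."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.dot(x, x))


def identify_echo(first_sub, second_sub, target_energy):
    """Return (echo_index, third_energy, fourth_energy).

    echo_index is 0 if the first sub-signal is closer in energy to
    target_energy, otherwise 1. Ties go to the first sub-signal.
    """
    third_energy = signal_energy(first_sub)
    fourth_energy = signal_energy(second_sub)
    if abs(third_energy - target_energy) <= abs(fourth_energy - target_energy):
        return 0, third_energy, fourth_energy
    return 1, third_energy, fourth_energy
```

The sub-signal flagged as the echo would then serve as the basis for the external noise to be filtered from the original signals.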
In an exemplary embodiment, determining the target device in the second group of devices according to the group of noise-reduced signals includes: when every device in the second group of devices has a noise-reduced signal, determining the target amplitude of the noise-reduced signal corresponding to each device in the second group of devices to obtain a plurality of target amplitudes corresponding to the second group of devices; and sorting the target amplitudes from largest to smallest, selecting the device with the largest target amplitude as the responding device, and using the responding device as the target device of the second group of devices, so as to interact with the target object that emitted the first wake-up audio.
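A short sketch of the amplitude ranking described above. It assumes the "target amplitude" is the peak absolute sample value of each noise-reduced signal; the text does not define the amplitude measure, and the function name is hypothetical.

```python
import numpy as np


def pick_response_device(denoised_by_device):
    """Return (responding_device, devices ranked by amplitude, descending).

    denoised_by_device: dict of device id -> noise-reduced samples.
    Peak absolute value is an assumed amplitude measure.
    """
    amplitudes = {}
    for device, samples in denoised_by_device.items():
        samples = np.asarray(samples, dtype=np.float64)
        if samples.size == 0:
            # The rule applies only when every device has a noise-reduced signal.
            raise ValueError(f"device {device!r} has no noise-reduced signal")
        amplitudes[device] = float(np.max(np.abs(samples)))
    if not amplitudes:
        raise ValueError("no devices to rank")
    ranked = sorted(amplitudes, key=amplitudes.get, reverse=True)
    return ranked[0], ranked
```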
According to another embodiment of the present application, a distributed voice wake-up apparatus is also provided, including: an acquisition module, configured to, when it is determined that a first group of devices has received a first wake-up audio, acquire the original signal generated by each device in the first group of devices from the first wake-up audio to obtain a first group of original signals, and acquire the feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the first wake-up audio after receiving it; a first determining module, configured to determine, according to the first group of feedback information, the devices in the first group that have woken up the interactive function to obtain a second group of devices, and to determine, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals; a second determining module, configured to determine a target noise cancellation mode from a preset group of noise cancellation modes according to the number of devices in the second group of devices; a processing module, configured to perform noise cancellation on the second group of original signals using the target noise cancellation mode to obtain a group of noise-reduced signals; and a control module, configured to determine a target device in the second group of devices according to the group of noise-reduced signals, control the target device to play a second audio corresponding to the first wake-up audio, and control the devices in the second group of devices other than the target device to stay muted.
In an exemplary embodiment, the second determining module is further configured to: when the number of devices in the second group of devices is greater than or equal to a first preset threshold, determine a first noise cancellation mode from the group of noise cancellation modes, wherein the first noise cancellation mode filters the self-noise signal out of the second group of original signals through a preset first adaptive filter and filters the external noise signal out of the second group of original signals through a second adaptive filter, the second adaptive filter being a filter generated according to beamforming between the devices of the second group; and when the number of devices in the second group of devices is less than the first preset threshold, determine a second noise cancellation mode from the group of noise cancellation modes, wherein the second noise cancellation mode filters the self-noise signal out of the second group of original signals through the first adaptive filter and filters out of the second group of original signals the external noise signal determined by sound source separation between the devices of the second group.
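The text does not say which algorithm the first adaptive filter uses. As one possible illustration only, the sketch below uses a normalised least-mean-squares (NLMS) filter, a common choice for removing a device's own playback from its microphone signal. The function name, tap count, and step size are assumptions, not details from the application.

```python
import numpy as np


def nlms_cancel(mic, reference, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    mic: microphone samples (speech plus the device's own playback echo).
    reference: the playback samples the device itself emitted.
    Returns the residual signal after echo subtraction.
    """
    mic = np.asarray(mic, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if mic.shape != reference.shape:
        raise ValueError("mic and reference must have the same length")
    weights = np.zeros(num_taps)
    history = np.zeros(num_taps)  # most recent reference sample first
    residual = np.zeros_like(mic)
    for n in range(len(mic)):
        history = np.roll(history, 1)
        history[0] = reference[n]
        echo_estimate = weights @ history
        residual[n] = mic[n] - echo_estimate
        # Normalised update: step size scaled by the reference power.
        weights += step * residual[n] * history / (history @ history + eps)
    return residual
```

In this sketch the residual plays the role of the signal with self-noise filtered out; external noise would still need to be handled by the second adaptive filter or by source separation.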
In an exemplary embodiment, the second determining module further includes an adding unit, configured to: compute a first energy value of the target noise-reduced signal corresponding to the second group of devices after processing by the first noise cancellation mode; determine a second energy value corresponding to the external noise signal filtered out of the second group of original signals by the first noise cancellation mode; when the difference between the first energy value and the second energy value is lower than a second preset threshold, determine that the target object and the second group of devices are located at the same angle; and add an estimated signal to the second group of original signals of the target object that have been processed by the first noise cancellation mode, wherein the estimated signal is a preset signal used to compensate for signal cancellation.
In an exemplary embodiment, the second determining module is further configured to: determine the position information of each device of the second group of devices in the target area to obtain a group of position information; determine the relative position between every two devices in the second group of devices from the group of position information; and determine a second adaptive filter for each device in the second group of devices based on the relative positions.
In an exemplary embodiment, the second determining module is further configured to: have each device in the second group of devices enter a calibration mode in turn, and determine the relative direction between each device and the other devices according to the group of position information; perform beamforming based on the relative direction to obtain a first estimated external noise and a second estimated external noise between every two devices in the second group of devices; and, when the first estimated external noise and the second estimated external noise are the same, determine the relative position between every two devices in the second group of devices.
In an exemplary embodiment, the second determining module further includes a comparison unit, configured to: decompose the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm; calculate a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal; and determine the sub-signal whose energy value, of the third energy value and the fourth energy value, is closer to a target energy value as the echo signal corresponding to the second group of original signals, and determine, based on the echo signal, the external noise signal to be filtered out of the second group of original signals.
In an exemplary embodiment, the control module is further configured to: when every device in the second group of devices has a noise-reduced signal, determine the target amplitude of the noise-reduced signal corresponding to each device in the second group of devices to obtain a plurality of target amplitudes corresponding to the second group of devices; and sort the target amplitudes from largest to smallest, select the device with the largest target amplitude as the responding device, and use the responding device as the target device of the second group of devices, so as to interact with the target object that emitted the first wake-up audio.
According to another aspect of the embodiments of the present application, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute the above distributed voice wake-up method when run.
According to another aspect of the embodiments of the present application, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the distributed voice wake-up method through the computer program.
In the embodiments of the application, when it is determined that a first group of devices has received a first wake-up audio, the original signal generated by each device in the first group of devices from the first wake-up audio is acquired to obtain a first group of original signals, and the feedback information of each device in the first group of devices for the first wake-up audio is acquired to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the first wake-up audio after receiving it; the devices in the first group that have woken up the interactive function are determined according to the first group of feedback information to obtain a second group of devices, and the original signals generated by the second group of devices are determined from the first group of original signals to obtain a second group of original signals; a target noise cancellation mode is determined from a preset group of noise cancellation modes according to the number of devices in the second group of devices; noise cancellation is performed on the second group of original signals using the target noise cancellation mode to obtain a group of noise-reduced signals; and a target device in the second group of devices is determined according to the group of noise-reduced signals, the target device is controlled to play a second audio corresponding to the first wake-up audio, and the devices in the second group of devices other than the target device are controlled to stay muted. That is, in a distributed processing scenario, the original signals of those devices in the target area that gave feedback on the first wake-up audio emitted by the target object are determined, noise cancellation is performed on these original signals to obtain the corresponding noise-reduced signals, and the final target device is determined, by means of the noise-reduced signals, from among the multiple devices that gave feedback. This technical solution solves the problem in the related art that, in a complex scene, the responding device cannot be determined accurately and quickly from among multiple devices. The target device that finally interacts with the target object can be determined from among the multiple responding devices in a complex scene, and the effectiveness of subsequent schemes that use signal amplitude or energy as the decision criterion is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a diagram of a hardware environment of a distributed voice wake-up method according to an embodiment of the present application;
FIG. 2 is a flow chart of a distributed voice wake-up method according to an embodiment of the present application;
FIG. 3 is a flow chart of an optional beamforming calculation according to an alternative embodiment of the present application;
FIG. 4 is a flow chart of an optional sound source separation calculation according to an alternative embodiment of the present application;
FIG. 5 is a block diagram of an optional distributed voice wake-up apparatus according to an embodiment of the present application.
Detailed Description
In order to help those skilled in the art better understand the technical solutions, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to one aspect of the embodiments of the application, a distributed voice wake-up method is provided. The distributed voice wake-up method is widely applicable to whole-house intelligent digital control scenarios such as smart home (Smart Home), smart household, smart home device ecosystems, and intelligent house (Intelligent House) ecosystems. Optionally, in this embodiment, the distributed voice wake-up method described above may be applied to a hardware environment formed by the terminal device 102, the server 104, and the image pickup apparatus 106 shown in FIG. 1. As shown in FIG. 1, the server 104 is connected to the terminal device 102 through a network and may be configured to provide services (for example, application services) for the terminal or for a client installed on the terminal. A database may be provided on the server or independently of it to provide data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independently of it to provide data computation services for the server 104.
The network may include, but is not limited to, at least one of: a wired network and a wireless network. The wired network may include, but is not limited to, at least one of: a wide area network, a metropolitan area network, and a local area network. The wireless network may include, but is not limited to, at least one of: Wi-Fi (Wireless Fidelity) and Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet computer, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart cooktop, a smart washing machine, a smart water heater, smart washing equipment, a smart dishwasher, smart projection equipment, a smart TV, a smart clothes hanger, a smart curtain, smart audio-visual equipment, a smart socket, a smart stereo, a smart speaker, smart fresh-air equipment, smart kitchen and bathroom equipment, smart bathroom equipment, a robot vacuum cleaner, a window-cleaning robot, a floor-mopping robot, a smart air purifier, a smart steam oven, a smart microwave oven, a smart kitchen water heater, a smart purifier, a smart water dispenser, a smart door lock, and the like.
In this embodiment, a distributed voice wake-up method is provided, which is applied to the above-mentioned image pickup apparatus. FIG. 2 is a flowchart of an optional distributed voice wake-up method according to an embodiment of the present application. The flow includes the following steps:
Step S202: when it is determined that a first group of devices has received a first wake-up audio, obtaining the original signal generated by each device in the first group of devices according to the first wake-up audio, giving a first group of original signals, and obtaining the feedback information of each device in the first group of devices for the first wake-up audio, giving a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interaction function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the received first wake-up audio;
Step S204: determining, according to the first group of feedback information, the devices in the first group of devices that have woken up the interaction function, giving a second group of devices, and determining, from the first group of original signals, the original signals generated by the second group of devices, giving a second group of original signals;
Step S206: determining a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices;
Step S208: performing noise elimination processing on the second group of original signals using the target noise elimination mode to obtain a group of noise reduction signals;
Step S210: determining a target device in the second group of devices according to the group of noise reduction signals, controlling the target device to play a second audio corresponding to the first wake-up audio, and controlling the devices in the second group of devices other than the target device to mute.
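The flow of steps S202 to S210 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `DeviceReport` type, the `arbitrate` function, the mode names `"beamforming"` and `"source_separation"`, and the use of peak amplitude as the comparison value are all assumptions introduced here, and the denoising functions are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Signal = List[float]


@dataclass
class DeviceReport:
    """Hypothetical per-device record collected in step S202."""
    device_id: str
    woke_up: bool        # feedback information for the first wake-up audio
    raw_signal: Signal   # original signal converted from the wake-up audio


def peak_amplitude(signal: Sequence[float]) -> float:
    """Largest absolute sample value (0.0 for an empty signal)."""
    return max((abs(s) for s in signal), default=0.0)


def arbitrate(reports: Sequence[DeviceReport],
              denoisers: Dict[str, Callable[[Signal], Signal]],
              threshold: int) -> Optional[str]:
    """Return the id of the device that should respond, or None."""
    # S204: keep only the devices whose interaction function was woken up.
    second_group = [r for r in reports if r.woke_up]
    if not second_group:
        return None
    # S206: pick the noise elimination mode from the device count.
    mode = "beamforming" if len(second_group) >= threshold else "source_separation"
    denoise = denoisers[mode]
    # S208: denoise each original signal of the second group.
    denoised = {r.device_id: denoise(r.raw_signal) for r in second_group}
    # S210: the device with the largest denoised amplitude responds;
    # the caller would mute every other device in the second group.
    return max(denoised, key=lambda d: peak_amplitude(denoised[d]))
```

In a real system the two entries of `denoisers` would wrap the echo cancellation plus beamforming or sound source separation chains described later in this section.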
Through the above steps, when it is determined that the first group of devices has received the first wake-up audio, the original signal generated by each device in the first group of devices according to the first wake-up audio is obtained, giving a first group of original signals, and the feedback information of each device in the first group of devices for the first wake-up audio is obtained, giving a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device has woken up its interaction function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the received first wake-up audio; the devices that have woken up the interaction function are determined from the first group of devices according to the first group of feedback information, giving a second group of devices, and the original signals generated by the second group of devices are determined from the first group of original signals, giving a second group of original signals; a target noise elimination mode is determined from a preset group of noise elimination modes according to the number of devices in the second group of devices; noise elimination processing is performed on the second group of original signals using the target noise elimination mode, giving a group of noise reduction signals; and a target device is determined in the second group of devices according to the group of noise reduction signals, the target device is controlled to play a second audio corresponding to the first wake-up audio, and the devices in the second group of devices other than the target device are controlled to mute. That is, in a distributed processing scene, the original signals of those devices in the target area that gave feedback on the first wake-up audio sent by the target object are determined, noise elimination processing is performed on those original signals to obtain the corresponding noise reduction signals, and the final target device is determined, by means of the noise reduction signals, from the devices that gave feedback. This technical scheme solves the problem in the related art that the responding device cannot be determined accurately and quickly from multiple devices in a complex scene: the target device that finally interacts with the target object can be determined from the multiple responding devices in a complex scene, and the effectiveness of the subsequent scheme that uses the signal amplitude/energy as the decision criterion is improved.
In an exemplary embodiment, determining a target noise cancellation scheme from a preset set of noise cancellation schemes based on the number of devices in the second set of devices comprises: determining a first noise elimination mode from a group of noise elimination modes under the condition that the number of the devices in the second group of devices is greater than or equal to a first preset threshold, wherein the first noise elimination mode is used for filtering self-noise signals from a second group of original signals through a preset first adaptive filter and filtering external noise signals from the second group of original signals through a second adaptive filter, and the second adaptive filter is a filter generated according to beam forming among the second group of devices; and determining a second noise elimination mode from the group of noise elimination modes under the condition that the number of the devices in the second group of devices is less than a first preset threshold, wherein the second noise elimination mode is used for filtering self-noise signals from a second group of original signals through a first adaptive filter and filtering external noise signals determined by sound source separation between the second group of devices from the second group of original signals.
Optionally, the external noise signal is the signal corresponding to the noise that a device receives when other devices in the same network are themselves playing audio. The self-noise signal is the signal corresponding to the noise that the device itself generates while running.
In an exemplary embodiment, after determining the first noise elimination mode from the group of noise elimination modes, the method further comprises: counting a first energy value of the target noise reduction signal corresponding to the second group of devices after processing in the first noise elimination mode; determining a second energy value corresponding to the external noise signal filtered from the second group of original signals in the first noise elimination mode; determining that the target object and the second group of devices are located at the same angle when the difference between the first energy value and the second energy value is lower than a second preset threshold; and adding an estimated signal to the second group of original signals processed in the first noise elimination mode, wherein the estimated signal is a preset signal for compensating for signal cancellation.
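The energy comparison in this embodiment can be sketched as below. The function names, the use of an absolute difference, and the sample-wise addition of the compensation signal are assumptions made for illustration; the text only states that the difference of the two energy values is compared with a second preset threshold and that a preset estimated signal is added.

```python
from typing import List, Sequence


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(s * s for s in signal)


def same_angle(denoised: Sequence[float],
               estimated_noise: Sequence[float],
               second_threshold: float) -> bool:
    """True when the first and second energy values differ by less than the threshold."""
    first_energy = energy(denoised)            # energy of the noise reduction signal
    second_energy = energy(estimated_noise)    # energy of the filtered external noise
    return abs(first_energy - second_energy) < second_threshold


def compensate(processed: Sequence[float],
               estimated_signal: Sequence[float]) -> List[float]:
    """Add the preset estimated signal sample by sample (assumed form of the compensation)."""
    return [p + e for p, e in zip(processed, estimated_signal)]
```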
It should be noted that beamforming is suited to multi-microphone scenarios, for example where multi-microphone sound source localization is relatively accurate, or where the multi-microphone beam performs well, such as when the main lobe is narrow.
Optionally, if the user (corresponding to the target object in the embodiment of the present invention) and device A are at the same angle, the estimated external noise may contain the wake-up word, resulting in signal cancellation. In this case, the energy statistics may be calculated using the microphone signal. To judge this, optionally, the microphone signal and the estimated external noise signal are each fed to a wake-up module; when the score of the latter is close to or even greater than that of the former, the user and device A can be considered to be at the same angle.
In an exemplary embodiment, in the case that the number of devices in the second group of devices is greater than or equal to the first preset threshold, before determining the first noise cancellation mode from the group of noise cancellation modes, the method further includes: determining the position information of each device in the second group of devices in the target area to obtain a group of position information; determining the relative position between every two devices in the second group of devices through one group of position information; a second adaptive filter is determined for each device in the second set of devices based on the relative position.
In one exemplary embodiment, determining the relative position between each two devices in the second set of devices through a set of position information comprises that each device in the second set of devices successively enters a calibration mode, and the relative direction between each device and other devices is determined according to the set of position information; performing beam forming based on the relative direction to obtain a first estimated external noise and a second estimated external noise between every two devices in the second group of devices; in the case where the first and second estimated external noises are the same, the relative position between each two devices in the second set of devices is determined.
Beamforming: beamforming combines and processes multi-channel signals, suppressing interference signals from non-target directions and enhancing the sound signal from the target direction. The optional embodiment of the invention suggests using beamforming for external noise cancellation when the device has 4 or more microphones. The process is as follows. Step one, user-side calibration: when the relative position of the devices changes, the propagation path from device A to device B also changes. Once the position changes, the custom calibration module can be started: after it is started, device A automatically plays a piece of music or similar audio, and device B calculates the relative position of device A. (Note: the sound source position can be calculated using algorithms such as MUSIC, GCC-PHAT, TDOA, and AML.) Step two, signal noise reduction: first, device B performs beamforming toward the direction of device A (e.g., MVDR or a GSC structure) to obtain the estimated external noise; the noise is then filtered out of the microphone signal using an adaptive filtering technique (e.g., NLMS).
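As a rough illustration of the calibration step, the sketch below estimates a time difference of arrival between two microphones of device B by plain cross-correlation and converts it to an arrival angle. It is a simplified stand-in for the MUSIC, GCC-PHAT, or AML methods named above, not an implementation of them; the function names, the two-microphone far-field geometry, and the speed of sound value are assumptions.

```python
import math
from typing import Sequence


def estimate_delay(reference: Sequence[float],
                   delayed: Sequence[float],
                   max_lag: int) -> int:
    """Lag in samples, within [-max_lag, max_lag], that maximizes
    the cross-correlation sum of reference[i] * delayed[i + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += r * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag: int, sample_rate: float, mic_distance: float,
                   speed_of_sound: float = 343.0) -> float:
    """Far-field arrival angle in radians for a two-microphone pair."""
    ratio = speed_of_sound * (lag / sample_rate) / mic_distance
    # Clamp so that measurement noise cannot push asin outside its domain.
    return math.asin(max(-1.0, min(1.0, ratio)))
```

During calibration, `reference` and `delayed` would be the two microphone captures of the audio that device A plays; the resulting angle gives the direction in which device B later steers its beam.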
In an exemplary embodiment, in case that the number of devices in the second set of devices is smaller than the first preset threshold, before determining the second noise cancellation mode from the set of noise cancellation modes, the method further comprises: decomposing the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm; calculating a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal; and determining the sub-signals approaching the target energy value in the third energy value and the fourth energy value as echo signals corresponding to the second group of original signals, and determining external noise signals to be filtered in the second group of original signals based on the echo signals.
In one exemplary embodiment, determining the target device in the second set of devices based on a set of noise reduction signals includes: under the condition that each device in the second group of devices has the noise reduction signal, determining a target amplitude of the noise reduction signal corresponding to each device in the second group of devices to obtain a plurality of target amplitudes corresponding to the second group of devices; and sequentially arranging the target amplitudes from large to small, selecting the equipment with the maximum target amplitude as response equipment, and determining the target equipment from the second group of equipment by using the response equipment so as to interact with the target object emitting the first awakening audio.
In order to better understand the process of the distributed voice wake-up method, the following describes a flow of the implementation method of the distributed voice wake-up method with reference to an optional embodiment, but the flow is not limited to the technical solution of the embodiment of the present application.
In the related art, in a quiet scene, the amplitude/energy of the signal can be used as the criterion: owing to the attenuation of sound waves, the signal amplitude at a nearby device is greater than that at a distant device. In complex scenes, however, the original signal amplitude is no longer accurate as a criterion. Although the received signal can be expressed as $y(t) = w(t) + n(t)$, that is, the noisy signal is the linear sum of the wake-up word $w$ and the noise $n$, the amplitude of the noisy signal is not equal to the linear sum of the two amplitudes,
$$|y(t)| \neq |w(t)| + |n(t)|,$$
so a simple energy calibration cannot be achieved.
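A small numeric check makes the point concrete. In the sketch below, two sine waves stand in for the wake-up word and the noise (the frequencies, phase, and gains are arbitrary assumptions): the amplitude of the sum is far below the sum of the amplitudes, and the energy of the sum contains a cross term that a fixed calibration offset cannot account for.

```python
import math


def peak(x):
    """Largest absolute sample value."""
    return max(abs(v) for v in x)


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


fs = 16000                      # sample rate in Hz (assumed)
n_samples = fs // 10            # 100 ms of signal
# Stand-in "wake word" and "noise": same frequency, opposite phase.
w = [math.sin(2 * math.pi * 440 * t / fs) for t in range(n_samples)]
n = [0.8 * math.sin(2 * math.pi * 440 * t / fs + math.pi) for t in range(n_samples)]
y = [a + b for a, b in zip(w, n)]          # y(t) = w(t) + n(t)

# Cross term: E_y = E_w + E_n + 2 * sum(w * n), so energies do not simply add.
cross = sum(a * b for a, b in zip(w, n))

print("peak(y)            =", peak(y))
print("peak(w) + peak(n)  =", peak(w) + peak(n))
print("E_y - (E_w + E_n)  =", energy(y) - (energy(w) + energy(n)))
```

Here the noise partly cancels the wake word, so the noisy amplitude is even smaller than the wake-word amplitude alone; with other phases it can be larger, which is why the raw amplitude is an unreliable criterion.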
Optionally, an optional embodiment of the present invention provides a front-end signal processing apparatus for distributed decision in self-noise scenes. The apparatus coordinates the information of multiple devices, performs operations such as echo cancellation on the signal of the self-noise device and removes the noise from the external noise signal to obtain the respective clean signals, and then uses the amplitude as the criterion. Optionally, the front-end signal processing apparatus may include an echo cancellation module, a sound source localization module, a beamforming module, a sound source separation module, a dereverberation module, and the like. The modules are used as follows.
As an optional implementation, in the case of distributed decision in a self-noise scene, before the responding device is determined among multiple voice devices in the same network (equivalent to the target device in the foregoing embodiment) using the signal amplitude/energy as the criterion, the self-noise signal generated by each voice device during interaction is eliminated, and the external noise signal that other devices impose on the device, which is contained in the original signal corresponding to the audio, is also eliminated. This yields a clean signal for each voice device during interaction, and the final target device that responds to the interaction is then determined using the amplitude as the criterion.
Optionally, the external noise signal is the signal corresponding to the noise that a device receives when other devices in the same network are themselves playing audio. The self-noise signal is the signal corresponding to the noise that the device itself generates while running.
For example, assume that device A is a device that is itself playing audio (a self-playing device) and device B is a device that receives the external noise. Noise signal processing is divided into the following cases:
Self-noise processing mode: the self-noise signal carried in the device's original signal is removed by self-noise cancellation, specifically using echo cancellation. Optionally, the original signal is processed by a multi-delay block frequency-domain adaptive filter (MDF) to filter out the self-noise signal it carries. It should be noted that if the self-noise signal in the original signal contains a nonlinear echo component, it may be removed using a model-based/nonlinear method, which is not limited in the optional embodiment of the present invention.
External noise processing mode: the external noise signal carried in the device's original signal is optionally removed by beamforming or sound source separation. The calculation procedure differs between the two methods, as follows:
Method one: beamforming. Beamforming combines and processes multi-channel signals, suppressing interference signals from non-target directions and enhancing the sound signal from the target direction. The optional embodiment of the invention suggests using beamforming for external noise cancellation when the device has 4 or more microphones. The process is as follows:
Step one, user-side calibration. When the relative position of the devices changes, the propagation path from device A to device B also changes. Once the position changes, the custom calibration module can be started: after it is started, device A automatically plays a piece of music or similar audio, and device B calculates the relative position of device A. (Note: the sound source position can be calculated using algorithms such as MUSIC, GCC-PHAT, TDOA, and AML.)
Step two, signal noise reduction: first, device B performs beamforming toward the direction of device A (e.g., MVDR or a GSC structure) to obtain the estimated external noise; the noise is then filtered out of the microphone signal using an adaptive filtering technique (e.g., NLMS).
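The adaptive filtering in step two can be sketched with a basic time-domain NLMS canceller. This is a minimal illustration under stated assumptions: the function name `nlms_cancel`, the tap count, and the step size are chosen here for the example, and `noise_ref` stands for the estimated external noise that the beamformer would supply.

```python
from typing import List, Sequence


def nlms_cancel(mic: Sequence[float],
                noise_ref: Sequence[float],
                taps: int = 8,
                mu: float = 0.5,
                eps: float = 1e-8) -> List[float]:
    """Remove from `mic` the component that is linearly predictable
    from `noise_ref`, returning the error (cleaned) signal."""
    weights = [0.0] * taps
    history = [0.0] * taps              # recent reference samples, newest first
    cleaned = []
    for d, x in zip(mic, noise_ref):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate              # what remains after noise removal
        norm = sum(h * h for h in history) + eps
        # Normalized LMS update: step size scaled by the reference power.
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        cleaned.append(err)
    return cleaned
```

If the reference itself contains part of the wake-up word, the filter will cancel that part as well, which is the signal cancellation problem discussed for the case where the user and device A are at the same angle.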
FIG. 3 is a flowchart of the calculation when beamforming is selected, according to an optional embodiment of the present application; it includes the following steps:
step S402: performing sound source estimation between the device A and the device B;
step S404: determining original signals corresponding to audios sent by a target object and received by the equipment A and the equipment B respectively;
step S406: performing echo cancellation (AEC) on the original signal to obtain a de-echo signal corresponding to the equipment A and the equipment B;
step S408: determining whether device A still has echo; if so, estimating the external noise that device B imposes on the signal of device A, and filtering the estimated external noise from the de-echoed signal of device A using an adaptive filtering technique (e.g., NLMS);
step S410: determining the noise-free clean signals corresponding to device A and device B respectively, comparing the signal amplitude/energy of device A with that of device B, and determining the device with the larger signal amplitude/energy as the device that finally responds to the user.
It should be noted that beamforming is suited to multi-microphone scenarios, for example where multi-microphone sound source localization is relatively accurate, or where the multi-microphone beam performs well, such as when the main lobe is narrow.
Optionally, if the user (corresponding to the target object in the embodiment of the present invention) and device A are at the same angle, the estimated external noise may contain the wake-up word, resulting in signal cancellation. In this case, the energy statistics may be calculated using the microphone signal. To judge this, optionally, the microphone signal and the estimated external noise signal are each fed to a wake-up module; when the score of the latter is close to or even greater than that of the former, the user and device A can be considered to be at the same angle.
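The score-based judgment can be sketched as follows. The wake-up scores are assumed to come from an existing wake-up module; the function names and the `margin` that defines "close to" are hypothetical and would need tuning in practice.

```python
from typing import Sequence


def user_at_same_angle(mic_score: float,
                       noise_estimate_score: float,
                       margin: float = 0.05) -> bool:
    """True when the wake-up score of the estimated external noise is
    close to, or greater than, the score of the microphone signal."""
    return noise_estimate_score >= mic_score - margin


def signal_for_energy(mic: Sequence[float],
                      denoised: Sequence[float],
                      mic_score: float,
                      noise_estimate_score: float,
                      margin: float = 0.05) -> Sequence[float]:
    """Fall back to the microphone signal for the energy statistics when
    the wake-up word appears to have leaked into the noise estimate."""
    if user_at_same_angle(mic_score, noise_estimate_score, margin):
        return mic
    return denoised
```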
Method two: sound source separation. When the device has few microphones, beamforming performs poorly, for example the main lobe is wide and the pickup range is large, so the estimated external noise contains part of the wake-up word, resulting in signal cancellation. The optional embodiment of the invention proposes using the AUX-IVA method for sound source separation when the number of microphones is 2. The method involves complex matrix inversion; a 2 × 2 matrix inverse has an analytic solution, so the computation is small. When the number of microphones is large, giving for example 4 × 4 or 6 × 6 matrices, the computation is large and real-time calculation is not possible. In addition, reverberation strongly affects sound source separation, so the WPE algorithm can be used first for dereverberation.
It can be understood that the AUX-IVA method produces two output channels, one being noise and the other the clean signal; owing to the permutation problem, it cannot be known which channel is the noise and which is the clean signal. The signals of the different channels therefore need to be processed further before one is selected.
Optionally, for channel selection, each channel may be fed to a wake-up module and the channel with the higher wake-up score output as the clean signal, but this method is computationally expensive. Alternatively, the energies E1 and E2 of the two channels can be calculated; device A calculates the energies of the microphone signal mic, the echo-cancelled signal aec, and the echo signal spk, respectively. This method requires self-calibration: for example, device A automatically plays a piece of music or other audio, so that the original signal energy $E_B$ of device B is affected only by the echo of A, giving
$$\alpha_{A \to B} = \frac{E_B}{E_{spk}}$$
In use: (1) when $E_{mic}$ and $E_{spk}$ are close and $E_{aec}$ is small, the played sound can be judged to be small, and the larger of E1 and E2 is selected; (2) $\alpha_{A \to B} E_{spk}$ is calculated, and whichever of E1 and E2 is close to $\alpha_{A \to B} E_{spk}$ may be considered the separated echo.
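Rule (2) can be sketched as below. The sketch assumes that the calibration factor is the ratio of the energy received at device B to the playback energy of device A during calibration; that definition, together with the function names, is an assumption made for illustration, since the original calibration formula is not legible in this text.

```python
from typing import Sequence


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def calibrate_alpha(e_b_calibration: float, e_spk_calibration: float) -> float:
    """Assumed coupling factor from A's playback energy to B's received energy."""
    return e_b_calibration / e_spk_calibration


def pick_clean_channel(ch1: Sequence[float],
                       ch2: Sequence[float],
                       alpha: float,
                       e_spk: float) -> Sequence[float]:
    """Treat the channel whose energy is closest to alpha * e_spk as the
    separated echo and return the other channel as the clean signal."""
    expected_echo = alpha * e_spk
    e1, e2 = energy(ch1), energy(ch2)
    if abs(e1 - expected_echo) <= abs(e2 - expected_echo):
        return ch2          # ch1 looks like the echo
    return ch1              # ch2 looks like the echo
```

Compared with running the wake-up module on both channels, this needs only two energy sums per decision, at the cost of depending on an up-to-date calibration.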
When a wake-up is triggered, the signal energy over a certain period is counted:
$$E_{clean} = \sum_{t=1}^{T} \sum_{f=f_l}^{f_h} \left| X_{clean}(t, f) \right|^2$$
where the suffix clean denotes the signal after noise reduction, $T$ denotes the statistical period, $X$ denotes the frequency-domain signal after the STFT, $f_h$ is the highest frequency band counted, and $f_l$ is the lowest. Of devices A and B, the one with the larger $E_{clean}$ responds first.
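The band-limited energy statistic can be sketched as a sum of squared spectral magnitudes over frames and over the bins from fl to fh. The sketch uses a naive DFT with a rectangular window so that it stays self-contained; the frame length, hop size, and function names are assumptions, and a production system would use an optimized windowed STFT.

```python
import cmath
from typing import List, Sequence


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive DFT of one frame, returning bins 0 .. N/2."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * cmath.pi * b * k / n) for k in range(n))
            for b in range(n // 2 + 1)]


def stft(signal: Sequence[float], frame_len: int, hop: int) -> List[List[complex]]:
    """Rectangular-window STFT: one spectrum per frame."""
    return [dft(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]


def band_energy(spectra: Sequence[Sequence[complex]], fl: int, fh: int) -> float:
    """Sum of |X(t, f)|^2 over all frames t and bins fl <= f <= fh."""
    return sum(abs(frame[f]) ** 2 for frame in spectra for f in range(fl, fh + 1))


def first_responder(clean_signals: dict, frame_len: int, hop: int,
                    fl: int, fh: int) -> str:
    """Device id with the largest band-limited energy of its denoised signal."""
    energies = {dev: band_energy(stft(sig, frame_len, hop), fl, fh)
                for dev, sig in clean_signals.items()}
    return max(energies, key=energies.get)
```

Restricting the sum to a band lets the comparison ignore frequency ranges where the wake-up word carries little energy.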
Figure BDA0003670494210000151
When $E_{spk}$ is below a certain threshold, the device may be considered not to be playing.
FIG. 4 is a flow chart of a calculation of a selected sound source separation according to an alternative embodiment of the present application; the method comprises the following steps:
step S502: calibrating an echo path between the device A and the device B;
step S504: determining original signals corresponding to audios sent by target objects respectively received by the equipment A and the equipment B;
step S506: performing echo cancellation on the original signal to obtain echo-removed signals corresponding to the equipment A and the equipment B;
step S508: determining whether device A has echo; if so, performing sound source separation, selecting a channel using the energy-based determination described above, and determining the corresponding echo signal;
step S510: determining the noise-free clean signals corresponding to device A and device B respectively, comparing the signal amplitude/energy of device A with that of device B, and determining the device with the larger signal amplitude/energy as the device that finally responds to the user.
In summary, the optional embodiment of the present invention coordinates the information of multiple devices and formulates different external noise removal methods for different devices; the wake-up module is used cooperatively to select the estimated clean signal; and when external noise is removed, prior information can be obtained through the self-calibration module to assist subsequent signal processing, improving the accuracy of energy-based decisions. That is, operations such as echo cancellation and external noise removal are performed on the device signals to obtain their respective clean signals, and the amplitude is then used as the decision criterion.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method according to the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method of the embodiments of the present application.
FIG. 5 is a block diagram of an alternative distributed voice wake-up apparatus according to an embodiment of the present application; as shown in fig. 5, includes:
an obtaining module 62, configured to, when it is determined that a first group of devices has received a first wake-up audio, obtain the original signal generated by each device in the first group of devices according to the first wake-up audio, giving a first group of original signals, and obtain the feedback information of each device in the first group of devices for the first wake-up audio, giving a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interaction function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the received first wake-up audio;
a first determining module 64, configured to determine, according to the first group of feedback information, devices that have wakened up the interactive function from the first group of devices, to obtain a second group of devices in total, and determine, in the first group of original signals, original signals generated by the second group of devices, to obtain a second group of original signals in total;
a second determining module 66, configured to determine a target noise cancellation mode from a preset set of noise cancellation modes according to the number of devices in the second set of devices;
a processing module 68, configured to perform noise cancellation processing on the second set of original signals by using the target noise cancellation method to obtain a set of noise reduction signals;
a control module 70, configured to determine a target device in the second group of devices according to the group of noise reduction signals, control the target device to play a second audio corresponding to the first wake-up audio, and control the devices in the second group of devices other than the target device to mute.
By means of the above apparatus, when it is determined that the first group of devices has received the first wake-up audio, the original signal generated by each device in the first group of devices according to the first wake-up audio is obtained, giving a first group of original signals, and the feedback information of each device in the first group of devices for the first wake-up audio is obtained, giving a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device has woken up its interaction function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the received first wake-up audio; the devices that have woken up the interaction function are determined from the first group of devices according to the first group of feedback information, giving a second group of devices, and the original signals generated by the second group of devices are determined from the first group of original signals, giving a second group of original signals; a target noise elimination mode is determined from a preset group of noise elimination modes according to the number of devices in the second group of devices; noise elimination processing is performed on the second group of original signals using the target noise elimination mode, giving a group of noise reduction signals; and a target device is determined in the second group of devices according to the group of noise reduction signals, the target device is controlled to play a second audio corresponding to the first wake-up audio, and the devices in the second group of devices other than the target device are controlled to mute. That is, in a distributed processing scene, the original signals of those devices in the target area that gave feedback on the first wake-up audio sent by the target object are determined, noise elimination processing is performed on those original signals to obtain the corresponding noise reduction signals, and the final target device is determined, by means of the noise reduction signals, from the devices that gave feedback. This technical scheme solves the problem in the related art that the responding device cannot be determined accurately and quickly from multiple devices in a complex scene: the target device that finally interacts with the target object can be determined from the multiple responding devices in a complex scene, and the effectiveness of the subsequent scheme that uses the signal amplitude/energy as the decision criterion is improved.
In an exemplary embodiment, the second determining module is further configured to determine a first noise cancellation mode from a set of noise cancellation modes when the number of devices in the second set of devices is greater than or equal to a first preset threshold, where the first noise cancellation mode is used to filter the self-noise signal from the second set of original signals through a preset first adaptive filter and filter the external noise signal from the second set of original signals through a second adaptive filter, and the second adaptive filter is a filter generated according to beam forming between the second set of devices; and determining a second noise elimination mode from the group of noise elimination modes under the condition that the number of the devices in the second group of devices is less than a first preset threshold, wherein the second noise elimination mode is used for filtering self-noise signals from a second group of original signals through a first adaptive filter and filtering external noise signals determined by sound source separation between the second group of devices from the second group of original signals.
In an exemplary embodiment, the second determining module further includes: the adding unit is used for counting a first energy value of a target noise reduction signal corresponding to the second group of equipment after the first noise signal elimination mode processing; determining a second energy value corresponding to the external noise signal in the second group of original signals filtered by the first noise signal elimination mode; determining that the target object and the second group of equipment are positioned at the same angle under the condition that the difference value of the first energy value and the second energy value is lower than a second preset threshold value; and adding an estimated signal into the second group of original signals of the target processed in the first noise signal elimination mode, wherein the estimated signal is a preset signal for balancing signal cancellation.
In an exemplary embodiment, the second determining module is further configured to determine location information of each device in the second set of devices in the target area, so as to obtain a set of location information; determining the relative position between every two devices in the second group of devices through one group of position information; a second adaptive filter is determined for each device in the second set of devices based on the relative position.
In an exemplary embodiment, the second determining module is further configured to successively enter a calibration mode for each device in the second set of devices, and determine a relative direction between each device and the other devices according to a set of location information; performing beam forming based on the relative direction to obtain a first estimated external noise and a second estimated external noise between every two devices in the second group of devices; and under the condition that the first estimated external noise and the second estimated external noise are the same, determining the relative position between every two devices in the second group of devices.
In an exemplary embodiment, the second determining module further includes a comparison unit configured to: decompose the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm; calculate a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal; determine the sub-signal whose energy value, among the third energy value and the fourth energy value, is closer to a target energy value as the echo signal corresponding to the second group of original signals; and determine the external noise signal to be filtered out of the second group of original signals based on the echo signal.
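The comparison step can be sketched as below. The text does not name the target algorithm, so a simple two-sample averaging split (`split_low_high`) is used purely as a stand-in, and energy is again assumed to be the sum of squared samples. All function names here are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Decomposer = Callable[[Sequence[float]], Tuple[List[float], List[float]]]


def energy(signal: Sequence[float]) -> float:
    """Energy as the sum of squared samples (an assumed definition)."""
    return sum(x * x for x in signal)


def split_low_high(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Placeholder decomposition: smoothed part and residual.

    This stands in for the unspecified target algorithm.
    """
    if not signal:
        return [], []
    low = [signal[0]] + [
        (signal[i] + signal[i - 1]) / 2 for i in range(1, len(signal))
    ]
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def identify_echo(
    signal: Sequence[float],
    target_energy: float,
    decompose: Decomposer = split_low_high,
) -> Tuple[List[float], List[float]]:
    """Return (echo, other), where echo is the sub-signal whose energy
    is closer to target_energy."""
    first, second = decompose(signal)
    third_energy = energy(first)
    fourth_energy = energy(second)
    if abs(third_energy - target_energy) <= abs(fourth_energy - target_energy):
        return first, second
    return second, first
```

How the external noise signal is then derived from the echo signal is left open here, since the text does not specify it.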
In an exemplary embodiment, the control module is further configured to: when every device in the second group of devices has a noise reduction signal, determine the target amplitude of the noise reduction signal corresponding to each device in the second group of devices to obtain a plurality of target amplitudes corresponding to the second group of devices; and sort the target amplitudes in descending order, select the device with the largest target amplitude as the response device, and use the response device as the target device determined from the second group of devices, so as to interact with the target object that emitted the first wake-up audio.
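A minimal sketch of this selection follows, assuming the target amplitude of a noise reduction signal is its peak absolute sample value (the text does not define it). The function name `select_response_device` and the device identifiers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def select_response_device(
    noise_reduced: Dict[str, Sequence[float]],
) -> Tuple[str, List[str]]:
    """Rank devices by amplitude and return (response_device, others)."""
    if not noise_reduced or any(len(s) == 0 for s in noise_reduced.values()):
        # The selection only applies when every device has a signal.
        raise ValueError("every device needs a non-empty noise reduction signal")
    # Assumption: target amplitude = peak absolute sample value.
    amplitudes = {dev: max(abs(x) for x in sig) for dev, sig in noise_reduced.items()}
    ranked = sorted(amplitudes, key=amplitudes.get, reverse=True)
    return ranked[0], ranked[1:]
```

The first element of the ranking is the response device; the remaining devices are the ones that would be muted.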
Embodiments of the present application also provide a storage medium including a stored program, where the program performs any one of the methods described above when executed.
Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1, when a first group of devices receives a first wake-up audio, acquiring the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquiring feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up an interactive function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the received first wake-up audio;
S2, determining, from the first group of devices according to the first group of feedback information, the devices that have woken up the interactive function to obtain a second group of devices, and determining, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals;
S3, determining a target noise cancellation mode from a preset group of noise cancellation modes according to the number of devices in the second group of devices;
S4, performing noise cancellation processing on the second group of original signals using the target noise cancellation mode to obtain a group of noise reduction signals;
S5, determining a target device in the second group of devices according to the group of noise reduction signals, controlling the target device to play a second audio corresponding to the first wake-up audio, and controlling the devices in the second group of devices other than the target device to be muted.
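Steps S1 to S5 can be sketched end to end as follows. This is an illustration under stated assumptions, not the claimed implementation: the `DeviceReport` structure, the threshold value of 3, the mode labels, and the caller-supplied `denoisers` callables are all hypothetical, and the target device is picked by peak absolute amplitude.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Denoiser = Callable[[Sequence[float]], List[float]]


@dataclass
class DeviceReport:
    """S1 output for one device: its raw signal and wake-up feedback."""
    device_id: str
    raw_signal: List[float]
    woke_up: bool


def choose_mode(device_count: int, first_threshold: int = 3) -> str:
    """S3: pick a mode from the device count (threshold value assumed)."""
    return "beamforming" if device_count >= first_threshold else "source_separation"


def run_wakeup_round(
    reports: Sequence[DeviceReport],
    denoisers: Dict[str, Denoiser],
    first_threshold: int = 3,
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (mode, target_device, muted_devices)."""
    # S2: keep only devices whose feedback says the interaction woke up.
    awake = [r for r in reports if r.woke_up]
    if not awake:
        return None, None, []
    # S3: choose the noise cancellation mode by group size.
    mode = choose_mode(len(awake), first_threshold)
    # S4: apply the chosen mode to each raw signal.
    denoised = {r.device_id: denoisers[mode](r.raw_signal) for r in awake}
    # S5: the device with the largest peak amplitude answers; the rest mute.
    target = max(
        denoised,
        key=lambda dev: max((abs(x) for x in denoised[dev]), default=0.0),
    )
    muted = [dev for dev in denoised if dev != target]
    return mode, target, muted
```

Passing the denoisers in as callables keeps the sketch independent of how either noise cancellation mode is actually realized.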
Embodiments of the present application further provide an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform the following steps by means of a computer program:
S1, when a first group of devices receives a first wake-up audio, acquiring the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquiring feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up an interactive function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the received first wake-up audio;
S2, determining, from the first group of devices according to the first group of feedback information, the devices that have woken up the interactive function to obtain a second group of devices, and determining, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals;
S3, determining a target noise cancellation mode from a preset group of noise cancellation modes according to the number of devices in the second group of devices;
S4, performing noise cancellation processing on the second group of original signals using the target noise cancellation mode to obtain a group of noise reduction signals;
S5, determining a target device in the second group of devices according to the group of noise reduction signals, controlling the target device to play a second audio corresponding to the first wake-up audio, and controlling the devices in the second group of devices other than the target device to be muted.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Optionally, for specific examples of this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented on a general-purpose computing device, either centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here. They may also be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing is only a preferred embodiment of the present application and it should be noted that, as will be apparent to those skilled in the art, numerous modifications and adaptations can be made without departing from the principles of the present application and such modifications and adaptations are intended to be considered within the scope of the present application.

Claims (10)

1. A distributed voice wake-up method, comprising:
when a first group of devices receives a first wake-up audio, acquiring the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquiring feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up an interactive function in response to the first wake-up audio, and the original signal is the audio signal into which the device converts the received first wake-up audio;
determining, from the first group of devices according to the first group of feedback information, the devices that have woken up the interactive function to obtain a second group of devices, and determining, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals;
determining a target noise cancellation mode from a preset group of noise cancellation modes according to the number of devices in the second group of devices;
performing noise cancellation processing on the second group of original signals using the target noise cancellation mode to obtain a group of noise reduction signals;
and determining a target device in the second group of devices according to the group of noise reduction signals, controlling the target device to play a second audio corresponding to the first wake-up audio, and controlling the devices in the second group of devices other than the target device to be muted.
2. The method of claim 1, wherein determining a target noise cancellation mode from a preset group of noise cancellation modes according to the number of devices in the second group of devices comprises:
determining a first noise cancellation mode from the set of noise cancellation modes when the number of devices in the second set of devices is greater than or equal to a first preset threshold, wherein the first noise cancellation mode is used for filtering out self-noise signals from the second set of original signals through a preset first adaptive filter and filtering out external noise signals from the second set of original signals through a second adaptive filter, and the second adaptive filter is a filter generated according to beam forming between the second set of devices;
and determining a second noise elimination mode from the group of noise elimination modes when the number of the devices in the second group of devices is smaller than the first preset threshold, wherein the second noise elimination mode is used for filtering self-noise signals from the second group of original signals through the first adaptive filter and filtering external noise signals determined by sound source separation between the second group of devices from the second group of original signals.
3. The method of claim 2, wherein after said determining a first noise cancellation mode from said set of noise cancellation modes, said method further comprises:
computing a first energy value of the target noise reduction signal corresponding to the second group of devices after processing in the first noise cancellation mode; determining a second energy value corresponding to the external noise signal filtered out of the second group of original signals by the first noise cancellation mode;
determining that the target object and the second group of devices are at the same angle when the difference between the first energy value and the second energy value is lower than a second preset threshold;
and adding an estimated signal to the target's second group of original signals processed in the first noise cancellation mode, wherein the estimated signal is a preset signal for balancing signal cancellation.
4. The method of claim 2, wherein in the event that the number of devices in the second set of devices is greater than or equal to a first preset threshold, prior to determining a first noise cancellation mode from the set of noise cancellation modes, the method further comprises:
determining the position information of each device in the second group of devices in the target area to obtain a group of position information;
determining a relative position between each two devices in the second set of devices from the set of position information;
determining a second adaptive filter for each device in the second set of devices based on the relative position.
5. The method of claim 4, wherein determining the relative position between each two devices in the second set of devices from the set of position information comprises:
each device in the second group of devices successively enters a calibration mode, and the relative direction between each device and other devices is determined according to the group of position information;
performing beam forming based on the relative direction to obtain a first estimated external noise and a second estimated external noise between every two devices in the second group of devices;
and under the condition that the first estimated external noise and the second estimated external noise are the same, determining the relative position between every two devices in the second group of devices.
6. The method of claim 2, wherein in the event that the number of devices in the second set of devices is less than the first preset threshold, prior to determining a second noise cancellation mode from the set of noise cancellation modes, the method further comprises:
decomposing the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm;
calculating a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal;
and determining the sub-signal whose energy value, among the third energy value and the fourth energy value, is closer to a target energy value as the echo signal corresponding to the second group of original signals, and determining the external noise signal to be filtered out of the second group of original signals based on the echo signal.
7. The method of claim 1, wherein determining a target device in the second set of devices based on the set of noise reduction signals comprises:
when every device in the second group of devices has a noise reduction signal, determining the target amplitude of the noise reduction signal corresponding to each device in the second group of devices to obtain a plurality of target amplitudes corresponding to the second group of devices;
and sorting the target amplitudes in descending order, selecting the device with the largest target amplitude as the response device, and using the response device as the target device determined from the second group of devices, so as to interact with the target object that emitted the first wake-up audio.
8. A distributed voice wake-up apparatus, comprising:
an obtaining module, configured to, when it is determined that a first group of devices receives a first wake-up audio, obtain an original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals in common, and obtain feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information in common, where the first group of devices are devices in the same network, the feedback information is used to indicate whether a corresponding device in the first group of devices wakes up an interactive function in response to the first wake-up audio, and the original signal is an audio signal converted after the first wake-up audio received by the device;
a first determining module, configured to determine, according to the first group of feedback information, devices that have wakened up the interactive function from the first group of devices to obtain a second group of devices in total, and determine, from the first group of original signals, original signals generated by the second group of devices to obtain a second group of original signals in total;
a second determining module, configured to determine a target noise cancellation mode from a preset group of noise cancellation modes according to the number of devices in the second group of devices;
a processing module, configured to perform noise cancellation processing on the second group of original signals using the target noise cancellation mode to obtain a group of noise reduction signals;
and a control module, configured to determine a target device in the second group of devices according to the group of noise reduction signals, control the target device to play a second audio corresponding to the first wake-up audio, and control the devices in the second group of devices other than the target device to be muted.
9. A computer-readable storage medium, comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 7 by means of the computer program.
CN202210603410.XA 2022-05-30 2022-05-30 Distributed voice awakening method and device, storage medium and electronic device Active CN115171703B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210603410.XA CN115171703B (en) 2022-05-30 2022-05-30 Distributed voice awakening method and device, storage medium and electronic device
PCT/CN2023/085259 WO2023231552A1 (en) 2022-05-30 2023-03-30 Distributed voice wake-up method and apparatus, storage medium, and electronic apparatus

Publications (2)

Publication Number Publication Date
CN115171703A true CN115171703A (en) 2022-10-11
CN115171703B CN115171703B (en) 2024-05-24

Family

ID=83483084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210603410.XA Active CN115171703B (en) 2022-05-30 2022-05-30 Distributed voice awakening method and device, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN115171703B (en)
WO (1) WO2023231552A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023231552A1 (en) * 2022-05-30 2023-12-07 青岛海尔科技有限公司 Distributed voice wake-up method and apparatus, storage medium, and electronic apparatus

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101325604A (en) * 2008-07-21 2008-12-17 重庆邮电大学 Energy-saving method for distributed self-adaption industry wireless network
US20140006825A1 (en) * 2012-06-30 2014-01-02 David Shenhav Systems and methods to wake up a device from a power conservation state
WO2018032954A1 (en) * 2016-08-16 2018-02-22 华为技术有限公司 Method and device for waking up wireless device
CN108877827A (en) * 2017-05-15 2018-11-23 福州瑞芯微电子股份有限公司 Voice-enhanced interaction method and system, storage medium and electronic equipment
CN109669775A (en) * 2018-12-10 2019-04-23 平安科技(深圳)有限公司 Distributed task dispatching method, system and storage medium
CN110211599A (en) * 2019-06-03 2019-09-06 Oppo广东移动通信有限公司 Using awakening method, device, storage medium and electronic equipment
CN111696562A (en) * 2020-04-29 2020-09-22 华为技术有限公司 Voice wake-up method, device and storage medium
CN111880855A (en) * 2020-07-31 2020-11-03 宁波奥克斯电气股份有限公司 Equipment control method and distributed voice system
CN112185388A (en) * 2020-09-14 2021-01-05 北京小米松果电子有限公司 Speech recognition method, device, equipment and computer readable storage medium
US20210233524A1 (en) * 2020-01-23 2021-07-29 International Business Machines Corporation Placing a voice response system into a forced sleep state

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020218645A1 (en) * 2019-04-25 2020-10-29 엘지전자 주식회사 Method and device for searching for smart voice enabled device
CN110288997B (en) * 2019-07-22 2021-04-16 苏州思必驰信息科技有限公司 Device wake-up method and system for acoustic networking
CN111640431B (en) * 2020-04-30 2023-10-27 海尔优家智能科技(北京)有限公司 Equipment response processing method and device
CN112634922A (en) * 2020-11-30 2021-04-09 星络智能科技有限公司 Voice signal processing method, apparatus and computer readable storage medium
CN113593548B (en) * 2021-06-29 2023-12-19 青岛海尔科技有限公司 Method and device for waking up intelligent equipment, storage medium and electronic device
CN114420094A (en) * 2021-12-13 2022-04-29 北京声智科技有限公司 Cross-device wake-up method, device, equipment and storage medium
CN115171703B (en) * 2022-05-30 2024-05-24 青岛海尔科技有限公司 Distributed voice awakening method and device, storage medium and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Yaming: "Distributed optical fiber signal denoising method based on wavelet packet", Laser Journal (激光杂志), no. 10, 25 October 2018 (2018-10-25) *

Also Published As

Publication number Publication date
WO2023231552A1 (en) 2023-12-07
CN115171703B (en) 2024-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant