CN113488031A

CN113488031A - Method and device for determining electronic equipment, storage medium and electronic device

Info

Publication number: CN113488031A
Application number: CN202110742317.2A
Authority: CN
Inventors: 刘建国; 栾天祥; 赵培
Original assignee: Qingdao Haier Technology Co Ltd; Haier Smart Home Co Ltd
Current assignee: Qingdao Haier Technology Co Ltd; Haier Smart Home Co Ltd
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2021-10-08
Anticipated expiration: 2041-06-30
Also published as: CN113488031B

Abstract

The invention discloses a method and a device for determining electronic equipment, a storage medium and an electronic device. The method comprises: acquiring voice signals collected by a plurality of electronic devices, wherein each electronic device comprises at least one microphone array; determining, based on the voice signal collected by each electronic device, a reverberation energy ratio corresponding to that voice signal, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signal collected by the electronic device; and determining a target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices. The invention solves the technical problem that prior-art distributed wake-up methods, which rely on dereverberation and noise reduction to suppress environmental effects on distance estimation, require a large amount of computation, perform poorly and have low practical application value.

Description

Method and device for determining electronic equipment, storage medium and electronic device

Technical Field

The invention relates to the field of Internet of things, in particular to a method and device for determining electronic equipment, a storage medium and an electronic device.

Background

Distributed wake-up concerns the problem that arises when multiple AI voice devices are deployed in the same local space: a single voice command can easily trigger several devices at once. Especially in a home environment, when an AI voice device is woken up by voice and several voice devices respond simultaneously, a 'one call, many responses' phenomenon may occur, so that the user cannot achieve the intended operation.

At present, distributed wake-up solutions have been introduced to solve the problem of waking up a plurality of AI voice devices simultaneously. A common solution calculates the energy of the voice signal that each device captures during the wake-up period and compares the energies: the larger the energy, the closer the device is considered to be to the speaker, so that device should be woken up preferentially. This method does not work accurately in a highly reverberant space because it ignores the influence of reverberation on the energy calculation, so the error in judging distance directly from voice energy is extremely large. Known distributed wake-up schemes still find it difficult to handle reverberation robustly when estimating the distance of a speaker, because they usually suppress environmental effects on the distance estimate with traditional dereverberation and noise reduction. In a real scenario, the limited computing resources of the hardware can hardly accommodate computationally heavy reverberation estimation and noise reduction, and the processing is also required not to distort the distance estimate of the sound source. These requirements greatly limit the practical application value of distributed wake-up methods based on dereverberation and noise reduction.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiments of the invention provide a method and a device for determining electronic equipment, a storage medium and an electronic device, so as to at least solve the technical problem that prior-art distributed wake-up methods, which rely on dereverberation and noise reduction to suppress environmental effects on distance estimation, require a large amount of computation, perform poorly and have low practical application value.

According to an aspect of an embodiment of the present invention, there is provided a method of determining an electronic device, the method including: acquiring voice signals acquired by a plurality of electronic devices, wherein each electronic device comprises at least one microphone array; determining a reverberation energy ratio corresponding to the voice signal acquired by each electronic device based on the voice signal acquired by each electronic device, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signal acquired by the electronic device; and determining the target device from the plurality of electronic devices according to the reverberation energy ratio of the plurality of electronic devices.

In one exemplary embodiment, determining a target device from a plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices includes: determining the electronic device with the smallest reverberation energy ratio among the plurality of electronic devices as the target device.

In an exemplary embodiment, determining a reverberation energy ratio corresponding to the voice signal collected by each electronic device based on the voice signal collected by each electronic device includes: determining a frequency domain signal corresponding to the voice signal based on the voice signal collected by the microphone of each electronic device; calculating estimation vectors of direct energy components and reverberation energy components of the frequency domain signals of each electronic device at a plurality of frequency points, wherein the estimation vectors are used for representing the transposition of the direct energy components and the reverberation energy components after splicing; acquiring a plurality of direct energy components on a plurality of preset frequency points and a plurality of reverberation energy components on a plurality of preset frequency points based on the estimation vector; and determining the ratio of the sum of the plurality of reverberation energy components to the sum of the plurality of direct energy components as the reverberation energy ratio of the electronic equipment.

In one exemplary embodiment, calculating an estimation vector of direct energy components and reverberant energy components of the frequency domain signal of each electronic device at a plurality of frequency points comprises: determining cross-correlation parameters among microphone arrays of each electronic device, audio correlation coefficients and noise correlation coefficients among the microphone arrays; and determining an estimation vector according to the cross-correlation parameters, the audio correlation coefficient and the noise correlation coefficient among the microphone arrays.

In an exemplary embodiment, determining an estimation vector according to the cross-correlation parameter, the audio correlation coefficient and the noise correlation coefficient between microphone arrays includes: determining a correlation coefficient matrix according to the audio correlation coefficient and the noise correlation coefficient; acquiring a preset weight matrix; and determining an estimation vector according to the cross-correlation parameter, the weight matrix and the correlation coefficient matrix.

In one exemplary embodiment, determining cross-correlation parameters between microphone arrays of each electronic device includes: sampling the frequency domain signal of each microphone at a preset frequency point to obtain sampling signals corresponding to the preset frequency point at a plurality of moments; forming a sampling signal sequence based on the sampling signal corresponding to each microphone; and forming a cross-correlation parameter between every two microphones based on the sampling signal sequence corresponding to each microphone and the conjugate of the sampling signal sequence.
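The pairwise step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text says the cross-correlation parameter is formed from each microphone's sampling signal sequence and the conjugate of a sampling signal sequence, but it does not give a normalisation, so averaging over the sampled moments is an assumption here. The function names and the sample values are hypothetical.

```python
from itertools import combinations_with_replacement


def cross_correlation(seq_i, seq_j):
    """Average of x_i(t) * conj(x_j(t)) over the sampled moments (assumed normalisation)."""
    if len(seq_i) != len(seq_j) or not seq_i:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a * b.conjugate() for a, b in zip(seq_i, seq_j)) / len(seq_i)


def pairwise_cross_correlations(sequences):
    """Map each microphone pair (i, j) with i <= j to its cross-correlation value."""
    return {
        (i, j): cross_correlation(sequences[i], sequences[j])
        for i, j in combinations_with_replacement(range(len(sequences)), 2)
    }


# Hypothetical samples of one preset frequency point at three moments, two microphones.
mic_sequences = [
    [1 + 1j, 2 + 0j, 0 - 1j],
    [1 - 1j, 0 + 2j, 1 + 0j],
]
phi = pairwise_cross_correlations(mic_sequences)
```

The dictionary holds one value per microphone pair, including each microphone paired with itself, which is simply that microphone's average energy at the frequency point.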

In an exemplary embodiment, the method further includes: detecting whether voice information corresponding to the voice signal is preset voice information or not, wherein the preset voice information is used for triggering an alarm; and sending an alarm signal under the condition that the voice information corresponding to the voice signal is determined to be the preset voice information.

In an exemplary embodiment, after determining the target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices, the method further includes: sending a response instruction to the target device so that the target device responds to the voice signal according to the response instruction.

According to another aspect of the embodiments of the present invention, there is also provided a method for determining an electronic device, the method including: acquiring voice signals collected by a plurality of microphones; determining a reverberation energy ratio corresponding to the voice signals based on the voice signals collected by the plurality of microphones, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signals collected by the electronic device; and sending the reverberation energy ratio to a server, wherein the server receives a plurality of reverberation energy ratios sent by a plurality of electronic devices and determines a target device from the plurality of electronic devices according to the plurality of reverberation energy ratios.

According to another aspect of the embodiments of the present invention, there is also provided an apparatus for determining an electronic device, the apparatus including: an acquisition module, configured to acquire voice signals collected by a plurality of electronic devices, wherein each electronic device comprises at least one microphone array; a first determining module, configured to determine, based on the voice signal collected by each electronic device, a reverberation energy ratio corresponding to that voice signal, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signal collected by the electronic device; and a second determining module, configured to determine a target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices.

According to another aspect of the embodiments of the present invention, there is also provided an apparatus for determining an electronic device, the apparatus including: an acquisition module, configured to acquire voice signals collected by a plurality of microphones; a determining module, configured to determine a reverberation energy ratio corresponding to the voice signals based on the voice signals collected by the plurality of microphones, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signals collected by the electronic device; and a sending module, configured to send the reverberation energy ratio to a server, wherein the server receives the reverberation energy ratios sent by a plurality of electronic devices and determines a target device from the plurality of electronic devices according to the reverberation energy ratios.

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above method for determining an electronic device when running.

According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the method for determining an electronic device through the computer program.

In an embodiment of the present invention, the method includes acquiring voice signals collected by a plurality of electronic devices, each electronic device including at least one microphone array; determining, based on the voice signal collected by each electronic device, a reverberation energy ratio corresponding to that voice signal, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signal collected by the electronic device; and determining a target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices. With this scheme, when an indoor AI voice device is woken up, the reverberation energy ratio algorithm greatly improves the accuracy of distributed wake-up under reverberant conditions. The method also requires little computation, preserves the distance-related characteristics of the voice signal, and is robust to environmental influences. It thereby solves the technical problem that prior-art distributed wake-up methods, which rely on dereverberation and noise reduction to suppress environmental effects on distance estimation, require a large amount of computation, perform poorly and have low practical application value.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram of a hardware configuration of a computer terminal of a method of determining an electronic device according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method of determining an electronic device according to an embodiment of the present application;

FIG. 3 is a flow chart of another method of determining an electronic device according to an embodiment of the present application;

FIG. 4 is a flow chart of an alternative method of determining an electronic device according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an alternative reverberation energy component of a speech signal according to an embodiment of the application;

FIG. 6 is a schematic diagram of an apparatus for determining an electronic device according to an embodiment of the present application;

FIG. 7 is a schematic diagram of another apparatus for determining an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The method provided by the embodiment of the application can be executed in a computer terminal or a similar computing device. Taking execution on a computer terminal as an example, FIG. 1 is a hardware structure block diagram of a computer terminal for the method of determining an electronic device according to an embodiment of the present invention. As shown in FIG. 1, the computer terminal may include one or more (only one shown in FIG. 1) processors 102 (the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and in an exemplary embodiment, may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the computer terminal. For example, the computer terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration with equivalent functionality to that shown in FIG. 1 or with more functionality than that shown in FIG. 1.

The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the method of determining an electronic device in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, thereby implementing the above-mentioned method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.

In this embodiment, a method for determining an electronic device is provided and applied to the computer terminal described above. FIG. 2 is a flowchart of a method for determining an electronic device according to an embodiment of the present invention. The method in this embodiment is executed by a central control device, which may be one of the indoor smart home devices or an intelligent voice terminal device.

As shown in fig. 2, the method comprises the steps of:

S202, acquiring voice signals collected by a plurality of electronic devices, each electronic device including at least one microphone array.

Each of the electronic devices may be a terminal device, a home device, an intelligent voice terminal device or the like that has a microphone array, for example a mobile phone, a computer, an air purifier, a refrigerator, a television, an AI speaker or an oven with a microphone array. The microphone array of an electronic device may contain a plurality of microphones, each of which can collect the voice signal, and each electronic device can act on the voice signal it receives. For example, one of the electronic devices may be selected as the central control device, so that all the electronic devices in a room can be controlled through that one device. A terminal device can also serve as the central control device, and all indoor electronic devices can then be controlled through the terminal device.

In an alternative embodiment, the central control device is one of the electronic devices, for example electronic device A, and the voice signal is a wake-up word. After the user wakes up the electronic devices with the wake-up word, electronic device A can obtain from each electronic device the multi-channel data spanning the duration of the wake-up word. Each electronic device uses a microphone array: for example, if electronic device B has 4 microphones it provides 4 channels of data, and if electronic device C has 6 microphones it provides 6 channels of data. Alternatively, a terminal device such as a mobile phone or a PC may serve as the central control device for all the electronic devices and acquire the voice signal collected by the microphones of each electronic device.

S204, determining a reverberation energy ratio corresponding to the voice signal collected by each electronic device based on the voice signal collected by each electronic device, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signal collected by the electronic device.

The voice signal may contain a reverberation energy component and a direct energy component. The reverberation energy component is the energy carried by the voice signal from the sound source that reaches the device after being reflected by other objects; the direct energy component is the energy carried by the direct sound emitted from the sound source. The direct sound directly reflects the distance between the sound source and the electronic device, whereas the reverberation energy component is caused entirely by environmental factors. The two parts therefore need to be considered separately when estimating distance, and the reverberation energy ratio of the electronic device is obtained so that the distance can be judged more reliably.

In an optional implementation manner, the central control device is an intelligent voice terminal device, the sound source is a user, and the electronic device is an electronic device A placed in a corner of the room. The intelligent voice terminal device receives the voice signal collected by the microphones of electronic device A, obtains the reverberation energy ratio of device A from the voice signal, and determines the distance of the user from the device according to the reverberation energy ratio. Because electronic device A is placed in a corner, the energy carried by the user's voice signal may strike and be reflected by the walls around electronic device A. This wall effect increases the reverberation energy component, so reverberation energy makes up a large proportion of the voice signal collected by electronic device A.

S206, determining a target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices.

The target device is a device which is controlled by the user through the central control device to realize corresponding functions, and the reverberation energy ratio of the target device is related to the distance of the sound source and the environment around the target device.

In an optional implementation manner, the central control device is an intelligent terminal device, the target device is an electronic device a, and the intelligent terminal device determines that the electronic device a is the target device by comparing the reverberation energy ratios of all indoor electronic devices and taking the magnitude of the reverberation energy ratios as a basis.

As can be seen from the above, in the embodiment of the present application, the method includes acquiring voice signals collected by a plurality of electronic devices, where each electronic device includes at least one microphone array; determining, based on the voice signal collected by each electronic device, a reverberation energy ratio corresponding to that voice signal, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signal collected by the electronic device; and determining a target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices. With this scheme, when an indoor AI voice device is woken up, the reverberation energy ratio algorithm greatly improves the accuracy of distributed wake-up under reverberant conditions. The method also requires little computation, preserves the distance-related characteristics of the voice signal, and is robust to environmental influences. It thereby solves the technical problem that prior-art distributed wake-up methods, which rely on dereverberation and noise reduction to suppress environmental effects on distance estimation, require a large amount of computation, perform poorly and have low practical application value.

The reverberation energy ratio is determined by the reverberation energy component and the direct energy component. The reverberation energy component refers to the energy carried by the voice signal emitted by the user that reaches the target device after being reflected by other objects; when the reverberation energy component is too large, it interferes with the voice signal. The target device is the device that the user controls through the central control device.

In an alternative embodiment, the central control device calculates a reverberation energy component and a direct energy component carried by the voice signal of each electronic device based on the voice signal collected by the microphone of each electronic device, where the plurality of electronic devices in the room are electronic device a, electronic device B, and electronic device C, respectively. Through calculation, it is known that the reverberation energy of the electronic device a accounts for twenty-five percent, the reverberation energy of the electronic device B accounts for twenty percent, and the reverberation energy of the electronic device C accounts for ten percent, so that the electronic device C is a target device of the plurality of electronic devices, the electronic device C responds to the voice signal, and a user can control the electronic device C to perform corresponding operations through the central control device.
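The selection rule used in this example (the device with the smallest reverberation energy ratio becomes the target) can be sketched as follows. The function name `select_target` and the dictionary layout are illustrative assumptions, not part of the described method; the three values are the percentages quoted above.

```python
def select_target(ratios):
    """Return the device whose reverberation energy ratio is smallest."""
    if not ratios:
        raise ValueError("no device reported a reverberation energy ratio")
    return min(ratios, key=ratios.get)


# Ratios from the example above: A 25%, B 20%, C 10%.
reported = {"A": 0.25, "B": 0.20, "C": 0.10}
target = select_target(reported)
print(target)  # prints "C"
```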

Each of the electronic devices may have a plurality of microphones, each microphone corresponding to one channel of data. The voice signal at a preset frequency point is the voice signal collected by a microphone at a certain time. The frequency domain signal corresponding to the voice signal is determined from the voice signal collected by the microphones of each electronic device, and the estimation vectors of the direct energy components and reverberation energy components of the frequency domain signal of each electronic device at the plurality of frequency points are calculated. Specifically, the period during which the microphones collect the voice signal is divided into a plurality of time points according to the number of microphones of each device, and each time point is one preset frequency point. A reverberation energy component and a direct energy component are obtained for each of the preset frequency points; the sum of all the reverberation energy components is taken as the reverberation energy component of the voice signal detected by the electronic device, and the sum of all the direct energy components is taken as its direct energy component. The ratio of the sum of the reverberation energy components to the sum of the direct energy components is then determined as the reverberation energy ratio of the device.

In an alternative embodiment, there are multiple electronic devices in the room, namely electronic device A, electronic device B, and electronic device C. Electronic device A has 4 microphones and 4 preset frequency points, wherein the reverberation energy component of preset frequency point A1 is 10, and the direct energy component is 75; the reverberation energy component of preset frequency point A2 is 8, and the direct energy component is 81; the reverberation energy component of preset frequency point A3 is 5, and the direct energy component is 90; the reverberation energy component of preset frequency point A4 is 17, and the direct energy component is 77. The sum of the reverberation energy components of electronic device A is 10 + 8 + 5 + 17 = 40, the sum of the direct energy components is 75 + 81 + 90 + 77 = 323, and the reverberation energy ratio of electronic device A is 40/323 ≈ 12.4%. Electronic device B has 3 microphones and 3 preset frequency points, wherein the reverberation energy component of preset frequency point B1 is 7, and the direct energy component is 65; the reverberation energy component of preset frequency point B2 is 3, and the direct energy component is 70; the reverberation energy component of preset frequency point B3 is 15, and the direct energy component is 44. The sum of the reverberation energy components of electronic device B is 7 + 3 + 15 = 25, the sum of the direct energy components is 65 + 70 + 44 = 179, and the reverberation energy ratio of electronic device B is 25/179 ≈ 14.0%. Electronic device C has 2 microphones and 2 preset frequency points, wherein the reverberation energy component of preset frequency point C1 is 25, and the direct energy component is 77; the reverberation energy component of preset frequency point C2 is 22, and the direct energy component is 80.
The sum of the reverberation energy components of electronic device C is 25 + 22 = 47, the sum of the direct energy components is 77 + 80 = 157, and the reverberation energy ratio of electronic device C is 47/157 ≈ 29.9%. Comparing the reverberation energy ratios of electronic device A, electronic device B, and electronic device C and selecting the smallest, electronic device A can be determined as the target device.
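The arithmetic of this worked example can be checked with a short Python sketch. The per-frequency-point values are the ones listed above; the function name `reverberation_share` and the data layout are illustrative assumptions.

```python
def reverberation_share(reverb_components, direct_components):
    """Ratio of summed reverberation energy to summed direct energy."""
    return sum(reverb_components) / sum(direct_components)


# (reverberation components, direct components) per preset frequency point.
devices = {
    "A": ([10, 8, 5, 17], [75, 81, 90, 77]),
    "B": ([7, 3, 15], [65, 70, 44]),
    "C": ([25, 22], [77, 80]),
}

shares = {
    name: reverberation_share(reverb, direct)
    for name, (reverb, direct) in devices.items()
}

for name, share in shares.items():
    print(f"device {name}: {share:.1%}")
```

The three ratios come out at roughly 12.4%, 14.0% and 29.9% for devices A, B and C.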

In another alternative embodiment, the estimation vector represents the transpose formed after the direct energy component and the reverberation energy component are concatenated, and may accordingly be written as $\hat{\theta}(f)$, the estimate of the vector $\theta(f) = [P_D(f), P_R(f)]^T$, where $P_D(f)$ and $P_R(f)$ respectively denote the direct energy component and the reverberation energy component. The sum of the direct energy components is $\sum_f P_D(f)$ and the sum of the reverberation energy components is $\sum_f P_R(f)$, so the reverberation energy ratio is $R_{est} = 10\log_{10}\left(\sum_f P_R(f) / \sum_f P_D(f)\right)$, where $f$ denotes a frequency band.
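As a hedged illustration of the logarithmic form of the reverberation energy ratio, the following Python sketch evaluates ten times the base-10 logarithm of the summed reverberation energy divided by the summed direct energy. The function name, the input validation and the sample values are assumptions made for this example only.

```python
import math


def reverberation_ratio_db(direct, reverb):
    """Return 10*log10(sum of reverberation energy / sum of direct energy)."""
    if len(direct) != len(reverb):
        raise ValueError("direct and reverb must cover the same frequency bands")
    direct_sum = sum(direct)
    reverb_sum = sum(reverb)
    if direct_sum <= 0 or reverb_sum <= 0:
        raise ValueError("energy sums must be positive to take a logarithm")
    return 10.0 * math.log10(reverb_sum / direct_sum)


# Hypothetical per-band energies for one device (illustrative values only).
p_direct = [75.0, 81.0, 90.0, 77.0]
p_reverb = [10.0, 8.0, 5.0, 17.0]
r_est = reverberation_ratio_db(p_direct, p_reverb)
print(f"R_est = {r_est:.2f} dB")
```

On this scale a more negative value means that the direct sound dominates the reverberation, so comparing devices in decibels preserves the same ordering as comparing the plain ratios.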

The cross-correlation parameters between the microphone arrays of each electronic device may be $d_{11}(f), r_{11}(f)$; $d_{12}(f), r_{12}(f)$; …; $d_{MM}(f), r_{MM}(f)$. Between every two microphones of the same electronic device, the audio correlation coefficient is $d_{ij}(f)$ and the noise correlation coefficient is $r_{ij}(f)$, where $i$ and $j$ denote the $i$-th microphone and the $j$-th microphone in the same electronic device. The audio correlation coefficient $d_{ij}(f)$ can be calculated from the microphone parameters and the spatial relationship between the microphones; the noise correlation coefficient $r_{ij}(f)$ depends on the spatial noise field and is also relatively easy to determine in advance.

At each preset frequency point, the correlation coefficient matrix between the microphones of the same electronic device is $A(f)$. By calculating the correlation coefficient between each pair of microphones, the degree of correlation between any microphone and the other microphones in the same device can be determined, and the correlation coefficient matrix can be determined from the audio correlation coefficients $d_{ij}(f)$ and the noise correlation coefficients $r_{ij}(f)$. The preset weight matrix is $W$, which can be selected by global optimization over historical record data.

In an alternative embodiment, when the estimation vector is determined according to the cross-correlation parameters, the weight matrix and the correlation coefficient matrix, the estimation vector of the direct energy component and the reverberation energy component of the voice signal detected by the electronic device may be $\hat{\theta}(f) = (A^H W A)^{-1} A^H W z$. Here $(A^H W A)^{-1}$ is the inverse of the product of the conjugate transpose of the correlation coefficient matrix, the weight matrix and the correlation coefficient matrix, and $A^H W z$ is the product of the conjugate transpose of the correlation coefficient matrix, the weight matrix and the cross-correlation parameters.

As can be seen from the above, the cross-correlation parameters between the microphone arrays of each electronic device may be $d_{11}(f), r_{11}(f)$; $d_{12}(f), r_{12}(f)$; …; $d_{MM}(f), r_{MM}(f)$. Based on these parameters for every two microphones, the correlation coefficient matrix is obtained as $A(f) = [d_{11}(f), r_{11}(f);\ d_{12}(f), r_{12}(f);\ \ldots;\ d_{MM}(f), r_{MM}(f)]$, and the cross-correlation parameters between the microphones are obtained as $z = [R_{11}(f), \ldots, R_{MM}(f)]^T$.

Note that the estimation vector $\hat{\theta}(f) = (A^H W A)^{-1} A^H W z$ can be used to represent the estimates of the direct energy component $P_D(f)$ and the reverberation energy component $P_R(f)$ of each electronic device.
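The weighted least-squares estimate can be sketched in plain Python for a single frequency point. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the weight matrix is assumed to be diagonal, each row of the correlation coefficient matrix is taken to be one pair (audio correlation coefficient, noise correlation coefficient), and the function name `estimate_energy_components` and the numeric values are hypothetical.

```python
def estimate_energy_components(d, r, z, w=None):
    """Solve theta = (A^H W A)^-1 A^H W z for theta = [P_D, P_R].

    d, r: audio and noise correlation coefficients, one entry per microphone
          pair (the two columns of A).
    z:    measured cross-correlation parameters for the same pairs.
    w:    diagonal of the weight matrix W (assumed diagonal in this sketch;
          defaults to all ones).
    """
    n = len(z)
    if not (len(d) == len(r) == n):
        raise ValueError("d, r and z must have the same length")
    if w is None:
        w = [1.0] * n

    # Entries of the 2x2 matrix A^H W A.
    g11 = sum(wk * dk.conjugate() * dk for wk, dk in zip(w, d))
    g12 = sum(wk * dk.conjugate() * rk for wk, dk, rk in zip(w, d, r))
    g21 = sum(wk * rk.conjugate() * dk for wk, dk, rk in zip(w, d, r))
    g22 = sum(wk * rk.conjugate() * rk for wk, rk in zip(w, r))

    # Entries of the 2-vector A^H W z.
    b1 = sum(wk * dk.conjugate() * zk for wk, dk, zk in zip(w, d, z))
    b2 = sum(wk * rk.conjugate() * zk for wk, rk, zk in zip(w, r, z))

    det = g11 * g22 - g12 * g21
    if abs(det) < 1e-12:
        raise ValueError("A^H W A is singular; d and r are not independent")

    # Cramer's rule for the 2x2 system.
    p_direct = (b1 * g22 - g12 * b2) / det
    p_reverb = (g11 * b2 - g21 * b1) / det
    return p_direct, p_reverb


# Synthetic check with hypothetical coefficients for a 2-microphone device
# (pairs 11, 12, 21, 22) and known energies P_D = 3, P_R = 1.
d = [1 + 0j, 0.6 + 0.8j, 0.6 - 0.8j, 1 + 0j]
r = [1 + 0j, 0.3 + 0j, 0.3 + 0j, 1 + 0j]
z = [3.0 * dk + 1.0 * rk for dk, rk in zip(d, r)]
p_d, p_r = estimate_energy_components(d, r, z)
```

Because the synthetic cross-correlations are built exactly from the two columns, the solver recovers the assumed energies up to floating-point error. With measured data the result is the weighted least-squares fit, and the choice of weights determines how much each microphone pair contributes.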

After each electronic device wakes up, the central control device obtains multi-channel data corresponding to the duration of the voice message. Each electronic device uses a microphone array and may have a plurality of microphones, for example 4 microphones and therefore 4 channels of data. A fast Fourier transform is performed on the voice signal, and the sampling signal sequence formed by the resulting frequency domain signals is denoted $x(f,t) = [X^{(1)}(f,t), X^{(2)}(f,t), \ldots, X^{(M)}(f,t)]^T$, where $M$ is the number of channels, $f$ denotes a frequency band, $t$ denotes the observation time, and $t = 0, 1, \ldots, T-1$. For a frequency point $f$, the cross-correlation parameters between the microphones are computed from the sampling signal sequence of each microphone and its conjugate as $R(f) = E[x(f,t)\,x^H(f,t)]$, that is, the mathematical expectation of the product of the sampling signal sequence and its conjugate transpose. Since $x(f,t)$ is a multi-dimensional sampling signal sequence, the cross-correlation parameter between every pair of microphones can be obtained.
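A common way to approximate the expectation in the cross-correlation parameter is to average over the available observation times. The following Python sketch does this for one frequency point; replacing the expectation by a sample average is an assumption of this example, and the function name and input values are hypothetical.

```python
def cross_correlation_matrix(frames):
    """Estimate R(f) = E[x(f,t) x^H(f,t)] by averaging over T frames.

    frames: list of T snapshots; each snapshot is a list of M complex values,
            one per microphone channel, for a single frequency point f.
    Returns an M x M nested list whose (i, j) entry estimates R_ij(f).
    """
    if not frames:
        raise ValueError("at least one frame is required")
    m = len(frames[0])
    total = [[0j] * m for _ in range(m)]
    for x in frames:
        if len(x) != m:
            raise ValueError("all frames must have the same channel count")
        for i in range(m):
            for j in range(m):
                # Outer product x x^H: entry (i, j) is x_i * conj(x_j).
                total[i][j] += x[i] * x[j].conjugate()
    t = len(frames)
    return [[value / t for value in row] for row in total]


# Hypothetical 2-channel data at one frequency point over 3 frames.
frames = [[1 + 1j, 1 - 1j], [2 + 0j, 0 + 2j], [0 + 1j, 1 + 0j]]
R = cross_correlation_matrix(frames)
```

The resulting matrix is Hermitian, so entry (i, j) is the complex conjugate of entry (j, i), and its diagonal holds the average energy of each channel.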

The voice signal may be used to trigger an alarm signal of the device: after receiving the voice signal, the device starts an alarm task and sends out the alarm signal. A deep learning model can be used to learn the characteristics of the alarm sounds made by people or other animals in dangerous situations, which improves the accuracy of the alarm signal and avoids false alarms.

In an alternative embodiment, the preset voice signal may be an emergency keyword set by the user, such as "fire" or "gas is on". After discovering a fire at home, the user shouts "fire"; once the microphone of the device acquires the voice information corresponding to the voice signal "fire", the device sends out an alarm signal, sounding an alarm and loudly playing sounds such as "fire" and calls for help. This attracts the attention of nearby residents so that they can escape from the fire in time, ensuring personal safety.

In another optional implementation, the preset voice signal may be an emergency keyword set by the user, such as "help". When an intruder breaks into the house, the user shouts "help"; once the microphone of the device acquires the corresponding voice signal, the device sends out an alarm signal, sounding an alarm and loudly playing calls for help to attract the attention of nearby residents and deter the intruder. In this situation, to avoid provoking the intruder, the keyword that triggers the alarm can instead be set to a discreet word that is not easily noticed, so as to buy time and ensure the user's safety.
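The keyword check behind the alarm can be sketched as a simple text match on the recognized speech. This is a deliberately simplified Python sketch: it assumes that a speech recognizer has already produced text, the keyword list is hypothetical, and the function names `is_alarm_phrase` and `handle_voice` are not taken from the embodiment.

```python
def is_alarm_phrase(recognized_text, alarm_keywords):
    """Return True if the recognized speech contains any preset alarm keyword."""
    text = recognized_text.casefold()
    return any(keyword.casefold() in text for keyword in alarm_keywords)


# Hypothetical user-configured keywords (illustrative only).
ALARM_KEYWORDS = ("fire", "help")


def handle_voice(recognized_text, send_alarm):
    """Call send_alarm() when the recognized text matches a preset keyword."""
    if is_alarm_phrase(recognized_text, ALARM_KEYWORDS):
        send_alarm()
        return True
    return False
```

A plain substring match also fires on words that merely contain a keyword, so a practical system would likely need word-level matching or a trained detector to limit false alarms.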

In one exemplary embodiment, after determining the target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices, the method further includes: sending a response instruction to the target device so that the target device responds to the voice signal according to the response instruction.

After the target device responds to the voice signal, the user can send a response instruction through the terminal device and the like to control the target device to perform corresponding operation, so that the target device can respond to the voice signal according to the response instruction.

In an optional implementation manner, the central control device is a terminal device, the target device is an electronic device a, the terminal device compares reverberation energy ratios of all electronic devices in a room, and after the electronic device a is determined as the target device according to the size of the reverberation energy ratios, the terminal device sends a response instruction to the electronic device a, for example, plays music, and the electronic device a starts to play the music after receiving the response instruction.

In this embodiment, a method for determining an electronic device is provided and is applied to the computer terminal. Fig. 3 is a flowchart of a method for determining an electronic device according to an embodiment of the present invention. The execution subject in this embodiment of the application is a central control device with a server side; the central control device may be one of the indoor smart home devices or an intelligent voice terminal device.

As shown in fig. 3, the method comprises the steps of:

S302, acquiring voice signals collected by a plurality of microphones.

The microphones are part of or all of the microphones on the electronic device, wherein each of the microphones can collect the voice signal, and the electronic device can perform corresponding actions according to the voice signals collected by the microphones.

S304, determining a reverberation energy ratio corresponding to the voice signals based on the voice signals collected by the microphones, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signals collected by the electronic equipment.

S306, sending the reverberation energy ratio to a server, wherein the server receives the reverberation energy ratios sent by a plurality of electronic devices and determines the target device from the plurality of electronic devices according to the reverberation energy ratios.

In an optional implementation manner, the target device is an electronic device a, the server receives multiple reverberation energy ratios sent by multiple electronic devices, and the server compares the reverberation energy ratios of all indoor electronic devices and determines that the electronic device a is the target device according to the size of the reverberation energy ratio.

Fig. 4 is a flowchart of an alternative method for determining an electronic device according to an embodiment of the present invention, as shown in fig. 4, the specific steps are as follows:

S401, acquiring voice signals collected by a plurality of electronic devices;

S402, determining a frequency domain signal corresponding to the voice signal according to the collected voice signal;

S403, determining the cross-correlation parameters between the microphone arrays of each electronic device, and the audio correlation coefficients and noise correlation coefficients between the microphone arrays;

S404, determining a correlation coefficient matrix according to the audio correlation coefficients and the noise correlation coefficients;

S405, acquiring a preset weight matrix;

S406, sampling the frequency domain signal of each microphone at a preset frequency point to obtain sampling signals corresponding to the preset frequency point at a plurality of moments;

S407, forming a sampling signal sequence based on the sampling signals corresponding to each microphone;

S408, forming the cross-correlation parameter between every two microphones based on the sampling signal sequence corresponding to each microphone and the conjugate of the sampling signal sequence;

S409, determining an estimation vector according to the cross-correlation parameters, the weight matrix and the correlation coefficient matrix;

S410, acquiring a plurality of direct energy components at a plurality of preset frequency points and a plurality of reverberation energy components at the plurality of preset frequency points based on the estimation vector;

S411, determining the ratio of the sum of the plurality of reverberation energy components to the sum of the plurality of direct energy components as the reverberation energy ratio of the electronic device;

S412, determining the reverberation energy ratio corresponding to the voice signal collected by each electronic device;

S413, determining the electronic device with the smallest reverberation energy ratio among the plurality of electronic devices as the target device;

S414, sending a response instruction to the target device so that the target device responds to the voice signal according to the response instruction.
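The final two steps of this flow, selecting the device with the smallest reverberation energy ratio and sending it a response instruction, can be sketched as follows. This is a minimal Python sketch; the device identifiers, the ratio values, the callback `send_instruction` and both function names are hypothetical.

```python
def select_target_device(ratios):
    """Return the device id with the smallest reverberation energy ratio."""
    if not ratios:
        raise ValueError("no reverberation energy ratios were reported")
    return min(ratios, key=ratios.get)


def dispatch_response(ratios, send_instruction, instruction):
    """Pick the target device and send it a response instruction."""
    target = select_target_device(ratios)
    send_instruction(target, instruction)
    return target


# Hypothetical ratios reported by three devices (illustrative values only).
reported = {"speaker": 0.35, "tv": 0.18, "fridge": 0.52}
sent = []
target = dispatch_response(
    reported,
    lambda device, message: sent.append((device, message)),
    "play music",
)
```

Passing the transport in as a callback keeps the selection logic independent of how the central control device actually reaches the target device.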

Referring to Fig. 5, among the components of the voice signal, the direct sound corresponds to "Direct sound" in Fig. 5, the early reflection sound corresponds to "Early reflection" in Fig. 5, and the reverberation sound corresponds to "Reverberation" in Fig. 5. The direct sound is the part that directly reflects the distance between the speaker (sound source) and the device, whereas the reverberation sound is caused by environmental factors; the direct sound and the reverberation sound therefore need to be considered separately in distance estimation so that the distance can be estimated more robustly.

It should be noted that $H(\omega)$ in Fig. 5 denotes the frequency domain signal received by the target device, $H_D(\omega)$ denotes the frequency domain signal corresponding to the direct sound, and $H_R(\omega)$ denotes the frequency domain signal corresponding to the reverberation sound. When there is only direct sound and no reverberation sound, $H(\omega) = H_D(\omega)$; when reverberation sound is generated by environmental factors while the target device receives the voice signal, $H(\omega) = H_D(\omega) + H_R(\omega)$.
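The additive relation between the direct and reverberant parts follows from the linearity of the Fourier transform, which can be checked numerically. The Python sketch below splits a made-up impulse response in the time domain purely for illustration; this split, the sample values and the variable names are assumptions and do not describe how the embodiment separates the two components.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical impulse response: a strong first arrival and a decaying tail.
h = [1.0, 0.0, 0.0, 0.4, 0.25, 0.15, 0.1, 0.05]
direct_len = 1  # assumed number of samples attributed to the direct sound

h_direct = h[:direct_len] + [0.0] * (len(h) - direct_len)
h_reverb = [0.0] * direct_len + h[direct_len:]

H = dft(h)
H_D = dft(h_direct)
H_R = dft(h_reverb)

# The DFT is linear, so H equals H_D + H_R at every frequency bin.
max_error = max(abs(a - (b + c)) for a, b, c in zip(H, H_D, H_R))
```

The largest deviation `max_error` stays at floating-point rounding level, which confirms the additive relation for this example.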

Fig. 6 is a schematic diagram of an apparatus for determining an electronic device according to an embodiment of the present invention. As shown in Fig. 6, the apparatus includes:

an obtaining module 61, configured to obtain speech signals collected by a plurality of electronic devices, where each electronic device includes at least one microphone array;

the first determining module 62 is configured to determine, based on the voice signal acquired by each electronic device, a reverberation energy ratio corresponding to the voice signal acquired by each electronic device, where the reverberation energy ratio represents a relationship between a reverberation energy component and a direct energy component in the voice signal acquired by the electronic device;

and a second determining module 63, configured to determine the target device from the multiple electronic devices according to the reverberation energy ratios of the multiple electronic devices.

In an exemplary embodiment, the second determining module includes: the first determining submodule is used for determining the electronic equipment with the minimum reverberation energy ratio in the plurality of electronic equipment as target equipment.

In an exemplary embodiment, the first determining module includes:

the second determining submodule is used for determining a frequency domain signal corresponding to the voice signal based on the voice signal acquired by the microphone of each piece of electronic equipment;

the calculation module is used for calculating estimation vectors of the direct energy components and reverberation energy components of the frequency domain signals of each electronic device at a plurality of frequency points, wherein the estimation vectors represent the transpose formed after the direct energy components and the reverberation energy components are concatenated;

the first obtaining submodule is used for obtaining a plurality of direct energy components at a plurality of preset frequency points and a plurality of reverberation energy components at the plurality of preset frequency points based on the estimation vector;

and the third determining submodule is used for determining that the ratio of the sum of the multiple reverberation energy components to the sum of the multiple direct energy components is the reverberation energy ratio of the electronic equipment.

In one exemplary embodiment, the calculation module includes:

the fourth determining submodule is used for determining cross-correlation parameters among microphone arrays of each electronic device, audio correlation coefficients and noise correlation coefficients among the microphone arrays;

and the fifth determining submodule is used for determining an estimation vector according to the cross-correlation parameters, the audio correlation coefficients and the noise correlation coefficients among the microphone arrays.

In one exemplary embodiment, the fifth determination sub-module includes:

a sixth determining submodule, configured to determine a correlation coefficient matrix according to the audio correlation coefficient and the noise correlation coefficient;

the second acquisition module is used for acquiring a preset weight matrix;

and the seventh determining submodule is used for determining an estimation vector according to the cross-correlation parameter, the weight matrix and the correlation coefficient matrix.

In one exemplary embodiment, the fourth determining submodule, in determining the cross-correlation parameters between the microphone arrays of each electronic device, includes:

the sampling module is used for sampling the frequency domain signal of each microphone at a preset frequency point to obtain sampling signals corresponding to the preset frequency point at a plurality of moments;

the first forming module is used for forming a sampling signal sequence based on the sampling signal corresponding to each microphone;

and the second constructing module is used for constructing the cross-correlation parameter between every two microphones based on the sampling signal sequence corresponding to each microphone and the conjugate of the sampling signal sequence.

In an exemplary embodiment, the apparatus further includes:

the detection module is used for detecting whether the voice information corresponding to the voice signal is preset voice information or not, wherein the preset voice information is the voice information used for triggering alarm;

and the alarm module is used for sending out an alarm signal under the condition that the voice information corresponding to the voice signal is determined to be the preset voice information.

In an exemplary embodiment, the apparatus further includes, after the second determining module:

and the sending module is used for sending the response instruction to the target equipment so that the target equipment responds to the voice signal according to the response instruction.

Fig. 7 is a schematic diagram of another apparatus for determining an electronic device according to an embodiment of the present invention. As shown in Fig. 7, the apparatus includes:

an obtaining module 71, configured to obtain voice signals collected by multiple microphones;

the determining module 72 is configured to determine a reverberation energy ratio corresponding to the voice signal based on the voice signals acquired by the multiple microphones, where the reverberation energy ratio represents a relationship between a reverberation energy component and a direct energy component in the voice signal acquired by the electronic device;

the sending module 73 is configured to send the reverberation energy ratio to a server, where the server receives multiple reverberation energy ratios sent by multiple electronic devices, and determines a target device from the multiple electronic devices according to the multiple reverberation energy ratios.

An embodiment of the present invention further provides a storage medium including a stored program, wherein the program executes any one of the methods described above.

Alternatively, in the present embodiment, the storage medium may be configured to store program codes for performing the following steps:

S1: acquiring voice signals collected by a plurality of electronic devices, wherein each electronic device comprises at least one microphone array;

S2: determining a reverberation energy ratio corresponding to the voice signal collected by each electronic device based on the voice signal collected by each electronic device, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signal collected by the electronic device;

S3: determining the target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices.

Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of determining an electronic device, the method comprising:

acquiring voice signals acquired by a plurality of electronic devices, wherein each electronic device comprises at least one microphone array;

determining a reverberation energy ratio corresponding to the voice signal acquired by each electronic device based on the voice signal acquired by each electronic device, wherein the reverberation energy ratio represents a relation between a reverberation energy component and a direct energy component in the voice signal acquired by the electronic device;

determining a target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices.

2. The method of claim 1, wherein determining a target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices comprises:

and determining the electronic equipment with the minimum reverberation energy ratio in the plurality of electronic equipment as the target equipment.

3. The method of claim 1, wherein determining a reverberation energy ratio corresponding to the voice signal collected by each electronic device based on the voice signal collected by each electronic device comprises:

determining a frequency domain signal corresponding to the voice signal based on the voice signal collected by the microphone of each electronic device;

calculating estimation vectors of direct energy components and reverberation energy components of the frequency domain signals of each electronic device at a plurality of frequency points, wherein the estimation vectors are used for representing transposes after splicing of the direct energy components and the reverberation energy components;

acquiring a plurality of direct energy components on a plurality of preset frequency points and a plurality of reverberation energy components on the plurality of preset frequency points based on the estimation vector;

determining a ratio of a sum of the plurality of reverberation energy components to a sum of the plurality of direct energy components as a reverberation energy ratio of the electronic device.

4. The method of claim 3, wherein calculating an estimated vector of direct energy components and reverberant energy components of the frequency domain signal for each of the electronic devices at a plurality of frequency points comprises:

determining cross-correlation parameters among microphone arrays of each electronic device, audio correlation coefficients and noise correlation coefficients among the microphone arrays;

and determining the estimation vector according to the cross-correlation parameters, the audio correlation coefficients and the noise correlation coefficients among the microphone arrays.

5. The method of claim 4, wherein determining the estimation vector based on the cross-correlation parameter, audio correlation coefficients and noise correlation coefficients between the microphone arrays comprises:

determining a correlation coefficient matrix according to the audio correlation coefficient and the noise correlation coefficient;

acquiring a preset weight matrix;

and determining the estimation vector according to the cross-correlation parameter, the weight matrix and the correlation coefficient matrix.

6. The method of claim 4, wherein determining cross-correlation parameters between microphone arrays of each of the electronic devices comprises:

sampling the frequency domain signal of each microphone at the preset frequency point to obtain sampling signals corresponding to the preset frequency point at a plurality of moments;

forming a sampling signal sequence based on the sampling signal corresponding to each microphone;

and forming a cross-correlation parameter between every two microphones based on the sampling signal sequence corresponding to each microphone and the conjugate of the sampling signal sequence.

7. The method of claim 1, further comprising:

detecting whether voice information corresponding to the voice signal is preset voice information or not, wherein the preset voice information is used for triggering an alarm;

and sending an alarm signal under the condition that the voice information corresponding to the voice signal is determined to be preset voice information.

8. The method of claim 1, wherein after determining a target device from the plurality of electronic devices according to the reverberation energy ratios of the plurality of electronic devices, the method further comprises:

and sending a response instruction to the target equipment so that the target equipment responds to the voice signal according to the response instruction.

9. A method of determining an electronic device, the method comprising:

acquiring voice signals collected by a plurality of microphones;

determining a reverberation energy ratio corresponding to the voice signals based on the voice signals collected by a plurality of microphones, wherein the reverberation energy ratio represents the relationship between a reverberation energy component and a direct energy component in the voice signals collected by electronic equipment;

and sending the reverberation energy ratio to a server, wherein the server receives a plurality of reverberation energy ratios sent by a plurality of electronic devices, and determines a target device from the plurality of electronic devices according to the plurality of reverberation energy ratios.

10. An apparatus for determining an electronic device, the apparatus for determining an electronic device comprising:

an acquisition module, configured to acquire voice signals collected by a plurality of electronic devices, wherein each electronic device comprises at least one microphone array;

the first determining module is used for determining a reverberation energy ratio corresponding to the voice signal acquired by each electronic device based on the voice signal acquired by each electronic device, wherein the reverberation energy ratio represents a relation between a reverberation energy component and a direct energy component in the voice signal acquired by the electronic device;

a second determining module, configured to determine a target device from the multiple electronic devices according to the reverberation energy ratios of the multiple electronic devices.

11. An apparatus for determining an electronic device, the apparatus for determining an electronic device comprising:

the acquisition module is used for acquiring voice signals acquired by a plurality of microphones;

the determining module is used for determining a reverberation energy ratio corresponding to the voice signals based on the voice signals collected by the microphones, wherein the reverberation energy ratio represents the relation between a reverberation energy component and a direct energy component in the voice signals collected by the electronic equipment;

and the sending module is used for sending the reverberation energy ratio to a server, wherein the server receives a plurality of reverberation energy ratios sent by a plurality of electronic devices, and determines a target device from the plurality of electronic devices according to the plurality of reverberation energy ratios.

12. A computer-readable storage medium, comprising a stored program, wherein the program is operable to perform the method of any one of claims 1 to 9.

13. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 9 by means of the computer program.