CN107452398B - Echo acquisition method, electronic device and computer readable storage medium

Info

Publication number: CN107452398B
Application number: CN201710674519.1A
Authority: CN
Inventors: 王晓晖; 李彬; 文立夫
Original assignee: Shenzhen Skyworth Digital Technology Co Ltd
Current assignee: Shenzhen Xiaopai Technology Co.,Ltd.
Priority date: 2017-08-09
Filing date: 2017-08-09
Publication date: 2021-03-16
Anticipated expiration: 2037-08-09
Also published as: CN107452398A

Abstract

The invention provides an echo acquisition method, which comprises the following steps: when audio data are played through audio playing equipment, the audio data played by the audio playing equipment are obtained; acquiring a pre-stored frequency response function of the audio playing equipment; and calculating echo audio data of the audio playing equipment based on the audio data and the frequency response function. The invention also provides electronic equipment and a computer readable storage medium. The invention can reduce the difficulty of acquiring the echo.

Description

Echo acquisition method, electronic device and computer readable storage medium

Technical Field

The present invention relates to the field of speech recognition technologies, and in particular, to an echo acquisition method, an electronic device, and a computer-readable storage medium.

Background

With the development of artificial intelligence, speech recognition technology has improved greatly and its applications have become widespread, which places greater demands on voice pickup. Voice pickup used to be triggered manually and performed at the near end, whereas more and more applications now call for far-field (remote) voice pickup. This new demand raises a new requirement for an existing technical problem: how to pick up a clean voice signal while the electronic equipment is playing audio (the played sound being the echo). The problem can be described as echo cancellation. To cancel the echo, the echo must first be acquired, and the closer the acquired echo is to the actual value, the better the echo cancellation effect. In existing echo acquisition methods, however, a level signal at the output end (or input end) of the power amplifier of the audio playing device is usually collected as the echo for echo cancellation, and this level signal is sometimes difficult to collect. For example, in a system formed by a set-top box and a television, it is difficult for the set-top box to collect the level signal at the power amplifier output end of the television, because no such electrical signal loop exists between the television and the set-top box. As another example, when there are multiple outputs (such as 2.0, 2.1 or 5.1 multi-channel output), more hardware is required to integrate and process the multiple output audio signals into the echo.

Disclosure of Invention

The invention mainly aims to provide an echo acquisition method, electronic equipment and a computer readable storage medium, aiming at reducing the difficulty of acquiring echo.

In order to achieve the above object, the present invention provides an echo acquisition method, including:

when audio data are played through audio playing equipment, the audio data played by the audio playing equipment are obtained;

acquiring a pre-stored frequency response function of the audio playing equipment;

and calculating echo audio data of the audio playing equipment based on the audio data and the frequency response function.

Further, the present invention also provides an electronic device, comprising:

a memory storing an echo acquisition program;

a processor in communication with the memory and configured to execute the echo acquisition program to implement the steps of:

Further, the present invention also provides a computer-readable storage medium having stored thereon an echo acquisition program, which when executed by a processor, implements the steps of:

According to the scheme, when the electronic equipment plays the audio data through the audio playing equipment, the audio data played by the audio playing equipment is firstly obtained; then, acquiring a pre-stored frequency response function of the audio playing equipment; and finally, calculating to obtain echo audio data of the audio playing device based on the obtained audio data and the frequency response function, so that echoes are not required to be collected in a signal loop mode, and the echoes of the audio playing device can be collected more easily.

Drawings

FIG. 1 is a diagram of an alternative hardware configuration of the electronic device of the present invention;

FIG. 2 is a flowchart illustrating a first embodiment of an echo obtaining method according to the present invention;

FIG. 3 is a schematic diagram of a segment of audio data in the time domain according to a first embodiment of the echo obtaining method of the present invention;

FIG. 4 is a schematic diagram of a segment of audio data in the frequency domain according to a first embodiment of the echo obtaining method of the present invention;

FIG. 5 is a schematic diagram of a frequency response function of an audio playback device according to a first embodiment of the echo obtaining method of the present invention;

FIG. 6 is a schematic diagram of echo data of an audio playing device in the frequency domain according to a first embodiment of the echo obtaining method of the present invention;

FIG. 7 is a schematic diagram of echo data of an audio playing device in the time domain according to a first embodiment of the echo obtaining method of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The main solution of the embodiment of the invention is as follows: when the electronic equipment plays audio data through the audio playing equipment, firstly, the audio data played by the audio playing equipment is obtained; then, acquiring a pre-stored frequency response function of the audio playing equipment; and finally, calculating to obtain echo audio data of the audio playing device based on the obtained audio data and the frequency response function, so that echoes are not required to be collected in a signal loop mode, and the echoes of the audio playing device can be collected more easily.

As shown in fig. 1, fig. 1 is a schematic structural diagram of an electronic device in a hardware operating environment according to an embodiment of the present invention.

As shown in fig. 1, the electronic device may include a processor 1001, a communication bus 1002, a user interface 1003, a network interface 1004, a memory 1005, an audio playing device 1006 and an audio collecting device 1007. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface, a wireless interface, and the like. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory), and may alternatively be a storage device separate from the processor 1001. The audio playing device 1006 may be a speaker; the number of speakers is not limited, and may be a single speaker or a speaker array. The audio collecting device 1007 may be a microphone; the number of microphones is not limited, and may be a single microphone or a microphone array.

Those skilled in the art will appreciate that the configuration of the electronic device shown in fig. 1 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, in an embodiment of the electronic device of the present invention, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an echo acquisition program.

In the electronic device shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call the echo acquisition program stored in the memory 1005 and perform the following operations:

and calculating echo audio data of the audio playing equipment based on the acquired audio data and the frequency response function.

Further, the processor 1001 may be configured to call the echo obtaining program stored in the memory 1005, and further perform the following operations:

playing preset audio data through audio playing equipment, and simultaneously carrying out audio acquisition through audio acquisition equipment to obtain first recorded audio data;

converting the preset audio data and the first recorded audio data from time domain to frequency domain respectively;

and calculating the frequency domain correlation of the preset audio data and the first recorded audio data after the frequency domain conversion to obtain a transformation function of the preset audio data and the first recorded audio data, and storing the transformation function as a frequency response function of the audio playing equipment.

and respectively converting the preset audio data and the first recorded audio data from a time domain to a frequency domain by adopting fast Fourier transform.

acquiring audio through audio acquisition equipment to obtain second recorded audio data;

judging whether the current environment is in a quiet state or not based on the second recorded audio data;

when the current environment is in a quiet state, preset audio data are played through audio playing equipment, and audio acquisition is carried out through audio acquisition equipment to obtain first recorded audio data.

and when the current environment is not in a quiet state, playing preset prompting audio through audio playing equipment.

and judging whether the volume value of the second recorded audio data is continuously smaller than a preset volume value or not, wherein when the volume value of the second recorded audio data is continuously smaller than the preset volume value, the current environment is determined to be in a quiet state.

judging whether the current position of the audio playing device and/or the audio collecting device changes or not;

when the current position of the audio playing device and/or the audio collecting device changes, the preset audio data are played through the audio playing device, and meanwhile, audio collection is carried out through the audio collecting device, so that first recorded audio data are obtained.

when the audio data are played through the audio playing equipment, audio acquisition is carried out through the audio acquisition equipment, and third recorded audio is obtained;

echo cancellation is performed on the third recorded audio based on the echo audio data.

Further, the present invention also provides an echo obtaining method, applied to the electronic device shown in fig. 1, and referring to fig. 2, in a first embodiment of the echo obtaining method of the present invention, the echo obtaining method includes:

step S10, when playing audio data through the audio playing device, acquiring the audio data played by the audio playing device;

step S20, acquiring a pre-stored frequency response function of the audio playing device;

step S30, calculating echo audio data of the audio playing device based on the obtained audio data and the frequency response function.

When the electronic equipment works normally, audio data can be played through the audio playing equipment according to actual needs. Taking a public address device as an example, when a user speaks through the public address device, a background sound can be played through its sound box (i.e., the audio playing device) to enhance the effect of the speech.

Accordingly, in this embodiment, when the electronic device plays audio data through the audio playing device, it first acquires the audio data being played. For example, when a song is played through the audio playing device, the electronic device directly acquires the audio file of that song.

After the audio data played by the audio playing device is obtained, the electronic device further obtains a pre-stored frequency response function of the audio playing device. The frequency response function describes the frequency domain correlation between the audio data that the audio acquisition device records from the audio playing device and the original audio data played by the audio playing device.

And then, the electronic equipment calculates echo data of the audio playing equipment based on the acquired audio data and the frequency response function.

In this embodiment, audio acquisition is performed by the audio acquisition device while audio data is played by the audio playing device, so as to obtain a third recorded audio;

after step S30, the method further includes:

and performing echo cancellation on the third recorded audio based on the calculated echo audio data.

It will be readily appreciated that after echo cancellation of the third recorded audio is completed, a clean sound (human voice) is obtained. In specific implementation, the echo cancellation algorithm used in the present invention is not particularly limited, and may be selected by those skilled in the art according to actual needs.
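As a rough illustration only, the simplest conceivable form of this step is a sample-by-sample subtraction of the calculated echo from the third recorded audio. The patent does not prescribe any particular echo cancellation algorithm, so the function name `cancel_echo`, the sample values, and the assumption that both signals are already time-aligned at the same sampling rate are all hypothetical.

```python
from typing import List, Sequence


def cancel_echo(recorded: Sequence[float], echo: Sequence[float]) -> List[float]:
    """Subtract the calculated echo from the microphone recording, sample by sample.

    Assumption: both signals are time-aligned and share one sampling rate.
    """
    if len(recorded) != len(echo):
        raise ValueError("recorded and echo must have the same length")
    return [r - e for r, e in zip(recorded, echo)]


# Hypothetical data: a voice signal plus the echo picked up by the microphone.
voice = [0.0, 0.5, -0.5, 0.25]
echo = [0.25, 0.25, -0.25, 0.0]
third_recording = [v + e for v, e in zip(voice, echo)]

clean = cancel_echo(third_recording, echo)
```

In practice an adaptive filter would usually be preferred, since the calculated echo never matches the real echo exactly; the subtraction above only shows where the calculated echo enters the processing chain.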

Further, to achieve the acquisition of the echo, in this embodiment, step S10 is preceded by:

In this embodiment, an initialization process is required before the electronic device can work normally. Specifically, preset audio data are played through the audio playing device while audio collection is performed through the audio collecting device to obtain first recorded audio data. The preset audio data may be a single segment of full-band audio data, or may be composed of several pieces of audio data in different frequency bands. For example, the electronic device plays background sound a through the audio playing device and simultaneously performs audio acquisition through the audio acquisition device to obtain the first recorded audio data, background sound a'.

After the first recorded audio data are acquired, the electronic device converts the preset audio data and the first recorded audio data from the time domain to the frequency domain, specifically by applying a fast Fourier transform to each.

After the preset audio data and the first recorded audio data have both been converted to the frequency domain, the electronic device further calculates the frequency domain correlation between them to obtain a transformation function of the preset audio data and the first recorded audio data, and stores the transformation function as the frequency response function of the audio playing device, specifically in the memory 1005, for use in subsequent echo audio data calculation.
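The initialization process can be sketched as follows. The patent does not state how the "frequency domain correlation" is computed; this sketch assumes it is the per-frequency ratio of the recorded spectrum to the preset spectrum. The function names, the `eps` guard against empty bins, and the sample data (a unit impulse as a stand-in for full-band preset audio) are hypothetical.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 fast Fourier transform; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_frequency_response(
    preset: Sequence[float], recorded: Sequence[float], eps: float = 1e-12
) -> List[complex]:
    """Assumed definition: recorded spectrum divided by preset spectrum, bin by bin."""
    preset_spectrum = fft(preset)
    recorded_spectrum = fft(recorded)
    return [
        r / p if abs(p) > eps else 0j  # leave bins without preset energy at zero
        for p, r in zip(preset_spectrum, recorded_spectrum)
    ]


# Hypothetical data: a unit impulse excites every frequency bin equally.
preset_audio = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
first_recording = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

response = estimate_frequency_response(preset_audio, first_recording)
```

The resulting list plays the role of the stored frequency response function; a real implementation would presumably average over several frames to reduce the influence of noise.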

How to calculate the echo data of the audio playing device according to the solution of the present invention is described below with reference to specific examples:

Assume that the sampling rate of the audio data played by the audio playing device is 16 kHz and that a total of 512 samples are captured to illustrate the operation process. As shown in fig. 3, the horizontal axis represents time and the vertical axis represents amplitude.

The segment of audio data shown in fig. 3 is converted from the time domain to the frequency domain by a fast Fourier transform, as shown in fig. 4, in which the horizontal axis represents frequency and the vertical axis represents amplitude. Since the sampling rate of the audio data is 16 kHz, the highest frequency of the audio data is 8 kHz.
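The figures in this example follow from simple arithmetic, shown here as a short sketch. The variable names are illustrative; only the 16 kHz sampling rate and the 512-sample frame come from the text.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate from the example
FRAME_LENGTH = 512       # number of samples captured

# Duration covered by the frame on the time axis of the time-domain plot.
frame_duration_ms = 1000 * FRAME_LENGTH / SAMPLE_RATE_HZ

# Spacing between adjacent frequency bins after the Fourier transform.
bin_spacing_hz = SAMPLE_RATE_HZ / FRAME_LENGTH

# Highest representable frequency (half the sampling rate).
nyquist_hz = SAMPLE_RATE_HZ / 2

# Frequencies of the non-redundant bins, from 0 Hz up to the highest frequency.
bin_frequencies = [k * bin_spacing_hz for k in range(FRAME_LENGTH // 2 + 1)]
```

Under these numbers the frame spans 32 ms, the bins are 31.25 Hz apart, and the last non-redundant bin sits at 8 kHz, which matches the upper limit of the frequency axis described above.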

Fig. 5 is a schematic diagram of a frequency response function of an audio playing device.

It should be noted that the calculation used to obtain the echo data may be designed by a person skilled in the art according to actual needs. For example, in this embodiment, the calculation is performed according to the following formula:

o(x)' = f(x)' * g(x);

where o(x)' denotes the echo data in the frequency domain, f(x)' denotes the audio data in the frequency domain, and g(x) denotes the frequency response function.

After o(x)' shown in fig. 6 is obtained through calculation, it is converted from the frequency domain to the time domain, specifically by an inverse fast Fourier transform, to obtain the echo data o(x) of the audio playing device in the time domain, as shown in fig. 7. Comparing fig. 3 and fig. 7, the echo data o(x) is substantially free of interference compared with the original audio data, but the original high-frequency and low-frequency components are suppressed (corresponding to the frequency response function) and the mid-frequency components are emphasized.

According to the echo acquisition method provided by the embodiment of the invention, when the electronic equipment plays audio data through the audio playing equipment, the audio data played by the audio playing equipment is firstly acquired; then, a pre-stored frequency response function of the audio playing equipment is acquired; and finally, echo audio data of the audio playing device is calculated based on the obtained audio data and the frequency response function, so that echoes are not required to be collected in a signal loop mode, and the echoes of the audio playing device can be collected more easily.
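The calculation o(x)' = f(x)' * g(x) followed by the inverse transform might look like the sketch below. It reads `*` as a multiplication at each frequency bin, which is an assumption, as are the function names and the flat frequency response of 0.5 used as sample data.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 fast Fourier transform; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse FFT via the conjugate identity: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    transformed = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in transformed]


def compute_echo(audio: Sequence[float], response: Sequence[complex]) -> List[float]:
    """Return time-domain echo data o(x) from audio data and a frequency response."""
    if len(audio) != len(response):
        raise ValueError("audio and response must have the same length")
    audio_spectrum = fft(audio)                                      # f(x)'
    echo_spectrum = [f * g for f, g in zip(audio_spectrum, response)]  # o(x)'
    return [v.real for v in ifft(echo_spectrum)]                     # o(x)


# Hypothetical data: a flat response that halves every frequency component.
audio_frame = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
flat_response = [0.5 + 0j] * len(audio_frame)

echo_frame = compute_echo(audio_frame, flat_response)
```

With a flat response the echo is simply a scaled copy of the input, which makes the sketch easy to check; a response shaped like the one in fig. 5 would instead attenuate the low and high bins relative to the middle ones.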

Further, in order to improve the accuracy of calculating the frequency response function, based on the first embodiment, a second embodiment of the echo obtaining method according to the present invention is provided, where in this embodiment, before the step of playing preset audio data by an audio playing device, and simultaneously performing audio acquisition by an audio acquisition device to obtain first recorded audio data, the method further includes:

In this embodiment of the present invention, the initialization process needs to be performed in a quiet environment. Specifically, audio acquisition is first performed through the audio acquisition device to obtain second recorded audio data. The acquisition duration is not specifically limited in the present invention and may be set by a person skilled in the art according to actual needs; for example, it may be set to 5 seconds.

After the second recorded audio data is obtained, the electronic device determines whether the current environment is in a quiet state based on the second recorded audio data, specifically, the electronic device determines whether a volume value of the second recorded audio data is continuously smaller than a preset volume value, wherein when the volume value of the second recorded audio data is continuously smaller than the preset volume value, it is determined that the current environment is in the quiet state. For example, if the acquisition duration of the second recorded audio data is 5 seconds, the electronic device determines whether the volume values of the second recorded audio data are all smaller than the preset volume value within 5 seconds, and if so, determines that the current environment is in a quiet state.
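A minimal sketch of this quiet-state check is given below. The patent does not define how the "volume value" is measured; here it is assumed to be the root-mean-square level of short frames, and the frame length, threshold and sample data are hypothetical.

```python
import math
from typing import Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame, used here as the volume value."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_quiet(samples: Sequence[float], frame_length: int, threshold: float) -> bool:
    """True only if every full frame of the recording stays below the threshold."""
    frames = [
        samples[i:i + frame_length]
        for i in range(0, len(samples) - frame_length + 1, frame_length)
    ]
    if not frames:
        raise ValueError("recording is shorter than one frame")
    return all(frame_rms(frame) < threshold for frame in frames)


# Hypothetical second recordings: steady low-level noise, and the same with a loud burst.
quiet_recording = [0.001] * 1600
noisy_recording = quiet_recording[:800] + [0.5] * 160 + quiet_recording[960:]

quiet_result = is_quiet(quiet_recording, frame_length=160, threshold=0.01)
noisy_result = is_quiet(noisy_recording, frame_length=160, threshold=0.01)
```

Requiring every frame to stay below the threshold mirrors the "continuously smaller" condition: a single loud frame within the acquisition window is enough to reject the quiet state.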

When it is determined that the current environment is in a quiet state, the preset audio data may be played through the audio playing device, and audio acquisition is performed through the audio acquisition device to obtain first recorded audio data, and initialization processing is started.

Further, after the step of determining whether the current environment is in a quiet state based on the second recorded audio data, the method further includes:

It is easy to understand that, when the current environment is not in a quiet state, if the audio acquisition device directly records the preset audio data played by the audio playing device, the first recorded audio data will contain a large amount of noise, which affects the accuracy of the calculated frequency response function. In this case, a preset prompt audio is played through the audio playing device to prompt people nearby to keep quiet.
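The branch described in this embodiment can be sketched as a small dispatcher. The function name and the two callbacks are hypothetical placeholders for the device's actual initialization and prompt playback routines; whether the quiet check is repeated after the prompt is not stated in the text and is left out here.

```python
from typing import Callable, List


def start_initialization(
    environment_is_quiet: bool,
    calibrate: Callable[[], None],
    play_prompt: Callable[[], None],
) -> bool:
    """Run the initialization when quiet; otherwise play the prompt audio.

    Returns True if the initialization was started.
    """
    if environment_is_quiet:
        calibrate()
        return True
    play_prompt()
    return False


# Hypothetical callbacks that only record which action was taken.
events: List[str] = []
started_when_noisy = start_initialization(
    False, lambda: events.append("calibrate"), lambda: events.append("prompt")
)
started_when_quiet = start_initialization(
    True, lambda: events.append("calibrate"), lambda: events.append("prompt")
)
```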

Further, based on the first embodiment, a third embodiment of the echo obtaining method according to the present invention is provided, where in this embodiment, before the step of playing the preset audio data by the audio playing device and simultaneously performing audio acquisition by the audio acquisition device to obtain the first recorded audio data, the method further includes:

It should be noted that, in practical application, when the position of either the audio playing device or the audio collecting device changes, the resulting echo changes as well. Therefore, in order to ensure that the echo of the audio playing device is accurately obtained, whether the current position of the audio playing device and/or the audio collecting device has changed may be determined in real time, and if so, the initialization processing may be triggered.

In specific implementation, a positioning module may be arranged in each of the audio playing device and the audio collecting device. The positioning modules acquire the position information of the two devices, and whether the position of either device has changed is judged according to that position information.
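One way to express this judgment is sketched below. The patent does not specify the format of the position information; three-dimensional coordinates in metres, the 5 cm tolerance and all names are assumptions made for illustration.

```python
import math
from typing import Tuple

Position = Tuple[float, float, float]  # assumed: x, y, z coordinates in metres


def position_changed(previous: Position, current: Position, tolerance_m: float = 0.05) -> bool:
    """True if the device moved farther than the tolerance."""
    return math.dist(previous, current) > tolerance_m


def needs_initialization(
    speaker_previous: Position,
    speaker_current: Position,
    microphone_previous: Position,
    microphone_current: Position,
    tolerance_m: float = 0.05,
) -> bool:
    """Initialization is triggered if either device has moved."""
    return position_changed(speaker_previous, speaker_current, tolerance_m) or position_changed(
        microphone_previous, microphone_current, tolerance_m
    )


# Hypothetical readings: the speaker stays put, the microphone moves by half a metre.
moved = needs_initialization((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.5, 0.0))
unmoved = needs_initialization((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

A small tolerance keeps jitter in the positioning readings from triggering the initialization repeatedly.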

Further, the present invention also provides a computer readable storage medium, which stores an echo obtaining program, and when executed by the processor 1001, the echo obtaining program implements the following operations:

Further, when the echo obtaining program is executed by the processor 1001, the following operations are also implemented:

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing an electronic device to execute the method according to the corresponding embodiment of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. An echo acquisition method, comprising the steps of:

calculating the frequency domain correlation of the preset audio data and the first recorded audio data after frequency domain conversion to obtain a transformation function of the preset audio data and the first recorded audio data, and storing the transformation function as a frequency response function of the audio playing equipment;

when audio data are played through the audio playing equipment, the audio data played by the audio playing equipment are obtained;

2. The method of claim 1, wherein the step of converting the preset audio data and the first recorded audio data from time domain to frequency domain respectively comprises:

3. The method of claim 1, wherein before the step of playing the preset audio data by the audio playing device and simultaneously performing audio acquisition by the audio acquisition device to obtain the first recorded audio data, the method further comprises:

acquiring audio through the audio acquisition equipment to obtain second recorded audio data;

determining whether a current environment is in a quiet state based on the second recorded audio data;

and when the current environment is in a quiet state, playing preset audio data through the audio playing equipment, and simultaneously carrying out audio acquisition through the audio acquisition equipment to obtain first recorded audio data.

4. The echo acquisition method of claim 3, wherein said step of determining whether the current environment is in a quiet state based on said second recorded audio data is followed by the step of:

and when the current environment is not in a quiet state, playing preset prompt audio through the audio playing equipment.

5. The echo acquisition method of claim 3, wherein the step of determining whether the current environment is in a quiet state based on the second recorded audio data comprises:

6. The method of claim 1, wherein before the step of playing the preset audio data by the audio playing device and simultaneously performing audio acquisition by the audio acquisition device to obtain the first recorded audio data, the method further comprises:

when the current position of the audio playing device and/or the audio collecting device changes, preset audio data are played through the audio playing device, and audio collection is carried out through the audio collecting device to obtain first recorded audio data.

7. The echo acquisition method according to any one of claims 1 to 6, wherein audio data is played by the audio playing device, and audio acquisition is performed by the audio acquisition device to obtain a third recorded audio;

after the step of calculating the echo audio data of the audio playing device based on the audio data and the frequency response function, the method further includes:

and performing echo cancellation on the third recorded audio based on the echo audio data.

8. An electronic device, comprising:

a memory storing an echo acquisition program;

9. A computer-readable storage medium having stored thereon an echo acquisition program, which when executed by a processor, performs the steps of: