CN115273871A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN115273871A
CN115273871A (application CN202110477895.8A)
Authority
CN
China
Prior art keywords
audio data
determining
filter coefficient
acoustic environment
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110477895.8A
Other languages
Chinese (zh)
Inventor
熊飞飞
冯津伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Innovation Co
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Singapore Holdings Pte Ltd filed Critical Alibaba Singapore Holdings Pte Ltd
Priority to CN202110477895.8A
Publication of CN115273871A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiment of the application provides a data processing method, a data processing apparatus, an electronic device, and a storage medium, wherein the method comprises the following steps: acquiring audio data to be analyzed, and determining a filter coefficient corresponding to the audio data; and determining the acoustic environment corresponding to the audio data according to coefficient attenuation information of the filter coefficient. The present application can analyze the acoustic environment corresponding to audio data and denoise the audio data according to that acoustic environment, thereby improving the optimization effect on the audio data.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, a data processing apparatus, an electronic device, and a storage medium.
Background
In order to improve the quality of audio data, optimization processing is usually performed on it, for example, echo cancellation, dereverberation, and the like.
However, current algorithms typically process the audio data according to preset parameters, so the optimization effect on the audio data is poor.
Disclosure of Invention
The embodiment of the application provides a data processing method so as to improve the optimization effect on audio data.
Correspondingly, the embodiment of the application also provides a data processing apparatus, an electronic device, and a storage medium to ensure the implementation and application of the above method.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring audio data to be analyzed, and determining a filter coefficient corresponding to the audio data; and determining the acoustic environment corresponding to the audio data according to the coefficient attenuation information of the filter coefficient.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring live broadcast audio data, and determining a filter coefficient corresponding to the live broadcast audio data; determining an acoustic environment corresponding to the live audio data according to the coefficient attenuation information of the filter coefficient; determining noise estimation information according to the acoustic environment, wherein the noise estimation information comprises echo noise estimation information and reverberation noise estimation information; and processing the live audio data according to the noise estimation information.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring conference audio data and determining a filter coefficient corresponding to the conference audio data; determining an acoustic environment corresponding to the conference audio data according to the coefficient attenuation information of the filter coefficient; determining interference noise estimation information according to the acoustic environment and an output audio played by a loudspeaker; and processing the conference audio data according to the interference noise estimation information.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring control audio data and determining controlled Internet of things equipment; when the controlled Internet of things equipment comprises at least two pieces of Internet of things equipment, determining a filter coefficient corresponding to control audio data, and determining an acoustic environment corresponding to the control audio data according to coefficient attenuation information of the filter coefficient; and screening target Internet of things equipment from at least two pieces of Internet of things equipment according to the acoustic environment, and controlling the target Internet of things equipment according to the control audio data.
In order to solve the above problem, an embodiment of the present application discloses a data processing apparatus, including: the filter coefficient determining module is used for acquiring audio data to be analyzed and determining a filter coefficient corresponding to the audio data; and the acoustic environment determining module is used for determining the acoustic environment corresponding to the audio data according to the coefficient attenuation information of the filter coefficient.
In order to solve the above problem, an embodiment of the present application discloses an electronic device, including: a processor; and a memory having executable code stored thereon, which when executed, causes the processor to perform the method as described in one or more of the above embodiments.
To solve the above problems, embodiments of the present application disclose one or more machine-readable media having executable code stored thereon, which when executed, causes a processor to perform a method as described in one or more of the above embodiments.
Compared with the prior art, the embodiment of the application has the following advantages:
in the embodiment of the application, the audio data to be analyzed can be obtained, the audio data is filtered through an adaptive filter, and the filter coefficient corresponding to the audio data is determined; the filter coefficients may then be subjected to coefficient attenuation analysis to determine coefficient attenuation information, so that the acoustic environment of the audio data can be determined from the coefficient attenuation information. Compared with denoising with preset parameters, the method can analyze the acoustic environment corresponding to the audio data and determine the noise (such as echo noise and reverberation noise) in the audio data according to that acoustic environment, so as to cancel the noise in the audio data.
Drawings
FIG. 1 is a schematic flow chart diagram of a data processing method according to an embodiment of the present application;
FIG. 2A is a schematic flow chart diagram of a data processing method according to another embodiment of the present application;
FIG. 2B is a schematic diagram of a processing side according to an embodiment of the present application;
FIG. 2C is a schematic illustration of acoustic energy as a function of time in different spaces according to one embodiment of the present application;
FIG. 2D is a graphical illustration of impulse response energy over time for one embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 5 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 6 is a flow diagram illustrating a data processing method according to yet another embodiment of the present application;
FIG. 7 is a block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of a data processing apparatus according to another embodiment of the present application;
FIG. 9 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 10 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present application;
fig. 11 is a schematic structural diagram of an exemplary apparatus provided in one embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
The embodiment of the application can be applied to the field of audio data analysis, and the embodiment can filter the audio data according to the adaptive filter and determine the acoustic environment corresponding to the audio data according to the coefficient attenuation information of the filter coefficient of the adaptive filter so as to process the audio data according to the acoustic environment. For example, the audio data is subjected to echo cancellation, dereverberation, and the like depending on the acoustic environment to optimize the audio data.
Specifically, the embodiment of the application may acquire audio data to be analyzed and filter it with an adaptive filter to obtain the filter coefficients corresponding to the audio data. The audio data to be analyzed may be segmented into multiple segments of sub-audio data, so as to determine a filter coefficient for each segment; a segment of sub-audio data may also be referred to as a subband. After the filter coefficients are determined, coefficient attenuation analysis may be performed on them to determine coefficient attenuation information, which includes a reverberation duration and an energy attenuation amount; the energy attenuation amount includes the energy ratio of the direct sound to the reverberant sound in the audio data. When determining the reverberation duration, a coefficient threshold may be preset, and the time required for the filter coefficient to attenuate to this threshold is taken as the reverberation duration. The reverberation duration may also be referred to as the reverberation time (RT), which is the time required for the sound energy density to fall to 1/10^6 of its original value (equivalent to the time required for the sound pressure level to decay by 60 decibels). When determining the energy attenuation amount, the audio data may be divided into first audio data corresponding to the direct sound and second audio data corresponding to the reverberant sound, so as to determine a first filter coefficient corresponding to the direct sound and a second filter coefficient corresponding to the reverberant sound, and the energy ratio of the first filter coefficient to the second filter coefficient is taken as the energy attenuation amount.
Here, the direct sound refers to sound transmitted from the sound source to the microphone in a straight line without any reflection, and the reverberant sound refers to sound transmitted from the sound source to the microphone after being reflected by the surrounding environment. After the coefficient attenuation information is determined, the acoustic environment may be determined from it, so that the audio data can be processed in accordance with the acoustic environment. For example, noise estimation information may be determined according to the acoustic environment, and the estimated noise may be cancelled from the audio data to obtain processed audio data: echo noise estimation may be performed according to the acoustic environment to eliminate echo noise in the audio data; reverberation noise estimation may be performed to eliminate reverberation noise; and the reverberation level may also be determined according to the acoustic environment, so that the audio data is processed with a corresponding equalizer.
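The subband decomposition described above can be sketched as a basic short-time analysis. The frame length, hop size, and window choice below are illustrative assumptions, not parameters from the patent:

```python
import numpy as np

def split_subbands(audio, frame_len=256, hop=128):
    """Split audio into overlapping windowed frames and return the
    per-frame complex spectra; each FFT bin plays the role of a subband."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    # shape: (n_frames, frame_len // 2 + 1) subbands per frame
    return np.fft.rfft(frames, axis=1)
```

A per-subband adaptive filter would then run on each column of the returned matrix, and full-band filter coefficients could be recovered by subband synthesis.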
In the embodiment of the application, the audio data to be analyzed can be obtained, the audio data is filtered through an adaptive filter, and the filter coefficient corresponding to the audio data is determined; the filter coefficients may then be subjected to coefficient attenuation analysis to determine coefficient attenuation information, so that the acoustic environment of the audio data can be determined from the coefficient attenuation information. Compared with denoising with preset parameters, the method can analyze the acoustic environment corresponding to the audio data and determine the noise (such as echo noise and reverberation noise) in the audio data according to that acoustic environment, so as to cancel the noise in the audio data.
The embodiment of the application can analyze audio data to determine the acoustic environment corresponding to the audio data, so it can be applied to various scenarios for processing audio data, for example, conferences, live broadcasts, voice calls, and voice transmission. For example, in conference scenarios such as voice conferences and video conferences, conference audio data can be acquired and the corresponding acoustic environment analyzed, so that interference noise is determined according to the acoustic environment and the output audio played by the loudspeaker; eliminating the interference noise in the conference audio data improves the data quality of the audio and the experience of the participating users. For another example, the embodiment of the application can also be applied to scenarios in which internet of things devices are controlled by voice. In a home scenario, air conditioners (internet of things devices) may be installed in both the bedroom and the living room, and when a processing end (such as a mobile phone) controls an air conditioner according to received voice, it may be unable to determine whether the controlled air conditioner is the bedroom air conditioner or the living room air conditioner; in this case, the acoustic environment corresponding to the control audio data can be analyzed to screen out the target internet of things device.
The embodiment of the application provides a data processing method, which can be performed by a processing end. The processing end may be a terminal device that collects audio data, such as a mobile phone, a computer, a microphone, or a Bluetooth speaker; it may also be a server that relays audio data, or an output end that receives and outputs audio data, such as a mobile phone or computer receiving audio. The method can analyze the audio data, determine the acoustic environment corresponding to the audio data, and perform corresponding processing according to the acoustic environment, so as to improve the optimization effect on the audio data. Specifically, as shown in fig. 2A, the method includes:
Step 202, obtaining audio data to be analyzed, and determining a filter coefficient corresponding to the audio data. According to the embodiment of the application, the audio data can be divided into multiple segments of sub-audio data (subbands), and each subband is filtered to determine the filter coefficient corresponding to that subband. The filter coefficient corresponding to the full band (fullband) can also be determined by means of subband synthesis. Fig. 2B shows a schematic structural diagram of an exemplary processing end. As shown in fig. 2B, an adaptive filter may be configured at the processing end (near end) to perform echo cancellation on the audio data and thereby determine the corresponding filter coefficients. Specifically, as an optional embodiment, determining the filter coefficient corresponding to the audio data includes: filtering the audio data with an adaptive filter, and determining the filter coefficient of the adaptive filter. An adaptive filter is a filter that adjusts its parameters and structure with an adaptive algorithm according to changes in the environment; its coefficients are time-varying and are updated by the adaptive algorithm. In the embodiment of the application, the speaker may receive audio data to be output from the far end (such as another terminal) and play it, and the adaptive filter may acquire this output audio and perform echo cancellation and dereverberation on the audio data collected by the microphone, thereby determining the filtered audio data.
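As an illustration of the adaptive filtering step, the sketch below uses an NLMS (normalized least-mean-squares) update, a common adaptive algorithm. The patent does not specify which adaptive algorithm is used, so the choice of NLMS, the tap count, and the step size are all assumptions for illustration:

```python
import numpy as np

def nlms_filter(far_end, mic, num_taps=64, mu=0.5, eps=1e-8):
    """Cancel the far-end echo in `mic` with an NLMS adaptive filter.

    far_end: signal played by the speaker (reference).
    mic:     signal captured by the microphone.
    Returns the final coefficient vector and the echo-cancelled signal.
    """
    w = np.zeros(num_taps)                         # time-varying filter coefficients
    out = np.zeros(len(mic))
    for n in range(num_taps - 1, len(mic)):
        x = far_end[n - num_taps + 1:n + 1][::-1]  # newest reference sample first
        y = w @ x                                  # echo estimate
        e = mic[n] - y                             # echo-cancelled output
        w += mu * e * x / (x @ x + eps)            # normalized coefficient update
        out[n] = e
    return w, out
```

After convergence, `w` approximates the echo path, and the coefficient-attenuation analysis described in the text operates on this coefficient vector.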
The adaptive filter may be divided into an unconverged stage (roughly the first 1-2 s of the audio data) and a converged stage (after the first 1-2 s), so the filter coefficients may likewise be divided into coefficients corresponding to the unconverged stage and coefficients corresponding to the converged stage. The acoustic environment corresponding to the audio data may be expressed by a plurality of parameters, for example the reverberation duration (related to the space size) and the energy ratio of direct sound to reverberant sound in the space (related to the spatial reflectivity and absorptivity for sound), so that the audio data can be processed accordingly. Under different acoustic environments, the noise in the audio data may differ. For example, fig. 2C shows how the same sound varies with time in a reverberation room and in a sound-absorbing room. As shown in fig. 2C, the reverberation generated in a reverberation room with poor sound absorption (good reflection) has larger energy and a longer duration (about 500 ms), and its energy decays slowly; the reverberation generated in a sound-absorbing room with good sound absorption has small energy, a short duration (about 50 ms), and rapid energy decay.
Therefore, in order to analyze the acoustic environment, an embodiment of the present application may perform coefficient attenuation analysis on the filter coefficients of the audio data, and determine the reverberation duration and the energy attenuation amount as the coefficient attenuation information from which the acoustic environment is determined. Specifically, as an optional embodiment, determining the acoustic environment corresponding to the audio data according to the coefficient attenuation information of the filter coefficient includes: determining coefficient attenuation information according to the filter coefficient, wherein the coefficient attenuation information comprises a reverberation duration and an energy attenuation amount, and the energy attenuation amount comprises the energy ratio of direct sound to reverberant sound in the audio data; and determining the acoustic environment corresponding to the audio data according to the reverberation duration and the energy attenuation amount.
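The mapping from these two attenuation statistics to a coarse acoustic-environment label might look as follows. The thresholds and labels are assumptions for illustration; the patent does not specify concrete values:

```python
def classify_environment(reverb_duration_s, direct_to_reverb_ratio):
    """Map reverberation duration and energy attenuation (direct-to-reverberant
    energy ratio) to a coarse acoustic-environment label.
    The 0.5 s and 1.0 split points are illustrative assumptions."""
    space = ("reflective_or_large" if reverb_duration_s > 0.5
             else "absorbent_or_small")
    source = ("near_microphone" if direct_to_reverb_ratio > 1.0
              else "far_from_microphone")
    return space, source
```

Downstream processing (echo estimation, dereverberation, equalizer choice) would branch on labels like these.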
According to the embodiment of the application, the corresponding echo effect and reverberation effect can be determined according to the reverberation duration and the energy attenuation amount, so as to determine the acoustic environment in which the microphone is located and perform the corresponding processing. In an alternative example, the embodiment of the present application may quantify the parameters of the acoustic environment through a Room Impulse Response, which is a representation of the transfer function between two points in the sound propagation space and may also be referred to as a spatial impulse response. As shown in fig. 2D, the impulse response can be divided into the direct sound pulse (typically within 5 ms), early reflections (typically within 50 ms), and late reverberation (typically after 50 ms). In acoustic studies, the late reverberation conforms to an exponential decay model and can be expressed by the following equation 1:
N(t) = N0 * e^(-λt)    (Equation 1)
where N0 is the amplitude at the beginning of the late reverberation, λ is the decay factor, and t is time.
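Under the exponential decay model of equation 1, the decay factor λ can be estimated by linear regression on the log amplitude and converted to a 60 dB reverberation time: the amplitude falling by a factor of 10^3 corresponds to the energy falling to 1/10^6. A minimal sketch, assuming cleanly sampled late-reverberation amplitudes:

```python
import numpy as np

def fit_decay(t, amplitude):
    """Least-squares fit of N(t) = N0 * exp(-lam * t) on the log amplitude."""
    slope, intercept = np.polyfit(t, np.log(amplitude), 1)
    return -slope, float(np.exp(intercept))   # (lam, N0)

def rt60_from_lambda(lam):
    """Time for the amplitude to fall by 10**3, i.e. the level by 60 dB."""
    return float(np.log(1e3) / lam)
```

In practice the amplitudes would come from the adaptive filter's coefficient tail rather than a clean synthetic curve.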
Specifically, as an optional embodiment, determining the coefficient attenuation information according to the filter coefficient includes: determining the reverberation duration according to the filter coefficient and a preset coefficient threshold; determining a first filter coefficient corresponding to the direct sound and a second filter coefficient corresponding to the reverberant sound according to the filter coefficients; and taking the energy ratio corresponding to the first filter coefficient and the second filter coefficient as the energy attenuation amount.
For the reverberation duration: it refers to the time required for the sound energy to attenuate to a certain target energy value, and a coefficient threshold may be preset in the embodiment of the present application, so that the time required for the filter coefficient to attenuate to the coefficient threshold is determined as the reverberation duration. The preset coefficient threshold may be a fixed value, or may be determined according to the energy of the direct sound and a preset attenuation ratio; for example, the target energy value may be determined according to the energy of the direct sound and a preset ratio (e.g., 1/10^6), and the time required for the energy of the audio data to attenuate to 1/10^6 of its original value is then determined as the reverberation duration. For the energy attenuation amount: it includes the energy ratio of direct sound to reverberant sound in the audio data. In the embodiment of the present application, first audio data corresponding to the direct sound and second audio data corresponding to the reverberant sound may be separated from the audio data, and a first filter coefficient corresponding to the first audio data and a second filter coefficient corresponding to the second audio data may be obtained, so that the ratio of the first filter coefficient to the second filter coefficient is determined as the energy attenuation amount.
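A minimal sketch of the two coefficient-attenuation statistics described above, assuming the filter-coefficient magnitudes are available as one value per frame and that the first frame corresponds to the direct sound (both assumptions for illustration; the patent does not fix these details):

```python
import numpy as np

def coefficient_attenuation_info(coeffs, frame_period, threshold_ratio=1e-3,
                                 direct_frames=1):
    """Derive (reverberation duration, energy attenuation amount) from a
    filter-coefficient magnitude profile, one value per frame.

    threshold_ratio: magnitude relative to the direct-sound peak below
    which the coefficient tail is considered fully decayed.
    """
    mags = np.abs(np.asarray(coeffs, dtype=float))
    peak = mags.max()
    # reverberation duration: time for coefficients to fall below the threshold
    below = np.nonzero(mags < threshold_ratio * peak)[0]
    rt = (below[0] if below.size else len(mags)) * frame_period
    # energy attenuation amount: direct-sound energy over reverberant-tail energy
    direct_energy = np.sum(mags[:direct_frames] ** 2)
    reverb_energy = np.sum(mags[direct_frames:] ** 2)
    return rt, direct_energy / reverb_energy
```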
After the reverberation duration and the energy attenuation amount are determined, the corresponding acoustic environment may be determined, and echoes, reverberation, and the like in the audio data may be removed according to that acoustic environment. Specifically, as an optional embodiment, the method further includes: determining noise estimation information according to the acoustic environment, wherein the noise estimation information comprises echo noise estimation information and reverberation noise estimation information; and processing the audio data according to the noise estimation information. The noise in the audio data can thus be determined according to the acoustic environment and cancelled to obtain the processed audio data. In an alternative example, the reverberation duration represents the reverberation degree of the whole room, and the energy attenuation amount represents the distance between the sound source and the microphone in the reverberant environment; the embodiment of the present application may therefore determine the reverberation degree and the source-microphone distance according to the reverberation duration and the energy attenuation amount, so as to perform echo estimation and reverberation estimation for acoustic echo cancellation (AEC) and dereverberation. The reverberation degree is related to various factors, such as the size of the space, the structure of the space, and the sound absorption rate of the space's materials. The direct sound can be separated from the audio data, echo estimation and reverberation estimation are performed according to the direct sound, echo noise estimation information and reverberation noise estimation information are determined, and the echo noise and reverberation noise in the audio data are cancelled to obtain the processed audio data.
It should be noted that, in the embodiment of the present application, echo cancellation and dereverberation may be performed on the sound emitted by the user, or on the sound output by the speaker, as required. For example, in a scenario without a speaker, such as a live broadcast or recording scenario, echo cancellation and dereverberation may be performed only on the sound emitted by the user.
In addition, in another alternative example, the embodiment of the present application may determine the corresponding reverberation level according to the reverberation duration, so as to process the audio data with an equalizer corresponding to that level. For example, for a room with a long reverberation duration (e.g., exceeding 1 s), an equalizer that increases the energy of the high frequency band may be selected for the loudspeaker, making the sound brighter and clearer. For another example, for a room with a short reverberation duration (e.g., less than 1 s), an equalizer that increases the energy of the low frequency band may be selected, so that the sound sounds fuller.
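The equalizer selection described above reduces to a simple rule. The 1 s split point follows the example in the text; the profile names are hypothetical:

```python
def choose_equalizer(reverb_duration_s):
    """Pick an equalizer profile from the reverberation duration.
    Profile names are illustrative placeholders, not real presets."""
    if reverb_duration_s > 1.0:
        return "boost_high_band"   # long reverberation: brighten the sound
    return "boost_low_band"        # short reverberation: fuller sound
```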
In the embodiment of the application, the audio data to be analyzed can be obtained, the audio data is filtered through an adaptive filter, and the filter coefficient corresponding to the audio data is determined; the filter coefficients may then be subjected to coefficient attenuation analysis to determine coefficient attenuation information, so that the acoustic environment of the audio data can be determined from the coefficient attenuation information. Compared with denoising with preset parameters, the method can analyze the acoustic environment corresponding to the audio data and determine the noise (such as echo noise and reverberation noise) in the audio data according to that acoustic environment, so as to cancel the noise in the audio data.
On the basis of the foregoing embodiments, an embodiment of the present application further provides a data processing method, which can be applied to a processing end, and as shown in fig. 3, the method includes:
Step 302, audio data to be analyzed is obtained.
Step 304, filtering the audio data according to the adaptive filter, and determining a filter coefficient of the adaptive filter.
Step 306, determining the reverberation duration according to the filter coefficient and a preset coefficient threshold.
Step 308, determining a first filter coefficient corresponding to the direct sound and a second filter coefficient corresponding to the reverberant sound according to the filter coefficients.
Step 310, taking the energy ratio corresponding to the first filter coefficient and the second filter coefficient as the energy attenuation amount. The energy attenuation amount includes the energy ratio of the direct sound to the reverberant sound in the audio data.
Step 312, determining an acoustic environment corresponding to the audio data according to the reverberation time length and the energy attenuation amount.
Step 314, determining noise estimation information according to the acoustic environment, where the noise estimation information includes echo noise estimation information and reverberation noise estimation information.
And step 316, processing the audio data according to the noise estimation information.
In the embodiment of the application, the audio data to be analyzed can be obtained, the audio data can be filtered by the adaptive filter, and the filter coefficient corresponding to the audio data can be determined. The reverberation duration can then be determined according to the filter coefficient, and a first filter coefficient corresponding to the direct sound and a second filter coefficient corresponding to the reverberant sound can be determined from the filter coefficient, so that the energy ratio corresponding to the first filter coefficient and the second filter coefficient is taken as the energy attenuation amount. The acoustic environment corresponding to the audio data can then be determined according to the reverberation duration and the energy attenuation amount. Echo noise estimation and reverberation noise estimation are performed according to the acoustic environment to obtain noise estimation information, so as to cancel the noise in the audio data and obtain processed audio data. The embodiment of the application can be applied to a voice communication scenario, so that after the processed audio data is determined, the processing end can transmit the processed audio data to a receiving end.
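The reverberation duration and energy attenuation computations of steps 306 through 310 can be sketched as follows. The sampling rate, coefficient threshold, number of "direct sound" taps, and the synthetic exponentially decaying coefficients are illustrative assumptions.

```python
import numpy as np

def analyze_coefficients(w, fs=16000, coef_threshold=0.01, direct_taps=4):
    """Derive coefficient-attenuation information from filter coefficients.

    Reverberation duration: time up to the last tap whose magnitude exceeds
    coef_threshold.  Energy attenuation: energy ratio of the leading
    "direct sound" taps to the remaining "reverberant" taps, in dB
    (a direct-to-reverberant ratio).
    """
    mags = np.abs(w)
    above = np.flatnonzero(mags > coef_threshold)
    reverb_seconds = (above[-1] + 1) / fs if above.size else 0.0
    direct_energy = np.sum(w[:direct_taps] ** 2)
    reverb_energy = np.sum(w[direct_taps:] ** 2) + 1e-12   # guard against /0
    drr_db = 10.0 * np.log10(direct_energy / reverb_energy)
    return reverb_seconds, drr_db

# Synthetic decaying coefficients: strong direct path, weaker reverberant tail.
taps = np.exp(-np.arange(64) / 8.0)
reverb_seconds, drr_db = analyze_coefficients(taps)
```

A longer coefficient tail yields a larger `reverb_seconds` and a smaller `drr_db`, which is the coefficient attenuation information used to characterize the acoustic environment.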
On the basis of the foregoing embodiment, an embodiment of the present application further provides a data processing method, which may be executed by a processing end and applied in a live broadcast scenario. The method may analyze live audio data, determine the corresponding acoustic environment, and then denoise the live audio data according to the acoustic environment. Specifically, as shown in fig. 4, the method includes:
Step 402, acquiring live audio data and determining a filter coefficient corresponding to the live audio data.
Step 404, determining an acoustic environment corresponding to the live audio data according to the coefficient attenuation information of the filter coefficient.
Step 406, determining noise estimation information according to the acoustic environment, where the noise estimation information includes echo noise estimation information and reverberation noise estimation information.
Step 408, processing the live audio data according to the noise estimation information.
The implementation manner of the embodiment of the present application is similar to that of the embodiment described above, and the specific implementation process of the embodiment may refer to the specific implementation process of the embodiment described above, which is not described herein again.
The embodiment of the application can be applied to a live broadcast scenario. In such a scenario, live audio data of the broadcaster can be collected through a microphone, and the live audio data is filtered through the adaptive filter to determine the filter coefficient corresponding to the live audio data. Coefficient attenuation analysis can then be performed on the filter coefficient to determine the coefficient attenuation information, where the coefficient attenuation information includes the reverberation duration and the energy attenuation amount. The embodiment may determine the acoustic environment corresponding to the live audio data according to the reverberation duration and the energy attenuation amount. Echo noise estimation and reverberation noise estimation are performed according to the acoustic environment to obtain noise estimation information, so as to cancel the noise in the live audio data and obtain processed live audio data. The processed audio data may then be transmitted to users watching the live broadcast. Compared with denoising using preset parameters, the present application can analyze the acoustic environment corresponding to the live audio data and denoise according to the acoustic environment, which can improve the optimization effect on the live audio data and thereby improve the experience of users watching the live broadcast.
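The mapping from coefficient attenuation information to an acoustic environment can be sketched as a toy rule-based classifier. The threshold values and environment labels below are illustrative assumptions only; the application does not specify concrete values.

```python
def classify_environment(reverb_seconds, drr_db):
    """Toy classifier: coefficient attenuation information -> coarse environment label.

    reverb_seconds: reverberation duration estimated from the filter coefficients.
    drr_db: energy attenuation amount (direct-to-reverberant ratio, in dB).
    All thresholds are hypothetical illustration values.
    """
    if reverb_seconds > 0.5 or drr_db < 0:
        return "reverberant room"     # long tail, or reverberant energy dominates
    if reverb_seconds > 0.2:
        return "ordinary room"
    return "dry/near-field"

label = classify_environment(0.6, -3.0)   # a long, reverb-dominated tail
```

Downstream, the label (or the raw pair of values) would select the echo and reverberation noise estimation parameters used for denoising.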
A live broadcast scene may include subdivided scenes such as e-commerce live broadcast, education live broadcast, and entertainment live broadcast, and the embodiment of the present application may be applied to each of these live broadcast scenes. For example, the embodiment of the present application may be applied to an education live broadcast scene to process education audio data. Specifically, in an optional embodiment, the data processing method may include:
Acquiring education audio data, and determining a filter coefficient corresponding to the education audio data.
And determining the acoustic environment corresponding to the education audio data according to the coefficient attenuation information of the filter coefficient.
Noise estimation information is determined based on the acoustic environment, the noise estimation information including echo noise estimation information and reverberation noise estimation information.
And processing the education audio data according to the noise estimation information.
The implementation manner of the embodiment of the present application is similar to that of the embodiment described above, and the specific implementation process of the embodiment may refer to the specific implementation process of the embodiment described above, which is not described herein again.
In this embodiment of the application, the education audio data may be education audio data in an education live broadcast scene, or other audio data related to education, such as audio data in a teaching video. The embodiment can acquire the education audio data, filter the education audio data through the adaptive filter, and determine the filter coefficient corresponding to the education audio data; coefficient attenuation analysis can then be performed on the filter coefficient to determine the coefficient attenuation information, where the coefficient attenuation information includes the reverberation duration and the energy attenuation amount. The embodiment can determine the acoustic environment corresponding to the education audio data according to the reverberation duration and the energy attenuation amount. Echo noise estimation and reverberation noise estimation are performed according to the acoustic environment to obtain noise estimation information, so as to cancel the noise in the education audio data and obtain processed education audio data.
On the basis of the above embodiment, the embodiment of the present application further provides a data processing method, which may be executed by a processing end and applied to conference scenes such as a voice conference or a video conference. The method may acquire conference audio data and analyze the acoustic environment corresponding to the conference audio data, so as to determine interference noise according to the acoustic environment and the output audio played by a speaker, and to denoise the conference audio data. Specifically, as shown in fig. 5, the method includes:
and 502, acquiring conference audio data and determining a filter coefficient corresponding to the conference audio data.
Step 504, determining an acoustic environment corresponding to the conference audio data according to the coefficient attenuation information of the filter coefficient.
Step 506, determining interference noise estimation information according to the acoustic environment and the output audio played by the loudspeaker.
Step 508, processing the conference audio data according to the interference noise estimation information.
The implementation manner of the embodiment of the present application is similar to that of the embodiment described above, and the specific implementation process of the embodiment may refer to the specific implementation process of the embodiment described above, which is not described herein again.
The embodiment of the application can be applied to conference scenes (such as a voice conference or a video conference). In such a scene, conference audio data of the participants can be collected through a microphone, and the conference audio data is filtered through the adaptive filter to determine the filter coefficient corresponding to the conference audio data. Coefficient attenuation analysis may then be performed on the filter coefficient to determine the coefficient attenuation information and thereby the acoustic environment, where the coefficient attenuation information includes the reverberation duration and the energy attenuation amount. The embodiment can determine the interference noise estimation information according to the acoustic environment and the output audio played by the speaker, so that the interference of the speaker output to the microphone can be removed from the conference audio data, improving the data quality of the conference audio data. It should be noted that, in addition to the interference noise caused at the microphone by the speaker, the embodiments of the present application can also cancel interference noise such as echo and reverberation generated in the space by the user's voice.
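Removing the speaker's contribution from the microphone signal using the estimated echo path can be sketched as follows. The echo path and signal lengths are synthetic assumptions; here a perfect path estimate is assumed so that the residual equals the near-end speech.

```python
import numpy as np

def cancel_speaker_echo(mic, speaker_ref, w):
    """Subtract the estimated speaker echo from the microphone signal.

    w is an echo-path estimate (e.g. produced by the adaptive filter in the
    earlier steps); the echo estimate is the speaker reference convolved
    with w, truncated to the microphone length.
    """
    echo_est = np.convolve(speaker_ref, w)[: len(mic)]
    return mic - echo_est

rng = np.random.default_rng(1)
ref = rng.standard_normal(2000)                    # audio played by the speaker
h = np.array([0.5, 0.25])                          # assumed 2-tap echo path
near_speech = 0.1 * rng.standard_normal(2000)      # participant's own speech
mic = np.convolve(ref, h)[:2000] + near_speech     # microphone picks up both
cleaned = cancel_speaker_echo(mic, ref, h)         # assume a perfect path estimate
```

With an imperfect (adaptively estimated) path, the residual would contain a small echo remainder in addition to the near-end speech; the residual energy still drops well below the raw microphone energy.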
On the basis of the above embodiment, the embodiment of the application further provides a data processing method, which can be executed by the processing terminal, and the method can be applied to a scene of controlling the internet of things device according to voice. Specifically, as shown in fig. 6, the method includes:
Step 602, obtaining control audio data and determining the controlled internet of things device.
Step 604, when the controlled internet of things device comprises at least two internet of things devices, determining a filter coefficient corresponding to control audio data, and determining an acoustic environment corresponding to the control audio data according to coefficient attenuation information of the filter coefficient.
Step 606, screening a target internet of things device from the at least two internet of things devices according to the acoustic environment, and controlling the target internet of things device according to the control audio data.
The implementation manner of the embodiment of the present application is similar to that of the embodiment described above, and the specific implementation process of the embodiment may refer to the specific implementation process of the embodiment described above, which is not described herein again.
In the embodiment of the application, the control audio data can be obtained and the corresponding controlled internet of things devices can be determined. When the number of controlled internet of things devices is greater than or equal to two, the control audio data can be filtered by the adaptive filter to determine a filter coefficient, and the acoustic environment corresponding to the control audio data can be determined according to the coefficient attenuation information of the filter coefficient, so that a target internet of things device is screened out from the controlled internet of things devices and controlled according to the control audio data. The embodiment of the application can be applied to a scenario of controlling internet of things devices by voice. In such a scenario, the voice of a user may correspond to a plurality of internet of things devices, and existing schemes generally require further interaction with the user to screen out the target internet of things device to be controlled. With the scheme of the present application, the acoustic environment corresponding to the control audio data can be analyzed, and the target internet of things device can be screened out from the plurality of internet of things devices according to the acoustic environment. This reduces the need for further interaction with the user and can improve the user experience. For example, in a home scenario, air conditioners (internet of things devices) may be arranged in both a bedroom and a living room; when the air conditioner is controlled through a processing end (such as a mobile phone) according to received voice, the processing end may not be able to determine whether the controlled air conditioner is the bedroom air conditioner or the living room air conditioner.
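The device-screening step could be sketched as matching the acoustic environment inferred from the command audio against a stored profile per device. The profile format, device names, distance weighting, and all numeric values below are hypothetical illustration choices, not part of the application.

```python
def pick_target_device(audio_env, devices):
    """Pick the device whose registered acoustic-environment profile is
    closest to the environment inferred from the control audio.

    audio_env: (reverb_seconds, drr_db) inferred from the command audio.
    devices: mapping of device name -> its registered (reverb_seconds, drr_db).
    The 0.1 weight on the dB term is an arbitrary illustrative choice.
    """
    def distance(env):
        return abs(env[0] - audio_env[0]) + 0.1 * abs(env[1] - audio_env[1])
    return min(devices, key=lambda name: distance(devices[name]))

devices = {
    "bedroom_ac": (0.25, 8.0),      # small, damped room (hypothetical profile)
    "livingroom_ac": (0.6, 1.0),    # larger, more reverberant room (hypothetical)
}
target = pick_target_device((0.55, 2.0), devices)   # command sounds reverberant
```

A command recorded in a reverberant room thus resolves to the living-room device without further user interaction, which is the behavior the scenario above describes.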
In the embodiment of the present application, the audio data of various scenes may include audio data separated from video data, and for a video conference, a live broadcast, and other scenes, the audio data may be separated from corresponding video data. In addition, if the audio data stream and the image data stream are transmitted separately in the above scene, the corresponding audio data can be directly acquired.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those of skill in the art will recognize that the embodiments described in this specification are presently preferred embodiments and that no particular act is required to implement the embodiments of the disclosure.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 7, the data processing apparatus may specifically include the following modules:
a filter coefficient determining module 702, configured to obtain audio data to be analyzed, and determine a filter coefficient corresponding to the audio data;
an acoustic environment determining module 704, configured to determine an acoustic environment corresponding to the audio data according to the coefficient attenuation information of the filter coefficient.
In summary, in the embodiment of the present application, the audio data to be analyzed can be obtained, and the adaptive filter is used to filter the audio data to determine the filter coefficient corresponding to the audio data; the filter coefficient may then be subjected to coefficient attenuation analysis to determine coefficient attenuation information, so that the acoustic environment of the audio data can be determined from the coefficient attenuation information. Compared with denoising using preset parameters, the present application can analyze the acoustic environment corresponding to the audio data and determine the noise in the audio data (such as echo noise and reverberation noise) according to the acoustic environment, so as to cancel the noise in the audio data.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, which may specifically include the following modules:
and the audio data acquisition processing module is used for acquiring the audio data to be analyzed.
And the filter coefficient acquisition processing module is used for filtering the audio data according to the adaptive filter and determining the filter coefficient of the adaptive filter.
And the reverberation duration acquisition processing module is used for determining the reverberation duration according to the filter coefficient and a preset coefficient threshold.
And the filter coefficient segmentation processing module is used for determining a first filter coefficient corresponding to the direct sound and a second filter coefficient corresponding to the reverberant sound according to the filter coefficients.
And the energy attenuation quantity acquisition processing module is used for taking the energy ratio corresponding to the first filter coefficient and the second filter coefficient as the energy attenuation quantity. The energy attenuation amount includes an energy ratio of a direct sound to a reverberant sound in the audio data.
And the acoustic environment acquisition processing module is used for determining the acoustic environment corresponding to the audio data according to the reverberation time length and the energy attenuation amount.
And the noise information acquisition processing module is used for determining noise estimation information according to the acoustic environment, wherein the noise estimation information comprises echo noise estimation information and reverberation noise estimation information.
And the audio data denoising processing module is used for processing the audio data according to the noise estimation information.
In the embodiment of the application, the audio data to be analyzed can be obtained, the audio data can be filtered by the adaptive filter, and the filter coefficient corresponding to the audio data can be determined. The reverberation duration can then be determined according to the filter coefficient, and a first filter coefficient corresponding to the direct sound and a second filter coefficient corresponding to the reverberant sound can be determined from the filter coefficient, so that the energy ratio corresponding to the first filter coefficient and the second filter coefficient is determined as the energy attenuation amount. The acoustic environment corresponding to the audio data can then be determined according to the reverberation duration and the energy attenuation amount. Echo noise estimation and reverberation noise estimation are performed according to the acoustic environment to obtain noise estimation information, so as to cancel the noise in the audio data and obtain processed audio data. The embodiment of the application can be applied to a voice communication scenario, so that after the processed audio data is determined, the processing end can transmit the processed audio data to a receiving end.
On the basis of the foregoing embodiment, the present embodiment further provides a data processing apparatus, and with reference to fig. 8, the data processing apparatus may specifically include the following modules:
the filter coefficient obtaining module 802 is configured to obtain live audio data and determine a filter coefficient corresponding to the live audio data.
An acoustic environment obtaining module 804, configured to determine an acoustic environment corresponding to the live audio data according to the coefficient attenuation information of the filter coefficient.
A noise information obtaining module 806, configured to determine noise estimation information according to the acoustic environment, where the noise estimation information includes echo noise estimation information and reverberation noise estimation information.
And the live audio denoising module 808 is configured to process the live audio data according to the noise estimation information.
In summary, the embodiment of the application can be applied to a live broadcast scenario. In such a scenario, live audio data of the broadcaster can be acquired through a microphone, and the live audio data is filtered through the adaptive filter to determine the filter coefficient corresponding to the live audio data. Coefficient attenuation analysis may then be performed on the filter coefficient to determine the coefficient attenuation information, where the coefficient attenuation information includes the reverberation duration and the energy attenuation amount. The embodiment may determine the acoustic environment corresponding to the live audio data according to the reverberation duration and the energy attenuation amount. Echo noise estimation and reverberation noise estimation are performed according to the acoustic environment to obtain noise estimation information, so as to cancel the noise in the live audio data and obtain processed live audio data. The processed audio data may then be transmitted to users watching the live broadcast. Compared with denoising using preset parameters, the present application can analyze the acoustic environment corresponding to the live audio data and denoise according to the acoustic environment, which can improve the optimization effect on the live audio data and thereby improve the experience of users watching the live broadcast.
The embodiment of the present application may also be applied to a live education scene, and specifically, as an optional embodiment, the filter coefficient obtaining module 802 is specifically configured to obtain education audio data and determine a filter coefficient corresponding to the education audio data. The acoustic environment obtaining module 804 is specifically configured to determine an acoustic environment corresponding to the education audio data according to the coefficient attenuation information of the filter coefficient. The noise information obtaining module 806 is specifically configured to determine noise estimation information according to the acoustic environment, where the noise estimation information includes echo noise estimation information and reverberation noise estimation information. The live audio denoising module 808 is specifically configured to process the education audio data according to the noise estimation information.
In this embodiment of the present application, the education audio data may be education audio data in an education live broadcast scene, or other audio data related to education, such as audio data in a teaching video. The embodiment can acquire the education audio data, filter the education audio data through the adaptive filter, and determine the filter coefficient corresponding to the education audio data; coefficient attenuation analysis can then be performed on the filter coefficient to determine the coefficient attenuation information, where the coefficient attenuation information includes the reverberation duration and the energy attenuation amount. The embodiment may determine the acoustic environment corresponding to the education audio data according to the reverberation duration and the energy attenuation amount. Echo noise estimation and reverberation noise estimation are performed according to the acoustic environment to obtain noise estimation information, so as to cancel the noise in the education audio data and obtain processed education audio data.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 9, the data processing apparatus may specifically include the following modules:
the filter coefficient obtaining module 902 is configured to obtain conference audio data and determine a filter coefficient corresponding to the conference audio data.
An acoustic environment obtaining module 904, configured to determine an acoustic environment corresponding to the conference audio data according to the coefficient attenuation information of the filter coefficient.
An interference noise obtaining module 906, configured to determine interference noise estimation information according to the acoustic environment and an output audio played by a speaker.
And the conference audio denoising module 908 is configured to process the conference audio data according to the interference noise estimation information.
In summary, the embodiment of the application can be applied to conference scenes (such as a voice conference or a video conference). In such a scene, conference audio data of the participants can be collected through a microphone, and the conference audio data is filtered through the adaptive filter to determine the filter coefficient corresponding to the conference audio data. Coefficient attenuation analysis can then be performed on the filter coefficient to determine the coefficient attenuation information and thereby the acoustic environment, where the coefficient attenuation information includes the reverberation duration and the energy attenuation amount. The embodiment can determine the interference noise estimation information according to the acoustic environment and the output audio played by the speaker, so that the interference of the speaker output to the microphone can be removed from the conference audio data, improving the data quality of the conference audio data.
On the basis of the foregoing embodiment, the present embodiment further provides a data processing apparatus, and with reference to fig. 10, the data processing apparatus may specifically include the following modules:
the control audio obtaining module 1002 is configured to obtain control audio data and determine a controlled internet of things device.
The acoustic environment analysis module 1004 is configured to, when the controlled internet of things device includes at least two internet of things devices, determine a filter coefficient corresponding to control audio data, and determine an acoustic environment corresponding to the control audio data according to coefficient attenuation information of the filter coefficient.
And a controlled device screening module 1006, configured to screen a target internet of things device from the at least two internet of things devices according to the acoustic environment, and control the target internet of things device according to the control audio data.
In the embodiment of the application, the control audio data can be obtained and the corresponding controlled internet of things devices can be determined. When the number of controlled internet of things devices is greater than or equal to two, the control audio data can be filtered by the adaptive filter to determine a filter coefficient, and the acoustic environment corresponding to the control audio data can be determined according to the coefficient attenuation information of the filter coefficient, so that a target internet of things device is screened out from the controlled internet of things devices and controlled according to the control audio data. The embodiment of the application can be applied to a scenario of controlling internet of things devices by voice. In such a scenario, the voice of a user may correspond to a plurality of internet of things devices, and existing schemes generally require further interaction with the user to screen out the target internet of things device to be controlled. With the scheme of the present application, the acoustic environment corresponding to the control audio data can be analyzed, and the target internet of things device can be screened out from the plurality of internet of things devices according to the acoustic environment. This reduces the need for further interaction with the user and can improve the user experience. For example, in a home scenario, air conditioners (internet of things devices) may be arranged in both a bedroom and a living room; when the air conditioner is controlled through a processing end (such as a mobile phone) according to received voice, the processing end may not be able to determine whether the controlled air conditioner is the bedroom air conditioner or the living room air conditioner.
The present application further provides a non-transitory readable storage medium, where one or more modules (programs) are stored; when the one or more modules are applied to a device, the device may execute the instructions of the method steps in the present application.
Embodiments of the present application provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an electronic device to perform the methods as described in one or more of the above embodiments. In the embodiment of the application, the electronic device includes a server, a terminal device and other devices.
Embodiments of the present disclosure may be implemented as an apparatus using any suitable hardware, firmware, software, or any combination thereof in a desired configuration; the apparatus may include electronic devices such as servers (or server clusters) and terminals. Fig. 11 schematically illustrates an example apparatus 1100 that may be used to implement various embodiments described herein.
For one embodiment, fig. 11 illustrates an example apparatus 1100 having one or more processors 1102, a control module (chipset) 1104 coupled to at least one of the processor(s) 1102, a memory 1106 coupled to the control module 1104, a non-volatile memory (NVM)/storage 1108 coupled to the control module 1104, one or more input/output devices 1110 coupled to the control module 1104, and a network interface 1112 coupled to the control module 1104.
The processor 1102 may include one or more single-core or multi-core processors, and the processor 1102 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the apparatus 1100 can be used as a server, a terminal, or the like in the embodiments of the present application.
In some embodiments, the apparatus 1100 may include one or more computer-readable media (e.g., the memory 1106 or the NVM/storage 1108) having instructions 1114 and one or more processors 1102 in combination with the one or more computer-readable media configured to execute the instructions 1114 to implement modules to perform the actions described in this disclosure.
For one embodiment, control module 1104 may include any suitable interface controllers to provide any suitable interface to at least one of the processor(s) 1102 and/or to any suitable device or component in communication with control module 1104.
The control module 1104 may include a memory controller module to provide an interface to the memory 1106. The memory controller module may be a hardware module, a software module, and/or a firmware module.
The memory 1106 may be used to load and store data and/or instructions 1114 for the device 1100, for example. For one embodiment, memory 1106 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, the memory 1106 may comprise a double data rate type four synchronous dynamic random access memory (DDR 4 SDRAM).
For one embodiment, control module 1104 may include one or more input/output controllers to provide an interface to NVM/storage 1108 and input/output device(s) 1110.
For example, NVM/storage 1108 may be used to store data and/or instructions 1114. NVM/storage 1108 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more hard disk drive(s) (HDD (s)), one or more Compact Disc (CD) drive(s), and/or one or more Digital Versatile Disc (DVD) drive (s)).
NVM/storage 1108 may include storage resources that are part of the device on which apparatus 1100 is installed, or it may be accessible by the device and need not be part of the device. For example, NVM/storage 1108 may be accessed over a network via input/output device(s) 1110.
Input/output device(s) 1110 may provide an interface for the apparatus 1100 to communicate with any other suitable device; the input/output devices 1110 may include communication components, audio components, sensor components, and so forth. The network interface 1112 may provide an interface for the apparatus 1100 to communicate over one or more networks, and the apparatus 1100 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, such as by accessing a wireless network based on a communication standard such as WiFi, 2G, 3G, 4G, or 5G, or a combination thereof.
For one embodiment, at least one of the processor(s) 1102 may be packaged together with logic for one or more controller(s) (e.g., memory controller modules) of control module 1104. For one embodiment, at least one of the processor(s) 1102 may be packaged together with logic for one or more controllers of control module 1104 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1102 may be integrated on the same die with logic for one or more controller(s) of control module 1104. For one embodiment, at least one of the processor(s) 1102 may be integrated on the same die with logic for one or more controller(s) of control module 1104 to form a system on chip (SoC).
In various embodiments, the apparatus 1100 may be, but is not limited to being: a server, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.), among other terminal devices. In various embodiments, the apparatus 1100 may have more or fewer components and/or different architectures. For example, in some embodiments, the device 1100 includes one or more cameras, keyboards, liquid crystal display (LCD) screens (including touch screen displays), non-volatile memory ports, multiple antennas, graphics chips, application-specific integrated circuits (ASICs), and speakers.
The detection device may use a main control chip as the processor or control module; sensor data, position information, and the like may be stored in the memory or NVM/storage; a sensor group may serve as the input/output device; and the communication interface may include the network interface.
An embodiment of the present application further provides an electronic device, including: a processor; and a memory having executable code stored thereon that, when executed, causes the processor to perform a method as described in one or more of the embodiments of the application.
Embodiments of the present application also provide one or more machine-readable media having executable code stored thereon that, when executed, cause a processor to perform a method as described in one or more of the embodiments of the present application.
Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding descriptions of the method embodiments.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar, the embodiments may refer to one another.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises that element.
The foregoing has described in detail a data processing method and apparatus, an electronic device, and a storage medium provided by this application. Specific examples are used herein to explain the principles and implementations of the application, and the above descriptions of the embodiments are intended only to help understand the method of the application and its core ideas. Meanwhile, those skilled in the art may, following the ideas of this application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the application.

Claims (10)

1. A method of data processing, the method comprising:
acquiring audio data to be analyzed, and determining a filter coefficient corresponding to the audio data;
and determining the acoustic environment corresponding to the audio data according to the coefficient attenuation information of the filter coefficient.
2. The method of claim 1, wherein the determining the acoustic environment corresponding to the audio data according to the coefficient attenuation information of the filter coefficient comprises:
determining coefficient attenuation information according to the filter coefficient, wherein the coefficient attenuation information comprises reverberation duration and energy attenuation, and the energy attenuation comprises the energy ratio of direct sound to reverberant sound in audio data;
and determining the acoustic environment corresponding to the audio data according to the reverberation time length and the energy attenuation amount.
3. The method of claim 2, wherein determining coefficient attenuation information based on the filter coefficients comprises:
determining reverberation time length according to the filter coefficient and a preset coefficient threshold;
determining a first filter coefficient corresponding to the direct sound and a second filter coefficient corresponding to the reverberant sound according to the filter coefficients;
and taking the energy ratio corresponding to the first filter coefficient and the second filter coefficient as the energy attenuation.
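The coefficient analysis of claims 1-3 can be sketched in code. In the sketch below, the sampling rate, the -60 dB threshold, and the 5 ms direct-sound window are illustrative assumptions rather than values fixed by the claims, and the function name is hypothetical:

```python
import numpy as np

def analyze_filter_coefficients(h, fs, threshold_db=-60.0, direct_ms=5.0):
    """Estimate reverberation duration and energy attenuation (direct-to-
    reverberant energy ratio) from an adaptive filter's coefficients.

    h: 1-D array of filter coefficients, one tap per sample at rate fs.
    threshold_db and direct_ms are illustrative defaults, not values
    prescribed by the claims.
    """
    h = np.asarray(h, dtype=float)
    peak = np.max(np.abs(h))
    # Reverberation duration: time until the coefficient envelope has
    # fallen below the preset threshold relative to the peak.
    floor = peak * 10.0 ** (threshold_db / 20.0)
    above = np.nonzero(np.abs(h) > floor)[0]
    reverb_duration = (above[-1] + 1) / fs if above.size else 0.0

    # Split the coefficients at the direct-sound window boundary:
    # early taps model the direct sound, later taps the reverberant tail.
    split = int(direct_ms * 1e-3 * fs)
    direct_energy = np.sum(h[:split] ** 2)
    reverb_energy = np.sum(h[split:] ** 2)
    # Energy attenuation as the direct-to-reverberant ratio in dB.
    drr_db = 10.0 * np.log10(direct_energy / max(reverb_energy, 1e-12))
    return reverb_duration, drr_db
```

A long, slowly decaying coefficient tail then maps to a long reverberation duration and a low ratio, which is what lets the acoustic environment be classified from these two numbers.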
4. The method of claim 1, further comprising:
determining noise estimation information according to the acoustic environment, wherein the noise estimation information comprises echo noise estimation information and reverberation noise estimation information;
and processing the audio data according to the noise estimation information.
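Claim 4 uses the inferred acoustic environment to drive noise estimation. One common way to turn a reverberation duration into a reverberation-noise estimate is Polack's exponential decay model, in which late reverberation is treated as an attenuated, delayed copy of the observed short-time power spectrum. The sketch below assumes that model; the frame shift and prediction delay are illustrative parameters, not values from the claims:

```python
import numpy as np

def late_reverb_psd(frame_psd, t60, frame_shift_s=0.016, delay_frames=4):
    """Late-reverberation power-spectral-density estimate driven by the
    reverberation duration (T60) inferred from the filter coefficients.

    frame_psd: array of shape [n_frames, n_bins] of short-time power spectra.
    """
    # Decay rate of Polack's exponential reverberation model.
    delta = 3.0 * np.log(10.0) / t60
    # Attenuation accumulated over the prediction delay.
    gain = np.exp(-2.0 * delta * delay_frames * frame_shift_s)
    noise = np.zeros_like(frame_psd)
    # Late reverberation modelled as an attenuated, delayed copy of the
    # observed power spectrum; the estimate then feeds a suppression gain.
    noise[delay_frames:] = gain * frame_psd[:-delay_frames]
    return noise
```

A longer T60 yields a larger gain, so more reverberant rooms produce a stronger reverberation-noise estimate and hence heavier suppression.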
5. A method of data processing, the method comprising:
acquiring live broadcast audio data, and determining a filter coefficient corresponding to the live broadcast audio data;
determining an acoustic environment corresponding to the live audio data according to the coefficient attenuation information of the filter coefficient;
determining noise estimation information according to the acoustic environment, wherein the noise estimation information comprises echo noise estimation information and reverberation noise estimation information;
and processing the live audio data according to the noise estimation information.
6. A method of data processing, the method comprising:
acquiring conference audio data and determining a filter coefficient corresponding to the conference audio data;
determining an acoustic environment corresponding to the conference audio data according to the coefficient attenuation information of the filter coefficient;
determining interference noise estimation information according to the acoustic environment and an output audio played by a loudspeaker;
and processing the conference audio data according to the interference noise estimation information.
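Claim 6 estimates interference noise from the acoustic environment together with the loudspeaker's output audio. A standard way to model the loudspeaker-to-microphone echo path is a normalized LMS (NLMS) adaptive filter; the sketch below shows that technique under an assumed tap count and step size, and is not presented as the claimed implementation itself:

```python
import numpy as np

def nlms_echo_estimate(far_end, mic, taps=64, mu=0.5, eps=1e-8):
    """Derive an echo estimate from the loudspeaker output: an NLMS
    adaptive filter models the echo path, its output is the echo
    estimate, and the residual is the mic signal with the echo removed."""
    w = np.zeros(taps)
    x_buf = np.zeros(taps)
    echo_est = np.zeros_like(mic)
    residual = np.zeros_like(mic)
    for n in range(len(mic)):
        # Shift the newest far-end sample into the reference buffer.
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        y = w @ x_buf                                 # estimated echo sample
        e = mic[n] - y                                # residual after removal
        w += mu * e * x_buf / (x_buf @ x_buf + eps)   # NLMS coefficient update
        echo_est[n], residual[n] = y, e
    return echo_est, residual
```

The converged coefficients `w` are exactly the kind of filter coefficients whose attenuation the earlier claims analyze to characterize the acoustic environment.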
7. A method of data processing, the method comprising:
acquiring control audio data, and determining controlled Internet of Things equipment;
when the controlled Internet of Things equipment comprises at least two pieces of Internet of Things equipment, determining a filter coefficient corresponding to the control audio data, and determining an acoustic environment corresponding to the control audio data according to coefficient attenuation information of the filter coefficient;
and screening a target piece of Internet of Things equipment from the at least two pieces of Internet of Things equipment according to the acoustic environment, and controlling the target Internet of Things equipment according to the control audio data.
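Claim 7 does not fix the screening rule. One plausible heuristic, shown here purely as an assumption, is to prefer the device whose capture of the control audio shows the strongest direct sound (highest direct-to-reverberant ratio), breaking ties with the shorter reverberation duration:

```python
def pick_target_device(envs):
    """envs maps device_id -> (reverb_duration_s, drr_db): the acoustic
    environment estimated from each device's capture of the control audio.
    The scoring rule is a hypothetical heuristic, not taken from the
    claims: highest direct-to-reverberant ratio wins, and a shorter
    reverberation duration breaks ties."""
    return max(envs, key=lambda d: (envs[d][1], -envs[d][0]))
```

The intuition is that the device the speaker is closest to, and most plausibly addressing, receives mostly direct sound, while farther devices hear a more reverberant copy of the same command.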
8. A data processing apparatus, characterized in that the apparatus comprises:
the filter coefficient determining module is used for acquiring audio data to be analyzed and determining a filter coefficient corresponding to the audio data;
and the acoustic environment determining module is used for determining the acoustic environment corresponding to the audio data according to the coefficient attenuation information of the filter coefficient.
9. An electronic device, comprising: a processor; and
memory having stored thereon executable code which, when executed, causes the processor to perform the method of one or more of claims 1-7.
10. One or more machine-readable media having executable code stored thereon that, when executed, causes a processor to perform the method of one or more of claims 1-7.
CN202110477895.8A 2021-04-29 2021-04-29 Data processing method and device, electronic equipment and storage medium Pending CN115273871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110477895.8A CN115273871A (en) 2021-04-29 2021-04-29 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115273871A true CN115273871A (en) 2022-11-01

Family

ID=83745642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110477895.8A Pending CN115273871A (en) 2021-04-29 2021-04-29 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115273871A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240305

Address after: # 03-06, Lai Zan Da Building 1, 51 Belarusian Road, Singapore
Applicant after: Alibaba Innovation Co.
Country or region after: Singapore

Address before: Room 01, 45th Floor, AXA Building, 8 Shanton Road, Singapore
Applicant before: Alibaba Singapore Holdings Ltd.
Country or region before: Singapore