CN111669697B - Coherent sound and environmental sound extraction method and system of multichannel signal

Info

Publication number: CN111669697B
Application number: CN202010447863.9A
Authority: CN
Inventors: 吴彦琴; 桑晋秋; 郑成诗; 张芳杰; 李晓东
Original assignee: Institute of Acoustics CAS
Current assignee: Institute of Acoustics CAS
Priority date: 2020-05-25
Filing date: 2020-05-25
Publication date: 2021-05-18
Anticipated expiration: 2040-05-25
Also published as: CN111669697A

Abstract

The invention discloses a method and a system for extracting coherent sound and ambient sound from a multichannel signal. The method comprises: calculating a weight expression for the coherent sound of an N-channel signal, under the condition that the ambient sound energy is the same in every channel, and estimating the coherent sound according to that expression, thereby obtaining the coherent sound of each channel; calculating the ambient sound of each channel according to the coherent sound of that channel; and performing an inverse Fourier transform on the N channels of coherent sound and the N channels of ambient sound to obtain the coherent sound and ambient sound in the time domain. For signals with any number of channels and equal ambient sound energy in every channel, the method derives a weight expression for estimating the coherent sound and solves each unknown parameter in that expression from the signal energy of each channel and the correlation values between channels, thereby extracting the coherent sound and ambient sound of a multichannel signal with high accuracy.

Description

Coherent sound and environmental sound extraction method and system of multichannel signal

Technical Field

The invention relates to the field of spatial sound reproduction, in particular to a method and a system for extracting coherent sound and environmental sound of a multi-channel signal.

Background

Spatial sound reproduction is widely used in entertainment media. For example, when a movie is played in a cinema, in a home theater, or on a portable electronic device, reproducing spatial sound with a suitable sound-image width and a good sense of immersion through headphones or loudspeakers gives consumers a better audio-visual experience. In recent years, spatial sound reproduction has also shown important application prospects in advanced scientific research and practical engineering, in fields such as virtual reality, aviation, and aerospace.

Spatial sound mainly comprises two components with different properties: a directional sound component, called coherent sound, and a diffuse sound component with no discernible direction, called ambient sound. To achieve a better reproduction effect, the coherent sound and the ambient sound must be extracted from the spatial sound, a task known as primary-ambient extraction (PAE), and then processed differently. For example, in an audio codec system, PAE serves as the front end of audio encoding or decoding, so that efficient and immersive spatial sound playback can be achieved.

PAE methods for two-channel signals are mature; the most widely used are principal component analysis and the least-squares method. For multichannel signals, PAE can be performed with the pairwise correlation method, but its extraction accuracy is not high. It is therefore worthwhile to extend the PAE methods designed for stereo to multichannel signals. Assuming that the coherent sound carries most of the energy, principal component analysis performs PAE on a stereo signal by computing the eigenvalues of the covariance matrix of the input signal. It can also extract the components of a multichannel signal, but its computational complexity grows with the number of channels, and it extracts well only when the coherent sound dominates the energy. The least-squares method performs PAE on a stereo signal by computing weights for estimating the coherent sound, assuming that the ambient sound energy is equal in every channel. When the least-squares method is applied directly to a multichannel signal, however, the estimation weights are not easy to solve. The ambient sound component mainly creates atmosphere in spatial sound, and to achieve a better sense of envelopment its energy differs little from channel to channel. Extracting the coherent sound and ambient sound of a multichannel signal under the assumption of equal ambient sound energy in every channel is therefore of practical importance.

Disclosure of Invention

The invention aims to overcome the above technical shortcomings. Under the assumption that the ambient sound energy is equal in every channel, it computes the least-squares weights for estimating the coherent sound when the number of channels is small and, from the regular way in which these weights vary with the number of channels, obtains a weight expression for estimating the coherent sound of a multichannel signal with any number of channels.

In order to achieve the above object, the present invention provides a coherent sound and ambient sound extraction method for a multi-channel signal, the method comprising:

calculating a weight expression for the coherent sound of the N-channel signal, and estimating the coherent sound according to the weight expression, thereby calculating the coherent sound of each channel, wherein the ambient sound energy of each channel is the same;

calculating the ambient sound of each channel according to the coherent sound of that channel;

and performing an inverse Fourier transform on the N channels of coherent sound and the N channels of ambient sound to obtain the coherent sound and ambient sound in the time domain.

As an improvement of the above method, calculating the weight expression for the coherent sound of the N-channel signal, estimating the coherent sound according to the weight expression, and thereby calculating the coherent sound of each channel, wherein the ambient sound energy of each channel is the same, specifically comprises:

performing a Fourier transform on the time-domain multichannel signal, the input signal $X_n$ of the nth channel being expressed as:

$$X_n = \beta_n S + A_n$$

wherein $S$ represents the spectrum of the coherent sound; $\beta_n$ represents the amplitude difference factor between the coherent sound of the nth channel and the coherent sound of the first channel, $1 \le n \le N$, $\beta_1 = 1$; and $A_n$ represents the spectrum of the ambient sound of the nth channel;

calculating the short-time energy of the nth channel input signal $X_n$:

calculating the correlation value $\phi_{12}$ of the first channel and the second channel:

calculating intermediate parameters $C$ and $D$ according to the short-time energy of the first channel, the short-time energy of the second channel, and the correlation value $\phi_{12}$ between the two channels:

from which the short-time energy $P_S$ of the coherent sound, the short-time energy $P_A$ of the ambient sound, and $\beta_2$ are calculated:

calculating $\beta_n$:

the weight of the nth channel being:

the estimate of the coherent sound then being:

and the coherent sound $S_n$ of the nth channel being:

As an improvement of the above method, calculating the ambient sound of each channel according to the coherent sound of each channel specifically comprises:

the ambient sound $A_n$ of the nth channel being:

$$A_n = X_n - S_n.$$

Embodiment 2 of the present invention provides a coherent sound and ambient sound extraction system for a multichannel signal, comprising:

a coherent sound extraction module, configured to calculate a weight expression for the coherent sound of the N-channel signal, estimate the coherent sound according to the weight expression, and calculate the coherent sound of each channel, wherein the ambient sound energy of each channel is the same;

an ambient sound extraction module, configured to calculate the ambient sound of each channel according to the coherent sound of that channel;

and a frequency-domain to time-domain conversion module, configured to perform an inverse Fourier transform on the N channels of coherent sound and the N channels of ambient sound to obtain the coherent sound and ambient sound in the time domain.

As an improvement of the above system, the implementation process of the coherent sound extraction module comprises:

$$X_n = \beta_n S + A_n$$

calculating the short-time energy of the nth channel input signal $X_n$:

according to the short-time energy of the first channel and the short-time energy of the second channel:

calculating $\beta_n$:

the weight of the nth channel being:

the estimate of the coherent sound then being:

and the coherent sound $S_n$ of the nth channel being:

As an improvement of the above system, the implementation process of the ambient sound extraction module comprises:

the ambient sound $A_n$ of the nth channel being:

$$A_n = X_n - S_n.$$

The advantages of the invention are as follows:

For signals with any number of channels, and under the condition that the ambient sound energy is the same in every channel, the method derives a weight expression for estimating the coherent sound and solves each unknown parameter in that expression from the signal energy of each channel and the correlation values between channels, thereby extracting the coherent sound and ambient sound of a multichannel signal with high extraction accuracy.

Drawings

FIG. 1 is a flow chart of the coherent sound and ambient sound extraction method for a multichannel signal of the present invention;

FIG. 2(a) is an error plot of coherent sound component extraction for mixed five-channel signal 1 using the method of the present invention and the pairwise correlation method;

FIG. 2(b) is an error plot of ambient sound component extraction for mixed five-channel signal 1 using the method of the present invention and the pairwise correlation method;

FIG. 3(a) is an error plot of coherent sound component extraction for mixed five-channel signal 2 using the method of the present invention and the pairwise correlation method;

FIG. 3(b) is an error plot of ambient sound component extraction for mixed five-channel signal 2 using the method of the present invention and the pairwise correlation method.

Detailed Description

The technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

Example 1

As shown in FIG. 1, Embodiment 1 of the present invention proposes a coherent sound and ambient sound extraction method for a multichannel signal whose ambient sound energy is equal in every channel, comprising the following steps:

Step 1) framing the multichannel signal, performing a Fourier transform to obtain the spectrum, and expressing the short-time energy of each channel and the correlation value between any two channels according to the multichannel signal model, specifically comprising:

In the multichannel signal model, the input signal is represented as the superposition of coherent sound and ambient sound. Because the two have different characteristics, the coherent sound of the different channels is assumed to be completely correlated, that is, linearly related; the coherent sound is assumed to be uncorrelated with the ambient sound of every channel, and the ambient sounds of different channels are assumed to be uncorrelated with one another.

Step 1-1) performing a Fourier transform on the time-domain multichannel signal to obtain the spectrum:

$$X_n = \beta_n S + A_n, \qquad n = 1, 2, \ldots, N$$

where $N$ is the number of channels, $S$ represents the spectrum of the coherent sound, $\beta_n$ represents the amplitude difference factor between the coherent sound of the nth channel and the coherent sound of the first channel, with $\beta_1 = 1$, and $A_n$ represents the spectrum of the ambient sound of the nth channel;

Step 1-2) the signal energy of each channel can be expressed as:

$$E\{|X_n|^2\} = \beta_n^2\, E\{|S|^2\} + E\{|A_n|^2\}$$

where $E\{\cdot\}$ represents a short-time average.

Step 1-3) the correlation values between the channels can be expressed as:

$$\phi_{n_1 n_2} = E\{X_{n_1} X_{n_2}^*\} = \beta_{n_1} \beta_{n_2}\, E\{|S|^2\}$$

where $\phi_{n_1 n_2}$ is the correlation value between the $n_1$-th channel and the $n_2$-th channel, $^*$ denotes complex conjugation, $n_1 = 1, 2, \ldots, N$, $n_2 = 1, 2, \ldots, N$, and $n_1 \neq n_2$;
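The statistics of step 1) can be sketched numerically for a single frequency bin. The sketch below is illustrative only: the function names, the block average used for the short-time average, and the complex Gaussian test signal are assumptions, not part of the original text. It draws frames from the model X_n = β_n S + A_n and checks that the measured channel energies approach β_n² P_S + P_A and that the correlation of channels 1 and 2 approaches β_2 P_S.

```python
import math
import random


def short_time_energy(frames):
    """Average |X|^2 over the frames of one frequency bin (stand-in for E{.})."""
    return sum(abs(x) ** 2 for x in frames) / len(frames)


def cross_correlation(frames_a, frames_b):
    """Average X_a * conj(X_b) over the frames of one frequency bin."""
    pairs = zip(frames_a, frames_b)
    return sum(a * b.conjugate() for a, b in pairs) / len(frames_a)


def synth_bin(betas, p_s, p_a, n_frames, seed=0):
    """Draw X_n = beta_n * S + A_n for one bin (hypothetical test signal)."""
    rng = random.Random(seed)

    def cgauss(power):
        # Complex Gaussian sample with E{|z|^2} = power.
        sigma = math.sqrt(power / 2.0)
        return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

    channels = [[] for _ in betas]
    for _ in range(n_frames):
        s = cgauss(p_s)  # coherent sound, shared by all channels
        for n, beta in enumerate(betas):
            # Ambient sound is drawn independently per channel, equal power.
            channels[n].append(beta * s + cgauss(p_a))
    return channels


x = synth_bin(betas=[1.0, 0.8, 1.5], p_s=2.0, p_a=0.5, n_frames=20000)
energies = [short_time_energy(ch) for ch in x]
phi_12 = cross_correlation(x[0], x[1])
```

A practical implementation would more likely use a recursive (exponentially weighted) average across STFT frames; the plain block mean is used here only to keep the sketch short.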

Step 2) calculating the least-squares weights for estimating the coherent sound when the number of channels is small, and exploring the regularity of those weights, specifically comprising:

Step 2-1) for a two-channel signal, calculating the weights with which the input signals $X_1$ and $X_2$ estimate the coherent sound $S$:

Step 2-1-1) estimating the coherent sound $S$:

$$\hat{S} = w_1 X_1 + w_2 X_2$$

where $w_1$ and $w_2$ are the estimation weights to be found.

Step 2-1-2) the estimation error $\sigma_S$ of $S$ is expressed as:

$$\sigma_S = S - \hat{S} = S - w_1 X_1 - w_2 X_2$$

Step 2-1-3) solving with the least-squares algorithm, that is, the weights are optimal when the estimation error is completely uncorrelated with the input stereo signal:

$$E\{\sigma_S X_1^*\} = 0$$

$$E\{\sigma_S X_2^*\} = 0$$

The optimal weights are then expressed as:

$$w_1 = \frac{P_S}{(1 + \beta_2^2) P_S + P_A}, \qquad w_2 = \frac{\beta_2 P_S}{(1 + \beta_2^2) P_S + P_A}$$

where $P_S$ represents the short-time energy of the coherent sound and $P_A$ represents the short-time energy of the ambient sound.
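For readers who want the intermediate step, the two orthogonality conditions of step 2-1-3) can be expanded with the signal model. The derivation below is a sketch under stated assumptions: $\beta_2$ and the weights are real, $P_S = E\{|S|^2\}$, $P_A = E\{|A_1|^2\} = E\{|A_2|^2\}$, the estimate is $\hat{S} = w_1 X_1 + w_2 X_2$, and the coherent and ambient components are mutually uncorrelated as stated in step 1). It is consistent with those assumptions rather than a quotation of the original formulas.

```latex
\begin{aligned}
E\{\sigma_S X_1^*\} = 0 &\;\Rightarrow\; (P_S + P_A)\, w_1 + \beta_2 P_S\, w_2 = P_S \\
E\{\sigma_S X_2^*\} = 0 &\;\Rightarrow\; \beta_2 P_S\, w_1 + (\beta_2^2 P_S + P_A)\, w_2 = \beta_2 P_S \\
\Rightarrow\quad w_1 &= \frac{P_S}{(1 + \beta_2^2) P_S + P_A}, \qquad
w_2 = \frac{\beta_2 P_S}{(1 + \beta_2^2) P_S + P_A}
\end{aligned}
```

Both weights share the same denominator and differ only by the factor $\beta_n$ in the numerator, which is the regularity that step 2) goes on to exploit for larger channel counts.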

Step 2-2) for a three-channel signal, calculating the weights with which the input signals $X_1$, $X_2$ and $X_3$ estimate the coherent sound $S$:

Step 2-2-1) estimating the coherent sound $S$:

$$\hat{S} = w_1 X_1 + w_2 X_2 + w_3 X_3$$

where $w_1$, $w_2$ and $w_3$ are the estimation weights to be found.

Step 2-2-2) proceeding as in step 2-1), the weights with which the three-channel signal estimates the coherent sound are obtained:

$$w_n = \frac{\beta_n P_S}{(1 + \beta_2^2 + \beta_3^2) P_S + P_A}, \qquad n = 1, 2, 3$$

Step 2-3) calculating the weights for estimating the coherent sound for larger numbers of channels shows that the weights can be written in a unified form. For a multichannel signal with $N$ channels, the estimated coherent sound is expressed as:

$$\hat{S} = \sum_{n=1}^{N} w_n X_n$$

where the weights can be expressed as:

$$w_n = \frac{\beta_n P_S}{P_S \sum_{k=1}^{N} \beta_k^2 + P_A}, \qquad n = 1, 2, \ldots, N$$
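A unified weight of the form w_n = β_n P_S / (P_S Σ β_k² + P_A) follows from the least-squares orthogonality conditions and the signal model; treat that closed form as an assumption of this sketch, since it is derived here rather than quoted. The hypothetical helper `orthogonality_residuals` checks it by evaluating E{(S − Σ w_n X_n) X_m*} for every channel from the model statistics; all residuals should vanish for any N.

```python
def ls_weights(betas, p_s, p_a):
    """Weights w_n = beta_n * P_S / (P_S * sum(beta_k^2) + P_A)."""
    denom = p_s * sum(b * b for b in betas) + p_a
    return [b * p_s / denom for b in betas]


def orthogonality_residuals(betas, p_s, p_a, weights):
    """E{(S - sum_n w_n X_n) X_m^*} for each channel m, from model statistics."""
    residuals = []
    for m, beta_m in enumerate(betas):
        # E{X_n X_m^*} = beta_n * beta_m * P_S, plus P_A when n == m.
        mixed = sum(
            w * (beta_n * beta_m * p_s + (p_a if n == m else 0.0))
            for n, (w, beta_n) in enumerate(zip(weights, betas))
        )
        # E{S X_m^*} = beta_m * P_S.
        residuals.append(beta_m * p_s - mixed)
    return residuals


betas = [1.0, 1.0, 2.0, 0.5, 0.5]  # five channels, beta_1 = 1
w = ls_weights(betas, p_s=2.0, p_a=0.5)
res = orthogonality_residuals(betas, 2.0, 0.5, w)
```

Because the denominator is shared by all channels, the cost of computing the weights grows only linearly with the number of channels.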

Step 3) calculating each unknown parameter in the weights for estimating the coherent sound, and completing the extraction of the coherent sound and ambient sound of the multichannel signal, specifically comprising:

Step 3-1) since $\beta_1 = 1$ is known, the unknown parameters $P_S$, $P_A$ and $\beta_2$ can be obtained from the signal energies of the first two channels and the correlation value between those channels given in step 1):

where

Step 3-2) from the energy values of the channels other than the first and second channels, $\beta_n$ for $3 \le n \le N$ can be obtained:

$$\beta_n = \sqrt{\frac{E\{|X_n|^2\} - P_A}{P_S}}, \qquad 3 \le n \le N$$

Step 3-3) for a multichannel signal with $N$ channels, substituting the parameters $P_S$, $P_A$ and $\beta_n$ ($n = 1, 2, \ldots, N$) calculated in steps 3-1) and 3-2) into the weights $w_n$ ($n = 1, 2, \ldots, N$) for estimating the coherent sound obtained in step 2-3) completes the extraction of the coherent sound from the multichannel signal.
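One way to carry out step 3) is sketched below. It does not reproduce the original intermediate parameters; instead it solves the three model relations P_1 = P_S + P_A, P_2 = β_2² P_S + P_A and φ_12 = β_2 P_S directly, which gives a quadratic in β_2. The function name, the variable `ratio`, and the assumption that φ_12 is real and positive are choices made for this illustration.

```python
import math


def solve_parameters(energies, phi_12):
    """Solve P_S, P_A and all beta_n from channel energies and phi_12.

    Assumes beta_1 = 1, equal ambient energy in every channel, and
    phi_12 > 0 (coherent sound in phase in channels 1 and 2).
    """
    p1, p2 = energies[0], energies[1]
    # (P2 - P1) / phi_12 = (beta_2^2 - 1) / beta_2, a quadratic in beta_2.
    ratio = (p2 - p1) / phi_12
    beta_2 = (ratio + math.sqrt(ratio * ratio + 4.0)) / 2.0  # positive root
    p_s = phi_12 / beta_2  # from phi_12 = beta_2 * P_S
    p_a = p1 - p_s         # from P_1 = P_S + P_A
    betas = [1.0, beta_2]
    for p_n in energies[2:]:
        # From P_n = beta_n^2 * P_S + P_A; clamp against estimation noise.
        betas.append(math.sqrt(max(p_n - p_a, 0.0) / p_s))
    return p_s, p_a, betas


# Exact statistics for beta = [1, 0.8, 1.5], P_S = 2, P_A = 0.5.
p_s, p_a, betas = solve_parameters([2.5, 1.78, 5.0], phi_12=1.6)
```

The quadratic always has exactly one positive root because the product of its roots is -1, so the sign choice is unambiguous under the stated assumption.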

Step 4) performing PAE on a multichannel signal with any number of channels, specifically comprising:

Step 4-1) calculating the coherent sound of each channel, specifically comprising:

Because step 2) gives the weight expression for estimating the coherent sound when PAE is performed on a multichannel signal with any number of channels, and step 3) gives each unknown parameter in that expression, the coherent sound $S$ can be estimated directly from the weight expression once the number of channels is known. This estimate is itself the coherent sound of the first channel (since $\beta_1 = 1$), and the coherent sound of the other channels is obtained by scaling $S$ linearly, namely $\beta_n S$ ($n = 2, \ldots, N$).

Step 4-2) calculating the ambient sound of each channel, specifically comprising:

The remaining component of each channel is regarded as ambient sound, that is, $A_n = X_n - \beta_n S$.
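Steps 4-1) and 4-2) amount to a few lines per time-frequency bin. The sketch below is a minimal illustration with a hypothetical function name: it forms the weighted estimate of S, scales it by β_n for each channel, and takes the remainder as ambient sound. The usage lines check the noise-free limit, where (under the least-squares weight form assumed here) the weights reduce to β_n / Σ β_k² and the extraction should return the input as coherent sound with zero ambient sound.

```python
def extract_bin(x, weights, betas):
    """Split one time-frequency bin of every channel into coherent and ambient parts."""
    # Estimate of the coherent spectrum S as a weighted sum of the inputs.
    s_hat = sum(w * x_n for w, x_n in zip(weights, x))
    # Coherent sound of channel n: S_n = beta_n * S.
    coherent = [beta * s_hat for beta in betas]
    # Ambient sound of channel n: A_n = X_n - S_n.
    ambient = [x_n - s_n for x_n, s_n in zip(x, coherent)]
    return coherent, ambient


# Noise-free check: every channel is an exact scaled copy of s.
betas = [1.0, 2.0, 0.5]
total = sum(b * b for b in betas)
weights = [b / total for b in betas]
s = complex(0.3, -1.2)
x = [b * s for b in betas]
coherent, ambient = extract_bin(x, weights, betas)
```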

Step 4-3) performing an inverse Fourier transform on the resulting N channels of coherent sound and N channels of ambient sound to obtain the coherent sound and ambient sound in the time domain.
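Because A_n = X_n − S_n and the inverse Fourier transform is linear, the two time-domain outputs of a channel always add back up to that channel's input frame. The sketch below illustrates this with a naive O(N²) transform pair written in plain Python; the function names and the fixed per-bin gain of 0.7 standing in for the real coherent/ambient split are assumptions for illustration only. A practical implementation would use an FFT with windowing and overlap-add.

```python
import cmath


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns complex samples (imaginary part ~0 for real input)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


frame = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.1, -0.6]
spectrum = dft(frame)
# Stand-in split with a fixed gain; in the method it comes from steps 2) to 4).
coherent_spec = [0.7 * v for v in spectrum]
ambient_spec = [v - c for v, c in zip(spectrum, coherent_spec)]
coherent_time = [v.real for v in idft(coherent_spec)]
ambient_time = [v.real for v in idft(ambient_spec)]
```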

The performance of the proposed method is illustrated below with a simulation example:

Completely correlated coherent sound and completely uncorrelated ambient sound are mixed in set proportions into a five-channel signal, and the components are extracted with the multichannel PAE method proposed by the invention and with the pairwise correlation method. Two mixed multichannel signals are synthesized: mixed five-channel signal 1, with clean speech as the coherent sound and the sound of sea waves as the ambient sound, and mixed five-channel signal 2, with clean music as the coherent sound and forest background sound as the ambient sound. In mixing, to control how the coherent sound energy is distributed across the channels, the coherent sound amplitude difference factor $\beta_n$ of each channel is set in a fixed proportion to a reference value $\beta_0$; the ambient sound energy of every channel is set equal to $P_{A0}$; and, to control the share of the coherent sound component in the mixed signal, different coherent sound energy ratios $\gamma$ are used. The reference values $\beta_0$ and $P_{A0}$ are determined by $\gamma$.

In the experiment, the coherent sound amplitudes of the channels are set in the proportions $\beta_1 = \beta_2 = \beta_0$, $\beta_3 = 2\beta_0$, $\beta_4 = \beta_5 = 0.5\beta_0$. The coherent sound energy ratio $\gamma$ ranges from 0.05 to 0.95 in steps of 0.1. The extraction error $\varepsilon_P$ of the coherent sound is expressed as:

The extraction error $\varepsilon_a$ of the ambient sound is expressed as:

FIG. 2(a) and FIG. 2(b) show the extraction errors of the coherent sound and the ambient sound, respectively, when the proposed algorithm and the pairwise correlation method perform PAE on mixed five-channel signal 1; FIG. 3(a) and FIG. 3(b) show the corresponding extraction errors for mixed five-channel signal 2. Over the whole range of the coherent sound energy ratio $\gamma$ from 0.05 to 0.95 (in steps of 0.1), the extraction error of the proposed method is smaller than that of the pairwise correlation method.
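The mixing setup can be sketched as follows. The text states only that β_0 and P_A0 are determined by γ, so the exact relation used here is an assumption: γ is taken as the ratio of coherent energy to total energy summed over all channels, the coherent source has unit power, and the total energy is normalised to the number of channels. The function name `mixing_parameters` is hypothetical.

```python
import math


def mixing_parameters(gamma, ratios):
    """Return (beta_0, p_a0) for a unit-power coherent source.

    Assumption: gamma = coherent energy / total energy summed over channels,
    and the total energy is normalised to the number of channels.
    """
    n = len(ratios)
    # Coherent energy: beta_0^2 * sum(r^2) = gamma * n.
    beta_0 = math.sqrt(gamma * n / sum(r * r for r in ratios))
    # Ambient energy: n * p_a0 = (1 - gamma) * n.
    p_a0 = 1.0 - gamma
    return beta_0, p_a0


ratios = [1.0, 1.0, 2.0, 0.5, 0.5]            # beta_n / beta_0 from the experiment
gammas = [0.05 + 0.1 * i for i in range(10)]  # 0.05, 0.15, ..., 0.95
settings = [mixing_parameters(g, ratios) for g in gammas]
```

Under a different normalisation the absolute values of β_0 and P_A0 change, but the ratio of coherent to total energy, which is what γ controls, stays the same.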

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A coherent sound and ambient sound extraction method for a multichannel signal, the method comprising:

performing an inverse Fourier transform on the N channels of coherent sound and the N channels of ambient sound to obtain the coherent sound and ambient sound in the time domain;

wherein a weight expression for the coherent sound of the N-channel signal is calculated, and the coherent sound is estimated according to the weight expression, so that the coherent sound of each channel is calculated, the ambient sound energy of each channel being the same; this specifically comprises:

$$X_n = \beta_n S + A_n$$

calculating the short-time energy of the nth channel input signal $X_n$:

according to the short-time energy of the first channel and the short-time energy of the second channel:

calculating $\beta_n$:

the weight of the nth channel being:

the estimate of the coherent sound then being:

and the coherent sound $S_n$ of the nth channel being:

2. The method according to claim 1, wherein calculating the ambient sound of each channel according to the coherent sound of each channel specifically comprises:

the ambient sound $A_n$ of the nth channel being:

$$A_n = X_n - S_n.$$

3. A coherent sound and ambient sound extraction system for a multichannel signal, the system comprising:

a frequency-domain to time-domain conversion module, configured to perform an inverse Fourier transform on the N channels of coherent sound and the N channels of ambient sound to obtain the coherent sound and ambient sound in the time domain;

wherein the specific implementation process of the coherent sound extraction module comprises:

$$X_n = \beta_n S + A_n$$

calculating the short-time energy of the nth channel input signal $X_n$:

according to the short-time energy of the first channel and the short-time energy of the second channel:

calculating $\beta_n$:

the weight of the nth channel being:

the estimate of the coherent sound then being:

and the coherent sound $S_n$ of the nth channel being:

4. The coherent sound and ambient sound extraction system for a multichannel signal according to claim 3, wherein the implementation process of the ambient sound extraction module comprises:

the ambient sound $A_n$ of the nth channel being:

$$A_n = X_n - S_n.$$