A method for converting two-channel sound signals into three-channel sound signals
Technical field
The present invention relates to a method for converting two-channel sound signals into three-channel sound signals, and belongs to the technical field of acoustic signal processing.
Background technique
In the real world, sound propagates freely in a three-dimensional sound field. To reproduce a given sound scene at another time or place, the original sound must be sampled and encoded. Owing to objective limitations, only a small number of channels can be used when sampling or encoding a sound signal, so playback can recover only an approximation of the original sound field. To obtain better sound-field recovery with a limited number of channels, recording and playback technologies with more channels have been developed, such as two-channel stereo recording and 5.1 and 7.1 surround sound.
However, the number of recording channels remains a critical constraint and cannot satisfy the demands of more exacting applications. A typical example: professional sound systems such as cinemas generally play back the sound signal through far more loudspeakers than there are recording channels. The usual approach is to group the loudspeakers (for example a left-surround group and a right-surround group), each group consisting of multiple loudspeakers fed with the same channel signal (for example the left-channel or right-channel signal); alternatively, the same channel signal is fed to different loudspeakers after simple delay and attenuation. Fig. 1 shows a typical cinema loudspeaker layout (an actual cinema also needs a low-frequency-effects loudspeaker, and the number of loudspeakers varies with the size of the cinema). When 5.1 surround sound is played, the left-surround and right-surround signals are each fed to 6 surround loudspeakers; when 7.1 surround sound is played, the left-surround and right-surround signals are each fed to 4 lateral surround loudspeakers, the left-rear surround signal feeds the 2 left-rear surround loudspeakers, and the right-rear surround signal feeds the 2 right-rear surround loudspeakers.
Existing audio systems have at least the following problems: (1) the optimal listening area is too small; (2) the sound image drifts when the listener moves. Both problems arise because existing audio systems approximate the sound field with fewer channels and therefore rely on the phantom-image principle. The first problem is the well-known "sweet spot" problem: the optimal listening area is small, and when the listener is outside it the original phantom-image positions are destroyed and the listening experience degrades. The second problem arises because the original recording assumes an ideal, static listening position; when the listener moves, the recovered sound image is further destroyed and the listening experience degrades noticeably.
Summary of the invention
Object of the invention: in view of the above deficiencies of the prior art, the object of the present invention is to provide a method for converting two-channel sound signals into three-channel sound signals, which obtains better sound-field recovery by adding an intermediate channel between the two channels. Specifically, the method significantly alleviates the problem that the optimal listening area (sweet spot) is too small when the original two-channel signal is played back, and effectively mitigates the drifting of sound-image positions when the listener is in motion.
Technical solution: the method of the present invention for converting two-channel sound signals into three-channel sound signals comprises the following steps:
(1) Divide the input time-domain sampled data of the two channels into frames at a given time resolution, with the samples of each frame arranged in sampling-time order, to obtain the framed sound data of the two channels;
(2) Compute the covariance matrix of the two channels' sound data, and further compute its eigenvalues eig1 and eig2 and the corresponding eigenvectors Vec1 and Vec2;
(3) Construct the transformation matrix W from the eigenvectors Vec1 and Vec2;
(4) Transform the two channels' sound data into principal-component data and secondary-component data using the transformation matrix W;
(5) Construct the mapping matrix V from the eigenvectors Vec1 and Vec2;
(6) Map the principal-component data and the secondary-component data into three-channel sound data using the mapping matrix V.
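For illustration only, steps (2) to (6) can be sketched per frame in NumPy. This is a sketch under stated assumptions, not the claimed implementation: it assumes W stacks the eigenvectors Vec1 and Vec2 as columns (the patent's formula for W is given as a figure not reproduced here), and it takes the 3x2 mapping matrix V as a parameter, since the construction of V is described separately below.

```python
import numpy as np

def two_to_three_frame(sub, V):
    """Steps (2)-(6) for one frame of two-channel data.

    sub : 2 x L array (rows = channels, columns = sample instants)
    V   : 3 x 2 mapping matrix (its construction is described in the
          text; passed in here as a parameter)
    """
    # Step (2): covariance matrix of the two channels and its
    # eigen-decomposition; sort so that eig1 >= eig2.
    cov = np.cov(sub)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order
    order = np.argsort(eigvals)[::-1]        # descending: eig1 >= eig2
    eigvecs = eigvecs[:, order]

    # Step (3): transformation matrix W built from the eigenvectors
    # (assumed here to stack Vec1 and Vec2 as columns).
    W = eigvecs

    # Step (4): principal-component data SubN1 and secondary-component
    # data SubN2, via SubN = W' * Sub.
    sub_n = W.T @ sub

    # Steps (5)-(6): map the two components to three channels.
    sub_obj = V @ sub_n                      # 3 x L
    return sub_obj
```

Note that np.cov treats each row of sub as one channel, matching the row-per-channel layout of the vector Sub described below.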
As a refinement of the above technical solution: in step (2) it is assumed that eig1 >= eig2, Vec1 = [v11, v21]' and Vec2 = [v12, v22]', where the symbol ' denotes the transpose operator and v11, v21 and v12, v22 are the two elements of the eigenvectors Vec1 and Vec2 respectively, namely:
The transformation matrix W described in step (3) is:
Step (4) comprises the following two sub-steps:
4.1 Construct the vector Sub from the first-channel sound data and the second-channel sound data:
The vector Sub is two-dimensional: each row is the time-series data of the corresponding channel, and each column holds the first-channel and second-channel samples at a particular sampling instant;
4.2 Multiply the transpose of the transformation matrix W by the vector Sub to obtain the vector SubN:
where SubN1 is the principal-component data and SubN2 is the secondary-component data; SubN1 and SubN2 are row vectors.
The construction process of step (5) is as follows:
First, select wl = W(1,1) and wr = W(2,1), where W(1,1) is the element in row 1, column 1 of the transformation matrix W and W(2,1) is the element in row 2, column 1;
Then, compute the mapping parameters cl, cr, cc and the normalization coefficient g from wl and wr;
Finally, construct the mapping matrix:
The mapping process of step (6) is as follows: multiply the mapping matrix V by the vector SubN to obtain the three-channel vector SubObj:
where SubObj1 is the new first-channel sound data, SubObj2 is the new second-channel sound data, and SubObj3 is the new third-channel sound data. SubObj1, SubObj2 and SubObj3 are row vectors.
Further, the mapping parameters cl, cr, cc are computed from wl and wr in step (5) as follows:
First, compute the sound-image angle: α = π/2 - arctan(wr/wl);
Then, compute the mapping parameters: clr = cos(2α), cc = sin(2α);
If clr < 0, take cl = -clr and cr = 0; otherwise take cl = 0 and cr = clr.
As an improvement, in step (5) the mapping parameters cl, cr, cc can also be computed directly from wl and wr, without trigonometric functions: clr = wr² - wl², cc = 2·wl·wr (equivalent to the trigonometric expressions above when [wl, wr]' has unit norm);
If clr < 0, take cl = -clr and cr = 0; otherwise take cl = 0 and cr = clr.
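A minimal sketch of the trigonometric recipe above (it assumes wl ≠ 0 and a unit-norm first column of W, which standard eigen-solvers return):

```python
import math

def mapping_params(wl, wr):
    """Compute cl, cr, cc from wl = W(1,1) and wr = W(2,1)
    via the sound-image angle alpha = pi/2 - arctan(wr/wl)."""
    alpha = math.pi / 2 - math.atan(wr / wl)   # assumes wl != 0
    clr = math.cos(2 * alpha)
    cc = math.sin(2 * alpha)
    if clr < 0:
        cl, cr = -clr, 0.0
    else:
        cl, cr = 0.0, clr
    return cl, cr, cc
```

For wl = wr = 1/sqrt(2), i.e. the two channels fully correlated at equal level, this yields cl = cr = 0 and cc = 1, steering the principal component entirely to the new center channel; for wl = 1, wr = 0 it yields cl = 1, placing the sound image hard left.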
Further, the following refinement may also be applied to the two channels' framed sound data in step (1):
First, map the framed sound data of the two channels from the time domain to the frequency domain through a time-frequency mapping processing module, to obtain the transform data of the two channels;
Second, group the transform data of the two channels in sets of M consecutive frames, organize the transform data into a two-dimensional time-frequency plane in time and frequency order, and divide the time-frequency plane into N subbands;
The subsequent steps are then performed within each subband;
Step (6) then further includes the following processing: construct three time-frequency planes from the obtained three-channel subband data, and apply frequency-to-time mapping to each frame of transform data through a frequency-time mapping processing module, to obtain the three-channel time-domain sound signals.
As a further improvement, when the input time-domain sampled signals of the two channels are divided into frames at a given time resolution in step (1), an inter-frame data-overlap method is used, to prevent the processed signal from being discontinuous at frame boundaries and producing obvious blocking artifacts.
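The overlapped framing described above can be sketched as follows; the 50% overlap and the frame length are illustrative choices, not values fixed by the text:

```python
import numpy as np

def frame_with_overlap(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames.

    Adjacent frames share frame_len - hop samples, so that after
    windowing and overlap-add the output has no discontinuities
    at frame boundaries.
    """
    n_frames = 1 + max(0, len(x) - frame_len) // hop
    frames = np.zeros((n_frames, frame_len))
    for i in range(n_frames):
        chunk = x[i * hop : i * hop + frame_len]
        frames[i, :len(chunk)] = chunk   # zero-pad a short final frame
    return frames

# 50% overlap: the hop is half the frame length.
x = np.arange(16, dtype=float)
frames = frame_with_overlap(x, frame_len=8, hop=4)
```

With 16 samples, a frame length of 8 and a hop of 4, this produces 3 frames, each sharing its second half with the start of the next frame.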
Correspondingly, the frequency-time mapping processing module comprises a frequency-time mapping module and a block overlap-add module; the frequency-time mapping module maps the input frequency-domain data to the time domain and outputs them to the block overlap-add module, and the block overlap-add module overlap-adds the input time-domain signals to obtain the target sound signal.
The time-frequency mapping processing module comprises a signal-type analysis module, a blocking module and a time-frequency mapping module; the signal-type analysis module classifies the input signal into different subclasses according to its statistical or psychoacoustic characteristics, each subclass corresponding to one block length, and outputs the result to the blocking module; the blocking module divides the input sound signal into blocks of different lengths according to the result of the signal-type analysis module and feeds them into the time-frequency mapping module, which finally performs the time-frequency mapping.
In the above method, the two-channel sound signals are first mapped to the target frequency-domain subbands by time-frequency mapping, yielding two channels of subband data; the subband data are processed to obtain three channels of subband data; finally, frequency-time mapping is applied to the three channels' subband data to obtain the target sound signals of the three channels.
Compared with the prior art, the beneficial effects of the present invention are as follows:
By adding an intermediate channel between the two channels, the present invention obtains better sound-field recovery. Specifically, the method significantly alleviates the problem that the optimal listening area (sweet spot) is too small when the original two-channel signal is played back, and effectively mitigates the drifting of sound-image positions when the listener is in motion.
Description of the drawings
Fig. 1 is a typical cinema loudspeaker layout.
Fig. 2 is the three-channel loudspeaker layout of the present invention.
Fig. 3 is a flow diagram of the present invention.
Fig. 4 is a flow diagram of embodiment 1.
Fig. 5 is a schematic diagram of a time-frequency plane whose subbands are organized in frequency order.
Fig. 6 is a schematic diagram of a time-frequency plane in which several frames are grouped together to form subbands.
Fig. 7 is a schematic diagram of a time-frequency plane organized in frequency order with several frames grouped together to form subbands.
In the figures: 1, first-channel loudspeaker; 2, second-channel loudspeaker; 3, third-channel loudspeaker.
Specific embodiment
The technical solution of the present invention is described in detail below with reference to the accompanying drawings, but the protection scope of the present invention is not limited to the embodiments.
Embodiment 1: a method for converting two-channel sound signals into three-channel sound signals, comprising the following steps:
(1) Divide the input time-domain sampled data of the two channels into frames at a given time resolution, with the samples of each frame arranged in sampling-time order, to obtain the framed sound data of the two channels.
(2) Map the framed sound data of the two channels from the time domain to the frequency domain through a time-frequency mapping processing module, to obtain the transform data of the two channels. The time-frequency mapping may be a time-frequency transform (common transform techniques such as the fast Fourier transform FFT, the discrete cosine transform DCT, the discrete sine transform DST, the modified discrete cosine transform MDCT and the modified discrete sine transform MDST), a subband filter bank (common subband filtering techniques such as the quadrature mirror filter bank QMF, the complex quadrature mirror filter bank CQMF and the pseudo quadrature mirror filter bank PQMF), or a multiresolution analysis (common multiresolution analysis techniques such as wavelets and wavelet packets).
(3) Group the transform data of the two channels in sets of M consecutive frames, organize the transform data into a two-dimensional time-frequency plane in time and frequency order, and divide the time-frequency plane into N subbands. The X axis of the time-frequency plane is ordered by time (or frame) and the Y axis by frequency (or subband); each channel's transform data corresponds to one time-frequency plane. The N subbands may be divided according to a psychoacoustic scale (such as the Bark scale or the ERB scale), according to signal statistics, or even uniformly. M and N are each greater than or equal to 1. The subsequent steps are performed within each subband.
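As a small illustration of the subband division just described (uniform division and the array shapes are assumptions chosen for the example; Bark/ERB or statistics-driven divisions would only change the band edges):

```python
import numpy as np

def divide_subbands(plane, n_subbands):
    """plane: M x F array (M frames of F frequency bins) for one channel.

    Returns a list of N subbands; each subband holds the data of all
    M frames over one contiguous frequency range (the Fig. 6/7 style
    of grouping several frames together).
    """
    m, f = plane.shape
    edges = np.linspace(0, f, n_subbands + 1).astype(int)
    return [plane[:, edges[k]:edges[k + 1]] for k in range(n_subbands)]

plane = np.random.randn(4, 32)          # M = 4 frames, 32 bins
subbands = divide_subbands(plane, 8)    # N = 8 uniform subbands
```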
As an example, the time-frequency plane may be organized and divided by the methods shown in Figs. 5, 6 and 7, where the horizontal axis is the time axis, corresponding to the time frames (1 to M), and the vertical axis is the frequency axis, corresponding to the transform data mapped to the frequency domain, arranged from low to high frequency from top to bottom. The subbands may be divided within one frame in frequency order, as shown in Fig. 5, i.e., the data in a subband are the data in a certain frequency range of that frame, arranged in frequency order; or several frames may be grouped together and divided, as shown in Fig. 6, i.e., the data in a subband are the data in a certain frequency range of several adjacent frames; or the above division methods may be combined, as shown in Fig. 7.
Assume the two channels' time-frequency planes are the first-channel data Sig1(t, f) and the second-channel data Sig2(t, f), where t is the frame number (1 <= t <= M) and f is the frequency or subband index. Dividing the time-frequency planes yields the first-channel subband data Sub1(k) and the second-channel subband data Sub2(k), where 1 <= k <= N and each subband contains multiple transform data. Because the operations below are performed subband by subband, the subband index is omitted for convenience.
(4) In each subband, compute the covariance matrix of the two channels' transform data, and further compute its eigenvalues eig1 and eig2 and the corresponding eigenvectors Vec1 and Vec2. Assume eig1 >= eig2, Vec1 = [v11, v21]' and Vec2 = [v12, v22]', where the symbol ' denotes the transpose operator and v11, v21 and v12, v22 are the two elements of the eigenvectors Vec1 and Vec2 respectively, namely:
(5) Construct the transformation matrix W from the eigenvectors Vec1 and Vec2;
(6) Transform the two channels' sound data into principal-component data and secondary-component data using the transformation matrix W. This comprises two sub-steps:
First, construct the vector Sub from the first-channel sound data and the second-channel sound data:
The vector Sub is two-dimensional: each row is the time-series data of the corresponding channel, and each column holds the first-channel and second-channel samples at a particular sampling instant.
Then, multiply the transpose of the transformation matrix W by the vector Sub to obtain the vector SubN:
where SubN1 is the principal-component data and SubN2 is the secondary-component data; SubN1 and SubN2 are row vectors.
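Steps (4) to (6) inside one subband can be sketched as follows (real-valued MDCT-style transform data are assumed; complex FFT data would require a conjugate in the covariance):

```python
import numpy as np

def principal_secondary(sub1, sub2):
    """Steps (4)-(6) inside one subband.

    sub1, sub2 : real-valued transform data of the two channels in the
    subband. Returns (SubN1, SubN2, W): the principal-component data,
    the secondary-component data, and the transformation matrix whose
    columns are the eigenvectors Vec1 and Vec2 (stacking assumed).
    """
    sub = np.vstack([sub1, sub2])         # 2 x L, rows = channels
    cov = np.cov(sub)                     # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]     # sort so eig1 >= eig2
    W = eigvecs[:, order]                 # columns: Vec1, Vec2
    sub_n = W.T @ sub                     # SubN = W' * Sub
    return sub_n[0], sub_n[1], W
```

When the two channels are fully correlated, the secondary component SubN2 vanishes and all energy concentrates in the principal component, which is exactly what the mapping in the following steps exploits.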
(7) Construct the mapping matrix V from the eigenvectors Vec1 and Vec2:
First, select wl = W(1,1) and wr = W(2,1), where W(1,1) is the element in row 1, column 1 of the transformation matrix W and W(2,1) is the element in row 2, column 1;
Then, compute the mapping parameters cl, cr, cc and the normalization coefficient g from wl and wr;
Finally, construct the mapping matrix:
Here the mapping parameters cl, cr, cc are computed from wl and wr as follows:
First, compute the sound-image angle: α = π/2 - arctan(wr/wl);
Then, compute the mapping parameters: clr = cos(2α), cc = sin(2α);
If clr < 0, take cl = -clr and cr = 0; otherwise take cl = 0 and cr = clr.
Alternatively, the mapping parameters can be computed directly, without trigonometric functions: clr = wr² - wl², cc = 2·wl·wr;
If clr < 0, take cl = -clr and cr = 0; otherwise take cl = 0 and cr = clr.
(8) Map the principal-component data and the secondary-component data into three-channel sound data using the mapping matrix V. The mapping process is: multiply the mapping matrix V by the vector SubN to obtain the three-channel vector SubObj:
where SubObj1 is the new first-channel sound data, SubObj2 is the new second-channel sound data, and SubObj3 is the new third-channel sound data. SubObj1, SubObj2 and SubObj3 are row vectors.
(9) Construct three time-frequency planes from the obtained three-channel subband data, and apply frequency-time mapping to each frame of transform data through a frequency-time mapping processing module, to obtain the three-channel time-domain sound signals.
The above method first maps the sound signals to frequency-domain subbands by time-frequency mapping, processes the subband data, and finally applies frequency-time mapping to the processed data to obtain the target sound signals. This is an analysis-synthesis signal-processing method. According to the uncertainty principle, in an analysis-synthesis signal-processing algorithm a higher time resolution implies a lower frequency resolution, and conversely a higher frequency resolution implies a lower time resolution.
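The trade-off can be made concrete with two illustrative block lengths (the 48 kHz sampling rate is an assumed example, not a value from the text): an analysis block of Nblk samples spans Nblk/fs seconds of signal, while its frequency bins are fs/Nblk Hz apart, so doubling one resolution halves the other.

```python
fs = 48000  # Hz, assumed sampling rate for the example

for n_blk in (256, 2048):
    time_span = n_blk / fs      # seconds covered by one block
    bin_spacing = fs / n_blk    # Hz between adjacent frequency bins
    print(f"block {n_blk}: {time_span * 1e3:.2f} ms per block, "
          f"{bin_spacing:.1f} Hz per bin")
```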
To balance time resolution against frequency resolution effectively, different block lengths are used for different input signals. To this end, the time-frequency mapping processing module and the frequency-time mapping processing module are improved as follows:
The time-frequency mapping processing module comprises a signal-type analysis module, a blocking module and a time-frequency mapping module; the signal-type analysis module classifies the input signal into different subclasses according to its statistical or psychoacoustic characteristics, each subclass corresponding to one block length, and outputs the result to the blocking module; the blocking module divides the input sound signal into blocks of different lengths according to the result of the signal-type analysis module and feeds them into the time-frequency mapping module, which finally performs the time-frequency mapping.
The signal-type analysis module may use known methods commonly employed in the audio-coding field, for example the perceptual-entropy (PE) detection method used in the Moving Picture Experts Group (MPEG) standards, the short-time FFT energy detection method used by the FAAC (Freeware Advanced Audio Coder) encoder, or the subframe energy-variation detection method used by the 3GPP standardization body.
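As one concrete sketch of such a signal-type analysis, the subframe energy-variation idea can be implemented as below. The subframe count, threshold ratio and block lengths are illustrative assumptions, not values taken from the MPEG, FAAC or 3GPP methods:

```python
import numpy as np

def choose_block_length(frame, n_sub=8, ratio=8.0,
                        long_blk=2048, short_blk=256):
    """Pick a block length by subframe energy variation.

    A frame is classed as transient (short block) when any subframe's
    energy exceeds `ratio` times the mean energy of the preceding
    subframes; otherwise it is classed as stationary (long block).
    """
    sub = np.array_split(np.asarray(frame, dtype=float), n_sub)
    energies = np.array([np.sum(s * s) for s in sub])
    for i in range(1, n_sub):
        prev_mean = np.mean(energies[:i])
        if prev_mean > 0 and energies[i] > ratio * prev_mean:
            return short_blk   # energy jump: transient detected
    return long_blk            # stationary signal
```

A frame containing a sudden attack thus receives a short block (better time resolution), while a steady tone receives a long block (better frequency resolution), matching the trade-off discussed above.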
In step (1): if the framed and processed signal is discontinuous at frame boundaries, obvious blocking artifacts are produced. Therefore, during framing, an inter-frame data-overlap method may be used, i.e., adjacent frames partially overlap, and the overlapping data are windowed before processing. In a concrete implementation, usable window functions include the raised-cosine window, the KBD window, and others. This frame-overlap method of removing blocking artifacts is a known method in the field of signal processing.
Correspondingly, in step (9): the frequency-time mapping processing module comprises a frequency-time mapping module and a block overlap-add module; the frequency-time mapping module maps the input frequency-domain data to the time domain and outputs them to the block overlap-add module, which overlap-adds the input time-domain signals to obtain the target sound signal. Specifically, after the frequency-time mapping module obtains the new channel data, the overlapping part of the new channel data is windowed and added to the corresponding part of the previous frame; if, at every sample of the overlap, the squares of the window functions of the two adjacent frames sum to 1 (i.e., the energy of the overlap-added windows is 1), a good deblocking effect is obtained. The window function used here should be identical to the one used in step (1).
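The sum-of-squares condition above is satisfied, for example, by the sine window at 50% overlap (the sine window is one common choice meeting the condition, not one mandated by the text); a quick numerical check:

```python
import numpy as np

N = 512                              # frame length, 50% overlap
n = np.arange(N)
w = np.sin(np.pi * (n + 0.5) / N)    # sine window

# In the overlap region, sample k receives the tail of the previous
# frame's window, w[k + N/2], and the head of the current frame's
# window, w[k]; their squares must sum to 1 at every sample.
overlap_energy = w[:N // 2] ** 2 + w[N // 2:] ** 2
assert np.allclose(overlap_energy, 1.0)
```

This holds exactly because sin²(θ) + sin²(θ + π/2) = sin²(θ) + cos²(θ) = 1.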
The above improvements effectively increase the efficiency and quality of the sound processing.
When the three-channel sound signal obtained by the above technical solution is played back, a third-channel loudspeaker must be added to the existing two channel loudspeakers. During playback, the third-channel loudspeaker lies on the same circle as the first-channel loudspeaker and the second-channel loudspeaker, at the same angle to each of them, as shown in Fig. 2. In an actual loudspeaker arrangement, the first-channel, second-channel and third-channel loudspeakers may also be arranged in a straight line: when the angle between the first-channel loudspeaker and the second-channel loudspeaker is small, the resulting lip-sync error cannot be perceived by the human ear, and the third-channel loudspeaker of Fig. 2 is then translated down to the straight line through the first-channel and second-channel loudspeakers. In more professional applications (such as recording studios, post-production and monitoring), the loudspeakers may be placed with their axes pointing at the center of the circle, in accordance with the ITU standard.
The first-channel loudspeaker and the second-channel loudspeaker may be the left and right loudspeakers of a stereo playback system, or any two adjacent loudspeakers of any other multichannel system.
As described above, although the present invention has been shown and described with reference to specific preferred embodiments, this must not be construed as limiting the invention itself. Various changes may be made in form and detail without departing from the spirit and scope of the invention as defined by the appended claims.