A method for converting two-channel sound signals into three-channel sound signals
Technical field
The present invention relates to a method for converting two-channel sound signals into three-channel sound signals, and belongs to the technical field of acoustic signal processing.
Background technique
In the real world, sound propagates freely in a three-dimensional sound field. To reproduce a given sound scene at another time or place, the original sound must be sampled and encoded. Owing to objective limitations, only a small number of channels can be used when sampling or encoding a sound signal, so playback can recover only an approximation of the original sound field. To obtain better sound-field recovery with a limited number of channels, recording and playback technologies with more channels have been developed, such as two-channel stereo recording and 5.1 and 7.1 surround sound.
However, the number of recording channels remains a critical constraint and cannot satisfy the demands of more exacting applications. A typical example: professional sound systems such as cinemas generally play back the sound signal through far more loudspeakers than there are recording channels. The usual approach is to group the loudspeakers (for example a left-surround group and a right-surround group), each group consisting of multiple loudspeakers fed with the same channel signal (for example the left-channel or right-channel signal); alternatively, the same channel signal is fed to different loudspeakers after simple delay and attenuation. Fig. 1 shows a typical cinema loudspeaker layout (an actual cinema also needs a low-frequency-effects loudspeaker, and the number of loudspeakers varies with the size of the cinema). When 5.1 surround sound is played, the left-surround and right-surround signals are each fed to 6 surround loudspeakers; when 7.1 surround sound is played, the left-surround and right-surround signals are each fed to 4 lateral surround loudspeakers, the left-rear surround signal feeds the 2 left-rear surround loudspeakers, and the right-rear surround signal feeds the 2 right-rear surround loudspeakers.
Existing audio systems have at least the following problems: (1) the optimal listening area is too small; (2) the sound image drifts when the listener moves. Both problems arise because existing audio systems approximate the sound field with fewer channels and therefore rely on the phantom-image principle. The first problem is the well-known "sweet spot" problem: the optimal listening area is small, and when the listener is outside it the original phantom-image positions are destroyed and the listening experience degrades. The second problem arises because the original recording assumes an ideal, static listening position; when the listener moves, the recovered sound image is further destroyed and the listening experience degrades noticeably.
Summary of the invention
Object of the invention: in view of the above deficiencies of the prior art, the object of the present invention is to provide a method for converting two-channel sound signals into three-channel sound signals, which obtains better sound-field recovery by adding an intermediate channel between the two channels. Specifically, the method significantly alleviates the problem that the optimal listening area (sweet spot) is too small when the original two-channel signal is played back, and effectively mitigates the drifting of sound-image positions when the listener is in motion.
Technical solution: the method of the present invention for converting two-channel sound signals into three-channel sound signals comprises the following steps:
(1) Divide the input time-domain sampled data of the two channels into frames at a given time resolution, with the samples of each frame arranged in sampling-time order, to obtain the framed sound data of the two channels;
(2) Compute the covariance matrix of the two channels' sound data, and further compute its eigenvalues eig1 and eig2 and the corresponding eigenvectors Vec1 and Vec2;
(3) Construct the transformation matrix W from the eigenvectors Vec1 and Vec2;
(4) Transform the two channels' sound data into principal-component data and secondary-component data using the transformation matrix W;
(5) Construct the mapping matrix V from the eigenvectors Vec1 and Vec2;
(6) Map the principal-component data and the secondary-component data into three-channel sound data using the mapping matrix V.
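For illustration only, steps (2) to (6) can be sketched per frame in NumPy. This is a sketch under stated assumptions, not the claimed implementation: it assumes W stacks the eigenvectors Vec1 and Vec2 as columns (the patent's formula for W is given as a figure not reproduced here), and it takes the 3x2 mapping matrix V as a parameter, since the construction of V is described separately below.

```python
import numpy as np

def two_to_three_frame(sub, V):
    """Steps (2)-(6) for one frame of two-channel data.

    sub : 2 x L array (rows = channels, columns = sample instants)
    V   : 3 x 2 mapping matrix (its construction is described in the
          text; passed in here as a parameter)
    """
    # Step (2): covariance matrix of the two channels and its
    # eigen-decomposition; sort so that eig1 >= eig2.
    cov = np.cov(sub)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order
    order = np.argsort(eigvals)[::-1]        # descending: eig1 >= eig2
    eigvecs = eigvecs[:, order]

    # Step (3): transformation matrix W built from the eigenvectors
    # (assumed here to stack Vec1 and Vec2 as columns).
    W = eigvecs

    # Step (4): principal-component data SubN1 and secondary-component
    # data SubN2, via SubN = W' * Sub.
    sub_n = W.T @ sub

    # Steps (5)-(6): map the two components to three channels.
    sub_obj = V @ sub_n                      # 3 x L
    return sub_obj
```

Note that np.cov treats each row of sub as one channel, matching the row-per-channel layout of the vector Sub described below.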
As a refinement of the above technical solution: in step (2) it is assumed that eig1 >= eig2, Vec1 = [v11, v21]' and Vec2 = [v12, v22]', where the symbol ' denotes the transpose operator and v11, v21 and v12, v22 are the two elements of the eigenvectors Vec1 and Vec2 respectively, namely:
The transformation matrix W described in step (3) is:
Step (4) comprises the following two sub-steps:
4.1 Construct the vector Sub from the first-channel sound data and the second-channel sound data:
The vector Sub is two-dimensional: each row is the time-series data of the corresponding channel, and each column holds the first-channel and second-channel samples at a particular sampling instant;
4.2 Multiply the transpose of the transformation matrix W by the vector Sub to obtain the vector SubN:
where SubN1 is the principal-component data and SubN2 is the secondary-component data; SubN1 and SubN2 are row vectors.
The construction process of step (5) is as follows:
First, select wl = W(1,1) and wr = W(2,1), where W(1,1) is the element in row 1, column 1 of the transformation matrix W and W(2,1) is the element in row 2, column 1;
Then, compute the mapping parameters cl, cr, cc and the normalization coefficient g from wl and wr;
Finally, construct the mapping matrix:
The mapping process of step (6) is as follows: multiply the mapping matrix V by the vector SubN to obtain the three-channel vector SubObj:
where SubObj1 is the new first-channel sound data, SubObj2 is the new second-channel sound data, and SubObj3 is the new third-channel sound data. SubObj1, SubObj2 and SubObj3 are row vectors.
Further, the mapping parameters cl, cr, cc are computed from wl and wr in step (5) as follows:
First, compute the sound-image angle: α = π/2 - arctan(wr/wl);
Then, compute the mapping parameters: clr = cos(2α), cc = sin(2α);
If clr < 0, take cl = -clr and cr = 0; otherwise take cl = 0 and cr = clr.
As an improvement, in step (5) the mapping parameters cl, cr, cc can also be computed directly from wl and wr, without trigonometric functions: clr = wr² - wl², cc = 2·wl·wr (equivalent to the trigonometric expressions above when [wl, wr]' has unit norm);
If clr < 0, take cl = -clr and cr = 0; otherwise take cl = 0 and cr = clr.
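A minimal sketch of the trigonometric recipe above (it assumes wl ≠ 0 and a unit-norm first column of W, which standard eigen-solvers return):

```python
import math

def mapping_params(wl, wr):
    """Compute cl, cr, cc from wl = W(1,1) and wr = W(2,1)
    via the sound-image angle alpha = pi/2 - arctan(wr/wl)."""
    alpha = math.pi / 2 - math.atan(wr / wl)   # assumes wl != 0
    clr = math.cos(2 * alpha)
    cc = math.sin(2 * alpha)
    if clr < 0:
        cl, cr = -clr, 0.0
    else:
        cl, cr = 0.0, clr
    return cl, cr, cc
```

For wl = wr = 1/sqrt(2), i.e. the two channels fully correlated at equal level, this yields cl = cr = 0 and cc = 1, steering the principal component entirely to the new center channel; for wl = 1, wr = 0 it yields cl = 1, placing the sound image hard left.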
Further, the following refinement may also be applied to the two channels' framed sound data in step (1):
First, map the framed sound data of the two channels from the time domain to the frequency domain through a time-frequency mapping processing module, to obtain the transform data of the two channels;
Second, group the transform data of the two channels in sets of M consecutive frames, organize the transform data into a two-dimensional time-frequency plane in time and frequency order, and divide the time-frequency plane into N subbands;
The subsequent steps are then performed within each subband;
Step (6) then further includes the following processing: construct three time-frequency planes from the obtained three-channel subband data, and apply frequency-to-time mapping to each frame of transform data through a frequency-time mapping processing module, to obtain the three-channel time-domain sound signals.
As a further improvement, when the input time-domain sampled signals of the two channels are divided into frames at a given time resolution in step (1), an inter-frame data-overlap method is used, to prevent the processed signal from being discontinuous at frame boundaries and producing obvious blocking artifacts.
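The overlapped framing described above can be sketched as follows; the 50% overlap and the frame length are illustrative choices, not values fixed by the text:

```python
import numpy as np

def frame_with_overlap(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames.

    Adjacent frames share frame_len - hop samples, so that after
    windowing and overlap-add the output has no discontinuities
    at frame boundaries.
    """
    n_frames = 1 + max(0, len(x) - frame_len) // hop
    frames = np.zeros((n_frames, frame_len))
    for i in range(n_frames):
        chunk = x[i * hop : i * hop + frame_len]
        frames[i, :len(chunk)] = chunk   # zero-pad a short final frame
    return frames

# 50% overlap: the hop is half the frame length.
x = np.arange(16, dtype=float)
frames = frame_with_overlap(x, frame_len=8, hop=4)
```

With 16 samples, a frame length of 8 and a hop of 4, this produces 3 frames, each sharing its second half with the start of the next frame.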
Correspondingly, the frequency-time mapping processing module comprises a frequency-time mapping module and a block overlap-add module; the frequency-time mapping module maps the input frequency-domain data to the time domain and outputs them to the block overlap-add module, and the block overlap-add module overlap-adds the input time-domain signals to obtain the target sound signal.
The time-frequency mapping processing module comprises a signal-type analysis module, a blocking module and a time-frequency mapping module; the signal-type analysis module classifies the input signal into different subclasses according to its statistical or psychoacoustic characteristics, each subclass corresponding to one block length, and outputs the result to the blocking module; the blocking module divides the input sound signal into blocks of different lengths according to the result of the signal-type analysis module and feeds them into the time-frequency mapping module, which finally performs the time-frequency mapping.
In the above method, the two-channel sound signals are first mapped to the target frequency-domain subbands by time-frequency mapping, yielding two channels of subband data; the subband data are processed to obtain three channels of subband data; finally, frequency-time mapping is applied to the three channels' subband data to obtain the target sound signals of the three channels.
Compared with the prior art, the beneficial effects of the present invention are as follows:
By adding an intermediate channel between the two channels, the present invention obtains better sound-field recovery. Specifically, the method significantly alleviates the problem that the optimal listening area (sweet spot) is too small when the original two-channel signal is played back, and effectively mitigates the drifting of sound-image positions when the listener is in motion.
Description of the drawings
Fig. 1 is a typical cinema loudspeaker layout.
Fig. 2 is the three-channel loudspeaker layout of the present invention.
Fig. 3 is a flow diagram of the present invention.
Fig. 4 is a flow diagram of embodiment 1.
Fig. 5 is a schematic diagram of a time-frequency plane whose subbands are organized in frequency order.
Fig. 6 is a schematic diagram of a time-frequency plane in which several frames are grouped together to form subbands.
Fig. 7 is a schematic diagram of a time-frequency plane organized in frequency order with several frames grouped together to form subbands.
In the figures: 1, first-channel loudspeaker; 2, second-channel loudspeaker; 3, third-channel loudspeaker.
Specific embodiment
The technical solution of the present invention is described in detail below with reference to the accompanying drawings, but the protection scope of the present invention is not limited to the embodiments.
Embodiment 1: a method for converting two-channel sound signals into three-channel sound signals, comprising the following steps:
(1) Divide the input time-domain sampled data of the two channels into frames at a given time resolution, with the samples of each frame arranged in sampling-time order, to obtain the framed sound data of the two channels.
(2) Map the framed sound data of the two channels from the time domain to the frequency domain through a time-frequency mapping processing module, to obtain the transform data of the two channels. The time-frequency mapping may be a time-frequency transform (common transform techniques such as the fast Fourier transform FFT, the discrete cosine transform DCT, the discrete sine transform DST, the modified discrete cosine transform MDCT and the modified discrete sine transform MDST), a subband filter bank (common subband filtering techniques such as the quadrature mirror filter bank QMF, the complex quadrature mirror filter bank CQMF and the pseudo quadrature mirror filter bank PQMF), or a multiresolution analysis (common multiresolution analysis techniques such as wavelets and wavelet packets).
(3) Group the transform data of the two channels in sets of M consecutive frames, organize the transform data into a two-dimensional time-frequency plane in time and frequency order, and divide the time-frequency plane into N subbands. The X axis of the time-frequency plane is ordered by time (or frame) and the Y axis by frequency (or subband); each channel's transform data corresponds to one time-frequency plane. The N subbands may be divided according to a psychoacoustic scale (such as the Bark scale or the ERB scale), according to signal statistics, or even uniformly. M and N are each greater than or equal to 1. The subsequent steps are performed within each subband.
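As a small illustration of the subband division just described (uniform division and the array shapes are assumptions chosen for the example; Bark/ERB or statistics-driven divisions would only change the band edges):

```python
import numpy as np

def divide_subbands(plane, n_subbands):
    """plane: M x F array (M frames of F frequency bins) for one channel.

    Returns a list of N subbands; each subband holds the data of all
    M frames over one contiguous frequency range (the Fig. 6/7 style
    of grouping several frames together).
    """
    m, f = plane.shape
    edges = np.linspace(0, f, n_subbands + 1).astype(int)
    return [plane[:, edges[k]:edges[k + 1]] for k in range(n_subbands)]

plane = np.random.randn(4, 32)          # M = 4 frames, 32 bins
subbands = divide_subbands(plane, 8)    # N = 8 uniform subbands
```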
As an example, the time-frequency plane may be organized and divided by the methods shown in Figs. 5, 6 and 7, where the horizontal axis is the time axis, corresponding to the time frames (1 to M), and the vertical axis is the frequency axis, corresponding to the transform data mapped to the frequency domain, arranged from low to high frequency from top to bottom. The subbands may be divided within one frame in frequency order, as shown in Fig. 5, i.e., the data in a subband are the data in a certain frequency range of that frame, arranged in frequency order; or several frames may be grouped together and divided, as shown in Fig. 6, i.e., the data in a subband are the data in a certain frequency range of several adjacent frames; or the above division methods may be combined, as shown in Fig. 7.
Assume the two channels' time-frequency planes are the first-channel data Sig1(t, f) and the second-channel data Sig2(t, f), where t is the frame number (1 <= t <= M) and f is the frequency or subband index. Dividing the time-frequency planes yields the first-channel subband data Sub1(k) and the second-channel subband data Sub2(k), where 1 <= k <= N and each subband contains multiple transform data. Because the operations below are performed subband by subband, the subband index is omitted for convenience.
(4) In each subband, compute the covariance matrix of the two channels' transform data, and further compute its eigenvalues eig1 and eig2 and the corresponding eigenvectors Vec1 and Vec2. Assume eig1 >= eig2, Vec1 = [v11, v21]' and Vec2 = [v12, v22]', where the symbol ' denotes the transpose operator and v11, v21 and v12, v22 are the two elements of the eigenvectors Vec1 and Vec2 respectively, namely:
(5) Construct the transformation matrix W from the eigenvectors Vec1 and Vec2;
(6) Transform the two channels' sound data into principal-component data and secondary-component data using the transformation matrix W. This comprises two sub-steps:
First, construct the vector Sub from the first-channel sound data and the second-channel sound data:
The vector Sub is two-dimensional: each row is the time-series data of the corresponding channel, and each column holds the first-channel and second-channel samples at a particular sampling instant.
Then, multiply the transpose of the transformation matrix W by the vector Sub to obtain the vector SubN:
where SubN1 is the principal-component data and SubN2 is the secondary-component data; SubN1 and SubN2 are row vectors.
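Steps (4) to (6) inside one subband can be sketched as follows (real-valued MDCT-style transform data are assumed; complex FFT data would require a conjugate in the covariance):

```python
import numpy as np

def principal_secondary(sub1, sub2):
    """Steps (4)-(6) inside one subband.

    sub1, sub2 : real-valued transform data of the two channels in the
    subband. Returns (SubN1, SubN2, W): the principal-component data,
    the secondary-component data, and the transformation matrix whose
    columns are the eigenvectors Vec1 and Vec2 (stacking assumed).
    """
    sub = np.vstack([sub1, sub2])         # 2 x L, rows = channels
    cov = np.cov(sub)                     # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]     # sort so eig1 >= eig2
    W = eigvecs[:, order]                 # columns: Vec1, Vec2
    sub_n = W.T @ sub                     # SubN = W' * Sub
    return sub_n[0], sub_n[1], W
```

When the two channels are fully correlated, the secondary component SubN2 vanishes and all energy concentrates in the principal component, which is exactly what the mapping in the following steps exploits.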
(7) Construct the mapping matrix V from the eigenvectors Vec1 and Vec2:
First, select wl = W(1,1) and wr = W(2,1), where W(1,1) is the element in row 1, column 1 of the transformation matrix W and W(2,1) is the element in row 2, column 1;
Then, compute the mapping parameters cl, cr, cc and the normalization coefficient g from wl and wr;
Finally, construct the mapping matrix:
Here the mapping parameters cl, cr, cc are computed from wl and wr as follows:
First, compute the sound-image angle: α = π/2 - arctan(wr/wl);
Then, compute the mapping parameters: clr = cos(2α), cc = sin(2α);
If clr < 0, take cl = -clr and cr = 0; otherwise take cl = 0 and cr = clr.
Alternatively, the mapping parameters can be computed directly, without trigonometric functions: clr = wr² - wl², cc = 2·wl·wr;
If clr < 0, take cl = -clr and cr = 0; otherwise take cl = 0 and cr = clr.
(8) Map the principal-component data and the secondary-component data into three-channel sound data using the mapping matrix V. The mapping process is: multiply the mapping matrix V by the vector SubN to obtain the three-channel vector SubObj:
where SubObj1 is the new first-channel sound data, SubObj2 is the new second-channel sound data, and SubObj3 is the new third-channel sound data. SubObj1, SubObj2 and SubObj3 are row vectors.
(9) Construct three time-frequency planes from the obtained three-channel subband data, and apply frequency-time mapping to each frame of transform data through a frequency-time mapping processing module, to obtain the three-channel time-domain sound signals.
The above method first maps the sound signals to frequency-domain subbands by time-frequency mapping, processes the subband data, and finally applies frequency-time mapping to the processed data to obtain the target sound signals. This is an analysis-synthesis signal-processing method. According to the uncertainty principle, in an analysis-synthesis signal-processing algorithm a higher time resolution implies a lower frequency resolution, and conversely a higher frequency resolution implies a lower time resolution.
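The trade-off can be made concrete with two illustrative block lengths (the 48 kHz sampling rate is an assumed example, not a value from the text): an analysis block of Nblk samples spans Nblk/fs seconds of signal, while its frequency bins are fs/Nblk Hz apart, so doubling one resolution halves the other.

```python
fs = 48000  # Hz, assumed sampling rate for the example

for n_blk in (256, 2048):
    time_span = n_blk / fs      # seconds covered by one block
    bin_spacing = fs / n_blk    # Hz between adjacent frequency bins
    print(f"block {n_blk}: {time_span * 1e3:.2f} ms per block, "
          f"{bin_spacing:.1f} Hz per bin")
```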
To balance time resolution against frequency resolution effectively, different block lengths are used for different input signals. To this end, the time-frequency mapping processing module and the frequency-time mapping processing module are improved as follows:
The time-frequency mapping processing module comprises a signal-type analysis module, a blocking module and a time-frequency mapping module; the signal-type analysis module classifies the input signal into different subclasses according to its statistical or psychoacoustic characteristics, each subclass corresponding to one block length, and outputs the result to the blocking module; the blocking module divides the input sound signal into blocks of different lengths according to the result of the signal-type analysis module and feeds them into the time-frequency mapping module, which finally performs the time-frequency mapping.
The signal-type analysis module may use known methods commonly employed in the audio-coding field, for example the perceptual-entropy (PE) detection method used in the Moving Picture Experts Group (MPEG) standards, the short-time FFT energy detection method used by the FAAC (Freeware Advanced Audio Coder) encoder, or the subframe energy-variation detection method used by the 3GPP standardization body.
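As one concrete sketch of such a signal-type analysis, the subframe energy-variation idea can be implemented as below. The subframe count, threshold ratio and block lengths are illustrative assumptions, not values taken from the MPEG, FAAC or 3GPP methods:

```python
import numpy as np

def choose_block_length(frame, n_sub=8, ratio=8.0,
                        long_blk=2048, short_blk=256):
    """Pick a block length by subframe energy variation.

    A frame is classed as transient (short block) when any subframe's
    energy exceeds `ratio` times the mean energy of the preceding
    subframes; otherwise it is classed as stationary (long block).
    """
    sub = np.array_split(np.asarray(frame, dtype=float), n_sub)
    energies = np.array([np.sum(s * s) for s in sub])
    for i in range(1, n_sub):
        prev_mean = np.mean(energies[:i])
        if prev_mean > 0 and energies[i] > ratio * prev_mean:
            return short_blk   # energy jump: transient detected
    return long_blk            # stationary signal
```

A frame containing a sudden attack thus receives a short block (better time resolution), while a steady tone receives a long block (better frequency resolution), matching the trade-off discussed above.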
In step (1): if the framed and processed signal is discontinuous at frame boundaries, obvious blocking artifacts are produced. Therefore, during framing, an inter-frame data-overlap method may be used, i.e., adjacent frames partially overlap, and the overlapping data are windowed before processing. In a concrete implementation, usable window functions include the raised-cosine window, the KBD window, and others. This frame-overlap method of removing blocking artifacts is a known method in the field of signal processing.
Correspondingly, in step (9): the frequency-time mapping processing module comprises a frequency-time mapping module and a block overlap-add module; the frequency-time mapping module maps the input frequency-domain data to the time domain and outputs them to the block overlap-add module, which overlap-adds the input time-domain signals to obtain the target sound signal. Specifically, after the frequency-time mapping module obtains the new channel data, the overlapping part of the new channel data is windowed and added to the corresponding part of the previous frame; if, at every sample of the overlap, the squares of the window functions of the two adjacent frames sum to 1 (i.e., the energy of the overlap-added windows is 1), a good deblocking effect is obtained. The window function used here should be identical to the one used in step (1).
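The sum-of-squares condition above is satisfied, for example, by the sine window at 50% overlap (the sine window is one common choice meeting the condition, not one mandated by the text); a quick numerical check:

```python
import numpy as np

N = 512                              # frame length, 50% overlap
n = np.arange(N)
w = np.sin(np.pi * (n + 0.5) / N)    # sine window

# In the overlap region, sample k receives the tail of the previous
# frame's window, w[k + N/2], and the head of the current frame's
# window, w[k]; their squares must sum to 1 at every sample.
overlap_energy = w[:N // 2] ** 2 + w[N // 2:] ** 2
assert np.allclose(overlap_energy, 1.0)
```

This holds exactly because sin²(θ) + sin²(θ + π/2) = sin²(θ) + cos²(θ) = 1.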
The above improvements effectively increase the efficiency and quality of the sound processing.
When the three-channel sound signal obtained by the above technical solution is played back, a third-channel loudspeaker must be added to the existing two channel loudspeakers. During playback, the third-channel loudspeaker lies on the same circle as the first-channel loudspeaker and the second-channel loudspeaker, at the same angle to each of them, as shown in Fig. 2. In an actual loudspeaker arrangement, the first-channel, second-channel and third-channel loudspeakers may also be arranged in a straight line: when the angle between the first-channel loudspeaker and the second-channel loudspeaker is small, the resulting lip-sync error cannot be perceived by the human ear, and the third-channel loudspeaker of Fig. 2 is then translated down to the straight line through the first-channel and second-channel loudspeakers. In more professional applications (such as recording studios, post-production and monitoring), the loudspeakers may be placed with their axes pointing at the center of the circle, in accordance with the ITU standard.
The first-channel loudspeaker and the second-channel loudspeaker may be the left and right loudspeakers of a stereo playback system, or any two adjacent loudspeakers of any other multichannel system.
As described above, although the present invention has been shown and described with reference to specific preferred embodiments, this must not be construed as limiting the invention itself. Various changes may be made in form and detail without departing from the spirit and scope of the invention as defined by the appended claims.