CN103460283B - Method for determining encoding parameter for multi-channel audio signal and multi-channel audio encoder - Google Patents

Method for determining encoding parameter for multi-channel audio signal and multi-channel audio encoder

Info

Publication number
CN103460283B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201280003252.9A
Other languages
Chinese (zh)
Other versions
CN103460283A (en)
Inventor
大卫·维雷特
郎玥
许剑峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/06: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients

Abstract

The invention relates to a method (100) for determining an encoding parameter (ITD) for an audio channel signal (x1) of a plurality of audio channel signals (x1, x2) of a multi-channel audio signal. Each audio channel signal (x1, x2) has audio channel signal values (x1[n], x2[n]). The method comprises: determining (101) for the audio channel signal (x1) a set of functions (c[b]) from the audio channel signal values (x1[n]) of the audio channel signal (x1) and reference audio signal values (x2[n]) of a reference audio signal (x2), wherein the reference audio signal is another audio channel signal (x2) of the plurality of audio channel signals or a down-mix audio signal derived from at least two audio channel signals (x1, x2) of the plurality of audio channel signals; determining (103) a first set of encoding parameters (ITD[b]) based on a smoothing of the set of functions (c[b]) with respect to a frame sequence (i) of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient (SMW1); determining (105) a second set of encoding parameters (ITD_inst[b]) based on a smoothing of the set of functions (c[b]) with respect to the frame sequence (i) of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient (SMW2); and determining (107) the encoding parameter (ITD, CLD) based on a quality criterion with respect to the first set of encoding parameters (ITD[b]) and/or the second set of encoding parameters (ITD_inst[b]).

Description

Method for determining an encoding parameter for a multi-channel audio signal and multi-channel audio encoder
Technical field
The present invention relates to audio coding and, in particular, to parametric multi-channel or stereo audio coding, also known as parametric spatial audio coding.
Background
Parametric stereo or multi-channel audio coding, as described for example in C. Faller and F. Baumgarte, "Efficient representation of spatial audio using perceptual parametrization," Proc. IEEE Workshop on Appl. of Sig. Proc. to Audio and Acoust., Oct. 2001, pp. 199-202, uses spatial cues to synthesize a multi-channel audio signal from a down-mix audio signal, usually a mono or stereo audio signal, the multi-channel audio signal having more channels than the down-mix audio signal. Usually, the down-mix audio signal results from a superposition of the audio channel signals of the multi-channel audio signal, for example a stereo audio signal. These fewer channels are waveform coded, and side information describing the relations between the original signal channels, i.e. the spatial cues, is added to the coded audio channels as encoding parameters. The decoder uses this side information to regenerate the original number of audio channels from the decoded waveform-coded audio channels.
A basic parametric stereo coder may use the inter-channel level difference (ILD or CLD) as the cue needed to generate a stereo signal from a mono down-mix audio signal. More sophisticated coders may also use the inter-channel coherence (ICC), which represents the degree of similarity between the audio channel signals, i.e. the audio channels. Furthermore, when coding binaural stereo signals, e.g. for 3D audio or headphone-based surround rendering obtained by filtering with head-related transfer functions (HRTFs), the inter-aural time difference (ITD) may play a role in reproducing the delay differences between the channels.
As shown in Fig. 8, the inter-aural time difference (ITD) is the difference in arrival time of a sound 801 between the two ears 803, 805. It is important for sound localization, because it provides a cue for identifying the direction of incidence 807, or angle θ (relative to the head 809), of the sound source 801. If a signal arrives at the ears 803, 805 from one side, the path 811 to the far ear 803 (contralateral) is longer and the path 813 to the near ear 805 (ipsilateral) is shorter. This path length difference produces a time difference 815 between the arrivals of the sound at the ears 803, 805, which is detected and aids in identifying the direction 807 of the sound source 801.
Fig. 8 gives an example of an ITD (denoted Δt, or time difference 815). The difference in arrival time at the two ears 803, 805 is indicated by a delay of the sound waveform. If the waveform arrives at the left ear 803 first, the ITD 815 is positive; otherwise it is negative. If the sound source 801 is directly in front of the listener, the waveform arrives at both ears 803, 805 at the same time and the ITD 815 is therefore zero.
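The sign convention above can be illustrated with a small time-domain sketch. It is not the estimator developed later in this document: the brute-force lag search, the function name `estimate_itd_samples`, and the toy click signals are assumptions made purely for illustration.

```python
def estimate_itd_samples(left, right, max_lag):
    """Return the lag d (in samples) that maximises sum_n left[n] * right[n + d].

    A positive result means the waveform reaches the left channel first,
    which matches the sign convention described in the text.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, l_val in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                score += l_val * right[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy example: the same click reaches the left ear 3 samples before the right ear.
left = [0.0] * 32
right = [0.0] * 32
left[10] = 1.0
right[13] = 1.0

itd = estimate_itd_samples(left, right, max_lag=8)
```

Swapping the two arguments flips the sign of the result, and passing the same signal twice gives zero, corresponding to a source directly in front of the listener.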
ITD cues are important for most stereo recordings. For instance, binaural audio signals, which can be obtained from real recordings using, e.g., a dummy head, or from binaural synthesis based on head-related transfer function (HRTF) processing, are used for music recording or audio conferencing. The ITD is therefore a very important parameter for low-bitrate parametric stereo codecs, especially for codecs targeting conversational applications. A low-complexity and stable ITD estimation algorithm is needed for low-bitrate parametric stereo codecs. Furthermore, using ITD parameters in addition to other parameters such as the inter-channel level difference (CLD or ILD) and the inter-channel coherence (ICC) may increase the bitrate overhead. In this specific very-low-bitrate scenario, only one full-band ITD parameter is transmitted. When only one full-band ITD is estimated, the stability constraint becomes even harder to meet.
When parameters are estimated using a cross-correlation, a cross-spectrum, or energies, rapid changes of the estimation function can make the estimated parameters unstable. The estimated parameter may change too quickly and too often from frame to frame, which is usually undesirable. This happens in particular with small frame sizes, which make the cross-correlation estimate unreliable. The instability may be perceived as a sound source that appears to jump from the left side to the right side and/or vice versa while the actual source does not change position. It may also be noticed by the listener even when the source position does not jump from left to right. Because even small changes of the source position over time are easily perceived by the listener, changes of the source position are to be avoided when the actual source is fixed.
For example, the inter-aural time difference (ITD) is an important parameter for parametric stereo codecs. If the ITD is estimated in the frequency domain from a cross-correlation function, the ITD estimated over consecutive frames is usually not stable, even when the source position is fixed and the actual ITD is stable. The stability problem can be addressed by applying a smoothing function to the cross-correlation before it is used for ITD estimation. However, when the cross-correlation is smoothed, rapid changes of the actual ITD cannot be followed. In other words, strong smoothing degrades the ability to track fast ITD changes when the sound source or the listener position moves relative to the other.
Another example is the estimation of the channel level difference (CLD). The CLD is an important parameter for parametric stereo codecs. If the CLD is estimated in the frequency domain from the energy of each frequency bin or sub-band, the CLD estimated over consecutive frames is usually not stable, even when the source position is fixed and the actual level difference is stable. The stability problem can be addressed by applying a smoothing function to the energies before they are used for CLD estimation. However, when the energies are smoothed, rapid changes of the actual CLD cannot be followed, which degrades the ability to track fast CLD changes when the sound source or the listener position moves relative to the other.
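As a rough illustration of a CLD estimate from sub-band energies, the sketch below uses the common decibel energy ratio 10·log10(E1/E2). That definition, the helper names `band_energy` and `cld_db`, the `eps` guard, and the toy spectra are assumptions; this section does not state the exact formula.

```python
import math


def band_energy(spectrum, start, stop):
    """Sum of squared magnitudes over frequency bins start..stop-1."""
    return sum(abs(x) ** 2 for x in spectrum[start:stop])


def cld_db(spec1, spec2, start, stop, eps=1e-12):
    """Channel level difference of one sub-band in dB (assumed definition)."""
    e1 = band_energy(spec1, start, stop)
    e2 = band_energy(spec2, start, stop)
    # eps avoids division by zero and log of zero for silent bands.
    return 10.0 * math.log10((e1 + eps) / (e2 + eps))


# Toy spectra: channel 1 has twice the amplitude of channel 2 in every bin.
spec2 = [complex(1.0, 0.5), complex(-0.3, 0.2), complex(0.7, -0.1), complex(0.2, 0.9)]
spec1 = [2.0 * x for x in spec2]

cld = cld_db(spec1, spec2, 0, 4)  # about 6.02 dB, i.e. 10 * log10(4)
```

Because the estimate is a ratio of energies, frame-to-frame fluctuations in either energy translate directly into CLD jitter, which is why the text considers smoothing the energies first.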
Finding a single suitable smoothing coefficient that both follows fast ITD or CLD changes and keeps the ITD or CLD stable turns out to be impossible, especially when the correlation function has a low resolution, e.g. the frequency resolution of the FFT.
Summary of the invention
It is an object of the invention to provide a concept for a multi-channel audio encoder that provides parameter estimation which is both stable and fast.
This object is achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description, and the figures.
The invention is based on the finding that applying both a strong smoothing and a weak smoothing, also called low smoothing, to the cross-correlation in the ITD case or to the energies in the CLD case yields two different encoding parameters: one follows ITD or CLD changes quickly, while the other provides stable parameter values over consecutive frames. With an intelligent detection procedure based on a quality criterion such as a stability criterion, the resulting encoding parameter is both stable and able to follow fast ITD or CLD changes.
A single estimation of the correlation is not sufficient to obtain both stability, i.e. an estimate of the ITD parameter that stays consistent while the actual source does not move over a period of time, and reactivity, i.e. an estimation function that changes very quickly when the actual source moves or when a new source with a different position appears in the audio scene. Having two different estimation functions of the same parameter, with different memory effects based on different smoothing factors, allows one estimation to focus on stability and the other on reactivity. A selection algorithm chooses the best, i.e. the most reliable, estimation. Aspects of the invention are thus based on two versions of the same estimation function with different smoothing factors. A quality or reliability criterion is introduced to decide when to switch from the long-term estimation to the short-term estimation. To benefit from both the short-term and the long-term estimation, the long-term state is updated with the short-term state, which eliminates the memory effect.
In order to describe the invention in detail, the following terms, abbreviations, and notations will be used:
BCC: Binaural cue coding; coding of stereo or multi-channel signals using a down-mix and binaural cues (or spatial parameters) to describe the inter-channel relationships.
Binaural cues: Inter-channel cues between the left and right ear entrance signals (see also ITD, ILD, and IC).
CLD: Channel level difference; same as ILD.
FFT: Fast Fourier Transform, a fast implementation of the DFT.
HRTF: Head-related transfer function; models the transduction of sound from a source to the left and right ear entrances in free field.
IC: Inter-aural coherence, i.e. the degree of similarity between the left and right ear entrance signals. Sometimes also referred to as IAC or inter-aural cross-correlation (IACC).
ICC: Inter-channel coherence, inter-channel correlation. Same as IC, but defined more generally between any signal pair (e.g. a loudspeaker signal pair, an ear entrance signal pair, etc.).
ICPD: Inter-channel phase difference. Average phase difference between a signal pair.
ICLD: Inter-channel level difference. Same as ILD, but defined more generally between any signal pair (e.g. a loudspeaker signal pair, an ear entrance signal pair, etc.).
ICTD: Inter-channel time difference. Same as ITD, but defined more generally between any signal pair (e.g. a loudspeaker signal pair, an ear entrance signal pair, etc.).
ILD: Inter-aural level difference, i.e. the level difference between the left and right ear entrance signals. Sometimes also referred to as inter-aural intensity difference (IID).
IPD: Inter-aural phase difference, i.e. the phase difference between the left and right ear entrance signals.
ITD: Inter-aural time difference, i.e. the time difference between the left and right ear entrance signals. Sometimes also referred to as inter-aural time delay.
ICD: Inter-channel difference. General term for a difference between two channels, e.g. a time difference, phase difference, level difference, or coherence between the two channels.
Mixing: Given a number of source signals (e.g. separately recorded instruments, a multitrack recording), the process of generating a stereo or multi-channel audio signal intended for spatial audio playback is denoted mixing.
OCPD: Overall channel phase difference. A common phase modification of two or more audio channels.
Spatial audio: Audio signals which, when played back through an appropriate playback system, evoke an auditory spatial image.
Spatial cues: Cues relevant for spatial perception. The term is used for cues between pairs of channels of a stereo or multi-channel audio signal (see also ICTD, ICLD, and ICC). Also denoted spatial parameters or binaural cues.
According to a first aspect, the invention relates to a method for determining an encoding parameter for an audio channel signal of a plurality of audio channel signals of a multi-channel audio signal, each audio channel signal having audio channel signal values, the method comprising: determining for the audio channel signal a set of functions from the audio channel signal values of the audio channel signal and reference audio signal values of a reference audio signal, wherein the reference audio signal is another audio channel signal of the plurality of audio channel signals; determining a first set of encoding parameters based on a smoothing of the set of functions with respect to a frame sequence of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient; determining a second set of encoding parameters based on a smoothing of the set of functions with respect to the frame sequence of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient; and determining the encoding parameter based on a quality criterion with respect to the first set of encoding parameters and/or the second set of encoding parameters.
According to a second aspect, the invention relates to a method for determining an encoding parameter for an audio channel signal of a plurality of audio channel signals of a multi-channel audio signal, each audio channel signal having audio channel signal values, the method comprising: determining for the audio channel signal a set of functions from the audio channel signal values of the audio channel signal and reference audio signal values of a reference audio signal, wherein the reference audio signal is a down-mix audio signal derived from at least two audio channel signals of the plurality of audio channel signals; determining a first set of encoding parameters based on a smoothing of the set of functions with respect to a frame sequence of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient; determining a second set of encoding parameters based on a smoothing of the set of functions with respect to the frame sequence of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient; and determining the encoding parameter based on a quality criterion with respect to the first set of encoding parameters and/or the second set of encoding parameters.
The strongly smoothed version of the set of functions, e.g. the smoothing based on the first smoothing coefficient, stabilizes the estimation. The weakly smoothed version of the set of functions, e.g. the smoothing based on the second smoothing coefficient, is determined at the same time and lets the estimation follow fast changes of the actual parameter being estimated, i.e. the ITD or CLD. The memory of the strongly smoothed version of the set of functions is updated with the weakly smoothed version, which gives the best result in terms of both tracking speed and stability. The decision on which smoothed version to use is based on a quality metric of the first and/or second set of encoding parameters. A stable and fast parameter estimation can thus be provided.
In a first possible implementation form of the method according to the first aspect or according to the second aspect, determining the set of functions comprises: determining a frequency transform of the audio channel signal values of the audio channel signal; determining a frequency transform of the reference audio signal values of the reference audio signal; and determining the set of functions as a cross-spectrum or a cross-correlation for at least each sub-band of a subset of sub-bands, each function of the set of functions being computed between a band-limited signal portion of the audio channel signal and a band-limited signal portion of the reference audio signal, both portions lying in the sub-band associated with that function.
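The per-sub-band computation can be sketched as follows. This is only an illustration under stated assumptions: the cross-spectrum is taken as the sum of X1[k]·conj(X2[k]) over the bins of each sub-band, a naive DFT stands in for the FFT, and the names `dft`, `cross_spectrum_per_band`, and `band_edges` as well as the band layout are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; an FFT would be used in practice."""
    n_fft = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(n_fft)
    ]


def cross_spectrum_per_band(x1, x2, band_edges):
    """Return c[b] = sum over bins k of band b of X1[k] * conj(X2[k])."""
    X1, X2 = dft(x1), dft(x2)
    return [
        sum(X1[k] * X2[k].conjugate() for k in range(lo, hi))
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]


# Toy frame: x2 is x1 circularly delayed by 2 samples.
N = 16
x1 = [math.sin(2 * math.pi * 3 * n / N) + 0.5 * math.cos(2 * math.pi * 5 * n / N)
      for n in range(N)]
x2 = [x1[(n - 2) % N] for n in range(N)]

band_edges = [1, 4, 8]  # two sub-bands: bins 1..3 and bins 4..7 (assumed layout)
c = cross_spectrum_per_band(x1, x2, band_edges)

# The phase of c[b] carries the delay. Band 0 is dominated by bin 3:
delay_band0 = cmath.phase(c[0]) * N / (2 * math.pi * 3)
```

In this toy frame the recovered delay for the first sub-band is 2 samples, positive because `x1` leads `x2`. Phase wrapping limits the delays that a single bin can represent unambiguously, which is one reason estimates from several sub-bands are usually combined.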
Estimating the encoding parameter in the frequency domain based on a cross-correlation improves the stability of the encoding parameter estimation. The set of functions can be processed per sub-band, which increases the flexibility in selecting the encoding parameter and improves robustness against noise, because sub-bands are less sensitive to noise than the full band.
In a second possible implementation form of the method according to the first implementation form of the first aspect or according to the first implementation form of the second aspect, a sub-band comprises one or more frequency bins.
The size of the sub-bands can be adjusted flexibly, so that a different encoding parameter can be used in each sub-band.
In a third possible implementation form of the method according to the first aspect as such, according to the second aspect as such, or according to any of the preceding implementation forms of the first or the second aspect, the first and second sets of encoding parameters comprise inter-channel differences, wherein the inter-channel differences comprise inter-channel time differences and/or inter-channel level differences.
Inter-channel differences can be used as spatial parameters to detect differences between a first and a second audio channel of the multi-channel audio signal. The difference may be, for example, a difference in arrival time, such as an inter-aural time difference or an inter-channel time difference, or a difference in level between the two audio channels. Both kinds of difference are suitable as encoding parameters.
In a fourth possible implementation form of the method according to the first aspect as such, according to the second aspect as such, or according to any of the preceding implementation forms of the first or the second aspect, determining the encoding parameter based on the quality criterion comprises determining a stability parameter, the stability parameter being used for the quality criterion.
The quality criterion may, for example, be based on a stability parameter, which improves the stability of the encoding parameter estimation. Additionally or alternatively, the quality criterion may be based on a quality-of-experience (QoE) criterion to improve the quality of experience of the user, or on a bandwidth criterion so that bandwidth is used efficiently when performing the audio coding.
In a fifth possible implementation form of the method according to the fourth implementation form of the first aspect or according to the fourth implementation form of the second aspect, determining the encoding parameter comprises: determining a stability parameter of the second set of encoding parameters based on a comparison between consecutive values of the second set of encoding parameters with respect to the frame sequence; and determining the encoding parameter depending on the stability parameter.
Using the stability parameter improves the stability of the estimation. The estimation speed also improves, because the smoothing of the cross-correlation or of the energies can be weakened until the stability parameter indicates a loss of stability.
In a sixth possible implementation form of the method according to the fourth implementation form of the first aspect or according to the fourth implementation form of the second aspect, the stability parameter is based at least on a standard deviation of the second set of encoding parameters.
The standard deviation is easy to compute and provides an accurate measure of stability. When the standard deviation is small, the estimation is more stable or more reliable; when it is large, the estimation is less stable or less reliable.
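A minimal sketch of such a stability measure is given below, assuming the standard deviation is taken over the per-sub-band values ITD_inst[b] of a single frame. Whether the population or the sample standard deviation is meant, and over which values it is computed, is not specified in this section; the function name and the sample values are hypothetical.

```python
import statistics


def stability_parameter(itd_inst):
    """Population standard deviation of the weakly smoothed estimates itd_inst[b]."""
    return statistics.pstdev(itd_inst)


# Hypothetical per-sub-band ITD estimates (in samples) for two different frames.
consistent = [3, 3, 4, 3, 3, 3]     # sub-bands agree: small deviation
scattered = [3, -5, 7, 0, -2, 6]    # sub-bands disagree: large deviation

std_consistent = stability_parameter(consistent)
std_scattered = stability_parameter(scattered)
```

The first frame yields a deviation well below one sample and the second a deviation of several samples, so a threshold between the two would separate a reliable short-term estimate from an unreliable one.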
In a seventh possible implementation form of the method according to the fourth or fifth implementation form of the first aspect or according to the fourth or fifth implementation form of the second aspect, the stability parameter is determined over one frame or over multiple frames of the multi-channel audio signal.
Determining the stability parameter over one frame of the multi-channel audio signal is easy to implement and has low computational complexity, whereas determining it over multiple frames provides an accurate estimate of the stability.
In an eighth possible implementation form of the method according to any of the fourth to seventh implementation forms of the first aspect or according to any of the fourth to seventh implementation forms of the second aspect, the encoding parameter is determined based on a threshold crossing of the stability parameter.
When the stability parameter is below the threshold, the estimation is stable or reliable; a stability parameter above the threshold indicates an unstable or unreliable estimation.
In a ninth possible implementation form of the method according to the eighth implementation form of the first aspect or according to the eighth implementation form of the second aspect, the method further comprises: updating the first set of encoding parameters with the second set of encoding parameters if the stability parameter crosses the threshold.
The update improves the estimation of the first set of encoding parameters. When the stability parameter crosses the threshold that indicates a stable estimation, the long-term smoothing can be updated or replaced by the short-term smoothing, which increases the estimation speed while maintaining stability.
In a tenth possible implementation form of the method according to the first aspect as such, according to the second aspect as such, or according to any of the preceding implementation forms of the first or the second aspect, the smoothing of the set of functions based on the first and second smoothing coefficients is computed as the sum of a memory state of the first and second smoothed versions of the set of functions multiplied by a first coefficient and the set of functions multiplied by a second coefficient, wherein the first coefficient is based on the first and second smoothing coefficients and the second coefficient is based on the first and second smoothing coefficients.
This recursive computation uses a memory to store past values of the first and second smoothed versions of the set of functions. Recursive smoothing is computationally efficient, because the number of additions and multiplications is small. It is also memory efficient, because only a single memory state holding the past smoothed set of functions needs to be stored, and this memory state is updated in every computation step.
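The recursion can be sketched as a first-order smoother. The specific form used here, with first coefficient SMW and second coefficient 1 − SMW, is an assumption, as are the coefficient values and the function name `smooth_step`; the text only states that both coefficients are derived from the smoothing coefficients.

```python
def smooth_step(memory, current, smw):
    """One recursive smoothing step per sub-band.

    Assumed form: new = smw * memory + (1 - smw) * current,
    i.e. first coefficient smw and second coefficient 1 - smw.
    """
    return [smw * m + (1.0 - smw) * c for m, c in zip(memory, current)]


SMW1 = 0.9  # strong (long-term) smoothing; value is an assumption
SMW2 = 0.5  # weak (short-term) smoothing; value is an assumption

# One sub-band whose function value steps from 0 to 1 at frame 5.
frames = [[0.0]] * 5 + [[1.0]] * 5

mem_strong, mem_weak = [0.0], [0.0]
for c in frames:
    mem_strong = smooth_step(mem_strong, c, SMW1)
    mem_weak = smooth_step(mem_weak, c, SMW2)
```

Five frames after the step, the weakly smoothed state has reached about 0.97 of the new value while the strongly smoothed state is still near 0.41, which shows the trade-off between reactivity and stability. Each version needs only its own single memory state, and the same function works unchanged on complex-valued cross-spectra.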
In an eleventh possible implementation form of the method according to the tenth implementation form of the first aspect or according to the tenth implementation form of the second aspect, the method further comprises: updating the memory state of the first smoothed version of the set of functions with the memory state of the second smoothed version of the set of functions if the stability parameter crosses the threshold.
Updating the memory state of the first smoothed version of the set of functions with the memory state of the second smoothed version, depending on the stability parameter, improves both the stability and the speed of the estimation. When the stability parameter crosses the threshold that indicates a stable estimation, the long-term smoothing, i.e. the first smoothed version of the set of functions, can be updated or replaced by the short-term smoothing, i.e. the second smoothed version of the set of functions, which increases the estimation speed while maintaining stability.
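The memory update can be put together into a small end-to-end sketch. Everything specific here is an assumption made for illustration: the class name `DualSmoother`, the coefficient and threshold values, the use of the cross-spectrum phase as a stand-in for the ITD estimate, and the rule that the update fires when the standard deviation of the short-term estimates falls below the threshold.

```python
import cmath
import statistics


class DualSmoother:
    """Toy tracker keeping a strongly and a weakly smoothed copy of c[b]."""

    def __init__(self, n_bands, smw1=0.95, smw2=0.5, threshold=0.1):
        self.smw1, self.smw2, self.threshold = smw1, smw2, threshold
        self.mem_strong = [0j] * n_bands
        self.mem_weak = [0j] * n_bands

    @staticmethod
    def _smooth(memory, current, smw):
        return [smw * m + (1.0 - smw) * c for m, c in zip(memory, current)]

    def step(self, c):
        """Process one frame of functions c[b]; return the selected parameter."""
        self.mem_strong = self._smooth(self.mem_strong, c, self.smw1)
        self.mem_weak = self._smooth(self.mem_weak, c, self.smw2)
        inst = [cmath.phase(v) for v in self.mem_weak]  # stands in for ITD_inst[b]
        if statistics.pstdev(inst) < self.threshold:
            # Short-term estimate is consistent: overwrite the long-term memory.
            self.mem_strong = list(self.mem_weak)
        return statistics.mean(cmath.phase(v) for v in self.mem_strong)


def run(threshold):
    """Feed 20 frames at phase 0.3, then 5 frames at phase -0.4."""
    tracker = DualSmoother(n_bands=4, threshold=threshold)
    out = 0.0
    for phase in [0.3] * 20 + [-0.4] * 5:
        out = tracker.step([cmath.exp(1j * phase)] * 4)
    return out


tracked = run(threshold=0.1)   # memory update active
sluggish = run(threshold=0.0)  # update never triggers: strong smoothing only
```

Five frames after the source jumps, the tracker with the memory update reports a value close to the new phase, whereas the strongly smoothed estimate alone is still on the wrong side of zero. Overwriting the long-term memory removes the memory effect that would otherwise delay the reaction.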
In a twelfth possible implementation form of the method according to the first aspect as such, according to the second aspect as such, or according to any of the preceding implementation forms of the first or the second aspect, the first smoothing coefficient is higher than the second smoothing coefficient.
The first smoothing coefficient is used for long-term estimation and the second smoothing coefficient for short-term estimation, so that the two smoothing results can be distinguished.
In a thirteenth possible implementation form of the method according to the first aspect as such, according to the second aspect as such, or according to any of the preceding implementation forms of the first or the second aspect, the smoothing of the set of functions is with respect to at least two consecutive frames of the multi-channel audio signal.
The smoothing is more accurate if two or more consecutive frames of the multi-channel audio signal are used.
In a fourteenth possible implementation form of the method according to the first aspect as such, according to the second aspect as such, or according to any of the preceding implementation forms of the first or the second aspect, the smoothing of the set of functions discriminates between positive values of the second set of encoding parameters and negative values of the second set of encoding parameters.
Discriminating between positive and negative values of the second set of encoding parameters allows the estimation to be more accurate.
In a fifteenth possible implementation form of the method according to the fourteenth implementation form of the first aspect or according to the fourteenth implementation form of the second aspect, the smoothing of the set of functions comprises: counting, over a number of frequency bins or sub-bands, a first number of positive values of the second set of encoding parameters and a second number of negative values of the second set of encoding parameters.
Counting the positive and negative values allows the discrimination to be made according to the sign of the second set of encoding parameters. Performing this discrimination increases the estimation speed.
According to a third aspect, the invention relates to a multi-channel audio encoder for determining an encoding parameter for an audio channel signal of a plurality of audio channel signals of a multi-channel audio signal, each audio channel signal having audio channel signal values, the multi-channel audio encoder comprising: a first determiner for determining a set of functions for the audio channel signal from the audio channel signal values of the audio channel signal and reference audio signal values of a reference audio signal, wherein the reference audio signal is another audio channel signal of the plurality of audio channel signals; a second determiner for determining a first set of encoding parameters based on a smoothing of the set of functions with respect to a frame sequence of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient; a third determiner for determining a second set of encoding parameters based on a smoothing of the set of functions with respect to the frame sequence of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient; and an encoding parameter determiner for determining the encoding parameter based on a quality criterion with respect to the first set of encoding parameters and/or the second set of encoding parameters.
According to a fourth aspect, the invention relates to a multi-channel audio encoder for determining an encoding parameter for an audio channel signal of a plurality of audio channel signals of a multi-channel audio signal, each audio channel signal having audio channel signal values, the multi-channel audio encoder comprising: a first determiner for determining a set of functions for the audio channel signal from the audio channel signal values of the audio channel signal and reference audio signal values of a reference audio signal, wherein the reference audio signal is a downmix audio signal derived from at least two audio channel signals of the plurality of audio channel signals; a second determiner for determining a first set of encoding parameters based on a smoothing of the set of functions with respect to a frame sequence of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient; a third determiner for determining a second set of encoding parameters based on a smoothing of the set of functions with respect to the frame sequence of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient; and an encoding parameter determiner for determining the encoding parameter based on a quality criterion with respect to the first set of encoding parameters and/or the second set of encoding parameters.
Such a multi-channel audio encoder provides optimal encoding with respect to speed and stability. The strongly smoothed version of the set of functions, e.g. the one smoothed with the first smoothing coefficient, makes the estimation stable. The weakly smoothed version of the set of functions, e.g. the one smoothed with the second smoothing coefficient and determined at the same time, lets the estimation follow genuine fast changes of the estimated parameter, i.e. ITD or CLD. The memory of the strongly smoothed version of the set of functions is updated with the weakly smoothed version, which gives the best result in terms of tracking speed and stability. The decision on which smoothed version to use is based on a quality metric of the first and/or second set of encoding parameters. Parameter estimation is therefore both stable and fast.
According to a fifth aspect, the invention relates to a computer program with program code for performing, when run on a computer, the method according to the first aspect as such, the second aspect as such, or any of the preceding implementation forms of the first or second aspect.
According to a sixth aspect, the invention relates to a machine-readable medium, such as a memory and in particular a compact disc, carrying a computer program that comprises program code for performing, when run on a computer, the method according to the first aspect as such, the second aspect as such, or any of the preceding claims according to the first or second aspect.
The aspects of the invention described above can be used for ITD estimation in a parametric spatial audio encoder. In a parametric spatial audio encoder or parametric multi-channel audio encoder, the spatial parameters are first extracted and quantized and then multiplexed into the bitstream. A parameter such as the ITD can be estimated from the cross-correlation in the frequency domain. To make the estimate more stable, the frequency-domain cross-correlation used for the parameter (ITD) estimation is strongly smoothed. To follow genuine fast changes of the parameter, a weakly smoothed version of the frequency-domain cross-correlation is computed at the same time; because the memory effect is reduced, it is an almost instantaneous estimate of the cross-correlation.
The weakly smoothed version of the estimation function is used to estimate the parameter (ITD) and, when the state of the parameter changes, to update the cross-correlation memory of the strongly smoothed version. The decision to use the weakly smoothed version is based on a quality metric of the estimated parameter. The parameter is estimated from both versions of the estimation function. The best estimate is kept, and if the weakly smoothed function was selected, it is also used to update the strongly smoothed version.
For example, for ITD estimation, ITD_inst (the weakly smoothed version of the ITD) is computed from the weakly smoothed version of the frequency-domain cross-correlation. If the standard deviation of ITD_inst over a number of frequency bins or sub-bands is below a predetermined threshold, the strongly smoothed cross-correlation is updated with the cross-correlation of the weakly smoothed version, and the ITD estimated with the weakly smoothed function is selected.
A simple quality metric is the standard deviation of the ITD estimate obtained from the weakly smoothed version. Other quality metrics can be used in a similar way. For example, the likelihood of a position change can be computed from all available spatial information (CLD, ITD, ICC). As an example, a correlation between a fast change of the ITD and a fast change of the CLD indicates a high likelihood that the spatial image has changed.
The methods described here can be implemented as software in a digital signal processor (DSP), a microcontroller or any other side processor, or as a hardware circuit within an application-specific integrated circuit (ASIC).
The invention can be implemented in digital electronic circuitry, in computer hardware, firmware or software, or in combinations of these.
Brief description of the drawings
Further embodiments of the invention are described with reference to the following figures, in which:
Fig. 1a shows a schematic diagram of a method for determining an encoding parameter for an audio channel signal according to an implementation form;
Fig. 1b shows a schematic diagram of a method for determining an encoding parameter for an audio channel signal according to an implementation form;
Fig. 2 shows a schematic diagram of an ITD estimation algorithm according to an implementation form;
Fig. 3 shows a schematic diagram of a CLD estimation algorithm according to an implementation form;
Fig. 4 shows a block diagram of a parametric audio encoder according to an implementation form;
Fig. 5 shows a block diagram of a parametric audio decoder according to an implementation form;
Fig. 6 shows a block diagram of a parametric stereo audio encoder and decoder according to an implementation form;
Fig. 7 shows a block diagram of an ITD selection algorithm according to an implementation form; and
Fig. 8 shows a schematic diagram illustrating the principle of interaural time differences.
Detailed description of the invention
Fig. 1a shows a schematic diagram of a method 100a for determining an encoding parameter for an audio channel signal according to an implementation form.
The method 100a determines the encoding parameter ITD, e.g. an inter-channel time difference or interaural time difference, for an audio channel signal x_1 of a plurality of audio channel signals x_1, x_2 of a multi-channel audio signal. Each audio channel signal x_1, x_2 has audio channel signal values x_1[n], x_2[n]. The method 100a comprises:
determining 101 a set of functions c[b] for the audio channel signal x_1 from the audio channel signal values x_1[n] of the audio channel signal x_1 and reference audio signal values x_2[n] of a reference audio signal x_2, wherein the reference audio signal is another audio channel signal x_2 of the plurality of audio channel signals or a downmix audio signal derived from at least two audio channel signals x_1, x_2 of the plurality of audio channel signals;
determining 103a a first set of encoding parameters ITD[b] based on a smoothing of the set of functions c[b] with respect to a frame sequence i of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient SMW_1;
determining 105a a second set of encoding parameters ITD_inst[b] based on a smoothing of the set of functions c[b] with respect to the frame sequence i of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient SMW_2; and
determining 107a the encoding parameter ITD based on a quality criterion with respect to the first set of encoding parameters ITD[b] and/or the second set of encoding parameters ITD_inst[b].
In an implementation form, determining 107a the encoding parameter ITD comprises checking the stability of the second set of encoding parameters ITD_inst[b]. If ITD_inst[b] is stable over all frequency bins b, the encoding parameter ITD is selected as the final estimate based on the second set ITD_inst[b], and the memory of the smoothing of the set of functions c[b] based on the first smoothing coefficient SMW_1 is updated with the smoothing of the set of functions c[b] based on the second smoothing coefficient SMW_2. If ITD_inst[b] is not stable over all frequency bins b, the encoding parameter ITD is selected as the final estimate based on the first set ITD[b].
In an implementation form, the method 100a comprises the following steps:
For the estimation of the parameter ITD, computing 101 a first function c[b] from the input signals x_1[n], x_2[n] and computing 103a the associated smoothed function c_sm[b] based on the first smoothing coefficient.
For the estimation of the parameter ITD, computing 105a a second smoothed function c_sm_inst[b] from the input signals x_1[n], x_2[n] based on the second smoothing coefficient.
Computing 107a a first estimate ITD and a second estimate ITD_inst of the parameter from the two smoothed versions c_sm[b] and c_sm_inst[b] of the estimation function.
Checking 107a the stability of the second estimate ITD_inst of the parameter. If the second estimate is stable, the second estimate ITD_inst is selected as the final estimate and the memory of the first smoothed function is updated with the second smoothed function. If the second estimate is not stable, the first estimate ITD is selected as the final estimate.
In an implementation form, the method 100a comprises the following steps:
1. Compute the FFT of the first channel signal x_1[n] and of the second channel signal x_2[n].
2. Compute the cross-correlation c[n] of the two channels in the frequency domain.
2.1. Strongly smooth the cross-correlation c[n] and compute, per frequency bin (or band), the ITD (long-term estimate of the inter-channel time difference) associated with the first smoothing coefficient, i.e. the long-term smoothing coefficient.
2.2. Weakly smooth the cross-correlation c[n] and compute, per frequency bin (or band), the ITD_inst (short-term estimate of the inter-channel time difference) associated with the second smoothing coefficient, i.e. the short-term smoothing coefficient.
3. Compute the mean and the standard deviation of ITD_inst.
4. If the standard deviation of ITD_inst is below a threshold, update the memory of the strongly smoothed cross-correlation with the weakly smoothed version and output the mean of ITD_inst as the final ITD. If the standard deviation of ITD_inst is above the threshold, output the mean of ITD as the final ITD.
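The numbered steps can be sketched for one frame as follows. This is a minimal illustration, not the patented implementation: the class name `ItdTracker`, the plain averaging over bins 1 and above, and the skipping of bin 0 (to avoid dividing by zero) are assumptions, and the per-bin cross-spectrum is taken as already computed.

```python
import cmath
import math

SMW_1 = 0.9844  # strong (long-term) smoothing coefficient, value from the text
SMW_2 = 0.75    # weak (short-term) smoothing coefficient, value from the text


class ItdTracker:
    """Hypothetical helper holding both smoothed cross-spectrum memories."""

    def __init__(self, num_bins, n, threshold):
        self.c_sm = [0j] * num_bins       # strongly smoothed memory
        self.c_sm_inst = [0j] * num_bins  # weakly smoothed memory
        self.n = n                        # N from the text (number of FFT bins)
        self.threshold = threshold

    def _itd(self, c, b):
        # ITD[b] = angle(c) * N / (pi * b), only valid for b > 0
        return cmath.phase(c) * self.n / (math.pi * b)

    def process(self, c):
        """Consume one frame of per-bin cross-spectrum c[b]; return the final ITD."""
        for b in range(len(c)):
            self.c_sm[b] = SMW_1 * self.c_sm[b] + (1 - SMW_1) * c[b]
            self.c_sm_inst[b] = SMW_2 * self.c_sm_inst[b] + (1 - SMW_2) * c[b]
        bins = range(1, len(c))  # assumption: bin 0 is skipped
        itd = [self._itd(self.c_sm[b], b) for b in bins]
        itd_inst = [self._itd(self.c_sm_inst[b], b) for b in bins]
        mean_inst = sum(itd_inst) / len(itd_inst)
        std_inst = math.sqrt(
            sum((v - mean_inst) ** 2 for v in itd_inst) / len(itd_inst))
        if std_inst < self.threshold:
            # Stable short-term estimate: overwrite the long-term memory.
            self.c_sm = list(self.c_sm_inst)
            return mean_inst
        return sum(itd) / len(itd)


# Synthetic frame whose phase grows linearly with the bin index.
frame = [cmath.exp(1j * math.pi * b * 2 / 16) for b in range(8)]
tracker = ItdTracker(num_bins=8, n=16, threshold=0.5)
final_itd = tracker.process(frame)
```

Overwriting the long-term memory is what lets the strongly smoothed estimate jump to a new source position instead of drifting there over many frames.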
Fig. 1b shows a schematic diagram of a method 100b for determining an encoding parameter for an audio channel signal according to an implementation form.
The method 100b determines the encoding parameter CLD, e.g. an inter-channel level difference, for an audio channel signal x_1 of a plurality of audio channel signals x_1, x_2 of a multi-channel audio signal. Each audio channel signal x_1, x_2 has audio channel signal values x_1[n], x_2[n]. The method 100b comprises:
determining 101 a set of functions c[b] for the audio channel signal x_1 from the audio channel signal values x_1[n] of the audio channel signal x_1 and reference audio signal values x_2[n] of a reference audio signal x_2, wherein the reference audio signal is another audio channel signal x_2 of the plurality of audio channel signals or a downmix audio signal derived from at least two audio channel signals x_1, x_2 of the plurality of audio channel signals;
determining 103b a first set of encoding parameters CLD[b] based on a smoothing of the set of functions c[b] with respect to a frame sequence i of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient SMW_1;
determining 105b a second set of encoding parameters CLD_inst[b] based on a smoothing of the set of functions c[b] with respect to the frame sequence i of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient SMW_2; and
determining 107b the encoding parameter CLD based on a quality criterion with respect to the first set of encoding parameters CLD[b] and/or the second set of encoding parameters CLD_inst[b].
In an implementation form, determining 107b the encoding parameter CLD comprises checking the stability of the second set of encoding parameters CLD_inst[b]. If CLD_inst[b] is stable over all frequency bins b, the encoding parameter CLD is selected as the final estimate based on the second set CLD_inst[b], and the memory of the smoothing of the set of functions c[b] based on the first smoothing coefficient SMW_1 is updated with the smoothing of the set of functions c[b] based on the second smoothing coefficient SMW_2. If CLD_inst[b] is not stable over all frequency bins b, the encoding parameter CLD is selected as the final estimate based on the first set CLD[b].
In an implementation form, the method 100b comprises the following steps:
For the estimation of the parameter CLD, computing 101 a first function c[b] from the input signals x_1[n], x_2[n] and computing 103b the associated smoothed function c_sm[b] based on the first smoothing coefficient.
For the estimation of the parameter CLD, computing 105b a second smoothed function c_sm_inst[b] from the input signals x_1[n], x_2[n] based on the second smoothing coefficient.
Computing 107b a first estimate CLD and a second estimate CLD_inst of the parameter from the two smoothed versions c_sm[b] and c_sm_inst[b] of the estimation function.
Checking 107b the stability of the second estimate CLD_inst of the parameter. If the second estimate is stable, the second estimate CLD_inst is selected as the final estimate and the memory of the first smoothed function is updated with the second smoothed function. If the second estimate is not stable, the first estimate CLD is selected as the final estimate.
In an implementation form, the method 100b comprises the following steps:
1. Compute the FFT of the first channel signal x_1[n] and of the second channel signal x_2[n].
2. Compute the energies en[n] of the two channels in the frequency domain.
2.1. Strongly smooth the energy en[n] and compute, per frequency bin (or band), the CLD (long-term estimate of the inter-channel level difference) associated with the first smoothing coefficient, i.e. the long-term smoothing coefficient.
2.2. Weakly smooth the energy en[n] and compute, per frequency bin (or band), the CLD_inst (short-term estimate of the inter-channel level difference) associated with the second smoothing coefficient, i.e. the short-term smoothing coefficient.
3. Check the stability of the stereo image based on CLD_inst.
4. If the stereo image is not stable, update the memory of the strongly smoothed energy with the weakly smoothed version of the energy and output CLD_inst as the final CLD. If the stereo image is stable, output CLD as the final CLD.
Fig. 2 shows a schematic diagram of an ITD estimation algorithm 200 according to an implementation form.
In a first step 209, a time-frequency transform is applied to the samples of the first input channel x_1[n] to obtain the frequency representation X_1[k] of the first input channel x_1. In a second step 211, a time-frequency transform is applied to the samples of the second input channel x_2[n] to obtain the frequency representation X_2[k] of the second input channel x_2. In the implementation form with stereo input channels, the first input channel x_1 can be the left channel and the second input channel x_2 the right channel. In a preferred embodiment, the time-frequency transform is a fast Fourier transform (FFT) or a short-term Fourier transform (STFT). In an alternative embodiment, the time-frequency transform is a cosine-modulated filter bank or a complex filter bank.
In a third step 213, the cross-spectrum c[b] of each sub-band is computed from the frequency representations X_1[k] and X_2[k] of the first and second input channels x_1, x_2 as
$$c[b] = \sum_{k=k_b}^{k_{b+1}-1} X_1[k]\, X_2^*[k]$$
where c[b] is the cross-spectrum of sub-band b, and X_1[k] and X_2[k] are the FFT coefficients of the two channels (e.g. the left and right channels in the stereo case).
* denotes complex conjugation. k_b is the first bin of sub-band b and k_{b+1} is the first bin of the adjacent sub-band b+1. The FFT frequency bins [k] from k_b to k_{b+1}-1 therefore represent sub-band [b].
Alternatively, the cross-spectrum is computed for each frequency bin of the FFT as
$$c[b] = X_1[b]\, X_2^*[b]$$
where c[b] is the cross-spectrum of frequency bin [b], and X_1[b] and X_2[b] are the FFT coefficients of the two channels. * denotes complex conjugation. In this case, sub-band [b] corresponds directly to one frequency bin [k], so [b] and [k] denote exactly the same frequency bin. In this implementation form, the cross-spectrum c[b] corresponds to the set of functions c[b] described with reference to Fig. 1a and Fig. 1b.
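A short sketch of the sub-band cross-spectrum may help here. The function name `cross_spectrum` and the `band_starts` list (the k_b values followed by one final end index) are illustrative assumptions; the per-bin variant is obtained by making every band one bin wide.

```python
def cross_spectrum(X1, X2, band_starts):
    """c[b] = sum over k in [k_b, k_{b+1}) of X1[k] * conj(X2[k]).

    band_starts holds the first bin k_b of every sub-band plus one
    final end index (a hypothetical layout chosen for this sketch).
    """
    return [
        sum(X1[k] * X2[k].conjugate()
            for k in range(band_starts[b], band_starts[b + 1]))
        for b in range(len(band_starts) - 1)
    ]


X1 = [1 + 1j, 2 + 0j, 1j, 3 + 0j]
X2 = [1 + 0j, 1j, 1j, 1 + 0j]
per_band = cross_spectrum(X1, X2, [0, 2, 4])       # two sub-bands
per_bin = cross_spectrum(X1, X2, [0, 1, 2, 3, 4])  # one bin per band
```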
In a fourth step 215 and a fifth step 219, two smoothed versions c_sm[b, i] and c_sm_inst[b, i] of the cross-spectrum are computed from the cross-spectrum c[b] as
$$c_{sm}[b,i] = SMW_1 \cdot c_{sm}[b,i-1] + (1 - SMW_1) \cdot c[b]$$
$$c_{sm\_inst}[b,i] = SMW_2 \cdot c_{sm\_inst}[b,i-1] + (1 - SMW_2) \cdot c[b]$$
where SMW_1 and SMW_2 are the respective smoothing factors, with SMW_1 > SMW_2, and i is the frame index of the cross-spectrum of the multi-channel audio signal. In an exemplary and preferred embodiment, SMW_1 = 0.9844 and SMW_2 = 0.75.
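To get a feeling for the two smoothing factors, the sketch below feeds a unit step into the recursion and counts the frames needed to reach 90 % of the step. The helper name `frames_to_reach` and the 90 % target are assumptions chosen for illustration only.

```python
def frames_to_reach(smw, target=0.9):
    """Count frames until y = smw * y + (1 - smw) * 1 reaches the target."""
    y, frames = 0.0, 0
    while y < target:
        y = smw * y + (1.0 - smw) * 1.0
        frames += 1
    return frames


slow = frames_to_reach(0.9844)  # strong smoothing, SMW_1 from the text
fast = frames_to_reach(0.75)    # weak smoothing, SMW_2 from the text
```

With the values from the text, the strongly smoothed version needs more than ten times as many frames as the weakly smoothed one to follow the step, which is the stability versus tracking-speed trade-off the two versions are meant to cover.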
In a sixth step 221 and a seventh step 223, two versions ITD and ITD_inst of the inter-channel time difference are computed per bin or per sub-band from the strongly smoothed cross-spectrum c_sm[b, i] and the weakly smoothed cross-spectrum c_sm_inst[b, i], respectively, as
$$\mathrm{ITD}[b] = \frac{\angle c_{sm}[b,i] \cdot N}{\pi \cdot b}$$
$$\mathrm{ITD\_inst}[b] = \frac{\angle c_{sm\_inst}[b,i] \cdot N}{\pi \cdot b}$$
where the operator ∠ is the argument operator, which computes the angle of the smoothed cross-spectrum,
and N is the number of FFT bins.
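The angle-to-ITD conversion can be written directly from the formula. In this sketch the function name `itd_per_bin` is hypothetical, and starting at bin 1 is an assumption made to avoid the division by b = 0.

```python
import cmath
import math


def itd_per_bin(c_smoothed, n, first_bin=1):
    """Return {b: angle(c[b]) * N / (pi * b)} for b >= first_bin."""
    return {
        b: cmath.phase(c_smoothed[b]) * n / (math.pi * b)
        for b in range(first_bin, len(c_smoothed))
    }


# Smoothed cross-spectrum whose phase is proportional to the bin index.
c_sm = [1 + 0j, cmath.exp(1j * math.pi / 8), cmath.exp(1j * math.pi / 4)]
itd = itd_per_bin(c_sm, n=8)
```

Because `cmath.phase` returns values in [-π, π], a phase that wraps at higher bins yields a wrong ITD for those bins; the text does not describe unwrapping, so none is attempted here.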
In an eighth step 225, the mean of the strongly smoothed version ITD of the inter-channel time difference is computed over all frequency bins (or sub-bands) of interest:
$$\mathrm{ITD}_{mean} = \frac{\sum_{b=B_1}^{B_2} \mathrm{ITD}[b]}{B_2 - B_1}$$
where B_1 and B_2 are the indices of the first and last bin (or sub-band) in the frequency region of interest.
In a ninth step 227 and a tenth step 229, the mean ITD_inst_mean and the standard deviation ITD_inst_std of the weakly smoothed version ITD_inst of the inter-channel time difference are computed over all frequency bins (or sub-bands) of interest:
$$\mathrm{ITD\_inst}_{mean} = \frac{\sum_{b=B_1}^{B_2} \mathrm{ITD\_inst}[b]}{B_2 - B_1}$$
$$\mathrm{ITD\_inst}_{std} = \sqrt{\frac{\sum_{b=B_1}^{B_2} \left(\mathrm{ITD\_inst}[b] - \mathrm{ITD\_inst}_{mean}\right)^2}{B_2 - B_1}}$$
In an eleventh step 231, it is checked whether the standard deviation of the weakly smoothed version ITD_inst of the inter-channel time difference is below a threshold (thr):
ITD_inst_std < thr. If this is true (Y = yes), the first smoothed function c_sm[b, i] is updated in a twelfth step 217 according to c_sm[b, i] = c_sm_inst[b, i], and the mean ITD_inst_mean of the weakly smoothed version ITD_inst of the inter-channel time difference is output in a thirteenth step 233 as the final encoding parameter ITD. If this is false (N = no), the mean ITD_mean of the strongly smoothed version ITD of the inter-channel time difference is output in a fourteenth step 235 as the final encoding parameter ITD.
Steps 209, 211 and 213 described above can be represented as step 201, which corresponds to step 101 described with reference to Fig. 1a. Steps 215 and 221 described above can be represented as step 203, which corresponds to step 103a described with reference to Fig. 1a. Steps 217, 219 and 223 described above can be represented as step 205, which corresponds to step 105a described with reference to Fig. 1a. Steps 225, 227, 229, 231, 233 and 235 described above can be represented as step 207, which corresponds to step 107a described with reference to Fig. 1a.
In a preferred embodiment of the ITD estimation, the encoding parameter ITD is computed from the two smoothed versions ITD and ITD_inst of the inter-channel time difference, where each of the two versions is evaluated separately for the positive and the negative values of ITD and ITD_inst, as follows:
The positive and negative values of the strongly smoothed version ITD of the inter-channel time difference are counted. The mean and standard deviation of the positive and of the negative ITDs are computed according to the sign of the ITD:
$$\mathrm{ITD}_{mean\_pos} = \frac{\sum_{i=0}^{M} \mathrm{ITD}(i)}{Nb_{pos}}, \quad \text{where } \mathrm{ITD}(i) \ge 0$$
$$\mathrm{ITD}_{mean\_neg} = \frac{\sum_{i=0}^{M} \mathrm{ITD}(i)}{Nb_{neg}}, \quad \text{where } \mathrm{ITD}(i) < 0$$
$$\mathrm{ITD}_{std\_pos} = \sqrt{\frac{\sum_{i=0}^{M} \left(\mathrm{ITD}(i) - \mathrm{ITD}_{mean\_pos}\right)^2}{Nb_{pos}}}, \quad \text{where } \mathrm{ITD}(i) \ge 0$$
$$\mathrm{ITD}_{std\_neg} = \sqrt{\frac{\sum_{i=0}^{M} \left(\mathrm{ITD}(i) - \mathrm{ITD}_{mean\_neg}\right)^2}{Nb_{neg}}}, \quad \text{where } \mathrm{ITD}(i) < 0$$
where Nb_pos and Nb_neg are the numbers of positive and negative ITDs, respectively,
and M is the total number of extracted ITDs. Alternatively, an ITD equal to 0 can be counted among the negative ITDs or left out of both means.
The ITD is selected from the positive and negative ITDs on the basis of the means and standard deviations, according to the selection algorithm depicted in Fig. 7.
The same computation is performed for the weakly smoothed version ITD_inst of the inter-channel time difference.
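The sign-split statistics can be sketched as follows. The function name `sign_split_stats` and the `None` return for an empty group are assumptions; zero is counted as positive, as in the formulas above, and the selection step of Fig. 7 is not reproduced.

```python
import math


def sign_split_stats(itd_values):
    """Return ((mean, std, count) for ITD >= 0, (mean, std, count) for ITD < 0)."""
    def stats(values):
        if not values:
            return None  # no ITD of this sign was extracted
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        return mean, std, len(values)

    pos = [v for v in itd_values if v >= 0]
    neg = [v for v in itd_values if v < 0]
    return stats(pos), stats(neg)


pos_stats, neg_stats = sign_split_stats([1.0, 3.0, -2.0, -2.0])
```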
In an implementation form in which the method is applied to a multi-channel parametric audio codec, the method 200 comprises the following steps:
In the first and second steps 209 and 211, a time-frequency transform is applied to the input channels. In a preferred embodiment, the time-frequency transform is a fast Fourier transform (FFT) or a short-term Fourier transform (STFT). In an alternative embodiment, the time-frequency transform can be a cosine-modulated filter bank or a complex filter bank.
In the third step 213, the cross-spectrum of channel j is computed for each sub-band as
$$c_j[b] = \sum_{k=k_b}^{k_{b+1}-1} X_j[k]\, X_{ref}^*[k]$$
where c_j[b] is the cross-spectrum of bin b or sub-band b, and X_j[b] and X_ref[b] are the FFT coefficients of channel j and of the reference channel. * denotes complex conjugation. k_b is the first bin of sub-band b and k_{b+1} is the first bin of the adjacent sub-band b+1. The FFT frequency bins [k] from k_b to k_{b+1}-1 therefore represent sub-band [b]. In an implementation form, the spectrum of one of the channels X_j (j in [1, M]) is selected as the spectrum of the reference signal X_ref, and M-1 spatial cues are then computed in the decoder. In an alternative implementation form, X_ref is the spectrum of a mono downmix signal, which is the average of all M channels, and M spatial cues are then computed in the decoder. The advantage of using a downmix signal as the reference signal for a multi-channel audio signal is that it avoids using a silent signal as the reference. The downmix signal represents an average of the energy of all channels and is therefore less likely to be silent.
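The downmix-reference variant can be illustrated with a small sketch. The helper names `downmix_spectrum` and `cross_spectra_vs_reference` and the `band_starts` layout (k_b values plus a final end index) are assumptions; the downmix is the plain average of the M channel spectra, as the text describes.

```python
def downmix_spectrum(channels):
    """X_ref[k] as the average of all M channel spectra."""
    m = len(channels)
    return [sum(ch[k] for ch in channels) / m
            for k in range(len(channels[0]))]


def cross_spectra_vs_reference(channels, band_starts):
    """Return c_j[b] for every channel j against the mono downmix."""
    x_ref = downmix_spectrum(channels)
    return [
        [
            sum(x_j[k] * x_ref[k].conjugate()
                for k in range(band_starts[b], band_starts[b + 1]))
            for b in range(len(band_starts) - 1)
        ]
        for x_j in channels
    ]


channels = [[2 + 0j, 2j], [0j, 2j]]  # M = 2 channels, two bins each
c = cross_spectra_vs_reference(channels, [0, 2])
```

With a downmix reference every one of the M channels gets its own cross-spectrum, which matches the M spatial cues mentioned in the text; with one channel as reference, that channel's own cue is trivial and only M-1 remain.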
In an alternative implementation form, the cross-spectrum is computed for each frequency bin of the FFT as
$$c_j[b] = X_j[b]\, X_{ref}^*[b]$$
where c_j[b] is the cross-spectrum of frequency bin [b], X_ref[b] is the spectrum of the reference signal, and X_j[b] (j in [1, M]) is the spectrum of each channel of the multi-channel signal. * denotes complex conjugation. In this case, sub-band [b] corresponds directly to one frequency bin [k], so [b] and [k] denote exactly the same frequency bin.
In the fourth step 215 and the fifth step 219, two smoothed versions of the cross-spectrum are computed as
$$c_{j,sm}[b,i] = SMW_1 \cdot c_{j,sm}[b,i-1] + (1 - SMW_1) \cdot c_j[b]$$
$$c_{j,sm\_inst}[b,i] = SMW_2 \cdot c_{j,sm\_inst}[b,i-1] + (1 - SMW_2) \cdot c_j[b]$$
where SMW_1 and SMW_2 are the smoothing factors, with SMW_1 > SMW_2, and i is the frame index of the multi-channel audio signal. In a preferred embodiment, SMW_1 = 0.9844 and SMW_2 = 0.75.
In the sixth step 221 and the seventh step 223, ITD and ITD_inst are computed per bin or per sub-band from the strongly smoothed cross-spectrum c_sm and the weakly smoothed cross-spectrum c_sm_inst, respectively, as
$$\mathrm{ITD}_j[b] = \frac{\angle c_{j,sm}[b,i] \cdot N}{\pi \cdot b}$$
$$\mathrm{ITD\_inst}_j[b] = \frac{\angle c_{j,sm\_inst}[b,i] \cdot N}{\pi \cdot b}$$
where the operator ∠ is the argument operator, which computes the angle of the smoothed cross-spectrum,
and N is the number of FFT bins.
In the eighth step 225, the mean of the ITD is computed over all frequency bins (or sub-bands) of interest:
$$\mathrm{ITD}_{mean,j} = \frac{\sum_{b=B_1}^{B_2} \mathrm{ITD}_j[b]}{B_2 - B_1}$$
where B_1 and B_2 are the indices of the first and last bin (or sub-band) in the frequency region of interest.
In the ninth step 227 and the tenth step 229, the mean and the standard deviation of ITD_inst are computed over all frequency bins (or sub-bands) of interest as
$$\mathrm{ITD\_inst}_{mean,j} = \frac{\sum_{b=B_1}^{B_2} \mathrm{ITD\_inst}_j[b]}{B_2 - B_1}$$
$$\mathrm{ITD\_inst}_{std,j} = \sqrt{\frac{\sum_{b=B_1}^{B_2} \left(\mathrm{ITD\_inst}_j[b] - \mathrm{ITD\_inst}_{mean,j}\right)^2}{B_2 - B_1}}$$
In the eleventh step 231, it is checked whether ITD_inst_std,j is below the threshold thr, i.e. whether ITD_inst_std,j < thr.
If it is (Y path), the first smoothed function is updated in the twelfth step 217 according to c_j,sm[b, i] = c_j,sm_inst[b, i], and the mean of ITD_inst_j (ITD_inst_mean,j) is output in the thirteenth step 233 as the final ITD_j. If it is not (N path), the mean of ITD_j (ITD_mean,j) is output in the fourteenth step 235 as the final ITD_j.
In a preferred embodiment of the ITD estimation, the encoding parameter ITD_j is computed from the two smoothed versions ITD_j and ITD_inst_j of the inter-channel time difference, where each of the two versions is evaluated separately for the positive and the negative values of ITD_j and ITD_inst_j, as follows:
The positive and negative values of the strongly smoothed version ITD of the inter-channel time difference are counted. The mean and standard deviation of the positive and of the negative ITDs are computed according to the sign of the ITD:
$$\mathrm{ITD}_{mean\_pos} = \frac{\sum_{i=0}^{M} \mathrm{ITD}(i)}{Nb_{pos}}, \quad \text{where } \mathrm{ITD}(i) \ge 0$$
$$\mathrm{ITD}_{mean\_neg} = \frac{\sum_{i=0}^{M} \mathrm{ITD}(i)}{Nb_{neg}}, \quad \text{where } \mathrm{ITD}(i) < 0$$
$$\mathrm{ITD}_{std\_pos} = \sqrt{\frac{\sum_{i=0}^{M} \left(\mathrm{ITD}(i) - \mathrm{ITD}_{mean\_pos}\right)^2}{Nb_{pos}}}, \quad \text{where } \mathrm{ITD}(i) \ge 0$$
$$\mathrm{ITD}_{std\_neg} = \sqrt{\frac{\sum_{i=0}^{M} \left(\mathrm{ITD}(i) - \mathrm{ITD}_{mean\_neg}\right)^2}{Nb_{neg}}}, \quad \text{where } \mathrm{ITD}(i) < 0$$
where Nb_pos and Nb_neg are the numbers of positive and negative ITDs, respectively,
and M is the total number of extracted ITDs. Alternatively, an ITD equal to 0 can be counted among the negative ITDs or left out of both means.
The ITD is selected from the positive and negative ITDs on the basis of the means and standard deviations, according to the selection algorithm depicted in Fig. 7.
Figure 3 shows that the schematic diagram of the CLD algorithm for estimating according to a kind of form of implementation.
In first step 309, to the first input channel x 1the sample operate time frequency transformation of [n], thus obtain the first input channel x 1frequency representation X 1[k].In second step 311, to the second input channel x 2the sample operate time frequency transformation of [n], thus obtain the second input channel x 2frequency representation X 2[k].In the form of implementation of stereo input channel, the first input channel x 1can be L channel and the second input channel x 2can be R channel.In a preferred embodiment, temporal frequency is transformed to FFT (FFT) or short time discrete Fourier transform (STFT).In an alternate embodiment, temporal frequency conversion is cosine modulated filter banks or Complex filter bank.
In third step 313, for each sub-band, the first channel x 1energy en 1[b] and second channel x 2energy en 2[b] calculates according to following formula
en 1 [ b ] = &Sigma; k = k b k b + 1 - 1 X 1 [ k ] X 1 * [ k ]
en 2 [ b ] = &Sigma; k = k b k b + 1 - 1 X 2 [ k ] X 2 * [ k ]
where en_1[b] and en_2[b] are the energies of sub-band b, and X_1[k] and X_2[k] are the FFT coefficients of the two channels (for example, the left and right channels in the stereo case).
The asterisk * denotes complex conjugation. k_b is the first bin of sub-band b and k_{b+1} is the first bin of the adjacent sub-band b+1. Hence the FFT bins [k] from k_b to k_{b+1}-1 represent sub-band [b].
Alternatively, the energies of the two channels x_1 and x_2 are calculated for each frequency bin of the FFT according to:
$$en_1[b] = X_1[b]\, X_1^*[b]$$
$$en_2[b] = X_2[b]\, X_2^*[b]$$
where en_1[b] and en_2[b] are the energies of frequency bin [b] of the first and second channels, respectively, X_1[b] and X_2[b] are the FFT coefficients of the two channels, and * denotes complex conjugation. In this case a sub-band [b] corresponds directly to one frequency bin [k], so [b] and [k] denote exactly the same frequency bin.
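Both energy variants can be sketched with one helper. This is an illustrative sketch only: the function name and the `band_starts` layout (the start bin k_b of every band followed by one final end index) are assumptions, not taken from the source.

```python
def subband_energies(spectrum, band_starts):
    """Sum X[k] * conj(X[k]) over the bins of each sub-band.

    band_starts holds k_b for every band plus one final end index, so that
    band b covers the bins band_starts[b] .. band_starts[b + 1] - 1.
    """
    energies = []
    for b in range(len(band_starts) - 1):
        bins = spectrum[band_starts[b]:band_starts[b + 1]]
        energies.append(sum((x * x.conjugate()).real for x in bins))
    return energies
```

Passing consecutive indices such as `[0, 1, 2, 3, 4]` as `band_starts` yields the per-bin variant, in which each sub-band is a single frequency bin.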
In a fourth step 315, the strongly smoothed versions en_1_sm[b, i] and en_2_sm[b, i] of the energies of the first channel x_1 and the second channel x_2 are determined, and in a fifth step 319, the weakly smoothed versions en_1_sm_inst[b, i] and en_2_sm_inst[b, i] of these energies are determined, according to:
$$en_{1\_sm}[b,i] = SMW_1 \cdot en_{1\_sm}[b,i-1] + (1 - SMW_1) \cdot en_1[b]$$
$$en_{1\_sm\_inst}[b,i] = SMW_2 \cdot en_{1\_sm\_inst}[b,i-1] + (1 - SMW_2) \cdot en_1[b]$$
$$en_{2\_sm}[b,i] = SMW_1 \cdot en_{2\_sm}[b,i-1] + (1 - SMW_1) \cdot en_2[b]$$
$$en_{2\_sm\_inst}[b,i] = SMW_2 \cdot en_{2\_sm\_inst}[b,i-1] + (1 - SMW_2) \cdot en_2[b]$$
where SMW_1 and SMW_2 are smoothing factors (smoothing coefficients) with SMW_1 > SMW_2, that is, SMW_1 is the strong smoothing factor and SMW_2 is the weak smoothing factor, and i is the frame index. In an implementation form that follows the exact evolution of the CLD, SMW_2 is set to zero.
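The two recursions can be sketched for one channel and one frame as follows. The function name is hypothetical, and the default value 0.9 for SMW_1 is an assumption chosen only for illustration; the source specifies only SMW_1 > SMW_2 and the optional choice SMW_2 = 0.

```python
def smooth_energies(prev_sm, prev_inst, energy, smw1=0.9, smw2=0.0):
    """Update the strongly and weakly smoothed per-band energies of one channel.

    prev_sm, prev_inst: smoothed energies of frame i - 1 (lists over bands b).
    energy: energies en[b] of the current frame i.
    smw1 / smw2: strong / weak smoothing factor (hypothetical defaults).
    """
    if not smw1 > smw2:
        raise ValueError("SMW1 must be larger than SMW2")
    strong = [smw1 * p + (1.0 - smw1) * e for p, e in zip(prev_sm, energy)]
    weak = [smw2 * p + (1.0 - smw2) * e for p, e in zip(prev_inst, energy)]
    return strong, weak
```

With `smw2=0.0` the weakly smoothed energy equals the current-frame energy, which matches the implementation form that tracks the CLD exactly.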
In a sixth step 321 and a seventh step 323, the strongly smoothed version CLD and the weakly smoothed version CLD_inst of the inter-channel level difference are calculated for each bin or each sub-band from the strongly smoothed energies en_1_sm and en_2_sm and from the weakly smoothed energies en_1_sm_inst and en_2_sm_inst, respectively, as follows:
$$CLD[b] = 10 \log\left( \frac{en_{1\_sm}[b]}{en_{2\_sm}[b]} \right)$$
$$CLD\_inst[b] = 10 \log\left( \frac{en_{1\_sm\_inst}[b]}{en_{2\_sm\_inst}[b]} \right)$$
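A minimal sketch of the level-difference computation follows. It assumes that "log" denotes the base-10 logarithm, so that the result is in dB; the function name and the small `eps` guard against division by zero are hypothetical additions.

```python
import math

def cld_db(en1, en2, eps=1e-12):
    """Per-band level difference 10 * log10(en1[b] / en2[b]).

    eps is a hypothetical guard that keeps the ratio finite for silent bands.
    """
    return [10.0 * math.log10((a + eps) / (b + eps)) for a, b in zip(en1, en2)]
```

Calling the function with the strongly smoothed energies gives CLD[b]; calling it with the weakly smoothed energies gives CLD_inst[b].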
In an eighth step 329, the stability of the stereo image is calculated based on the weakly smoothed version CLD_inst of the inter-channel level difference. In an implementation form, the stability flag is determined by the method described in patent publication "WO2010/079167A1", that is, a sensitivity measure is calculated. The sensitivity measure predicts how sensitive the current frame is to errors in the long-term prediction (LTP) filter state caused by packet loss. The sensitivity measure is calculated according to:
$$s = 0.5\, PG_{LTP} + 0.5\, PG_{LTP,HP}$$
where PG_LTP is the long-term prediction gain, measured as the ratio of the energy of the LPC (linear predictive coding) residual signal r_LPC to the energy of the LTP (long-term prediction) residual signal r_LTP, and PG_LTP,HP is the signal obtained by running PG_LTP through a first-order high-pass filter according to:
$$PG_{LTP,HP}(n) = PG_{LTP}(n) - PG_{LTP}(n-1) + 0.5\, PG_{LTP,HP}(n-1)$$
The sensitivity measure is thus a combination of the LTP prediction gain and a high-pass version of the same measure. The LTP prediction gain is chosen because it directly relates the LTP state error to the output signal error. The high-pass part is added to emphasize signal changes. A changing signal carries a high risk of severe error propagation after packet loss, because the LTP states in the encoder and the decoder will most likely be very different.
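The sensitivity measure and its high-pass recursion can be sketched as follows. The function name is hypothetical, and the filter memories PG_LTP(-1) and PG_LTP,HP(-1) are assumed to start at zero, which the source does not state. How the measure is thresholded into a stability flag is not specified here and is left out.

```python
def sensitivity_measures(pg_ltp):
    """Return s(n) = 0.5 * PG_LTP(n) + 0.5 * PG_LTP,HP(n) for each frame n.

    Assumption: both filter memories are zero before the first frame.
    """
    prev_pg = 0.0
    prev_hp = 0.0
    out = []
    for pg in pg_ltp:
        # First-order high-pass filter on the prediction gain.
        hp = pg - prev_pg + 0.5 * prev_hp
        out.append(0.5 * pg + 0.5 * hp)
        prev_pg, prev_hp = pg, hp
    return out
```

A sudden rise in the prediction gain raises the high-pass term and therefore the measure, which is the intended emphasis on signal changes.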
The sensitivity measure yields a flag representing the stability of the stereo image. In comparison step 331, the flag is checked for being one or zero. If the flag equals zero (path N), the stereo image is stable and the inter-channel level difference CLD does not change much between two consecutive frames. If the flag equals one (path Y), the stereo image is unstable, meaning that the inter-channel level difference CLD changes very quickly between two consecutive frames.
In the ninth step 331, the stability flag output by the preceding step 329 is checked. If the stability flag equals one (path Y), the memory is updated in a tenth step 317, that is, the strongly smoothed energies are replaced by the weakly smoothed energies according to:
en_1_sm[b, i] = en_1_sm_inst[b, i] and en_2_sm[b, i] = en_2_sm_inst[b, i], and in an eleventh step 333 the weakly smoothed version CLD_inst of the inter-channel level difference is output as the final coding parameter CLD. If the stability flag equals zero (path N), the strongly smoothed version CLD of the inter-channel level difference is output as the final coding parameter CLD in a twelfth step 335.
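The decision in steps 331, 317, 333 and 335 can be sketched as follows. The function name and the data layout (pairs of per-band lists for the two channels) are hypothetical choices made for this illustration.

```python
def select_cld(unstable, cld, cld_inst, en_sm, en_sm_inst):
    """Return (final CLD, updated strongly smoothed energy memory).

    unstable: stability flag (True corresponds to flag = 1, path Y).
    en_sm, en_sm_inst: (channel 1, channel 2) pairs of per-band energy lists.
    """
    if unstable:
        # Steps 317 and 333: overwrite the long-term memory with the weakly
        # smoothed energies and output the weakly smoothed CLD.
        return list(cld_inst), tuple(list(e) for e in en_sm_inst)
    # Step 335: stable stereo image, output the strongly smoothed CLD.
    return list(cld), en_sm
```

Resetting the memory on instability lets the strongly smoothed estimate restart from the current stereo image instead of slowly converging to it.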
Steps 309, 311 and 313 described above can be represented as step 301, which corresponds to step 101 described with reference to Fig. 1b. Steps 315 and 321 described above can be represented as step 303, which corresponds to step 103b described with reference to Fig. 1b. Steps 317, 319 and 323 described above can be represented as step 305, which corresponds to step 105b described with reference to Fig. 1b. Steps 329, 331, 333 and 335 described above can be represented as step 307, which corresponds to step 107b described with reference to Fig. 1b.
Figure 4 shows a block diagram of a parametric audio encoder 400 according to an implementation form. The parametric audio encoder 400 receives a multi-channel audio signal 401 as input signal and provides a bitstream as output signal 403. The parametric audio encoder 400 comprises: a parameter generator 405 coupled to the multi-channel audio signal 401 for generating coding parameters 415; a downmix signal generator 407 coupled to the multi-channel audio signal 401 for generating a downmix signal 411, also called sum signal; an audio encoder 409 coupled to the downmix signal generator 407 for encoding the downmix signal 411 to provide an encoded audio signal 413; and a combiner 417, for example a bitstream former, coupled to the parameter generator 405 and the audio encoder 409 for forming the bitstream 403 from the coding parameters 415 and the encoded signal 413.
The parametric audio encoder 400 implements an audio coding scheme for stereo and multi-channel audio signals that transmits only a single audio channel, for example a downmix representation of the input audio channels, plus additional parameters describing the "perceptually relevant differences" between the audio channels x_1, x_2, ..., x_M. The coding scheme is referred to as binaural cue coding (BCC) because binaural cues play an important role in it. As shown in the figure, the input audio channels x_1, x_2, ..., x_M are downmixed to a single audio channel 411, also denoted the sum signal. As the "perceptually relevant differences" between the audio channels x_1, x_2, ..., x_M, coding parameters 415 such as the inter-channel time difference (ICTD), the inter-channel level difference (ICLD) and/or the inter-channel coherence (ICC) are estimated as functions of frequency and time and are transmitted as side information to the decoder 500 depicted in Fig. 5.
The parameter generator 405 implementing BCC processes the multi-channel audio signal 401 with a specific time and frequency resolution. The frequency resolution used is largely motivated by the frequency resolution of the auditory system. Psychoacoustics suggests that spatial perception is most likely based on a critical-band representation of the acoustic input signal. This frequency resolution is taken into account by using an invertible filter bank whose sub-bands have bandwidths equal or proportional to the critical bandwidth of the auditory system. It is important that the transmitted sum signal 411 contains all signal components of the multi-channel audio signal 401. The aim is that each signal component is fully maintained. Simply summing the audio input channels x_1, x_2, ..., x_M of the multi-channel audio signal 401 usually results in amplification or attenuation of signal components. In other words, the power of a signal component in a "simple" sum is usually larger or smaller than the sum of the powers of the corresponding signal component in each channel x_1, x_2, ..., x_M. Therefore, a downmixing technique is applied by the downmixing device 407, which equalizes the sum signal 411 such that the power of the signal components in the sum signal 411 is approximately the same as the corresponding power in all input audio channels x_1, x_2, ..., x_M of the multi-channel audio signal 401. The input audio channels x_1, x_2, ..., x_M are decomposed into a number of sub-bands. One such sub-band is denoted X_1[b] (note that, for notational simplicity, no sub-band index is used). Similar processing is applied independently to all sub-bands; usually the sub-band signals are downsampled. The signals of each sub-band of each input channel are added and then multiplied by a power normalization factor.
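Why a plain sum amplifies or attenuates a component can be seen from a short worked identity for two channels. The symbols X_1 and X_2 here stand for one spectral coefficient of each channel; the two special cases are illustrative assumptions, not taken from the source.

$$|X_1 + X_2|^2 = |X_1|^2 + |X_2|^2 + 2\,\mathrm{Re}\{X_1 X_2^*\}$$

If the two channels are identical (X_2 = X_1), the power of the sum is 4|X_1|^2, twice the sum of the channel powers 2|X_1|^2. If they are in phase opposition (X_2 = -X_1), the sum vanishes although the channel powers still add up to 2|X_1|^2. The power normalization factor compensates for the cross term 2 Re{X_1 X_2^*}.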
Given the sum signal 411, the parameter generator 405 extracts the spatial coding parameters 415 such that ICTD, ICLD and/or ICC approximate the corresponding cues of the original multi-channel audio signal 401.
When the binaural room impulse response (BRIR) of a single sound source is considered, there is a relationship between the auditory event width and listener envelopment on the one hand and the IC estimated for the early and late parts of the BRIR on the other. However, the relationship between IC or ICC and these properties is not straightforward for general signals, and not just for BRIRs. Stereo and multi-channel audio signals usually contain a complex mix of concurrently active source signals, superimposed by reflected signal components that result from recording in enclosed spaces or that are added by the recording engineer to create a spatial impression artificially. Different source signals and their reflections occupy different regions in the time-frequency plane. This is reflected by ICTD, ICLD and ICC, which vary as functions of time and frequency. In this case, the relation between the instantaneous ICTD, ICLD and ICC and the auditory event directions and spatial impression is not obvious. The strategy of the parameter generator 405 is to extract these cues blindly such that they approximate the corresponding cues of the original audio signal.
In an implementation form, the parametric audio encoder 400 uses a filter bank with sub-bands whose bandwidths equal twice the equivalent rectangular bandwidth. Informal listening revealed that the audio quality of BCC did not improve noticeably when a higher frequency resolution was chosen. A lower frequency resolution is preferable because it results in fewer ICTD, ICLD and ICC values to be transmitted to the decoder and thus in a lower bit rate. Regarding time resolution, ICTD, ICLD and ICC are considered at regular time intervals. In an implementation form, ICTD, ICLD and ICC are considered about every 4 to 16 ms. Note that the precedence effect is not directly considered unless the cues are considered at very short time intervals.
The often-achieved perceptually small difference between the reference signal and the synthesized signal implies that cues related to a wide range of auditory spatial image attributes are implicitly considered by synthesizing ICTD, ICLD and ICC at regular time intervals. The bit rate required for transmitting these spatial cues is only a few kb/s, so the parametric audio encoder 400 can transmit stereo and multi-channel audio signals at bit rates close to the bit rate required for a single audio channel. Fig. 1 and Fig. 2 illustrate methods in which the ITD is estimated as coding parameter 415. Fig. 1b and Fig. 3 illustrate methods in which the CLD is estimated as coding parameter 415.
The parametric audio encoder 400 comprises: the downmix signal generator 407 for superimposing at least two audio channel signals of the multi-channel audio signal 401 to obtain the downmix signal 411; the audio encoder 409, in particular a mono encoder, for encoding the downmix signal 411 to obtain the encoded audio signal 413; and the combiner 417 for combining the encoded audio signal 413 with the corresponding coding parameters 415.
The parametric audio encoder 400 generates the coding parameters 415 for one audio channel signal of the plurality of audio channel signals, denoted x_1, x_2, ..., x_M, of the multi-channel audio signal 401. Each of the audio channel signals x_1, x_2, ..., x_M may be a digital signal comprising digital audio channel signal values denoted x_1[n], x_2[n], ..., x_M[n].
An exemplary audio channel signal for which the parametric audio encoder 400 generates the coding parameters 415 is the first audio channel signal x_1 with signal values x_1[n]. The parameter generator 405 determines the coding parameter ITD from the audio channel signal values x_1[n] of the first audio signal x_1 and the reference audio signal values x_2[n] of the reference audio signal x_2.
The audio channel signal used as reference audio signal is, for example, the second audio channel signal x_2. Similarly, any other of the audio channel signals x_1, x_2, ..., x_M may serve as reference audio signal. According to a first aspect, the reference audio signal is another one of the audio channel signals, different from the audio channel signal x_1 for which the coding parameters 415 are generated.
According to a second aspect, the reference audio signal is a downmix audio signal derived from at least two audio channel signals of the multi-channel audio signal 401, for example from the first audio channel signal x_1 and the second audio channel signal x_2. In an implementation form, the reference audio signal is the downmix signal 411, also called sum signal, generated by the downmixing device 407. In an implementation form, the reference audio signal is the encoded signal 413 provided by the encoder 409.
An exemplary reference audio signal used by the parameter generator 405 is the second audio channel signal x_2 with signal values x_2[n].
The parameter generator 405 determines a frequency transform of the audio channel signal values x_1[n] of the audio channel signal x_1 and a frequency transform of the reference audio signal values x_2[n] of the reference audio signal x_2. The reference audio signal is another audio channel signal x_2 of the plurality of audio channel signals or a downmix audio signal derived from at least two audio channel signals x_1, x_2 of the plurality of audio channel signals.
The parameter generator 405 determines an inter-channel difference for at least each sub-band of a subset of the sub-bands. Each inter-channel difference indicates a time difference ITD[b], a phase difference IPD[b] or a level difference CLD[b] between the band-limited signal portion of the audio channel signal and the band-limited signal portion of the reference audio signal in the respective sub-band with which the inter-channel difference is associated.
The inter-channel phase difference (ICPD) is the average phase difference between a signal pair. The inter-channel level difference (ICLD) is the same as the interaural level difference (ILD), that is, the level difference between the left and right ear entrance signals, but it is defined more generally between any signal pair, for example a loudspeaker signal pair or an ear entrance signal pair. The inter-channel coherence, or inter-channel correlation, is the same as the interaural coherence (IC), that is, the degree of similarity between the left and right ear entrance signals, but it is defined more generally between any signal pair, for example a loudspeaker signal pair or an ear entrance signal pair. The inter-channel time difference (ICTD) is the same as the interaural time difference (ITD), sometimes also called interaural time delay, that is, the time difference between the left and right ear entrance signals, but it is defined more generally between any signal pair, for example a loudspeaker signal pair or an ear entrance signal pair. The sub-band inter-channel level differences, sub-band inter-channel phase differences, sub-band inter-channel coherences and sub-band inter-channel intensity differences relate to the parameters specified above with respect to the sub-band bandwidth.
The parameter generator 405 is configured to implement one of the methods described above with reference to Figs. 1a, 1b, 2 and 3.
In an implementation form, the parameter generator 405 comprises:
a first determiner for determining, for the audio channel signal (x_1), a set of functions (c[b]) from the audio channel signal values (x_1[n]) of the audio channel signal (x_1) and the reference audio signal values (x_2[n]) of a reference audio signal (x_2), wherein the reference audio signal is another audio channel signal (x_2) of the plurality of audio channel signals or a downmix audio signal derived from at least two audio channel signals (x_1, x_2) of the plurality of audio channel signals;
a second determiner for determining a first set of coding parameters (ITD[b], CLD[b]) based on a smoothing of the set of functions (c[b]) with respect to a frame sequence (i) of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient (SMW_1);
a third determiner for determining a second set of coding parameters (ITD_inst[b], CLD_inst[b]) based on a smoothing of the set of functions (c[b]) with respect to the frame sequence (i) of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient (SMW_2); and
a coding parameter determiner for determining the coding parameter (ITD, CLD) based on a quality criterion with respect to the first set of coding parameters (ITD[b], CLD[b]) and/or the second set of coding parameters (ITD_inst[b], CLD_inst[b]).
Figure 5 shows a block diagram of a parametric audio decoder 500 according to an implementation form. The parametric audio decoder 500 receives a bitstream 503 transmitted over a communication channel as input signal and provides a decoded multi-channel audio signal 501 as output signal. The parametric audio decoder 500 comprises: a bitstream decoder 517 coupled to the bitstream 503 for decoding the bitstream 503 into coding parameters 515 and an encoded signal 513; a decoder 509 coupled to the bitstream decoder 517 for generating a sum signal 511 from the encoded signal 513; a parameter resolver 505 coupled to the bitstream decoder 517 for resolving parameters 521 from the coding parameters 515; and a synthesizer 505 coupled to the parameter resolver 505 and the decoder 509 for synthesizing the decoded multi-channel audio signal 501 from the parameters 521 and the sum signal 511.
The parametric audio decoder 500 generates the output channels of its multi-channel audio signal 501 such that ICTD, ICLD and/or ICC between the channels approximate those of the original multi-channel audio signal. The described scheme can represent multi-channel audio signals at a bit rate only slightly higher than the bit rate required for a mono audio signal. This is because the ICTD, ICLD and ICC estimated between a channel pair contain about two orders of magnitude less information than an audio waveform. Not only the low bit rate but also the backward compatibility aspect is of interest: the transmitted sum signal corresponds to a mono downmix of the stereo or multi-channel signal.
Figure 6 shows a block diagram of a parametric stereo audio encoder 601 and decoder 603 according to an implementation form. The parametric stereo audio encoder 601 corresponds to the parametric audio encoder 400 described with reference to Fig. 4, but the multi-channel audio signal 401 is a stereo audio signal with a left audio channel 605 and a right audio channel 607.
The parametric stereo audio encoder 601 receives the stereo audio signal 605, 607 as input signal and provides a bitstream as output signal 609. The parametric stereo audio encoder 601 comprises: a parameter generator 611 coupled to the stereo audio signal 605, 607 for generating spatial parameters 613; a downmix signal generator 615 coupled to the stereo audio signal 605, 607 for generating a downmix signal 617, also called sum signal; a mono encoder 619 coupled to the downmix signal generator 615 for encoding the downmix signal 617 to provide an encoded audio signal 621; and a bitstream combiner 623 coupled to the parameter generator 611 and the mono encoder 619 for combining the coding parameters 613 and the encoded audio signal 621 into a bitstream to provide the output signal 609. In the parameter generator 611, the spatial parameters 613 are extracted and quantized before being multiplexed into the bitstream.
The parametric stereo audio decoder 603 receives the bitstream, that is, the output signal 609 of the parametric stereo audio encoder 601 transmitted over a communication channel, as input signal and provides a decoded stereo audio signal with a left channel 625 and a right channel 627 as output signal. The parametric stereo audio decoder 603 comprises: a bitstream decoder 629 coupled to the received bitstream 609 for decoding the bitstream 609 into coding parameters 631 and an encoded signal 633; a mono decoder 635 coupled to the bitstream decoder 629 for generating a sum signal 637 from the encoded signal 633; a spatial parameter resolver 639 coupled to the bitstream decoder 629 for resolving spatial parameters 641 from the coding parameters 631; and a synthesizer 643 coupled to the spatial parameter resolver 639 and the mono decoder 635 for synthesizing the decoded stereo audio signal 625, 627 from the spatial parameters 641 and the sum signal 637.
The processing in the parametric stereo audio decoder 603 is able to introduce delays and to modify the level of the audio signals adaptively in time and frequency in order to generate the spatial parameters 631, for example the inter-channel time difference (ICTD) and the inter-channel level difference (ICLD). Furthermore, the parametric stereo audio decoder 603 performs time-adaptive filtering efficiently for the synthesis of the inter-channel coherence (ICC). In an implementation form, the parametric stereo encoder uses a filter bank based on the short-time Fourier transform (STFT) to implement binaural cue coding (BCC) schemes efficiently with low computational complexity. The processing in the parametric stereo audio encoder 601 has low computational complexity and low delay, which makes parametric stereo audio coding suitable for affordable implementation on microprocessors or digital signal processors for real-time applications.
The parameter generator 611 depicted in Fig. 6 is functionally the same as the corresponding parameter generator 405 described with reference to Fig. 4, except that quantization and coding of the spatial cues have been added. The sum signal 617 is coded with a conventional mono audio encoder 619. In an implementation form, the parametric stereo audio encoder 601 uses an STFT-based time-frequency transform to transform the stereo audio channel signals 605, 607 into the frequency domain. The STFT applies a discrete Fourier transform (DFT) to windowed portions of the input signal x(n). A signal frame of N samples is multiplied by a window of length W before an N-point DFT is applied. Adjacent windows overlap and are shifted by W/2 samples. The window is chosen such that the overlapping windows add up to a constant value of 1. Therefore, no additional windowing is needed for the inverse transform. A plain inverse DFT of size N, with a time advance of W/2 samples between successive frames, is used in the decoder 603. If the spectrum is not modified, perfect reconstruction is achieved by overlap-add.
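The overlap-add property can be checked with a small sketch. The source does not name the window, so a periodic Hann window is assumed here because its copies shifted by W/2 add up to exactly 1. The DFT and inverse DFT are omitted, since with an unmodified spectrum they cancel each other; the function names are hypothetical.

```python
import math

def hann_periodic(w):
    """Periodic Hann window of length w; copies shifted by w/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / w) for n in range(w)]

def overlap_add(signal, w):
    """Window the signal with a hop of w/2 samples and add the frames back."""
    hop = w // 2
    window = hann_periodic(w)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - w + 1, hop):
        for n in range(w):
            # Analysis window only; no synthesis window is applied.
            out[start + n] += window[n] * signal[start + n]
    return out
```

All samples covered by two overlapping frames are reconstructed exactly (up to rounding); only the first and last W/2 samples, which are covered by a single frame, remain attenuated.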
Since the uniform spectral resolution of the STFT is not well adapted to human perception, the uniformly spaced spectral coefficients output by the STFT are grouped into B non-overlapping partitions with bandwidths better adapted to perception. In line with the description given with reference to Fig. 4, one partition conceptually corresponds to one "sub-band". In an alternative implementation form, the parametric stereo audio encoder 601 uses a non-uniform filter bank to transform the stereo audio channel signals 605, 607 into the frequency domain.
In an implementation form, the downmixer 615 determines the spectral coefficients of one partition b or sub-band b of the equalized sum signal S_m(k) 617 by
$$S_m(k) = e_b(k) \sum_{c=1}^{C} X_{c,m}(k),$$
where X_{c,m}(k) are the spectra of the input audio channels 605, 607 and e_b(k) is a gain factor computed as
$$e_b(k) = \sqrt{\frac{\sum_{c=1}^{C} p_{\tilde{x}_{c,b}}(k)}{p_{\tilde{x}_b}(k)}},$$
with the partition power estimates
$$p_{\tilde{x}_{c,b}}(k) = \sum_{m=A_{b-1}}^{A_b - 1} \left| X_{c,m}(k) \right|^2$$
$$p_{\tilde{x}_b}(k) = \sum_{m=A_{b-1}}^{A_b - 1} \left| \sum_{c=1}^{C} X_{c,m}(k) \right|^2.$$
To prevent artifacts caused by large gain factors when the sum of the sub-band signals is significantly attenuated, the gain factors e_b(k) are limited to 6 dB, that is, e_b(k) ≤ 2.
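The equalized downmix of one frame can be sketched as follows. The sketch assumes that the gain factor is the square root of the ratio between the summed channel powers and the power of the plain sum, which is the reading under which the limit e_b(k) ≤ 2 corresponds to 6 dB. The function name, the `band_edges` layout and the handling of an all-zero sum are hypothetical.

```python
import math

def equalized_downmix(channels, band_edges, max_gain=2.0):
    """Sum the channel spectra and rescale each partition to preserve power.

    channels: list of C spectra (lists of complex bins) for one frame k.
    band_edges: partition boundaries A_0 .. A_B; partition b covers the bins
    band_edges[b - 1] .. band_edges[b] - 1.
    """
    n_bins = len(channels[0])
    summed = [sum(ch[m] for ch in channels) for m in range(n_bins)]
    out = list(summed)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        p_channels = sum(abs(ch[m]) ** 2 for ch in channels for m in range(lo, hi))
        p_sum = sum(abs(summed[m]) ** 2 for m in range(lo, hi))
        if p_sum > 0:
            # Gain limited to 6 dB (factor 2) to avoid artifacts on cancellation.
            gain = min(max_gain, math.sqrt(p_channels / p_sum))
        else:
            gain = max_gain  # the sum is zero anyway, so the gain has no effect
        for m in range(lo, hi):
            out[m] = gain * summed[m]
    return out
```

When the limit is not reached, the power of each partition of the output equals the sum of the channel powers in that partition.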
In an implementation form of the parametric stereo audio encoder 601 and decoder 603, the type of ITD information (full band) is signaled to the remote decoder 603. In an implementation form, the type is signaled implicitly by means of auxiliary data transmitted in at least one bitstream. In an alternative implementation form, the signaling is explicit, by means of a flag indicating the type of the respective bitstream. In an implementation form, it is possible to switch between a first signaling option comprising implicit signaling and a second signaling option comprising explicit signaling. In an implementation form of implicit signaling, a flag indicates the presence of secondary channel information in the auxiliary data of at least one backward-compatible bitstream. A legacy decoder does not check whether the flag is present and decodes only the backward-compatible bitstream. For example, the signaling of a secondary channel bitstream may be included in the auxiliary data of an AAC bitstream. Moreover, the secondary bitstream itself may also be included in the auxiliary data of the AAC bitstream. In that case, a legacy AAC decoder decodes only the backward-compatible part of the bitstream and discards the auxiliary data. In an implementation form of the parametric stereo audio encoder 601 and decoder 603, the presence of such a flag is checked, and if the flag is present in the received bitstream, the decoder 603 reconstructs the multi-channel audio signal based on the additional full-band ITD information.
In an implementation form of explicit signaling, a flag indicates that the bitstream is a new bitstream obtained with a new, non-legacy encoder. A legacy decoder cannot decode such a bitstream because it does not know how to interpret the flag. The decoder 603 according to an implementation form, however, is able to decode either only the backward-compatible part or the complete multi-channel audio signal, and it can decide which of the two to decode.
The benefit of such backward compatibility can be seen as follows. A mobile terminal comprising a decoder 603 according to an implementation form can decide to decode only the backward-compatible part in order to save the battery life of an integrated battery, since the complexity load is lower. Moreover, the decoder 603 can decide which part of the bitstream to decode depending on the rendering system. For example, for rendering over headphones the backward-compatible part of the received signal may be sufficient, whereas the multi-channel audio signal is decoded only when the terminal is connected, for example, to a docking station with multi-channel rendering capability.
In an implementation form, the method described with reference to one of Figs. 1a, 1b, 2 and 3 is applied in an encoder of the stereo extension of ITU-T G.722, G.722 Annex B, G.711.1 and/or G.711.1 Annex D. Moreover, in an implementation form, the method described with reference to one of Figs. 1a, 1b, 2 and 3 is applied in speech and audio encoders for mobile applications as defined in the 3GPP EVS (Enhanced Voice Services) codec.
In an implementation form, the method described with reference to one of Figs. 1a, 1b, 2 and 3 is used for auditory scene analysis. In this case, one of the ITD estimation or CLD estimation embodiments is used, alone or in combination, to evaluate the characteristics of the spatial image and to detect the position of sound sources in the audio scene.
Figure 7 shows that the schematic diagram of the ITD selection algorithm according to a kind of form of implementation.
In first step 701, relative to the number N b of negative ITD value negcheck the number N b of positive ITD value pos.If Nb posbe greater than Nb neg, then step 703 is performed; If Nb posbe not more than Nb neg, then step 705 is performed.
In step 703, relative to the standard deviation ITD of negative ITD std_negcheck the standard deviation ITD of positive ITD std_pos, and relative to the number N b of negative ITD value negbe multiplied with the first factor A and check the number N b of positive ITD value pos, such as basis:
(ITD std_pos<ITD std_neg)||(Nb pos>=A*Nb neg)。If ITD std_pos< ITD std_negor Nb pos> A*Nb neg, in step 707, so select ITD as the mean value of positive ITD.Otherwise, the relation between positive ITD and negative ITD will be checked further in step 709.
In step 709, relative to the standard deviation ITD of positive ITD std_posbe multiplied with the second factor B and check the standard deviation ITD of negative ITD std_neg, such as basis: (ITD std_neg< B*ITD std_pos).If ITD std_neg< B*ITD std_pos, will the inverse value of negative ITD mean value be selected as output ITD so in a step 715.Otherwise, the ITD from previous frame (Pre_itd) will be checked in step 717.
In step 717, the ITD from previous frame is checked, judge whether it is greater than zero, such as, according to " rPre_itd > 0 ".If Pre_itd > 0, so select to export the mean value of ITD as positive ITD in step 723, otherwise, in step 725, export the inverse value that ITD is negative ITD mean value.
In step 705, the standard deviation of the negative ITDs, ITD_std_neg, is checked against the standard deviation of the positive ITDs, ITD_std_pos, and the number of negative ITD values, Nb_neg, is checked against the number of positive ITD values, Nb_pos, multiplied by the first factor A, for example according to: (ITD_std_neg < ITD_std_pos) || (Nb_neg >= A*Nb_pos). If ITD_std_neg < ITD_std_pos or Nb_neg >= A*Nb_pos, the ITD is selected as the mean of the negative ITDs in step 711. Otherwise, the relation between the negative ITDs and the positive ITDs is checked further in step 713.
In step 713, the standard deviation of the positive ITDs, ITD_std_pos, is checked against the standard deviation of the negative ITDs, ITD_std_neg, multiplied by the second factor B, for example according to: (ITD_std_pos < B*ITD_std_neg). If ITD_std_pos < B*ITD_std_neg, the opposite value of the positive ITD mean is selected as the output ITD in step 719. Otherwise, the ITD from the previous frame (Pre_itd) is checked in step 721.
In step 721, the ITD from the previous frame is checked for being greater than zero, for example according to "Pre_itd > 0". If Pre_itd > 0, the output ITD is selected as the mean of the negative ITDs in step 727; otherwise, the output ITD is the opposite value of the positive ITD mean in step 729.
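The decision steps 703 to 729 may be sketched, by way of example only, as the following Python function. The sketch rests on several assumptions that are not taken from the description: the function and parameter names are hypothetical, the opposite (inverse) value of a mean is read as its arithmetic negation, the comparison of the ITD counts uses ">=" as in the example expressions, and the choice between entering at step 703 or at step 705 is made by an earlier step and is therefore passed in as the flag start_with_positive. No particular values of the factors A and B are implied.

```python
def select_itd(mean_pos, std_pos, nb_pos,
               mean_neg, std_neg, nb_neg,
               pre_itd, factor_a, factor_b, start_with_positive):
    """Illustrative sketch of the ITD decision of steps 703 to 729.

    mean_*/std_*/nb_* are the mean, standard deviation and count of the
    positive and negative ITD values; pre_itd is the previous frame's ITD.
    """
    if start_with_positive:
        # Step 703: positive ITDs are more consistent or clearly more numerous.
        if std_pos < std_neg or nb_pos >= factor_a * nb_neg:
            return mean_pos                      # step 707
        # Step 709: negative ITDs are much more consistent than positive ones.
        if std_neg < factor_b * std_pos:
            return -mean_neg                     # step 715 (assumed negation)
        # Step 717: fall back on the sign of the previous frame's ITD.
        return mean_pos if pre_itd > 0 else -mean_neg    # steps 723 / 725

    # Step 705: negative ITDs are more consistent or clearly more numerous.
    if std_neg < std_pos or nb_neg >= factor_a * nb_pos:
        return mean_neg                          # step 711
    # Step 713: positive ITDs are much more consistent than negative ones.
    if std_pos < factor_b * std_neg:
        return -mean_pos                         # step 719 (assumed negation)
    # Step 721: fall back on the sign of the previous frame's ITD.
    return mean_neg if pre_itd > 0 else -mean_pos        # steps 727 / 729
```

Passing A and B as arguments keeps the sketch independent of any particular tuning of the two factors.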
For the positive ITDs and for the negative ITDs respectively, a selection is made between the ITD based on the strongly smoothed version of the cross spectrum (ITD_mean) and the ITD based on the weakly smoothed version of the cross spectrum (ITD_mean_inst). Finally, the decision on the ITD is made as described with reference to Fig. 7.
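The selection between a strongly smoothed and a weakly smoothed version may be sketched, by way of example only, as follows. The sketch is not the claimed computation: it assumes a plain first-order recursion s = w*s_prev + (1 - w)*c for both smoothings, takes the stability parameter as the standard deviation of the weakly smoothed parameter over the last history_len frames, and assumes that the threshold is "crossed" when this standard deviation falls below it. The class name, the history_len argument and the estimate callable (which maps a smoothed set of functions to one parameter value) are hypothetical.

```python
from statistics import pstdev


class DualSmoothedEstimator:
    """Illustrative sketch: one strongly and one weakly smoothed estimate."""

    def __init__(self, strong_coef, weak_coef, threshold, history_len, estimate):
        if not strong_coef > weak_coef:
            raise ValueError("first smoothing coefficient must exceed the second")
        self.strong_coef = strong_coef
        self.weak_coef = weak_coef
        self.threshold = threshold
        self.history_len = history_len
        self.estimate = estimate        # hypothetical: smoothed functions -> parameter
        self.strong_mem = None          # memory state of the first smoothed version
        self.weak_mem = None            # memory state of the second smoothed version
        self.weak_history = []          # recent weakly smoothed parameter values

    @staticmethod
    def _smooth(memory, functions, coef):
        # Assumed first-order recursion; the first frame initialises the memory.
        if memory is None:
            return list(functions)
        return [coef * m + (1.0 - coef) * f for m, f in zip(memory, functions)]

    def process_frame(self, functions):
        self.strong_mem = self._smooth(self.strong_mem, functions, self.strong_coef)
        self.weak_mem = self._smooth(self.weak_mem, functions, self.weak_coef)
        weak_param = self.estimate(self.weak_mem)
        self.weak_history = (self.weak_history + [weak_param])[-self.history_len:]
        stable = (len(self.weak_history) == self.history_len
                  and pstdev(self.weak_history) < self.threshold)
        if stable:
            # Assumed reading: the weak estimate is trusted and the strong
            # memory state is replaced by the weak one.
            self.strong_mem = list(self.weak_mem)
            return weak_param
        return self.estimate(self.strong_mem)
```

With real-valued toy inputs, estimate could simply be the mean of the smoothed values; an actual encoder would instead derive an ITD from the smoothed cross spectrum.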
From the foregoing, it will be apparent to those skilled in the art that a variety of methods, systems, computer programs on recording media, and the like are provided.
The present invention also supports a computer program product comprising computer-executable code or computer-executable instructions that, when executed, cause at least one computer to execute the performing and computing steps described herein.
The present invention also supports a system configured to execute the performing and computing steps described herein.
Many alternatives, modifications and variations will be apparent to those skilled in the art in light of the above teachings. Those skilled in the art will also readily recognize that there are numerous applications of the invention beyond those described herein. Although the present invention has been described with reference to one or more particular embodiments, those skilled in the art will recognize that many changes may be made to it without departing from the spirit and scope of the present invention. It is therefore to be understood that, within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.

Claims (14)

1. A method for determining an encoding parameter for an audio channel signal x1 of a plurality of audio channel signals of a multi-channel audio signal, each audio channel signal having audio channel signal values, the method comprising:
determining, for the audio channel signal x1, a set of functions from the audio channel signal values of the audio channel signal x1 and reference audio signal values of a reference audio signal, wherein the reference audio signal is another audio channel signal of the plurality of audio channel signals or a down-mix audio signal derived from at least two audio channel signals of the plurality of audio channel signals;
determining a first set of encoding parameters based on a smoothing of the set of functions with respect to a frame sequence of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient;
determining a second set of encoding parameters based on a smoothing of the set of functions with respect to the frame sequence of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient; and
determining the encoding parameter for the audio channel signal x1 based on a quality criterion with respect to the first set of encoding parameters and/or the second set of encoding parameters.
2. The method according to claim 1, wherein the determining of the set of functions comprises:
determining a frequency transform of the audio channel signal values of the audio channel signal x1;
determining a frequency transform of the reference audio signal values of the reference audio signal; and
determining the set of functions as a cross spectrum or a cross-correlation for each sub-band of a subset of sub-bands, each function of the set of functions being computed between the band-limited signal portion of the audio channel signal and the band-limited signal portion of the reference audio signal that lie in the sub-band associated with that function.
3. The method according to claim 2, wherein a sub-band comprises one or more frequency bins.
4. The method according to any one of the preceding claims, wherein the first set of encoding parameters and the second set of encoding parameters comprise inter-channel differences, the inter-channel differences comprising inter-channel time differences and/or inter-channel level differences.
5. The method according to any one of claims 1 to 3, wherein the determining of the encoding parameter for the audio channel signal x1 based on the quality criterion comprises determining a stability parameter, the stability parameter being used for the quality criterion.
6. The method according to claim 5, wherein the determining of the encoding parameter for the audio channel signal x1 comprises:
determining the stability parameter of the second set of encoding parameters based on a comparison between consecutive values of the second set of encoding parameters with respect to the frame sequence; and
determining the encoding parameter depending on the stability parameter.
7. The method according to claim 5, wherein the stability parameter is based at least on a standard deviation of the second set of encoding parameters.
8. The method according to claim 6 or claim 7, wherein the stability parameter is determined over one frame or over multiple frames of the multi-channel audio signal.
9. The method according to claim 6 or claim 7, wherein the determining of the encoding parameter is based on a threshold crossing of the stability parameter.
10. The method according to claim 9, further comprising:
updating the first set of encoding parameters with the second set of encoding parameters if the stability parameter crosses the threshold.
11. The method according to claim 9, wherein the smoothing of the set of functions based on the first smoothing coefficient and the second smoothing coefficient is computed as an addition of the set of functions multiplied by a first coefficient and a memory state of the first and the second smoothed version of the set of functions multiplied by a second coefficient, wherein the first coefficient is based on the first smoothing coefficient and the second smoothing coefficient, and the second coefficient is based on the first smoothing coefficient and the second smoothing coefficient.
12. The method according to claim 11, further comprising:
updating the memory state of the first smoothed version of the set of functions with the memory state of the second smoothed version of the set of functions if the stability parameter crosses the threshold.
13. The method according to any one of claims 1 to 3, wherein the first smoothing coefficient is higher than the second smoothing coefficient.
14. A multi-channel audio encoder for determining an encoding parameter for an audio channel signal x1 of a plurality of audio channel signals of a multi-channel audio signal, each audio channel signal having audio channel signal values, the multi-channel audio encoder comprising:
a first determiner configured to determine, for the audio channel signal x1, a set of functions from the audio channel signal values of the audio channel signal x1 and reference audio signal values of a reference audio signal, wherein the reference audio signal is another audio channel signal of the plurality of audio channel signals or a down-mix audio signal derived from at least two audio channel signals of the plurality of audio channel signals;
a second determiner configured to determine a first set of encoding parameters based on a smoothing of the set of functions with respect to a frame sequence of the multi-channel audio signal, the smoothing being based on a first smoothing coefficient;
a third determiner configured to determine a second set of encoding parameters based on a smoothing of the set of functions with respect to the frame sequence of the multi-channel audio signal, the smoothing being based on a second smoothing coefficient; and
an encoding parameter determiner configured to determine the encoding parameter for the audio channel signal x1 based on a quality criterion with respect to the first set of encoding parameters and/or the second set of encoding parameters.
CN201280003252.9A 2012-04-05 2012-04-05 Method for determining encoding parameter for multi-channel audio signal and multi-channel audio encoder Active CN103460283B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2012/056340 WO2013149672A1 (en) 2012-04-05 2012-04-05 Method for determining an encoding parameter for a multi-channel audio signal and multi-channel audio encoder

Publications (2)

Publication Number Publication Date
CN103460283A (en) 2013-12-18
CN103460283B (en) 2015-04-29

Family

ID=45952541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201280003252.9A Active CN103460283B (en) 2012-04-05 2012-04-05 Method for determining encoding parameter for multi-channel audio signal and multi-channel audio encoder

Country Status (7)

Country Link
US (1) US9449604B2 (en)
EP (1) EP2834814B1 (en)
JP (1) JP5947971B2 (en)
KR (1) KR101621287B1 (en)
CN (1) CN103460283B (en)
ES (1) ES2571742T3 (en)
WO (1) WO2013149672A1 (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6216553B2 (en) * 2013-06-27 2017-10-18 クラリオン株式会社 Propagation delay correction apparatus and propagation delay correction method
KR102486338B1 (en) * 2014-10-31 2023-01-10 돌비 인터네셔널 에이비 Parametric encoding and decoding of multichannel audio signals
CN107004419B (en) * 2014-11-28 2021-02-02 索尼公司 Transmission device, transmission method, reception device, and reception method
CN106033672B (en) 2015-03-09 2021-04-09 华为技术有限公司 Method and apparatus for determining inter-channel time difference parameters
CN106033671B (en) 2015-03-09 2020-11-06 华为技术有限公司 Method and apparatus for determining inter-channel time difference parameters
EP3353779B1 (en) * 2015-09-25 2020-06-24 VoiceAge Corporation Method and system for encoding a stereo sound signal using coding parameters of a primary channel to encode a secondary channel
US10045145B2 (en) * 2015-12-18 2018-08-07 Qualcomm Incorporated Temporal offset estimation
ES2768052T3 (en) 2016-01-22 2020-06-19 Fraunhofer Ges Forschung Apparatus and procedures for encoding or decoding a multichannel audio signal using frame control timing
AU2017229323B2 (en) 2016-03-09 2020-01-16 Telefonaktiebolaget Lm Ericsson (Publ) A method and apparatus for increasing stability of an inter-channel time difference parameter
US10304468B2 (en) * 2017-03-20 2019-05-28 Qualcomm Incorporated Target sample generation
CN108877815B (en) * 2017-05-16 2021-02-23 华为技术有限公司 Stereo signal processing method and device
CN109215668B (en) 2017-06-30 2021-01-05 华为技术有限公司 Method and device for encoding inter-channel phase difference parameters
CN109300480B (en) * 2017-07-25 2020-10-16 华为技术有限公司 Coding and decoding method and coding and decoding device for stereo signal
CN117292695A (en) * 2017-08-10 2023-12-26 华为技术有限公司 Coding method of time domain stereo parameter and related product
US10891960B2 (en) * 2017-09-11 2021-01-12 Qualcomm Incorporated Temporal offset estimation
EP3483884A1 (en) 2017-11-10 2019-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Signal filtering
WO2019091576A1 (en) 2017-11-10 2019-05-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoders, audio decoders, methods and computer programs adapting an encoding and decoding of least significant bits
EP3483878A1 (en) 2017-11-10 2019-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio decoder supporting a set of different loss concealment tools
EP3483882A1 (en) 2017-11-10 2019-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Controlling bandwidth in encoders and/or decoders
EP3483880A1 (en) 2017-11-10 2019-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Temporal noise shaping
EP3483886A1 (en) * 2017-11-10 2019-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Selecting pitch lag
EP3483883A1 (en) 2017-11-10 2019-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio coding and decoding with selective postfiltering
WO2019091573A1 (en) 2017-11-10 2019-05-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding and decoding an audio signal using downsampling or interpolation of scale parameters
EP3483879A1 (en) 2017-11-10 2019-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Analysis/synthesis windowing function for modulated lapped transformation
CN111341319B (en) * 2018-12-19 2023-05-16 中国科学院声学研究所 Audio scene identification method and system based on local texture features
CN113129910A (en) * 2019-12-31 2021-07-16 华为技术有限公司 Coding and decoding method and coding and decoding device for audio signal
CN111935624B (en) * 2020-09-27 2021-04-06 广州汽车集团股份有限公司 Objective evaluation method, system, equipment and storage medium for in-vehicle sound space sense
WO2022153632A1 (en) * 2021-01-18 2022-07-21 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Signal processing device and signal processing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1954642A (en) * 2004-06-30 2007-04-25 德商弗朗霍夫应用研究促进学会 Multi-channel synthesizer and method for generating a multi-channel output signal
CN101410889A (en) * 2005-08-02 2009-04-15 杜比实验室特许公司 Controlling spatial audio coding parameters as a function of auditory events

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9626973B2 (en) * 2005-02-23 2017-04-18 Telefonaktiebolaget L M Ericsson (Publ) Adaptive bit allocation for multi-channel audio encoding
US7983922B2 (en) * 2005-04-15 2011-07-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating multi-channel synthesizer control signal and apparatus and method for multi-channel synthesizing
GB2466672B (en) 2009-01-06 2013-03-13 Skype Speech coding
CA2746524C (en) 2009-04-08 2015-03-03 Matthias Neusinger Apparatus, method and computer program for upmixing a downmix audio signal using a phase value smoothing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Estimation of auditory spatial cues for Binaural Cue Coding; Frank Baumgarte et al.; IEEE Xplore; 20020517; II-1801~II-1804 *

Also Published As

Publication number Publication date
KR20140140101A (en) 2014-12-08
EP2834814B1 (en) 2016-03-02
JP2015518176A (en) 2015-06-25
US9449604B2 (en) 2016-09-20
JP5947971B2 (en) 2016-07-06
US20150010155A1 (en) 2015-01-08
ES2571742T3 (en) 2016-05-26
WO2013149672A1 (en) 2013-10-10
EP2834814A1 (en) 2015-02-11
KR101621287B1 (en) 2016-05-16
CN103460283A (en) 2013-12-18

Similar Documents

Publication Publication Date Title
CN103460283B (en) Method for determining encoding parameter for multi-channel audio signal and multi-channel audio encoder
US11887609B2 (en) Apparatus and method for estimating an inter-channel time difference
KR101662681B1 (en) Multi-channel audio encoder and method for encoding a multi-channel audio signal
US9401151B2 (en) Parametric encoder for encoding a multi-channel audio signal
KR100773562B1 (en) Method and apparatus for generating stereo signal
EP3035330B1 (en) Determining the inter-channel time difference of a multi-channel audio signal
US9275646B2 (en) Method for inter-channel difference estimation and spatial audio coding device
JP2017058696A (en) Inter-channel difference estimation method and space audio encoder
CN104205211B (en) Multichannel audio encoder and the method being used for multi-channel audio signal is encoded

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant