WO2014072513A1 - Non-linear inverse coding of multichannel signals - Google Patents
Non-linear inverse coding of multichannel signals
- Publication number
- WO2014072513A1 (PCT/EP2013/073526)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- channel
- gain
- coding device
- signal
- coding
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S5/00—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S5/00—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
- H04S5/005—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation of the pseudo five- or more-channel type, e.g. virtual surround
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/01—Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/03—Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Definitions
- upmixing Obtaining higher order signals (with a higher number of output channels) from lower order signals (with fewer channels) is an important part of audio technology. This is referred to as "upmixing".
- the present invention provides the audio coding advanced options, spatial
- Coding process - do not need to be constantly added to the data stream.
- the system works independently of the choice of a suitable codec for the compression of
- Base Audio Coder Such codecs describe, for example, valid or in-progress standards that have become known as MP3, AAC, HE-AAC or USAC.
- inverse coding is understood to mean a technical procedure that involves one or more methods or one or more
- Audio signals are generated by the specific application of functionally interdependent gains and delays.
- EP1850629 or WO2009138205 or WO2011009649 or WO2011009650 or WO2012016992 or WO2012032178 generates spatial audio signals whose individual channels do not differ in level (modulation). Such uniform modulation is necessary to achieve a uniform image of the phantom sound sources. This applies, as shown for example in FIG. 6F, FIG. 7F and FIG. 8F of WO2012032178 for a 5.1 surround signal, also to the inverse coding of
- Multi-channel signals For example, from ITU-R BS.775-1 are so-called
- Matrix Surround Downmixing involves the use of 90° filters known in the art.
- Such downmixing techniques may be adaptive by adjusting the levels of specific channels over time
- Loudspeaker arrangements are known from the literature which, compared with commercially available surround arrangements such as 5.1 or 7.1, where the speakers lie in one plane, also provide speakers outside this plane. These are partly own
- Speaker signals which is usually a
- WO2011009649 describes a system in which, within a device or a method for linear inverse coding, two panoramic potentiometers are connected downstream of an MS matrix, wherein each panoramic potentiometer has two
- Busbar signals forms. Such an arrangement allows any increase or decrease in the degree of correlation and leads to an increase or
- the first output of the MS matrix if the first Panoramic potentiometer is effective, in a predetermined ratio the two channels of the first
- Pan potentiometer is effective, fed in a predetermined ratio to the two channels of the second busbar signal.
- Audio signals or the levels used in the downmix may be wholly or partially derived, or may be determined in whole or in part independently of these.
- the inverse coding already take place on the basis of their differently controlled output channels. In both cases we speak, if such a technical step
- the non-linear inverse coding therefore has no uniform energy density with slightly changed
- Phantom sound source formation contradicts the ostensible postulate of the most homogeneous stereo base between adjacent speakers for the production of phantom sound sources.
- the present invention thus utilizes this principle in a targeted manner.
- punctiform sound sources compared to the perception of phantom sound sources between the speakers.
- the nonlinear inverse coding thus ensures that a correct distribution or weighting of these punctiform sound sources as well as the formed phantom sound sources between the
- Loudspeakers takes place.
- the perception of the depth graduation of phantom sound sources can be obtained
- Phantom sound source based signals substantially depends on the loudness of a loudspeaker signal as well as the perceived spatiality.
- perceived spatiality can be directly controlled by an inverse coding, without the need for additional technical means such as artificial reverberation.
- the levels of the output signals of an inverse coding can vary in a time-dependent manner, for example in the case of an adaptive downmix method, or else remain constant over time, this
- Busbar signals are formed. Rather, these amplification factors only affect the channel to which they are applied. The technical effect is thus not the arbitrary increase or decrease of the degree of correlation of two equally weighted channels. Moreover, in non-linear inverse coding, if a gain factor of the final level correction of at least one output signal converges to 0, the audio information of this signal is, unlike in WO2011009649, inevitably lost. The aim is thus no longer the lossless increase or decrease of the image width on the stereo base between two loudspeakers, but rather the, in its simplicity convenient, purposeful uniform weighting of perceived point sound sources
- Busbar signals forms to consider as part of a linear inverse coding on the
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that either: a gain of one of the two output signals
- An embodiment shows a device / a
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that the non-linear inverse coding is performed on the basis of signals of a downmix.
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that the downmix is formed on the basis of one or more gains, which are the factor 0.5 or the factor
- An exemplary embodiment shows a device / a method for the non-linear inverse coding of an audio signal, characterized in that the downmix is formed in addition to means for forming sum signals by means of further technical means.
- One embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means for directly reproducing the downmix on loudspeakers are used.
- An exemplary embodiment shows a device / method for the non-linear inverse coding of an audio signal, characterized in that means for obtaining further signals from previously existing or formed signals are used.
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means are used for summing signals.
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means for subtracting signals are used.
- An embodiment shows an apparatus / method for nonlinear inverse coding of an audio signal, characterized in that means for the correlation comparison of signals are used.
- An exemplary embodiment shows a device / method for the non-linear inverse coding of an audio signal, characterized in that means for normalizing signals are used based on the levels of previously existing or formed signals.
- One embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means are used for summing signals respectively with non-adjacent loudspeaker channels.
- An embodiment shows an apparatus / method for non-linear inverse encoding of an audio signal, characterized in that means are used to form a fictitious loudspeaker.
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means for coding the downmix by means of a base audio coder are used.
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means are used to form signals for a loudspeaker arrangement of the form Hamasaki 22.2 or for a subset of such a loudspeaker arrangement.
- An exemplary embodiment shows a device / method for the non-linear inverse coding of an audio signal, characterized in that means for determining the position of phantom sound sources are used.
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means for a signal analysis or means for the determination of algebraic invariants are used.
- One embodiment shows an apparatus / method for nonlinear inverse coding of an audio signal, characterized in that means for a Karhunen-Loeve transformation (KLT) or Principal Component Analysis (PCA) are used.
- An exemplary embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means for optimizing the determination of algebraic invariants are used by means of a Karhunen-Loeve transformation (KLT) or Principal Component Analysis (PCA).
- One embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that either: a gain of the non-linear inverse coding has the same factor as a gain used in the downmix or a multiple of this gain; or:
- At least one of the two gains (60001, 60002) of the non-linear inverse coding has the same factor as a gain used in the downmix
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that the
- non-linear inverse coding means for optimization using the corresponding linear inverse coding.
- An embodiment shows an apparatus / method for nonlinear inverse coding of an audio signal, characterized in that means for the immediate optimization of one or more
- An embodiment shows an apparatus / method for nonlinear inverse encoding of an audio signal, characterized in that means for optimizing one or more parameters of the nonlinear or associated linear inverse
- An embodiment shows an apparatus / method for nonlinear inverse coding of an audio signal, characterized in that means for optimizing one or more parameters of the nonlinear or associated linear inverse coding are used on the basis of a target correlation k.
- An embodiment shows an apparatus / method for non-linear inverse encoding of an audio signal, characterized in that means are used to determine the nature of the signal.
- An embodiment shows an apparatus / method for nonlinear inverse coding of an audio signal, characterized in that means are used for the determination of speech or vocal signals or transients.
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means for determining the target correlation k based on
- One embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means are used which, for the non-linear inverse coding, either: specify a target correlation k > +0.51 for voice or vocal recordings; or:
- One embodiment shows an apparatus / method for nonlinear inverse coding of an audio signal, characterized in that means are used to provide for nonlinear linear inverse coding either:
- An embodiment shows a device / a
- Method for the non-linear inverse coding of an audio signal characterized in that for a non-linear or associated linear inverse coding means are used for their optimization, which in turn use a signal section smaller than or equal to 40 ms.
- An embodiment shows an apparatus / method for the non-linear inverse coding of an audio signal, characterized in that, for a non-linear or associated linear inverse coding, means are used for their optimization which in turn use means for weighting the fictional
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means for optimizing one or more parameters of a nonlinear or associated linear inverse
- An embodiment shows an apparatus / method for nonlinear inverse coding of an audio signal, characterized in that means for level correction of signals based on the respective speaker positions are used.
- An embodiment shows a device / method for non-linear inverse coding of an audio signal, characterized in that a
- Panoramic potentiometer is used.
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means for varying the gain (717) with the factor ⁇ are used.
- An embodiment shows an apparatus / method for nonlinear inverse coding of an audio signal, characterized in that
- An embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that means for storing or transmitting one or more parameters of a non-linear or associated
- An exemplary embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that it has fewer output channels than a multi-channel signal.
- An exemplary embodiment shows a device / method for the non-linear inverse coding of an audio signal, characterized in that it has more output channels than an audio signal
- An embodiment shows an apparatus / method for nonlinear inverse coding of an audio signal, characterized in that the
- Speaker arrangement takes place, which corresponds to the format of the respective signal.
- One embodiment shows an apparatus / method for non-linear inverse coding of an audio signal, characterized in that either: means for wave field synthesis are used; or: Means may be used for Head Related Transfer Functions (HRTFs) or Binaural Room Impulse Responses (BRIRs).
- FIG. 1 shows the loudspeaker arrangement of the Hamasaki 22.2 format of the Japanese broadcaster NHK.
- FIG. 2 shows the example of a downmix matrix for the Hamasaki 22.2 format.
- FIG. 3 shows a loudspeaker arrangement for a
- FIG. 4 shows the example of a downmix matrix for a 12.1 signal. This in turn makes one
- FIG. 5 shows the example of a circuit for the non-linear inverse coding of an audio signal.
- FIG. 6 shows another example of a circuit for the non-linear inverse coding of an audio signal, where l 2 .
- FIG. 7 illustrates a matrix for extraction of
- FIG. 8 shows a further example (shown in FIG. 7) of the extraction of a signal by means of correlation comparison.
- FIG. 9 shows a normalization of signals (shown in FIG. 8) based on known levels of the original multi-channel signal.
- FIG. 10 shows a (following in FIG. 9)
- FIG. 11 shows the matrix of two non-linear inverse codings (following FIG. 10).
- FIG. 12 shows the following (shown in FIG. 11)
- FIG. 13 shows the attenuation characteristic of a prior-art pan potentiometer. This attenuation curve can also be used in multichannel coding as the basis for the calculation of level corrections.
- FIG. 14 shows the second example of a matrix for extracting signals by means of
- FIG. 15 shows a normalization of signals obtained (in FIG. 14) from known levels of sum signals.
- FIG. 16 shows a (following in FIG. 15)
- FIG. 17 shows the matrix of two non-linear inverse codings (following FIG. 16).
- FIG. 18 shows the following (shown in FIG. 17)
- FIG. 19 shows the block diagram of a circuit for optimizing linear or non-linear inverse coding.
- FIG. 20 shows by way of example the header information as well as the downmix for - based on a
- FIG. 21 shows the downmix matrix for the downmix of 3/2 source material according to ITU-R BS.775-1, Table 2.
- a downmix matrix is defined, which may contain various technical means (such as those described by Faller and Schillebeeckx, supra) and which can be determined or optimized in functional dependence on a signal analysis of the respective multi-channel signal, for example by means of the prior-art Karhunen-Loeve transformation (KLT) or Principal Component Analysis (PCA), or by algebraic invariants according to EP1850629, WO2009138205, WO2011009649, WO2011009650, WO2012016992 and WO2012032178 (we speak in the following of an "adaptive downmix"), or a priori
- FIG. 2 shows the example of a downmix for Hamasaki 22.2, which consists of a total of four stereo signals with the following loudspeaker arrangement (see FIG. 1): FL'-FR', BL'-BR', TpFL'-TpFR', TpBL'-TpBR',
- the illustrated matrix is similar to the prior art matrix of FIG. 21, although the rows are to be read as columns and vice versa the columns as rows.
- TpC is mixed at a level reduced by 6 dB (corresponding to a multiplication of the signal level by a factor of 0.5) into each of TpFL', TpFR', TpBL' and TpBR'
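The level figures used here can be checked numerically. The following Python sketch is illustrative only: the helper names `db_to_gain` and `mix_fictional_tpc` and the sample values are assumptions and do not appear in the application. It converts a level change in dB to a linear factor and mixes TpC with the factor 0.5 into the four top-layer channels. A reduction by 6 dB corresponds only approximately to the factor 0.5, and a reduction by 3 dB only approximately to 0.7071.

```python
def db_to_gain(level_db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


def mix_fictional_tpc(tpc, top_channels, gain=0.5):
    """Add the TpC samples, scaled by `gain`, to every top-layer channel.

    `top_channels` maps a channel name to a list of samples (hypothetical
    data layout chosen for this sketch).
    """
    return {
        name: [x + gain * c for x, c in zip(samples, tpc)]
        for name, samples in top_channels.items()
    }


# Hypothetical sample data: silent top-layer channels and a short TpC signal.
tpc = [1.0, -1.0, 0.5]
top = {name: [0.0, 0.0, 0.0] for name in ("TpFL", "TpFR", "TpBL", "TpBR")}
downmix = mix_fictional_tpc(tpc, top)

print(round(db_to_gain(-6.0), 4), round(db_to_gain(-3.0), 4))
print(downmix["TpFL"])
```

Because all four channels receive the same TpC component, playing back the downmix produces the phantom localization that the text calls the "fictional TpC".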
- Playback of the downmix leads to the psychoacoustic phenomenon of localization of such a speaker TpC (henceforth called “fictional TpC”);
- the same principle of operation can also be applied to other loudspeakers, sometimes using different level differences (henceforth called “fictitious loudspeakers”, see below).
- short-term cross-correlation will be used for extraction by means of correlation comparison, which will be discussed frequently in the following
- BtFC is mixed at a level reduced by 3 dB into each of BtFL' and BtFR'.
- BtFL' is then mixed at a level reduced by 3 dB into each of FL' and BR', and BtFR' is then mixed at a level reduced by 3 dB into each of FR' and BL'.
- BtFL then approximately represents the correlated component of FL' and BR', BtFR approximately the correlated component of FR' and BL', and BtFC approximately correlates
- Correlation comparison extracted signal which leads to the basic problem of the fundamental impossibility of an absolute reconstruction of a signal of higher order from a signal of lower order exclusively by means of correlation comparison.
- nonlinear inverse coding opens up completely new perspectives!
- a mitigation of the problem can be brought about, for example, if the absolute levels of the previously existing or stepwise obtained signals are known; since the degree of correlation is +1 for the signal components in question, conclusions can then be drawn about the respective level of the correlated signal components in all affected channels:
- the correlated signal component of BtFL with absolute level $p_1$, which was mixed at the absolute level $p_1 - 3$ dB into each of FL' (with known absolute level $p_2$) and BR' (with known absolute level $p_3$), allows its approximate extraction by means of correlation comparison, the resulting signal BtFL* now having the absolute level $p_1$
- the correlated signal component of BtFR with absolute level $p_4$, which was mixed at the absolute level $p_4 - 3$ dB into each of FR' (with known absolute level $p_5$) and BL' (with known absolute level $p_6$), allows its approximate extraction by means of correlation comparison, the resulting signal BtFR* now having the absolute level $p_4$; its subtraction at the absolute level $p_4 - 3$ dB from FR' (absolute level $p_5$) or from BL' (absolute level $p_6$) yields the respective resulting channels - but only approximately - the
- a downmix matrix may be the factor
- Downmix is a 7.1 surround signal, a fictional TpC can be defined in the same manner as in the above example.
- TpFL and TpBL are each reduced in level by 3 dB and summed, and the resulting sum is mixed, again at a level reduced by 3 dB, into each of FL' and BL'.
- TpFR and TpBR are each reduced in level by 3 dB and summed, and the resulting sum is mixed, again at a level reduced by 3 dB, into each of FR' and BR'.
- the associated downmix matrix can be taken from FIG. 4.
- the sum of TpFL, TpBL and TpC or the sum of TpFR, TpBR and TpC can be extracted approximately with the above-described correlation comparison of FL' and BL' or of FR' and BR'. This is for the respective inverse coding of these sums
- TpFR * and TpBR * are of crucial importance.
- Both illustrated downmix matrices are concrete examples based on ITU-R BS.775-1; however, level adjustments other than -3dB and -6dB are, as will be appreciated, readily possible and desirable in the specific case.
- Tonstudiotechnik. Volume I - Saur: Munich 1987 shows on page 375 the attenuation curve of a prior-art panoramic potentiometer (see FIG. 13). This attenuation curve can also be called
- automatic or adaptive downmix related levels may be wholly or partially derived, or may be determined in whole or in part independently of these.
- the optimization of the nonlinear inverse coding of a downmix generated by any technical means can already take place on the basis of their differently controlled output channels.
- Computing capacity for decoding and playback of audio data is available - yet high quality multichannel signals can be reproduced.
- Speaker arrangement which corresponds to the playback format of the resulting multi-channel signal, via a speaker arrangement that simulates such a playback format (for example by means of prior-art wave field synthesis, which is based on Huygens' principle), or via headphones or loudspeakers, in which case the loudspeaker positions are simulated by means of prior-art Head Related Transfer Functions (HRTFs) or Binaural Room Impulse Responses (BRIRs).
- FIG. 5 shows the example of a basic circuit according to the invention for non-linear inverse coding, which is characterized by the downstream
- FIG. 6 shows the downstream connection of two different gains (60001, 60002), which prove extremely beneficial, for example, for the non-linear inverse coding of complex multi-channel signals.
- For the basic operation of both circuits, apart from the gains (50001, 60001, 60002) just mentioned and illustrated in FIG. 5 and FIG. 6, reference is made to EP1850629,
- FIG. 7 illustrates the extraction by means of
- FIG. 8 illustrates the correlation comparison between BtFL' and BtFR', resulting in BtFC'.
- FIG. 11 now illustrates the non-linear inverse coding of FL'', yielding FL''' and FLc'.
- FRc' also appears amplified by a factor of 0.7071.
- FL''' and FR''' are normalized to the known levels of the original signals of the same name, which finally results in FL* and FR*.
- the channels FLc' and FRc' are then adjusted to the normalized signals FL* and FR* so that all level ratios of the non-linear inverse coding are maintained (the gains, each with the factor 0.7071 relative to the current level of these channels, thus remain effective for them), finally yielding FLc* and FRc*.
- the means or methodologies thus used for this non-linear inverse coding again comprise:
- FIG. 14 illustrates the approximate extraction of the above-described sum TpL' of TpFL, TpBL and TpC by means of correlation comparison of FL' and BL', and also the approximate extraction of the above-described sum TpR' of TpFR, TpBR and TpC using
- TpL' is normalized to the original level of the sum of TpFL, TpBL and TpC and yields TpL''.
- TpR' is also normalized to the original level of the sum of TpFR, TpBR and TpC and yields TpR''.
- TpL'' is subtracted at a level reduced by 3 dB from each of FL' and BL', finally resulting in FL* and BL*.
- TpR'' is subtracted at a level reduced by 3 dB from each of FR' and BR', finally resulting in FR* and BR*.
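The subtraction step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `mix_in` and `subtract_scaled` and the sample values are hypothetical, and the extracted signal is assumed to equal the admixed signal exactly, which the text says is only approximately achievable by correlation comparison.

```python
GAIN_3DB = 0.7071  # factor the text associates with a level reduced by 3 dB


def mix_in(channel, source, gain=GAIN_3DB):
    """Downmix side: add `source`, scaled by `gain`, to `channel`."""
    return [c + gain * s for c, s in zip(channel, source)]


def subtract_scaled(channel, extracted, gain=GAIN_3DB):
    """Decoder side: remove the extracted signal, scaled by `gain`."""
    return [c - gain * e for c, e in zip(channel, extracted)]


# Hypothetical sample data.
fl = [0.2, -0.1, 0.4]        # original FL
tp_l = [1.0, 0.5, -0.25]     # sum signal that was admixed (TpL)
fl_dash = mix_in(fl, tp_l)   # FL' as found in the downmix

# Idealized case: the normalized extraction TpL'' is assumed to equal TpL.
fl_star = subtract_scaled(fl_dash, tp_l)
print(fl_star)
```

In practice the extracted signal deviates from the admixed one, so FL* and BL* only approximate the original channels.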
- FIG. 17 now illustrates the non-linear inverse coding of TpL'', resulting in TpFL'' and TpBL''.
- TpBL'' appears amplified by a factor of 0.7071. Likewise, a non-linear inverse coding of TpR'' takes place, resulting in TpFR'' and TpBR''.
- TpBR'' also appears amplified by a factor of 0.7071.
- TpFL'' and TpFR'' are normalized to the known levels of the original signals of the same name, resulting in TpFL* and TpFR*.
- the channels TpBL'' and TpBR'' are then adapted to the thus normalized signals TpFL* and TpFR* so that all levels of the non-linear inverse coding are maintained
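The normalization described here can be sketched as follows. The sketch assumes that "level" means the RMS value of a signal section, which the text does not specify; the names `rms` and `normalize_pair` and the sample values are hypothetical. One channel is scaled to a known target level, and the same scale factor is applied to its companion channel so that the level ratio between the two (for example the factor 0.7071) is preserved.

```python
import math


def rms(samples):
    """Root-mean-square level of a signal section (assumed level measure)."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def normalize_pair(primary, companion, target_rms):
    """Scale `primary` to `target_rms` and apply the same factor to `companion`.

    Using one common factor keeps the level ratio of the two channels unchanged.
    """
    level = rms(primary)
    if level == 0.0:
        return list(primary), list(companion)  # nothing to scale
    scale = target_rms / level
    return [scale * v for v in primary], [scale * v for v in companion]


# Hypothetical sample data: TpBL'' is TpFL'' amplified by the factor 0.7071.
tp_fl = [0.2, -0.2, 0.2, -0.2]
tp_bl = [0.7071 * v for v in tp_fl]
tp_fl_star, tp_bl_star = normalize_pair(tp_fl, tp_bl, target_rms=1.0)

print(rms(tp_fl_star), rms(tp_bl_star) / rms(tp_fl_star))
```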
- nonlinear inverse decoding whose parameters are to be determined in such a way that the highest possible approximation of the resulting signal to the
- the degree of correlation r of those original signal pairs which are subsequently to be approximated by non-linear inverse coding is determined on the basis of the short-term cross-correlation. Reference is made to WO2011009649, page 12 (line 7) to page 13 (line 10), as well as to
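A degree of correlation of this kind can be illustrated in Python. The exact definition of the short-term cross-correlation is given in the cited WO2011009649 and is not reproduced here; the sketch below assumes a normalized zero-lag cross-correlation evaluated over consecutive, non-overlapping windows of 40 ms. The function names, the sample rate and the test signals are hypothetical.

```python
import math


def correlation_degree(x, y):
    """Normalized zero-lag cross-correlation of two sections, in [-1, +1]."""
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / den if den > 0.0 else 0.0


def short_term_correlation(x, y, sample_rate=48000, window_ms=40.0):
    """Degree of correlation r per non-overlapping window of `window_ms`."""
    n = max(1, int(sample_rate * window_ms / 1000.0))
    length = min(len(x), len(y))
    return [
        correlation_degree(x[i:i + n], y[i:i + n])
        for i in range(0, length - n + 1, n)
    ]


# Hypothetical test signals: two windows of a 440 Hz sine.
sample_rate = 48000
x = [math.sin(2.0 * math.pi * 440.0 * i / sample_rate) for i in range(2 * 1920)]
y = [-v for v in x]  # polarity-inverted copy

r_same = short_term_correlation(x, x, sample_rate)
r_inverted = short_term_correlation(x, y, sample_rate)
print(r_same, r_inverted)
```

Identical signals give r close to +1 in every window, and a polarity-inverted copy gives r close to -1.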
- this degree of correlation r may be negative or close to zero. In an inverse coding that starts from a single-channel input signal, this would lead to a strongly decorrelated signal, but at the same time to strong artifacts in the case of transients and voice or vocal recordings.
- the specified lower limits for the specific signal types may also be between -0.10 and -0.15
- the linear or non-linear inverse coded signal is then optimized so that the degree of correlation r determined on the basis of the short-term cross-correlation matches the set target correlation k.
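How a single parameter can be chosen so that r matches a target correlation k is shown below with a deliberately simplified stand-in. This is not the inverse coding circuit of the application: it assumes a plain mid/side model L = M + g·S, R = M - g·S with uncorrelated M and S, for which r = (P_M - g²·P_S) / (P_M + g²·P_S) and therefore g² = P_M·(1 - k) / (P_S·(1 + k)). All names and signal values are hypothetical; k = 0.51 is used only as an example value.

```python
import math


def power(samples):
    """Mean power of a signal section."""
    return sum(v * v for v in samples) / len(samples)


def correlation_degree(x, y):
    """Normalized zero-lag cross-correlation of two sections, in [-1, +1]."""
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / den if den > 0.0 else 0.0


def side_gain_for_target(mid, side, k):
    """Side gain g so that L = M + g*S and R = M - g*S have correlation k.

    Assumes M and S are uncorrelated and S is not silent (stand-in model).
    """
    if not -1.0 < k <= 1.0:
        raise ValueError("k must lie in (-1, +1]")
    return math.sqrt(power(mid) * (1.0 - k) / (power(side) * (1.0 + k)))


# Hypothetical test signals: sine and cosine over whole periods are orthogonal.
n = 1000
mid = [math.sin(2.0 * math.pi * 5.0 * i / n) for i in range(n)]
side = [0.3 * math.cos(2.0 * math.pi * 5.0 * i / n) for i in range(n)]

k = 0.51
g = side_gain_for_target(mid, side, k)
left = [m + g * s for m, s in zip(mid, side)]
right = [m - g * s for m, s in zip(mid, side)]
r = correlation_degree(left, right)
print(g, r)
```

For real signals, M and S are rarely exactly uncorrelated within a short section, so an iterative adjustment that re-measures r after each change would be the more robust choice.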
- the position of the phantom sound sources is determined for the original signal pair or for the linear or non-linear inverse coded signal to be optimized, for example with the prior-art Karhunen-Loeve transformation (KLT) or Principal Component Analysis (PCA), or with algebraic invariants according to EP1850629, WO2009138205, WO2011009649, WO2011009650, WO2012016992 and WO2012032178. A combination of the methods just mentioned is also possible.
- a Karhunen-Loeve transformation can first be carried out on a signal section of, for example, 40 ms of the original signal pair, with the aid of which the linkage f*(t) (WO2012016992, page 4 (line 22) to page 5 (line 2)) or several linkages f_1*(t), f_2*(t), ..., f_p*(t) of at least two signals s_1(t), s_2(t), ..., s_m(t) (or their transfer functions t_1(s_1(t)), t_2(s_2(t)),
- Peak is located at the origin of the complex number plane and its axis of symmetry perpendicular to the complex plane
- optimized according to WO2012016992, page 10 (line 21) to page 12 (line 3), and, for example, according to the figures of WO2012016992, described in detail there from page 19 (line 1) to page 78 (line 15).
- WO2012016992 (FIG. 1B, 3A, 4A, 5A, 6A, 7A, 7B, 8A): insert a gain in accordance with FIG. 5 or FIG. 6 of the present application and thus directly optimize the already nonlinear inverse-coded signal.
- encoded signal can be considered or optimized in an optional fifth step with respect to the main reflections and the reverb tail.
- a signal cutout of 40 ms is generally sufficient to keep the latency of the entire coding correspondingly low and nevertheless to record all essential parameters.
- Correlation degree r coincides with the specified target correlation k
- Transparency is assessed less with respect to the absolute position of the phantom sound sources than with respect to the energy density of the sound field, and
- nonlinear inverse coding has in particular the advantage of a homogeneous stereo base, which makes the optimization (in particular with regard to the degree of correlation, the location of the phantom sound sources, the main reflections and the reverb tail) much easier.
- for example, FIG. 14, FIG. 15, FIG. 16, FIG. 17, FIG. 18;
- Coding may be transmitted once for each signal segment (e.g., every second).
- the permanent transfer for example, to a sample or a frame or its sections, although
- this increase or decrease of the total level can, in particular, take into account the peculiarities of a base audio coder, which can exert significant influence on the subjective loudness impression of a multi-channel signal.
- DRC Dynamic Range Control
- a higher-order signal may be derived with any speaker arrangement, as non-existent channels can be derived, for example by linear or nonlinear inverse coding, from existing or
- a "non-linear inverse coding" is characterized by the additional downstream connection, at first glance not useful, of at least one gain (50001) in the left or in the right output channel of an arrangement for an "inverse coding" or "linear inverse coding"
- Embodiments are part of the invention.
- a gain in the sense of the claims may mean both a gain factor greater or less than 1, i.
- a gain in the sense of the invention can also mean an attenuation.
- Two signals based on a multi-channel signal may both directly be two channels of the multi-channel signal, or one (or both) of the two signals may be based on the combination of two channels of the multi-channel signal. The same applies to signals that are based on a downmix signal.
- coding includes the notion of encoding as well as decoding.
- upmix describes the formation of a higher number of channels from a smaller number of channels.
- downmix describes the formation of a smaller number of channels from a higher number of channels.
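The degree of correlation r and the target correlation k referred to in the list above can be illustrated with a minimal sketch. It assumes that r is the normalized zero-lag cross-correlation over one 40 ms segment; the exact definition is the one given in WO2011009649 and may differ. The function names, the 48 kHz sample rate and the tolerance are hypothetical and chosen for illustration only.

```python
import numpy as np

def short_term_correlation(left, right, eps=1e-12):
    """Normalized zero-lag cross-correlation of two equally long segments.

    Assumption: this stands in for the 'degree of correlation r';
    the cited applications may define r differently.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    num = np.sum(left * right)
    den = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    # A silent segment has no defined correlation; report 0.0 in that case.
    return float(num / den) if den > eps else 0.0

def matches_target(r, k, tol=0.05):
    """True if r lies within a (hypothetical) tolerance of the target k."""
    return abs(r - k) <= tol

fs = 48_000                      # assumed sample rate in Hz
n = int(0.040 * fs)              # 40 ms segment -> 1920 samples
t = np.arange(n) / fs
left = np.sin(2 * np.pi * 500 * t)
right = 0.5 * left               # same waveform at a lower level

r = short_term_correlation(left, right)
print(round(r, 6), matches_target(r, k=0.3))
```

In this toy case the two channels differ only in level, so r is close to 1 and does not match a target of k = 0.3; an optimization loop would have to change the coding parameters until it does.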
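The use of a Karhunen-Loeve transformation or Principal Component Analysis on a 40 ms section to locate phantom sound sources, as mentioned in the list above, might be sketched as follows. This is only an illustration under stated assumptions: the dominant eigenvector of the 2x2 channel covariance is taken as the direction of the strongest source in the left/right plane, no mean removal is applied, and the function name and the angle convention are hypothetical. The algebraic invariants of WO2012016992 are not modelled here.

```python
import numpy as np

def principal_axis_angle(left, right):
    """Angle in degrees of the dominant axis in the (left, right) plane.

    0 degrees: energy only in the left channel,
    45 degrees: equal in-phase energy (centred phantom source),
    90 degrees: energy only in the right channel.
    """
    x = np.vstack([left, right]).astype(float)   # shape (2, n)
    cov = x @ x.T / x.shape[1]                   # 2x2 covariance estimate
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues ascending
    v = eigvecs[:, np.argmax(eigvals)]           # dominant eigenvector
    if v[0] + v[1] < 0:                          # resolve sign ambiguity
        v = -v
    return float(np.degrees(np.arctan2(v[1], v[0])))

fs = 48_000                                      # assumed sample rate in Hz
t = np.arange(int(0.040 * fs)) / fs              # one 40 ms section
s = np.sin(2 * np.pi * 500 * t)

print(principal_axis_angle(s, s))                # centred source
print(principal_axis_angle(s, 0.0 * s))          # hard left
```

The sign of an eigenvector is arbitrary, so the sketch flips it towards the positive quadrant; for strongly anti-phase material the angle remains ambiguous and would need separate handling.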
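The normalization of TpFL'' to the known level of the original TpFL, and the subsequent adaptation of TpBL'', can be sketched as below. The sketch assumes that "level" means RMS level and that "adapting" the back channel means applying the same gain to it so that the level ratio produced by the nonlinear inverse coding is preserved; both are interpretations, and all names are hypothetical.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal segment."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

def normalize_front_back(front, back, target_rms, eps=1e-12):
    """Scale front to target_rms and apply the same gain to back."""
    front = np.asarray(front, dtype=float)
    back = np.asarray(back, dtype=float)
    level = rms(front)
    # Leave silent segments untouched instead of dividing by zero.
    gain = target_rms / level if level > eps else 1.0
    return gain * front, gain * back, gain

t = np.arange(1920) / 48_000                 # 40 ms at an assumed 48 kHz
original_fl = np.sin(2 * np.pi * 500 * t)    # stands in for the original TpFL
coded_fl = 2.0 * original_fl                 # stands in for TpFL''
coded_bl = 0.7071 * coded_fl                 # stands in for TpBL''

fl_star, bl_star, gain = normalize_front_back(coded_fl, coded_bl, rms(original_fl))
print(round(gain, 4), round(rms(bl_star) / rms(fl_star), 4))
```

Because one common gain is used, the 0.7071 ratio between the back and front channel survives the normalization.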
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Stereophonic System (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015541175A JP2016501456A (ja) | 2012-11-09 | 2013-11-11 | 多チャンネル信号の非線形逆コーディング |
AU2013343445A AU2013343445A1 (en) | 2012-11-09 | 2013-11-11 | Non-linear inverse coding of multichannel signals |
CN201380070069.5A CN105229730A (zh) | 2012-11-09 | 2013-11-11 | 多信道信号的非线性逆编码 |
KR1020157015177A KR20150101999A (ko) | 2012-11-09 | 2013-11-11 | 다채널 신호의 비선형 역부호화 |
RU2015121941A RU2015121941A (ru) | 2012-11-09 | 2013-11-11 | Нелинейное обратное кодирование многоканальных сигналов |
EP13789019.0A EP2917908A1 (de) | 2012-11-09 | 2013-11-11 | Nichtlineare inverse kodierung von multikanal-signalen |
US14/441,898 US20150371644A1 (en) | 2012-11-09 | 2013-11-11 | Non-linear inverse coding of multichannel signals |
SG11201504514WA SG11201504514WA (en) | 2012-11-09 | 2013-11-11 | Non-linear inverse coding of multichannel signals |
HK16107907.9A HK1220034A1 (zh) | 2012-11-09 | 2016-07-06 | 多信道信號的非線性逆編碼 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CH2300/12 | 2012-11-09 | ||
CH23002012 | 2012-11-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014072513A1 (de) | 2014-05-15 |
Family
ID=47360247
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2013/073526 WO2014072513A1 (de) | 2012-11-09 | 2013-11-11 | Nichtlineare inverse kodierung von multikanal-signalen |
Country Status (10)
Country | Link |
---|---|
US (1) | US20150371644A1 (de) |
EP (1) | EP2917908A1 (de) |
JP (1) | JP2016501456A (de) |
KR (1) | KR20150101999A (de) |
CN (1) | CN105229730A (de) |
AU (1) | AU2013343445A1 (de) |
HK (1) | HK1220034A1 (de) |
RU (1) | RU2015121941A (de) |
SG (1) | SG11201504514WA (de) |
WO (1) | WO2014072513A1 (de) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016030545A2 (de) | 2014-08-29 | 2016-03-03 | Clemens Par | Vergleich oder optimierung von signalen anhand der kovarianz algebraischer invarianten |
CN106796792A (zh) * | 2014-07-30 | 2017-05-31 | 弗劳恩霍夫应用研究促进协会 | 用于增强音频信号的装置和方法、声音增强系统 |
EP3937515A1 (de) | 2020-07-06 | 2022-01-12 | Clemens Par | Invarianzgesteuerter elektroakustischer übertrager |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
MX365274B (es) * | 2015-06-17 | 2019-05-29 | Sony Corp | Dispositivo de transmisión, método de transmisión, dispositivo de recepción, y método de recepción. |
CN108665902B (zh) | 2017-03-31 | 2020-12-01 | 华为技术有限公司 | 多声道信号的编解码方法和编解码器 |
US9820073B1 (en) | 2017-05-10 | 2017-11-14 | Tls Corp. | Extracting a common signal from multiple audio signals |
CN110739000B (zh) * | 2019-10-14 | 2022-02-01 | 武汉大学 | 一种适应于个性化交互系统的音频对象编码方法 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011009649A1 (de) * | 2009-07-22 | 2011-01-27 | Stormingswiss Gmbh | Vorrichtung und verfahren zur verbesserung stereophoner oder pseudostereophoner audiosignale |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5757927A (en) * | 1992-03-02 | 1998-05-26 | Trifield Productions Ltd. | Surround sound apparatus |
GB9211756D0 (en) * | 1992-06-03 | 1992-07-15 | Gerzon Michael A | Stereophonic directional dispersion method |
KR20070001267A (ko) * | 2004-04-09 | 2007-01-03 | 닛본 덴끼 가부시끼가이샤 | 음성 통신 방법 및 장치 |
SE0402649D0 (sv) * | 2004-11-02 | 2004-11-02 | Coding Tech Ab | Advanced methods of creating orthogonal signals |
EP2081400B1 (de) * | 2006-04-27 | 2013-11-27 | BlackBerry Limited | Tragbare elektronische Vorrichtung mit verborgenen, von einer Tonquelle versetzten Tonöffnungen |
US8027479B2 (en) * | 2006-06-02 | 2011-09-27 | Coding Technologies Ab | Binaural multi-channel decoder in the context of non-energy conserving upmix rules |
CN101652810B (zh) * | 2006-09-29 | 2012-04-11 | Lg电子株式会社 | 用于处理混合信号的装置及其方法 |
CN101478296B (zh) * | 2009-01-05 | 2011-12-21 | 华为终端有限公司 | 一种多声道系统中的增益控制方法及装置 |
2013
- 2013-11-11 KR KR1020157015177A | KR20150101999A (ko) | not active: Application Discontinuation
- 2013-11-11 US US14/441,898 | US20150371644A1 (en) | not active: Abandoned
- 2013-11-11 EP EP13789019.0A | EP2917908A1 (de) | not active: Withdrawn
- 2013-11-11 CN CN201380070069.5A | CN105229730A (zh) | active: Pending
- 2013-11-11 WO PCT/EP2013/073526 | WO2014072513A1 (de) | active: Application Filing
- 2013-11-11 SG SG11201504514WA | SG11201504514WA (en) | status unknown
- 2013-11-11 AU AU2013343445A | AU2013343445A1 (en) | not active: Abandoned
- 2013-11-11 JP JP2015541175A | JP2016501456A (ja) | active: Pending
- 2013-11-11 RU RU2015121941A | RU2015121941A (ru) | not active: Application Discontinuation
2016
- 2016-07-06 HK HK16107907.9A | HK1220034A1 (zh) | status unknown
Non-Patent Citations (2)
Title |
---|
HAMASAKI KIMIO ET AL: "The 22.2 Multichannel Sound System and Its Application", AES CONVENTION 118; MAY 2005, AES, 60 EAST 42ND STREET, ROOM 2520 NEW YORK 10165-2520, USA, 1 May 2005 (2005-05-01), XP040507214 * |
PASI OJALA ET AL: "Further information on Nokia binaural decoder", 76. MPEG MEETING; 03-04-2006 - 07-04-2006; MONTREUX; (MOTION PICTUREEXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. M13231, 29 March 2006 (2006-03-29), XP030041900, ISSN: 0000-0239 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106796792A (zh) * | 2014-07-30 | 2017-05-31 | 弗劳恩霍夫应用研究促进协会 | 用于增强音频信号的装置和方法、声音增强系统 |
WO2016030545A2 (de) | 2014-08-29 | 2016-03-03 | Clemens Par | Vergleich oder optimierung von signalen anhand der kovarianz algebraischer invarianten |
EP3937515A1 (de) | 2020-07-06 | 2022-01-12 | Clemens Par | Invarianzgesteuerter elektroakustischer übertrager |
WO2022008092A1 (de) | 2020-07-06 | 2022-01-13 | Clemens Par | Invarianzgesteuerter elektroakustischer übertrager |
Also Published As
Publication number | Publication date |
---|---|
SG11201504514WA (en) | 2015-07-30 |
CN105229730A (zh) | 2016-01-06 |
US20150371644A1 (en) | 2015-12-24 |
RU2015121941A (ru) | 2017-01-10 |
AU2013343445A1 (en) | 2015-07-02 |
EP2917908A1 (de) | 2015-09-16 |
KR20150101999A (ko) | 2015-09-04 |
HK1220034A1 (zh) | 2017-04-21 |
JP2016501456A (ja) | 2016-01-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1854334B1 (de) | Vorrichtung und verfahren zum erzeugen eines codierten stereo-signals eines audiostücks oder audiodatenstroms | |
DE102006050068B4 (de) | Vorrichtung und Verfahren zum Erzeugen eines Umgebungssignals aus einem Audiosignal, Vorrichtung und Verfahren zum Ableiten eines Mehrkanal-Audiosignals aus einem Audiosignal und Computerprogramm | |
WO2014072513A1 (de) | Nichtlineare inverse kodierung von multikanal-signalen | |
DE602005002942T2 (de) | Verfahren zur darstellung von mehrkanal-audiosignalen | |
DE69633633T2 (de) | Mehrkanaliger prädiktiver subband-kodierer mit adaptiver, psychoakustischer bitzuweisung | |
DE602004004168T2 (de) | Kompatible mehrkanal-codierung/-decodierung | |
DE602005006385T2 (de) | Vorrichtung und verfahren zum konstruieren eines mehrkanaligen ausgangssignals oder zum erzeugen eines downmix-signals | |
DE602006000239T2 (de) | Energieabhängige quantisierung für effiziente kodierung räumlicher audioparameter | |
EP2206113B1 (de) | Vorrichtung und verfahren zum erzeugen eines multikanalsignals mit einer sprachsignalverarbeitung | |
EP1687809B1 (de) | Vorrichtung und verfahren zur wiederherstellung eines multikanal-audiosignals und zum erzeugen eines parameterdatensatzes hierfür | |
DE4328620C1 (de) | Verfahren zur Simulation eines Raum- und/oder Klangeindrucks | |
EP2036400B1 (de) | Erzeugung dekorrelierter signale | |
EP2005421B1 (de) | Vorrichtung und verfahren zum erzeugen eines umgebungssignals | |
DE102013223201B3 (de) | Verfahren und Vorrichtung zum Komprimieren und Dekomprimieren von Schallfelddaten eines Gebietes | |
DE69932861T2 (de) | Verfahren zur kodierung eines audiosignals mit einem qualitätswert für bit-zuordnung | |
DE102005014477A1 (de) | Vorrichtung und Verfahren zum Erzeugen eines Datenstroms und zum Erzeugen einer Multikanal-Darstellung | |
EP2891334B1 (de) | Erzeugung von mehrkanalton aus stereo-audiosignalen | |
DE102007011436B4 (de) | Vorrichtung und Verfahren zum Formen eines digitalen Audiosignals | |
WO2015128379A1 (de) | Kodierung und dekodierung eines niederfrequenten kanals in einem audiomultikanalsignal | |
DE102023209048A1 (de) | Verfahren und system zum verlagern von lautstärkeanpassungen von audiokomponenten | |
EP3937515A1 (de) | Invarianzgesteuerter elektroakustischer übertrager | |
DE102017121876A1 (de) | Verfahren und vorrichtung zur formatumwandlung eines mehrkanaligen audiosignals | |
CH708710A1 (de) | Ableitung von Multikanalsignalen aus zwei oder mehreren Grundsignalen. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 201380070069.5; Country of ref document: CN |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13789019; Country of ref document: EP; Kind code of ref document: A1 |
| DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |
| ENP | Entry into the national phase | Ref document number: 2015541175; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | Wipo information: entry into national phase | Ref document number: 2013789019; Country of ref document: EP |
| ENP | Entry into the national phase | Ref document number: 20157015177; Country of ref document: KR; Kind code of ref document: A |
| ENP | Entry into the national phase | Ref document number: 2015121941; Country of ref document: RU; Kind code of ref document: A |
| ENP | Entry into the national phase | Ref document number: 2013343445; Country of ref document: AU; Date of ref document: 20131111; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 14441898; Country of ref document: US |