EP2139000B1 - Method and apparatus for encoding and decoding of speech and/or non-speech audio input signals


Info

Publication number
EP2139000B1
Authority
EP
European Patent Office
Prior art keywords
signal
speech
encoding
mlt
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
EP08159018A
Other languages
English (en)
French (fr)
Other versions
EP2139000A1 (de)
Inventor
Oliver Wuebbolt
Johannes Boehm
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
THOMSON LICENSING
Original Assignee
Thomson Licensing SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing SAS filed Critical Thomson Licensing SAS
Priority to EP08159018A priority Critical patent/EP2139000B1/de
Priority to CN2009101503026A priority patent/CN101615393B/zh
Publication of EP2139000A1 publication Critical patent/EP2139000A1/de
Application granted granted Critical
Publication of EP2139000B1 publication Critical patent/EP2139000B1/de

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signal coding using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signal coding using spectral analysis, using orthogonal transformation
    • G10L19/04 Speech or audio signal coding using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Definitions

  • The invention relates to a method and to an apparatus for encoding or decoding a speech and/or non-speech audio input signal.
  • This wideband speech coder includes an embedded G.729 speech coder, which is used permanently. Therefore the quality for music-like (non-speech) signals is not very good. Although this coder uses transform coding techniques, it is a speech coder.
  • This coder uses a principle structure similar to that of the above-mentioned coder.
  • In that coder the processing is based on time domain signals, which implies a difficult handling of the delay in the core encoder/decoder, i.e. the speech coder. Therefore, in the invention, the processing is based on a common transform in order to reduce this problem.
  • The speech coder is used permanently, which results in a non-optimal quality for music-like (non-speech) signals.
  • A disadvantage of the known audio/speech codecs is a clear dependency of the coding quality on the type of content, i.e. music-like audio signals are best coded by audio codecs and speech-like audio signals are best coded by speech codecs.
  • No known codec holds a dominant position for mixed speech/music content.
  • A problem to be solved by the invention is to provide a good codec performance for both speech and music, and to further improve the codec performance for such mixed signals. This problem is solved by the methods disclosed in claims 1 and 3. Apparatuses that utilise these methods are disclosed in claims 2 and 4.
  • The inventive joint speech/audio codec uses speech coding techniques as well as audio transform coding techniques.
  • MLT: Modulated Lapped Transform
  • IMLT: inverse Modulated Lapped Transform
  • The MLT output spectrum is separated into frequency bins (low frequencies) assigned to the speech coding section of the codec and the remaining frequency bins (high frequencies) assigned to the transform-based coding section of the codec, wherein the transform length at the codec input and output can be switched adaptively to the input signal.
  • The invention achieves a uniformly good codec quality for both speech-like and music-like audio signals, especially for very low bit rates but also for higher bit rates.
  • The inventive method is suited for encoding a speech and/or non-speech audio input signal, including the steps:
  • The inventive apparatus is suited for encoding a speech and/or non-speech audio input signal, said apparatus including means being adapted for:
  • The inventive method is suited for decoding a bit stream representing an encoded speech and/or non-speech audio input signal that was encoded according to the above method, said decoding method including the steps:
  • The inventive apparatus is suited for decoding a bit stream representing an encoded speech and/or non-speech audio input signal that was encoded according to the above encoding method, said apparatus including means being adapted for:
  • For speech-like signals, a linear-prediction based speech coding processing is used, e.g. CELP or ACELP, cf. ISO/IEC 14496-3, Subparts 2 and 3, and MPEG-4 CELP.
  • For general audio or music-like signals, a state-of-the-art coding processing based on a time-frequency transform is used, e.g. the MDCT.
  • The PCM audio input signal IS is transformed by a Modulated Lapped Transform (MLT) having a pre-determined length in step/stage 10.
  • As MLT, a Modified Discrete Cosine Transform (MDCT) is appropriate for audio coding applications.
  • The MDCT was first called by Princen and Bradley an 'oddly-stacked time domain alias cancellation transform' and was published in John P. Princen and Alan B. Bradley, "Analysis/synthesis filter bank design based on time domain aliasing cancellation", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-34, no. 5, pp. 1153-1161, 1986.
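As an illustration of this transform pair, the following is a minimal MDCT/inverse-MDCT sketch in Python with NumPy. The direct matrix form is chosen for clarity only; real codecs use fast FFT-based implementations. The sine window shown satisfies the perfect-reconstruction condition of the cited Princen/Bradley paper.

```python
import numpy as np

def mdct(x):
    # Forward MDCT: 2N windowed samples -> N coefficients
    # (direct matrix form of the Princen/Bradley TDAC filter bank).
    N = len(x) // 2
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)[None, :]
    return x @ np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def imdct(X):
    # Inverse MDCT: N coefficients -> 2N aliased samples; the aliasing
    # cancels in the overlap-add of successive windowed blocks.
    N = len(X)
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)[None, :]
    return (2.0 / N) * (np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) @ X)

def sine_window(N):
    # Satisfies w[n]^2 + w[n+N]^2 = 1, the perfect-reconstruction condition.
    return np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
```

Overlap-adding the windowed inverse transforms of half-overlapped blocks reconstructs the interior of the signal exactly, which is the time domain alias cancellation property the cited paper establishes.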
  • The obtained spectrum is separated into frequency bins belonging to the speech band (representing a low band signal) and the remaining bins (high frequencies) representing a remaining band signal RBS.
  • The speech band bins are transformed back into the time domain using the inverse MLT, e.g. an inverse MDCT, with a short transform length with respect to the pre-determined length used in step/stage 10.
  • The resulting time signal has a lower sampling frequency than the input time signal and contains only the corresponding frequencies of the speech band bins.
  • The generated time domain signal is then used as input signal for a speech encoding step/stage 12.
  • The output of the speech encoding can be transmitted in the output bit stream OBS, depending on a decision made by a below-described speech/audio switch 15.
  • The encoded 'speech' signal is decoded in a related speech decoding step/stage 13, and the decoded 'speech' signal is transformed back into the frequency domain in step/stage 14 using the MLT corresponding to the inverse MLT of step/stage 11 (i.e. an 'opposite type' MLT having the short length) in order to re-generate the speech band signal, i.e. a reconstructed speech signal RSS.
  • By means of that switch it is decided whether the original low frequency bins are coded together with the remaining high frequency bins in the following quantisation & coding step/stage 16 (in which case the coded 'speech' signal is not transmitted in bit stream OBS), or the difference signal DS is coded together with the remaining high frequency bins (in which case the coded 'speech' signal is transmitted in bit stream OBS).
  • That switch may be operated by using a rate-distortion optimisation.
  • An information item SWI about the decision of switch 15 is included in bit stream OBS for use in the decoding. In this switch, but also in the other steps/stages, the different delays introduced by the cascaded transforms are to be taken into account. The different delays can be balanced using corresponding buffering for these steps/stages.
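The text does not spell out the rate-distortion rule used by switch 15. The following Python sketch shows one plausible decision of this kind, using signal energy as a crude stand-in for the bits a quantiser would spend; the `lagrange` penalty representing the speech coder's bit rate is an invented illustrative parameter, not part of the patent.

```python
import numpy as np

def choose_speech_path(low_bins, speech_recon_bins, lagrange=1.0):
    # Cost of coding the original low-band bins directly (speech path off);
    # their energy serves as a crude proxy for the rate a quantiser needs.
    cost_direct = float(np.sum(low_bins ** 2))
    # Cost of coding the residual DS plus a fixed penalty standing in for
    # the speech coder's bit rate (invented proxy values).
    diff = low_bins - speech_recon_bins
    cost_residual = float(np.sum(diff ** 2)) + lagrange * len(low_bins)
    use_speech = cost_residual < cost_direct
    # The switch forwards either the residual DS or the original bins to
    # the quantisation & coding step/stage 16.
    return use_speech, (diff if use_speech else low_bins)
```

When the speech coder models the low band well, the residual energy collapses and the speech path wins; for music-like content the residual carries no advantage and the original bins are coded directly.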
  • It is possible to use a mixture of original frequency bins and difference-signal frequency bins in the low frequency band as input to step/stage 16. In that case, information about how the mixture is composed is conveyed to the decoding side.
  • The remaining frequency bins output by step/stage 10 (i.e. the high frequencies) are processed in the quantisation & coding step/stage 16.
  • In step/stage 16 an appropriate quantisation is used (e.g. like the quantisation techniques used in AAC), and subsequently the quantised frequency bins are coded using e.g. Huffman coding or arithmetic coding.
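A sketch of an AAC-style non-uniform quantiser of the kind referred to above; the x^0.75 companding law and the 0.4054 rounding offset are taken from the AAC scheme, but this is an illustration, not the codec's actual implementation.

```python
import numpy as np

def quantise(bins, scalefactor):
    # AAC-style non-uniform quantiser: |x|^0.75 companding, then rounding
    # with the 0.4054 offset used in AAC reference implementations.
    step = 2.0 ** (scalefactor / 4.0)
    return np.sign(bins) * np.floor((np.abs(bins) / step) ** 0.75 + 0.4054)

def dequantise(q, scalefactor):
    # Inverse companding: |q|^(4/3) expansion scaled by the quantiser step.
    step = 2.0 ** (scalefactor / 4.0)
    return np.sign(q) * (np.abs(q) ** (4.0 / 3.0)) * step
```

The non-uniform law spends relatively more precision on small spectral values; the scalefactor (controlled per band by the psycho-acoustic model) trades quantisation noise against rate.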
  • If the speech/audio switch 15 decides that a music-like input signal is present and therefore the speech coder/decoder or its output is not used at all, the original frequency bins corresponding to the speech band are to be encoded (together with the remaining frequency bins) in the quantisation & coding step/stage 16.
  • The quantisation & coding step/stage 16 is controlled by a psycho-acoustic model calculation 18 that exploits masking properties of the input signal IS for the quantisation. Corresponding side information SI can be transmitted in the bit stream multiplex to the decoder.
  • Switch 15 can also receive suitable control information (e.g. degree of tonality or spectral flatness, or how noise-like the signal is) from psycho-acoustic model step/stage 18.
  • A bit stream multiplexer step/stage 17 combines the output code (if present) of the speech encoder 12, the switch information of switch 15, the output code of the quantisation & coding step/stage 16, and optionally side information code SI, and provides the output bit stream OBS.
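The multiplexing can be pictured with a toy byte-oriented frame format, invented here purely for illustration (the real bit stream syntax is not specified in this text): a one-byte flag for the switch decision SWI, followed by length-prefixed fields for the speech code, the quantisation & coding output, and side information.

```python
import struct

def pack_frame(use_speech, speech_payload, audio_payload, side_info=b""):
    # Toy multiplexer: 1-byte SWI flag, then three length-prefixed fields.
    flag = b"\x01" if use_speech else b"\x00"
    parts = [flag]
    for payload in ((speech_payload if use_speech else b""),
                    audio_payload, side_info):
        parts.append(struct.pack(">I", len(payload)) + payload)
    return b"".join(parts)

def unpack_frame(frame):
    # Demultiplexer mirroring pack_frame (decoder step/stage 37 analogue).
    use_speech = frame[0] == 1
    pos, fields = 1, []
    for _ in range(3):
        (n,) = struct.unpack_from(">I", frame, pos)
        pos += 4
        fields.append(frame[pos:pos + n])
        pos += n
    return use_speech, fields[0], fields[1], fields[2]
```

Note how the speech field is simply left empty when the switch decided against the speech path, matching the "if present" wording above.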
  • iMDCT: inverse MDCT
  • In a 'short block' mode, the inverse MLT steps/stages 22 are arranged between a first grouping step/stage 21 and a second grouping step/stage 23 and provide a doubled number of output values.
  • The number of combined MLT bins, i.e. the transform length of the inverse MLT, defines the resulting time and frequency resolution, wherein a longer inverse MLT delivers a higher time resolution.
  • Overlap/add is performed (optionally involving application of window functions), and the output of the inverse MLTs applied on the same input spectrum is sorted such that it results in several (the quantity depending on the size of the inverse MLTs) temporally successive 'short block' spectra, which are quantised and coded in step/stage 16.
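The regrouping of a long MLT spectrum into temporally successive short spectra can be sketched as follows. An orthonormal DCT-IV serves here as a stand-in for the small inverse MLT applied per group of adjacent bins; this illustrates the data flow only, not the patent's exact inner transform.

```python
import numpy as np

def short_blocks(long_spectrum, L):
    # Cut the long spectrum into groups of L adjacent bins (shape: groups x L),
    # run a small orthonormal DCT-IV over each group as a stand-in for the
    # short inverse MLT, then re-sort so that row t is the t-th temporally
    # successive 'short block' spectrum (one value per group/subband).
    S = np.reshape(long_spectrum, (-1, L))
    n = np.arange(L)
    C4 = np.sqrt(2.0 / L) * np.cos(
        np.pi / L * (n[:, None] + 0.5) * (n[None, :] + 0.5))
    return (S @ C4).T
```

Because the orthonormal DCT-IV is its own inverse, applying the same per-group transform to the re-sorted output regroups the short spectra back into one long spectrum, mirroring the decoder-side grouping steps/stages 43, 42 and 41.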
  • The information about this 'short block coding' mode being used is included in the side information SI.
  • Multiple 'short block coding' modes with different inverse MLT transform lengths can be used and signalled in SI.
  • Thereby a non-uniform time-frequency resolution over the short block spectra is facilitated, e.g. a higher time resolution for high frequencies and a higher frequency resolution for low frequencies.
  • For example, for the lowest frequencies the inverse MLT can get a length of 2 successive frequency bins and for the highest frequencies the inverse MLT can get a length of 16 successive frequency bins.
  • A different order of coding the resulting frequency bins can be used; for example, one 'spectrum' may contain not only different frequency bins at a time, but also the same frequency bins at different points in time.
  • The input-signal-adaptive switching between the processing according to Fig. 1 and the processing according to Fig. 2 is controlled by psycho-acoustic model step/stage 18. For example, if from one frame to the following frame the signal energy in input signal IS rises above a threshold (i.e. there is a transient in the input signal), the processing according to Fig. 2 is carried out. In case the signal energy stays below that threshold, the processing according to Fig. 1 is carried out.
  • This switching information is included in output bitstream OBS for a corresponding switching in the decoding.
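The energy-threshold criterion just described might look as follows in Python; the ratio threshold of 4 is an arbitrary illustrative value, not one taken from the patent.

```python
import numpy as np

def is_transient(prev_frame, cur_frame, ratio_threshold=4.0):
    # Flag the current frame for short-block processing when its energy
    # jumps above a threshold relative to the previous frame.
    e_prev = float(np.sum(prev_frame ** 2)) + 1e-12  # guard against /0
    e_cur = float(np.sum(cur_frame ** 2))
    return e_cur / e_prev > ratio_threshold
```

On a detected transient the encoder switches to the Fig. 2 short-block path; otherwise the long-transform path of Fig. 1 is kept.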
  • The transform block sections can be weighted by a window function, in particular in an overlapping manner, wherein the length of a window function corresponds to the current transform length.
  • Analysis and synthesis windows can be identical, but need not be.
  • The functions of the analysis and synthesis windows h_A(n) and h_S(n) must fulfil certain constraints in the overlapping regions of successive blocks i and i+1 in order to enable a perfect reconstruction.
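The constraint formula itself did not survive the text extraction. For reference, the standard perfect-reconstruction condition for lapped transforms over an overlap region of N samples between blocks i and i+1 is commonly written as follows (textbook form, together with a window symmetry condition ensuring alias cancellation; this is supplied here for context, not quoted from the patent):

```latex
h_{A,i}(N+n)\, h_{S,i}(N+n) + h_{A,i+1}(n)\, h_{S,i+1}(n) = 1 ,
\qquad n = 0, \ldots, N-1 .
```

For identical, symmetric analysis and synthesis windows h(n) this reduces to the well-known condition h^2(n) + h^2(N+n) = 1, which e.g. the sine window fulfils.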
  • A further window function is disclosed in Table 7.33 of the AC-3 audio coding standard.
  • When the transform length is switched, transition window functions are used, e.g. as described in B. Edler, "Codierung von Audiosignalen mit überlappender Transformation und adaptiven Fensterfunktionen", Frequenz, vol. 43, pp. 252-256, 1989, or as used in mp3 and described in the MPEG-1 standard ISO/IEC 11172-3, in particular section 2.4.3.4.10.3, or as in AAC (e.g. as described in the MPEG-4 standard ISO/IEC 14496-3, Subpart 4).
  • The received or replayed bit stream OBS is demultiplexed in a corresponding step/stage 37, thereby providing the code (if present) for the speech decoder 33, the switch information SWI for switch 35, the code and the switching information for the decoding step/stage 36, and optionally side information code SI.
  • If the speech subcoder 11, 12, 13, 14 was used at encoding side for a current data frame, in that frame the corresponding encoded speech band frequency bins are reconstructed by the speech decoding step/stage 33 and the downstream MLT step/stage 34, thereby providing the reconstructed speech signal RSS.
  • The remaining encoded frequency bins are correspondingly decoded in decoding step/stage 36, whereby the encoder-side quantisation operation is reversed.
  • The speech/audio switch 35 operates corresponding to its operation at encoding side, controlled by switch information SWI.
  • If switch information SWI indicates that a music-like input signal is present in the current frame and therefore the speech coding/decoding was not used, the frequency bins corresponding to the low band are decoded together with the remaining frequency bins in the decoding step/stage 36, thereby providing the reconstructed remaining band signal RRBS and the reconstructed low band signal RLBS.
  • The outputs of step/stage 36 and of switch 35 are correspondingly combined in inverse MLT (e.g. iMDCT) step/stage 30 and are synthesised in order to provide the decoded output signal OS.
  • In switch 35, but also in the other steps/stages, the different delays introduced by the cascaded transforms are to be taken into account. The different delays can be balanced using corresponding buffering for these steps/stages.
  • In 'short block' mode, several temporally successive 'short block' spectra are to be decoded in step/stage 36 and collected in a first grouping step/stage 43. Overlap/add is performed (optionally involving application of window functions). Thereafter each set of temporally successive spectral coefficients is transformed using the corresponding MLT steps/stages 42, which provide a halved number of output values. The generated spectral coefficients are then grouped in a second grouping step/stage 41 to one MLT spectrum with the initial high frequency resolution and transform length.
  • Multiple 'short block decoding' modes with different MLT transform lengths can be used as signalled in SI, whereby a non-uniform time-frequency resolution over the short block spectra is facilitated, e.g. a higher time resolution for high frequencies and a higher frequency resolution for low frequencies.
  • A different cascading of the MLTs can be used wherein the order of the inner MLT/inverse MLT pair in the speech encoder is switched.
  • In Fig. 5 a block diagram of a corresponding encoding is depicted, wherein Fig. 1 reference signs denote the same operations as in Fig. 1.
  • The inverse MLT 11 is replaced by an MLT step/stage 51, and the MLT 14 is replaced by an inverse MLT step/stage 54 (i.e. an 'opposite type' MLT). Due to the exchanged order of these MLTs the speech encoder input signal has different properties compared to those in Fig. 1. Therefore the speech coder 52 and the speech decoder 53 are adapted to these different properties (e.g. such that aliasing components are cancelled out).
  • A 'short block mode' processing can be used as shown in Fig. 6, wherein MLT steps/stages 62 corresponding to those in Fig. 4 replace the inverse MLT steps/stages 22 in Fig. 2.
  • At decoding side, the speech decoding step/stage 33 in Fig. 3 is replaced by a correspondingly adapted speech decoding step/stage 73 and the MLT step/stage 34 in Fig. 3 is replaced by a corresponding inverse MLT step/stage 74.
  • A 'short block mode' processing can be used as shown in Fig. 8, wherein inverse MLT steps/stages 82 corresponding to those in Fig. 1 replace the MLT steps/stages 42 in Fig. 4.
  • In a further embodiment a different way of block switching is carried out: instead of a fixed large MLT 10 (e.g. an MDCT), several short MLTs (or MDCTs) 90 can be switched on, adaptively to the input signal. For example, 8 short MDCTs with a transform length of 256 samples each can be used.
  • It is not required that the sum of the lengths of the short transforms is equal to the long transform length, although such a choice makes the buffer handling even easier.
  • In this embodiment the internal buffer handling is easier than for the long/short block mode switching according to Figures 1 to 8, at the cost of a less sharp band separation between the speech frequency band and the remaining frequency band.
  • The reason for the internal buffer handling being easier is as follows: at least one additional buffer is required for each inverse MLT operation, which in case of an inner transform leads to the necessity of an additional buffer also in the parallel high frequency path. Therefore the switching at the outermost transform has the least side effects concerning buffers.
  • In Fig. 9, the Fig. 1 reference signs denote the same operations as in Fig. 1.
  • The MLT 10 is replaced, adaptively to input signal IS, by short MLT steps/stages 90, the inverse MLT 11 is replaced by shorter inverse MLT steps/stages 91, and the MLT 14 is replaced by shorter MLT steps/stages 94. Due to this kind of block switching, the lengths of the first transform 90, 30, the second transform 11, 34, 51, 74 (the iMDCT to reconstruct the speech band) and the third transform 14, 54 are coordinated. Furthermore, several short blocks of the speech band signal can be buffered after the iMDCT 91 in Fig. 9 in order to collect enough samples for a complete input frame for the speech coder.
  • The encoding of Fig. 9 can also be adapted correspondingly to the encoding described for Fig. 5.
  • The decoding according to Fig. 3 is adapted correspondingly, i.e. the inverse MLTs 34 and 30 are each replaced by corresponding adaptively switched shorter inverse MLTs.
  • The transform block sections are weighted at encoding side in MLT 90 and at decoding side in inverse MLT 30 by window functions, in particular in an overlapping manner, wherein the length of a window function corresponds to the current transform length.
  • When the transform length is switched, correspondingly shaped long windows (the 'start' and 'stop' windows, or transition windows) are used.


Claims (13)

  1. Verfahren zum Kodieren eines Sprach- und/oder Nicht-Sprach-Audio-Eingangssignals (IS), enthaltend die Schritte:
    - Transformieren (10, 90) von aufeinanderfolgenden und sich gegebenenfalls überlappenden Abschnitten des Eingangssignals (IS) durch wenigstens eine Anfangs-MLT-Transformation und Aufspalten der resultierenden Ausgangsfrequenz-Bins in ein Niederbandsignal und in ein Restbandsignal (RBS);
    - Durchlassen des Niederbandsignals zu einem Sprach/Audio-Schalter (15) und durch eine Sprach-Kodier/Dekodierschleife, die wenigstens eine kurze MLT-Transformation (11, 51, 91) vom ersten Typ, eine Sprachkodierung (12, 52), eine entsprechende Sprachdekodierung (13, 53) und wenigstens eine kurze MLT-Transformation (14, 54, 94) von einem entgegengesetzten Typ als dem der kurzen MLT-Transformation vom ersten Typ enthält;
    - Quantisieren und Kodieren (16) des Restbandsignals (RBS), gesteuert durch ein psycho-akustisches Modell, das an seinem Eingang das Audiosignal (IS) empfängt;
    - Kombinieren (17) des Ausgangssignals der Quantisierung und Kodierung (16), eines Schaltinformations-Signals (SWI) des Schalters (15), gegebenenfalls des Ausgangssignals der Sprachkodierung (12, 52) und wahlweise anderer kodierseitiger Informationen (SI), um für den aktuellen Abschnitt des Eingangssignals (IS) einen Ausgangs-Bitstrom (OBS) zu bilden,
    wobei der Sprach/Audio-Schalter (15) das Niederbandsignal und ein zweites, vom Ausgang der kurzen MLT-Transformation (14, 54, 94) vom zweiten Typ abgeleitetes Eingangssignal (DS) empfängt und entscheidet, ob das zweite Eingangssignal den Quantisierungs- und Kodierungsschritt (16) umgeht oder das Niederbandsignal zusammen mit dem Restbandsignal (RBS) in dem Quantisierungs- und Kodierungsschritt (16) kodiert wird,
    und wobei im letzteren Fall das Ausgangssignal der Sprachkodierung (12, 52) nicht in den aktuellen Abschnitt des Ausgangs-Bitstroms (OBS) eingeschlossen wird.
  2. Vorrichtung zum Kodieren eines Sprach- und/oder Nicht-Sprach-Audio-Eingangssignals (IS) mit Mitteln zum:
    - Transformieren (10, 90) von aufeinanderfolgenden und sich gegebenenfalls überlappenden Abschnitten des Eingangssignals (IS) durch wenigstens eine Anfangs-MLT-Transformation und Aufspalten der resultierenden Ausgangsfrequenz-Bins in ein Niederbandsignal und ein Restbandsignal (RBS);
    - Durchlassen des Niederbandsignals zu einem Sprach/Audio-Schalter (15) und durch eine Sprach-Kodier/Dekodierschleife, die wenigstens eine kurze MLT-Transformation (11, 51, 91) vom ersten Typ, eine Sprachkodierung (12, 52), eine entsprechende Sprachdekodierung (13, 53) und wenigstens eine kurze MLT-Transformation (14, 54, 94) von einem entgegengesetzten Typ als dem der kurzen MLT-Transformation vom ersten Typ enthält;
    - Quantisieren und Kodieren (16) des Restbandsignals (RBS), gesteuert durch ein psycho-akustisches Modell, das an seinem Eingang das Audio-Eingangssignal (IS) empfängt;
    - Kombinieren (17) des Ausgangssignals der Quantisierung und Kodierung (16), eines Schalt-Informationssignals (SWI) des Schalters (15), gegebenenfalls des Ausgangssignals der Sprachkodierung (12, 52) und wahlweise anderer kodierseitiger Informationen (SI), um für den aktuellen Abschnitt des Eingangssignals (IS) einen Ausgangs-Bitstrom (OBS) zu bilden,
    wobei der Sprach/Audio-Schalter (15) das Niederbandsignal und ein zweites, vom Ausgang der kurzen MLT-Transformation (14, 54, 94) vom zweiten Typ abgeleitetes Eingangssignal (DS) empfängt und entscheidet, ob das zweite Eingangssignal den Quantisierungs- und Kodierungsschritt (16) umgeht oder das Niederbandsignal zusammen mit dem Restbandsignal (RBS) in dem Quantisierungs- und Kodierungsschritt (16) kodiert wird,
    und wobei im letzteren Fall das Ausgangssignal der Sprachkodierung (12, 52) nicht in den aktuellen Abschnitt des Ausgangs-Bitstroms (OBS) eingeschlossen wird.
  3. Verfahren zum Dekodieren eines Bitstroms (OBS), der ein kodiertes Sprach- und/oder Nicht-Sprach-Audio-Eingangssignal (IS) darstellt, das nach dem Verfahren von Anspruch 1 kodiert wurde, wobei das Verfahren die Schritte einschließt:
    - Demultiplexen (37) aufeinanderfolgender Abschnitte des Bitstroms (OBS), um das Ausgangssignal der Quantisierung und Kodierung (16), das Schaltinformations-Signal (SWI), gegebenenfalls das Ausgangssignal der Sprachkodierung (12, 52) und die kodierseitigen Informationen (SI), falls vorhanden, wiederzugewinnen;
    - falls in einem aktuellen Abschnitt des Bitstroms (OBS) vorhanden, Durchlassen des Ausgangssignals der Sprachkodierung durch eine Sprachdekodierung (33, 73) und die kurze MLT-Transformation (34, 74) vom zweiten Typ;
    - Dekodieren (36) des Ausgangssignals der Quantisierung und Kodierung (16), gesteuert durch die kodierseitigen Informationen, falls vorhanden, um für den aktuellen Abschnitt ein rekonstruiertes Restbandsignal (RLBS) zu liefern;
    - Vorsehen eines Sprach/Audio-Schalters (15) bei dem rekonstruierten Niederbandsignal und einem zweiten, von dem Ausgang der MLT-Transformation vom zweiten Typ (34, 74) abgeleiteten Eingangssignal (CS), und Durchlassen gemäß dem Schaltinformations-Signal (SWI) entweder des rekonstruierten Niederbandsignals (RLBS) oder des zweiten Eingangssignals (CS);
    - inverse MLT-Transformation (30) des Ausgangssignals des Schalters (15) kombiniert mit dem rekonstruierten Restbandsignal (RRBS), und gegebenenfalls Überlappen aufeinanderfolgender Abschnitte, um einen aktuellen Abschnitt des rekonstruierten Ausgangssignals (OS) zu bilden.
  4. Vorrichtung zum Dekodieren eines Bitstroms (OBS), der ein kodiertes Sprach- und/oder Nicht-Sprach-Audio-Eingangssignal (IS) darstellt, das nach dem Verfahren von Anspruch 1 kodiert wurde, wobei die Vorrichtung Mittel einschließt zum:
    - Demultiplexen (37) aufeinanderfolgender Abschnitte des Bitstroms (OBS), um das Ausgangssignal der Quantisierung und Kodierung (16), das Schaltinformations-Signal (SWI), gegebenenfalls das Ausgangssignal der Sprachkodierung (12, 52) und die kodierseitigen Informationen, falls vorhanden, wiederzugewinnen;
    - falls in einem aktuellen Abschnitt des Bitstroms (OBS) vorhanden, Durchlassen des Ausgangssignals der Sprachkodierung durch eine Sprach-Dekodierung (33, 73) und die kurze MLT-Transformation (34, 74) vom zweiten Typ;
    - Dekodieren (36) des Ausgangssignals der Quantisierung und Kodierung (16), gesteuert durch die kodierseitigen Informationen, falls vorhanden, um für den aktuellen Abschnitt ein rekonstruiertes Restbandsignal (RRBS) und ein rekonstruiertes Niederbandsignal (RLBS) zu liefern;
    - Vorsehen eines Sprach/Audio-Schalters (15) bei dem rekonstruierten Niederbandsignal und einem zweiten, von dem Ausgang der MLT-Transformation vom zweiten Typ (34, 74) abgeleiteten Eingangssignal (CS), und Durchlassen gemäß dem Schaltinformations-Signal (SWI) entweder des rekonstruierten Niederbandsignals (RLBS) oder des zweiten Eingangssignals (CS);
    - inverse MLT-Transformation (30) des Ausgangssignals des Schalters (15) kombiniert mit dem rekonstruierten Restbandsignal (RRBS) und gegebenenfalls Überlappen aufeinanderfolgender Abschnitte, um einen aktuellen Abschnitt des rekonstruierten Ausgangssignals (OS) zu bilden.
  5. Verfahren nach Anspruch 1 oder 3 oder Vorrichtung nach Anspruch 2 oder 4, bei dem bzw. bei der für den Fall, dass eine einzelne MLT-Transformation (10) am Eingang der Kodierung verwendet wird und eine einzelne inverse MLT-Transformation (30) am Ausgang der Dekodierung verwendet wird, das Eingangssignal (IS) adaptiv am Eingang der Quantisierung und Kodierung (16) liegt und am Ausgang der Kodierung (36) mehrere kurze MLT-Transformationen ausgeführt werden, deren Länge jeweils kleiner ist als die Länge der einzelnen MLT-Transformation (10) bzw. der einzelnen inversen MLT-Transformation (30):
    entweder kurze inverse MLT-Transformationen (22) am Eingang der Quantisierung und Kodierung und kurze MLT-Transformationen (22) am Ausgang der Dekodierung (36),
    oder kurze MLT-Transformationen (62) am Eingang der Quantisierung und Kodierung (16) und kurze inverse MTL-Transformationen (82) am Ausgang der Dekodierung (36).
  6. Method or apparatus according to claim 5, wherein the short MLT transforms or the short inverse MLT transforms are carried out when the signal energy in a current section of the input signal (IS) exceeds a threshold level.
  7. Method according to claim 1 or 3, or apparatus according to claim 2 or 4, wherein at the input of the encoding the input signal (IS) is switched adaptively from a single MLT transform (10) to multiple shorter MLT transforms (90), and at the output of the decoding (36) correspondingly from a single inverse MLT transform (30) to multiple shorter inverse MLT transforms.
  8. Method or apparatus according to claim 7, wherein the multiple shorter MLT transforms or the multiple shorter inverse MLT transforms are carried out when the signal energy in a current section of the input signal (IS) exceeds a threshold level.
  9. Method according to one of claims 1, 3 and 5 to 8, wherein the second input signal (DS) is the difference signal between the low-band signal and the output signal (RSS) of the MLT transform (14, 54, 94) of the second type.
  10. Method according to one of claims 1, 3 and 5 to 8, or apparatus according to one of claims 2 and 4 to 8, wherein the second input signal (DS) is the output signal (RSS) of the MLT transform of the second type (14, 54, 94).
  11. Method according to one of claims 1, 3 and 5 to 10, or apparatus according to one of claims 2 and 4 to 10, wherein the switching (15) is controlled by information received from the psycho-acoustic model (18).
  12. Method according to one of claims 1, 3 and 5 to 11, or apparatus according to one of claims 2 and 4 to 11, wherein the switching (15) is operated using a rate-distortion optimisation.
  13. Method according to one of claims 1, 3 and 5 to 12, or apparatus according to one of claims 2 and 4 to 12, wherein successive sections of the input signal (IS) and successive sections for the output signal (OS) are weighted by a window function whose length corresponds to the relevant transform length, in particular in an overlapping manner, and wherein, when the transform length is switched, corresponding transition window functions are used.
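The MLT named in these claims is commonly realised as an MDCT with a sine window. As an illustrative aid only (this is a generic sketch, not the patented implementation), the following pure-Python code shows such a transform and its inverse; overlap-adding successive half-overlapping sections, as the claims describe, cancels the time-domain aliasing and reconstructs every fully covered sample exactly.

```python
import math

def sine_window(N):
    # Analysis/synthesis window of length 2N satisfying the
    # Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]

def mlt(block, win):
    # Forward MLT/MDCT: 2N windowed input samples -> N coefficients.
    N = len(block) // 2
    return [sum(win[n] * block[n]
                * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imlt(coeffs, win):
    # Inverse MLT/MDCT: N coefficients -> 2N windowed output samples;
    # the time-domain aliasing they contain cancels under 50% overlap-add.
    N = len(coeffs)
    return [(2.0 / N) * win[n]
            * sum(coeffs[k]
                  * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                  for k in range(N))
            for n in range(2 * N)]
```

With block length 2N and a hop of N samples, summing the overlapping halves of consecutive inverse transforms reproduces the interior of the signal, which is the overlap behaviour referred to in claim 4.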
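Claims 6 and 8 make the switch to shorter transforms conditional on the signal energy of the current input section exceeding a threshold level. A minimal sketch of such a decision rule (the function name, default lengths and threshold value are illustrative assumptions, not values from the patent):

```python
def choose_transform_length(section, long_len=1024, short_len=128,
                            threshold=0.1):
    # Switch to the shorter transforms (better time resolution for
    # transients) when the mean energy of the current section exceeds
    # the threshold level; otherwise keep the single long transform.
    # All parameter values here are illustrative placeholders.
    energy = sum(s * s for s in section) / len(section)
    return short_len if energy > threshold else long_len
```

In an encoder along the lines of claim 7, such a decision would also select the matching window functions, including the transition windows of claim 13.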
EP08159018A 2008-06-25 2008-06-25 Method and apparatus for encoding and decoding of speech and/or non-speech audio input signals Expired - Fee Related EP2139000B1 (de)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP08159018A EP2139000B1 (de) 2008-06-25 2008-06-25 Method and apparatus for encoding and decoding of speech and/or non-speech audio input signals
CN2009101503026A CN101615393B (zh) 2008-06-25 2009-06-19 Method and device for encoding or decoding speech and/or non-speech audio input signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP08159018A EP2139000B1 (de) 2008-06-25 2008-06-25 Method and apparatus for encoding and decoding of speech and/or non-speech audio input signals

Publications (2)

Publication Number Publication Date
EP2139000A1 EP2139000A1 (de) 2009-12-30
EP2139000B1 true EP2139000B1 (de) 2011-05-25

Family

ID=39718977

Family Applications (1)

Application Number Title Priority Date Filing Date
EP08159018A Expired - Fee Related EP2139000B1 (de) 2008-06-25 2008-06-25 Method and apparatus for encoding and decoding of speech and/or non-speech audio input signals

Country Status (2)

Country Link
EP (1) EP2139000B1 (de)
CN (1) CN101615393B (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10504532B2 (en) 2014-05-07 2019-12-10 Samsung Electronics Co., Ltd. Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN102074242B (zh) * 2010-12-27 2012-03-28 Wuhan University System and method for extracting the core-layer residual in hierarchical speech/audio hybrid coding
CN102103859B (zh) * 2011-01-11 2012-04-11 Southeast University Digital audio encoding and decoding method and apparatus
CN102737636B (zh) * 2011-04-13 2014-06-04 Huawei Technologies Co., Ltd. Audio encoding method and apparatus
CN103198834B (zh) * 2012-01-04 2016-12-14 China Mobile Communications Corporation Audio signal processing method, apparatus and terminal
KR20240010550A (ko) 2014-03-28 Samsung Electronics Co., Ltd. Method and apparatus for quantizing linear prediction coefficients, and method and apparatus for dequantizing them
CN107424622B (zh) * 2014-06-24 2020-12-25 Huawei Technologies Co., Ltd. Audio encoding method and apparatus
CN106033982B (zh) * 2015-03-13 2018-10-12 China Mobile Communications Corporation Method, apparatus and terminal for implementing super-wideband speech interworking

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US6658383B2 (en) * 2001-06-26 2003-12-02 Microsoft Corporation Method for coding speech and music signals
WO2003065353A1 (en) * 2002-01-30 2003-08-07 Matsushita Electric Industrial Co., Ltd. Audio encoding and decoding device and methods thereof
KR100467617B1 (ko) * 2002-10-30 2005-01-24 Samsung Electronics Co., Ltd. Digital audio encoding method using an improved psychoacoustic model, and apparatus therefor
DE10328777A1 (de) * 2003-06-25 2005-01-27 Coding Technologies Ab Apparatus and method for encoding an audio signal, and apparatus and method for decoding an encoded audio signal
CN1471236A (zh) * 2003-07-01 2004-01-28 Beijing Fuguo Digital Technology Co., Ltd. Signal-adaptive multi-resolution filter bank for perceptual audio coding

Cited By (3)

Publication number Priority date Publication date Assignee Title
US10504532B2 (en) 2014-05-07 2019-12-10 Samsung Electronics Co., Ltd. Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same
US11238878B2 (en) 2014-05-07 2022-02-01 Samsung Electronics Co., Ltd. Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same
US11922960B2 (en) 2014-05-07 2024-03-05 Samsung Electronics Co., Ltd. Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same

Also Published As

Publication number Publication date
CN101615393B (zh) 2013-01-02
EP2139000A1 (de) 2009-12-30
CN101615393A (zh) 2009-12-30

Similar Documents

Publication Publication Date Title
EP2139000B1 (de) Method and apparatus for encoding and decoding of speech and/or non-speech audio input signals
EP2255358B1 (de) Scalable speech and audio coding using combinatorial coding of the MDCT spectrum
Neuendorf et al. Unified speech and audio coding scheme for high quality at low bitrates
EP2311032B1 (de) Audio encoder and decoder for encoding and decoding audio samples
EP2301020B1 (de) Apparatus and method for encoding/decoding an audio signal using an aliasing switching scheme
EP2186088B1 (de) Low-complexity spectral analysis/synthesis using selectable time resolution
RU2507572C2 (ru) Audio encoding apparatus and decoder for encoding and decoding frames of a quantized audio signal
EP2044589B1 (de) Method and apparatus for lossless encoding of a source signal using a lossy-encoded data stream and a lossless extension data stream
CN101371296B (zh) Apparatus and method for encoding and decoding a signal
US20130173275A1 (en) Audio encoding device and audio decoding device
US9240192B2 (en) Device and method for efficiently encoding quantization parameters of spectral coefficient coding
AU2013200679B2 (en) Audio encoder and decoder for encoding and decoding audio samples
Motlicek et al. Frequency domain linear prediction for QMF sub-bands and applications to audio coding
EP3002751A1 (de) Audio encoder and decoder for encoding and decoding audio samples
Motlicek et al. Scalable wide-band audio codec based on frequency domain linear prediction
Motlicek et al. Scalable Wide-band Audio Codec based on Frequency Domain Linear Prediction (version 2)
Quackenbush MPEG Audio Compression Future
KR19980036961A (ko) Speech encoding and decoding apparatus and method therefor

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA MK RS

17P Request for examination filed

Effective date: 20100223

17Q First examination report despatched

Effective date: 20100324

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

AKX Designation fees paid

Designated state(s): DE FR GB

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 19/02 20060101ALI20100830BHEP

Ipc: G10L 19/14 20060101AFI20100830BHEP

Ipc: G10L 19/04 20060101ALI20100830BHEP

Ipc: G10L 11/02 20060101ALI20100830BHEP

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: THOMSON LICENSING

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602008007198

Country of ref document: DE

Effective date: 20110707

REG Reference to a national code

Ref country code: DE

Ref legal event code: R084

Ref document number: 602008007198

Country of ref document: DE

REG Reference to a national code

Ref country code: GB

Ref legal event code: 746

Effective date: 20110627

REG Reference to a national code

Ref country code: DE

Ref legal event code: R084

Ref document number: 602008007198

Country of ref document: DE

Effective date: 20110622

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20120228

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602008007198

Country of ref document: DE

Effective date: 20120228

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 8

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20150626

Year of fee payment: 8

Ref country code: DE

Payment date: 20150625

Year of fee payment: 8

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20150622

Year of fee payment: 8

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 602008007198

Country of ref document: DE

Representative's name: KASTEL PATENTANWAELTE, DE

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 602008007198

Country of ref document: DE

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 602008007198

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20160625

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20170228

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160630

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170103

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160625