EP2245621B1 - Method and means for encoding background noise information - Google Patents

Method and means for encoding background noise information

Publication number
EP2245621B1
EP2245621B1 EP09711908.5A EP09711908A EP2245621B1 EP 2245621 B1 EP2245621 B1 EP 2245621B1 EP 09711908 A EP09711908 A EP 09711908A EP 2245621 B1 EP2245621 B1 EP 2245621B1
Authority
EP
European Patent Office
Prior art keywords
sid
background noise
narrowband
speech
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP09711908.5A
Other languages
German (de)
French (fr)
Other versions
EP2245621A1 (en
Inventor
Herve Taddei
Stefan Schandl
Panji Setiawan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unify GmbH and Co KG
Original Assignee
Unify GmbH and Co KG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unify GmbH and Co KG
Publication of EP2245621A1
Application granted
Publication of EP2245621B1
Active legal status (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Definitions

  • the invention relates to methods and means for encoding background noise information in speech signal coding methods.
  • Such a limited frequency range is also provided in many speech signal coding methods for today's digital telecommunications.
  • Prior to a coding process, a bandwidth limitation of the analog signal is performed for this purpose.
  • a codec is used which, due to the described bandwidth limitation in the frequency range between 300 Hz and 3400 Hz, is also referred to below as narrow-band speech codec (Narrow Band Speech Codec).
  • the term codec is understood to mean both the coding rule for the digital coding of audio signals and the decoding rule for the decoding of data with the aim of reconstructing the audio signal.
  • a narrowband speech codec is known from ITU-T Recommendation G.729.
  • a transmission of a narrowband speech signal with a data rate of 8 kbit/s is provided.
  • broadband speech codecs (Wide Band Speech Codec)
  • Such an extended frequency range lies, for example, between 50 Hz and 7000 Hz.
  • a wideband voice codec is known from ITU-T Recommendation G.729.EV.
  • coding methods for broadband speech codecs are made scalable.
  • by scalability it is meant here that the transmitted coded data include various demarcated blocks containing the narrowband portion, the wideband portion and/or the full bandwidth of the coded voice signal.
  • on the one hand, such a scalable design allows for backward compatibility on the receiver side; on the other hand, it offers a simple way of adapting, on the transmitter and receiver side, the data rate and the size of transmitted data frames when data transmission capacity in the transmission channel is limited.
  • To reduce the data transmission rate, a codec usually compresses the data to be transmitted. Compression is achieved, for example, by coding methods in which parameters for an excitation signal and filter parameters are determined in order to encode the speech data. The filter parameters and the parameters specifying the excitation signal are then transmitted to the receiver. There, the codec is used to synthesize a speech signal that is as similar as possible to the original speech signal in terms of subjective hearing impression. With this method, also known as "analysis-by-synthesis", it is not the determined and digitized samples themselves that are transmitted, but determined parameters that allow a receiver-side synthesis of the speech signal.
  • a further measure for reducing the data transmission rate is provided by a method for discontinuous transmission, which is also familiar in the art under the term DTX.
  • the basic goal of DTX is to reduce the data transmission rate during a speech pause.
  • voice activity detection (VAD) is used on the part of the transmitter, which detects a speech pause when the signal falls below a certain level.
  • the receiver usually does not expect complete silence during a speech pause.
  • a complete silence on the receiver side would lead to irritation or even to the suspicion of a breakdown of the connection.
  • methods for generating a so-called comfort noise are applied.
  • Comfort noise is noise that is synthesized to fill silence phases on the receiver's side.
  • the comfort noise conveys the subjective impression of a continuing connection, without taking up the data transmission rate intended for the transmission of speech signals. In other words, less effort is spent on the transmitter-side coding of the noise than on coding the speech data. For a synthesis of the comfort noise that is still perceived as realistic on the receiver side, data are transmitted at a much lower data rate.
  • the data transmitted here are also referred to in the art as SID (Silence Insertion Description).
  • G.729.1 SID has an embedded structure with a core SID equal to the G.729 SID and a first and a second enhancement layer.
  • the first enhancement layer adds some parameters for narrowband comfort noise, while the second enhancement layer adds wideband information, with the SID much smaller than any other frame.
  • a marker (M) bit should be set to 1 when using DTX in the RTP header.
  • ITU-T standard G.729.1 (05/2006): SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS, Digital terminal equipments - Coding of analogue signals by methods other than PCM, "G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729", ITU-T RECOMMENDATION G.729.1, approved on 29 May 2006 (2006-05-29) by ITU-T Study Group 16 (2005-2008), International Telecommunication Union, Geneva, CH, ITU-T Rec.
  • the encoding of the background noise information occurs over either the entire bandwidth of the input noise signal or over a portion of the bandwidth of the input noise signal.
  • the encoded noise signal is transmitted in the form of SID frames via the DTX method and reconstructed on the receiver side.
  • the reconstructed, i.e. synthesized, comfort noise may therefore have a different quality than the speech information synthesized on the receiver side. This adversely affects the listener's perception.
  • the object of the invention is to provide an improved implementation of the DTX method in scalable speech codecs.
  • a basic idea of the invention is to provide the scalability known for the transmission of speech information analogously to the formation of a SID frame.
  • the inventive method for encoding a SID frame for transmission of background noise information using a scalable speech signal encoding method provides for encoding a narrowband first and a wideband second portion of the background noise information.
  • the encoding is usually done at the same time and in different ways. However, one portion can of course also be encoded with a time offset, before or after another portion is encoded. Likewise, the two portions can optionally be encoded in the same way.
  • a SID frame is formed with separate regions for the first and the second component. In other words, in the SID frame, this means that a first data area receives the data for the encoded first portion, while a second data area separate from it receives the data for the encoded second portion.
  • An essential advantage of the invention is that it can be determined on the receiver side whether comfort noise should be based on the broadband portion of the transmitted SID frames or on the basis of the narrowband portion.
  • This is of particular advantage for receiver-side listening in a situation where the transmission rate for speech information frames has been reduced so that only narrowband speech information is transmitted. If, as in the current state of the art, narrowband speech information is synthesized together with broadband noise, this is very irritating for the listener.
  • the said reduction of the transmission rate for speech information frames can be caused, for example, by a high congestion of the network between transmitter and receiver.
  • the much smaller SID frames are not affected by such a network bottleneck. For them, there is no compulsion to reduce their data transfer rate or their content.
  • a third portion is provided in the definition of the SID frame.
  • This contains encoded background noise parameters, which are encoded with an increased data rate, although the third component still contains narrowband data (extended narrowband data or "enhanced low band").
  • the advantage of defining the SID frame with this third component is the ability to reproduce a noise signal in a quality enhanced in comparison to conventional narrowband coding while remaining in compliance with the G.729.B standard.
  • the single FIGURE shows a structure of a SID frame according to the invention.
  • discontinuous transmission (DTX) methods for the transmission of background noise information currently do not support the scalable character intended for the transmission of the speech information.
  • on the one hand narrowband speech codecs such as 3GPP AMR and ITU-T G.729, and on the other hand broadband speech codecs such as 3GPP AMR-WB and ITU-T G.722.
  • a narrow-band speech codec encodes speech signals at a sampling frequency of 8 kHz with a bandwidth which is usually in the frequency range between 300 and 3400 Hz.
  • a wideband speech codec encodes a speech signal having a sampling frequency of 16 kHz at a bandwidth in a frequency range between 50 and 7000 Hz.
  • Some of these codecs use DTX techniques, that is, discontinuous transmission techniques to reduce the overall transmission rate in the communication channel.
  • SID frames are transmitted, with the bandwidth of the SID frames corresponding to the bandwidth of the voice signal.
  • the background noise during a speech break is described.
  • This codec G.729.1 is a scalable speech codec in which the DTX method is currently not scalable across the entire bandwidth.
  • during an active speech period, in contrast to a speech pause recognized as a "Silent Period", the coding method can be characterized as follows:
  • the speech signal is split into two parts, namely a narrowband (lowband) part and a broadband (highband) part. Both signals are sampled at a sampling frequency of 8 kHz.
  • the division into a narrowband and a broadband component takes place in a special bandpass filter, which is also referred to as QMF (Quadrature Mirror Filter).
  • the narrowband portion of the speech signal is encoded at a data rate of 8 and 12 kbit/s.
  • CELP (Code Excited Linear Prediction)
  • the narrowband component is further modified taking into account the "Transform Codec" section of G.729.1.
  • the broadband portion of the current frame, again assuming it contains voice signals, is encoded at a data rate of 14 kbit/s using the TDBWE (Time Domain Bandwidth Extension) method.
  • the speech signal is also split into a narrowband and a broadband component, with both components sampled at a frequency of 8 kHz.
  • the decomposition also takes place via a QMF filter.
  • the narrowband portion is encoded using narrow-band SID information.
  • This narrowband SID information is sent to the receiver at a later time in a SID frame compatible with the G.729 standard. Further measures as described above can contribute to an improvement of the narrowband SID component.
  • the broadband component is encoded using a modified TDBWE method.
  • the speech signal is further encoded at a data rate of 14 kbit/s, while at the same time the background noise detected during the speech pause is evaluated and corresponding parameters are set.
  • the background noise is evaluated with regard to the energy of the noise signal and its frequency distribution.
  • the temporal fine structure is not evaluated, but merely an average of the energy is formed over the frame.
  • The FIGURE shows a SID frame with separate areas for a narrowband first portion LB ("Low Band"), a broadband second portion HB ("High Band") and an intermediate third portion ELB ("Enhanced Low Band").
  • the first component LB contains encoded background noise parameters, which are encoded at a data rate of 8 kbit/s or below.
  • the data length of the first component LB is, for example, 15 bits.
  • the second component HB contains encoded background noise parameters, which are encoded with a data rate between 14 kbit/s and 32 kbit/s.
  • the data length of the second component HB is, for example, 19 bits.
  • the third component ELB contains encoded background noise parameters, which are encoded with a data rate of greater than 8 kbit/s, for example 12 kbit/s.
  • the data length of the third component ELB is, for example, 9 bits.
  • characteristics of the background noise are learned on the part of the encoder.
  • the characteristics include in particular the temporal distribution as well as the spectral form of the background noise.
  • a filtering method is used, which takes into account temporal and spectral parameters of background noise from previous frames. If there are significant changes in the character or magnitude of background noise, a decision is made based on threshold values as to whether there is a need to update the learned parameters.
  • the following procedure is carried out:
  • a "regular" frame, i.e. a frame containing a speech signal, is received
  • the data rate for such regular frames is usually 8 kbit/s or above.
  • comfort noise is synthesized, and in the case of a wideband SID, a broadband comfort noise is synthesized and output with a read-out gain.
  • the embodiments relate to further details for incorporating the DTX method into broadband codecs such as G.729.1, and to further methods for modifying the TDBWE method which support synthesizing comfort noise during non-active frames, i.e. frames without speech information.
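
The SID frame with separate LB, ELB and HB regions described above can be illustrated with a minimal Python sketch. The bit lengths (15, 9 and 19 bits) are the example values given in the text. Everything else is an assumption made for illustration: the region order LB, ELB, HB, the bit-string representation, the function names, and the simple rule by which the receiver picks narrowband or wideband comfort noise.

```python
# Region widths in bits: the example lengths given in the text.
LB_BITS, ELB_BITS, HB_BITS = 15, 9, 19

# Assumed region order (not specified in the text): LB first, then ELB, then HB.
REGIONS = (("lb", LB_BITS), ("elb", ELB_BITS), ("hb", HB_BITS))


def pack_sid(lb: int, elb: int, hb: int) -> str:
    """Pack the three encoded portions into one bit string with separate regions."""
    out = ""
    for value, (_, width) in zip((lb, elb, hb), REGIONS):
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit its region")
        out += format(value, f"0{width}b")
    return out


def unpack_sid(bits: str) -> dict:
    """Recover every region that is completely present in a (possibly truncated) frame."""
    regions = {}
    pos = 0
    for name, width in REGIONS:
        if len(bits) < pos + width:
            break  # this region and all later ones were cut off
        regions[name] = int(bits[pos:pos + width], 2)
        pos += width
    return regions


def comfort_noise_mode(regions: dict, speech_is_wideband: bool) -> str:
    """Hypothetical receiver rule: match the comfort noise bandwidth to the speech."""
    if "lb" not in regions:
        raise ValueError("SID frame lacks the narrowband core region")
    if speech_is_wideband and "hb" in regions:
        return "wideband"
    return "narrowband"


frame = pack_sid(lb=1000, elb=100, hb=50000)
print(len(frame), comfort_noise_mode(unpack_sid(frame), speech_is_wideband=False))
```

Because the regions are separate, a receiver that currently plays narrowband speech can ignore the HB region even though it was transmitted, which avoids combining narrowband speech with broadband noise.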

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephonic Communication Services (AREA)
  • Mobile Radio Communication Systems (AREA)

Description

The invention relates to methods and means for encoding background noise information in speech signal coding methods.

For telephone calls, a bandwidth limitation for analog voice transmission has been provided since the beginnings of telecommunications. Voice is transmitted over a restricted frequency range of 300 Hz to 3400 Hz.

Such a restricted frequency range is also provided in many speech signal coding methods for today's digital telecommunications. For this purpose, the bandwidth of the analog signal is limited before coding. Coding and decoding use a codec which, because of the described bandwidth limitation to the frequency range between 300 Hz and 3400 Hz, is also referred to below as a narrowband speech codec (Narrow Band Speech Codec). The term codec covers both the coding rule for the digital coding of audio signals and the decoding rule for decoding data with the aim of reconstructing the audio signal.

A narrowband speech codec is known, for example, from ITU-T Recommendation G.729. The coding rule described there provides for the transmission of a narrowband speech signal at a data rate of 8 kbit/s.

Furthermore, so-called wideband speech codecs (Wide Band Speech Codec) are known, which provide coding in an extended frequency range in order to improve the auditory impression. Such an extended frequency range lies, for example, between 50 Hz and 7000 Hz. A wideband speech codec is known, for example, from ITU-T Recommendation G.729.EV.

Coding methods for wideband speech codecs are usually designed to be scalable. Scalability here means that the transmitted coded data contain various delimited blocks containing the narrowband portion, the wideband portion and/or the full bandwidth of the coded speech signal. On the one hand, such a scalable design permits backward compatibility on the receiver side; on the other hand, it offers a simple way of adapting, on the transmitter and receiver side, the data rate and the size of transmitted data frames when data transmission capacity in the transmission channel is limited.
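
The adaptation of frame size described above can be sketched as follows. This is a minimal illustration, assuming that the delimited blocks are ordered with the narrowband core first, so that any leading run of blocks remains decodable; the block labels and sizes are hypothetical and not taken from any standard.

```python
from typing import List, Tuple

Layer = Tuple[str, bytes]  # (label, payload); both are illustrative


def truncate_to_budget(layers: List[Layer], budget_bytes: int) -> List[Layer]:
    """Keep leading blocks while they fit into the budget and drop the rest.

    Assumes the blocks are ordered core first, so every prefix is decodable.
    """
    kept: List[Layer] = []
    used = 0
    for label, payload in layers:
        if used + len(payload) > budget_bytes:
            break
        kept.append((label, payload))
        used += len(payload)
    return kept


# Hypothetical frame; the sizes are made up for the example.
frame = [
    ("narrowband core", bytes(20)),
    ("narrowband enhancement", bytes(10)),
    ("wideband enhancement", bytes(5)),
]

for budget in (35, 30, 25):
    print(budget, [label for label, _ in truncate_to_budget(frame, budget)])
```

With a reduced budget only the wideband block is dropped first, which corresponds to the situation in which a receiver still obtains narrowband speech from the same bitstream.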

To reduce the data transmission rate, a codec usually compresses the data to be transmitted. Compression is achieved, for example, by coding methods in which parameters for an excitation signal and filter parameters are determined in order to encode the speech data. The filter parameters and the parameters specifying the excitation signal are then transmitted to the receiver. There, the codec is used to synthesize a speech signal that is as similar as possible to the original speech signal in terms of subjective auditory impression. With this method, also known as "analysis-by-synthesis", it is not the determined and digitized samples themselves that are transmitted, but determined parameters that enable a receiver-side synthesis of the speech signal.
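
The principle of analysis-by-synthesis can be illustrated with a toy Python sketch. The encoder runs every candidate excitation through the synthesis filter, compares the result with the target signal and transmits only the index and gain of the best candidate. The codebook, the filter coefficients and the function names are assumptions chosen for illustration; a real codec such as G.729 is considerably more elaborate.

```python
from typing import List, Sequence, Tuple


def synthesize(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole synthesis filter: y[n] = x[n] - sum_k a[k] * y[n-1-k]."""
    y: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                acc -= coeff * y[n - 1 - k]
        y.append(acc)
    return y


def search_codebook(target: Sequence[float],
                    codebook: Sequence[Sequence[float]],
                    a: Sequence[float]) -> Tuple[int, float]:
    """Return (index, gain) of the excitation whose synthesis is closest to target."""
    best = (0, 0.0)
    best_err = float("inf")
    for index, code in enumerate(codebook):
        shape = synthesize(code, a)
        energy = sum(s * s for s in shape)
        if energy == 0.0:
            continue
        # Least-squares gain; valid because the synthesis filter is linear.
        gain = sum(t * s for t, s in zip(target, shape)) / energy
        err = sum((t - gain * s) ** 2 for t, s in zip(target, shape))
        if err < best_err:
            best_err, best = err, (index, gain)
    return best


a = [-0.9]  # illustrative filter parameter: y[n] = x[n] + 0.9 * y[n-1]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
target = synthesize([0.0, 2.0, 0.0, 0.0], a)  # stands in for the original signal

index, gain = search_codebook(target, codebook, a)
# The receiver rebuilds the signal from the transmitted parameters alone.
decoded = synthesize([gain * c for c in codebook[index]], a)
print(index, round(gain, 6))
```

Only `index`, `gain` and the filter parameters would need to be transmitted here, not the samples of `target` themselves.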

A further measure for reducing the data transmission rate is a method for discontinuous transmission, known in the art as DTX. The basic goal of DTX is to reduce the data transmission rate during a speech pause.

For this purpose, the transmitter uses voice activity detection (VAD), which detects a speech pause when the signal falls below a certain level. The receiver usually does not expect complete silence during a speech pause. On the contrary, complete silence would cause irritation on the receiver side or even the suspicion that the connection has been dropped. For this reason, methods for generating so-called comfort noise are applied.
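
The level-based pause detection described above can be sketched in a few lines of Python. This is a deliberately simplified illustration: the threshold value, the frame length and the function names are assumptions, and practical detectors typically evaluate more than a single level and smooth their decisions over time.

```python
import math
from typing import List, Sequence


def frame_level_db(samples: Sequence[float]) -> float:
    """Mean-square level of one frame in dB (the floor avoids log of zero)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def classify_frames(frames: Sequence[Sequence[float]], threshold_db: float) -> List[str]:
    """Label a frame 'pause' when its level falls below the threshold."""
    return ["speech" if frame_level_db(f) >= threshold_db else "pause" for f in frames]


loud = [0.5, -0.5] * 80      # about -6 dB
quiet = [0.001, -0.001] * 80  # about -60 dB
print(classify_frames([loud, quiet, loud], threshold_db=-40.0))
```

A DTX transmitter would send regular speech frames for the frames labelled `speech` and switch to the much smaller background noise description for those labelled `pause`.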

Comfort noise is noise that is synthesized on the receiver side to fill phases of silence. Comfort noise conveys the subjective impression of a connection that still exists, without taking up the data transmission rate provided for the transmission of speech signals. In other words, less effort is spent on the transmitter-side coding of the noise than on coding the speech data. For a synthesis of the comfort noise that is still perceived as realistic on the receiver side, data are transmitted at a far lower data rate. The data transmitted here are also referred to in the art as SID (Silence Insertion Description).
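
How a receiver can turn a very small description into audible noise can be sketched as follows. The example assumes, purely for illustration, that the transmitted description consists of a single level value in dB and that white Gaussian noise is an acceptable stand-in; an actual SID would also describe the spectral shape of the noise. The function name and parameters are hypothetical.

```python
import math
import random
from typing import List


def comfort_noise(level_db: float, n_samples: int, seed: int = 0) -> List[float]:
    """Generate white noise whose mean-square level equals level_db."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    raw_ms = sum(x * x for x in raw) / n_samples
    target_ms = 10.0 ** (level_db / 10.0)
    scale = math.sqrt(target_ms / raw_ms)  # rescale so the level matches exactly
    return [x * scale for x in raw]


noise = comfort_noise(level_db=-50.0, n_samples=160)
measured = 10.0 * math.log10(sum(x * x for x in noise) / len(noise))
print(round(measured, 3))
```

One number per update is enough to fill a pause in this sketch, which shows why the data rate during speech pauses can be far lower than for speech frames.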

Derzeit in der Entwicklung stehende Codecs konzentrieren sich auf eine skalierbare Enkodierung der Sprachinformation. Mit Hilfe einer skalierbaren Ansatzes wird erreicht, dass das Ergebnis des Enkodiervorgangs verschiedene Blöcke enthält, welche den schmalbandigen Anteil des ursprünglichen Sprachsignals enthalten, den breitbandigen Anteil oder auch die volle Bandbreite des Sprachsignals enthalten, also z.B. einen Frequenzbereich zwischen 50 und 7000 Hz.
In SOLLAUD A: "G.729.1 RTP Payload Format update: DTX support" INTERNET CITATION, [ONLINE] 8. Februar 2008 (2008-02-08), XP002526621, URL: http://tools.ietf.org/id/draft-ietf-avt-rfc4749-dtx-update-00.txt , ist eine Aktualisierung des Real-time Transport Protocol (RTP) Lastformats zur Verwendung für den ITU-T G.729.1 Sprachcodec beschrieben. Die Aktualisierung fügt der RFC 4749 Spezifikation in rückwärts kompatibler Weise eine Unterstützung von DTX hinzu. Als Hintergrundinformation ist aufgeführt, dass G.729.1 SID eine eingebettete Struktur aufweist mit einem Kern-SID gleich dem G.729 SID und einer ersten und zweiten Erweiterungsschicht. Die erste Erweiterungsschicht fügt manche Parameter für Schmalbandkomfortrauschen hinzu, während die zweite Erweiterungsschicht Breitbandinformation hinzufügt, wobei der SID viel kleiner ist als jeder andere Rahmen. Die Bildung der Parameter für Schmalbandkomfortrauschen und der Breitbandinformation ist nicht beschrieben. Ein Marker (M) bit soll bei Verwendung von DTX im RTP header auf eins gesetzt sein.
Zur Sprachpausenerkennung (Voice Activity Detection, VAD) wird hingewiesen auf ITU-T G.729, Annex B (11/96): SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS, Digital transmission systems -Terminal equipments - Coding of analogue signals by methods other than PCM, Coding of speech at 8 kbits/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP), Annex B: "A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70", ITU-T RECOMMENDATION G.729 - Annex B, 1. November 1996 (1996-11-01), Seiten i, ii, iii + 1-16, XP002259964 , insbesondere Abschnitt B.4.1.1.
Weiterhin gibt es den ITU-T Standard G.729.1 (05/2006): SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS, Digital terminal equipments - Coding of analogue signals by methods other than PCM, "G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729", ITU-T RECOMMENDATION G.729.1, approved on 29 May 2006 (2006-05-29) by ITU-T Study Group 16 (2005-2008), International Telecommunication Union, Genf, CH, ITU-T Rec. G.729.1 (05/2006), XP01740459 .
In gegenwärtigen skalierbaren Kodierungsverfahren erfolgt die Enkodierung der Hintergrundrauschinformation entweder über die gesamte Bandbreite des Eingangsrauschsignals oder über einen Ausschnitt aus der Bandbreite des Eingangsrauschsignals. Das enkodierte Rauschsignal wird in Form von SID-Rahmen über das DTX-Verfahren übertragen und empfängerseitig rekonstruiert. Das rekonstruierte, d.h. synthetisierte Komfortrauschen weist also eventuell eine andere Qualität als die empfängerseitig synthetisierte Sprachinformation auf. Dies wirkt sich nachteilig auf die Rezeption des Empfängers aus. Aufgabe der Erfindung ist es, eine verbesserte Implementierung des DTX-Verfahrens in skalierbaren Sprachcodecs anzugeben.
Codecs currently under development focus on scalable encoding of speech information. With the help of a scalable approach, it is achieved that the result of the encoding process contains different blocks which contain the narrow-band component of the original speech signal, the broadband component or also the full bandwidth of the speech signal, eg a frequency range between 50 and 7000 Hz.
In SOLUTION A: "G.729.1 RTP Payload Format update: DTX support" INTERNET CITATION, [ONLINE] February 8, 2008 (2008-02-08), XP002526621, URL: http://tools.ietf.org/id/draft -ietf-avt-rfc4749-dtx-update-00.txt , an actual-time transport protocol (RTP) load format update is described for use with the ITU-T G.729.1 voice codec. The update adds support for DTX in a backward compatible manner to the RFC 4749 specification. As background information, it is stated that G.729.1 SID has an embedded structure with a core SID equal to the G.729 SID and a first and second extension layer. The first enhancement layer adds some parameters for narrowband comfort noise, while the second enhancement layer adds wideband information, with the SID much smaller than any other frame. The formation of parameters for narrow-band comfort noise and broadband information is not described. A marker (M) bit should be set to 1 when using DTX in the RTP header.
For voice activity detection (VAD), reference is made to ITU-T G.729, Annex B (11/96): SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS, Digital transmission systems - Terminal equipments - Coding of analogue signals by methods other than PCM, Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP), Annex B: "A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70", ITU-T RECOMMENDATION G.729 - Annex B, 1 November 1996 (1996-11-01), pages i, ii, iii + 1-16, XP002259964, in particular section B.4.1.1.
Furthermore, there is the ITU-T standard G.729.1 (05/2006): SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS, Digital terminal equipments - Coding of analogue signals by methods other than PCM, "G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729", ITU-T RECOMMENDATION G.729.1, approved on 29 May 2006 (2006-05-29) by ITU-T Study Group 16 (2005-2008), International Telecommunication Union, Geneva, CH, ITU-T Rec. G.729.1 (05/2006), XP01740459.
In current scalable coding methods, the background noise information is encoded either over the entire bandwidth of the input noise signal or over a portion of that bandwidth. The encoded noise signal is transmitted in the form of SID frames using the DTX method and reconstructed at the receiver. The reconstructed, i.e. synthesized, comfort noise may therefore have a different quality than the speech information synthesized at the receiver. This adversely affects the listener's perception. The object of the invention is to provide an improved implementation of the DTX method in scalable speech codecs.

The object is achieved by the subject matter of the independent claims.

A basic idea of the invention is to provide the scalability known from the transmission of speech information analogously in the formation of a SID frame.

The method according to the invention for encoding a SID frame for transmitting background noise information using a scalable speech signal coding method provides for encoding a narrowband first portion and a wideband second portion of the background noise information. The two portions are usually encoded at the same time and in different ways. However, one portion can of course also be encoded with a time offset, before or after the other. Likewise, the two portions can optionally be encoded in the same way.
After the two portions have been encoded, a SID frame is formed with separate regions for the first and the second portion. In other words, a first data area of the SID frame holds the data for the encoded first portion, while a separate second data area holds the data for the encoded second portion.

An essential advantage of the invention is that the receiver can determine whether comfort noise is to be synthesized on the basis of the wideband portion of the transmitted SID frames or on the basis of the narrowband portion. This is particularly advantageous for the acoustic perception at the receiver in a situation where the transmission rate for speech frames has been reduced so far that only narrowband speech information is transmitted. If, as in the current state of the art, narrowband speech information is synthesized in conjunction with wideband noise, this is very irritating for the listener. The reduction of the transmission rate for speech frames can be caused, for example, by high load (congestion) on the network between transmitter and receiver. The much smaller SID frames are not affected by such a network bottleneck, so there is no need to reduce either their data rate or their content.
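
The receiver-side choice described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention; the names `SidFrame` and `select_comfort_noise_band` and the returned labels are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SidFrame:
    """Hypothetical container: the narrowband region is always present,
    the wideband region is optional."""
    lb: bytes
    hb: Optional[bytes] = None


def select_comfort_noise_band(sid: SidFrame, speech_is_wideband: bool) -> str:
    """Pick the comfort-noise bandwidth so that it matches the decoded speech."""
    # Wideband noise is only synthesized if wideband noise data was received
    # AND the speech currently being decoded is wideband as well.
    if speech_is_wideband and sid.hb is not None:
        return "wideband"
    return "narrowband"
```

With this rule, a receiver whose speech frames have been cut back to narrowband during congestion ignores the wideband region of the SID frame, so speech and noise keep the same bandwidth.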

Advantageous developments of the invention are specified in the dependent claims.

According to a first advantageous embodiment of the invention, a third portion is provided in the definition of the SID frame. It contains background noise parameters encoded at an increased data rate, although the third portion still contains narrowband data (enhanced narrowband data, or "Enhanced Low Band"). The advantage of defining the SID frame with this third portion is that a noise signal can be reproduced with higher quality than with conventional narrowband coding while still remaining compliant with the G.729.B standard.

An exemplary embodiment with further advantages and refinements of the invention is explained in more detail below with reference to the drawing.

The single FIGURE shows the structure of a SID frame according to the invention.

In the following, the technical background underlying the invention is first described in more detail without reference to the drawing.

The discontinuous transmission (DTX) methods implemented in current scalable coding methods for wideband speech codecs do not currently support, for the transmission of background noise information, the scalable character that is provided for the transmission of speech information.

As a current workaround, encoding is performed either over the entire bandwidth of the input noise signal or over a portion of that bandwidth. For this reason, there is a need for improved methods.

In the past, mainly two types of speech codecs were developed: narrowband speech codecs such as 3GPP AMR and ITU-T G.729, and wideband speech codecs such as 3GPP AMR-WB and ITU-T G.722. A narrowband speech codec encodes speech signals sampled at 8 kHz, with a bandwidth that usually lies in the frequency range between 300 and 3400 Hz. A wideband speech codec encodes a speech signal sampled at 16 kHz, with a bandwidth in the frequency range between 50 and 7000 Hz.

Some of these codecs use DTX methods, i.e. discontinuous transmission methods, to reduce the overall transmission rate in the communication channel. In the DTX method, SID frames are sent whose bandwidth corresponds to the bandwidth of the speech signal. A SID frame describes the background noise during a speech pause.

Codecs currently under development focus on scalable coding. With a scalable approach, the output of the encoding process consists of different blocks that contain the narrowband portion of the original speech signal, the wideband portion, or the full bandwidth of the speech signal, e.g. a frequency range between 50 and 7000 Hz. The wideband portion usually starts at a frequency of 4 kHz.

Current DTX methods do not support the scalable nature of codecs. Instead, coding is performed either over the entire bandwidth of the input speech signal or over a portion of the bandwidth of the input signal. For this reason, there is a need for improved methods.

For illustration, the encoding method according to ITU-T standard G.729.1 is described below. The G.729.1 codec is a scalable speech codec in which the DTX method is currently applied in a non-scalable manner over the entire bandwidth.

During an active speech period, as opposed to a speech pause recognized as a "silent period", the coding method can be characterized as follows:
The speech signal is split into two portions, a narrowband (low-band) portion and a wideband (high-band) portion. Both signals are sampled at 8 kHz. The split into a narrowband and a wideband portion is performed by a special bandpass filter, also referred to as a QMF (Quadrature Mirror Filter).
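
The principle of the band split can be illustrated with the shortest possible quadrature mirror filter pair, the two-tap Haar pair. This is only a sketch of the idea that one input signal yields a low-band and a high-band signal at half the input sampling rate each; G.729.1 itself uses a much longer QMF, and the function name `qmf_split` is an assumption.

```python
import math


def qmf_split(samples):
    """Split a signal into a low band and a high band, each at half the input rate.

    Uses the two-tap Haar pair h0 = [1, 1]/sqrt(2) and h1 = [1, -1]/sqrt(2);
    h1 is the mirror image of h0 around a quarter of the sampling rate.
    """
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    scale = 1.0 / math.sqrt(2.0)
    low, high = [], []
    for n in range(0, len(samples), 2):
        a, b = samples[n], samples[n + 1]
        low.append(scale * (a + b))   # low-pass filtering plus decimation by 2
        high.append(scale * (a - b))  # high-pass filtering plus decimation by 2
    return low, high
```

For example, 320 input samples taken at 16 kHz give two signals of 160 samples each, which corresponds to the 8 kHz sampling of both portions mentioned above.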

The narrowband portion of the speech signal is encoded at a data rate of 8 and 12 kbit/s. A CELP (Code Excited Linear Prediction) method is used to encode the speech signal. For data rates above 14 kbit/s, the narrowband portion is further modified using the "transform codec" part of G.729.1. The wideband portion of the current frame, again assuming that it contains speech, is encoded at a data rate of 14 kbit/s using the TDBWE (Time Domain Bandwidth Extension) method. For data rates above 14 kbit/s, the "transform codec" part of G.729.1 is applied.

Since the G.729.1 standard does not provide a method for discontinuous transmission, a workaround is used during speech pauses ("non-active voice periods"), which is described below.

The speech signal is likewise split into a narrowband and a wideband portion, both sampled at 8 kHz. The split is again performed by a QMF filter.

The narrowband portion is encoded as narrowband SID information. This narrowband SID information is later sent to the receiver in a SID frame that is compatible with the G.729 standard. Further measures as described above can help improve the narrowband SID portion.

The wideband portion is encoded using a modified TDBWE method. During a so-called hangover period, the speech signal continues to be encoded at a data rate of 14 kbit/s, while at the same time the background noise detected during the speech pause is evaluated and the corresponding parameters are set. The background noise is evaluated with regard to the energy of the noise signal and its frequency distribution. In contrast to the TDBWE method specified in G.729.1, however, the temporal fine structure is not evaluated; only the average energy over the frame is computed.
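
The simplification described above, one average energy per frame instead of a time envelope with fine structure, can be sketched as follows. The log2-domain scaling (half the base-2 logarithm of the mean square) is an assumption chosen to resemble a logarithmic envelope parameter; the function name and the floor value are likewise hypothetical.

```python
import math


def frame_energy_log2(frame, floor=1e-9):
    """Return a single energy parameter for the whole frame
    (no temporal fine structure is kept)."""
    if not frame:
        raise ValueError("empty frame")
    mean_square = sum(x * x for x in frame) / len(frame)
    # 0.5 * log2(power) equals log2 of the RMS amplitude;
    # the floor avoids taking the logarithm of zero for silent frames.
    return 0.5 * math.log2(max(mean_square, floor))
```

A frame of constant amplitude 4 yields the value 2, and any reordering of the samples within the frame yields the same value, which is exactly the loss of temporal detail that the modified method accepts.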

In the following, an embodiment of the method according to the invention is explained with reference to the FIG.

The FIG shows a SID frame with separate regions for a narrowband first portion LB ("Low Band"), a wideband second portion HB ("High Band") and an intermediate third portion ELB ("Enhanced Low Band").

The first portion LB contains background noise parameters encoded at a data rate of 8 kbit/s or below. The data length of the first portion LB is, for example, 15 bits.

The second portion HB contains background noise parameters encoded at a data rate between 14 kbit/s and 32 kbit/s. The data length of the second portion HB is, for example, 19 bits.

The third portion ELB contains background noise parameters encoded at a data rate greater than 8 kbit/s, for example 12 kbit/s. The data length of the third portion ELB is, for example, 9 bits. The advantage of defining the SID frame with a third portion ELB is that a noise signal can be reproduced with higher quality than with conventional narrowband coding while still remaining compliant with the G.729.B standard.
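
To make the layout concrete, the following sketch packs the three regions into one bit field using the example lengths given above (15, 19 and 9 bits). The order LB, ELB, HB, the MSB-first packing and the function names are assumptions; the description only requires that the regions be separate.

```python
LB_BITS, ELB_BITS, HB_BITS = 15, 9, 19     # example lengths from the description
TOTAL_BITS = LB_BITS + ELB_BITS + HB_BITS  # 43 bits in this example


def pack_sid(lb, elb, hb):
    """Pack the three regions into one integer, LB in the most significant bits."""
    for value, width in ((lb, LB_BITS), (elb, ELB_BITS), (hb, HB_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit its region")
    return (lb << (ELB_BITS + HB_BITS)) | (elb << HB_BITS) | hb


def unpack_sid(word):
    """Recover the (lb, elb, hb) regions from a packed SID word."""
    lb = word >> (ELB_BITS + HB_BITS)
    elb = (word >> HB_BITS) & ((1 << ELB_BITS) - 1)
    hb = word & ((1 << HB_BITS) - 1)
    return lb, elb, hb
```

Because each region occupies its own bit range, a decoder that only needs narrowband comfort noise can read the LB region and ignore the rest.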

During a speech pause, the encoder learns the characteristics of the background noise. These characteristics include, in particular, the temporal distribution and the spectral shape of the background noise. The learning process uses a filtering method that takes into account temporal and spectral parameters of the background noise from previous frames. If the character or level of the background noise changes significantly, a decision is made on the basis of threshold values as to whether the learned parameters need to be updated.
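
A minimal sketch of such a learning step, assuming a first-order recursive (exponential) smoother and a single absolute threshold. The class name `NoiseTracker`, the default smoothing factor 0.25 and the default threshold 2.0 are illustrative assumptions, not values from the description.

```python
class NoiseTracker:
    """Track one smoothed background-noise parameter and flag significant changes."""

    def __init__(self, alpha=0.25, threshold=2.0):
        self.alpha = alpha          # weight given to the newest frame
        self.threshold = threshold  # change that triggers an update
        self.smoothed = None        # filtered parameter value
        self.last_sent = None       # filtered value at the last update

    def observe(self, value):
        """Feed the parameter of one frame; return True if an update is needed."""
        if self.smoothed is None:
            self.smoothed = value
        else:
            self.smoothed = self.alpha * value + (1 - self.alpha) * self.smoothed
        if self.last_sent is None or abs(self.smoothed - self.last_sent) > self.threshold:
            self.last_sent = self.smoothed
            return True
        return False
```

The smoother keeps a single frame with unusual energy from triggering an update, while a sustained change in the noise level eventually crosses the threshold.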

The decoder, i.e. the receiver, proceeds as follows: when a "regular" frame, i.e. a frame containing a speech signal, is received, the usual decoding is performed. The data rate for such regular frames is usually 8 kbit/s or above. When a SID frame is received, comfort noise is synthesized; in the case of a wideband SID, wideband comfort noise is synthesized and output with the gain factor read from the frame.

In the following, the method according to the invention is described with further embodiments of the invention.

The embodiments concern further details on incorporating the DTX method into wideband codecs such as G.729.1, as well as methods for modifying the TDBWE method that support the synthesis of comfort noise during non-active frames, i.e. frames without speech information.

According to one embodiment, the following procedure is provided:
  • Narrowband SID information is produced to generate a G.729- or G.729.B-compatible SID frame (first portion LB of the SID frame according to the invention).
  • Wideband SID information is produced using a modified TDBWE method (second portion HB of the SID frame according to the invention).
  • Optionally, the narrowband and/or the wideband SID information is enhanced.
  • The background noise is analyzed, or "learned", with respect to its energy and/or frequency distribution during a phase preceding the transmission of the first SID frames.
  • SID frames are sent when a significant change in the wideband portion of the background noise is detected or when an update of the narrowband SID information is to be sent.
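
The sending rule in the last item of the list can be written as a small decision function. The frame-type labels and argument names are assumptions introduced for illustration, and the learning phase before the first SID frame is left out.

```python
def dtx_frame_type(voice_active, wideband_noise_changed, narrowband_update_due):
    """Decide what the encoder transmits for the current frame."""
    if voice_active:
        return "SPEECH"   # regular speech frame
    if wideband_noise_changed or narrowband_update_due:
        return "SID"      # SID frame carrying the noise description
    return "NO_DATA"      # nothing is transmitted during the pause
```

A change in either band leads to a complete SID frame being sent, so the narrowband and wideband regions always travel together.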

An implementation of this embodiment takes place in the following phases:
  • A VAD procedure is used to define an active speech phase or pause.
  • If a change in a speech pause is indicated by the VAD method, a hang over period is started. During the overhang period, the data rate of the encoder is reduced to 14kbit / s if the previous data rate has a higher value. In the event that the previous data rate of the encoder has already been values around 12 kbit / s, the data rate is reduced to a value of 8 kbit / s.
  • During the hangover period, the background noise is learned in terms of the narrowband component in an analogous manner to the procedure in standard G.729, but using a higher number of frames. In this case, a filtering method can be optionally applied by which it is achieved that the current frame is assigned a higher importance than the previous frame.
  • During the overhang period, the background noise is also learned in the broadband portion. Optionally, a modified TDBWE method is used to simplify the implementation, in particular to reduce the storage space requirement, which is characterized by a simplified encoding in the time domain. Optionally, a further simplification in the modified TDBWE method can be achieved in that the encoding in the time domain only corresponds to the energy of the signal in the time domain. Another optional simplified encoding is to use spectral smoothing techniques because the energy in the time domain and in the frequency domain gives equal values as a result of the parsevalt theorem. Also in the broadband portion of the background noise, optionally further filtering measures can be applied which have the goal of assigning a higher importance to current frames than previous frames.
  • Upon completion of the hangover period, a first SID frame is sent containing a rough representation of the background noise, as learned during the hangover period.
  • As long as no active phase (speech) has been detected by the VAD, comfort noise is synthesized on the decoder / receiver side based on the received SID frames.
  • Changes in background noise are detected in the narrowband portion of the SID frame, with a similar approach to G.729 being followed, although different parameters are taken into account.
  • In the broadband portion, filtered energy parameters are used to describe the background noise. These include, for example, parameters of the time-domain envelope tenv_fidx and/or parameters of the frequency-domain envelope fenv_fidx[i], where the index idx identifies the respective frame and the frequency-domain envelope is formed by a suitable number of frequency values i = {1, ..., NB_SUBBANDS} to describe the spectral characteristics of the background noise. The filtered energy parameters are derived from the TDBWE parameters defined in G.729.1 using suitable low-pass filters:
    $$\mathrm{TENV\_f}_{idx} = \alpha_{\mathrm{TENV}}\,\mathrm{tenv}_{idx} + (1 - \alpha_{\mathrm{TENV}})\,\mathrm{TENV\_f}_{idx-1}$$
    $$\mathrm{FENV\_f}_{idx}[i] = \alpha_{\mathrm{TENV}}\,\mathrm{fenv}_{idx}[i] + (1 - \alpha_{\mathrm{TENV}})\,\mathrm{FENV\_f}_{idx-1}[i]$$
    which are applied to the time-domain and frequency-domain envelope parameters, respectively.
  • Changes in the broadband energy parameters are monitored and detected by comparing the filtered energy parameters of the current noise signal with two sets of comparison values. The first set consists of the parameters of the previous frame, with index idx-1:
    $$\mathrm{temp\_d} = 20\,\frac{\log 2}{\log 10}\,\left|\mathrm{TENV\_f}_{idx} - \mathrm{TENV\_f}_{idx-1}\right|$$
    $$\mathrm{spec\_d} = 20\,\frac{\log 2}{\log 10}\,\frac{1}{\mathrm{NB\_SUBBANDS}} \sum_{i=1}^{\mathrm{NB\_SUBBANDS}} \left|\mathrm{FENV\_f}_{idx}[i] - \mathrm{FENV\_f}_{idx-1}[i]\right|$$
    The second set consists of the parameters of the last transmitted frame, with index last_tx:
    $$\mathrm{temp\_ch} = 20\,\frac{\log 2}{\log 10}\,\left|\mathrm{TENV\_f}_{idx} - \mathrm{TENV\_f}_{last\_tx}\right|$$
    $$\mathrm{spec\_ch} = 20\,\frac{\log 2}{\log 10}\,\frac{1}{\mathrm{NB\_SUBBANDS}} \sum_{i=1}^{\mathrm{NB\_SUBBANDS}} \left|\mathrm{FENV\_f}_{idx}[i] - \mathrm{FENV\_f}_{last\_tx}[i]\right|$$
    If one of the parameter differences (temp_d, spec_d, temp_ch, spec_ch) exceeds a suitably selected limit, a new SID update frame must be sent.
  • As soon as the VAD detects a speech period, the speech signal is transmitted at the required transmission rate and the synthesis of comfort noise on the decoder side is terminated. Regular decoding operation, as in G.729.1, then resumes.
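
The low-pass filtering of the broadband envelope parameters and the decision to send a new SID update frame, as described in the phases above, can be illustrated with a minimal Python sketch. It is not taken from the patent or from G.729.1: the names `FilteredEnvelope`, `low_pass`, `temporal_diff_db`, `spectral_diff_db` and `needs_sid_update` are hypothetical, a single limit is assumed for all four difference measures, and the scaling factor is assumed to be 20·log 2/log 10, i.e. a conversion of log2-domain envelope values to dB. The filter coefficient and the limit are left to the caller because the text only calls them "suitable".

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Assumed scaling factor 20 * log(2) / log(10): converts a difference of
# log2-domain envelope values to dB.
LOG2_TO_DB = 20.0 * math.log(2.0) / math.log(10.0)


@dataclass
class FilteredEnvelope:
    tenv_f: float        # filtered time-domain envelope (TENV_f)
    fenv_f: List[float]  # filtered frequency-domain envelope (FENV_f[i])


def low_pass(prev: FilteredEnvelope, tenv: float, fenv: Sequence[float],
             alpha: float) -> FilteredEnvelope:
    """One first-order low-pass step: alpha * new + (1 - alpha) * previous."""
    if len(fenv) != len(prev.fenv_f):
        raise ValueError("subband count mismatch")
    return FilteredEnvelope(
        tenv_f=alpha * tenv + (1.0 - alpha) * prev.tenv_f,
        fenv_f=[alpha * f + (1.0 - alpha) * p
                for f, p in zip(fenv, prev.fenv_f)],
    )


def temporal_diff_db(a: FilteredEnvelope, b: FilteredEnvelope) -> float:
    """Scaled absolute difference of the filtered time envelopes."""
    return LOG2_TO_DB * abs(a.tenv_f - b.tenv_f)


def spectral_diff_db(a: FilteredEnvelope, b: FilteredEnvelope) -> float:
    """Scaled mean absolute difference of the filtered frequency envelopes."""
    total = sum(abs(x - y) for x, y in zip(a.fenv_f, b.fenv_f))
    return LOG2_TO_DB * total / len(a.fenv_f)


def needs_sid_update(cur: FilteredEnvelope, prev: FilteredEnvelope,
                     last_tx: FilteredEnvelope, limit_db: float) -> bool:
    """True if any of temp_d, spec_d, temp_ch, spec_ch exceeds the limit."""
    diffs = (
        temporal_diff_db(cur, prev),     # temp_d
        spectral_diff_db(cur, prev),     # spec_d
        temporal_diff_db(cur, last_tx),  # temp_ch
        spectral_diff_db(cur, last_tx),  # spec_ch
    )
    return any(d > limit_db for d in diffs)


# Example with made-up values: two subbands, alpha = 0.5.
prev = FilteredEnvelope(tenv_f=1.0, fenv_f=[1.0, 2.0])
cur = low_pass(prev, tenv=3.0, fenv=[1.0, 2.0], alpha=0.5)
send_update = needs_sid_update(cur, prev, last_tx=prev, limit_db=3.0)
```

Comparing against both the previous frame and the last transmitted frame catches sudden jumps as well as slow drifts that accumulate since the last SID frame was sent.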

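The data-rate reduction applied at the start of the hangover period (second phase in the list above) can be sketched as a small Python function. The function name `hangover_bitrate_kbps` is hypothetical, "around 12 kbit/s" is treated as exactly 12 kbit/s, and rates the text does not mention (8 and 14 kbit/s) are assumed to stay unchanged.

```python
def hangover_bitrate_kbps(previous_kbps: int) -> int:
    """Encoder bit rate (kbit/s) to use during the hangover period."""
    if previous_kbps > 14:
        # Higher rates are reduced to 14 kbit/s.
        return 14
    if previous_kbps == 12:
        # A rate that is already at 12 kbit/s is reduced to 8 kbit/s.
        return 8
    # Assumption: cases not covered by the text keep their rate.
    return previous_kbps
```
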
Claims (10)

  1. A method for encoding a SID frame (SID) for a transmission of background noise information by using a scalable speech signal coding method with the following steps:
    Encoding a narrowband first portion (LB) and a broadband second portion (HB) of the background noise information;
    forming the SID frame (SID) with separated domains for the first (LB) and the second (HB) portion;
    providing the scalability for the analogous transmission of speech information when forming the SID frame (SID), such that it is determined on the receiver side whether a comfort noise is based on the broadband second portion (HB) of the transmitted SID frame (SID) or based on the narrowband first portion (LB), wherein,
    when broadband speech information is transmitted, the comfort noise is based on the broadband second portion (HB) of the transmitted SID frame (SID) and when narrowband speech information is transmitted, the comfort noise is based on the narrowband first portion (LB) of the transmitted SID frame (SID);
    wherein a narrowband third portion (ELB) is encoded and the SID frame (SID) is formed with an additional separated domain for the third portion (ELB);
    wherein the narrowband third portion (ELB) is rendered in an enhanced quality compared to the narrowband first portion due to an increased data rate of the encoding of temporal and spectral parameters of the background noise and in conformity with standard G.729.B;
    wherein the first portion (LB) of the background noise information is encoded according to coding guidelines of standard G.729.B;
    characterized in that
    the second portion (HB) of the background noise information is encoded according to a modified TDBWE method, wherein a simplification of the modified TDBWE method is achieved in that an encoding in the time domain only corresponds with the energy of the signal in the time domain.
  2. The method according to one of the preceding claims, characterized in that during a speech pause on the part of an encoder, characteristics of a background noise are analyzed and taken into account, i.e. learned, wherein the characteristics in particular comprise the temporal distribution and the spectral form of the background noise.
  3. The method according to claim 2, characterized in that a filter method is used for the learning process, the method taking into account temporal and spectral parameters of the background noise from preceding frames.
  4. The method according to claim 3, characterized in that, if there are significant changes of the character or of the intensity of the background noise, a decision of whether there is a need to update the learned parameters is made on the basis of threshold value parameters.
  5. The method according to claim 4, characterized in that SID frames (SID) are sent when a significant change of the broadband second portion (HB) of the background noise is detected or when an update of the narrowband first portion (LB) is to be sent.
  6. The method according to one of the preceding claims, characterized in that, during a transition period which starts after a switchover from an active speech phase to a speech pause for learning the background noise, filter methods for assigning a higher priority to a current frame than to preceding frames are used.
  7. The method according to one of the preceding claims, characterized in that, in the broadband second portion (HB), filtered energy parameters for describing a background noise are used, comprising parameters of enveloping curves in the time domain (tenv_fidx) and/or parameters of enveloping curves in the frequency domain (fenv_fidx[i]).
  8. The method according to claim 7, characterized in that a respective index (idx) identifies a respective frame, wherein the enveloping curve in the frequency domain is formed by a suitable number of frequency values (i={1,..., NB-SUBBANDS}) for describing the spectral characteristics of the background noise.
  9. A codec with means for performing the method according to one of claims 1 to 8.
  10. The codec according to claim 9, characterized by an implementation of standard ITU-T G.729.1 which is known per se.
EP09711908.5A 2008-02-19 2009-02-02 Method and means for encoding background noise information Active EP2245621B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102008009719A DE102008009719A1 (en) 2008-02-19 2008-02-19 Method and means for encoding background noise information
PCT/EP2009/051118 WO2009103608A1 (en) 2008-02-19 2009-02-02 Method and means for encoding background noise information

Publications (2)

Publication Number Publication Date
EP2245621A1 EP2245621A1 (en) 2010-11-03
EP2245621B1 EP2245621B1 (en) 2019-05-01

Family

ID=40652248

Family Applications (1)

Application Number Title Priority Date Filing Date
EP09711908.5A Active EP2245621B1 (en) 2008-02-19 2009-02-02 Method and means for encoding background noise information

Country Status (8)

Country Link
US (2) US20100318352A1 (en)
EP (1) EP2245621B1 (en)
JP (1) JP5361909B2 (en)
KR (2) KR101364983B1 (en)
CN (1) CN101952886B (en)
DE (1) DE102008009719A1 (en)
RU (1) RU2461080C2 (en)
WO (1) WO2009103608A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101483495B (en) 2008-03-20 2012-02-15 华为技术有限公司 Background noise generation method and noise processing apparatus
CN103187065B (en) 2011-12-30 2015-12-16 华为技术有限公司 The disposal route of voice data, device and system
SG11201504899XA (en) 2012-12-21 2015-07-30 Fraunhofer Ges Forschung Comfort noise addition for modeling background noise at low bit-rates
ES2588156T3 (en) * 2012-12-21 2016-10-31 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Comfort noise generation with high spectrum-time resolution in discontinuous transmission of audio signals
CA2899134C (en) * 2013-01-29 2019-07-30 Frederik Nagel Decoder for generating a frequency enhanced audio signal, method of decoding, encoder for generating an encoded signal and method of encoding using compact selection side information
CN106169297B (en) * 2013-05-30 2019-04-19 华为技术有限公司 Coding method and equipment
BR112015031606B1 (en) * 2013-06-21 2021-12-14 Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. DEVICE AND METHOD FOR IMPROVED SIGNAL FADING IN DIFFERENT DOMAINS DURING ERROR HIDING
JP6035270B2 (en) * 2014-03-24 2016-11-30 株式会社Nttドコモ Speech decoding apparatus, speech encoding apparatus, speech decoding method, speech encoding method, speech decoding program, and speech encoding program
EP2980790A1 (en) 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for comfort noise generation mode selection
KR101701623B1 (en) * 2015-07-09 2017-02-13 라인 가부시키가이샤 System and method for concealing bandwidth reduction for voice call of voice-over internet protocol
US10978096B2 (en) * 2017-04-25 2021-04-13 Qualcomm Incorporated Optimized uplink operation for voice over long-term evolution (VoLte) and voice over new radio (VoNR) listen or silent periods

Family Cites Families (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI105001B (en) * 1995-06-30 2000-05-15 Nokia Mobile Phones Ltd Method for Determining Wait Time in Speech Decoder in Continuous Transmission and Speech Decoder and Transceiver
US5960389A (en) 1996-11-15 1999-09-28 Nokia Mobile Phones Limited Methods for generating comfort noise during discontinuous transmission
RU2237296C2 (en) * 1998-11-23 2004-09-27 Телефонактиеболагет Лм Эрикссон (Пабл) Method for encoding speech with function for altering comfort noise for increasing reproduction precision
US7124079B1 (en) * 1998-11-23 2006-10-17 Telefonaktiebolaget Lm Ericsson (Publ) Speech coding with comfort noise variability feature for increased fidelity
US6424938B1 (en) * 1998-11-23 2002-07-23 Telefonaktiebolaget L M Ericsson Complex signal activity detection for improved speech/noise classification of an audio signal
US6397177B1 (en) * 1999-03-10 2002-05-28 Samsung Electronics, Co., Ltd. Speech-encoding rate decision apparatus and method in a variable rate
CA2290037A1 (en) * 1999-11-18 2001-05-18 Voiceage Corporation Gain-smoothing amplifier device and method in codecs for wideband speech and audio signals
JP3761795B2 (en) * 2000-04-10 2006-03-29 三菱電機株式会社 Digital line multiplexer
US6889187B2 (en) * 2000-12-28 2005-05-03 Nortel Networks Limited Method and apparatus for improved voice activity detection in a packet voice network
US20030120484A1 (en) * 2001-06-12 2003-06-26 David Wong Method and system for generating colored comfort noise in the absence of silence insertion description packets
US20030112758A1 (en) * 2001-12-03 2003-06-19 Pang Jon Laurent Methods and systems for managing variable delays in packet transmission
EP1808852A1 (en) * 2002-10-11 2007-07-18 Nokia Corporation Method of interoperation between adaptive multi-rate wideband (AMR-WB) and multi-mode variable bit-rate wideband (VMR-WB) codecs
CA2501368C (en) * 2002-10-11 2013-06-25 Nokia Corporation Methods and devices for source controlled variable bit-rate wideband speech coding
US7391768B1 (en) * 2003-05-13 2008-06-24 Cisco Technology, Inc. IPv4-IPv6 FTP application level gateway
US20050004793A1 (en) * 2003-07-03 2005-01-06 Pasi Ojala Signal adaptation for higher band coding in a codec utilizing band split coding
ES2634511T3 (en) * 2004-07-23 2017-09-28 Iii Holdings 12, Llc Audio coding apparatus and audio coding procedure
US20060149536A1 (en) * 2004-12-30 2006-07-06 Dunling Li SID frame update using SID prediction error
JP4806418B2 (en) * 2005-01-10 2011-11-02 クォーティックス インク Integrated architecture for visual media integration
CN100592389C (en) * 2008-01-18 2010-02-24 华为技术有限公司 State updating method and apparatus of synthetic filter
US7693708B2 (en) * 2005-06-18 2010-04-06 Nokia Corporation System and method for adaptive transmission of comfort noise parameters during discontinuous speech transmission
US7546237B2 (en) * 2005-12-23 2009-06-09 Qnx Software Systems (Wavemakers), Inc. Bandwidth extension of narrowband speech
US8260609B2 (en) 2006-07-31 2012-09-04 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
US8725499B2 (en) * 2006-07-31 2014-05-13 Qualcomm Incorporated Systems, methods, and apparatus for signal change detection
US7796626B2 (en) * 2006-09-26 2010-09-14 Nokia Corporation Supporting a decoding of frames
US8032359B2 (en) * 2007-02-14 2011-10-04 Mindspeed Technologies, Inc. Embedded silence and background noise compression
CN101246688B (en) * 2007-02-14 2011-01-12 华为技术有限公司 Method, system and device for coding and decoding ambient noise signal
EP2629293A3 (en) * 2007-11-02 2014-01-08 Huawei Technologies Co., Ltd. Method and apparatus for audio decoding
US8554550B2 (en) * 2008-01-28 2013-10-08 Qualcomm Incorporated Systems, methods, and apparatus for context processing using multi resolution analysis
CN101335000B (en) * 2008-03-26 2010-04-21 华为技术有限公司 Method and apparatus for encoding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"CODING OF SPEECH AT 8 KBIT/S USING CONJUGATE STRUCTURE ALGEBRAIC-CODE-EXCITED LINEAR-PREDICTION (CS-ACELP). ANNEX B: A SILENCE COMPRESSION SCHEME FOR G.729 OPTIMIZED FOR TERMINALS CONFORMING TO RECOMMENDATION V.70", ITU-T RECOMMENDATION G.729, XX, XX, 1 November 1996 (1996-11-01), pages COMPLETE23, XP002259964 *
"G.729 based Embedded Variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729; G.729.1 (05/06)", ITU-T DRAFT STUDY PERIOD 2005-2008, INTERNATIONAL TELECOMMUNICATION UNION, GENEVA ; CH, no. G.729.1 (05/06), 29 May 2006 (2006-05-29), XP017404590 *

Also Published As

Publication number Publication date
WO2009103608A1 (en) 2009-08-27
KR20120089378A (en) 2012-08-09
KR20100120217A (en) 2010-11-12
US20160035360A1 (en) 2016-02-04
DE102008009719A1 (en) 2009-08-20
EP2245621A1 (en) 2010-11-03
US20100318352A1 (en) 2010-12-16
JP2011512563A (en) 2011-04-21
CN101952886A (en) 2011-01-19
CN101952886B (en) 2013-03-06
JP5361909B2 (en) 2013-12-04
RU2461080C2 (en) 2012-09-10
KR101364983B1 (en) 2014-02-20
RU2010138563A (en) 2012-04-10

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20100812

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA RS

RIN1 Information on inventor provided before grant (corrected)

Inventor name: SETIAWAN, PANJI

Inventor name: SCHANDL, STEFAN

Inventor name: TADDEI, HERVE

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: UNIFY GMBH & CO. KG

17Q First examination report despatched

Effective date: 20150706

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: UNIFY GMBH & CO. KG

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

INTG Intention to grant announced

Effective date: 20181129

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

Ref country code: AT

Ref legal event code: REF

Ref document number: 1127980

Country of ref document: AT

Kind code of ref document: T

Effective date: 20190515

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 502009015751

Country of ref document: DE

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: GERMAN

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20190501

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190801

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190901

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190802

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190801

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190901

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 502009015751

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

26N No opposition filed

Effective date: 20200204

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20200229

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200202

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200229

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200229

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 502009015751

Country of ref document: DE

Representative's name: SCHAAFHAUSEN PATENTANWAELTE PARTNERSCHAFTSGESE, DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200202

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200229

REG Reference to a national code

Ref country code: AT

Ref legal event code: MM01

Ref document number: 1127980

Country of ref document: AT

Kind code of ref document: T

Effective date: 20200202

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200202

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190501

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20240216

Year of fee payment: 16

Ref country code: GB

Payment date: 20240222

Year of fee payment: 16

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20240222

Year of fee payment: 16