EP1048025B1 - Method for objective voice quality evaluation - Google Patents
- Publication number
- EP1048025B1 (application number EP99942871A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- spectral
- computed
- speech signal
- evaluated
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/69—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Definitions
- The evaluation function a(f) can weight the weighting function WT(f) differently across its range of effect; in the simplest case it is constantly 1.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Machine Translation (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Description
The invention relates to a method for instrumental ("objective") speech quality determination in which characteristic values for determining the speech quality are derived by comparing properties of a speech signal to be evaluated with properties of a reference speech signal (the undisturbed signal).
The speech quality of speech signals is usually determined by auditory ("subjective") tests with test subjects.
The aim of instrumental ("objective") methods for speech quality determination is to derive, from properties of the speech signal to be evaluated and by means of suitable computational methods, characteristic values that describe the speech quality of that signal without having to resort to the judgments of test subjects.
The computed characteristic values, and the underlying method for instrumental speech quality determination, are considered accepted if they achieve a high correlation with the results of comparative auditory tests. The speech quality values obtained in auditory tests are therefore the target values that instrumental methods are meant to reproduce.
Known methods for instrumental speech quality determination are based on a comparison of a reference speech signal with the speech signal to be evaluated. The reference speech signal and the speech signal to be evaluated are segmented into short time segments, and the spectral properties of the two signals are compared within these segments.
Various approaches and models are used to compute the short-time spectral properties. As a rule, the signal intensity is computed in frequency bands whose width increases with increasing centre frequency. Examples of such frequency bands are the well-known third-octave bands or the frequency groups (critical bands) according to Zwicker (published in Zwicker, E.: "Psychoakustik", Berlin: Springer-Verlag, 1982).
The spectral intensity representation computed in this way for each time segment under consideration can be regarded as a series of numerical values in which the number of individual values equals the number of frequency bands used, the values themselves are the computed intensities, and a running index of the frequency bands defines the order of the values.
In the currently known methods for instrumental speech quality determination, the limits of the frequency bands used are kept constant on the frequency axis.
In each time segment under consideration, the computed intensities of the speech signal to be evaluated and of the reference speech signal are compared with each other in each band. The difference between the two values, or the similarity of the two resulting spectral intensity representations, forms the basis for computing a quality value (Fig. 1).
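The fixed-band comparison described above can be illustrated with a minimal sketch. It assumes that each time segment is already available as a short-time power spectrum; the function names, the 8-bin spectra and the band limits are hypothetical and chosen only for illustration, and the absolute difference stands in for whatever distance measure a particular known method uses.

```python
from typing import List, Sequence


def band_intensities(power: Sequence[float], edges: Sequence[int]) -> List[float]:
    """Sum the power-spectrum bins inside each band [edges[k], edges[k+1])."""
    return [sum(power[edges[k]:edges[k + 1]]) for k in range(len(edges) - 1)]


def band_differences(ref: Sequence[float], test: Sequence[float]) -> List[float]:
    """Absolute intensity difference per band (same band index in both signals)."""
    return [abs(r - t) for r, t in zip(ref, test)]


# Hypothetical 8-bin short-time power spectra of one time segment.
ref_power = [1.0, 2.0, 4.0, 2.0, 1.0, 1.0, 0.5, 0.5]
test_power = [1.0, 1.0, 2.0, 4.0, 2.0, 1.0, 0.5, 0.5]

# Fixed band limits as bin indices; the bands widen toward higher frequencies.
fixed_edges = [0, 1, 3, 8]

ref_bands = band_intensities(ref_power, fixed_edges)
test_bands = band_intensities(test_power, fixed_edges)
print(band_differences(ref_bands, test_bands))
```

In this toy case the spectral maximum of the test signal sits one bin higher than in the reference, and with fixed limits this alone produces a large difference in two of the three bands.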
Such methods have been developed in particular for the quality assessment of speech in telephony applications. Examples are US-A-5,621,854 (HOLLIER MICHAEL P), which is cited as the closest prior art, and the publications:
The currently valid ITU-T standard P.861 also describes such a method: "Objective quality measurement of telephone-band speech codecs" (ITU-T Rec. P.861, Geneva 1996).
The use of known methods for instrumental speech quality determination fails because the computed quality values are not reliable for certain properties of the signal to be evaluated. In particular for impairments of the speech signal to be evaluated such as those caused by low-bit-rate speech coding or by combinations of different disturbances, currently known methods yield only unreliable quality values.
A disadvantage of the methods known today is that, in such cases, when the speech signal to be evaluated is compared with a reference speech signal, differences between the two signal segments in the chosen representation domain enter the computed quality characteristic value even though they cause little or no qualitative impairment that would be perceptible in an auditory test.
In the speech transmission for telephone applications considered here, frequency band limitations and spectral deformations of the speech signal to be evaluated (caused, for example, by the filter characteristics of the telephone set or of the transmission channel) contribute only to a limited extent to a perceived impairment of quality.
To avoid some of these shortcomings, another approach attempts to compensate for the linear distortions (frequency response) by means of a correction filter or a power transfer function (published in: "A new approach to objective quality-measures based on attribute-matching", Halka, U.; Heute, U., Speech Communication, 11 (1992) 1, pp. 15-30). However, this method is disadvantageous for nonlinear and time-variant transmission, because the compensation function computed in this way then no longer describes exclusively the spectral deformations of the signal to be evaluated.
Shifts of short-time spectral maxima ("formant shifts") in the signal under test relative to the reference speech signal, caused for example by low-bit-rate coding systems, lead in known methods to large differences in the spectral intensity representations and therefore weigh heavily on the computed quality value. Investigations have shown, however, that in an auditory speech quality test these shifts of short-time spectral maxima have only a limited influence on the quality judgment.
The object of the invention is to reduce, in instrumental methods, the influence of spectral limitations and deformations of the speech signal to be evaluated, and of shifts of short-time spectral maxima, before the spectral properties of the signal under test are compared with those of a reference speech signal and a quality value is computed.
The above object is achieved by a method according to claim 1.
In contrast to known approaches, the invention described here generates a spectral weighting function that is based on the mean spectral envelopes, e.g. the mean spectral power density, of the speech signal to be evaluated and of the reference speech signal. This also allows the method to be used for nonlinear and time-variant transmission.
The spectral weighting function WT(f) is computed from the quotients of the support values of the mean spectral power density of the signal to be evaluated, PhiY(f), and of the mean spectral power density of the input signal of the transmission system, PhiX(f) [...].
The spectral weighting function WT(f) computed in this way brings the mean spectral envelopes of the speech signal to be evaluated and of the reference speech signal closer to each other, so that differences between the two spectral envelopes enter the computed quality value only to a reduced extent.
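The defining equation of the weighting function is not reproduced in this text. The following sketch therefore assumes the simplest form consistent with the description and with claim 2, namely WT(f_i) = a(f_i) * PhiY(f_i) / PhiX(f_i) at each support value f_i, with the evaluation function a(f) constantly 1 by default. The function names, the small division floor and the example spectra are hypothetical.

```python
from typing import Callable, List, Sequence


def mean_power_spectrum(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the short-time power spectra of all frames, bin by bin."""
    n = len(frames)
    return [sum(frame[i] for frame in frames) / n for i in range(len(frames[0]))]


def weighting_function(
    phi_y: Sequence[float],
    phi_x: Sequence[float],
    a: Callable[[int], float] = lambda i: 1.0,
    floor: float = 1e-12,
) -> List[float]:
    """Assumed form: W_T(f_i) = a(f_i) * Phi_Y(f_i) / Phi_X(f_i).

    The floor only guards against division by zero in empty bins.
    """
    return [a(i) * y / max(x, floor) for i, (y, x) in enumerate(zip(phi_y, phi_x))]


# Hypothetical short-time power spectra (two frames, four support values each).
ref_frames = [[4.0, 4.0, 4.0, 4.0], [2.0, 2.0, 2.0, 2.0]]
test_frames = [[4.0, 2.0, 1.0, 0.5], [2.0, 1.0, 0.5, 0.25]]

phi_x = mean_power_spectrum(ref_frames)   # reference (input of the transmission system)
phi_y = mean_power_spectrum(test_frames)  # signal to be evaluated
w_t = weighting_function(phi_y, phi_x)
print(w_t)
```

Because the function is derived from spectra averaged over the whole signal rather than from a system model, it captures only the mean difference in spectral envelope, which is consistent with the statement that the method remains usable for nonlinear and time-variant transmission.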
On the one hand, the spectral weighting function WT(f) can be applied to the reference speech signal. The mean spectral power density of the reference speech signal is thereby brought closer to that of the signal to be evaluated (Fig. 2a).
On the other hand, the inverted spectral weighting function can be applied to the signal to be evaluated. This signal is thereby equalized and, in terms of its mean spectral power density, brought closer to the reference speech signal (Fig. 2b).
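The two ways of applying the weighting function can be sketched as follows. The sketch assumes that WT(f) is a ratio of power densities and is therefore applied multiplicatively to short-time power spectra; the function names and the numerical values are hypothetical.

```python
from typing import List, Sequence


def weight_reference(
    ref_frames: Sequence[Sequence[float]], w_t: Sequence[float]
) -> List[List[float]]:
    """Fig. 2a variant: multiply every short-time reference spectrum by W_T."""
    return [[p * w for p, w in zip(frame, w_t)] for frame in ref_frames]


def equalize_test(
    test_frames: Sequence[Sequence[float]], w_t: Sequence[float], floor: float = 1e-12
) -> List[List[float]]:
    """Fig. 2b variant: divide every short-time test spectrum by W_T."""
    return [[p / max(w, floor) for p, w in zip(frame, w_t)] for frame in test_frames]


# Hypothetical weighting function and one frame per signal.
w_t = [1.0, 0.5, 0.25, 0.125]
ref_frames = [[4.0, 4.0, 4.0, 4.0]]
test_frames = [[4.0, 2.0, 1.0, 0.5]]

print(weight_reference(ref_frames, w_t))
print(equalize_test(test_frames, w_t))
```

Either variant removes the mean envelope difference before the band-wise comparison; which signal is modified only changes the domain in which the two spectra are compared.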
A further part of the invention concerns the correction of shifts of short-time spectral maxima caused by the transmission systems.
For each time segment, the intensity is integrated in frequency bands. The result is a series of intensity values for each spectral representation of a signal segment, each individual value representing the intensity in one frequency band. Shifts of short-time spectral maxima can lead to differing computed intensities in the frequency bands of the reference speech signal and of the speech signal to be evaluated.
These deviations in the spectral intensity representations, caused by shifts of short-time spectral maxima, can be reduced by a variable arrangement of the frequency bands on the frequency axis. In contrast to the constant band limits of known methods, the band limits are shifted along the frequency axis. The number of frequency bands and their indices remain constant, however. In an optimization loop, those band limits are then accepted for which the two resulting spectral representations of the speech signal to be evaluated and of the reference speech signal show maximum similarity, i.e. minimum distance. This optimization is carried out for all bands in all time segments under consideration.
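The optimization loop over band limits can be sketched as below. This is a simplified illustration under several assumptions that are not taken from the patent: band limits are bin indices, each band of the test signal is shifted as a whole by at most `max_shift` bins, every band is optimized independently, and the distance is the absolute intensity difference. All names and spectra are hypothetical.

```python
from typing import List, Sequence, Tuple


def band_sum(power: Sequence[float], lo: int, hi: int) -> float:
    """Intensity of one band: sum of the bins in [lo, hi)."""
    return sum(power[lo:hi])


def optimise_band_limits(
    ref_power: Sequence[float],
    test_power: Sequence[float],
    edges: Sequence[int],
    max_shift: int = 1,
) -> List[Tuple[int, int]]:
    """For each band, pick the shifted limits in the test spectrum whose
    intensity is closest to the reference intensity in the nominal band.

    The number of bands and their indices stay constant; only limits move.
    """
    n = len(test_power)
    # Try the unshifted band first so that ties keep the nominal limits.
    shifts = sorted(range(-max_shift, max_shift + 1), key=abs)
    chosen = []
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        target = band_sum(ref_power, lo, hi)
        best = None
        for s in shifts:
            if lo + s < 0 or hi + s > n:
                continue  # shifted band would leave the spectrum
            dist = abs(target - band_sum(test_power, lo + s, hi + s))
            if best is None or dist < best[0]:
                best = (dist, lo + s, hi + s)
        chosen.append((best[1], best[2]))
    return chosen


# Hypothetical spectra: the test signal equals the reference shifted up by one bin.
ref_power = [0, 1, 4, 1, 0, 0, 2, 0, 0]
test_power = [0, 0, 1, 4, 1, 0, 0, 2, 0]
print(optimise_band_limits(ref_power, test_power, edges=[0, 4, 8]))
```

With the nominal limits both bands differ by 1; after the shift both band intensities match the reference exactly, so the shifted maxima no longer contribute to the distance.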
The use of variable band limits for computing the spectral intensity representation is not restricted to the signal to which the described spectral weighting function WT(f) is applied; it can also be applied to the respective other signal, or even to both signals (cf. Figs. 2a and 2b).
A specific embodiment is the implementation shown in Fig. 3, which is referred to as TOSQA (Telecommunication Objective Speech Quality Assessment). Here, extended preprocessing of the reference speech signal is carried out.
As a refinement of the general implementations according to Figs. 2a and 2b, speech pauses are detected here by a speech pause detector and are excluded from the quality measure. In addition, the reference speech signal and the speech signal to be evaluated are filtered with a 300...3400 Hz band-pass and with a filter that models the frequency response of a telephone handset. The spectral power density is integrated in frequency groups, which form the basis for computing the specific loudness.
However, the integration is not performed within fixed frequency group limits but with the variable frequency group limits described in this invention. The signal powers computed in the frequency groups modified in this way form the basis for the intensity computation. For this purpose, a model for computing the specific loudness according to Zwicker, an aurally adequate intensity representation, was used (published in Zwicker, E.: "Psychoakustik", Berlin: Springer-Verlag, 1982).
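One way to shift frequency group limits while keeping their width constant on the critical-band-rate scale is sketched below. The description only cites Zwicker's book and gives no formula, so the analytic approximation used here (the widely cited Zwicker and Terhardt expression for the critical-band rate in Bark) is an assumption, as are the function names, the bisection-based inverse and the example values.

```python
import math
from typing import Tuple


def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate z in Bark for a frequency in Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_to_hz(z: float, lo: float = 0.0, hi: float = 20000.0) -> float:
    """Invert hz_to_bark by bisection (the mapping increases monotonically)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shifted_band_hz(
    z_low: float, shift_bark: float, width_bark: float = 1.0
) -> Tuple[float, float]:
    """Band limits in Hz after shifting a band by shift_bark.

    The width stays constant in Bark, so it changes when expressed in Hz.
    """
    z0 = z_low + shift_bark
    return bark_to_hz(z0), bark_to_hz(z0 + width_bark)


# The telephone band 300...3400 Hz expressed on the Bark scale.
print(hz_to_bark(300.0), hz_to_bark(3400.0))
# A one-Bark-wide frequency group starting at 8 Bark, nominal and shifted by 0.25 Bark.
print(shifted_band_hz(8.0, 0.0))
print(shifted_band_hz(8.0, 0.25))
```

A candidate shift of this kind could then be evaluated in an optimization loop that keeps the limits giving the smallest difference in specific loudness between the two signals.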
In addition to the general approach, the computed loudness patterns are supplemented by an error weighting function. The computed quality value is formed as the mean of the correlation coefficients of the specific loudness patterns of each short time segment under consideration, taken over the number of evaluated speech segments.
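The averaging of correlation coefficients can be sketched as follows. The sketch assumes that a specific loudness pattern is available per time segment as a list of band values and that the speech pause detector supplies one flag per segment; the error weighting function is not specified in the text and is left out. All names and values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat pattern carries no shape information
    return cov / math.sqrt(vx * vy)


def quality_value(
    ref_patterns: Sequence[Sequence[float]],
    test_patterns: Sequence[Sequence[float]],
    is_speech: Sequence[bool],
) -> float:
    """Mean correlation over all segments that are not speech pauses."""
    coeffs = [
        pearson(r, t)
        for r, t, s in zip(ref_patterns, test_patterns, is_speech)
        if s
    ]
    if not coeffs:
        raise ValueError("no speech segments to evaluate")
    return sum(coeffs) / len(coeffs)


# Hypothetical loudness patterns: three segments, three bands, last one a pause.
ref_patterns = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [0.0, 0.0, 0.0]]
test_patterns = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
is_speech = [True, True, False]
print(quality_value(ref_patterns, test_patterns, is_speech))
```

The first segment correlates perfectly and the second is perfectly anti-correlated, so the mean over the two speech segments is zero; the pause segment does not enter the result.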
Claims (6)
- Method for instrumental speech quality determination in which characteristic values for determining the speech quality are computed by comparing spectral short-time properties of a speech signal to be evaluated with a reference speech signal, characterized in that prior to comparison of the properties of the speech signals, differences in mean spectral envelope curves are reduced in that first a spectral weighting function is computed therefrom, said spectral weighting function being used to weight the spectral short-time properties of the speech signals in all time segments under consideration, with the result that the differences in the mean spectral envelope curves are thereby included only to a limited extent in the quality characteristic value to be computed, and in that, for computing the signal intensity, the limits of the frequency bands used are made variable, with the result that, for each signal portion under consideration, the computed intensities of reference speech signal and signal to be evaluated have differences as small as possible with respect to each other in all evaluated frequency bands.
- Method according to claim 1, characterized in that first the mean spectral envelope curves of speech signal to be evaluated and reference speech signal are computed in the form of a mean power density spectrum and a spectral weighting function WT(f) is computed from the quotient of both spectra, said spectral weighting function WT(f) being used to weight the short-time power density spectra of the reference speech signal prior to the computation of a quality characteristic value.
- Method according to claims 1 and 2, characterized in that the weighting function WT(f) to be computed is computed only from partial regions of the computed mean spectral envelope curves of speech signal to be evaluated and reference speech signal and, consequently, the differences in mean spectral envelope curves between both signals are reduced only in spectral partial regions.
- Method according to claims 1 to 3, characterized in that, prior to computation of the quality characteristic values, the signal intensity for each evaluated short time portion is integrated in frequency groups, the limits of the frequency groups being variable on the frequency axis, but the width of the frequency groups remaining constant on the critical band rate scale, and in that from the signal intensities in the frequency groups a computation is made of the specific loudness, use being made of the limits of the frequency groups in which the computed differences in the specific loudness between the signal to be evaluated and the reference speech signal have the smallest difference in the respective band and time segment under consideration.
- Method according to claims 1 to 4, characterized in that the quality characteristic value is computed from the similarity of the spectral representations in each time portion under consideration, the similarity representing a correlation coefficient averaged over all time portions under consideration between the spectral representation of the speech signal to be evaluated and the spectral representation of the reference speech signal in the respective time segment.
- Method according to claim 5, characterized in that the correlation coefficient between the spectral representation of the speech signal to be evaluated and the spectral representation of the reference speech signal in the respective time segment is computed only from a partial region of the spectral representation, i.e. not all the computed spectral values are taken into consideration for the computation of the quality characteristic value.
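The spectral weighting described in claim 2 can be illustrated with a short sketch. It is not the patented implementation and rests on stated assumptions: each signal is given as a list of short-time power density spectra (one list of band powers per time segment), the quotient is taken as evaluated signal over reference signal, and a small `eps` guards against division by zero. The function names and `eps` are hypothetical.

```python
def mean_spectrum(frames):
    """Mean power density spectrum over all time segments (per frequency bin)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def weighting_function(ref_frames, test_frames, eps=1e-12):
    """WT(f) as the quotient of the two mean power density spectra.

    Assumption: quotient = evaluated signal / reference signal, so that the
    weighted reference takes on the mean envelope of the evaluated signal.
    """
    mean_ref = mean_spectrum(ref_frames)
    mean_test = mean_spectrum(test_frames)
    return [t / max(r, eps) for r, t in zip(mean_ref, mean_test)]


def apply_weighting(ref_frames, wt):
    """Weight every short-time spectrum of the reference signal with WT(f)."""
    return [[p * w for p, w in zip(frame, wt)] for frame in ref_frames]


# Toy spectra with two frequency bins and two time segments each.
ref = [[1.0, 2.0], [3.0, 2.0]]
test = [[2.0, 1.0], [2.0, 1.0]]
wt = weighting_function(ref, test)
weighted_ref = apply_weighting(ref, wt)
print(wt, mean_spectrum(weighted_ref))
```

After weighting, the mean spectrum of the reference equals that of the evaluated signal, so differences in the mean spectral envelope no longer dominate the per-segment comparison. Restricting `wt` to selected bins (leaving the others at 1.0) would correspond to the partial-region variant of claim 3.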
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19840548 | 1998-08-27 | ||
DE19840548A DE19840548C2 (en) | 1998-08-27 | 1998-08-27 | Procedures for instrumental language quality determination |
PCT/EP1999/005972 WO2000013173A1 (en) | 1998-08-27 | 1999-08-14 | Method for instrumental voice quality evaluation |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1048025A1 (en) | 2000-11-02 |
EP1048025B1 (en) | 2003-11-05 |
Family
ID=7879918
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP99942871A (Expired - Lifetime) EP1048025B1 (en) | Method for objective voice quality evaluation | 1998-08-27 | 1999-08-14 |
Country Status (6)
Country | Link |
---|---|
US (1) | US7013266B1 (en) |
EP (1) | EP1048025B1 (en) |
AT (1) | ATE253765T1 (en) |
CA (1) | CA2305652A1 (en) |
DE (2) | DE19840548C2 (en) |
WO (1) | WO2000013173A1 (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001065543A1 (en) * | 2000-02-29 | 2001-09-07 | Telefonaktiebolaget Lm Ericsson (Publ) | Compensation for linear filtering using frequency weighting factors |
EP1241663A1 (en) * | 2001-03-13 | 2002-09-18 | Koninklijke KPN N.V. | Method and device for determining the quality of speech signal |
EP1292036B1 (en) * | 2001-08-23 | 2012-08-01 | Nippon Telegraph And Telephone Corporation | Digital signal decoding methods and apparatuses |
DE10142846A1 (en) * | 2001-08-29 | 2003-03-20 | Deutsche Telekom Ag | Procedure for the correction of measured speech quality values |
DE10150519B4 (en) | 2001-10-12 | 2014-01-09 | Hewlett-Packard Development Co., L.P. | Method and arrangement for speech processing |
EP1492084B1 (en) * | 2003-06-25 | 2006-05-17 | Psytechnics Ltd | Binaural quality assessment apparatus and method |
US7305341B2 (en) | 2003-06-25 | 2007-12-04 | Lucent Technologies Inc. | Method of reflecting time/language distortion in objective speech quality assessment |
WO2006033570A1 (en) * | 2004-09-20 | 2006-03-30 | Nederlandse Organisatie Voor Toegepast- Natuurwetenschappelijk Onderzoek Tno | Frequency compensation for perceptual speech analysis |
EP2249333B1 (en) * | 2009-05-06 | 2014-08-27 | Nuance Communications, Inc. | Method and apparatus for estimating a fundamental frequency of a speech signal |
EP2388779B1 (en) * | 2010-05-21 | 2013-02-20 | SwissQual License AG | Method for estimating speech quality |
WO2013142695A1 (en) * | 2012-03-23 | 2013-09-26 | Dolby Laboratories Licensing Corporation | Method and system for bias corrected speech level determination |
CN112233693B (en) * | 2020-10-14 | 2023-12-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Sound quality evaluation method, device and equipment |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE3708002A1 (en) * | 1987-03-12 | 1988-09-22 | Telefonbau & Normalzeit Gmbh | Measuring method for assessing the quality of speech coders and/or transmission routes |
US4860360A (en) * | 1987-04-06 | 1989-08-22 | Gte Laboratories Incorporated | Method of evaluating speech |
GB9213459D0 (en) | 1992-06-24 | 1992-08-05 | British Telecomm | Characterisation of communications systems using a speech-like test stimulus |
SE517836C2 (en) * | 1995-02-14 | 2002-07-23 | Telia Ab | Method and apparatus for determining speech quality |
NL9500512A (en) * | 1995-03-15 | 1996-10-01 | Nederland Ptt | Apparatus for determining the quality of an output signal to be generated by a signal processing circuit, and a method for determining the quality of an output signal to be generated by a signal processing circuit. |
EP0809236B1 (en) * | 1996-05-21 | 2001-08-29 | Koninklijke KPN N.V. | Device for determining the quality of an output signal to be generated by a signal processing circuit, and also method |
1998
- 1998-08-27 DE DE19840548A patent/DE19840548C2/en not_active Expired - Fee Related
1999
- 1999-08-14 WO PCT/EP1999/005972 patent/WO2000013173A1/en active IP Right Grant
- 1999-08-14 US US09/530,389 patent/US7013266B1/en not_active Expired - Lifetime
- 1999-08-14 AT AT99942871T patent/ATE253765T1/en active
- 1999-08-14 EP EP99942871A patent/EP1048025B1/en not_active Expired - Lifetime
- 1999-08-14 CA CA002305652A patent/CA2305652A1/en not_active Abandoned
- 1999-08-14 DE DE59907623T patent/DE59907623D1/en not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
DE19840548C2 (en) | 2001-02-15 |
US7013266B1 (en) | 2006-03-14 |
DE59907623D1 (en) | 2003-12-11 |
EP1048025A1 (en) | 2000-11-02 |
CA2305652A1 (en) | 2000-03-09 |
ATE253765T1 (en) | 2003-11-15 |
DE19840548A1 (en) | 2000-03-02 |
WO2000013173A1 (en) | 2000-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE112017004548B4 (en) | Method and apparatus for robust noise estimation for speech enhancement in variable noise conditions | |
DE60009206T2 (en) | Noise suppression by means of spectral subtraction | |
DE10041512B4 (en) | Method and device for artificially expanding the bandwidth of speech signals | |
DE19952538C2 (en) | Automatic gain control in a speech recognition system | |
DE60020865T2 (en) | System, method and computer program for a telephone emotion detector with feedback to an operator | |
DE60031432T2 (en) | SYSTEM, METHOD, AND MANUFACTURED SUBJECT FOR DETECTING EMOTIONS IN LANGUAGE SIGNALS BY STATISTICAL ANALYSIS OF LANGUAGE SIGNAL PARAMETERS | |
EP1048025B1 (en) | Method for objective voice quality evaluation | |
EP0938831B1 (en) | Hearing-adapted quality assessment of audio signals | |
DE60122751T2 (en) | METHOD AND DEVICE FOR OBJECTIVE EVALUATION OF LANGUAGE QUALITY WITHOUT REFERENCE SIGNAL | |
DE69730721T2 (en) | METHOD AND DEVICES FOR NOISE CONDITIONING OF SIGNALS WHICH REPRESENT AUDIO INFORMATION IN COMPRESSED AND DIGITIZED FORM | |
EP0980064A1 (en) | Method for carrying an automatic judgement of the transmission quality of audio signals | |
KR19990028694A (en) | Method and device for evaluating the property of speech transmission signal | |
DE602004010634T2 (en) | METHOD AND SYSTEM FOR LANGUAGE QUALITY FORECASTING AN AUDIO TRANSMISSION SYSTEM | |
DE69918635T2 (en) | Apparatus and method for speech processing | |
DE3043516C2 (en) | Method and device for speech recognition | |
DE602004008666T2 (en) | Tracking vocal tract resonances using a nonlinear predictor | |
DE10254612A1 (en) | Method for determining specifically relevant acoustic characteristics of sound signals for the analysis of unknown sound signals from a sound generation | |
EP0772764B1 (en) | Process and device for determining the tonality of an audio signal | |
EP3291234A1 (en) | Method for evaluation of a quality of the voice usage of a speaker | |
DE69922769T2 (en) | Apparatus and method for speech processing | |
EP1382034B1 (en) | Method for determining intensity parameters of background noise in speech pauses of voice signals | |
DE60110541T2 (en) | Method for speech recognition with noise-dependent normalization of the variance | |
DE60305306T2 (en) | Apparatus and method for binaural quality assessment | |
EP0535425A2 (en) | Method for amplifying an acoustic signal for the hard of hearing and device for carrying out the method | |
DE602004011292T2 (en) | Device for speech detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
|
17P | Request for examination filed |
Effective date: 20000911 |
|
RTI1 | Title (correction) |
Free format text: METHOD FOR OBJECTIVE VOICE QUALITY EVALUATION |
|
RTI1 | Title (correction) |
Free format text: METHOD FOR OBJECTIVE VOICE QUALITY EVALUATION |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20031105 Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT;WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED. Effective date: 20031105 Ref country code: IE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20031105 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20031105 Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20031105 |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: NV Representative's name: ISLER & PEDRAZZINI AG Ref country code: CH Ref legal event code: EP |
|
REF | Corresponds to: |
Ref document number: 59907623 Country of ref document: DE Date of ref document: 20031211 Kind code of ref document: P |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D Free format text: GERMAN |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20040205 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20040205 Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20040205 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20040216 |
|
GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) |
Effective date: 20040211 |
|
NLV1 | Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act | ||
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FD4D |
|
ET | Fr: translation filed | ||
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20040814 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20040831 Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20040831 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20040806 |
|
BERE | Be: lapsed |
Owner name: DEUTSCHE *TELEKOM A.G. Effective date: 20040831 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PCAR Free format text: ISLER & PEDRAZZINI AG;POSTFACH 1772;8027 ZUERICH (CH) |
|
BERE | Be: lapsed |
Owner name: DEUTSCHE *TELEKOM A.G. Effective date: 20040831 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20040405 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 18 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 19 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 20 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20180827 Year of fee payment: 20 Ref country code: FR Payment date: 20180824 Year of fee payment: 20 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: AT Payment date: 20180821 Year of fee payment: 20 Ref country code: GB Payment date: 20180828 Year of fee payment: 20 Ref country code: CH Payment date: 20180827 Year of fee payment: 20 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R071 Ref document number: 59907623 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: PE20 Expiry date: 20190813 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK07 Ref document number: 253765 Country of ref document: AT Kind code of ref document: T Effective date: 20190814 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION Effective date: 20190813 |