EP1850328A1 - Enhancement and extraction of formants of voice signals - Google Patents
Enhancement and extraction of formants of voice signals
- Publication number
- EP1850328A1 (application EP06013126A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- formants
- filters
- filtering
- audio signal
- frequency conversion
- Prior art date
- Legal status
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/15—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information
Definitions
- The present invention generally relates to the processing of voice signals in order to enhance characteristics that are useful for further technical use of the processed voice signal.
- The invention particularly relates to the enhancement and extraction of formants from audio signals such as speech signals.
- The proposed processing is useful, for example, for hearing aids, automatic speech recognition, and the training of artificial speech synthesis systems with the extracted formants.
- Formants are the distinguishing or meaningful frequency components of human speech. According to one definition (see e.g. http://en.wikipedia.org/wiki/Formants also for more details and citations) a formant is a peak in an acoustic frequency spectrum which results from the resonant frequencies of any acoustical system (acoustical tube). It is most commonly invoked in phonetics or acoustics involving the resonant frequencies of vocal tracts.
- The detection of formants is useful, for example, in speech recognition systems and speech synthesis systems.
- Today's speech recognition systems work very well in well-controlled, low-noise environments but show severe performance degradation when the distance between the speaker and the microphone varies or noise is present.
- The formant frequencies, i.e. the resonance frequencies of the vocal tract, are one of the cues for speech recognition.
- Vowels are mostly recognized based on the formant frequencies and their transitions, and formants also play a very important role for consonants.
- Another application of formant transitions is speech synthesis.
- Synthesis systems based on the concatenation of prerecorded blocks perform significantly better than those that directly use formants or vocal tract filter shapes. This is due to the difficulty of finding the right parameterization of these models rather than to an intrinsic problem of the sound generation.
- Driving such a formant-based synthesis system with parameters extracted from measurements on humans produces natural-sounding speech.
- Formant extraction algorithms can be used to determine these articulation parameters from large corpora of speech, and a learning algorithm can be developed that determines their correct setting during the speech synthesis process.
- A known approach uses bandpass filters and first-order LPC analysis to extract the formants.
- The bandpass center frequencies are adapted based on the formant location found in the previous time step. Additionally, a voiced/unvoiced decision is incorporated into the formant extraction.
- A first aspect of the invention relates to a method for enhancing the formants of an audio signal, the method comprising the following steps:
- The size of the filters used in step b.) can be adapted, in a configuration step, depending on the center frequencies of the frequency conversion step.
- The size of the filters used in step b.) can be adapted corresponding to the spectral resolution of the frequency conversion step.
- The size of the filters used in step b.) can be adapted corresponding to the expected formants, e.g. those that typically occur in speech signals.
- step b. the fundamental frequency of the audio signal can be estimated and then essentially eliminated.
- step b. the spectral distribution of the excitation of the acoustic tube can be estimated and an amplification of the spectrogram with the inverse of this distribution can be performed.
- The envelope of the signal can be determined, e.g. via rectification and low-pass filtering.
- A Gammatone filter bank can be used for the frequency conversion step.
- Reconstructive filtering can be applied to the result of step b.).
- The reconstructive filtering can use filters adapted to the expected formants of the supplied audio signal; the reconstruction is done by summing the impulse responses of these filters, each weighted with the response obtained when filtering with that filter.
- Even Gabor filters can be used for the reconstructive filtering.
- The width of the reconstructing filters can be adapted corresponding to the spectral resolution of the frequency conversion step or to the mean bandwidths of preset formants expected to be present in the supplied audio signal.
- The enhanced formants can then be extracted from the signal for further use.
- The method of enhancing the formants can be used, for example, for speech enhancement.
- The method can be used together with a tracking algorithm in order to carry out speech recognition on the supplied audio signal.
- The method can be used to train artificial speech synthesis systems with the extracted formants.
- The invention also relates to a computer program product implementing such a method.
- The invention further relates to a hearing aid comprising a computing unit designed to implement such a method.
- The invention proposes a method and a system (see figure 10) that enhance the formants in the spectrogram and allow a subsequent extraction of the enhanced formants.
- The invention proposes to apply, for example, a Gammatone filter bank to a supplied audio signal to obtain a spectro-temporal representation of the signal.
- The audio signal is thereby converted into the frequency domain.
- The first stage of the system shown in figure 10 is the application of a Gammatone filter bank to the signal.
- The filter bank has, for example, 128 channels ranging from 80 Hz to 5 kHz. From the filter bank output, the envelope is calculated via rectification and low-pass filtering. The results of this processing can be seen in Fig. 1.
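The channel layout and the envelope step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the ERB-rate spacing formula (Glasberg and Moore), the one-pole low-pass filter, and the 50 Hz cutoff are assumptions, and the Gammatone filtering itself is left out. The names `center_frequencies` and `envelope` are hypothetical.

```python
import math

def erb_rate(f_hz):
    # Glasberg-Moore ERB-rate scale (assumed; the text only states the range).
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def inverse_erb_rate(e):
    # Inverse of erb_rate: maps an ERB-rate value back to Hz.
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def center_frequencies(n_channels=128, f_low=80.0, f_high=5000.0):
    # Channels equally spaced on the ERB-rate scale between f_low and f_high.
    e_low, e_high = erb_rate(f_low), erb_rate(f_high)
    step = (e_high - e_low) / (n_channels - 1)
    return [inverse_erb_rate(e_low + i * step) for i in range(n_channels)]

def envelope(channel, sample_rate, cutoff_hz=50.0):
    # Full-wave rectification followed by a one-pole low-pass filter.
    # cutoff_hz is an illustrative value, not taken from the text.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state = 0.0
    out = []
    for x in channel:
        state += alpha * (abs(x) - state)
        out.append(state)
    return out
```

With this spacing, neighboring center frequencies lie close together at the low end and further apart at the high end, which is the property the later position-dependent smoothing has to account for.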
- The fundamental frequency of voiced signal parts can be estimated and subsequently eliminated from the spectrogram.
- The energy of the fundamental frequency is normally much higher than that of the harmonics.
- The invention proposes to eliminate the fundamental frequency of voiced signal parts from the spectrogram.
- An algorithm based on a histogram of zero-crossing distances can be used to estimate the fundamental frequency.
- Alternatively, any other pitch estimation algorithm can be used to estimate the fundamental frequency.
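A much simplified sketch of a zero-crossing-distance histogram is given below. It assumes a single, roughly sinusoidal band signal and an assumed plausible pitch range of 60 to 400 Hz; the algorithm actually used in the patent is not specified in this text and is likely more elaborate. The function name `estimate_f0` is hypothetical.

```python
from collections import Counter

def estimate_f0(signal, sample_rate, f_min=60.0, f_max=400.0):
    # Sample indices at which the signal crosses zero in the upward direction.
    crossings = [i for i in range(1, len(signal))
                 if signal[i - 1] < 0.0 <= signal[i]]
    # Distances (in samples) between successive upward crossings.
    distances = [b - a for a, b in zip(crossings, crossings[1:])]
    # Keep only distances that correspond to a plausible pitch (assumed range).
    shortest = sample_rate / f_max
    longest = sample_rate / f_min
    plausible = [d for d in distances if shortest <= d <= longest]
    if not plausible:
        return None  # treated here as "unvoiced or no estimate"
    # Histogram of distances; the most frequent one is taken as the period.
    histogram = Counter(plausible)
    period, _count = histogram.most_common(1)[0]
    return sample_rate / period
```

Returning `None` when no plausible distance is found lets a caller skip the elimination step for unvoiced frames.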
- The filter channels in the neighborhood of the estimated fundamental frequency are set to the noise floor.
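The elimination step can be illustrated for one time frame of channel energies. The width of the "neighborhood" is not given in the text, so the relative width `rel_width` below is a hypothetical parameter, as is the function name.

```python
def suppress_fundamental(frame, center_freqs, f0, noise_floor, rel_width=0.2):
    # frame: channel energies of one time step, one value per center frequency.
    # Channels whose center frequency lies within rel_width * f0 of the
    # estimated fundamental are replaced by the noise floor (assumed rule).
    if f0 is None:
        return list(frame)  # no estimate, e.g. unvoiced frame: leave unchanged
    return [noise_floor if abs(fc - f0) <= rel_width * f0 else energy
            for energy, fc in zip(frame, center_freqs)]
```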
- Smoothing in the time domain and optional sub-sampling are performed.
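As an illustration of this step, the sketch below smooths one channel envelope with a moving average and then keeps every `step`-th sample. The moving average and the default window length are assumptions; the text does not name the smoothing filter.

```python
def smooth_and_subsample(channel_envelope, window=5, step=1):
    # Centered moving average over `window` samples (shorter at the edges),
    # followed by optional sub-sampling that keeps every `step`-th value.
    half = window // 2
    n = len(channel_envelope)
    smoothed = []
    for t in range(n):
        segment = channel_envelope[max(0, t - half):min(n, t + half + 1)]
        smoothed.append(sum(segment) / len(segment))
    return smoothed[::step]
```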
- Filtering along the channel axis is performed.
- The size of the filtering kernel changes with position, i.e. wide kernels are used at low frequencies and narrow kernels at high frequencies. This takes into account the logarithmic arrangement of the center frequencies in the Gammatone filter bank.
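A position-dependent kernel can be sketched with a Gaussian whose width shrinks from the lowest to the highest channel. The Gaussian shape, the linear interpolation of the width, and the values of `sigma_low` and `sigma_high` (in channels) are assumptions made for illustration only.

```python
import math

def smooth_along_channels(frame, sigma_low=6.0, sigma_high=2.0):
    # frame: channel energies of one time step, index 0 = lowest frequency.
    n = len(frame)
    out = []
    for c in range(n):
        # Kernel width interpolated linearly from wide (low channels)
        # to narrow (high channels).
        frac = c / (n - 1) if n > 1 else 0.0
        sigma = sigma_low + frac * (sigma_high - sigma_low)
        radius = int(math.ceil(3.0 * sigma))
        acc = 0.0
        norm = 0.0
        for k in range(max(0, c - radius), min(n, c + radius + 1)):
            w = math.exp(-0.5 * ((k - c) / sigma) ** 2)
            acc += w * frame[k]
            norm += w
        out.append(acc / norm)  # normalized so a flat frame stays flat
    return out
```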
- The energy of the glottal excitation signal generally decays with frequency. Therefore, low formants, which are excited by low harmonics, have much more energy than high formants. Similarly, the noise-like excitation, which has most of its energy at high frequencies, has a much lower overall energy than the harmonic excitation. As a consequence, the energy of a speech signal is much higher at low frequencies than at high frequencies.
- To compensate for this, a pre-emphasis of the spectrogram is performed. This pre-emphasis raises the energy of the high frequencies (compare Fig. 3).
- A known way to do this is to use a high-pass filter, but since the audio signal is already represented in the spectro-temporal domain, the invention proposes to weight the energy of the filter channels with a weight that decreases exponentially from the high to the low frequencies. Subsequently, smoothing along the frequency axis is carried out. This smoothing spreads the energy of the individual harmonics so that peaks form at the formant locations.
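The channel weighting can be sketched as below. The text only states that the weight decreases exponentially from the high to the low frequencies; the decay constant `decay` and the normalization (weight 1 at the highest channel) are assumptions, and both function names are hypothetical.

```python
import math

def pre_emphasis_weights(n_channels, decay=3.0):
    # Channel 0 is the lowest frequency. The highest channel gets weight 1,
    # the lowest channel gets exp(-decay); requires n_channels >= 2.
    return [math.exp(-decay * (n_channels - 1 - c) / (n_channels - 1))
            for c in range(n_channels)]

def apply_pre_emphasis(frame, weights):
    # Multiply each channel energy by its weight.
    return [energy * w for energy, w in zip(frame, weights)]
```

Applying the weights to every time frame of the spectrogram lowers the low-frequency channels relative to the high-frequency ones, which has the same effect as raising the high frequencies up to an overall gain factor.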
- The size of the smoothing kernel has to be set depending on the center frequencies. It has to be wide at low frequencies, where the filter bandwidths and hence the increments of the center frequencies are small, in order to cover the necessary frequency range.
- Figure 4 shows the results of this operation.
- The resulting spectrogram contains negative values due to the application of the Mexican Hat filter. Depending on the further processing, it can be beneficial to set these negative values to zero; in the present case they have been kept, as they permit a better enhancement of the formant tracks.
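Why a Mexican Hat kernel produces negative values can be seen in a small one-dimensional sketch along the channel axis. The kernel width, the truncation radius, the mean subtraction, and the edge handling are assumptions for illustration; a fixed width is used here for brevity, although the text calls for a position-dependent one.

```python
import math

def mexican_hat(sigma, radius=None):
    # Sampled Mexican Hat: (1 - x^2/sigma^2) * exp(-x^2 / (2 sigma^2)).
    if radius is None:
        radius = int(math.ceil(4.0 * sigma))
    kernel = [(1.0 - (x / sigma) ** 2) * math.exp(-0.5 * (x / sigma) ** 2)
              for x in range(-radius, radius + 1)]
    # Remove the small residual mean so that flat regions map to zero.
    mean = sum(kernel) / len(kernel)
    return [k - mean for k in kernel]

def filter_frame(frame, kernel):
    # Filter one time frame along the channel axis, replicating edge values.
    radius = len(kernel) // 2
    n = len(frame)
    out = []
    for c in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(c + j - radius, 0), n - 1)
            acc += k * frame[idx]
        out.append(acc)
    return out

def clip_negative(frame):
    # Optional step: set the negative side lobes to zero.
    return [max(0.0, v) for v in frame]
```

A single energy bump yields a positive peak at its center flanked by negative side lobes; `clip_negative` corresponds to the optional zeroing mentioned above.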
- The formants were extracted from a synthetically generated speech signal for which the correct formant positions are known.
- The system used for speech synthesis is ipox (see A. Dirksen and J. Coleman, "All-prosodic speech synthesis", in Progress in Speech Synthesis, J. P. H. van Santen, R. W. Sproat, J. P. Olive, and J. Hirschberg, Eds., pp. 91-108, Springer, New York, 1995).
- The ipox system is a compilation of rules that allows words to be generated from phonemic input. More precisely, it produces parameter files for the Klatt80 formant synthesizer. The test sentence "Five women played basketball", generated with a male voice, has been used. By mixing babble noise from the Noisex database into the signal at varying SNRs, the robustness of the formant extraction process to additional noise can be evaluated.
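Mixing noise into a signal at a prescribed SNR can be sketched as follows. The sketch assumes that the SNR is defined as the ratio of mean signal power to mean noise power over the whole utterance; the text does not state which definition was used. The function name `mix_at_snr` is hypothetical.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    # Use the overlapping part of both signals.
    n = min(len(speech), len(noise))
    speech, noise = speech[:n], noise[:n]
    p_speech = sum(x * x for x in speech) / n
    p_noise = sum(x * x for x in noise) / n
    # Scale the noise so that 10*log10(p_speech / p_scaled_noise) == snr_db.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]
```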
- Fig. 7 shows the result for the same signal with babble noise added at an SNR of 20 dB. The location of the peaks is hardly affected by the additional noise.
- Figure 11 shows a block diagram of the reconstructive filtering.
- The result of the previous enhancement is filtered with a set of n parallel filters whose impulse responses are adapted to the expected structure.
- This can, for example, be a set of n even Gabor filters with different orientations and frequencies.
- Gabor filters are known to the skilled person and can be defined as linear filters whose impulse response is a harmonic function multiplied by a Gaussian function.
- The impulse responses (receptive fields) of these filters are then each weighted with the corresponding filter response obtained when applying the filter to the data, i.e. to the result of the preceding filtering step. During the reconstruction, each filter therefore generates not just a single point at the center of the filter but a structure corresponding to the whole area of its impulse response.
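The reconstruction described above can be sketched on a small two-dimensional array (time by channel). The kernel parameters, the zero padding at the borders, and the absence of any normalization are assumptions; the function names are hypothetical and the sketch is not the patent's implementation.

```python
import math

def even_gabor_2d(radius, sigma, wavelength, theta):
    # Even (cosine-phase) Gabor kernel: Gaussian times a cosine along theta.
    kernel = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            u = dx * math.cos(theta) + dy * math.sin(theta)
            gauss = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            row.append(gauss * math.cos(2.0 * math.pi * u / wavelength))
        kernel.append(row)
    return kernel

def filter_response(image, kernel):
    # Response of the filter at every position (zero padding at the borders).
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out

def reconstruct(image, kernels):
    # For every filter and position, add the filter's impulse response,
    # weighted by the filter response at that position, to the output.
    h, w = len(image), len(image[0])
    result = [[0.0] * w for _ in range(h)]
    for kernel in kernels:
        r = len(kernel) // 2
        response = filter_response(image, kernel)
        for y in range(h):
            for x in range(w):
                weight = response[y][x]
                if weight == 0.0:
                    continue
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            result[yy][xx] += weight * kernel[dy + r][dx + r]
    return result
```

Because each response paints a whole kernel-shaped patch into the output, structures that match the kernels (for example elongated formant tracks) are reinforced, while isolated points that match poorly contribute little.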
- The invention can be applied to the enhancement of speech signals, especially for the hearing impaired, as enhancing the formants is known to increase intelligibility for them.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP06013126A EP1850328A1 (de) | 2006-04-26 | 2006-06-26 | Enhancement and extraction of formants of voice signals |
JP2007061984A JP2007293285A (ja) | 2006-04-26 | 2007-03-12 | Enhancement and extraction of formants of speech signals |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP06008675 | 2006-04-26 | ||
EP06013126A EP1850328A1 (de) | 2006-04-26 | 2006-06-26 | Enhancement and extraction of formants of voice signals |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1850328A1 (de) | 2007-10-31 |
Family
ID=36968222
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP06013126A Withdrawn EP1850328A1 (de) | Enhancement and extraction of formants of voice signals | 2006-04-26 | 2006-06-26 |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP1850328A1 (de) |
JP (1) | JP2007293285A (de) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2232700A1 (de) * | 2007-12-21 | 2010-09-29 | Srs Labs, Inc. | System for adjusting perceived loudness of audio signals |
US8538042B2 (en) | 2009-08-11 | 2013-09-17 | Dts Llc | System for increasing perceived loudness of speakers |
WO2014039028A1 (en) * | 2012-09-04 | 2014-03-13 | Nuance Communications, Inc. | Formant dependent speech signal enhancement |
WO2015147363A1 (ko) * | 2014-03-28 | 2015-10-01 | Foundation of Soongsil University-Industry Cooperation | Method for determining alcohol consumption by comparison of frequency frames of a difference signal, and recording medium and device for performing the same |
WO2015147362A1 (ko) * | 2014-03-28 | 2015-10-01 | Foundation of Soongsil University-Industry Cooperation | Method for determining alcohol consumption by comparison of high-frequency signals in a difference signal, and recording medium and device for performing the same |
US9312829B2 (en) | 2012-04-12 | 2016-04-12 | Dts Llc | System for adjusting loudness of audio signals in real time |
CN106486110A (zh) * | 2016-10-21 | 2017-03-08 | Tsinghua University | Gammatone filter bank chip *** supporting real-time speech decomposition/synthesis |
US9613633B2 (en) | 2012-10-30 | 2017-04-04 | Nuance Communications, Inc. | Speech enhancement |
EP3113183A4 (de) * | 2014-02-28 | 2017-07-26 | National Institute of Information and Communications Technology | Speech intelligibility improving apparatus and computer program therefor |
US9899039B2 (en) | 2014-01-24 | 2018-02-20 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9916844B2 (en) | 2014-01-28 | 2018-03-13 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9934793B2 (en) | 2014-01-24 | 2018-04-03 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9943260B2 (en) | 2014-03-28 | 2018-04-17 | Foundation of Soongsil University—Industry Cooperation | Method for judgment of drinking using differential energy in time domain, recording medium and device for performing the method |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5313528B2 (ja) * | 2008-03-18 | 2013-10-09 | Rion Co., Ltd. | Signal processing method for hearing aids |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4477925A (en) * | 1981-12-11 | 1984-10-16 | Ncr Corporation | Clipped speech-linear predictive coding speech processor |
US4820059A (en) * | 1985-10-30 | 1989-04-11 | Central Institute For The Deaf | Speech processing apparatus and methods |
EP0742548A2 (de) * | 1995-05-12 | 1996-11-13 | Mitsubishi Denki Kabushiki Kaisha | Apparatus and method for speech coding using a filter for improving signal quality |
US6223151B1 (en) * | 1999-02-10 | 2001-04-24 | Telefon Aktie Bolaget Lm Ericsson | Method and apparatus for pre-processing speech signals prior to coding by transform-based speech coders |
US20050165608A1 (en) * | 2002-10-31 | 2005-07-28 | Masanao Suzuki | Voice enhancement device |
US20050197832A1 (en) * | 2003-12-31 | 2005-09-08 | Hearworks Pty Limited | Modulation depth enhancement for tone perception |
EP1600947A2 (de) * | 2004-05-26 | 2005-11-30 | Honda Research Institute Europe GmbH | Subtractive reduction of harmonic noise |
- 2006-06-26: EP application EP06013126A, published as EP1850328A1 (de), not active (withdrawn)
- 2007-03-12: JP application JP2007061984A, published as JP2007293285A (ja), active (pending)
Non-Patent Citations (2)
Title |
---|
POTAMIANOS A ET AL: "Speech formant frequency and bandwidth tracking using multiband energy demodulation", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1995. ICASSP-95., 1995 INTERNATIONAL CONFERENCE ON DETROIT, MI, USA 9-12 MAY 1995, NEW YORK, NY, USA,IEEE, US, vol. 1, 9 May 1995 (1995-05-09), pages 784 - 787, XP010625350, ISBN: 0-7803-2431-5 * |
RAO P ET AL: "Speech formant frequency estimation: evaluating a nonstationary analysis method", SIGNAL PROCESSING, AMSTERDAM, NL, vol. 80, no. 8, August 2000 (2000-08-01), pages 1655 - 1667, XP004222586, ISSN: 0165-1684 * |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2232700A4 (de) * | 2007-12-21 | 2012-10-10 | Dts Llc | System for adjusting perceived loudness of audio signals |
US8315398B2 (en) | 2007-12-21 | 2012-11-20 | Dts Llc | System for adjusting perceived loudness of audio signals |
EP2232700A1 (de) * | 2007-12-21 | 2010-09-29 | Srs Labs, Inc. | System for adjusting perceived loudness of audio signals |
US9264836B2 (en) | 2007-12-21 | 2016-02-16 | Dts Llc | System for adjusting perceived loudness of audio signals |
US8538042B2 (en) | 2009-08-11 | 2013-09-17 | Dts Llc | System for increasing perceived loudness of speakers |
US9820044B2 (en) | 2009-08-11 | 2017-11-14 | Dts Llc | System for increasing perceived loudness of speakers |
US10299040B2 (en) | 2009-08-11 | 2019-05-21 | Dts, Inc. | System for increasing perceived loudness of speakers |
US9312829B2 (en) | 2012-04-12 | 2016-04-12 | Dts Llc | System for adjusting loudness of audio signals in real time |
US9559656B2 (en) | 2012-04-12 | 2017-01-31 | Dts Llc | System for adjusting loudness of audio signals in real time |
WO2014039028A1 (en) * | 2012-09-04 | 2014-03-13 | Nuance Communications, Inc. | Formant dependent speech signal enhancement |
CN104704560B (zh) * | 2012-09-04 | 2018-06-05 | Nuance Communications, Inc. | Formant dependent speech signal enhancement |
DE112012006876B4 (de) * | 2012-09-04 | 2021-06-10 | Cerence Operating Company | Method and speech signal processing system for formant-dependent speech signal enhancement |
CN104704560A (zh) * | 2012-09-04 | 2015-06-10 | Nuance Communications, Inc. | Formant dependent speech signal enhancement |
US9805738B2 (en) | 2012-09-04 | 2017-10-31 | Nuance Communications, Inc. | Formant dependent speech signal enhancement |
US9613633B2 (en) | 2012-10-30 | 2017-04-04 | Nuance Communications, Inc. | Speech enhancement |
US9934793B2 (en) | 2014-01-24 | 2018-04-03 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9899039B2 (en) | 2014-01-24 | 2018-02-20 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9916844B2 (en) | 2014-01-28 | 2018-03-13 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
EP3113183A4 (de) * | 2014-02-28 | 2017-07-26 | National Institute of Information and Communications Technology | Speech intelligibility improving apparatus and computer program therefor |
US9842607B2 (en) | 2014-02-28 | 2017-12-12 | National Institute Of Information And Communications Technology | Speech intelligibility improving apparatus and computer program therefor |
US9907509B2 (en) | 2014-03-28 | 2018-03-06 | Foundation of Soongsil University—Industry Cooperation | Method for judgment of drinking using differential frequency energy, recording medium and device for performing the method |
US9916845B2 (en) | 2014-03-28 | 2018-03-13 | Foundation of Soongsil University—Industry Cooperation | Method for determining alcohol use by comparison of high-frequency signals in difference signal, and recording medium and device for implementing same |
US9943260B2 (en) | 2014-03-28 | 2018-04-17 | Foundation of Soongsil University—Industry Cooperation | Method for judgment of drinking using differential energy in time domain, recording medium and device for performing the method |
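CN106133833A (zh) * | 2014-03-28 | 2016-11-16 | Foundation of Soongsil University-Industry Cooperation | Method for determining alcohol consumption by comparison of high-frequency signals in a difference signal, and recording medium and device for implementing the method |
WO2015147362A1 (ko) * | 2014-03-28 | 2015-10-01 | Foundation of Soongsil University-Industry Cooperation | Method for determining alcohol consumption by comparison of high-frequency signals in a difference signal, and recording medium and device for performing the same |
WO2015147363A1 (ko) * | 2014-03-28 | 2015-10-01 | Foundation of Soongsil University-Industry Cooperation | Method for determining alcohol consumption by comparison of frequency frames of a difference signal, and recording medium and device for performing the same |
CN106486110A (zh) * | 2016-10-21 | 2017-03-08 | Tsinghua University | Gammatone filter bank chip *** supporting real-time speech decomposition/synthesis |
CN106486110B (zh) * | 2016-10-21 | 2019-11-08 | Tsinghua University | Gammatone filter bank chip *** supporting real-time speech decomposition/synthesis |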
Also Published As
Publication number | Publication date |
---|---|
JP2007293285A (ja) | 2007-11-08 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
AX | Request for extension of the European patent | Extension state: AL BA HR MK YU |
AKX | Designation fees paid | |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 20080503 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: 8566 |