CN107919136A - Digital speech sampling frequency estimation method based on a Gaussian mixture model - Google Patents

Digital speech sampling frequency estimation method based on a Gaussian mixture model Download PDF

Info

Publication number
CN107919136A
CN107919136A CN201711112810.6A CN201711112810A
Authority
CN
China
Prior art keywords
interpolation
digital speech
sample frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711112810.6A
Other languages
Chinese (zh)
Other versions
CN107919136B (en)
Inventor
吕勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN201711112810.6A priority Critical patent/CN107919136B/en
Publication of CN107919136A publication Critical patent/CN107919136A/en
Application granted granted Critical
Publication of CN107919136B publication Critical patent/CN107919136B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention discloses a digital speech sampling frequency estimation method based on a Gaussian mixture model (GMM). First, a GMM is trained on high-sampling-rate digital speech. Then, the low-sampling-rate input speech to be estimated is interpolated to raise its sampling frequency. Finally, the interpolated digital speech is scored under the GMM, and the interpolation multiple is adjusted according to the result so that the output probability of the GMM reaches its maximum, thereby yielding the sampling frequency of the input speech. The invention can identify the sampling frequency of unknown digital speech and reduce the system performance degradation caused by sampling-frequency mismatch.

Description

Digital speech sampling frequency estimation method based on a Gaussian mixture model
Technical field
The invention belongs to the field of speech processing, and in particular relates to a speech processing method that estimates the sampling frequency of input speech using a Gaussian mixture model trained on high-sampling-rate digital speech.
Background art
Speech is the basic means of human communication and the most convenient and effective tool for human-computer interaction on the move. Digital speech has the advantages of high precision and ease of storage and transmission, but different digital systems have different computational performance, access speed, storage space, battery capacity and application scenarios, and may therefore use different sampling frequencies. If the sampling frequency of the input speech does not match that of the digital system, the performance of the speech processing system may degrade. It is therefore necessary to convert the input speech so that its sampling frequency matches the digital system, strengthening the practical applicability of the speech processing system.
If the sampling frequency of the input speech is known, it suffices to compute the ratio of its sampling frequency to the system sampling frequency and then interpolate or decimate the input speech so that its sampling frequency conforms to the system. In some application scenarios, however, the sampling frequency of the input speech is unknown: for example, when monitoring audio on a network, the sampling frequency of a captured digital audio fragment may well be unknown.
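The known-rate case mentioned above reduces to rational resampling: interpolate by L and decimate by M, where L/M is the ratio of the system rate to the input rate in lowest terms. A minimal sketch of that ratio computation (an illustration of the background discussion, not part of the patented method):

```python
from fractions import Fraction

def updown_factors(f_in, f_sys):
    """Given a known input sampling rate f_in and a system rate f_sys,
    return the interpolation factor L and decimation factor M such that
    L / M = f_sys / f_in in lowest terms."""
    r = Fraction(f_sys, f_in)
    return r.numerator, r.denominator
```

For instance, converting 16 kHz speech to a 48 kHz system needs interpolation by 3 with no decimation, while 44.1 kHz to 48 kHz needs the less convenient factors 160/147.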
Summary of the invention
Object of the invention: in view of the problems in the prior art, the present invention provides a digital speech sampling frequency estimation method based on a Gaussian mixture model (GMM: Gaussian Mixture Model). In the method, a GMM is first trained on high-sampling-rate digital speech; the low-sampling-rate input speech to be estimated is then interpolated to raise its sampling frequency; finally, the interpolated digital speech is scored under the GMM, and the interpolation multiple is adjusted according to the result so that the output probability of the GMM reaches its maximum, thereby yielding the sampling frequency of the input speech.
The specific steps of the present invention are as follows:
(1) Sample the training speech at 48 kHz, apply windowing and framing, extract cepstral features, and train a Gaussian mixture model on the feature vectors of all speech units;
(2) Interpolate the low-sampling-rate input speech to be estimated (i.e., speech whose sampling frequency is below 48 kHz) to raise its sampling frequency;
(3) Feed the interpolated digital speech into the GMM and compute its output probability;
(4) Repeat (2) and (3) for every interpolation multiple, recording each output probability;
(5) Compare the output probabilities of all interpolation multiples; the interpolation multiple with the maximum output probability is the ratio of the training speech sampling frequency to the input speech sampling frequency.
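Steps (2)-(5) amount to a search over candidate interpolation multiples. The sketch below is a simplified illustration, not the patent's reference implementation: `gmm_score` stands in for the trained model's output-probability computation (which in the method operates on cepstral features, not raw samples), and zero-insertion stands in for the unspecified interpolation scheme.

```python
import numpy as np

def estimate_input_rate(x, gmm_score, multiples, f0=48000.0):
    """Steps (2)-(5): interpolate the input by each candidate multiple D,
    score the result under the trained GMM, and keep the D whose output
    probability is maximal; the input rate is then f0 / D."""
    def zero_insert(sig, D):
        # One simple interpolation scheme: insert D-1 zeros between samples.
        y = np.zeros(len(sig) * D)
        y[::D] = sig
        return y

    scores = {D: gmm_score(zero_insert(np.asarray(x, dtype=float), D))
              for D in multiples}
    D_hat = max(scores, key=scores.get)   # step (5): argmax over multiples
    return f0 / D_hat
```

With training speech at 48 kHz and candidate multiples {3, 6}, an input actually sampled at 16 kHz should score highest at D = 3, giving 48000 / 3 = 16000.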
Brief description of the drawings
Fig. 1 shows the overall framework of the GMM-based digital speech sampling frequency estimation system, which mainly comprises the model training, signal interpolation, interpolation multiple control and frequency estimation modules.
Embodiments
The present invention is further elucidated below with reference to specific embodiments. It should be understood that these embodiments merely illustrate the present invention and do not limit its scope; after reading the present invention, modifications by those skilled in the art to its various equivalent forms all fall within the scope defined by the claims appended to this application.
As shown in Fig. 1, the digital speech sampling frequency estimation method based on a Gaussian mixture model mainly comprises the model training, signal interpolation, interpolation multiple control and frequency estimation modules. The specific embodiment of each main module in the drawing is described in detail below:
1. Model training
First, the training speech is sampled at 48 kHz, windowed and framed, and a fast Fourier transform is applied to each frame of the speech signal to obtain its amplitude spectrum. Then, Mel filtering and a logarithm are applied to the amplitude spectrum of each frame to obtain the cepstral feature parameters of the training speech. Finally, a Gaussian mixture model is trained on the feature vectors of all speech units:
b(o_t) = Σ_{m=1}^{M} c_m { (2π)^{-d/2} |Σ_m|^{-1/2} exp[ -(1/2)(o_t - μ_m)^T Σ_m^{-1} (o_t - μ_m) ] }    (1)
where o_t denotes the cepstral feature vector of the t-th frame of training speech, and c_m, μ_m and Σ_m denote the mixture coefficient, mean vector and covariance matrix of the m-th Gaussian component of the GMM, respectively.
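Equation (1) can be evaluated directly with dense linear algebra. The sketch below is a minimal, unoptimized rendering of that weighted sum of multivariate Gaussian densities (a practical system would work in the log domain and with precomputed inverses):

```python
import numpy as np

def gmm_density(o_t, c, mu, sigma):
    """Evaluate b(o_t) from equation (1): a weighted sum of M multivariate
    Gaussian densities with weights c_m, means mu_m, covariances Sigma_m."""
    d = len(o_t)
    b = 0.0
    for c_m, mu_m, S_m in zip(c, mu, sigma):
        diff = o_t - mu_m
        norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(S_m) ** (-0.5)
        b += c_m * norm * np.exp(-0.5 * diff @ np.linalg.inv(S_m) @ diff)
    return b
```

As a sanity check, a single unit-weight component with zero mean and unit covariance in one dimension evaluated at its mean gives the standard normal peak 1/√(2π) ≈ 0.3989.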
2. Signal interpolation
Since the Gaussian mixture model is trained on high-sampling-rate training speech, the sampling frequency of the input digital speech can be assumed to be lower than that of the training speech, so interpolating the input speech brings its sampling frequency into agreement with the GMM.
Let the interpolation multiple be D_i; interpolating the input digital speech x(n) yields the signal x_i(n).
The sampling frequency of the interpolated digital speech x_i(n) is D_i times that of the original input digital speech x(n).
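The patent text here does not reproduce its interpolation formula, so the sketch below shows one common realization only: zero-insertion, where x_i(n) = x(n/D_i) when D_i divides n and 0 otherwise (in practice followed by a lowpass filter to suppress spectral images):

```python
import numpy as np

def zero_insert(x, D):
    """Zero-insertion interpolation by an integer multiple D:
    x_i(n) = x(n / D) when D divides n, and 0 otherwise, so the
    interpolated signal has D times the original sampling frequency."""
    xi = np.zeros(len(x) * D)
    xi[::D] = x
    return xi
```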
3. Interpolation multiple control
The ratios D_1, D_2, …, D_i, …, D_N of the sampling frequency of the training speech to a set of common speech sampling frequencies f_1, f_2, …, f_i, …, f_N are used as the initial interpolation multiples.
4. Frequency estimation
The digital speech produced by each interpolation is fed into the Gaussian mixture model and its output probability is computed. The output probabilities of all interpolation multiples are compared to determine the interpolation multiple D̂ that maximizes the output probability, and the multiple is then fine-tuned in the neighbourhood of D̂ so that the output probability of the GMM reaches its maximum. Denoting the interpolation multiple at this point by D̃, the sampling frequency f̃_x of the original input speech can be estimated as f̃_x = f_0 / D̃,
where f_0 is the sampling frequency of the high-rate training speech, taken here as 48 kHz.
Once the sampling frequency of the input speech has been estimated, the input speech can be interpolated according to the sampling frequency of the target system and then fed into the target system for processing.
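The coarse-then-fine search of this module can be sketched as follows. The scoring callback `score_at`, the step size and the neighbourhood width are illustrative assumptions; the patent does not specify the fine-tuning grid.

```python
def refine_and_estimate(score_at, coarse_multiples, f0=48000.0,
                        step=0.05, half_steps=10):
    """Pick the coarse interpolation multiple with maximal output probability,
    then scan a small neighbourhood around it in increments of `step`;
    the estimated input sampling frequency is f0 / refined multiple."""
    D0 = max(coarse_multiples, key=score_at)            # coarse argmax
    candidates = [D0 + k * step for k in range(-half_steps, half_steps + 1)]
    D_hat = max(candidates, key=score_at)               # local fine-tuning
    return f0 / D_hat
```

With a score peaked at a multiple of 3, the refined estimate is 48000 / 3 = 16000 Hz, matching the coarse result when the peak lies on the coarse grid.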

Claims (5)

1. A digital speech sampling frequency estimation method based on a Gaussian mixture model, characterized in that a GMM is first trained on high-sampling-rate digital speech; the low-sampling-rate input speech to be estimated is then interpolated to raise its sampling frequency; finally, the interpolated digital speech is scored under the GMM and the interpolation multiple is adjusted according to the result, so that the output probability of the GMM reaches its maximum, thereby yielding the sampling frequency of the input speech.
2. The digital speech sampling frequency estimation method based on a Gaussian mixture model according to claim 1, characterized by specifically comprising:
(1) sampling the training speech at 48 kHz, applying windowing and framing, extracting cepstral features, and training a Gaussian mixture model on the feature vectors of all speech units;
(2) interpolating the low-sampling-rate input speech to be estimated to raise its sampling frequency;
(3) feeding the interpolated digital speech into the GMM and computing its output probability;
(4) repeating (2) and (3) for every interpolation multiple, recording each output probability;
(5) comparing the output probabilities of all interpolation multiples, the interpolation multiple with the maximum output probability being the ratio of the training speech sampling frequency to the input speech sampling frequency.
3. The digital speech sampling frequency estimation method based on a Gaussian mixture model according to claim 2, characterized in that the Gaussian mixture model generated in step (1) is:
b(o_t) = Σ_{m=1}^{M} c_m { (2π)^{-d/2} |Σ_m|^{-1/2} exp[ -(1/2)(o_t - μ_m)^T Σ_m^{-1} (o_t - μ_m) ] }    (1)
where o_t denotes the cepstral feature vector of the t-th frame of training speech, and c_m, μ_m and Σ_m denote the mixture coefficient, mean vector and covariance matrix of the m-th Gaussian component of the GMM, respectively.
4. The digital speech sampling frequency estimation method based on a Gaussian mixture model according to claim 2, characterized in that the low-sampling-rate input speech to be estimated is interpolated as follows:
let the interpolation multiple be D_i; interpolating the input digital speech x(n) yields the signal x_i(n),
whose sampling frequency is D_i times that of the original input digital speech x(n).
5. The digital speech sampling frequency estimation method based on a Gaussian mixture model according to claim 4, characterized in that the ratios D_1, D_2, …, D_i, …, D_N of the sampling frequency of the training speech to a set of common speech sampling frequencies f_1, f_2, …, f_i, …, f_N are used as the initial interpolation multiples.
CN201711112810.6A 2017-11-13 2017-11-13 Digital voice sampling frequency estimation method based on Gaussian mixture model Active CN107919136B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711112810.6A CN107919136B (en) 2017-11-13 2017-11-13 Digital voice sampling frequency estimation method based on Gaussian mixture model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711112810.6A CN107919136B (en) 2017-11-13 2017-11-13 Digital voice sampling frequency estimation method based on Gaussian mixture model

Publications (2)

Publication Number Publication Date
CN107919136A true CN107919136A (en) 2018-04-17
CN107919136B CN107919136B (en) 2021-07-09

Family

ID=61896270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711112810.6A Active CN107919136B (en) 2017-11-13 2017-11-13 Digital voice sampling frequency estimation method based on Gaussian mixture model

Country Status (1)

Country Link
CN (1) CN107919136B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109459612A (en) * 2019-01-09 2019-03-12 上海艾为电子技术股份有限公司 The detection method and device of the sample frequency of digital audio and video signals
CN111341302A (en) * 2020-03-02 2020-06-26 苏宁云计算有限公司 Voice stream sampling rate determining method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102576541A (en) * 2009-10-21 2012-07-11 杜比国际公司 Oversampling in a combined transposer filter bank
US8639502B1 (en) * 2009-02-16 2014-01-28 Arrowhead Center, Inc. Speaker model-based speech enhancement system
CN107204840A (en) * 2017-07-31 2017-09-26 电子科技大学 Sinusoidal signal frequency method of estimation based on DFT and iteration correction

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8639502B1 (en) * 2009-02-16 2014-01-28 Arrowhead Center, Inc. Speaker model-based speech enhancement system
CN102576541A (en) * 2009-10-21 2012-07-11 杜比国际公司 Oversampling in a combined transposer filter bank
CN107204840A (en) * 2017-07-31 2017-09-26 电子科技大学 Sinusoidal signal frequency method of estimation based on DFT and iteration correction

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAIYANG WU ET AL.: "Improved AdaBoost Algorithm Using VQMAP for Speaker Identification", 2010 International Conference on Electrical and Control Engineering *
LIN ZHOU ET AL.: "VTS feature compensation based on two-layer GMM structure for robust speech recognition", 2016 8th International Conference on Wireless Communications & Signal Processing (WCSP) *
PETR ZELINKA ET AL.: "Smooth interpolation of Gaussian mixture models", 2009 19th International Conference Radioelektronika *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109459612A (en) * 2019-01-09 2019-03-12 上海艾为电子技术股份有限公司 The detection method and device of the sample frequency of digital audio and video signals
CN111341302A (en) * 2020-03-02 2020-06-26 苏宁云计算有限公司 Voice stream sampling rate determining method and device
CN111341302B (en) * 2020-03-02 2023-10-31 苏宁云计算有限公司 Voice stream sampling rate determining method and device

Also Published As

Publication number Publication date
CN107919136B (en) 2021-07-09

Similar Documents

Publication Publication Date Title
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
EP3926623A1 (en) Speech recognition method and apparatus, and neural network training method and apparatus
CN109036467B (en) TF-LSTM-based CFFD extraction method, voice emotion recognition method and system
CN110782872A (en) Language identification method and device based on deep convolutional recurrent neural network
CN110459205B (en) Speech recognition method and device, computer storage medium
CN108305616A (en) A kind of audio scene recognition method and device based on long feature extraction in short-term
CN109754790B (en) Speech recognition system and method based on hybrid acoustic model
CN104538028A (en) Continuous voice recognition method based on deep long and short term memory recurrent neural network
CN102664010B (en) Robust speaker distinguishing method based on multifactor frequency displacement invariant feature
CN112634876A (en) Voice recognition method, voice recognition device, storage medium and electronic equipment
CN110176250B (en) Robust acoustic scene recognition method based on local learning
CN102789779A (en) Speech recognition system and recognition method thereof
CN111128211B (en) Voice separation method and device
CN110047478A (en) Multicenter voice based on space characteristics compensation identifies Acoustic Modeling method and device
CN111653270B (en) Voice processing method and device, computer readable storage medium and electronic equipment
CN112331218A (en) Single-channel voice separation method and device for multiple speakers
CN108831447A (en) Audio recognition method, device and storage medium based on HMM and PNN
CN114387997B (en) Voice emotion recognition method based on deep learning
CN102436815B (en) Voice identifying device applied to on-line test system of spoken English
CN103258531A (en) Harmonic wave feature extracting method for irrelevant speech emotion recognition of speaker
CN107919136A (en) A kind of digital speech samples frequency estimating methods based on gauss hybrid models
CN106228976A (en) Audio recognition method and device
CN112133288A (en) Method, system and equipment for processing voice to character
CN113160796B (en) Language identification method, device and equipment for broadcast audio and storage medium
CN111768764B (en) Voice data processing method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant