CN104658547A - Method for expanding artificial voice bandwidth - Google Patents

Method for expanding artificial voice bandwidth

Info

Publication number
CN104658547A
CN104658547A (application CN201310590362.6A)
Authority
CN
China
Prior art keywords
frequency
speech
high frequency
voice
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310590362.6A
Other languages
Chinese (zh)
Inventor
盖丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian You Jia Software Science And Technology Ltd
Original Assignee
Dalian You Jia Software Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian You Jia Software Science And Technology Ltd filed Critical Dalian You Jia Software Science And Technology Ltd
Priority to CN201310590362.6A priority Critical patent/CN104658547A/en
Publication of CN104658547A publication Critical patent/CN104658547A/en
Pending legal-status Critical Current

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for expanding artificial voice bandwidth. The method works as follows: the narrowband speech signal passes through a curve-fitting module and then a high-frequency envelope extrapolation module, whose output enters a spectrum-shaping module; in parallel, the narrowband speech signal passes through a feature-extraction module, where each frame yields a set of linear prediction coefficients; an auto-regression (AR) model and filter module is constructed from these coefficients, and white noise is processed by the AR model to generate a high-frequency noise sequence correlated with the low frequencies, which also enters the spectrum-shaping module; the spectrum-shaping module outputs the high-frequency speech; finally, the high-frequency speech and the original narrowband speech signal pass through a speech-synthesis module to obtain the wideband speech.

Description

A method for artificial speech bandwidth expansion
Technical field
The present invention relates to a method for artificial speech bandwidth expansion and belongs to the field of digital signal processing.
Background technology
At present, the effective frequency range of the public switched telephone network (PSTN) is only 0.3–3.4 kHz, and the effective bandwidth of GSM digital cellular telephony does not exceed 4 kHz. Although most of the energy of a speech signal is concentrated in the 0.3–3.4 kHz band, the frequency range it actually occupies is much wider. Because 4 kHz narrowband speech lacks the high-frequency components, its naturalness and intelligibility are noticeably degraded, and it sounds "muffled".
Summary of the invention
To overcome the above deficiency, the object of the present invention is to provide a method for artificial speech bandwidth expansion.
A method for artificial speech bandwidth expansion, whose working process is as follows:
The narrowband speech signal passes through the curve-fitting module and then the high-frequency envelope extrapolation module, whose output enters the spectrum-shaping module. In parallel, the narrowband speech signal passes through the feature-extraction module, where each frame yields a set of linear prediction coefficients; the autoregressive model and filter module is constructed from these coefficients, white noise is processed by this autoregressive model to generate a high-frequency noise sequence correlated with the low frequencies, and this sequence also enters the spectrum-shaping module. The spectrum-shaping module outputs the high-frequency speech. The high-frequency speech and the narrowband speech signal then pass through the speech-synthesis module to obtain the wideband speech.
Principle and beneficial effects of the invention: the method keeps the advantage of low algorithmic complexity while producing an artificial excitation that is highly correlated with the true excitation. The invention first fits a curve to the known low-frequency log-domain spectrum to obtain a curve equation, and then extrapolates the high-frequency log-domain spectral envelope curve. Starting from the low-frequency parameters of the narrowband speech, the linear prediction coefficients are used to form an autoregressive model, and a uniform white-noise sequence is passed through this model to obtain a high-frequency noise sequence. This sequence is white noise with a certain correlation to the narrowband speech; it is converted into a log-domain spectrum and modulated by the high-frequency log-spectral envelope to recover the high-frequency speech, and the wideband speech is synthesized in the cepstral domain. The invention is a fully blind speech bandwidth extension technique that can be applied directly at the narrowband receiving end. It requires no prior knowledge or high-frequency side information, has low algorithmic complexity, can recover a high-frequency part that is highly correlated with the low band, and the synthesized wideband speech has a good auditory quality.
Brief description of the drawings
Fig. 1 is the flow diagram of the present invention.
Fig. 2 shows the wideband speech synthesis process of the present invention.
Fig. 3(a) Spectrogram of the original wideband speech.
Fig. 3(b) Spectrogram of the narrowband speech.
Fig. 3(c) Spectrogram of the speech after bandwidth extension.
Fig. 4(a) Distribution of the comparison results between the output of the proposed algorithm and the output of the adaptive variable-rate speech codec at a bit rate of 12.2 kbps.
Fig. 4(b) Distribution of the comparison results between the output of the proposed algorithm and the output of the wideband adaptive-rate speech codec at a bit rate of 8.85 kbps.
Fig. 5 Spectral distortion measure of the narrowband speech and of the wideband speech synthesized by the present invention.
Fig. 6 shows the subjective test grading criteria.
Embodiments
The present invention is further described below with reference to the accompanying drawings.
Fig. 1 is the flow diagram of the present invention. As shown in Fig. 1:
The narrowband speech signal passes through the curve-fitting module and then the high-frequency envelope extrapolation module, whose output enters the spectrum-shaping module. In parallel, the narrowband speech signal passes through the feature-extraction module, where each frame yields a set of linear prediction coefficients used to construct the autoregressive (AR) model and filter module; white noise is processed by this AR model to generate a high-frequency noise sequence correlated with the low frequencies, and this sequence also enters the spectrum-shaping module. The spectrum-shaping module outputs the high-frequency speech. The high-frequency speech and the narrowband speech signal then pass through the speech-synthesis module to obtain the wideband speech.
Curve fitting module
This module uses curve fitting to obtain the curve equation of the low-frequency log-spectral envelope of the narrowband speech and extrapolates the high-frequency log-spectral envelope from that equation. The formants of the low-frequency part are chosen as the input of the fit. The narrowband speech sampled at 8 kHz is first input, the pitch period is estimated, and the time-domain signal is transformed into the log-spectral domain; the peaks of the log spectrum are located using the estimated pitch period, the variation of the formant peaks is then described by curve fitting, and the high-frequency log-spectral envelope curve is extrapolated.
First, the narrowband speech is divided into frames of length 128 with an overlap of 64 samples between frames. A frequency-domain method, i.e., computing the correlation of the signal, is used to obtain the pitch period T of this frame of speech. If the input narrowband speech is x(n), the autocorrelation function R(k) is
R(k) = Σ_{n=0}^{N−1} x(n)·x(n−k)
where N is the frame length, N = 128. The position k′ of the maximum of R(k) is searched over the lag range k = 20–143, and k′ is the estimate T of the pitch period. The narrowband speech x(n) is Fourier-transformed and then converted to the log-spectral domain, where the first formant peak is located and denoted p₀. Since the pitch period is roughly equal to the spacing between the spectral peaks, the other low-frequency peaks can be found from the determined first peak p₀ and the pitch period T: when searching for each further low-frequency peak, only the points at a distance of about T from the previous peak need to be examined, which gives the exact positions of the other peaks. Their amplitudes, denoted lo_env(ω) at the corresponding frequency points ω, form the low-frequency log-spectral envelope. lo_env(ω) and ω serve as the input of the curve fit.
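For illustration, a minimal sketch of this per-frame pitch estimate, assuming one 128-sample frame of 8 kHz narrowband speech is available as a NumPy array; the correlation is computed directly in the time domain here rather than via a frequency-domain method, and the function and parameter names are illustrative, not part of the patent.

import numpy as np

def estimate_pitch_period(x, k_min=20, k_max=143):
    """Return the lag k' that maximises R(k) = sum_n x(n) * x(n - k)."""
    N = len(x)                                  # frame length, N = 128 in the text
    k_max = min(k_max, N - 1)                   # lags >= N would need samples from the previous frame
    best_k, best_r = k_min, -np.inf
    for k in range(k_min, k_max + 1):
        r = np.sum(x[k:] * x[:N - k])           # autocorrelation at lag k
        if r > best_r:
            best_k, best_r = k, r
    return best_k                               # estimated pitch period T (in samples)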
A mapping is established between the low-frequency log-spectral envelope lo_env(ω) and the low-band frequency ω:
lo_env(ω) = a·e^(bω) + c·e^(dω), ω = 0 ~ 2π×4000
The parameters a, b, c, d of the fitting function are obtained, which determines the mapping equation.
High-frequency envelope extrapolation module
Using the determined mapping equation, the higher frequency points are substituted into the formula to extrapolate the unknown high-frequency spectral envelope data hi_env(ω), i.e., the extrapolated high-frequency log-spectral envelope hi_env(ω)
hi_env(ω) = a·e^(bω) + c·e^(dω), ω = 2π×4000 ~ 2π×8000.
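For illustration, a minimal sketch of the envelope fit and extrapolation, assuming `omega_lo` and `lo_env` are NumPy arrays holding the formant frequencies and their log-spectral amplitudes found in the 0–4 kHz band; SciPy's general-purpose curve_fit is used here as one possible fitting routine, and the names, starting values, and frequency normalisation are illustrative assumptions rather than part of the patent.

import numpy as np
from scipy.optimize import curve_fit

def double_exponential(omega, a, b, c, d):
    # lo_env(omega) = a*exp(b*omega) + c*exp(d*omega)
    return a * np.exp(b * omega) + c * np.exp(d * omega)

def extrapolate_envelope(omega_lo, lo_env, omega_hi):
    """Fit a, b, c, d on the low-band points and evaluate the curve on the high band."""
    # The raw omega values are large (up to 2*pi*8000), so the exponents overflow
    # easily; normalising omega before fitting keeps the optimisation stable.
    scale = 2.0 * np.pi * 8000.0
    params, _ = curve_fit(double_exponential, omega_lo / scale, lo_env,
                          p0=(1.0, -1.0, 1.0, -1.0), maxfev=10000)
    return double_exponential(omega_hi / scale, *params)   # hi_env on the 4-8 kHz grid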
Feature extraction module
Linear prediction analysis is performed on the narrowband speech: each frame yields a set of linear prediction coefficients, from which the autoregressive model is constructed. The narrowband speech is used first to construct the autoregressive model. Linear prediction analysis is applied to each speech frame x(n) of length N (N = 128), i.e., the autocorrelation function of each windowed speech frame is computed and converted into linear prediction coefficients with the Levinson-Durbin algorithm. The specific steps are as follows.
A Hamming window window(n) = 0.5 − 0.5·cos(2πn/N), n = 0, 1, ..., N−1, is used here to window the input speech signal x(n); the windowed speech x′(n) is
x'(n)=x(n)·window(n),
The autocorrelation function is computed:
R(k) = Σ_{n=k}^{N−1} x′(n)·x′(n−k), k = 0, 1, ..., N−1, where N is a positive integer.
The L-th-order linear prediction coefficients a_i, i = 1, 2, ..., L (L a positive integer), can be obtained by solving the following system of equations:
Σ_{i=1}^{L} a_i·R(|i−k|) = −R(k), k = 1, ..., L, where L is a positive integer.
The Levinson-Durbin algorithm is used to solve the above system of equations and obtain the linear prediction coefficients a_i, i = 1, 2, ..., L.
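For illustration, a minimal sketch of this feature-extraction step, assuming `x` is one N = 128 sample frame and L is the model order (8–20 per the text); the recursion below is the standard Levinson-Durbin form, with the sign convention chosen to match the synthesis filter H(z) = G / (1 − Σ a_i·z^(−i)). Names are illustrative.

import numpy as np

def lpc_coefficients(x, L=10):
    """Windowed autocorrelation followed by the Levinson-Durbin recursion."""
    N = len(x)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)          # window(n) from the text
    xw = x * window
    R = np.array([np.sum(xw[k:] * xw[:N - k]) for k in range(L + 1)])  # R(0)..R(L)
    a = np.zeros(L + 1)          # a[0] is unused; a[1..L] are the predictor coefficients
    E = R[0]                     # prediction-error energy
    for i in range(1, L + 1):
        acc = R[i] - np.sum(a[1:i] * R[i - 1:0:-1])
        k = acc / E              # reflection coefficient
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        E = (1.0 - k * k) * E
    return a[1:]                 # a_1 ... a_L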
Autoregressive model construction and filter module
The synthesis filter is constructed from the low-frequency speech linear prediction coefficients a_i, i = 1, ..., L, namely
H(z) = G / (1 − Σ_{i=1}^{L} a_i·z^(−i)),
where L is the order of the autoregressive model, a positive integer between 8 and 20, and G is a value between 0.1 and 1. In the embodiments of the invention, L = 10 and G = 1 is the preferred setting.
White noise is processed by this synthesis filter to produce a random sequence correlated with the low-frequency speech. The white-noise sequence is generated by
w(n)=[w(n-1)·31821+13849],
where w(0) = 0.
After passing through the above synthesis filter, the white-noise sequence w(n) yields the high-frequency noise sequence y(n), namely
y(n) = w(n) + Σ_{i=1}^{L} a_i·y(n−i),
where a_i are the synthesis-filter coefficients. To limit the energy of the high-frequency part, the high-frequency noise sequence y(n) is normalized, namely
y(n) = y(n) / √( Σ_{n=0}^{N−1} y(n)·y(n) ),
where N is the frame length; the invention suggests N = 128.
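For illustration, a minimal sketch of this excitation-generation step, assuming `a` holds the L low-band predictor coefficients from the previous module. The congruential recursion w(n) = w(n−1)·31821 + 13849 is quoted from the text; the 16-bit wrap-around and the scaling of the noise to roughly [−1, 1) are assumptions made only to keep the sketch bounded and runnable, and the names are illustrative.

import numpy as np
from scipy.signal import lfilter

def highband_excitation(a, N=128, G=1.0, seed=0):
    """Generate pseudo-random noise, shape it with the AR synthesis filter, normalise."""
    w = np.zeros(N)
    state = seed                                        # w(0) = 0 per the text
    for n in range(N):
        state = (state * 31821 + 13849) % (1 << 16)     # assumed 16-bit wrap-around
        w[n] = state / float(1 << 15) - 1.0             # assumed scaling to roughly [-1, 1)
    # AR synthesis filter H(z) = G / (1 - sum_i a_i z^-i)
    y = lfilter([G], np.concatenate(([1.0], -np.asarray(a))), w)
    # energy normalisation of the high-frequency noise sequence
    return y / np.sqrt(np.sum(y * y))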
Spectral shaping module
The high-frequency log-spectral envelope hi_env(ω) estimated above is used to modulate the high-frequency noise sequence [7]. First, the high-frequency noise sequence y(n) is Fourier-transformed and then converted to the log domain, giving the log-spectral values C_y(ω) of the high-frequency noise sequence. The high-frequency log-spectral envelope is then used to modulate the spectrum of the high-frequency noise sequence, giving the log-spectral values C_wide(ω) of the high-frequency speech:
C_wide(ω) = C_y(ω)·hi_env(ω),
If the frequency-domain values of the high-frequency speech and its time-domain values are denoted S_wide(ω) and s_wide(n) respectively, then
S_wide(ω) = exp(C_wide(ω)), (1)
s_wide(n) = IFFT(S_wide(ω)), (2)
where exp(·) is the exponential operation and IFFT(·) is the inverse Fourier transform. Through the inverse-transform steps of formulas (1) and (2), the high-frequency speech is obtained.
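For illustration, a minimal sketch of this spectrum-shaping step, assuming `y` is one frame of the high-frequency noise sequence and `hi_env` is the extrapolated log-spectral envelope sampled on the same FFT grid. The text leaves the phase handling of the inverse transform implicit, so taking the real part below is an assumption; names are illustrative.

import numpy as np

def shape_highband(y, hi_env, eps=1e-12):
    """Modulate the log spectrum of the excitation by the extrapolated envelope."""
    Y = np.fft.fft(y)
    C_y = np.log(np.abs(Y) + eps)          # log-magnitude spectrum of the noise sequence
    C_wide = C_y * hi_env                  # modulation by the high-frequency envelope
    S_wide = np.exp(C_wide)                # formula (1): back to the linear domain
    s_wide = np.real(np.fft.ifft(S_wide))  # formula (2): inverse FFT (real part taken)
    return s_wide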
Voice synthetic module
The invention exploits the properties of the cepstrum to combine the high-frequency part and the low-frequency part of the speech [8] and thereby obtain the synthesized wideband speech. The synthesis process is shown in Fig. 2.
The narrowband signal with a sampling frequency of 8 kHz is upsampled to 16 kHz by interpolation, and its cepstrum is obtained through the cepstrum computation; the cepstrum of the high-frequency speech is obtained in the same way. The cepstra of the narrowband speech and of the high-frequency speech are each transformed to the frequency domain, and the frequency-domain amplitudes are processed as follows:
C_wide(ω) = C_narrow(ω) + C_high(ω)
where C_narrow(ω) and C_high(ω) are the frequency-domain values of the cepstra of the narrowband speech and of the high-frequency speech respectively, and C_wide(ω) is the frequency-domain value of the synthesized wideband cepstrum. An inverse Fourier transform then yields the cepstrum of the wideband speech, and finally the inverse of the cepstrum computation yields the synthesized wideband speech, as shown in Fig. 2.
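For illustration, a minimal sketch of this cepstral-domain synthesis, assuming `x_nb` is a narrowband frame already interpolated to 16 kHz and `s_hi` is the shaped high-frequency frame of the same length. The real cepstrum is used, and the narrowband phase is reused when inverting the cepstrum, which is one plausible reading of a step the text does not spell out; names are illustrative.

import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum."""
    return np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + eps)))

def synthesize_wideband(x_nb, s_hi):
    c_nb = real_cepstrum(x_nb)                 # cepstrum of the (upsampled) narrowband speech
    c_hi = real_cepstrum(s_hi)                 # cepstrum of the high-frequency speech
    # transform both cepstra to the frequency domain and add them
    C_wide = np.fft.fft(c_nb) + np.fft.fft(c_hi)
    c_wide = np.real(np.fft.ifft(C_wide))      # cepstrum of the wideband frame
    # inverse cepstrum: back to a log-magnitude spectrum, then to magnitude;
    # the phase of the narrowband frame is borrowed here (an assumption)
    magnitude = np.exp(np.real(np.fft.fft(c_wide)))
    phase = np.angle(np.fft.fft(x_nb))
    return np.real(np.fft.ifft(magnitude * np.exp(1j * phase)))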
The invention is a fully blind speech bandwidth extension technique that can be applied directly at the narrowband receiving end. It requires no prior knowledge or high-frequency side information, has low algorithmic complexity, can recover a high-frequency part that is highly correlated with the low band, and the synthesized wideband speech has a good auditory quality.
To verify the effectiveness of the invention, objective and subjective tests were carried out.
Objective test results
The spectral distortion measure and the spectrogram are effective means of objectively assessing speech quality. Without loss of generality, computing the spectral distortion measure and drawing spectrograms were chosen for the objective test.
The spectral distortion measure is defined as
D_HC² = (1/K)·Σ_{k=1}^{K} ∫_{0.25ω_s}^{0.5ω_s} ( 20·log10( A_k(ω) / A′_k(ω) ) + G_C )² dω,
G_C = (1/(0.25ω_s))·∫_{0.25ω_s}^{0.5ω_s} 20·log10( A′_k(ω) / A_k(ω) ) dω,
where ω_s is 2π, G_C is the gain-compensation factor, which effectively removes the overall gain difference between the two envelopes, K is the total number of speech frames, and A_k(ω) and A′_k(ω) are the spectral envelopes of the k-th frame of the original reference speech and of the speech under test respectively, computed as
A_k(ω) = | Σ_{n=0}^{N−1} x(n)·e^(−jωn) |,
A′_k(ω) = | Σ_{n=0}^{N−1} x′(n)·e^(−jωn) |,
The invention suggests N = 128. x(n) and x′(n) denote the original reference speech and the speech under test respectively; here the reference speech is the original wideband speech, and the speech under test is either the original narrowband speech or the synthesized wideband speech.
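For illustration, a minimal sketch of this distortion measure, assuming `ref_frames` and `test_frames` are lists of aligned N = 128 sample frames of the reference and test speech sampled at 16 kHz. The integral over 0.25ω_s–0.5ω_s (the 4–8 kHz band) is approximated by an average over the corresponding FFT bins, which is an assumption about the discretisation; names are illustrative.

import numpy as np

def spectral_distortion(ref_frames, test_frames, n_fft=128, eps=1e-12):
    """Frame-averaged high-band spectral distortion with gain compensation."""
    K = len(ref_frames)
    lo, hi = n_fft // 4, n_fft // 2              # bins covering 0.25..0.5 of the sampling rate
    total = 0.0
    for x, x_test in zip(ref_frames, test_frames):
        A  = np.abs(np.fft.fft(x, n_fft))[lo:hi] + eps
        At = np.abs(np.fft.fft(x_test, n_fft))[lo:hi] + eps
        diff = 20.0 * np.log10(A / At)           # 20*log10(A_k / A'_k)
        G_c = np.mean(20.0 * np.log10(At / A))   # gain-compensation factor G_C
        total += np.mean((diff + G_c) ** 2)      # band average approximating the integral
    return np.sqrt(total / K)                    # D_HC in dB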
The spectral distortion measure was computed in this way for the original narrowband speech and for the wideband speech synthesized by the proposed algorithm. The results are shown in Fig. 5. As can be seen from Fig. 5, the spectral distortion of the wideband speech synthesized by the proposed algorithm is clearly lower than that of the narrowband speech, showing that the algorithm estimates the high-frequency speech and synthesizes the wideband speech well.
A spectrogram represents the spectral energy of a segment of speech as a gray-scale image: brighter parts of the image indicate more energy, and darker parts indicate less energy in that part of the spectrum. Spectrograms show the frequency content of speech intuitively, so to contrast the spectral differences more directly, the spectrograms of the original wideband speech, of the narrowband speech, and of the wideband speech synthesized by the proposed blind bandwidth extension algorithm, for a male test utterance, are given in Fig. 3(a), (b) and (c). From Fig. 3(a), the spectrogram of the original speech signal, it can be seen that the spectrogram is bright over the whole 0–8 kHz range. Fig. 3(b) is the spectrogram of the narrowband speech signal; it is very dark in the 4–8 kHz range, showing that the high-frequency energy is very small, which is why narrowband speech does not sound natural enough. Fig. 3(c) is the spectrogram of the speech output by the blind bandwidth extension algorithm proposed by the invention; the spectrogram is clearly brighter in the 4–8 kHz range, showing that the high-frequency components of the speech are significantly increased.
Subjective test results
The subjective test uses an internationally common subjective grading standard, namely the comparison mean opinion score. Fig. 6 gives the subjective grading criteria; the scoring range is −3 to +3.
The test speech chosen by the invention is as follows: (1) the narrowband telephone speech output by the adaptive variable-rate speech codec at a bit rate of 12.2 kbps; (2) the wideband telephone speech output by the wideband adaptive-rate speech codec at a bit rate of 8.85 kbps; (3) the wideband telephone speech obtained by passing the narrowband telephone speech output by the adaptive variable-rate speech codec at 12.2 kbps through the new blind bandwidth extension algorithm proposed by the invention.
The first group of test speech consists of the wideband telephone speech produced by the proposed blind bandwidth extension algorithm and the narrowband telephone speech output by the adaptive variable-rate speech codec at 12.2 kbps; the second group consists of the wideband telephone speech produced by the proposed algorithm and the wideband telephone speech output by the wideband adaptive-rate speech codec at 8.85 kbps. Every speech segment is adjusted to a level of −26 dB.
For the subjective test, 20 listeners (10 male, 10 female), aged between 20 and 40 and having taken part in no speech-related subjective test within the previous half year, were invited to listen under the same conditions. Before the test, the effect of bandwidth extension was demonstrated to the listeners, and they were informed that two main aspects of the speech were to be evaluated: the speech quality and the perceived extended high-frequency components. Once the test subjects understood the instructions, they first listened to a preliminary trial and gave their opinions. During the test, each group of test utterances was presented to the subjects in random order, and they were allowed to repeat the listening without restriction. Finally, each test subject gave an opinion according to the subjective grading criteria. Fig. 4(a) and 4(b) show the distributions of the comparison results for the two groups of test speech.
In the distribution plots, the horizontal axis is the subjective grading score and the vertical axis is the proportion of listeners giving a particular score. According to the grading criteria, a positive score means that the output of the proposed algorithm is better than the narrowband telephone speech output by the adaptive variable-rate speech codec at 12.2 kbps or than the wideband telephone speech output by the wideband adaptive-rate speech codec at 8.85 kbps. A difference analysis with a 95% confidence interval was used to analyse the bandwidth-extension test results. Fig. 4(a) is the distribution of the comparison between the output of the invention and the narrowband telephone speech output by the adaptive variable-rate speech codec at 12.2 kbps; Fig. 4(b) is the comparison with the wideband telephone speech output by the wideband adaptive-rate speech codec at 8.85 kbps. As can be seen from Fig. 4(a) and 4(b), the results of the proposed algorithm are slightly better than the wideband speech output by the wideband adaptive-rate speech codec at 8.85 kbps, and are a considerable improvement over the narrowband speech output by the adaptive variable-rate speech codec at 12.2 kbps; the auditory quality is significantly improved.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made, within the technical scope disclosed by the present invention and according to the technical solution of the present invention and its inventive concept, by a person familiar with the art shall be covered by the scope of protection of the present invention.

Claims (7)

1. A method for artificial speech bandwidth expansion, characterized in that:
the narrowband speech signal passes through the curve-fitting module and then the high-frequency envelope extrapolation module, whose output enters the spectrum-shaping module; in parallel, the narrowband speech signal passes through the feature-extraction module, where each frame yields a set of linear prediction coefficients used to construct the autoregressive model and filter module; white noise is processed by this autoregressive model to generate a high-frequency noise sequence correlated with the low frequencies, and this sequence enters the spectrum-shaping module; the spectrum-shaping module outputs the high-frequency speech; the high-frequency speech and the narrowband speech signal pass through the speech-synthesis module to obtain the wideband speech.
2. The method for artificial speech bandwidth expansion according to claim 1, characterized in that: the curve-fitting module uses curve fitting to obtain the curve equation of the low-frequency log-spectral envelope of the narrowband speech and extrapolates the high-frequency log-spectral envelope from that equation, choosing the formants of the low-frequency part as the input of the fit; the narrowband speech sampled at 8 kHz is first input, the pitch period is estimated, and the time-domain signal is transformed into the log-spectral domain; the peaks of the log spectrum are located using the estimated pitch period, the variation of the formant peaks is then described by curve fitting, and the high-frequency log-spectral envelope curve is extrapolated;
the narrowband speech is divided into frames of length 128 with an overlap of 64 samples between frames; a frequency-domain method, i.e., computing the correlation of the signal, is used to obtain the pitch period T of this frame of speech; the input narrowband speech is x(n), and the autocorrelation function R(k) is R(k) = Σ_{n=0}^{N−1} x(n)·x(n−k),
where N is the frame length, N = 128; the position k′ of the maximum of R(k) is searched over the lag range k = 20–143, and k′ is the estimate T of the pitch period; the narrowband speech is Fourier-transformed and then converted to the log-spectral domain, where the first formant peak is located and denoted p₀; since the pitch period is roughly equal to the spacing between the spectral peaks, the other low-frequency peaks can be found from the determined first peak p₀ and the pitch period T; when searching for each further low-frequency peak, only the points at a distance of about T from the previous peak need to be examined, which gives the exact positions of the other peaks; their amplitudes, denoted lo_env(ω) at the corresponding frequency points ω, form the low-frequency log-spectral envelope; lo_env(ω) and ω serve as the input of the curve fit, and a mapping is established between the low-frequency log-spectral envelope lo_env(ω) and the low-band frequency ω:
lo_env(ω) = a·e^(bω) + c·e^(dω), ω = 0 ~ 2π×4000; the parameters a, b, c, d of the fitting function are obtained, which determines the mapping equation.
3. The method for artificial speech bandwidth expansion according to claim 1, characterized in that: the high-frequency envelope extrapolation module substitutes the higher frequency points into the determined mapping equation to extrapolate the unknown high-frequency log-spectral envelope data hi_env(ω), i.e., the extrapolated high-frequency log-spectral envelope hi_env(ω)
hi_env(ω) = a·e^(bω) + c·e^(dω), ω = 2π×4000 ~ 2π×8000.
4. The method for artificial speech bandwidth expansion according to claim 1, characterized in that: the feature-extraction module performs linear prediction analysis on the narrowband speech, each frame yields a set of linear prediction coefficients, and the autoregressive model is constructed; the narrowband speech is used first to construct the autoregressive model; linear prediction analysis is applied to each speech frame x(n) of length N, N = 128, i.e., the autocorrelation function of each windowed speech frame is computed and converted into linear prediction coefficients with the Levinson-Durbin algorithm; the specific steps are as follows:
a Hamming window window(n) = 0.5 − 0.5·cos(2πn/N), n = 0, 1, ..., N−1 (N a positive integer), is used to window the input speech signal x(n); the windowed speech x′(n) is
x'(n)=x(n)·window(n),
the autocorrelation function R(k) = Σ_{n=k}^{N−1} x′(n)·x′(n−k), k = 0, 1, ..., N−1 (N a positive integer), is computed;
the Levinson-Durbin algorithm is then used to obtain the L-th-order autoregressive model coefficients a_i, i = 1, 2, ..., L (L a positive integer), by solving the system of equations Σ_{i=1}^{L} a_i·R(|i−k|) = −R(k), k = 1, ..., L.
5. The method for artificial speech bandwidth expansion according to claim 1, characterized in that the autoregressive model construction and filter module operates as follows:
the synthesis filter model is constructed from the low-frequency speech autoregressive model coefficients a_i, i = 1, ..., L (L a positive integer), namely H(z) = G / (1 − Σ_{i=1}^{L} a_i·z^(−i)),
where G is the gain and L is the order of the autoregressive model, a positive integer between 8 and 20; G is a value between 0.1 and 1;
white noise is processed by this synthesis filter to produce a random sequence correlated with the low-frequency speech; the white-noise sequence is generated by
w(n)=[w(n-1)·31821+13849],
where w(0) = 0;
after passing through the above synthesis filter, the white-noise sequence w(n) yields the high-frequency noise sequence y(n), namely y(n) = w(n) + Σ_{i=1}^{L} a_i·y(n−i),
where a_i are the synthesis-filter coefficients; to limit the energy of the high-frequency part, the high-frequency noise sequence y(n) is normalized, namely y(n) = y(n) / √( Σ_{n=0}^{N−1} y(n)·y(n) ),
where N is the frame length, N = 128.
6. The method for artificial speech bandwidth expansion according to claim 1, characterized in that: the spectrum-shaping module uses the high-frequency log-spectral envelope hi_env(ω) estimated above to modulate the high-frequency noise sequence;
first, the high-frequency noise sequence y(n) is Fourier-transformed and then converted to the log domain, giving the log-spectral values C_y(ω) of the high-frequency noise sequence; the high-frequency log-spectral envelope is then used to modulate the spectrum of the high-frequency noise sequence, giving the log-spectral values C_wide(ω) of the high-frequency speech:
C_wide(ω) = C_y(ω)·hi_env(ω),
if the frequency-domain values of the high-frequency speech and its time-domain values are denoted S_wide(ω) and s_wide(n) respectively, then
S_wide(ω) = exp(C_wide(ω)), (1)
s_wide(n) = IFFT(S_wide(ω)), (2)
where exp(·) is the exponential operation and IFFT(·) is the inverse Fourier transform; through the inverse-transform steps of formulas (1) and (2), the high-frequency speech is obtained.
7. The method for artificial speech bandwidth expansion according to claim 1, characterized in that: in the speech-synthesis module, the narrowband signal with a sampling frequency of 8 kHz is upsampled to 16 kHz by interpolation, and its cepstrum is obtained through the cepstrum computation; the cepstrum of the high-frequency speech is obtained in the same way; the cepstra of the narrowband speech and of the high-frequency speech are each transformed to the frequency domain, and the frequency-domain amplitudes are processed as follows:
C_wide(ω) = C_narrow(ω) + C_high(ω),
where C_narrow(ω) and C_high(ω) are the frequency-domain values of the cepstra of the narrowband speech and of the high-frequency speech respectively, and C_wide(ω) is the frequency-domain value of the synthesized wideband cepstrum; an inverse Fourier transform then yields the cepstrum of the wideband speech, and finally the inverse of the cepstrum computation yields the synthesized wideband speech.
CN201310590362.6A 2013-11-20 2013-11-20 Method for expanding artificial voice bandwidth Pending CN104658547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310590362.6A CN104658547A (en) 2013-11-20 2013-11-20 Method for expanding artificial voice bandwidth

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310590362.6A CN104658547A (en) 2013-11-20 2013-11-20 Method for expanding artificial voice bandwidth

Publications (1)

Publication Number Publication Date
CN104658547A true CN104658547A (en) 2015-05-27

Family

ID=53249586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310590362.6A Pending CN104658547A (en) 2013-11-20 2013-11-20 Method for expanding artificial voice bandwidth

Country Status (1)

Country Link
CN (1) CN104658547A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017206842A1 (en) * 2016-05-31 2017-12-07 华为技术有限公司 Voice signal processing method, and related device and system
US10218856B2 (en) 2016-05-31 2019-02-26 Huawei Technologies Co., Ltd. Voice signal processing method, related apparatus, and system
CN109688531A (en) * 2017-10-18 2019-04-26 宏达国际电子股份有限公司 Obtain method, electronic device and the recording medium of high-sound quality audio information converting
CN110322891A (en) * 2019-07-03 2019-10-11 南方科技大学 Voice signal processing method and device, terminal and storage medium
CN110322891B (en) * 2019-07-03 2021-12-10 南方科技大学 Voice signal processing method and device, terminal and storage medium
CN112562704A (en) * 2020-11-17 2021-03-26 中国人民解放军陆军工程大学 BLSTM-based frequency division spectrum expansion anti-noise voice conversion method
CN112562704B (en) * 2020-11-17 2023-08-18 中国人民解放军陆军工程大学 Frequency division topological anti-noise voice conversion method based on BLSTM

Similar Documents

Publication Publication Date Title
CN103258543B (en) Method for expanding artificial voice bandwidth
CN103854662B (en) Adaptive voice detection method based on multiple domain Combined estimator
CN101976566B (en) Voice enhancement method and device using same
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
CN111128213B (en) Noise suppression method and system for processing in different frequency bands
CN110246510B (en) End-to-end voice enhancement method based on RefineNet
CN103021420B (en) Speech enhancement method of multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation
CN102054480B (en) Method for separating monaural overlapping speeches based on fractional Fourier transform (FrFT)
CN108831499A (en) Utilize the sound enhancement method of voice existing probability
CN102664017B (en) Three-dimensional (3D) audio quality objective evaluation method
CN103440869A (en) Audio-reverberation inhibiting device and inhibiting method thereof
EP4191583A1 (en) Transient speech or audio signal encoding method and device, decoding method and device, processing system and computer-readable storage medium
CN101527141B (en) Method of converting whispered voice into normal voice based on radial group neutral network
JPS63259696A (en) Voice pre-processing method and apparatus
CN103474074B (en) Pitch estimation method and apparatus
CN102664003A (en) Residual excitation signal synthesis and voice conversion method based on harmonic plus noise model (HNM)
CN103413547A (en) Method for eliminating indoor reverberations
CN106997765B (en) Quantitative characterization method for human voice timbre
CN103440872A (en) Transient state noise removing method
CN104658547A (en) Method for expanding artificial voice bandwidth
CN107221334B (en) Audio bandwidth extension method and extension device
CN103345920B (en) Self-adaptation interpolation weighted spectrum model voice conversion and reconstructing method based on Mel-KSVD sparse representation
CN103093757B (en) Conversion method for conversion from narrow-band code stream to wide-band code stream
CN103559893B (en) One is target gammachirp cepstrum coefficient aural signature extracting method under water
CN103971697B (en) Sound enhancement method based on non-local mean filtering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150527