CN107293306A - Output-based objective speech quality assessment method

Output-based objective speech quality assessment method

Info

Publication number
CN107293306A
CN107293306A (application CN201710475912.8A)
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710475912.8A
Other languages
Chinese (zh)
Other versions
CN107293306B (en)
Inventor
李庆先
刘良江
王晋威
朱宪宇
熊婕
李彦博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HUNAN MEASUREMENT INSPECTION RESEARCH INSTITUTE
Hunan Institute of Metrology and Test
Original Assignee
HUNAN MEASUREMENT INSPECTION RESEARCH INSTITUTE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HUNAN MEASUREMENT INSPECTION RESEARCH INSTITUTE filed Critical HUNAN MEASUREMENT INSPECTION RESEARCH INSTITUTE
Priority to CN201710475912.8A priority Critical patent/CN107293306B/en
Publication of CN107293306A publication Critical patent/CN107293306A/en
Application granted granted Critical
Publication of CN107293306B publication Critical patent/CN107293306B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Complex Calculations (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The present invention provides an output-based objective speech quality assessment method comprising the following steps: calculate the Mel-frequency cepstral coefficients (MFCCs) of the distorted speech after transmission through the system; obtain a reference model that reflects the auditory characteristics of the human ear; compute a consistency measure between the MFCCs of the distorted speech and the reference model; insert a sequence into the original speech and calculate the bit error rate of that sequence as extracted from the distorted speech after transmission through the system; from the consistency measure and the bit error rate, establish a mapping to subjective MOS scores, yielding an objective prediction model for the MOS score of the speech under evaluation, with which the objective evaluation of speech quality is carried out. The method of the invention is simple in its steps, convenient to use, and evaluates speech quality objectively and effectively, without relying on subjective assessment.

Description

Output-based objective speech quality assessment method
Technical field
The present invention relates to the field of speech processing, and in particular to an output-based objective speech quality assessment method.
Background art
Objective speech quality assessment refers to the automatic, machine-based judgment of speech quality. According to whether the input speech is required, such methods fall into two classes: objective evaluation based on the input-output mode, and objective evaluation based on the output mode alone.
Many fields, such as wireless mobile communication, aerospace navigation, and modern military applications, demand evaluation methods with high flexibility, real-time capability, and generality, and require that speech quality can be assessed even when the original input speech signal is unavailable. Input-output methods, which often cannot obtain the corresponding original speech and incur higher speech-storage costs, therefore suffer clear drawbacks in these scenarios.
The general procedure of an output-based objective speech quality assessment method is to compute certain characteristic parameters of the speech under evaluation, measure their consistency against the characteristic parameters of reference speech as summarised by a trained model, and finally map the result to an estimated subjective MOS score. In this process the choice of characteristic parameters, training model, and MOS mapping method is crucial, as it determines the performance of the assessment system. Because the ear's perception of sound follows the Bark critical bands, the conversion between linear frequency and a warped (Mel) frequency scale must be performed during feature extraction. Moreover, in applications such as radio communication, external factors such as channel quality must be considered in addition to the analysis of the speech itself.
It is therefore of great significance to design an assessment method that can objectively evaluate the quality of speech after coding or channel transmission.
Summary of the invention
It is an object of the invention to provide an output-based objective speech quality assessment method. Taking into account the ear's auditory perception of frequency as well as the cepstral analysis of the speech signal, the method describes speech features with Mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC). An objective speech distortion value is obtained by combining the MFCCs with a trained GMM-HMM model; at the same time, the bit error rate is introduced as an objective measure of channel effects. A mapping is then established between subjective MOS scores and the objective measures, yielding a prediction model for subjective MOS, so that the quality of speech after coding or channel transmission can be evaluated objectively. The details are as follows:
An output-based objective speech quality assessment method comprises the following steps:
Calculate the Mel-frequency cepstral coefficients of the distorted speech after transmission through the system; obtain a reference model that reflects the auditory characteristics of the human ear;
Compute a consistency measure between the Mel-frequency cepstral coefficients of the distorted speech and the reference model; insert a sequence into the original speech, and calculate the bit error rate of that sequence as extracted from the distorted speech after transmission through the system;
From the consistency measure and the bit error rate, establish a mapping to subjective MOS scores, yielding an objective prediction model for the MOS score of the speech under evaluation; the objective evaluation of speech quality is carried out with this objective prediction model.
Preferably, the calculation of the Mel-frequency cepstral coefficients comprises four steps: pre-processing, FFT, Mel-frequency filtering, and discrete cosine transform.
Preferably, the pre-processing comprises the following steps:
Step 1.1, pre-emphasis: pre-emphasis is realised with a digital filter that boosts the high-frequency characteristics by 6 dB/octave; its transfer function is expression 1):
$H(z) = 1 - \mu z^{-1}$   1);
where μ is the pre-emphasis coefficient, with a value of 0.9-1.0;
Step 1.2, endpoint detection: performed by setting thresholds on the short-time energy and the short-time zero-crossing rate. Let x(m) be a short-time speech signal of length N; its short-time energy E is calculated by expression 2):
$E = \sum_{m=0}^{N-1} x^{2}(m)$   2);
Its short-time zero-crossing rate Z is calculated by expression 3):
$Z = \frac{1}{2}\sum_{m=0}^{N-1} \big|\,\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]\,\big|$   3);
where sgn[·] is the sign function, taking the value 1 for x ≥ 0 and -1 for x < 0;
Step 1.3, framing and windowing: framing divides the speech into consecutive frames, each 10-30 ms long; windowing applies a Hamming window to each frame signal.
Preferably, the windowing proceeds as follows: let x(n) be a frame signal and w(n) the window function; the windowed signal y(n) is given by expression 4):
y(n) = x(n)·w(n), 0 ≤ n ≤ N-1   4);
where N is the number of samples per frame and w(n) = 0.54 - 0.46·cos[2πn/(N-1)], 0 ≤ n ≤ N-1.
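For illustration, the pre-processing chain of expressions 1) to 4) can be sketched in NumPy as below; the sampling rate, frame length, and frame shift are assumed values for the sketch, not taken from the patent:

```python
import numpy as np

def pre_emphasis(x, mu=0.95):
    """Expression 1): y[n] = x[n] - mu * x[n-1], i.e. H(z) = 1 - mu * z^-1."""
    return np.append(x[0], x[1:] - mu * x[:-1])

def short_time_energy(frame):
    """Expression 2): sum of squared samples over the frame."""
    return np.sum(frame ** 2)

def zero_crossing_rate(frame):
    """Expression 3): half the summed |sgn differences| counts one per crossing."""
    signs = np.where(frame >= 0, 1, -1)
    return 0.5 * np.sum(np.abs(np.diff(signs)))

def frame_and_window(x, fs=16000, frame_ms=25, shift_ms=10):
    """Split into overlapping frames and apply the Hamming window of expression 4)."""
    n, step = int(fs * frame_ms / 1000), int(fs * shift_ms / 1000)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n) / (n - 1))
    starts = range(0, len(x) - n + 1, step)
    return np.array([x[s:s + n] * w for s in starts])
```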
Preferably, the Mel-frequency filtering proceeds as follows: the discrete spectrum produced by the FFT is filtered by a bank of triangular filters, giving a set of coefficients m₁, m₂, …; the number p of filters in the bank is determined by the cut-off frequency of the signal, and together the filters cover the range from 0 Hz up to the Nyquist frequency, i.e. half the sampling rate. Each $m_i$ is calculated by expression 5):
$m_i = \ln\!\left(\sum_{k=0}^{N-1} |X(k)| \cdot H_i(k)\right), \quad i = 1, 2, \ldots, p$   5);
where $H_i(k)$ is the frequency response of the i-th triangular filter and f[i] is its centre frequency, satisfying Mel(f[i+1]) - Mel(f[i]) = Mel(f[i]) - Mel(f[i-1]); X(k) is the discrete spectrum of the frame signal x(n) after the FFT.
Preferably, the discrete cosine transform proceeds as follows: the Mel spectrum produced by the Mel-frequency filtering is transformed to the time domain, giving the Mel-frequency cepstral coefficients, calculated by expression 6):
$\mathrm{MFCC}(i) = \sqrt{\frac{2}{N}}\,\sum_{j=1}^{P} m_j \cos\!\left[(j - 0.5)\frac{\pi i}{P}\right]$   6);
where MFCC(i) is the i-th Mel-frequency cepstral coefficient, N is the number of samples per frame, and P is the number of filters in the bank.
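A compact sketch of expressions 5) and 6) follows. The Mel-scale formula Mel(f) = 2595·log10(1 + f/700) and the filter count are standard assumptions not spelled out in the patent, and the DCT is normalised here by the filter count P, a common convention (the patent normalises by the frame length N):

```python
import numpy as np

def mel(f):                       # assumed standard Mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filterbank_log_energies(magnitude_spectrum, fs=16000, n_filters=24):
    """Expression 5): m_i = ln(sum_k |X(k)| * H_i(k)) over triangular filters
    whose centre frequencies are equally spaced on the Mel scale."""
    n_fft = (len(magnitude_spectrum) - 1) * 2
    edges = mel_inv(np.linspace(0.0, mel(fs / 2), n_filters + 2))  # f[i-1..i+1]
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    m = np.zeros(n_filters)
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        h = np.zeros(len(magnitude_spectrum))                      # H_i(k)
        h[left:centre] = (np.arange(left, centre) - left) / max(centre - left, 1)
        h[centre:right] = (right - np.arange(centre, right)) / max(right - centre, 1)
        m[i - 1] = np.log(np.dot(magnitude_spectrum, h) + 1e-12)
    return m

def mfcc(m, n_ceps=12):
    """Expression 6): DCT of the log filterbank energies m_1..m_P."""
    P = len(m)
    i = np.arange(1, n_ceps + 1)[:, None]
    j = np.arange(1, P + 1)[None, :]
    return np.sqrt(2.0 / P) * np.sum(m * np.cos((j - 0.5) * np.pi * i / P), axis=1)
```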
Preferably, the reference model reflecting the auditory characteristics of the human ear is obtained as follows:
Let the observed feature vector sequence be O = o₁, o₂, …, o_T and the corresponding state sequence S = s₁, s₂, …, s_N; the HMM of the sequence is then expressed as expression 7):
λ = (π, A, B)   7);
where π = {π_i = P(s₁ = i), i = 1, 2, …, N} is the initial-state probability vector; A = {a_ij} is the state transition probability matrix, a_ij being the probability of jumping from state i to state j; and B = {b_i(o_t) = P(o_t | s_t = i), 2 ≤ i ≤ N-1} is the set of state output probability distributions;
For a continuous HMM the observation sequence is a continuous signal, and the signal space associated with state j is represented by a mixture of M Gaussian density functions, as in expressions 8) and 9):
$b_j(o_t) = \sum_{k=1}^{M} c_{jk}\, N(o_t, \mu_{jk}, C_{jk}), \quad 1 \le j \le N$   8);
$N(o_t, \mu_{jk}, C_{jk}) = (2\pi)^{-D/2}\, |C_{jk}|^{-1/2} \exp\!\left(-\tfrac{1}{2}(o_t - \mu_{jk})^{T} C_{jk}^{-1}(o_t - \mu_{jk})\right)$   9);
where c_jk is the weight of the k-th Gaussian mixture density of state j, μ_jk is the mean vector of the Gaussian density, C_jk is the covariance matrix, and D is the dimension of the observation sequence O. The HMM parameters are estimated from the observation sequence O = o₁, o₂, …, o_T; the goal of estimation is to maximise the likelihood P(O | λ) of the model given the training data, i.e. $\bar{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$.
The forward computation of the likelihood P(O | λ) is given by expression 10):
$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$   10);
where $\alpha_1(i) = \pi_i\, b_i(o_1)$, 1 ≤ i ≤ N, and
$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N$;
The backward computation of the likelihood P(O | λ) is given by expression 11):
$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$   11);
where $\beta_T(i) = 1$, 1 ≤ i ≤ N, and
$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N, \; t = T-1, T-2, \ldots, 1$;
For a given observation sequence O = o₁, o₂, …, o_T the updated λ is obtained by re-estimation. Define ξ_t(i, j) as the probability of being in state s_i at time t and in state s_j at time t+1, given by expression 12):
$\xi_t(i,j) = P(s_t = s_i,\, s_{t+1} = s_j \mid O, \lambda) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$   12);
Given the model λ and the observation sequence O, the posterior probability of state s_i at time t is expression 13):
$\gamma_t(i) = P(s_t = i \mid O, \lambda) = \dfrac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} = \sum_{j=1}^{N} \xi_t(i,j)$   13);
The re-estimates of the HMM parameters λ are then:
$\bar{\pi}_i = \gamma_1(i)$;
$\bar{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$;
and the parameters c_jk, μ_jk, and C_jk of the k-th Gaussian mixture component of state j at time t are re-estimated by expressions 14), 15), and 16):
$\bar{c}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{k=1}^{M} \gamma_t(j,k)}$   14);
$\bar{\mu}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)}$   15);
$\bar{C}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)\, (o_t - \mu_{jk})(o_t - \mu_{jk})^{T}}{\sum_{t=1}^{T} \gamma_t(j,k)}$   16);
where γ_t(j, k), the probability of the k-th Gaussian mixture component of state j at time t, is obtained from:
$\gamma_t(j,k) = \dfrac{\alpha_t(j)\, \beta_t(j)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)} \cdot \dfrac{c_{jk}\, N(o_t, \mu_{jk}, C_{jk})}{\sum_{m=1}^{M} c_{jm}\, N(o_t, \mu_{jm}, C_{jm})}$.
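As an illustration of the forward recursion in expression 10) (training via expressions 12) to 16) is the standard Baum-Welch re-estimation and is omitted from this sketch), a numerically stable log-domain forward pass might look like:

```python
import numpy as np

def forward_log_likelihood(pi, A, log_b):
    """log P(O | lambda) via the forward recursion of expression 10).
    pi: (N,) initial state probabilities; A: (N, N) transition matrix;
    log_b: (T, N) with log_b[t, j] = log b_j(o_t)."""
    log_alpha = np.log(pi) + log_b[0]              # alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, log_b.shape[0]):
        m = log_alpha.max()                        # log-sum-exp for stability
        log_alpha = m + np.log(np.exp(log_alpha - m) @ A) + log_b[t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())  # sum_i alpha_T(i)
```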
Preferably, the consistency measure is computed with expression 17):
$C(X_1, \ldots, X_N) = \dfrac{1}{N} \sum_{j=1}^{N} \log\!\big(P(X_j \mid \lambda)\big)$   17);
where X₁, …, X_N are the Mel-frequency cepstral coefficient vectors of the distorted speech, N is the number of vectors, and C is the consistency measure between the distorted speech and the model.
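Expression 17) is simply the average per-vector log-likelihood of the distorted speech's MFCC vectors under the trained reference model. A minimal sketch, assuming a `score(x)` callable (for example built on the forward pass above) that returns log P(X_j | λ):

```python
import numpy as np

def consistency_measure(mfcc_vectors, score):
    """C(X_1..X_N) = (1/N) * sum_j log P(X_j | lambda)  -- expression 17)."""
    return float(np.mean([score(x) for x in mfcc_vectors]))
```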
Preferably, the bit error rate is calculated as follows:
Step A, generate a PN sequence and multiply it by a chaotic sequence. The chaotic sequence is generated by the logistic map, defined as:
$x_{k+1} = \mu x_k (1 - x_k)$
where 0 ≤ μ ≤ 4 is called the branch parameter and x_k ∈ (0, 1). When 3.5699456… < μ ≤ 4 the logistic map operates in the chaotic regime, i.e. the sequence {x_k; k = 0, 1, 2, 3, …} produced from an initial condition under the logistic map is aperiodic, non-convergent, and highly sensitive to the initial value. The monitoring sequence is generated as follows:
Step a1, first produce the real-valued sequence, and take from some starting position in it a segment whose length equals the size of the monitoring sequence;
Step a2, convert the real-valued sequence into a binary sequence by defining a threshold Γ:
$\Gamma(x) = \begin{cases} -1, & x < \Gamma \\ 1, & \Gamma \le x \end{cases}$
The binary chaotic sequence is then {Γ(x_k); k = 0, 1, 2, 3, …};
Step a3, multiply the binary chaotic sequence by a PN sequence to obtain the monitoring sequence.
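A sketch of steps a1-a3, where μ, x₀, the threshold Γ, and the PN generator are illustrative choices (a real system would typically use an LFSR-based PN sequence):

```python
import numpy as np

def logistic_map(x0=0.3, mu=3.9, n=256, skip=100):
    """Iterate x_{k+1} = mu * x_k * (1 - x_k); drop a transient of `skip` steps."""
    x, out = x0, np.empty(n)
    for _ in range(skip):
        x = mu * x * (1.0 - x)
    for k in range(n):
        x = mu * x * (1.0 - x)
        out[k] = x
    return out

def monitoring_sequence(length=256, threshold=0.5, seed=7):
    real_valued = logistic_map(n=length)                   # step a1
    chaotic = np.where(real_valued < threshold, -1, 1)     # step a2: Gamma(x)
    pn = np.random.default_rng(seed).choice([-1, 1], size=length)  # stand-in PN
    return chaotic * pn                                    # step a3
```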
Step B, insert a synchronisation code for the monitoring sequence, so that the embedded monitoring sequence below can be extracted frame by frame;
Step C, embed the monitoring sequence, with its synchronisation code, into the speech signal in the wavelet domain, as follows:
Step c1, select the Daubechies-10 wavelet as the wavelet function;
Step c2, divide the speech signal into frames of 1152 samples each, and apply a 3-level wavelet transform to every frame;
Step c3, quantise the wavelet coefficients and modulate the monitoring sequence onto them, thereby embedding the monitoring sequence into the speech signal. Let f be the coefficient to be quantised, w the bit of the monitoring sequence to be embedded, and Δ the quantisation step; after quantisation the coefficient carrying the monitoring-sequence information is f′. First take the absolute value of f and round down: for f > 0, let $m = \lfloor f/\Delta \rfloor$ and n = m mod 2; for f < 0, let $m = \lfloor |f|/\Delta \rfloor$ and n = m mod 2, setting n = w. The quantised coefficient f′ is then formed so that the parity of its quantisation index matches the embedded bit w (see the sketch after step c4). According to these formulas the monitoring sequence is embedded into the speech signal bit by bit;
Step c4, transform the signal with the embedded monitoring sequence back into the time domain;
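The exact quantisation branches of step c3 appear only as images in the published document; the sketch below implements a standard odd-even quantisation consistent with the description, where the parity of the quantisation index ⌊|f|/Δ⌋ encodes the bit w. It also covers the inverse operation of step d3:

```python
import numpy as np

def embed_bit(f, w, delta):
    """Force the parity of the quantisation index of |f| to equal the bit w."""
    m = int(np.floor(abs(f) / delta))
    if m % 2 != w:
        m += 1                       # shift to the next index with the right parity
    return float(np.sign(f)) * (m * delta + delta / 2.0)

def extract_bit(f_prime, delta):
    """Inverse operation (step d3): recover w from the index parity."""
    return int(np.floor(abs(f_prime) / delta)) % 2
```

For example, with Δ = 1, embedding w = 0 into f = 3.7 yields f′ = 4.5, and extract_bit(4.5, 1) returns 0.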
Step D, extract the embedded monitoring sequence from the received speech and calculate the bit error rate. The extraction proceeds as follows:
Step d1, search for the synchronisation code in the speech signal: if the length of the signal to be searched is L, then L should exceed the combined length of two synchronisation codes plus one complete monitoring sequence. Let the search start at I = 1; if consecutive sample values of the signal lie in the range 900-1100, a possible synchronisation code has been found and is compared against the preset synchronisation code. If it is judged to be the synchronisation code, then point I is the starting position of the monitoring sequence; otherwise set I = I + L;
Step d2, starting from the located starting point, apply the discrete wavelet transform to the speech signal;
Step d3, apply to the coefficients f after wavelet decomposition the operation inverse to the one used during embedding, i.e.: for f > 0, let $m = \lfloor f/\Delta \rfloor$ and w = m mod 2; for f < 0, let $m = \lfloor |f|/\Delta \rfloor$ and w = m mod 2;
thereby extracting the binary monitoring sequence;
Step d4, compare the extracted monitoring sequence with the inserted one, and calculate the bit error rate by expression 18):
$\mathrm{BER} = \dfrac{\mathrm{HammingWeight}\big(Seq_{send} \oplus Seq_{receive}\big)}{Seq_{length}}$   18);
where Seq_send, Seq_receive, and Seq_length denote the transmitted monitoring sequence, the received monitoring sequence, and the sequence length, respectively; HammingWeight(·) returns the Hamming weight of a sequence, and ⊕ denotes the XOR operation.
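Expression 18) reduces to counting disagreeing positions; a minimal sketch over equal-length sequences:

```python
import numpy as np

def bit_error_rate(seq_send, seq_receive):
    """Expression 18): HammingWeight(send XOR receive) / length."""
    seq_send, seq_receive = np.asarray(seq_send), np.asarray(seq_receive)
    return np.count_nonzero(seq_send != seq_receive) / seq_send.size
```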
Preferably, the mapping is obtained by expression 19):
$\widehat{\mathrm{MOS}} = f(c_1, \ldots, c_N)$   19);
where f(·) is a multivariate nonlinear regression model; c_i is the consistency measure of the i-th parameter; N is the number of speech characteristic parameters; and $\widehat{\mathrm{MOS}}$ is the objective MOS score predicted from c₁, …, c_N by f(·).
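As a hedged sketch of expression 19), a second-order polynomial regression, one possible multivariate nonlinear model (the patent does not fix the form of f), fitted with scikit-learn on clearly synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
consistency = rng.uniform(-60.0, -20.0, size=50)   # c_i: consistency measures (synthetic)
ber = rng.uniform(0.0, 0.3, size=50)               # bit error rates (synthetic)
mos = 4.5 + 0.03 * (consistency + 40.0) - 8.0 * ber + rng.normal(0, 0.1, 50)

X = np.column_stack([consistency, ber])
f = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
f.fit(X, mos)                                       # learn the mapping f(.)
predicted_mos = f.predict(X)                        # objective MOS estimates
```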
The technical scheme of the present invention has the following effects:
1. MFCCs approximate the Mel frequency scale, stretching the low-frequency information of speech and compressing the high-frequency information; they are suitable for robust speech analysis and speech recognition, suppress speaker-dependent features, and retain the linguistic content of the speech segments.
2. The present invention establishes a mapping between subjective MOS scores, the objective measures, and the channel quality, yielding a prediction model for subjective MOS whose scores are closer to the subjective quality.
3. The method of the invention is simple in its steps, convenient to use, and evaluates speech quality objectively and effectively, without relying on subjective assessment.
In addition to the objects, features, and advantages described above, the present invention has further objects, features, and advantages, which are explained in detail below with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which form part of this application, provide a further understanding of the present invention; the schematic embodiments of the invention and their descriptions explain the invention and do not limit it improperly. In the drawings:
Fig. 1 is a schematic diagram of the principle of the output-based objective speech quality assessment method of Embodiment 1.
Embodiment
Embodiments of the invention are described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
Embodiment 1:
An output-based objective speech quality assessment method, referring to Fig. 1, specifically comprises: calculating the Mel-frequency cepstral coefficients of the distorted speech after transmission through the system (the original speech becomes the distorted speech after system transmission; computing the MFCCs is the MFCC parameter extraction step); obtaining a reference model that reflects the auditory characteristics of the human ear (the MFCC parameters of the reference speech are extracted first, and the GMM-HMM model is then obtained); computing a consistency measure between the MFCCs of the distorted speech and the reference model (the consistency calculation); inserting a sequence into the original speech and calculating the bit error rate of that sequence as extracted from the distorted speech after transmission through the system; and, from the consistency measure and the bit error rate, establishing the mapping to subjective MOS scores (the MOS mapping in Fig. 1), yielding an objective prediction model for the MOS score of the speech under evaluation, with which the objective evaluation of speech quality is carried out (here the correlation and the bias error between the mapped MOS scores and the subjective MOS serve as the evaluation criteria). The evaluation speech comes from the ITU speech database (the International Telecommunication Union corpus). The details are as follows:
The calculation of the Mel-frequency cepstral coefficients comprises four steps: pre-processing, FFT (fast Fourier transform), Mel-frequency filtering, and discrete cosine transform, specifically:
The pre-processing comprises the following steps:
Step 1.1, pre-emphasis: pre-emphasis is realised with a digital filter that boosts the high-frequency characteristics by 6 dB/octave; its transfer function is expression 1):
$H(z) = 1 - \mu z^{-1}$   1);
where μ is the pre-emphasis coefficient, with a value of 0.9-1.0 (0.95 is taken here);
Step 1.2, endpoint detection: performed by setting thresholds on the short-time energy and the short-time zero-crossing rate. Let x(m) be a short-time speech signal of length N; its short-time energy E is calculated by expression 2):
$E = \sum_{m=0}^{N-1} x^{2}(m)$   2);
Its short-time zero-crossing rate Z is calculated by expression 3):
$Z = \frac{1}{2}\sum_{m=0}^{N-1} \big|\,\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]\,\big|$   3);
where sgn[·] is the sign function, taking the value 1 for x ≥ 0 and -1 for x < 0;
Step 1.3, framing and windowing: in order to apply the analysis methods of stationary processes, the speech is divided into consecutive frames, each 10-30 ms long; meanwhile, in order to reduce the truncation effect of the speech frames, a Hamming window is applied to each frame signal, specifically:
Let x(n) be a frame signal and w(n) the window function; the windowed signal y(n) is given by expression 4):
y(n) = x(n)·w(n), 0 ≤ n ≤ N-1   4);
where N is the number of samples per frame and w(n) = 0.54 - 0.46·cos[2πn/(N-1)], 0 ≤ n ≤ N-1.
The Mel-frequency filtering proceeds as follows: the discrete spectrum produced by the FFT is filtered by a bank of triangular filters, giving a set of coefficients m₁, m₂, …; the number p of filters in the bank is determined by the cut-off frequency of the signal, and together the filters cover the range from 0 Hz up to the Nyquist frequency, i.e. half the sampling rate. Each $m_i$ is calculated by expression 5):
$m_i = \ln\!\left(\sum_{k=0}^{N-1} |X(k)| \cdot H_i(k)\right), \quad i = 1, 2, \ldots, p$   5);
where $H_i(k)$ is the frequency response of the i-th triangular filter and f[i] is its centre frequency, satisfying Mel(f[i+1]) - Mel(f[i]) = Mel(f[i]) - Mel(f[i-1]).
Because the Mel spectral coefficients are all real numbers, they can be transformed to the time domain by the discrete cosine transform. The discrete cosine transform: the Mel spectrum produced by the Mel-frequency filtering is transformed to the time domain, giving the Mel-frequency cepstral coefficients, calculated by expression 6):
$\mathrm{MFCC}(i) = \sqrt{\frac{2}{N}}\,\sum_{j=1}^{P} m_j \cos\!\left[(j - 0.5)\frac{\pi i}{P}\right]$   6);
where MFCC(i) is the i-th Mel-frequency cepstral coefficient, N is the number of samples per frame, and P is the number of filters in the bank.
The reference model reflecting the auditory characteristics of the human ear is obtained as follows:
Speech modelling and training are based on the GMM-HMM. Let the observed feature vector sequence be O = o₁, o₂, …, o_T and the corresponding state sequence S = s₁, s₂, …, s_N; the HMM (hidden Markov model) of the sequence is then expressed as expression 7):
λ = (π, A, B)   7);
where π = {π_i = P(s₁ = i), i = 1, 2, …, N} is the initial-state probability vector; A = {a_ij} is the state transition probability matrix, a_ij being the probability of jumping from state i to state j; and B = {b_i(o_t) = P(o_t | s_t = i), 2 ≤ i ≤ N-1} is the set of state output probability distributions;
For a continuous HMM the observation sequence is a continuous signal, and the signal space associated with state j is represented by a mixture of M Gaussian density functions, as in expressions 8) and 9):
$b_j(o_t) = \sum_{k=1}^{M} c_{jk}\, N(o_t, \mu_{jk}, C_{jk}), \quad 1 \le j \le N$   8);
$N(o_t, \mu_{jk}, C_{jk}) = (2\pi)^{-D/2}\, |C_{jk}|^{-1/2} \exp\!\left(-\tfrac{1}{2}(o_t - \mu_{jk})^{T} C_{jk}^{-1}(o_t - \mu_{jk})\right)$   9);
where c_jk is the weight of the k-th Gaussian mixture density of state j, μ_jk is the mean vector of the Gaussian density, C_jk is the covariance matrix, and D is the dimension of the observation sequence O. The HMM parameters are estimated from the observation sequence O = o₁, o₂, …, o_T; the goal of estimation is to maximise the likelihood P(O | λ) of the model given the training data, i.e. $\bar{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$. This is realised with the EM algorithm (expectation-maximisation algorithm), which comprises two parts, the forward and backward probability computations and the re-estimation of the HMM parameters and Gaussian mixture parameters, as follows:
The forward computation of the likelihood P(O | λ) is given by expression 10):
$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$   10);
where $\alpha_1(i) = \pi_i\, b_i(o_1)$, 1 ≤ i ≤ N, and
$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N$;
The backward computation of the likelihood P(O | λ) is given by expression 11):
$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$   11);
where $\beta_T(i) = 1$, 1 ≤ i ≤ N, and
$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N, \; t = T-1, T-2, \ldots, 1$;
For a given observation sequence O = o₁, o₂, …, o_T the updated λ is obtained by re-estimation. Define ξ_t(i, j) as the probability of being in state s_i at time t and in state s_j at time t+1, given by expression 12):
$\xi_t(i,j) = P(s_t = s_i,\, s_{t+1} = s_j \mid O, \lambda) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$   12);
Given the model λ and the observation sequence O, the posterior probability of state s_i at time t is expression 13):
$\gamma_t(i) = P(s_t = i \mid O, \lambda) = \dfrac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} = \sum_{j=1}^{N} \xi_t(i,j)$   13);
The re-estimates of the HMM parameters λ are then:
$\bar{\pi}_i = \gamma_1(i)$;
$\bar{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$;
and the parameters c_jk, μ_jk, and C_jk of the k-th Gaussian mixture component of state j at time t are re-estimated by expressions 14), 15), and 16):
$\bar{c}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{k=1}^{M} \gamma_t(j,k)}$   14);
$\bar{\mu}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)}$   15);
$\bar{C}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)\, (o_t - \mu_{jk})(o_t - \mu_{jk})^{T}}{\sum_{t=1}^{T} \gamma_t(j,k)}$   16);
where γ_t(j, k), the probability of the k-th Gaussian mixture component of state j at time t, is obtained from:
$\gamma_t(j,k) = \dfrac{\alpha_t(j)\, \beta_t(j)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)} \cdot \dfrac{c_{jk}\, N(o_t, \mu_{jk}, C_{jk})}{\sum_{m=1}^{M} c_{jm}\, N(o_t, \mu_{jm}, C_{jm})}$.
The consistency measure is computed as follows: after the modelling, the consistency between the Mel-frequency cepstral coefficients of the distorted speech and the reference model is measured with expression 17):
$C(X_1, \ldots, X_N) = \dfrac{1}{N} \sum_{j=1}^{N} \log\!\big(P(X_j \mid \lambda)\big)$   17);
where X₁, …, X_N are the Mel-frequency cepstral coefficient (MFCC) vectors of the distorted speech, N is the number of vectors, and C is the consistency measure between the distorted speech and the model.
The bit error rate is calculated as follows:
Step A, generate a PN sequence and multiply it by a chaotic sequence. The chaotic sequence is generated by the logistic map, defined as:
$x_{k+1} = \mu x_k (1 - x_k)$
where 0 ≤ μ ≤ 4 is called the branch parameter and x_k ∈ (0, 1). When 3.5699456… < μ ≤ 4 the logistic map operates in the chaotic regime, i.e. the sequence {x_k; k = 0, 1, 2, 3, …} produced from an initial condition under the logistic map is aperiodic, non-convergent, and highly sensitive to the initial value. The monitoring sequence is generated as follows:
Step a1, first produce the real-valued sequence, and take from some starting position in it a segment whose length equals the size of the monitoring sequence;
Step a2, convert the real-valued sequence into a binary sequence by defining a threshold Γ:
$\Gamma(x) = \begin{cases} -1, & x < \Gamma \\ 1, & \Gamma \le x \end{cases}$
The binary chaotic sequence is then {Γ(x_k); k = 0, 1, 2, 3, …};
Step a3, multiply the binary chaotic sequence by a PN sequence (pseudo-noise sequence) to obtain the monitoring sequence;
Step B, insert a synchronisation code for the monitoring sequence. The purpose of the synchronisation code is to keep the receiving end able to extract the monitoring sequence after the audio has been attenuated by the channel. The synchronisation code used here is 16 bits long. In order to locate it exactly, it is embedded in the time domain of the speech signal: concretely, the amplitudes of the 16 samples preceding the monitoring sequence are set to 1000. During extraction at the receiving end, if the starting point is out of sync, a run of 16 consecutive samples with values between 900 and 1100 can be used to quickly locate the starting sample position of the watermark by searching for the synchronisation code; in this way the embedded monitoring sequence below can be extracted frame by frame;
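A sketch of this synchronisation search (16 consecutive samples with values in 900-1100 mark the start of a monitoring sequence); the comparison against the preset code that follows a candidate hit is omitted here:

```python
import numpy as np

def find_sync_start(signal, sync_len=16, lo=900.0, hi=1100.0):
    """Return the index of the first run of sync_len in-band samples, or -1."""
    in_band = (signal >= lo) & (signal <= hi)
    run = 0
    for i, ok in enumerate(in_band):
        run = run + 1 if ok else 0
        if run == sync_len:
            return i - sync_len + 1          # start of the sync code
    return -1
```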
Step C, embed the monitoring sequence, with its synchronisation code, into the speech signal in the wavelet domain. The wavelet domain is chosen for embedding because a monitoring sequence embedded in a transform domain is better concealed and does not cause an audible difference from the original speech. The embedding of the sequence into the speech in the wavelet domain proceeds as follows:
Step c1, since analysing the same problem with different wavelet bases produces different results, a suitable wavelet basis must be selected for the problem at hand; here the Daubechies-10 wavelet is chosen as the wavelet function;
Step c2, divide the speech signal into frames of 1152 samples each and apply a 3-level wavelet transform to every frame; in view of the auditory characteristics of the human ear, the sequence is embedded in the high-frequency band;
Step c3, quantise the wavelet coefficients and modulate the monitoring sequence onto them, thereby embedding the monitoring sequence into the speech signal. Let f be the coefficient to be quantised, w the bit of the monitoring sequence to be embedded, and Δ the quantisation step; after quantisation the coefficient carrying the monitoring-sequence information is f′. First take the absolute value of f and round down: for f > 0, let $m = \lfloor f/\Delta \rfloor$ and n = m mod 2; for f < 0, let $m = \lfloor |f|/\Delta \rfloor$ and n = m mod 2, setting n = w. The quantised coefficient f′ is then formed so that the parity of its quantisation index matches the embedded bit w. According to these formulas the monitoring sequence can be embedded into the speech signal bit by bit.
Step c4, transform the signal with the embedded monitoring sequence back into the time domain;
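A per-frame sketch of steps c1-c4 using PyWavelets (an assumed dependency), reusing the embed_bit quantiser sketched earlier; the quantisation step Δ is illustrative, and bits are taken as 0/1 (a ±1 monitoring sequence maps via (s + 1) // 2):

```python
import numpy as np
import pywt  # PyWavelets

def embed_in_frame(frame, bits, delta=0.05):
    """3-level db10 decomposition (steps c1-c2); embed bits into the
    highest-frequency detail band (step c3); reconstruct (step c4)."""
    coeffs = pywt.wavedec(frame, 'db10', level=3)
    detail = coeffs[-1]                          # finest detail coefficients
    for i, w in enumerate(bits[:len(detail)]):
        detail[i] = embed_bit(detail[i], int(w), delta)
    return pywt.waverec(coeffs, 'db10')[:len(frame)]
```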
Step D, extract the embedded monitoring sequence from the received speech and calculate the bit error rate. The extraction of the monitoring sequence is the inverse of the embedding, so the wavelet function and the number of decomposition levels are kept unchanged. The extraction proceeds as follows:
Step d1, search for the synchronisation code in the speech signal: if the length of the signal to be searched is L, then L should exceed the combined length of two synchronisation codes plus one complete monitoring sequence. Let the search start at I = 1; if 16 consecutive sample values of the signal lie in the range 900-1100, a possible synchronisation code has been found and is compared against the preset synchronisation code. If it is judged to be the synchronisation code, then point I is the starting position of the monitoring sequence; otherwise set I = I + L;
Step d2, starting from the located starting point, apply the discrete wavelet transform to the speech signal;
Step d3, apply to the coefficients f after wavelet decomposition the operation inverse to the one used during embedding, i.e.:
for f > 0, let $m = \lfloor f/\Delta \rfloor$ and w = m mod 2;
for f < 0, let $m = \lfloor |f|/\Delta \rfloor$ and w = m mod 2;
thereby extracting the binary monitoring sequence;
Step d4, compare the extracted monitoring sequence with the inserted one, and calculate the bit error rate by expression 18) (the bit error rate serves as one of the objective measures of speech quality):
$\mathrm{BER} = \dfrac{\mathrm{HammingWeight}\big(Seq_{send} \oplus Seq_{receive}\big)}{Seq_{length}}$   18);
where Seq_send, Seq_receive, and Seq_length denote the transmitted monitoring sequence, the received monitoring sequence, and the sequence length, respectively; HammingWeight(·) returns the Hamming weight of a sequence, and ⊕ denotes the XOR operation.
After the parameter consistency measures of the speech under the various distortion conditions have been computed, a functional mapping can be used to represent the relation between the parameter consistency measures and the objective $\widehat{\mathrm{MOS}}$, i.e. the mapping is obtained by expression 19):
$\widehat{\mathrm{MOS}} = f(c_1, \ldots, c_N)$   19);
where f(·) is the prediction function (it may be a linear or nonlinear regression relation or a polynomial fit; in this embodiment, in order to predict the MOS value more accurately, a multivariate nonlinear regression model is preferred); c_i is the consistency measure of the i-th parameter; N is the number of speech characteristic parameters; and $\widehat{\mathrm{MOS}}$ is the objective MOS score predicted from c₁, …, c_N by f(·). The larger the bit error rate, the stronger the interference in the channel and, correspondingly, the greater the speech damage incurred during transmission; the predicted $\widehat{\mathrm{MOS}}$ value is then smaller and the speech quality poorer.
The performance of the speech quality assessment algorithm is measured below in terms of correlation and bias error. The correlation mainly reflects whether the mapping by which the algorithm obtains the predicted MOS score from the distortion measures is reasonable; the correlation and the bias error between the MOS scores mapped by the algorithm and the known subjective MOS values generally serve as the evaluation criteria.
The correlation coefficient ρ and the standard estimation bias σ are obtained by expressions 20) and 21):
$\rho = \dfrac{\sum_{i=1}^{N}\big(\mathrm{MOS}_o(i) - \overline{\mathrm{MOS}_o}\big)\big(\mathrm{MOS}_s(i) - \overline{\mathrm{MOS}_s}\big)}{\sqrt{\sum_{i=1}^{N}\big(\mathrm{MOS}_o(i) - \overline{\mathrm{MOS}_o}\big)^{2}\,\sum_{i=1}^{N}\big(\mathrm{MOS}_s(i) - \overline{\mathrm{MOS}_s}\big)^{2}}}$   20);
$\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\big(\mathrm{MOS}_o(i) - \mathrm{MOS}_s(i)\big)^{2}}$   21);
where MOS_o(i) is the predicted MOS value of the i-th speech item, MOS_s(i) is its known MOS score, N is the total number of speech pairs, $\overline{\mathrm{MOS}_o}$ is the mean of the predicted MOS values, and $\overline{\mathrm{MOS}_s}$ is the mean of the MOS scores.
The closer the correlation coefficient ρ is to 1, the closer the predicted MOS values are to the true MOS values; the smaller the bias error σ, the smaller the prediction error and the better the algorithm performs.
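Expressions 20) and 21) are the Pearson correlation and the root-mean-square bias; a minimal sketch:

```python
import numpy as np

def evaluate(mos_pred, mos_true):
    """rho: expression 20); sigma: expression 21)."""
    mos_pred, mos_true = np.asarray(mos_pred), np.asarray(mos_true)
    rho = np.corrcoef(mos_pred, mos_true)[0, 1]
    sigma = np.sqrt(np.mean((mos_pred - mos_true) ** 2))
    return rho, sigma
```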
The performance comparison between the assessment method of Embodiment 1 and the ITU-T P.563 objective evaluation method of the International Telecommunication Union is detailed in Table 1.
As Table 1 shows, the method of the invention (Embodiment 1) improves on the ITU-T P.563 algorithm to a certain degree: the average correlation ρ with subjective MOS is higher and the estimation bias σ is lower; the method of the invention is therefore valid and feasible.
Table 1. Performance comparison between the method of the invention (Embodiment 1) and ITU-T P.563 on the processed speech
The foregoing are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (10)

1. An output-based objective speech quality assessment method, characterised by comprising the following steps:
calculating the Mel-frequency cepstral coefficients of the distorted speech after transmission through the system; obtaining a reference model that reflects the auditory characteristics of the human ear;
computing a consistency measure between the Mel-frequency cepstral coefficients of the distorted speech and the reference model; inserting a sequence into the original speech, and calculating the bit error rate of that sequence as extracted from the distorted speech after transmission through the system;
establishing, from the consistency measure and the bit error rate, a mapping to subjective MOS scores, thereby obtaining an objective prediction model for the MOS score of the speech under evaluation; the objective evaluation of speech quality is carried out with this objective prediction model.
2. The output-based objective speech quality assessment method according to claim 1, characterised in that: the calculation of the Mel-frequency cepstral coefficients comprises four steps: pre-processing, FFT, Mel-frequency filtering, and discrete cosine transform.
3. The output-based objective speech quality assessment method according to claim 2, characterised in that:
the pre-processing specifically comprises the following steps:
Step 1.1, pre-emphasis: pre-emphasis is realised with a digital filter that boosts the high-frequency characteristics by 6 dB/octave; its transfer function is expression 1):
$H(z) = 1 - \mu z^{-1}$   1);
where μ is the pre-emphasis coefficient, with a value of 0.9-1.0;
Step 1.2, endpoint detection: performed by setting thresholds on the short-time energy and the short-time zero-crossing rate. Let x(m) be a short-time speech signal of length N; its short-time energy E is calculated by expression 2):
$E = \sum_{m=0}^{N-1} x^{2}(m)$   2);
Its short-time zero-crossing rate Z is calculated by expression 3):
$Z = \frac{1}{2}\sum_{m=0}^{N-1} \big|\,\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]\,\big|$   3);
where sgn[·] is the sign function;
Step 1.3, framing and windowing: framing divides the speech into consecutive frames, each 10-30 ms long; windowing applies a Hamming window to each frame signal.
4. The output-based objective speech quality assessment method according to claim 3, characterised in that: the windowing proceeds as follows: let x(n) be a frame signal and w(n) the window function; the windowed signal y(n) is given by expression 4):
y(n) = x(n)·w(n), 0 ≤ n ≤ N-1   4);
where N is the number of samples per frame and w(n) = 0.54 - 0.46·cos[2πn/(N-1)], 0 ≤ n ≤ N-1.
5. The output-based objective speech quality assessment method according to claim 2, characterised in that: the Mel-frequency filtering proceeds as follows: the discrete spectrum produced by the FFT is filtered by a bank of triangular filters, giving a set of coefficients m₁, m₂, …; the number p of filters in the bank is determined by the cut-off frequency of the signal, and together the filters cover the range from 0 Hz up to the Nyquist frequency, i.e. half the sampling rate; each $m_i$ is calculated by expression 5):
$m_i = \ln\!\left(\sum_{k=0}^{N-1} |X(k)| \cdot H_i(k)\right), \quad i = 1, 2, \ldots, p$   5);
where $H_i(k)$ is the frequency response of the i-th triangular filter, f[i] is its centre frequency, satisfying Mel(f[i+1]) - Mel(f[i]) = Mel(f[i]) - Mel(f[i-1]), and X(k) is the discrete spectrum of the frame signal x(n) after the FFT.
6. The output-based objective speech quality assessment method according to claim 2, characterised in that: the discrete cosine transform proceeds as follows: the Mel spectrum produced by the Mel-frequency filtering is transformed to the time domain, giving the Mel-frequency cepstral coefficients, calculated by expression 6):
$\mathrm{MFCC}(i) = \sqrt{\frac{2}{N}}\,\sum_{j=1}^{P} m_j \cos\!\left[(j - 0.5)\frac{\pi i}{P}\right]$   6);
where MFCC(i) is the i-th Mel-frequency cepstral coefficient, N is the number of samples per frame, and P is the number of filters in the bank.
7. The output-based objective speech quality assessment method according to claim 1, characterised in that: the reference model reflecting the auditory characteristics of the human ear is obtained as follows:
Let the observed feature vector sequence be O = o₁, o₂, …, o_T and the corresponding state sequence S = s₁, s₂, …, s_N; the HMM of the sequence is then expressed as expression 7):
λ = (π, A, B)   7);
where π = {π_i = P(s₁ = i), i = 1, 2, …, N} is the initial-state probability vector; A = {a_ij} is the state transition probability matrix, a_ij being the probability of jumping from state i to state j; and B = {b_i(o_t) = P(o_t | s_t = i), 2 ≤ i ≤ N-1} is the set of state output probability distributions;
For a continuous HMM the observation sequence is a continuous signal, and the signal space associated with state j is represented by a mixture of M Gaussian density functions, as in expressions 8) and 9):
$b_j(o_t) = \sum_{k=1}^{M} c_{jk}\, N(o_t, \mu_{jk}, C_{jk}), \quad 1 \le j \le N$   8);
$N(o_t, \mu_{jk}, C_{jk}) = (2\pi)^{-D/2}\, |C_{jk}|^{-1/2} \exp\!\left(-\tfrac{1}{2}(o_t - \mu_{jk})^{T} C_{jk}^{-1}(o_t - \mu_{jk})\right)$   9);
where c_jk is the weight of the k-th Gaussian mixture density of state j; μ_jk is the mean vector of the Gaussian density; C_jk is the covariance matrix; and D is the dimension of the observation sequence O; the HMM parameters are estimated from the observation sequence O = o₁, o₂, …, o_T, the goal of estimation being to maximise the likelihood P(O | λ) of the model given the training data, i.e. $\bar{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$;
The forward computation of the likelihood P(O | λ) is given by expression 10):
$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$   10);
where $\alpha_1(i) = \pi_i\, b_i(o_1)$, 1 ≤ i ≤ N, and
$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N$;
The backward computation of the likelihood P(O | λ) is given by expression 11):
$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$   11);
where $\beta_T(i) = 1$, 1 ≤ i ≤ N, and
$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N, \; t = T-1, T-2, \ldots, 1$;
For a given observation sequence O = o₁, o₂, …, o_T the updated λ is obtained by re-estimation; define ξ_t(i, j) as the probability of being in state s_i at time t and in state s_j at time t+1, given by expression 12):
$\xi_t(i,j) = P(s_t = s_i,\, s_{t+1} = s_j \mid O, \lambda) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$   12);
Given the model λ and the observation sequence O, the posterior probability of state s_i at time t is expression 13):
$\gamma_t(i) = P(s_t = i \mid O, \lambda) = \dfrac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} = \sum_{j=1}^{N} \xi_t(i,j)$   13);
The re-estimates of the HMM parameters λ are then:
$\bar{\pi}_i = \gamma_1(i)$;
$\bar{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$;
and the parameters c_jk, μ_jk, and C_jk of the k-th Gaussian mixture component of state j at time t are re-estimated by expressions 14), 15), and 16):
$\bar{c}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{k=1}^{M} \gamma_t(j,k)}$   14);
$\bar{\mu}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)}$   15);
$\bar{C}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)\, (o_t - \mu_{jk})(o_t - \mu_{jk})^{T}}{\sum_{t=1}^{T} \gamma_t(j,k)}$   16);
where γ_t(j, k), the probability of the k-th Gaussian mixture component of state j at time t, is obtained from:
$\gamma_t(j,k) = \dfrac{\alpha_t(j)\, \beta_t(j)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)} \cdot \dfrac{c_{jk}\, N(o_t, \mu_{jk}, C_{jk})}{\sum_{m=1}^{M} c_{jm}\, N(o_t, \mu_{jm}, C_{jm})}$.
8. The output-based objective speech quality assessment method according to claim 1, characterized in that: the consistency measure is computed using expression 17):
$$C(X_1,\ldots,X_N)=\frac{1}{N}\sum_{j=1}^{N}\log\left(P(X_j\mid\lambda)\right)\quad(17);$$
where $X_1,\ldots,X_N$ are the Mel-frequency cepstral coefficient vectors of the distorted speech, N is the number of vectors, and C is the consistency measure between the distorted speech and the model.
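A minimal sketch of expression 17): given the distorted speech's MFCC vectors and a scoring function for the trained reference model λ, the consistency measure is simply the mean per-vector log-likelihood. The callable log_likelihood_fn is an assumed interface, not part of the patent.

```python
import numpy as np

def consistency_measure(mfcc_vectors, log_likelihood_fn):
    """Expression 17): mean log-likelihood of the distorted speech's MFCC
    vectors X_1..X_N under the reference model lambda.

    log_likelihood_fn is an assumed interface: a callable that returns
    log P(X_j | lambda) for a single MFCC vector.
    """
    return float(np.mean([log_likelihood_fn(x) for x in mfcc_vectors]))
```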
9. The output-based objective speech quality assessment method according to claim 1, characterized in that: the bit error rate is calculated as follows:
Step A: generate a PN sequence and multiply it by a chaotic sequence. The chaotic sequence is produced by the logistic map, defined as:

$$x_{k+1}=\mu x_k(1-x_k)$$

where $0\le\mu\le 4$ is called the bifurcation parameter and $x_k\in(0,1)$. When $3.5699456\ldots<\mu\le 4$, the logistic map operates in the chaotic regime: the sequence $\{x_k;\,k=0,1,2,3,\ldots\}$ produced from an initial condition is aperiodic and non-convergent, and is extremely sensitive to the initial value. The monitoring sequence is generated by the following steps (a code sketch follows step a3):
Step a1: first generate a real-valued sequence, then select a segment, beginning at some position in the sequence, whose length equals the monitoring-sequence size;
Step a2: convert the real-valued sequence into a binary sequence by defining a threshold Γ and applying it to the real-valued sequence:

$$\Gamma(x)=\begin{cases}-1, & x<\Gamma\\ 1, & \Gamma\le x\end{cases};$$

the binary chaotic sequence is $\{\Gamma(x_k);\,k=0,1,2,3,\ldots\}$;
Step a3: multiply the binary chaotic sequence by a PN sequence to obtain the monitoring sequence;
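The following is a minimal sketch of steps a1-a3, assuming a ±1-valued PN sequence; the values of x0, mu, threshold, and skip are illustrative choices, not values fixed by the claim:

```python
import numpy as np

def chaotic_monitoring_sequence(pn_seq, x0=0.3, mu=3.9, threshold=0.5, skip=100):
    """Steps a1-a3: iterate the logistic map, binarize against a threshold,
    and multiply by a +/-1 PN sequence.

    mu must lie in the chaotic regime (3.5699456... < mu <= 4) and x0 in
    (0, 1); `skip` discards transient iterates before taking the working
    segment (the "starting position" of step a1).
    """
    n = len(pn_seq)
    x = x0
    for _ in range(skip):                 # discard the transient start of the orbit
        x = mu * x * (1.0 - x)
    reals = np.empty(n)
    for i in range(n):                    # real-valued chaotic segment (a1)
        x = mu * x * (1.0 - x)
        reals[i] = x
    binary = np.where(reals < threshold, -1, 1)   # threshold Gamma (a2)
    return binary * np.asarray(pn_seq)            # multiply by the PN sequence (a3)

# Example: a +/-1 PN sequence of length 16 (illustrative only)
pn = np.sign(np.random.default_rng(0).standard_normal(16)).astype(int)
watermark = chaotic_monitoring_sequence(pn)
```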
Step B: insert a synchronization code into the monitoring sequence, so that the embedded monitoring sequence can later be extracted frame by frame;
Step C: embed the monitoring sequence carrying the synchronization code into the speech signal in the wavelet domain. The detailed process is as follows:
Step c1: select the Daubechies-10 wavelet as the wavelet function;
Step c2: divide the speech signal into frames of 1152 samples each, and apply a 3-level wavelet transform to each frame;
Step c3: quantize the wavelet coefficients and modulate the monitoring sequence onto them, thereby embedding the sequence into the speech signal. Let f denote the coefficient to be quantized, w the monitoring-sequence bit to embed, and Δ the quantization step; after quantization, the coefficient carrying the monitoring information is f′. The specific steps are (a code sketch follows step c4):
Take the modulus and floor of f. When f > 0, let m = ⌊f/Δ⌋ and n = m % 2, and re-quantize f to f′ so that the index parity n equals the embedded bit w; when f < 0, let m = ⌊|f|/Δ⌋ and n = m % 2, and f′ is likewise chosen so that n = w. The monitoring sequence is embedded into the speech signal bit by bit according to this rule;
Step c4: convert the signal carrying the embedded monitoring sequence back into a time-domain signal;
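A minimal sketch of steps c1-c4 for one frame, using PyWavelets. Because the patent gives only the parity condition and not the exact re-quantization formulas, the rule below (force the parity of m = ⌊|f|/Δ⌋ to equal the bit w, then place f′ at the centre of that quantization bin) is an assumption chosen to be consistent with the extraction rule w = m % 2 in step d3; delta is an illustrative step size, and embedding into the approximation coefficients is likewise an assumption:

```python
import numpy as np
import pywt

def embed_frame(frame, bits, delta=0.05):
    """Steps c1-c4 for one 1152-sample frame: 3-level db10 decomposition,
    parity-based re-quantization of coefficients, reconstruction.

    ASSUMED re-quantization rule: adjust m = floor(|f|/delta) so its parity
    equals the bit w, then set f' to the centre of that bin; delta is an
    illustrative quantization step.
    """
    coeffs = pywt.wavedec(frame, 'db10', level=3)     # c1, c2: db10, 3 levels
    approx = coeffs[0]
    for i, w in enumerate(bits):                      # c3: one bit per coefficient
        f = approx[i]
        m = int(np.floor(abs(f) / delta))
        if m % 2 != w:                                # adjust index parity to w
            m += 1
        approx[i] = np.sign(f) * (m + 0.5) * delta    # centre of the chosen bin
    return pywt.waverec(coeffs, 'db10')               # c4: back to the time domain
```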
Step D: extract the embedded monitoring sequence from the received speech and calculate the bit error rate. The extraction process comprises the following steps:
Step d1: search for the synchronization code in the speech signal. Specifically, let L be the length of the signal to be searched; L must exceed the combined length of two synchronization codes plus one complete monitoring sequence. Starting from the initial search point I = 1, if the computed sample value of the signal lies in the range 900-1100, a possible synchronization code is considered found and is compared against the preset synchronization code. If it is confirmed as the synchronization code, point I is the starting position of the monitoring sequence; otherwise set I = I + L;
Step d2: starting from the located starting point, apply the discrete wavelet transform to the speech signal;
Step d3: apply the inverse of the embedding operation to each coefficient f after wavelet decomposition, i.e.: when f > 0, let m = ⌊f/Δ⌋ and recover w = m % 2; when f < 0, let m = ⌊|f|/Δ⌋ and recover w = m % 2;
this extracts the binary monitoring sequence;
Step d4: compare the extracted monitoring sequence with the embedded one, and compute the bit error rate via expression 18):
$$BER=\frac{\mathrm{HammingWeight}\left(Seq_{send}\ \mathrm{XOR}\ Seq_{receive}\right)}{Seq_{length}}\quad(18);$$
where $Seq_{send}$, $Seq_{receive}$, and $Seq_{length}$ denote the transmitted monitoring sequence, the received monitoring sequence, and the sequence length, respectively; HammingWeight(·) returns the Hamming weight of a sequence, and XOR denotes the exclusive-or operation.
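A minimal sketch of steps d3-d4, again under the parity-based rule assumed above (delta and the decomposition level must match the embedding side, and bit sequences are assumed mapped to 0/1 values); the synchronization-code search of step d1 is omitted:

```python
import numpy as np
import pywt

def extract_bits(frame, n_bits, delta=0.05):
    """Step d3: invert the embedding rule by recovering w = m % 2 from the
    quantization index of each coefficient. delta must match embedding."""
    approx = pywt.wavedec(frame, 'db10', level=3)[0]
    m = np.floor(np.abs(approx[:n_bits]) / delta).astype(int)
    return m % 2

def bit_error_rate(seq_send, seq_receive):
    """Expression 18): BER = HammingWeight(send XOR receive) / length,
    for 0/1-valued sequences of equal length."""
    send = np.asarray(seq_send, dtype=int)
    recv = np.asarray(seq_receive, dtype=int)
    return np.count_nonzero(send ^ recv) / len(send)
```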
10. The output-based objective speech quality assessment method according to claim 1, characterized in that: the mapping relation is obtained through expression 19):

$$MOS_{obj}=f(C_1,\ldots,C_N)\quad(19);$$

where f(·) is the multivariate nonlinear regression model, $C_i$ is the consistency measure of the i-th parameter, N is the number of speech feature parameters, and $MOS_{obj}$ is the objective MOS score predicted by f(·) from $C_1,\ldots,C_N$.
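As an illustration of expression 19), the sketch below fits a simple polynomial regression from consistency measures to subjective MOS scores and returns a predictor. The patent does not fix the functional form of f(·) here, so the degree-2 polynomial without cross terms is purely an assumption:

```python
import numpy as np

def fit_mos_mapping(C_train, mos_train, degree=2):
    """Fit an assumed polynomial form of the multivariate nonlinear
    regression f(.) in expression 19), mapping consistency measures
    C_1..C_N to an objective MOS score.

    C_train:   (samples, N) array of consistency measures
    mos_train: (samples,) array of subjective MOS scores
    Returns a predictor callable for new consistency-measure vectors.
    """
    C = np.asarray(C_train, dtype=float)
    # Design matrix: [C, C^2, ..., C^degree, 1] (no cross terms, by assumption)
    X = np.hstack([C ** d for d in range(1, degree + 1)] + [np.ones((len(C), 1))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(mos_train, dtype=float), rcond=None)

    def predict(c):
        c = np.atleast_2d(np.asarray(c, dtype=float))
        x = np.hstack([c ** d for d in range(1, degree + 1)] + [np.ones((len(c), 1))])
        return x @ coef

    return predict
```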