CN106710604A - Formant enhancement apparatus and method for improving speech intelligibility - Google Patents
- Publication number
- CN106710604A CN106710604A CN201611118099.0A CN201611118099A CN106710604A CN 106710604 A CN106710604 A CN 106710604A CN 201611118099 A CN201611118099 A CN 201611118099A CN 106710604 A CN106710604 A CN 106710604A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention relates to speech enhancement and proposes a formant enhancement apparatus and method for improving speech intelligibility. Aimed at the pronunciation characteristics of patients with muscular atrophy, it enhances or restores the speaker's voice information while preserving the speaker's timbre. On the one hand, the apparatus and method can help patients with muscular atrophy train their pronunciation as a speech-production aid; on the other hand, they can also improve communication between patients and others. First, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to identify the main differences between the two. Second, a large set of speech feature parameters from normal speakers is used as a reference database, with the feature parameters extracted by the same method used for the patients' speech. Third, after the patient's speech feature parameters are extracted, a simple gender division is performed. Fourth, the patient's speech feature parameters are classified by a trained neural network. The apparatus and method are applicable to speech enhancement scenarios.
Description
Technical field
The present invention relates to speech enhancement and speech processing, and specifically to a formant enhancement method for improving the speech intelligibility of patients with muscular atrophy at the articulatory organs.
Background art
Muscular atrophy refers to a reduction in muscle volume caused by muscle fibers thinning or even disappearing as a result of malnutrition of the muscle. Its causes include neurogenic muscular atrophy, myogenic muscular atrophy, disuse muscular atrophy, and others. Muscular atrophy is also closely connected with the nervous system, and diseases of the spinal cord often lead to it, as do neurological conditions such as Parkinsonism, senile dementia, multiple sclerosis, and amyotrophic lateral sclerosis (ALS, the disease Stephen Hawking suffered from). All of these can affect the patient's pronunciation.
In addition, with the implementation of China's family planning policy, the elderly population is gradually increasing and the aging problem is becoming serious; the same is even more apparent in developed countries. With advancing age, the muscles of the elderly increasingly atrophy, and the muscles of the vocal tract gradually become harder to control. Speech intelligibility therefore declines, and communication with others becomes difficult.
Existing speech enhancement methods [1-2] are aimed at processing the speech of normal speakers corrupted by noise and similar interference, but this is not the same as the voice characteristics of patients with muscular atrophy. Because the vocal tract of such patients is impaired, the speech itself is defective; for example, the spectrum shows incomplete or missing formants. This kind of degradation is quite different from the damage noise does to speech, so the treatment must differ as well.
In addition, research has shown that when patients with muscular atrophy, neurodegenerative diseases, and similar conditions who cannot properly control the muscle groups of the articulatory organs undergo voicing rehabilitation training, hearing a clear voice that preserves the speaker's own timbre during training provides a substantial boost to the rehabilitation.
In speech processing the time domain is hard to interpret directly, so transforming the speech signal to the frequency domain and processing it there is the dominant approach. Among frequency-domain features, the formants reflect the speaker's vocal tract characteristics and timbre, so adjusting the formants is both central and effective.
[1] DST LLC. System for adaptive speech intelligibility processing [P]. Chinese patent CN102498482B, 2014-10-15.
[2] Samsung Electronics Co., Ltd. Method and apparatus for enhancing dialogue using formants [P]. Chinese patent CN1619646A, 2005-05-25.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention aims, for the pronunciation characteristics of patients with muscular atrophy, to enhance or restore the speaker's voice information while preserving the speaker's timbre. This can help patients with muscular atrophy train their voicing as a speech-production aid, and at the same time improve their communication with others. The technical solution adopted by the present invention is a formant enhancement method for improving speech intelligibility. First, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to identify the essential differences between the two; feature parameters are extracted under different spectral representations using one or more of the cepstrum method, linear predictive coding (LPC), mel-frequency cepstral coefficients (MFCC), or line spectrum pairs (LSP). Second, a large set of normal speakers' speech feature parameters is used as a reference database, with the features extracted by the same method used for the patients' speech. Third, after the patient's speech feature parameters are extracted, a simple gender division is performed. Fourth, the patient's speech feature parameters are classified by a trained neural network and compared with the entries of the database for the corresponding gender to find the closest speech segment, which serves as the reference for adjusting the patient's speech feature parameters. Finally, the adjusted speech information is transformed back to the time domain and synthesized into complete speech for output.
In the formant enhancement stage, the time-domain signal is first transformed to a frequency domain and frequency-domain feature parameters are extracted. The frequency domain here may be ordinary frequency, mel frequency (Mel-frequency), or the cepstral (quefrency) domain (Cepstrum).
The formant enhancement apparatus for improving speech intelligibility consists of a signal receiver, a preprocessor, a segmentation processor, a formant enhancement processor, a synthesizer, and a player, where:
The signal receiver picks up and stores the patient's voice signal; the recording contains both the speech signal and ambient noise.
The preprocessor applies spectral subtraction to the stored speech signal for noise reduction and then pre-emphasizes the denoised speech.
The segmentation processor uses the short-time average zero-crossing rate and short-time energy detection to divide the preprocessed signal into speech and non-speech segments, then segments it into words, characters, or phonemes; each segment is stored in order and output sequentially.
The formant enhancement processor extracts the formant parameters of the speech via LPC mel-cepstral coefficients, compares them against a computed gender threshold to perform the gender division, classifies them with the neural network trained for the corresponding gender, searches the pre-stored speech database of the corresponding phoneme for the closest formant parameters, adjusts the pending formant parameters against that standard, transforms the adjusted formant parameters back to the time domain for output, and adjusts the time-domain amplitude appropriately.
The synthesizer combines the processed speech segments in order.
The player plays the synthesized speech signal and also presents it graphically.
In one instantiation, the internal logic of the formant enhancement processor is:
The framer divides the speech signal into frames, with a frame length of 15-30 ms and a frame shift of 10 ms.
The LPC coefficient extractor extracts the LPC coefficients of each frame.
The cepstral coefficient converter converts the LPC coefficients to cepstral coefficients.
The mel-frequency converter applies a nonlinear transformation on the mel scale, converting the cepstral coefficients to LPC mel-cepstral coefficients.
The formant calculator computes the formants from the LPC mel-cepstral (LPCMCC) coefficients, taking the center frequencies and bandwidths of the first three formants as the formant parameters.
The gender classifier compares the formant center frequencies computed by the formant calculator against a threshold to perform the gender division; the two genders feed different phoneme classifiers.
The phoneme classifier is a neural network trained on normal speakers' LPC mel-cepstral coefficients; from the formant center frequencies and corresponding bandwidths produced by the formant calculator it determines the phoneme class, and the classifier routes the result to the corresponding formant comparator.
The formant comparator holds a database of normal speech for each phoneme. It looks up the formant parameters passed on by the gender classifier in the database for the corresponding phoneme and finds the entry with the smallest deviation in center frequency and bandwidth. Center frequency is the main criterion: with the deviation of the second formant frequency F2 kept minimal, the entry whose first formant frequency F1 deviates as little as possible is chosen, and finally the deviation of the third formant frequency F3 is kept as small as possible.
The formant enhancement filter takes the selected formant parameters from the phoneme library and the formant parameters computed by the formant calculator, and, using the selected library parameters as the reference, adjusts the bandwidth and amplitude of the computed formant parameters to produce the processed formant parameters.
The time-domain converter transforms the processed formant parameters into a time-domain signal and adjusts its amplitude.
Features and beneficial effects of the present invention:
For the pronunciation characteristics of patients with muscular atrophy, the present invention repairs their speech signal in both the frequency and time domains, improving the intelligibility and quality of the speech while preserving the speaker's timbre, and it can serve as an aid in the patient's speech rehabilitation training.
Brief description of the drawings:
Fig. 1 shows a formant enhancement method for speech whose intelligibility is lowered by muscular atrophy at the articulatory organs.
Fig. 2 is a block diagram of formant enhancement.
Fig. 3 is a block diagram of formant enhancement for improving intelligibility according to an embodiment of the present general inventive concept.
Fig. 4 is a block diagram of the preprocessor 320 of Fig. 3.
Fig. 5 is a block diagram of the segmentation processor 330 of Fig. 3.
Fig. 6 is a block diagram of the formant enhancement processor 340 of Fig. 3.
Fig. 7 is a block diagram of the synthesizer 350 of Fig. 3.
Fig. 8 is a block diagram of the player 360 of Fig. 3.
Specific embodiment
Most existing speech enhancement methods target speech recorded in noisy environments (channel noise, recording noise, and so on); they mainly improve intelligibility by algorithmically reducing the noise components and boosting the speech components. The main problem with the speech of patients with muscular atrophy, however, is that reduced control over the vocal tract and oral muscles leaves voiced formants missing and unvoiced sounds indistinct. Merely improving the SNR through noise reduction therefore cannot recover the speech information or raise intelligibility, because it is hard to tell whether the unclear portions are noise. Even when the SNR of a patient's speech is very high, intelligibility can remain low.
The present invention aims, for the pronunciation characteristics of patients with muscular atrophy, to enhance or restore the speaker's voice information while preserving the speaker's timbre. This can help the patient train voicing as a speech-production aid, and it can also improve communication with others.
Existing speech enhancement algorithms raise intelligibility by improving the SNR of the speech, which is of little help for signals whose SNR is already high but whose intelligibility is low because of the speaker's articulation problems.
The present invention analyzes the pronunciation characteristics and speech feature parameters of patients with muscular atrophy against those of normal speakers, and compares them with a database and a neural network trained on feature parameters extracted from a large amount of normal speech. The feature parameters of the patient's speech are then adjusted so that the intelligibility of the speech improves while the speaker's timbre is preserved. The specific implementation is as follows:
First, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to identify the essential differences between the two. This is done mainly in the frequency domain, and the feature parameters can be extracted under different spectral representations using the cepstrum method, the LPC method, MFCC, LSP, and so on. Because each method yields somewhat different parameters, the concrete choice also relies on the experimenter's judgment. One option is to give the parameters computed by each method a weight and then take the weighted sum.
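The weighting idea just described can be sketched as a normalized weighted sum over per-method estimates. The method names, example values, and weights below are illustrative assumptions, since the patent leaves the weights to the experimenter's judgment.

```python
import numpy as np

def combine_estimates(estimates, weights):
    """Combine per-method feature estimates by a normalized weighted sum.

    `estimates` maps method name -> parameter vector; `weights` maps
    method name -> an empirically chosen weight.
    """
    names = sorted(estimates)
    w = np.array([weights[n] for n in names], dtype=float)
    w /= w.sum()  # normalize so the weights sum to 1
    stacked = np.stack([np.asarray(estimates[n], dtype=float) for n in names])
    return stacked.T @ w

# Hypothetical F1 estimates (Hz) from three extraction methods
est = {"lpc": [710.0], "mfcc": [690.0], "cepstrum": [700.0]}
wts = {"lpc": 2.0, "mfcc": 1.0, "cepstrum": 1.0}
print(combine_estimates(est, wts))  # weighted toward the LPC estimate
```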
Second, a large set of normal speakers' speech feature parameters is used as a reference database; the feature extraction method here is the same as the one used for the patients' speech.
Third, after the patient's speech feature parameters are extracted, a simple gender division is performed. Fourth, the patient's speech feature parameters are classified by the trained neural network and compared with the entries in the database for the corresponding gender to obtain the closest speech segment, which serves as the reference for adjusting the patient's speech feature parameters. Finally, the adjusted speech information is transformed back to the time domain and synthesized into complete speech for output.
LPC method: linear predictive coding (LPC).
MFCC method: mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC).
LSP method: line spectrum pairs (Line Spectrum Pairs, LSP).
Cepstrum method: cepstrum analysis.
The present invention is further described below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, this is a formant enhancement method for speech whose intelligibility is lowered by muscular atrophy at the articulatory organs. The speech signal is first received through a microphone or similar device; next come preprocessing steps such as noise reduction and pre-emphasis, and the speech segments are extracted and divided. Formant enhancement is then performed, and finally the speech is synthesized and played, or the speech before and after formant enhancement is compared in a graphical display.
In the formant enhancement stage, the time-domain signal is first transformed to a frequency domain and frequency-domain feature parameters are extracted. The frequency domain here may be ordinary frequency, mel frequency (Mel-frequency), the cepstral (quefrency) domain (Cepstrum), and so on. Features can be extracted with the LPC method, the cepstrum method, MFCC, LSP, and so on. A preliminary gender division is first made on the obtained feature parameters, since female voices generally have higher frequencies than male voices. A neural network built by a learning algorithm then classifies which phoneme the speech in this fragment is. The result is compared with the normal-speaker database for that phoneme to find the closest speech features, which serve as the standard for adjusting the frequency-domain feature parameters of the signal being processed. Finally the signal is transformed back to the time domain, with appropriate amplitude adjustment to guarantee the speech quality.
Embodiments of the present general inventive concept will now be described in detail, examples of which are illustrated in the accompanying drawings, where like reference numerals refer to like parts throughout. The embodiments are described with reference to the drawings in order to explain the present general inventive concept.
Fig. 3 is the formant enhancing block diagram of the raising intelligibility of the embodiment according to present general inventive concept.
Referring to Fig. 3, the signal receiver 310 picks up and stores the patient's voice signal; the recording contains both the speech signal and ambient noise.
The preprocessor 320 applies spectral subtraction to the stored speech signal for noise reduction and then pre-emphasizes the denoised speech.
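A minimal sketch of these two preprocessing steps follows, assuming a Hann-windowed, half-overlapping magnitude-domain spectral subtraction with the noise spectrum estimated from a noise-only excerpt. The patent does not specify the exact subtraction rule, the spectral floor, or the pre-emphasis coefficient (0.97 is a common default), so all of those are assumptions.

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def spectral_subtraction(x, noise, n_fft=512, floor=0.01):
    """Frame-by-frame magnitude spectral subtraction with overlap-add.

    `noise` is a noise-only excerpt (e.g. leading silence) used to
    estimate the average noise magnitude spectrum.
    """
    hop = n_fft // 2
    win = np.hanning(n_fft)
    noise_mag = np.abs(np.fft.rfft(noise[:n_fft] * win))
    out = np.zeros(len(x))
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * win
        spec = np.fft.rfft(frame)
        # subtract the noise magnitude, keeping a small spectral floor
        mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n_fft)
        out[start:start + n_fft] += clean
    return out
```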
The segmentation processor 330 uses the short-time average zero-crossing rate and short-time energy detection to divide the preprocessed speech signal into speech and non-speech segments, then segments it into words, characters, or phonemes; each segment is stored in order and passed on to the subsequent processors.
The formant enhancement processor 340 extracts the formant parameters of the speech via LPC mel-cepstral coefficients (LPCMCC), compares them against the computed gender threshold to perform the gender division, classifies them with the neural network trained for the corresponding gender, searches the pre-stored speech database of the corresponding phoneme for the closest formant parameters, adjusts the pending formant parameters against that standard, transforms the adjusted formant parameters back to the time domain for output, and adjusts the time-domain amplitude appropriately.
The synthesizer 350 combines the processed speech segments in order.
The player 360 plays the synthesized speech signal and also presents it graphically.
Fig. 4 is the block diagram of the preprocessor 320 of Fig. 3.
The preprocessor 320 is divided into a denoiser 410 and a pre-emphasizer 420.
Fig. 5 is the block diagram of the segmentation processor 330 of Fig. 3.
The zero-crossing detector 510 performs the first-stage segmentation by analyzing the short-time average zero-crossing rate, separating speech from silent regions: if the zero-crossing rate drops suddenly below a certain threshold, the region is taken to be silent, and the stretch between it and the previous silent region is marked as one speech segment.
The energy detector 520 uses short-time energy to distinguish unvoiced from voiced sound, further subdividing the speech.
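The zero-crossing-rate and energy analysis used by detectors 510 and 520 can be sketched as below. The 16 kHz sampling rate, 20 ms frames, and the two thresholds are illustrative assumptions; the patent does not fix these values.

```python
import numpy as np

def short_time_features(x, frame_len=320, hop=160):
    """Per-frame zero-crossing rate and energy (20 ms frames at 16 kHz
    assumed), the two features used for coarse segmentation."""
    zcr, energy = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        signs = np.sign(frame)
        zcr.append(np.mean(np.abs(np.diff(signs)) > 0))  # crossing fraction
        energy.append(np.sum(frame ** 2))
    return np.array(zcr), np.array(energy)

def label_frames(zcr, energy, e_sil, z_unvoiced):
    """Crude labels: low energy -> silence; high ZCR -> unvoiced; else voiced."""
    labels = []
    for z, e in zip(zcr, energy):
        if e < e_sil:
            labels.append("silence")
        elif z > z_unvoiced:
            labels.append("unvoiced")
        else:
            labels.append("voiced")
    return labels
```

A 200 Hz tone followed by silence, for example, yields "voiced" frames followed by "silence" frames, which is exactly the boundary information the segmenter needs.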
Fig. 6 is the block diagram of the formant enhancement processor 340 of Fig. 3.
The framer 621 divides the speech signal into frames, with a frame length of 15-30 ms and a frame shift of 10 ms.
The LPC coefficient extractor 622 extracts the LPC coefficients of each frame.
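Framing and per-frame LPC analysis can be sketched with the standard autocorrelation method (Levinson-Durbin recursion). The sign convention a[0] = 1 is standard; the LPC order is left open by the patent, so the order in the check below is an assumption.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a signal into overlapping frames (e.g. 20 ms length, 10 ms hop)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def lpc(frame, order):
    """LPC by the autocorrelation method (Levinson-Durbin).

    Returns a with a[0] = 1, so A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p
    is the prediction-error filter.
    """
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :][: order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        lam = -acc / err                 # reflection coefficient
        a[1 : i + 1] += lam * a[i - 1 :: -1]
        err *= 1.0 - lam * lam
    return a
```

For a decaying exponential x[n] = 0.9^n, a first-order fit recovers a[1] ≈ -0.9, as expected for a one-pole predictor.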
The cepstral coefficient converter 623 converts the LPC coefficients to cepstral coefficients.
The mel-frequency converter 624 applies a nonlinear transformation on the mel scale, converting the cepstral coefficients to LPC mel-cepstral coefficients.
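The LPC-to-cepstrum step of converter 623 has a standard closed-form recursion, sketched below under the a[0] = 1 convention. The subsequent mel warp of converter 624 (commonly realized with an all-pass bilinear frequency transformation) is omitted here, so this covers only the cepstral conversion.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Standard LPC-to-cepstrum recursion for A(z) = 1 + sum a[k] z^-k:
    c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}   (a_n = 0 for n > p).
    Returns [c_1, ..., c_{n_ceps}]."""
    p = len(a) - 1
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k]
        c[n - 1] = -acc
    return c

# Single-pole sanity check: for A(z) = 1 - b z^-1, c_n = b^n / n exactly
# (here b = 0.5, so c_1 = 0.5, c_2 = 0.125, c_3 ≈ 0.04167)
print(lpc_to_cepstrum(np.array([1.0, -0.5]), 3))
```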
The formant calculator 625 computes the formants from the LPC mel-cepstral coefficients, taking the center frequencies and bandwidths of the first three formants as the formant parameters 640.
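One common way to realize a formant calculator is to read formant frequencies and bandwidths off the roots of the LPC polynomial, sketched below. The patent derives formants from LPCMCC coefficients rather than raw LPC roots, so treat this root-finding variant as an illustrative substitute, not the patented computation.

```python
import numpy as np

def formants_from_lpc(a, sr, n_formants=3):
    """Formant estimates from LPC polynomial roots.

    Each complex-conjugate root pair z gives a resonance at
    f = angle(z) * sr / (2*pi) with bandwidth bw = -(sr/pi) * ln|z|.
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-6]   # one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -(sr / np.pi) * np.log(np.abs(roots))
    order = np.argsort(freqs)
    return freqs[order][:n_formants], bws[order][:n_formants]

# Illustrative check: a single resonance planted at 700 Hz (sr = 8 kHz)
sr = 8000
z = 0.98 * np.exp(2j * np.pi * 700 / sr)
a = np.real(np.poly([z, np.conj(z)]))      # [1, -2 r cos(theta), r^2]
f, bw = formants_from_lpc(a, sr, n_formants=1)
```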
The gender classifier 626 compares the formant center frequencies computed by 625 against a threshold to perform the gender division; the two genders feed different phoneme classifiers.
The phoneme classifier 627 is a neural network trained on normal speakers' LPC MCC coefficients; from the formant center frequencies and corresponding bandwidths computed by 625 it determines the phoneme class, and the classifier routes the result to the corresponding comparator.
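The phoneme classifier's forward pass can be sketched as a one-hidden-layer network. The layer sizes here are arbitrary, and the weights would come from training on normal speakers' LPCMCC features, which is outside this sketch; with untrained (zero) weights the output is simply uniform over the classes.

```python
import numpy as np

def mlp_classify(x, W1, b1, W2, b2):
    """One-hidden-layer network: tanh hidden layer, softmax output.
    Returns a probability distribution over phoneme classes."""
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())      # numerically stabilized softmax
    return e / e.sum()

# 4 input features (e.g. formant frequencies/bandwidths), 8 hidden units,
# 3 hypothetical phoneme classes; zero weights give a uniform output
probs = mlp_classify(np.zeros(4), np.zeros((4, 8)), np.zeros(8),
                     np.zeros((8, 3)), np.zeros(3))
```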
The formant comparator 628 holds a database of normal speech for each phoneme. It looks up the formant parameters passed on by 626 in the database for the corresponding phoneme and finds the entry 650 with the smallest deviation in center frequency and bandwidth. Center frequency is the main criterion: with the deviation of the second formant frequency F2 kept minimal, the entry whose first formant frequency F1 deviates as little as possible is chosen, and finally the deviation of the third formant frequency F3 is kept as small as possible.
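The F2-first, then F1, then F3 selection rule can be sketched as a staged nearest-neighbor search. The near-tie tolerances and the example database entries below are assumptions for illustration; the patent states only the priority order.

```python
import numpy as np

def match_formants(query, database, tol=(50.0, 80.0)):
    """Staged match: keep entries with near-minimal |F2| deviation,
    narrow the candidates by |F1|, then pick the smallest |F3| deviation.
    Rows are (F1, F2, F3) in Hz; returns the winning row index."""
    db = np.asarray(database, dtype=float)
    d2 = np.abs(db[:, 1] - query[1])
    cand = np.flatnonzero(d2 <= d2.min() + tol[0])   # near-minimal F2 first
    d1 = np.abs(db[cand, 0] - query[0])
    cand = cand[d1 <= d1.min() + tol[1]]             # then near-minimal F1
    d3 = np.abs(db[cand, 2] - query[2])
    return int(cand[np.argmin(d3)])                  # finally minimal F3

# Hypothetical phoneme entries; the second row wins on the F3 tiebreak
db = [(700.0, 1200.0, 2600.0), (710.0, 1150.0, 2520.0), (300.0, 2300.0, 3000.0)]
best = match_formants((705.0, 1190.0, 2550.0), db)
```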
The formant enhancement filter 629 takes the selected formant parameters 650 from the phoneme library and the formant parameters 640 computed by 625, and adjusts the bandwidth and amplitude of 640 using 650 as the reference, yielding the processed formant parameters 660. Concretely, if a formant obtained from the speech is denoted f, it is compared with all formants f' in the database; the f' with the smallest deviation is chosen, and f is adjusted with that f' as the reference to obtain the processed value.
The time-domain converter 630 transforms 660 into the time-domain signal 670 and adjusts its amplitude.
Fig. 7 is the block diagram of the synthesizer 350 of Fig. 3.
The paragraph maker 710 progressively combines the per-frame signals into complete speech segments.
The speech synthesizer 720 joins the segments in order into complete speech.
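Reassembling overlapping frames into a segment is typically done by overlap-add, sketched below with the frame shift equal to the analysis hop. The patent does not name the exact reconstruction method, so treat this as one plausible realization; with a Hann analysis window at 50% overlap the overlapping halves sum to a constant gain.

```python
import numpy as np

def overlap_add(frames, hop):
    """Reassemble overlapping frames into one signal by overlap-add.

    `frames` is a 2-D array (n_frames, frame_len); consecutive frames
    are placed `hop` samples apart and summed where they overlap.
    """
    frame_len = frames.shape[1]
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out
```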
Fig. 8 is the block diagram of the player 360 of Fig. 3.
The view device 810 displays the time-domain speech signal before and after processing.
The speech player 820 plays the processed speech.
Claims (5)
1. it is a kind of improve the intelligibility of speech formant Enhancement Method, it is characterized in that, first contrast muscular atrophy patient pronunciation
Feature and normal person's pronunciation characteristic simultaneously obtain both essential differences, using Cepstrum Method, linear predictive coding LPC methods, Mel frequently
Characteristic parameter is extracted under different spectrums one or more in rate cepstrum coefficient MFCC or line spectrum pair LSP;Secondly, with a large amount of normal
People's speech characteristic parameter extracts the method for characteristic parameter and extracts muscular atrophy patient's speech characteristic parameter herein as reference library
Method be consistent;3rd, after the speech characteristic parameter of muscular atrophy patient is extracted, carry out simple sex division;The
Four, the neutral net that patient's speech characteristic parameter passes through to have trained is classified, and ratio is done with information in corresponding sex storehouse
Compared with, most close voice segments are obtained, and as reference, the speech characteristic parameter to muscular atrophy patient is adjusted;Finally,
Voice messaging after adjustment is returned into time domain and is synthesized, export into complete voice.
2. the formant Enhancement Method of the intelligibility of speech is improved as claimed in claim 1, it is characterized in that, in formant enhanced portion
Point, time-domain signal is transformed into frequency domain first, and extract frequency domain character parameter, referred to herein as frequency domain can be that frequency can also
It is mel-frequency Mel-frequency, cepstrum rate Cepstrum.
3. A formant enhancement apparatus for improving speech intelligibility, characterized in that it is composed of a signal receiving device, a pre-processor, a segmentation processor, a formant enhancement processor, a synthesizer, and a player; wherein:
The signal receiving device picks up and stores the patient's voice; the recording contains both the speech signal and ambient noise;
The pre-processor applies spectral-subtraction noise reduction to the stored voice signal and then performs pre-emphasis on the denoised speech;
The segmentation processor uses short-time average zero-crossing rate and short-time energy detection to divide the pre-processed voice signal into speech and non-speech segments, further segments it by word, character, or phoneme, and stores and outputs the segments in order;
The formant enhancement processor extracts the formant parameters of the speech via LPC Mel cepstrum coefficients, compares these formant parameters with a pre-computed gender-division threshold to perform gender division, classifies them after division with the fully trained neural network for the corresponding gender, finds the closest formant parameters in the pre-stored sound library for the corresponding phoneme after classification, adjusts the pending formant parameters with those formant parameters as the standard, transforms the adjusted formant parameters back to the time domain for output, and adjusts the time-domain amplitude appropriately;
The synthesizer synthesizes the processed voice signals in order;
The player plays the synthesized voice signal and presents it graphically.
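A minimal sketch of the pre-processor's two stages as named in the claim: magnitude spectral subtraction against a noise estimate, then first-order pre-emphasis. The parameter values (FFT size 512, Hann analysis window, spectral floor, alpha = 0.97) are assumptions for the sketch; the claim itself does not fix them.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def spectral_subtraction(x, noise, n_fft=512, floor=0.02):
    """Subtract an average noise magnitude spectrum from each frame's
    magnitude spectrum, keep the noisy phase, and overlap-add the frames
    back together (basic single-band spectral subtraction)."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    noise_mag = np.abs(np.fft.rfft(noise[:n_fft] * win))
    out = np.zeros(len(x))
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * win
        spec = np.fft.rfft(frame)
        mag = np.abs(spec) - noise_mag
        mag = np.maximum(mag, floor * np.abs(spec))       # spectral floor
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n_fft)
        out[start:start + n_fft] += clean                 # overlap-add
    return out
```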
4. The formant enhancement apparatus for improving speech intelligibility as claimed in claim 3, characterized in that, in one embodiment, the internal logic of the formant enhancement processor is:
The framing unit divides the voice signal into frames, with a frame length of 15-30 ms and a frame shift of 10 ms;
The LPC coefficient extractor extracts the LPC coefficients of each frame;
The cepstrum coefficient converter converts the LPC coefficients into cepstrum coefficients;
The Mel-frequency converter applies a nonlinear Mel-scale transformation to the cepstrum coefficients, converting them into LPC Mel spectral coefficients;
The formant calculator computes the formants from the LPC Mel cepstrum (LPCMCC) coefficients and takes the center frequencies and bandwidths of the first three formants as the formant parameters;
The gender classifier compares the formant center frequencies computed by the formant calculator with a threshold to perform gender division; the two genders enter different phoneme classifiers.
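The LPC-to-formant step can be illustrated with the classic root-finding recipe: fit LPC coefficients to a windowed frame via the Levinson-Durbin recursion, then read formant center frequencies and bandwidths from the angles and radii of the complex roots of the prediction polynomial. This is a standard textbook method, not necessarily the claim's exact LPCMCC-based computation; the LPC order and the pruning thresholds are illustrative.

```python
import numpy as np

def lpc(frame, order):
    """LPC by the autocorrelation method (Levinson-Durbin recursion);
    returns a with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    n = len(frame)
    r = np.correlate(frame, frame, "full")[n - 1 : n + order]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:] @ r[i - 1 : 0 : -1]) / err     # reflection coefficient
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def formants(frame, fs, order=8, n_formants=3):
    """Center frequencies and bandwidths (Hz) of up to n_formants
    resonances, read off the complex roots of the LPC polynomial."""
    a = lpc(frame * np.hamming(len(frame)), order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -fs / np.pi * np.log(np.abs(roots))    # -3 dB bandwidth from radius
    idx = np.argsort(freqs)
    freqs, bws = freqs[idx], bws[idx]
    keep = (freqs > 90) & (bws < 400)            # prune spurious wide/low roots
    return freqs[keep][:n_formants], bws[keep][:n_formants]
```

Practical analyzers prune and label roots more carefully (e.g., enforcing a minimum spacing between formants) before calling them F1-F3.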
5. The formant enhancement apparatus for improving speech intelligibility as claimed in claim 3, characterized in that the phoneme classifier is a neural network trained on the LPC Mel cepstrum coefficients of normal speech; the formant center frequencies and corresponding bandwidths computed by the formant calculator are passed through this classifier to obtain the phoneme class and then enter the corresponding formant comparator;
The formant comparator is a database of normal speech for each phoneme; the formant parameters passed on by the gender classifier are looked up in the corresponding phoneme database to find the formant parameters with the smallest deviation in center frequency and bandwidth, with the formant center frequencies as the main criterion: while keeping the deviation of the second formant frequency F2 minimal, a value with a small deviation of the first formant frequency F1 is sought, and finally the deviation of the third formant frequency F3 is kept as small as possible;
The formant enhancement filter takes the formant parameters selected from the phoneme library and the formant parameters computed by the formant calculator, and adjusts the bandwidth and amplitude of the latter with the selected parameters as the reference, yielding the processed formant parameters;
The time-domain converter transforms the processed formant parameters back into a time-domain signal and adjusts its amplitude.
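The comparator's stated priority order — keep the F2 deviation minimal, then look for a small F1 deviation, then minimize F3 — can be sketched as a staged nearest-neighbour search over formant triples. The stage tolerances below are invented for the illustration; the patent gives no concrete numbers.

```python
import numpy as np

def closest_entry(query, database, tol=(150.0, 100.0)):
    """Pick the index of the database entry ([F1, F2, F3] in Hz) that best
    matches `query`, prioritising F2, then F1, then F3: keep candidates
    whose F2 deviation is within tol[0] Hz of the best, narrow to those
    whose F1 deviation is within tol[1] Hz of the best, then take the
    smallest F3 deviation among the survivors."""
    db = np.asarray(database, dtype=float)
    d2 = np.abs(db[:, 1] - query[1])
    cand = np.flatnonzero(d2 <= d2.min() + tol[0])   # stage 1: F2
    d1 = np.abs(db[cand, 0] - query[0])
    cand = cand[d1 <= d1.min() + tol[1]]             # stage 2: F1
    d3 = np.abs(db[cand, 2] - query[2])
    return int(cand[np.argmin(d3)])                  # stage 3: F3
```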
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611118099.0A CN106710604A (en) | 2016-12-07 | 2016-12-07 | Formant enhancement apparatus and method for improving speech intelligibility |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106710604A true CN106710604A (en) | 2017-05-24 |
Family
ID=58936430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611118099.0A Pending CN106710604A (en) | 2016-12-07 | 2016-12-07 | Formant enhancement apparatus and method for improving speech intelligibility |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106710604A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102498482A (en) * | 2009-09-14 | 2012-06-13 | Srs实验室有限公司 | System for adaptive voice intelligibility processing |
CN102610236A (en) * | 2012-02-29 | 2012-07-25 | 山东大学 | Method for improving voice quality of throat microphone |
CN105513597A (en) * | 2015-12-30 | 2016-04-20 | 百度在线网络技术(北京)有限公司 | Voiceprint authentication processing method and apparatus |
History
- 2016-12-07: Application CN201611118099.0A filed; published as CN106710604A (en); legal status: pending
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108899052A (en) * | 2018-07-10 | 2018-11-27 | 南京邮电大学 | A kind of Parkinson's sound enhancement method based on mostly with spectrum-subtraction |
CN108899052B (en) * | 2018-07-10 | 2020-12-01 | 南京邮电大学 | Parkinson speech enhancement method based on multi-band spectral subtraction |
CN109215635A (en) * | 2018-10-25 | 2019-01-15 | 武汉大学 | Broadband voice spectral tilt degree characteristic parameter method for reconstructing for speech intelligibility enhancing |
CN109346058A (en) * | 2018-11-29 | 2019-02-15 | 西安交通大学 | A kind of speech acoustics feature expansion system |
CN110070894B (en) * | 2019-03-26 | 2021-08-03 | 天津大学 | Improved method for identifying multiple pathological unit tones |
CN110070894A (en) * | 2019-03-26 | 2019-07-30 | 天津大学 | A kind of improved multiple pathology unit voice recognition methods |
CN110164454A (en) * | 2019-05-24 | 2019-08-23 | 广州国音智能科技有限公司 | A kind of audio identity method of discrimination and device based on resonance peak deviation |
CN110164454B (en) * | 2019-05-24 | 2021-08-24 | 广州国音智能科技有限公司 | Formant deviation-based audio identity discrimination method and device |
CN110604568A (en) * | 2019-09-29 | 2019-12-24 | 三峡大学 | System and method for detecting singing tone in air street |
CN110604568B (en) * | 2019-09-29 | 2022-01-04 | 三峡大学 | System and method for detecting singing tone in air street |
CN111108552A (en) * | 2019-12-24 | 2020-05-05 | 广州国音智能科技有限公司 | Voiceprint identity identification method and related device |
CN112687277B (en) * | 2021-03-15 | 2021-06-18 | 北京远鉴信息技术有限公司 | Method and device for determining voice formant, electronic equipment and readable storage medium |
CN112687277A (en) * | 2021-03-15 | 2021-04-20 | 北京远鉴信息技术有限公司 | Method and device for determining voice formant, electronic equipment and readable storage medium |
CN112802489A (en) * | 2021-04-09 | 2021-05-14 | 广州健抿科技有限公司 | Automatic call voice adjusting system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106710604A (en) | Formant enhancement apparatus and method for improving speech intelligibility | |
CN103928023B (en) | Speech assessment method and system | |
US8036891B2 (en) | Methods of identification using voice sound analysis | |
CN102063899B (en) | Method for voice conversion under non-parallel text conditions | |
CN111462769B (en) | End-to-end accent conversion method | |
US20120150544A1 (en) | Method and system for reconstructing speech from an input signal comprising whispers | |
JP2002014689A (en) | Method and device for improving understandability of digitally compressed speech | |
Trabelsi et al. | On the use of different feature extraction methods for linear and non linear kernels | |
CN1815552A (en) | Spectrum modelling and speech enhancement method based on line spectrum frequency and its inter-order differential parameters | |
CN110277087A (en) | Broadcast signal pre-judgment preprocessing method | |
CN108281150B (en) | Voice tone-changing voice-changing method based on differential glottal wave model | |
CN111489763B (en) | GMM model-based speaker recognition self-adaption method in complex environment | |
Katsir et al. | Evaluation of a speech bandwidth extension algorithm based on vocal tract shape estimation | |
Walliczek et al. | Sub-word unit based non-audible speech recognition using surface electromyography | |
KR20070045772A (en) | Apparatus for vocal-cord signal recognition and its method | |
CN114913844A (en) | Broadcast language identification method for pitch normalization reconstruction | |
Shuang et al. | A novel voice conversion system based on codebook mapping with phoneme-tied weighting | |
CN110033786B (en) | Gender judgment method, device, equipment and readable storage medium | |
Singh et al. | Features and techniques for speaker recognition | |
Garcia et al. | Oesophageal speech enhancement using poles stabilization and Kalman filtering | |
TWI746138B (en) | System for clarifying a dysarthria voice and method thereof | |
Sivaram et al. | Enhancement of dysarthric speech for developing an effective speech therapy tool | |
Ali et al. | Esophageal speech enhancement using excitation source synthesis and formant structure modification | |
Diener | Improving unit selection based EMG-to-speech conversion | |
KR101567566B1 (en) | System and Method for Statistical Speech Synthesis with Personalized Synthetic Voice |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20170524 |