CN106710604A - Formant enhancement apparatus and method for improving speech intelligibility - Google Patents
- Publication number
- CN106710604A CN106710604A CN201611118099.0A CN201611118099A CN106710604A CN 106710604 A CN106710604 A CN 106710604A CN 201611118099 A CN201611118099 A CN 201611118099A CN 106710604 A CN106710604 A CN 106710604A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention relates to speech enhancement and proposes a formant enhancement apparatus and method for improving speech intelligibility. Aimed at the pronunciation characteristics of patients with muscular atrophy, it enhances or restores the speaker's voice information while preserving the speaker's timbre. On the one hand, the apparatus and method can help patients with muscular atrophy train their pronunciation as a speech-production aid; on the other hand, they can also improve communication between patients and others. First, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to identify the main differences between the two. Second, a large set of speech feature parameters from normal speakers is used as a reference database, with the feature parameters extracted by the same method used for the patients' speech. Third, after the patient's speech feature parameters are extracted, a simple gender division is performed. Fourth, the patient's speech feature parameters are classified by a trained neural network. The apparatus and method are applicable to speech enhancement scenarios.
Description
Technical field
The present invention relates to speech enhancement and speech processing, and specifically to a formant enhancement method for improving the speech intelligibility of patients with muscular atrophy at the articulatory organs.
Background art
Muscular atrophy refers to a reduction in muscle volume caused by muscle fibers thinning or even disappearing as a result of malnutrition of the muscle. Its causes include neurogenic muscular atrophy, myogenic muscular atrophy, disuse muscular atrophy, and others. Muscular atrophy is also closely connected with the nervous system, and diseases of the spinal cord often lead to it, as do neurological conditions such as Parkinsonism, senile dementia, multiple sclerosis, and amyotrophic lateral sclerosis (ALS, the disease Stephen Hawking suffered from). All of these can affect the patient's pronunciation.
In addition, with the implementation of China's family planning policy, the elderly population is gradually increasing and the aging problem is becoming serious; the same is even more apparent in developed countries. With advancing age, the muscles of the elderly increasingly atrophy, and the muscles of the vocal tract gradually become harder to control. Speech intelligibility therefore declines, and communication with others becomes difficult.
Existing speech enhancement methods [1-2] are aimed at processing the speech of normal speakers corrupted by noise and similar interference, but this is not the same as the voice characteristics of patients with muscular atrophy. Because the vocal tract of such patients is impaired, the speech itself is defective; for example, the spectrum shows incomplete or missing formants. This kind of degradation is quite different from the damage noise does to speech, so the treatment must differ as well.
In addition, research has shown that when patients with muscular atrophy, neurodegenerative diseases, and similar conditions who cannot properly control the muscle groups of the articulatory organs undergo voicing rehabilitation training, hearing a clear voice that preserves the speaker's own timbre during training provides a substantial boost to the rehabilitation.
In speech processing the time domain is hard to interpret directly, so transforming the speech signal to the frequency domain and processing it there is the dominant approach. Among frequency-domain features, the formants reflect the speaker's vocal tract characteristics and timbre, so adjusting the formants is both central and effective.
[1] DST LLC. System for adaptive speech intelligibility processing [P]. Chinese patent CN102498482B, 2014-10-15.
[2] Samsung Electronics Co., Ltd. Method and apparatus for enhancing dialogue using formants [P]. Chinese patent CN1619646A, 2005-05-25.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention aims, for the pronunciation characteristics of patients with muscular atrophy, to enhance or restore the speaker's voice information while preserving the speaker's timbre. This can help patients with muscular atrophy train their voicing as a speech-production aid, and at the same time improve their communication with others. The technical solution adopted by the present invention is a formant enhancement method for improving speech intelligibility. First, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to identify the essential differences between the two; feature parameters are extracted under different spectral representations using one or more of the cepstrum method, linear predictive coding (LPC), mel-frequency cepstral coefficients (MFCC), or line spectrum pairs (LSP). Second, a large set of normal speakers' speech feature parameters is used as a reference database, with the features extracted by the same method used for the patients' speech. Third, after the patient's speech feature parameters are extracted, a simple gender division is performed. Fourth, the patient's speech feature parameters are classified by a trained neural network and compared with the entries of the database for the corresponding gender to find the closest speech segment, which serves as the reference for adjusting the patient's speech feature parameters. Finally, the adjusted speech information is transformed back to the time domain and synthesized into complete speech for output.
In the formant enhancement stage, the time-domain signal is first transformed to a frequency domain and frequency-domain feature parameters are extracted. The frequency domain here may be ordinary frequency, mel frequency (Mel-frequency), or the cepstral (quefrency) domain (Cepstrum).
The formant enhancement apparatus for improving speech intelligibility consists of a signal receiver, a preprocessor, a segmentation processor, a formant enhancement processor, a synthesizer, and a player, where:
The signal receiver picks up and stores the patient's voice signal; the recording contains both the speech signal and ambient noise.
The preprocessor applies spectral subtraction to the stored speech signal for noise reduction and then pre-emphasizes the denoised speech.
The segmentation processor uses the short-time average zero-crossing rate and short-time energy detection to divide the preprocessed signal into speech and non-speech segments, then segments it into words, characters, or phonemes; each segment is stored in order and output sequentially.
The formant enhancement processor extracts the formant parameters of the speech via LPC mel-cepstral coefficients, compares them against a computed gender threshold to perform the gender division, classifies them with the neural network trained for the corresponding gender, searches the pre-stored speech database of the corresponding phoneme for the closest formant parameters, adjusts the pending formant parameters against that standard, transforms the adjusted formant parameters back to the time domain for output, and adjusts the time-domain amplitude appropriately.
The synthesizer combines the processed speech segments in order.
The player plays the synthesized speech signal and also presents it graphically.
In one instantiation, the internal logic of the formant enhancement processor is:
The framer divides the speech signal into frames, with a frame length of 15-30 ms and a frame shift of 10 ms.
The LPC coefficient extractor extracts the LPC coefficients of each frame.
The cepstral coefficient converter converts the LPC coefficients to cepstral coefficients.
The mel-frequency converter applies a nonlinear transformation on the mel scale, converting the cepstral coefficients to LPC mel-cepstral coefficients.
The formant calculator computes the formants from the LPC mel-cepstral (LPCMCC) coefficients, taking the center frequencies and bandwidths of the first three formants as the formant parameters.
The gender classifier compares the formant center frequencies computed by the formant calculator against a threshold to perform the gender division; the two genders feed different phoneme classifiers.
The phoneme classifier is a neural network trained on normal speakers' LPC mel-cepstral coefficients; from the formant center frequencies and corresponding bandwidths produced by the formant calculator it determines the phoneme class, and the classifier routes the result to the corresponding formant comparator.
The formant comparator holds a database of normal speech for each phoneme. It looks up the formant parameters passed on by the gender classifier in the database for the corresponding phoneme and finds the entry with the smallest deviation in center frequency and bandwidth. Center frequency is the main criterion: with the deviation of the second formant frequency F2 kept minimal, the entry whose first formant frequency F1 deviates as little as possible is chosen, and finally the deviation of the third formant frequency F3 is kept as small as possible.
The formant enhancement filter takes the selected formant parameters from the phoneme library and the formant parameters computed by the formant calculator, and, using the selected library parameters as the reference, adjusts the bandwidth and amplitude of the computed formant parameters to produce the processed formant parameters.
The time-domain converter transforms the processed formant parameters into a time-domain signal and adjusts its amplitude.
Features and beneficial effects of the present invention:
For the pronunciation characteristics of patients with muscular atrophy, the present invention repairs their speech signal in both the frequency and time domains, improving the intelligibility and quality of the speech while preserving the speaker's timbre, and it can serve as an aid in the patient's speech rehabilitation training.
Brief description of the drawings:
Fig. 1 shows a formant enhancement method for speech whose intelligibility is lowered by muscular atrophy at the articulatory organs.
Fig. 2 is a block diagram of formant enhancement.
Fig. 3 is a block diagram of formant enhancement for improving intelligibility according to an embodiment of the present general inventive concept.
Fig. 4 is a block diagram of the preprocessor 320 of Fig. 3.
Fig. 5 is a block diagram of the segmentation processor 330 of Fig. 3.
Fig. 6 is a block diagram of the formant enhancement processor 340 of Fig. 3.
Fig. 7 is a block diagram of the synthesizer 350 of Fig. 3.
Fig. 8 is a block diagram of the player 360 of Fig. 3.
Specific embodiment
Most existing speech enhancement methods target speech recorded in noisy environments (channel noise, recording noise, and so on); they mainly improve intelligibility by algorithmically reducing the noise components and boosting the speech components. The main problem with the speech of patients with muscular atrophy, however, is that reduced control over the vocal tract and oral muscles leaves voiced formants missing and unvoiced sounds indistinct. Merely improving the SNR through noise reduction therefore cannot recover the speech information or raise intelligibility, because it is hard to tell whether the unclear portions are noise. Even when the SNR of a patient's speech is very high, intelligibility can remain low.
The present invention aims, for the pronunciation characteristics of patients with muscular atrophy, to enhance or restore the speaker's voice information while preserving the speaker's timbre. This can help the patient train voicing as a speech-production aid, and it can also improve communication with others.
Existing speech enhancement algorithms raise intelligibility by improving the SNR of the speech, which is of little help for signals whose SNR is already high but whose intelligibility is low because of the speaker's articulation problems.
The present invention analyzes the pronunciation characteristics and speech feature parameters of patients with muscular atrophy against those of normal speakers, and compares them with a database and a neural network trained on feature parameters extracted from a large amount of normal speech. The feature parameters of the patient's speech are then adjusted so that the intelligibility of the speech improves while the speaker's timbre is preserved. The specific implementation is as follows:
First, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to identify the essential differences between the two. This is done mainly in the frequency domain, and the feature parameters can be extracted under different spectral representations using the cepstrum method, the LPC method, MFCC, LSP, and so on. Because each method yields somewhat different parameters, the concrete choice also relies on the experimenter's judgment. One option is to give the parameters computed by each method a weight and then take the weighted sum.
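The weighting idea just described can be sketched as a normalized weighted sum over per-method estimates. The method names, example values, and weights below are illustrative assumptions, since the patent leaves the weights to the experimenter's judgment.

```python
import numpy as np

def combine_estimates(estimates, weights):
    """Combine per-method feature estimates by a normalized weighted sum.

    `estimates` maps method name -> parameter vector; `weights` maps
    method name -> an empirically chosen weight.
    """
    names = sorted(estimates)
    w = np.array([weights[n] for n in names], dtype=float)
    w /= w.sum()  # normalize so the weights sum to 1
    stacked = np.stack([np.asarray(estimates[n], dtype=float) for n in names])
    return stacked.T @ w

# Hypothetical F1 estimates (Hz) from three extraction methods
est = {"lpc": [710.0], "mfcc": [690.0], "cepstrum": [700.0]}
wts = {"lpc": 2.0, "mfcc": 1.0, "cepstrum": 1.0}
print(combine_estimates(est, wts))  # weighted toward the LPC estimate
```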
Second, a large set of normal speakers' speech feature parameters is used as a reference database; the feature extraction method here is the same as the one used for the patients' speech.
Third, after the patient's speech feature parameters are extracted, a simple gender division is performed. Fourth, the patient's speech feature parameters are classified by the trained neural network and compared with the entries in the database for the corresponding gender to obtain the closest speech segment, which serves as the reference for adjusting the patient's speech feature parameters. Finally, the adjusted speech information is transformed back to the time domain and synthesized into complete speech for output.
LPC method: linear predictive coding (LPC).
MFCC method: mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC).
LSP method: line spectrum pairs (Line Spectrum Pairs, LSP).
Cepstrum method: cepstrum analysis.
The present invention is further described below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, this is a formant enhancement method for speech whose intelligibility is lowered by muscular atrophy at the articulatory organs. The speech signal is first received through a microphone or similar device; next come preprocessing steps such as noise reduction and pre-emphasis, and the speech segments are extracted and divided. Formant enhancement is then performed, and finally the speech is synthesized and played, or the speech before and after formant enhancement is compared in a graphical display.
In the formant enhancement stage, the time-domain signal is first transformed to a frequency domain and frequency-domain feature parameters are extracted. The frequency domain here may be ordinary frequency, mel frequency (Mel-frequency), the cepstral (quefrency) domain (Cepstrum), and so on. Features can be extracted with the LPC method, the cepstrum method, MFCC, LSP, and so on. A preliminary gender division is first made on the obtained feature parameters, since female voices generally have higher frequencies than male voices. A neural network built by a learning algorithm then classifies which phoneme the speech in this fragment is. The result is compared with the normal-speaker database for that phoneme to find the closest speech features, which serve as the standard for adjusting the frequency-domain feature parameters of the signal being processed. Finally the signal is transformed back to the time domain, with appropriate amplitude adjustment to guarantee the speech quality.
Embodiments of the present general inventive concept will now be described in detail, examples of which are illustrated in the accompanying drawings, where like reference numerals refer to like parts throughout. The embodiments are described with reference to the drawings in order to explain the present general inventive concept.
Fig. 3 is the formant enhancing block diagram of the raising intelligibility of the embodiment according to present general inventive concept.
Referring to Fig. 3, the signal receiver 310 picks up and stores the patient's voice signal; the recording contains both the speech signal and ambient noise.
The preprocessor 320 applies spectral subtraction to the stored speech signal for noise reduction and then pre-emphasizes the denoised speech.
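A minimal sketch of these two preprocessing steps follows, assuming a Hann-windowed, half-overlapping magnitude-domain spectral subtraction with the noise spectrum estimated from a noise-only excerpt. The patent does not specify the exact subtraction rule, the spectral floor, or the pre-emphasis coefficient (0.97 is a common default), so all of those are assumptions.

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def spectral_subtraction(x, noise, n_fft=512, floor=0.01):
    """Frame-by-frame magnitude spectral subtraction with overlap-add.

    `noise` is a noise-only excerpt (e.g. leading silence) used to
    estimate the average noise magnitude spectrum.
    """
    hop = n_fft // 2
    win = np.hanning(n_fft)
    noise_mag = np.abs(np.fft.rfft(noise[:n_fft] * win))
    out = np.zeros(len(x))
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * win
        spec = np.fft.rfft(frame)
        # subtract the noise magnitude, keeping a small spectral floor
        mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n_fft)
        out[start:start + n_fft] += clean
    return out
```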
The segmentation processor 330 uses the short-time average zero-crossing rate and short-time energy detection to divide the preprocessed speech signal into speech and non-speech segments, then segments it into words, characters, or phonemes; each segment is stored in order and passed on to the subsequent processors.
The formant enhancement processor 340 extracts the formant parameters of the speech via LPC mel-cepstral coefficients (LPCMCC), compares them against the computed gender threshold to perform the gender division, classifies them with the neural network trained for the corresponding gender, searches the pre-stored speech database of the corresponding phoneme for the closest formant parameters, adjusts the pending formant parameters against that standard, transforms the adjusted formant parameters back to the time domain for output, and adjusts the time-domain amplitude appropriately.
The synthesizer 350 combines the processed speech segments in order.
The player 360 plays the synthesized speech signal and also presents it graphically.
Fig. 4 is the block diagram of the preprocessor 320 of Fig. 3.
The preprocessor 320 is divided into a denoiser 410 and a pre-emphasizer 420.
Fig. 5 is the block diagram of the segmentation processor 330 of Fig. 3.
The zero-crossing detector 510 performs the first-stage segmentation by analyzing the short-time average zero-crossing rate, separating speech from silent regions: if the zero-crossing rate drops suddenly below a certain threshold, the region is taken to be silent, and the stretch between it and the previous silent region is marked as one speech segment.
The energy detector 520 uses short-time energy to distinguish unvoiced from voiced sound, further subdividing the speech.
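The zero-crossing-rate and energy analysis used by detectors 510 and 520 can be sketched as below. The 16 kHz sampling rate, 20 ms frames, and the two thresholds are illustrative assumptions; the patent does not fix these values.

```python
import numpy as np

def short_time_features(x, frame_len=320, hop=160):
    """Per-frame zero-crossing rate and energy (20 ms frames at 16 kHz
    assumed), the two features used for coarse segmentation."""
    zcr, energy = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        signs = np.sign(frame)
        zcr.append(np.mean(np.abs(np.diff(signs)) > 0))  # crossing fraction
        energy.append(np.sum(frame ** 2))
    return np.array(zcr), np.array(energy)

def label_frames(zcr, energy, e_sil, z_unvoiced):
    """Crude labels: low energy -> silence; high ZCR -> unvoiced; else voiced."""
    labels = []
    for z, e in zip(zcr, energy):
        if e < e_sil:
            labels.append("silence")
        elif z > z_unvoiced:
            labels.append("unvoiced")
        else:
            labels.append("voiced")
    return labels
```

A 200 Hz tone followed by silence, for example, yields "voiced" frames followed by "silence" frames, which is exactly the boundary information the segmenter needs.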
Fig. 6 is the block diagram of the formant enhancement processor 340 of Fig. 3.
The framer 621 divides the speech signal into frames, with a frame length of 15-30 ms and a frame shift of 10 ms.
The LPC coefficient extractor 622 extracts the LPC coefficients of each frame.
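Framing and per-frame LPC analysis can be sketched with the standard autocorrelation method (Levinson-Durbin recursion). The sign convention a[0] = 1 is standard; the LPC order is left open by the patent, so the order in the check below is an assumption.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a signal into overlapping frames (e.g. 20 ms length, 10 ms hop)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def lpc(frame, order):
    """LPC by the autocorrelation method (Levinson-Durbin).

    Returns a with a[0] = 1, so A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p
    is the prediction-error filter.
    """
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :][: order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        lam = -acc / err                 # reflection coefficient
        a[1 : i + 1] += lam * a[i - 1 :: -1]
        err *= 1.0 - lam * lam
    return a
```

For a decaying exponential x[n] = 0.9^n, a first-order fit recovers a[1] ≈ -0.9, as expected for a one-pole predictor.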
The cepstral coefficient converter 623 converts the LPC coefficients to cepstral coefficients.
The mel-frequency converter 624 applies a nonlinear transformation on the mel scale, converting the cepstral coefficients to LPC mel-cepstral coefficients.
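The LPC-to-cepstrum step of converter 623 has a standard closed-form recursion, sketched below under the a[0] = 1 convention. The subsequent mel warp of converter 624 (commonly realized with an all-pass bilinear frequency transformation) is omitted here, so this covers only the cepstral conversion.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Standard LPC-to-cepstrum recursion for A(z) = 1 + sum a[k] z^-k:
    c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}   (a_n = 0 for n > p).
    Returns [c_1, ..., c_{n_ceps}]."""
    p = len(a) - 1
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k]
        c[n - 1] = -acc
    return c

# Single-pole sanity check: for A(z) = 1 - b z^-1, c_n = b^n / n exactly
# (here b = 0.5, so c_1 = 0.5, c_2 = 0.125, c_3 ≈ 0.04167)
print(lpc_to_cepstrum(np.array([1.0, -0.5]), 3))
```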
The formant calculator 625 computes the formants from the LPC mel-cepstral coefficients, taking the center frequencies and bandwidths of the first three formants as the formant parameters 640.
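One common way to realize a formant calculator is to read formant frequencies and bandwidths off the roots of the LPC polynomial, sketched below. The patent derives formants from LPCMCC coefficients rather than raw LPC roots, so treat this root-finding variant as an illustrative substitute, not the patented computation.

```python
import numpy as np

def formants_from_lpc(a, sr, n_formants=3):
    """Formant estimates from LPC polynomial roots.

    Each complex-conjugate root pair z gives a resonance at
    f = angle(z) * sr / (2*pi) with bandwidth bw = -(sr/pi) * ln|z|.
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-6]   # one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -(sr / np.pi) * np.log(np.abs(roots))
    order = np.argsort(freqs)
    return freqs[order][:n_formants], bws[order][:n_formants]

# Illustrative check: a single resonance planted at 700 Hz (sr = 8 kHz)
sr = 8000
z = 0.98 * np.exp(2j * np.pi * 700 / sr)
a = np.real(np.poly([z, np.conj(z)]))      # [1, -2 r cos(theta), r^2]
f, bw = formants_from_lpc(a, sr, n_formants=1)
```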
The gender classifier 626 compares the formant center frequencies computed by 625 against a threshold to perform the gender division; the two genders feed different phoneme classifiers.
The phoneme classifier 627 is a neural network trained on normal speakers' LPC MCC coefficients; from the formant center frequencies and corresponding bandwidths computed by 625 it determines the phoneme class, and the classifier routes the result to the corresponding comparator.
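The phoneme classifier's forward pass can be sketched as a one-hidden-layer network. The layer sizes here are arbitrary, and the weights would come from training on normal speakers' LPCMCC features, which is outside this sketch; with untrained (zero) weights the output is simply uniform over the classes.

```python
import numpy as np

def mlp_classify(x, W1, b1, W2, b2):
    """One-hidden-layer network: tanh hidden layer, softmax output.
    Returns a probability distribution over phoneme classes."""
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())      # numerically stabilized softmax
    return e / e.sum()

# 4 input features (e.g. formant frequencies/bandwidths), 8 hidden units,
# 3 hypothetical phoneme classes; zero weights give a uniform output
probs = mlp_classify(np.zeros(4), np.zeros((4, 8)), np.zeros(8),
                     np.zeros((8, 3)), np.zeros(3))
```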
The formant comparator 628 holds a database of normal speech for each phoneme. It looks up the formant parameters passed on by 626 in the database for the corresponding phoneme and finds the entry 650 with the smallest deviation in center frequency and bandwidth. Center frequency is the main criterion: with the deviation of the second formant frequency F2 kept minimal, the entry whose first formant frequency F1 deviates as little as possible is chosen, and finally the deviation of the third formant frequency F3 is kept as small as possible.
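The F2-first, then F1, then F3 selection rule can be sketched as a staged nearest-neighbor search. The near-tie tolerances and the example database entries below are assumptions for illustration; the patent states only the priority order.

```python
import numpy as np

def match_formants(query, database, tol=(50.0, 80.0)):
    """Staged match: keep entries with near-minimal |F2| deviation,
    narrow the candidates by |F1|, then pick the smallest |F3| deviation.
    Rows are (F1, F2, F3) in Hz; returns the winning row index."""
    db = np.asarray(database, dtype=float)
    d2 = np.abs(db[:, 1] - query[1])
    cand = np.flatnonzero(d2 <= d2.min() + tol[0])   # near-minimal F2 first
    d1 = np.abs(db[cand, 0] - query[0])
    cand = cand[d1 <= d1.min() + tol[1]]             # then near-minimal F1
    d3 = np.abs(db[cand, 2] - query[2])
    return int(cand[np.argmin(d3)])                  # finally minimal F3

# Hypothetical phoneme entries; the second row wins on the F3 tiebreak
db = [(700.0, 1200.0, 2600.0), (710.0, 1150.0, 2520.0), (300.0, 2300.0, 3000.0)]
best = match_formants((705.0, 1190.0, 2550.0), db)
```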
The formant enhancement filter 629 takes the selected formant parameters 650 from the phoneme library and the formant parameters 640 computed by 625, and adjusts the bandwidth and amplitude of 640 using 650 as the reference, yielding the processed formant parameters 660. Concretely, if a formant obtained from the speech is denoted f, it is compared with all formants f' in the database; the f' with the smallest deviation is chosen, and f is adjusted with that f' as the reference to obtain the processed value.
The time-domain converter 630 transforms 660 into the time-domain signal 670 and adjusts its amplitude.
Fig. 7 is the block diagram of the synthesizer 350 of Fig. 3.
The paragraph maker 710 progressively combines the per-frame signals into complete speech segments.
The speech synthesizer 720 joins the segments in order into complete speech.
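Reassembling overlapping frames into a segment is typically done by overlap-add, sketched below with the frame shift equal to the analysis hop. The patent does not name the exact reconstruction method, so treat this as one plausible realization; with a Hann analysis window at 50% overlap the overlapping halves sum to a constant gain.

```python
import numpy as np

def overlap_add(frames, hop):
    """Reassemble overlapping frames into one signal by overlap-add.

    `frames` is a 2-D array (n_frames, frame_len); consecutive frames
    are placed `hop` samples apart and summed where they overlap.
    """
    frame_len = frames.shape[1]
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out
```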
Fig. 8 is the block diagram of the player 360 of Fig. 3.
The view device 810 displays the time-domain speech signal before and after processing.
The speech player 820 plays the processed speech.
Claims (5)
1. it is a kind of improve the intelligibility of speech formant Enhancement Method, it is characterized in that, first contrast muscular atrophy patient pronunciation
Feature and normal person's pronunciation characteristic simultaneously obtain both essential differences, using Cepstrum Method, linear predictive coding LPC methods, Mel frequently
Characteristic parameter is extracted under different spectrums one or more in rate cepstrum coefficient MFCC or line spectrum pair LSP;Secondly, with a large amount of normal
People's speech characteristic parameter extracts the method for characteristic parameter and extracts muscular atrophy patient's speech characteristic parameter herein as reference library
Method be consistent;3rd, after the speech characteristic parameter of muscular atrophy patient is extracted, carry out simple sex division;The
Four, the neutral net that patient's speech characteristic parameter passes through to have trained is classified, and ratio is done with information in corresponding sex storehouse
Compared with, most close voice segments are obtained, and as reference, the speech characteristic parameter to muscular atrophy patient is adjusted;Finally,
Voice messaging after adjustment is returned into time domain and is synthesized, export into complete voice.
2. the formant Enhancement Method of the intelligibility of speech is improved as claimed in claim 1, it is characterized in that, in formant enhanced portion
Point, time-domain signal is transformed into frequency domain first, and extract frequency domain character parameter, referred to herein as frequency domain can be that frequency can also
It is mel-frequency Mel-frequency, cepstrum rate Cepstrum.
3. A formant enhancement apparatus for improving speech intelligibility, characterized in that it is composed of a signal receiving device, a pre-processor, a segmentation processor, a formant enhancement processor, a synthesizer, and a player; wherein:
The signal receiving device picks up and stores the patient's voice; the recording contains both the speech signal and ambient noise;
The pre-processor applies spectral-subtraction noise reduction to the stored voice signal and then performs pre-emphasis on the denoised speech;
The segmentation processor uses short-time average zero-crossing rate and short-time energy detection to divide the pre-processed voice signal into speech and non-speech segments, further segments it by word, character, or phoneme, and stores and outputs the segments in order;
The formant enhancement processor extracts the formant parameters of the speech via LPC Mel cepstrum coefficients, compares these formant parameters with a pre-computed gender-division threshold to perform gender division, classifies them after division with the fully trained neural network for the corresponding gender, finds the closest formant parameters in the pre-stored sound library for the corresponding phoneme after classification, adjusts the pending formant parameters with those formant parameters as the standard, transforms the adjusted formant parameters back to the time domain for output, and adjusts the time-domain amplitude appropriately;
The synthesizer synthesizes the processed voice signals in order;
The player plays the synthesized voice signal and presents it graphically.
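A minimal sketch of the pre-processor's two stages as named in the claim: magnitude spectral subtraction against a noise estimate, then first-order pre-emphasis. The parameter values (FFT size 512, Hann analysis window, spectral floor, alpha = 0.97) are assumptions for the sketch; the claim itself does not fix them.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def spectral_subtraction(x, noise, n_fft=512, floor=0.02):
    """Subtract an average noise magnitude spectrum from each frame's
    magnitude spectrum, keep the noisy phase, and overlap-add the frames
    back together (basic single-band spectral subtraction)."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    noise_mag = np.abs(np.fft.rfft(noise[:n_fft] * win))
    out = np.zeros(len(x))
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * win
        spec = np.fft.rfft(frame)
        mag = np.abs(spec) - noise_mag
        mag = np.maximum(mag, floor * np.abs(spec))       # spectral floor
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n_fft)
        out[start:start + n_fft] += clean                 # overlap-add
    return out
```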
4. The formant enhancement apparatus for improving speech intelligibility as claimed in claim 3, characterized in that, in one embodiment, the internal logic of the formant enhancement processor is:
The framing unit divides the voice signal into frames, with a frame length of 15-30 ms and a frame shift of 10 ms;
The LPC coefficient extractor extracts the LPC coefficients of each frame;
The cepstrum coefficient converter converts the LPC coefficients into cepstrum coefficients;
The Mel-frequency converter applies a nonlinear Mel-scale transformation to the cepstrum coefficients, converting them into LPC Mel spectral coefficients;
The formant calculator computes the formants from the LPC Mel cepstrum (LPCMCC) coefficients and takes the center frequencies and bandwidths of the first three formants as the formant parameters;
The gender classifier compares the formant center frequencies computed by the formant calculator with a threshold to perform gender division; the two genders enter different phoneme classifiers.
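The LPC-to-formant step can be illustrated with the classic root-finding recipe: fit LPC coefficients to a windowed frame via the Levinson-Durbin recursion, then read formant center frequencies and bandwidths from the angles and radii of the complex roots of the prediction polynomial. This is a standard textbook method, not necessarily the claim's exact LPCMCC-based computation; the LPC order and the pruning thresholds are illustrative.

```python
import numpy as np

def lpc(frame, order):
    """LPC by the autocorrelation method (Levinson-Durbin recursion);
    returns a with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    n = len(frame)
    r = np.correlate(frame, frame, "full")[n - 1 : n + order]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:] @ r[i - 1 : 0 : -1]) / err     # reflection coefficient
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def formants(frame, fs, order=8, n_formants=3):
    """Center frequencies and bandwidths (Hz) of up to n_formants
    resonances, read off the complex roots of the LPC polynomial."""
    a = lpc(frame * np.hamming(len(frame)), order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -fs / np.pi * np.log(np.abs(roots))    # -3 dB bandwidth from radius
    idx = np.argsort(freqs)
    freqs, bws = freqs[idx], bws[idx]
    keep = (freqs > 90) & (bws < 400)            # prune spurious wide/low roots
    return freqs[keep][:n_formants], bws[keep][:n_formants]
```

Practical analyzers prune and label roots more carefully (e.g., enforcing a minimum spacing between formants) before calling them F1-F3.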
5. The formant enhancement apparatus for improving speech intelligibility as claimed in claim 3, characterized in that the phoneme classifier is a neural network trained on the LPC Mel cepstrum coefficients of normal speech; the formant center frequencies and corresponding bandwidths computed by the formant calculator are passed through this classifier to obtain the phoneme class and then enter the corresponding formant comparator;
The formant comparator is a database of normal speech for each phoneme; the formant parameters passed on by the gender classifier are looked up in the corresponding phoneme database to find the formant parameters with the smallest deviation in center frequency and bandwidth, with the formant center frequencies as the main criterion: while keeping the deviation of the second formant frequency F2 minimal, a value with a small deviation of the first formant frequency F1 is sought, and finally the deviation of the third formant frequency F3 is kept as small as possible;
The formant enhancement filter takes the formant parameters selected from the phoneme library and the formant parameters computed by the formant calculator, and adjusts the bandwidth and amplitude of the latter with the selected parameters as the reference, yielding the processed formant parameters;
The time-domain converter transforms the processed formant parameters back into a time-domain signal and adjusts its amplitude.
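The comparator's stated priority order — keep the F2 deviation minimal, then look for a small F1 deviation, then minimize F3 — can be sketched as a staged nearest-neighbour search over formant triples. The stage tolerances below are invented for the illustration; the patent gives no concrete numbers.

```python
import numpy as np

def closest_entry(query, database, tol=(150.0, 100.0)):
    """Pick the index of the database entry ([F1, F2, F3] in Hz) that best
    matches `query`, prioritising F2, then F1, then F3: keep candidates
    whose F2 deviation is within tol[0] Hz of the best, narrow to those
    whose F1 deviation is within tol[1] Hz of the best, then take the
    smallest F3 deviation among the survivors."""
    db = np.asarray(database, dtype=float)
    d2 = np.abs(db[:, 1] - query[1])
    cand = np.flatnonzero(d2 <= d2.min() + tol[0])   # stage 1: F2
    d1 = np.abs(db[cand, 0] - query[0])
    cand = cand[d1 <= d1.min() + tol[1]]             # stage 2: F1
    d3 = np.abs(db[cand, 2] - query[2])
    return int(cand[np.argmin(d3)])                  # stage 3: F3
```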
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611118099.0A CN106710604A (en) | 2016-12-07 | 2016-12-07 | Formant enhancement apparatus and method for improving speech intelligibility |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106710604A true CN106710604A (en) | 2017-05-24 |
Family
ID=58936430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611118099.0A Pending CN106710604A (en) | 2016-12-07 | 2016-12-07 | Formant enhancement apparatus and method for improving speech intelligibility |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106710604A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102498482A (en) * | 2009-09-14 | 2012-06-13 | Srs实验室有限公司 | System for adaptive voice intelligibility processing |
CN102610236A (en) * | 2012-02-29 | 2012-07-25 | 山东大学 | Method for improving voice quality of throat microphone |
CN105513597A (en) * | 2015-12-30 | 2016-04-20 | 百度在线网络技术(北京)有限公司 | Voiceprint authentication processing method and apparatus |
History
- 2016-12-07: Application CN201611118099.0A filed; published as CN106710604A (en); legal status: pending
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108899052A (en) * | 2018-07-10 | 2018-11-27 | 南京邮电大学 | A kind of Parkinson's sound enhancement method based on mostly with spectrum-subtraction |
CN108899052B (en) * | 2018-07-10 | 2020-12-01 | 南京邮电大学 | Parkinson speech enhancement method based on multi-band spectral subtraction |
CN109215635A (en) * | 2018-10-25 | 2019-01-15 | 武汉大学 | Broadband voice spectral tilt degree characteristic parameter method for reconstructing for speech intelligibility enhancing |
CN109346058A (en) * | 2018-11-29 | 2019-02-15 | 西安交通大学 | A kind of speech acoustics feature expansion system |
CN110070894B (en) * | 2019-03-26 | 2021-08-03 | 天津大学 | Improved method for identifying multiple pathological unit tones |
CN110070894A (en) * | 2019-03-26 | 2019-07-30 | 天津大学 | A kind of improved multiple pathology unit voice recognition methods |
CN110164454A (en) * | 2019-05-24 | 2019-08-23 | 广州国音智能科技有限公司 | A kind of audio identity method of discrimination and device based on resonance peak deviation |
CN110164454B (en) * | 2019-05-24 | 2021-08-24 | 广州国音智能科技有限公司 | Formant deviation-based audio identity discrimination method and device |
CN110604568A (en) * | 2019-09-29 | 2019-12-24 | 三峡大学 | System and method for detecting singing tone in air street |
CN110604568B (en) * | 2019-09-29 | 2022-01-04 | 三峡大学 | System and method for detecting singing tone in air street |
CN111108552A (en) * | 2019-12-24 | 2020-05-05 | 广州国音智能科技有限公司 | Voiceprint identity identification method and related device |
CN112687277B (en) * | 2021-03-15 | 2021-06-18 | 北京远鉴信息技术有限公司 | Method and device for determining voice formant, electronic equipment and readable storage medium |
CN112687277A (en) * | 2021-03-15 | 2021-04-20 | 北京远鉴信息技术有限公司 | Method and device for determining voice formant, electronic equipment and readable storage medium |
CN112802489A (en) * | 2021-04-09 | 2021-05-14 | 广州健抿科技有限公司 | Automatic call voice adjusting system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106710604A (en) | Formant enhancement apparatus and method for improving speech intelligibility | |
CN103928023B (en) | Speech assessment method and system | |
US8036891B2 (en) | Methods of identification using voice sound analysis | |
CN102063899B (en) | Method for voice conversion under non-parallel text conditions | |
CN111462769B (en) | End-to-end accent conversion method | |
US20120150544A1 (en) | Method and system for reconstructing speech from an input signal comprising whispers | |
JP2002014689A (en) | Method and device for improving understandability of digitally compressed speech | |
Trabelsi et al. | On the use of different feature extraction methods for linear and non linear kernels | |
CN1815552A (en) | Spectrum modelling and speech enhancement method based on line spectrum frequency and its inter-order differential parameters | |
CN110277087A (en) | Broadcast signal pre-judgment preprocessing method | |
CN108281150B (en) | Voice tone-changing voice-changing method based on differential glottal wave model | |
CN111489763B (en) | GMM model-based speaker recognition self-adaption method in complex environment | |
Katsir et al. | Evaluation of a speech bandwidth extension algorithm based on vocal tract shape estimation | |
Walliczek et al. | Sub-word unit based non-audible speech recognition using surface electromyography | |
KR20070045772A (en) | Apparatus for vocal-cord signal recognition and its method | |
CN114913844A (en) | Broadcast language identification method for pitch normalization reconstruction | |
Shuang et al. | A novel voice conversion system based on codebook mapping with phoneme-tied weighting | |
CN110033786B (en) | Gender judgment method, device, equipment and readable storage medium | |
Singh et al. | Features and techniques for speaker recognition | |
Garcia et al. | Oesophageal speech enhancement using poles stabilization and Kalman filtering | |
TWI746138B (en) | System for clarifying a dysarthria voice and method thereof | |
Sivaram et al. | Enhancement of dysarthric speech for developing an effective speech therapy tool | |
Ali et al. | Esophageal speech enhancement using excitation source synthesis and formant structure modification | |
Diener | Improving unit selection based EMG-to-speech conversion | |
KR101567566B1 (en) | System and Method for Statistical Speech Synthesis with Personalized Synthetic Voice |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20170524 |