CN106710604A - Formant enhancement apparatus and method for improving speech intelligibility - Google Patents

Formant enhancement apparatus and method for improving speech intelligibility

Info

Publication number
CN106710604A
CN106710604A CN201611118099.0A
Authority
CN
China
Prior art keywords
formant
parameter
speech
frequency
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611118099.0A
Other languages
Chinese (zh)
Inventor
薛玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201611118099.0A priority Critical patent/CN106710604A/en
Publication of CN106710604A publication Critical patent/CN106710604A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Prostheses (AREA)

Abstract

The invention relates to speech enhancement and proposes a formant enhancement apparatus and method for improving speech intelligibility. Its purpose is to enhance or recover the speech information of a speaker with muscular atrophy while preserving that speaker's timbre, targeting the pronunciation characteristics of such patients. On the one hand, the apparatus and method can assist patients with muscular atrophy in pronunciation training by providing speech-production support; on the other hand, they can improve communication between the patient and others. First, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to identify the principal differences between the two. Second, a large set of speech feature parameters from normal speakers is used as a reference database, with the same feature-extraction method applied to the reference speech and to the patient's speech. Third, after the patient's speech feature parameters are extracted, a simple gender division is performed. Fourth, the patient's speech feature parameters are classified by a trained neural network. The apparatus and method are intended for speech enhancement applications.

Description

Formant enhancement apparatus and method for improving speech intelligibility
Technical field
The present invention relates to speech enhancement and speech processing, and specifically to a formant enhancement method for improving the speech intelligibility of patients with muscular atrophy of the articulatory organs.
Background art
Muscular atrophy refers to a reduction in muscle volume, or even the disappearance of muscle, caused by the thinning of muscle fibres due to malnutrition or other factors. Its causes include neurogenic muscular atrophy, myogenic muscular atrophy, disuse atrophy and others. It is also closely related to the nervous system: diseases of the spinal cord frequently lead to muscular atrophy, as do conditions such as Parkinson's disease, senile dementia, multiple sclerosis and amyotrophic lateral sclerosis (ALS, the disease from which Stephen Hawking suffered). As a result, the patient's pronunciation is affected.
In addition, with the implementation of the domestic family planning policy, the elderly population is growing and population ageing has become a serious problem; this is even more evident in developed countries. With advancing age, muscles atrophy progressively, and the muscles of the vocal tract become harder to control. Speech clarity therefore declines, intelligibility drops, and communicating with others becomes difficult.
Existing speech enhancement methods [1-2] are designed to process the speech of normal speakers corrupted by noise and similar interference, which is not the same problem as the speech of patients with muscular atrophy. Because the vocal tract of such a patient is impaired, the speech itself is defective: for example, the spectrum shows incomplete or missing formants. This kind of degradation is quite different from the damage caused by additive noise, so the treatment must also differ.
Furthermore, studies have shown that when people with muscular atrophy or neurodegenerative diseases, who cannot properly control the muscles involved in articulation, undergo voice rehabilitation training, hearing clear speech that preserves their own timbre during the training provides substantial assistance to the rehabilitation.
In speech processing, the time domain is difficult to interpret directly, so the dominant approach is to transform the speech signal into the frequency domain and process it there. The formants reflect the speaker's vocal-tract characteristics and timbre, so adjusting the formants is both central and effective.
[1] DTS LLC. System for adaptive voice intelligibility processing [P]. Chinese patent 102498482B, 2014-10-15.
[2] Samsung Electronics Co., Ltd. Method and apparatus for enhancing dialog using formants [P]. Chinese patent 1619646A, 2005-05-25.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention aims, for the pronunciation characteristics of patients with muscular atrophy, to enhance or recover the speaker's speech information while preserving the speaker's timbre. This can, on the one hand, help patients with muscular atrophy train their pronunciation and provide speech-production assistance and, on the other hand, improve their communication with others. The technical solution adopted by the present invention is a formant enhancement method for improving speech intelligibility. First, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to identify the essential differences between the two, and feature parameters are extracted over different spectral representations using one or more of the cepstrum method, linear predictive coding (LPC), mel-frequency cepstral coefficients (MFCC) and line spectrum pairs (LSP). Second, a large set of normal speakers' speech feature parameters is used as a reference library; the method used to extract these parameters is kept consistent with the method used to extract the patient's speech feature parameters. Third, after the patient's speech feature parameters are extracted, a simple gender division is performed. Fourth, the patient's speech feature parameters are classified by a trained neural network and compared with the entries of the corresponding gender library to find the closest speech segment, which is then used as the reference for adjusting the patient's speech feature parameters. Finally, the adjusted speech information is transformed back to the time domain and synthesized into complete speech output.
In the formant enhancement part, the time-domain signal is first transformed into the frequency domain and frequency-domain feature parameters are extracted; 'frequency domain' here may mean the linear frequency axis, the mel-frequency axis, or the cepstral (quefrency) axis.
A formant enhancement apparatus for improving speech intelligibility consists of a signal receiver, a preprocessor, a segment processor, a formant enhancement processor, a synthesizer and a player; wherein:
The signal receiver picks up the patient's speech signal and stores it; the recording contains both the speech signal and background noise;
The preprocessor applies spectral-subtraction noise reduction to the stored speech signal and then pre-emphasizes the denoised speech;
The segment processor divides the pre-processed speech signal into speech and non-speech segments using the short-time average zero-crossing rate and short-time energy, further segments it into individual words, characters or phonemes, stores each segment in turn, and outputs the segments in order;
The formant enhancement processor extracts the formant parameters of the speech via LPC mel-cepstral coefficients, compares these formant parameters against a computed gender-division threshold to perform gender division, classifies them with a neural network trained for the corresponding gender, searches the pre-stored speech library of the corresponding phoneme for the closest formant parameters, adjusts the formant parameters to be processed with these reference parameters as the standard, transforms the adjusted formant parameters back to the time domain for output, and adjusts the time-domain amplitude appropriately;
The synthesizer combines the processed speech signals in order;
The player plays back the synthesized speech signal and presents it graphically.
In one specific example, the internal logic of the formant enhancement processor is as follows:
The framing unit divides the speech signal into frames, with a frame length of 15-30 ms and a frame shift of 10 ms;
The LPC coefficient extractor extracts the LPC coefficients of each frame;
The cepstral coefficient converter converts the LPC coefficients into cepstral coefficients;
The mel-frequency converter applies a nonlinear transformation to the cepstral coefficients on the mel scale, converting them into LPC mel-cepstral coefficients;
The formant calculator computes the formants from the LPC mel-cepstral (LPCMCC) coefficients and takes the center frequencies and bandwidths of the first three formants as the formant parameters;
The gender classifier compares the formant center frequencies computed by the formant calculator with a threshold to perform gender division, and the two genders are routed to different phoneme classifiers;
The phoneme classifier is a neural network trained on normal speakers' LPC mel-cepstral coefficients; from the formant center frequencies and corresponding bandwidths computed by the formant calculator it determines the phoneme class, which selects the formant comparator to be used;
The formant comparator holds a database of normal speech for each phoneme; the formant parameters passed on by the gender classifier are matched against the corresponding phoneme database to find the entry with the smallest deviation in center frequency and bandwidth, the center frequencies being the main criterion: with the deviation of the second formant frequency F2 kept minimal, the smallest achievable deviation of the first formant frequency F1 is sought, and finally the deviation of the third formant frequency F3 is kept as small as possible;
The formant enhancement filter takes the selected formant parameters from the phoneme library and the formant parameters computed by the formant calculator, and adjusts the bandwidth and amplitude of the computed formant parameters with the selected reference parameters as the standard, yielding the processed formant parameters;
The time-domain converter transforms the processed formant parameters back into a time-domain signal and adjusts its amplitude.
Features and advantages of the invention:
For the pronunciation characteristics of patients with muscular atrophy, the present invention repairs the speech signal in both the frequency domain and the time domain, improving the intelligibility and quality of the speech while preserving the speaker's timbre, and it can serve as an auxiliary means for the patient's voice rehabilitation training.
Brief description of the drawings:
Fig. 1 shows a formant enhancement method for speech whose intelligibility is reduced by muscular atrophy of the articulatory organs.
Fig. 2 is a block diagram of the formant enhancement.
Fig. 3 is a block diagram of the intelligibility-improving formant enhancement according to an embodiment of the present general inventive concept.
Fig. 4 is the block diagram of the preprocessor 320 of Fig. 3.
Fig. 5 is the block diagram of the segment processor 330 of Fig. 3.
Fig. 6 is the block diagram of the formant enhancing processor 340 of Fig. 3.
Fig. 7 is the block diagram of the synthesizer 350 of Fig. 3.
Fig. 8 is the block diagram of the player 360 of Fig. 3.
Detailed description of the embodiments
Most existing speech enhancement methods target speech recorded in noisy conditions (channel noise, pick-up noise and so on); they improve intelligibility mainly by using algorithms to reduce the noise components and strengthen the speech components. The main problem with the speech of patients with muscular atrophy, however, is the reduced control of the vocal-tract and oral muscles, which leads to missing voiced-sound formants, weak unvoiced sounds and similar defects. Raising the SNR through noise reduction alone therefore cannot restore the speech information or improve intelligibility, because it is difficult to tell whether the unclear parts are noise; intelligibility can be low even when the SNR of the patient's speech is high.
The present invention aims, for the pronunciation characteristics of patients with muscular atrophy, to enhance or recover the speaker's speech information while preserving the speaker's timbre. On the one hand this can help patients with muscular atrophy train their pronunciation and assist their speech production; on the other hand it can improve their communication with others.
Existing speech enhancement algorithms raise intelligibility by improving the SNR of the speech, which is of little help for speech whose SNR is already high but whose intelligibility is low because of the speaker's articulation problems.
The present invention analyzes the pronunciation characteristics and speech feature parameters of patients with muscular atrophy and of normal speakers, compares them against a feature library built from a large amount of normal speech and against a neural network trained on those feature parameters, and adjusts the feature parameters of the patient's speech so that intelligibility is improved while the speaker's timbre is preserved. The specific implementation is as follows:
First, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to obtain the essential differences between the two. This is done mainly in the frequency domain; the feature parameters can be extracted over different spectral representations using the cepstrum method, the LPC method, MFCC, LSP and so on. Because each method yields somewhat different parameters, the specific choice also relies on the experimenter's experience; one option is to give the parameters computed by each method a weight and then take the weighted sum, as sketched below.
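A minimal sketch of such a weighted combination, assuming each extraction method yields a fixed-length feature vector and that the weights are set empirically; the method names, dimensions and weight values below are illustrative only:

    import numpy as np

    def combine_features(features, weights):
        """Weighted sum of feature vectors produced by several extraction methods.
        features: dict mapping method name -> 1-D numpy array (all the same length)
        weights:  dict mapping method name -> scalar weight (chosen empirically)
        """
        names = sorted(features)
        stacked = np.stack([features[n] for n in names])      # (n_methods, dim)
        w = np.array([weights[n] for n in names])[:, None]    # (n_methods, 1)
        return (w * stacked).sum(axis=0) / w.sum()            # normalized weighted sum

    # illustrative usage with made-up 12-dimensional vectors
    feats = {"lpc": np.random.randn(12), "mfcc": np.random.randn(12), "lsp": np.random.randn(12)}
    combined = combine_features(feats, {"lpc": 0.5, "mfcc": 0.3, "lsp": 0.2})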
Second, a large set of normal speakers' speech feature parameters serves as the reference library; the method used to extract these feature parameters is the same as the method used to extract the patient's speech feature parameters.
Third, after the patient's speech feature parameters are extracted, a simple gender division is carried out. Fourth, the patient's speech feature parameters are classified by the trained neural network and compared with the entries of the corresponding gender library to obtain the closest speech segment, which serves as the reference for adjusting the patient's speech feature parameters. Finally, the adjusted speech information is transformed back to the time domain and synthesized into complete speech output.
LPC method: linear predictive coding (LPC).
MFCC method: mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC).
LSP method: line spectrum pairs (Line Spectrum Pairs, LSP).
Cepstrum method: cepstral analysis.
The present invention is described further below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, a formant enhancement method is provided for speech whose intelligibility is reduced by muscular atrophy of the articulatory organs. The speech signal is first received through a microphone or the like, then pre-processed (noise reduction, pre-emphasis), and the speech segments are extracted and divided. Formant enhancement is then performed; finally the speech is synthesized and played back, or the speech before and after formant enhancement is compared in a graphical display.
In the formant enhancement part, the time-domain signal is first transformed into the frequency domain and frequency-domain feature parameters are extracted. 'Frequency domain' here may mean the linear frequency axis, the mel-frequency axis, or the cepstral (quefrency) axis, and the features may be extracted with the LPC, cepstrum, MFCC or LSP methods. The obtained feature parameters are first used for a preliminary gender division, since female speech generally has higher frequencies than male speech. A neural network built by a learning algorithm then classifies which phoneme the speech in this fragment belongs to. The result is compared with the normal-speaker speech library for that phoneme to find the closest speech features, which serve as the standard for adjusting the frequency-domain feature parameters of the signal being processed. The signal is finally transformed back to the time domain, and the amplitude is adjusted appropriately to ensure the quality of the speech.
Embodiments of the present general inventive concept will now be described in detail, examples of which are illustrated in the accompanying drawings, in which like reference numerals denote like parts throughout. The embodiments are described below with reference to the drawings in order to explain the present general inventive concept.
Fig. 3 is a block diagram of the intelligibility-improving formant enhancement according to an embodiment of the present general inventive concept.
Referring to Fig. 3, the signal receiver 310 picks up the patient's speech signal and stores it; the recording contains both the speech signal and background noise.
The preprocessor 320 applies spectral-subtraction noise reduction to the stored speech signal and then pre-emphasizes the denoised speech.
The segment processor 330 divides the pre-processed speech signal into speech and non-speech segments using the short-time average zero-crossing rate and short-time energy, further segments it into individual words, characters or phonemes, stores each segment in turn, and passes the segments sequentially to the subsequent processors.
The formant enhancement processor 340 extracts the formant parameters of the speech via LPC mel-cepstral coefficients (LPCMCC), compares these formant parameters against a computed gender-division threshold to perform gender division, classifies them with the neural network trained for the corresponding gender, searches the pre-stored speech library of the corresponding phoneme for the closest formant parameters, adjusts the formant parameters to be processed with these reference parameters as the standard, transforms the adjusted formant parameters back to the time domain for output, and adjusts the time-domain amplitude appropriately.
The synthesizer 350 combines the processed speech signals in order.
The player 360 plays back the synthesized speech signal and presents it graphically.
Fig. 4 is the block diagram of the preprocessor 320 of Fig. 3.
The preprocessor 320 is divided into a denoiser 410 and a pre-emphasizer 420.
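As an illustration of these two blocks, a minimal Python sketch of spectral-subtraction denoising followed by pre-emphasis; the STFT size, the assumption that the first 0.25 s of the recording is noise-only, the spectral floor and the pre-emphasis coefficient 0.97 are conventional illustrative choices, not values taken from this disclosure:

    import numpy as np
    from scipy.signal import stft, istft, lfilter

    def spectral_subtraction(x, fs, noise_seconds=0.25, floor=0.02):
        """Magnitude spectral subtraction; the noise spectrum is estimated from
        the (assumed speech-free) leading portion of the recording."""
        f, t, X = stft(x, fs=fs, nperseg=512, noverlap=384)    # 128-sample hop
        mag, phase = np.abs(X), np.angle(X)
        noise_frames = max(1, int(noise_seconds * fs / 128))
        noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
        clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)   # spectral floor limits musical noise
        _, y = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512, noverlap=384)
        return y[:len(x)]

    def pre_emphasis(x, alpha=0.97):
        """y[n] = x[n] - alpha * x[n-1]: boosts high frequencies before analysis."""
        return lfilter([1.0, -alpha], [1.0], x)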
Fig. 5 is the block diagram of the segment processor 330 of Fig. 3.
The zero-crossing detector 510 performs a first-stage segmentation based on the short-time average zero-crossing rate, separating speech from silent regions: if the zero-crossing rate drops suddenly below a certain threshold, the region is taken to be silent, and the signal between the previous silent region and this point is divided off as one speech segment.
The energy detector 520 uses the short-time energy to distinguish unvoiced from voiced sounds and further subdivide the speech.
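A minimal sketch of the two detectors: short-time energy and zero-crossing rate are computed per frame and used to assign rough silence / unvoiced / voiced labels; the frame sizes and thresholds are illustrative assumptions:

    import numpy as np

    def frame_signal(x, frame_len, hop):
        n = 1 + max(0, (len(x) - frame_len) // hop)
        return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

    def short_time_features(x, fs, frame_ms=25, hop_ms=10):
        frames = frame_signal(x, int(fs * frame_ms / 1000), int(fs * hop_ms / 1000))
        energy = (frames ** 2).sum(axis=1)                                  # short-time energy
        zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)   # zero-crossing rate
        return energy, zcr

    def label_frames(energy, zcr, e_thresh=None, z_thresh=0.3):
        """Rough frame labels: 0 = silence, 1 = unvoiced, 2 = voiced.
        The thresholds are illustrative; in practice they are tuned."""
        if e_thresh is None:
            e_thresh = 0.1 * np.median(energy) if energy.size else 0.0
        labels = np.zeros(len(energy), dtype=int)
        labels[energy >= e_thresh] = 2                        # high energy -> voiced
        labels[(energy < e_thresh) & (zcr > z_thresh)] = 1    # low energy, high ZCR -> unvoiced
        return labels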
Fig. 6 is the block diagram of the formant enhancing processor 340 of Fig. 3.
The framing unit 621 divides the speech signal into frames, with a frame length of 15-30 ms and a frame shift of 10 ms.
The LPC coefficient extractor 622 extracts the LPC coefficients of each frame.
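A sketch of the framing unit 621 and the LPC coefficient extractor 622, using a 25 ms frame length (within the stated 15-30 ms range), a 10 ms shift and the autocorrelation method with a Levinson-Durbin recursion; the analysis order 12 is an assumption:

    import numpy as np

    def frames(x, fs, frame_ms=25, hop_ms=10):
        flen, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
        n = 1 + max(0, (len(x) - flen) // hop)
        return np.stack([x[i * hop:i * hop + flen] for i in range(n)])

    def lpc_autocorr(frame, order=12):
        """LPC coefficients a[0..p-1] of the predictor x[n] ~ sum_k a[k] * x[n-1-k],
        obtained by the autocorrelation method and a Levinson-Durbin recursion."""
        w = frame * np.hamming(len(frame))
        r = np.correlate(w, w, mode="full")[len(w) - 1:len(w) + order]   # lags 0..order
        a = np.zeros(order)
        err = r[0] if r[0] > 0 else 1e-12
        for i in range(order):
            k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err   # reflection coefficient
            a_prev = a[:i].copy()
            a[i] = k
            a[:i] = a_prev - k * a_prev[::-1]
            err *= 1.0 - k * k
        return a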
The cepstral coefficient converter 623 converts the LPC coefficients into cepstral coefficients.
The mel-frequency converter 624 applies a nonlinear transformation to the cepstral coefficients on the mel scale, converting them into LPC mel-cepstral coefficients.
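A sketch of the cepstral coefficient converter 623 and the mel-frequency converter 624: the LPC-to-cepstrum recursion for the all-pole model, followed by the standard all-pass frequency-warping recursion (as implemented, for example, in SPTK's freqt); the warping factor alpha, roughly 0.31 at 8 kHz or 0.42 at 16 kHz, is a conventional choice rather than a value from this disclosure:

    import numpy as np

    def lpc_to_cepstrum(a, n_ceps):
        """Cepstral coefficients of the all-pole model 1/(1 - sum_k a_k z^-k)
        (same predictor-coefficient convention as the LPC sketch above);
        c[0], the log-gain term, is left at 0 here."""
        p = len(a)
        c = np.zeros(n_ceps + 1)
        for n in range(1, n_ceps + 1):
            s = a[n - 1] if n <= p else 0.0
            for k in range(1, n):
                if 1 <= n - k <= p:
                    s += (k / n) * c[k] * a[n - k - 1]
            c[n] = s
        return c

    def freqt(c, order_out, alpha):
        """All-pass frequency warping of a cepstrum; with a suitable alpha the
        linear frequency axis is mapped approximately onto the mel scale."""
        d = np.zeros(order_out + 1)
        for i in range(len(c) - 1, -1, -1):      # feed the input cepstrum highest index first
            prev = d.copy()
            d[0] = c[i] + alpha * prev[0]
            if order_out >= 1:
                d[1] = (1.0 - alpha ** 2) * prev[0] + alpha * prev[1]
            for m in range(2, order_out + 1):
                d[m] = prev[m - 1] + alpha * (prev[m] - d[m - 1])
        return d

    # e.g. 12 LPC mel-cepstral coefficients from a 12th-order LPC vector a:
    # lpcmcc = freqt(lpc_to_cepstrum(a, 20), 12, alpha=0.42)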
The formant calculator 625 computes the formants from the LPC mel-cepstral coefficients and takes the center frequencies and bandwidths of the first three formants as the formant parameters 640.
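The disclosure obtains the formant parameters from the LPC-derived representation; one common concrete way to compute center frequencies and bandwidths is root-finding on the LPC polynomial itself, sketched below, with the 90 Hz and 400 Hz pruning thresholds as assumptions:

    import numpy as np

    def formants_from_lpc(a, fs, n_formants=3, max_bw=400.0):
        """Formant center frequencies and 3-dB bandwidths (Hz) estimated from the
        roots of the prediction polynomial A(z) = 1 - sum_k a_k z^-k."""
        roots = np.roots(np.concatenate(([1.0], -a)))
        roots = roots[np.imag(roots) > 0]             # keep one root of each conjugate pair
        freqs = np.angle(roots) * fs / (2 * np.pi)
        bws = -np.log(np.abs(roots)) * fs / np.pi
        order = np.argsort(freqs)
        freqs, bws = freqs[order], bws[order]
        keep = (freqs > 90.0) & (bws < max_bw)        # drop near-DC and very broad poles
        return freqs[keep][:n_formants], bws[keep][:n_formants]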
The gender classifier 626 compares the formant center frequencies computed by 625 with a threshold to perform gender division; the two genders are routed to different phoneme classifiers.
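A minimal sketch of such a threshold test; the use of the mean formant center frequency and the 1600 Hz value are purely illustrative assumptions, not values from this disclosure:

    import numpy as np

    def classify_gender(formant_freqs_hz, threshold_hz=1600.0):
        """Coarse male/female split on the mean formant center frequency of a
        segment; shorter (typically female) vocal tracts place formants higher."""
        return "female" if np.mean(formant_freqs_hz) > threshold_hz else "male"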
The phoneme classifier 627 is a neural network trained on normal speakers' LPC mel-cepstral coefficients; from the formant center frequencies and corresponding bandwidths computed by 625 it determines the phoneme class, which selects the comparator to be used.
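The network architecture is not specified in the disclosure; as one possible realization, a small multi-layer perceptron trained on frame-level feature vectors from normal speakers (scikit-learn is used only for illustration, and the data shapes and labels are placeholders):

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # placeholders for (n_frames, n_features) LPC mel-cepstral / formant feature
    # vectors from normal speakers and their per-frame phoneme labels
    X_train = np.random.randn(200, 13)
    y_train = np.random.choice(["a", "i", "u", "s", "t"], size=200)

    phoneme_clf = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
    )
    phoneme_clf.fit(X_train, y_train)

    # at run time: predict the phoneme class of one patient frame
    predicted_phoneme = phoneme_clf.predict(np.random.randn(1, 13))[0]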
The formant comparator 628 holds a database of normal speech for each phoneme. The formant parameters passed on by 626 are matched against the corresponding phoneme database to find the formant parameters 650 with the smallest deviation in center frequency and bandwidth. The center frequencies are the main criterion: with the deviation of the second formant frequency F2 kept minimal, the smallest achievable deviation of the first formant frequency F1 is sought, and finally the deviation of the third formant frequency F3 is kept as small as possible.
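A sketch of this prioritized match: among the stored entries for the predicted phoneme, candidates are first narrowed by F2 deviation, then by F1, then by F3; the tolerance used to keep near-minimal candidates at each stage is an assumption:

    import numpy as np

    def select_reference_formants(query, database, tol=1.05):
        """query: (F1, F2, F3) of the patient frame; database: (n_entries, 3) array
        of normal-speaker formant center frequencies for the same phoneme.
        Returns the entry whose F2 deviation is (near-)minimal, ties broken by
        F1 deviation and then by F3 deviation."""
        db = np.asarray(database, dtype=float)
        for axis in (1, 0, 2):                         # priority: F2, then F1, then F3
            dev = np.abs(db[:, axis] - query[axis])
            db = db[dev <= tol * (dev.min() + 1e-9)]   # keep near-minimal candidates
            if len(db) == 1:
                break
        return db[0]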
The formant enhancement filter 629 takes the selected formant parameters 650 from the phoneme library and the formant parameters 640 computed by 625, and adjusts the bandwidth and amplitude of the formants in 640 with 650 as the standard, yielding the processed formant parameters 660. In other words, if a formant obtained from the speech is denoted f, it is compared with all formants f' in the speech library; the f' with the smallest deviation is selected, and f is adjusted with that f' as the reference to obtain the processed formant.
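The disclosure does not fix the filter structure; one possible realization of the adjustment is a second-order resonator placed at the formant center frequency, with its pole radius set from the (typically narrower) reference bandwidth, mixed back into the frame; the gain and mixing scheme below are assumptions:

    import numpy as np
    from scipy.signal import lfilter

    def formant_boost(frame, fs, center_hz, target_bw_hz, gain_db=3.0):
        """Sharpen and boost one formant by adding a resonator-filtered copy of
        the frame; the pole radius follows r = exp(-pi * B / fs) for bandwidth B."""
        r = np.exp(-np.pi * target_bw_hz / fs)
        theta = 2.0 * np.pi * center_hz / fs
        b = [1.0 - r]                                  # crude gain scaling, not an exact normalization
        a = [1.0, -2.0 * r * np.cos(theta), r * r]     # complex-conjugate pole pair
        resonant = lfilter(b, a, frame)
        g = 10.0 ** (gain_db / 20.0) - 1.0
        return frame + g * resonant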
The time-domain converter 630 transforms 660 into a time-domain signal 670 and adjusts its amplitude.
Fig. 7 is the block diagram of the synthesizer 350 of Fig. 3.
The paragraph generator 710 progressively assembles the per-frame signals into complete speech segments.
The speech synthesizer 720 combines the speech segments in order into complete speech.
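A minimal sketch of reassembling the processed frames by overlap-add before the segments are concatenated in order; the Hann window and the normalization are conventional choices, not specified in this disclosure:

    import numpy as np

    def overlap_add(frames, hop):
        """Reassemble processed, equally long frames into one continuous signal."""
        frame_len = frames.shape[1]
        out = np.zeros(hop * (len(frames) - 1) + frame_len)
        norm = np.zeros_like(out)
        win = np.hanning(frame_len)
        for i, frame in enumerate(frames):
            out[i * hop:i * hop + frame_len] += frame * win
            norm[i * hop:i * hop + frame_len] += win
        return out / np.maximum(norm, 1e-8)

    # segments produced this way are then concatenated in order:
    # speech = np.concatenate([overlap_add(seg, hop) for seg in processed_segments])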
Fig. 8 is the block diagram of the player 360 of Fig. 3.
The viewer 810 displays the time-domain speech signals before and after processing.
The speech player 820 plays back the processed speech.

Claims (5)

1. A formant enhancement method for improving speech intelligibility, characterized in that: first, the pronunciation characteristics of patients with muscular atrophy are compared with those of normal speakers to obtain the essential differences between the two, and feature parameters are extracted over different spectral representations using one or more of the cepstrum method, linear predictive coding (LPC), mel-frequency cepstral coefficients (MFCC) and line spectrum pairs (LSP); second, a large set of normal speakers' speech feature parameters is used as a reference library, the method used to extract these feature parameters being consistent with the method used to extract the patient's speech feature parameters; third, after the patient's speech feature parameters are extracted, a simple gender division is performed; fourth, the patient's speech feature parameters are classified by a trained neural network and compared with the entries of the corresponding gender library to obtain the closest speech segment, which is used as the reference for adjusting the patient's speech feature parameters; finally, the adjusted speech information is transformed back to the time domain and synthesized into complete speech output.
2. The formant enhancement method for improving speech intelligibility of claim 1, characterized in that, in the formant enhancement part, the time-domain signal is first transformed into the frequency domain and frequency-domain feature parameters are extracted, where 'frequency domain' may mean the linear frequency axis, the mel-frequency axis, or the cepstral (quefrency) axis.
3. A formant enhancement apparatus for improving speech intelligibility, characterized in that it consists of a signal receiver, a preprocessor, a segment processor, a formant enhancement processor, a synthesizer and a player; wherein:
The signal receiver picks up the patient's speech signal and stores it, the recording containing both the speech signal and background noise;
The preprocessor applies spectral-subtraction noise reduction to the stored speech signal and then pre-emphasizes the denoised speech;
The segment processor divides the pre-processed speech signal into speech and non-speech segments using the short-time average zero-crossing rate and short-time energy, further segments it into individual words, characters or phonemes, stores each segment in turn, and outputs the segments in order;
The formant enhancement processor extracts the formant parameters of the speech via LPC mel-cepstral coefficients, compares these formant parameters against a computed gender-division threshold to perform gender division, classifies them with a neural network trained for the corresponding gender, searches the pre-stored speech library of the corresponding phoneme for the closest formant parameters, adjusts the formant parameters to be processed with these reference parameters as the standard, transforms the adjusted formant parameters back to the time domain for output, and adjusts the time-domain amplitude appropriately;
The synthesizer combines the processed speech signals in order;
The player plays back the synthesized speech signal and presents it graphically.
4. The formant enhancement apparatus for improving speech intelligibility of claim 3, characterized in that, in one specific example, the internal logic of the formant enhancement processor is:
The framing unit divides the speech signal into frames, with a frame length of 15-30 ms and a frame shift of 10 ms;
The LPC coefficient extractor extracts the LPC coefficients of each frame;
The cepstral coefficient converter converts the LPC coefficients into cepstral coefficients;
The mel-frequency converter applies a nonlinear transformation to the cepstral coefficients on the mel scale, converting them into LPC mel-cepstral coefficients;
The formant calculator computes the formants from the LPCMCC coefficients and takes the center frequencies and bandwidths of the first three formants as the formant parameters;
The gender classifier compares the formant center frequencies computed by the formant calculator with a threshold to perform gender division, and the two genders are routed to different phoneme classifiers.
5. The formant enhancement apparatus for improving speech intelligibility of claim 3, characterized in that the phoneme classifier is a neural network trained on normal speakers' LPC mel-cepstral coefficients, which determines the phoneme class from the formant center frequencies and corresponding bandwidths computed by the formant calculator and thereby selects the formant comparator to be used;
The formant comparator holds a database of normal speech for each phoneme; the formant parameters passed on by the gender classifier are matched against the corresponding phoneme database to find the formant parameters with the smallest deviation in center frequency and bandwidth, the center frequencies being the main criterion: with the deviation of the second formant frequency F2 kept minimal, the smallest achievable deviation of the first formant frequency F1 is sought, and finally the deviation of the third formant frequency F3 is kept as small as possible;
The formant enhancement filter takes the selected formant parameters from the phoneme library and the formant parameters computed by the formant calculator, and adjusts the bandwidth and amplitude of the computed formant parameters with the selected reference parameters as the standard, yielding the processed formant parameters;
The time-domain converter transforms the processed formant parameters back into a time-domain signal and adjusts its amplitude.
CN201611118099.0A 2016-12-07 2016-12-07 Formant enhancement apparatus and method for improving speech intelligibility Pending CN106710604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611118099.0A CN106710604A (en) 2016-12-07 2016-12-07 Formant enhancement apparatus and method for improving speech intelligibility

Publications (1)

Publication Number Publication Date
CN106710604A true CN106710604A (en) 2017-05-24

Family

ID=58936430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611118099.0A Pending CN106710604A (en) 2016-12-07 2016-12-07 Formant enhancement apparatus and method for improving speech intelligibility

Country Status (1)

Country Link
CN (1) CN106710604A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102498482A * 2009-09-14 2012-06-13 SRS Labs, Inc. System for adaptive voice intelligibility processing
CN102610236A * 2012-02-29 2012-07-25 Shandong University Method for improving voice quality of throat microphone
CN105513597A * 2015-12-30 2016-04-20 Baidu Online Network Technology (Beijing) Co., Ltd. Voiceprint authentication processing method and apparatus

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108899052A (en) * 2018-07-10 2018-11-27 南京邮电大学 A kind of Parkinson's sound enhancement method based on mostly with spectrum-subtraction
CN108899052B (en) * 2018-07-10 2020-12-01 南京邮电大学 Parkinson speech enhancement method based on multi-band spectral subtraction
CN109215635A (en) * 2018-10-25 2019-01-15 武汉大学 Broadband voice spectral tilt degree characteristic parameter method for reconstructing for speech intelligibility enhancing
CN109346058A (en) * 2018-11-29 2019-02-15 西安交通大学 A kind of speech acoustics feature expansion system
CN110070894B (en) * 2019-03-26 2021-08-03 天津大学 Improved method for identifying multiple pathological unit tones
CN110070894A (en) * 2019-03-26 2019-07-30 天津大学 A kind of improved multiple pathology unit voice recognition methods
CN110164454A (en) * 2019-05-24 2019-08-23 广州国音智能科技有限公司 A kind of audio identity method of discrimination and device based on resonance peak deviation
CN110164454B (en) * 2019-05-24 2021-08-24 广州国音智能科技有限公司 Formant deviation-based audio identity discrimination method and device
CN110604568A (en) * 2019-09-29 2019-12-24 三峡大学 System and method for detecting singing tone in air street
CN110604568B (en) * 2019-09-29 2022-01-04 三峡大学 System and method for detecting singing tone in air street
CN111108552A (en) * 2019-12-24 2020-05-05 广州国音智能科技有限公司 Voiceprint identity identification method and related device
CN112687277B (en) * 2021-03-15 2021-06-18 北京远鉴信息技术有限公司 Method and device for determining voice formant, electronic equipment and readable storage medium
CN112687277A (en) * 2021-03-15 2021-04-20 北京远鉴信息技术有限公司 Method and device for determining voice formant, electronic equipment and readable storage medium
CN112802489A (en) * 2021-04-09 2021-05-14 广州健抿科技有限公司 Automatic call voice adjusting system and method

Similar Documents

Publication Publication Date Title
CN106710604A (en) Formant enhancement apparatus and method for improving speech intelligibility
CN103928023B (en) A kind of speech assessment method and system
US8036891B2 (en) Methods of identification using voice sound analysis
CN102063899B (en) Method for voice conversion under unparallel text condition
CN111462769B (en) End-to-end accent conversion method
US20120150544A1 (en) Method and system for reconstructing speech from an input signal comprising whispers
JP2002014689A (en) Method and device for improving understandability of digitally compressed speech
Trabelsi et al. On the use of different feature extraction methods for linear and non linear kernels
CN1815552A (en) Frequency spectrum modelling and voice reinforcing method based on line spectrum frequency and its interorder differential parameter
CN110277087A (en) A kind of broadcast singal anticipation preprocess method
CN108281150B (en) Voice tone-changing voice-changing method based on differential glottal wave model
CN111489763B (en) GMM model-based speaker recognition self-adaption method in complex environment
Katsir et al. Evaluation of a speech bandwidth extension algorithm based on vocal tract shape estimation
Walliczek et al. Sub-word unit based non-audible speech recognition using surface electromyography
KR20070045772A (en) Apparatus for vocal-cord signal recognition and its method
CN114913844A (en) Broadcast language identification method for pitch normalization reconstruction
Shuang et al. A novel voice conversion system based on codebook mapping with phoneme-tied weighting
CN110033786B (en) Gender judgment method, device, equipment and readable storage medium
Singh et al. Features and techniques for speaker recognition
Garcia et al. Oesophageal speech enhancement using poles stabilization and Kalman filtering
TWI746138B (en) System for clarifying a dysarthria voice and method thereof
Sivaram et al. Enhancement of dysarthric speech for developing an effective speech therapy tool
Ali et al. Esophageal speech enhancement using excitation source synthesis and formant structure modification
Diener Improving unit selection based EMG-to-speech conversion
KR101567566B1 (en) System and Method for Statistical Speech Synthesis with Personalized Synthetic Voice

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170524