CN109545189A - A spoken-language pronunciation error detection and correction system based on machine learning - Google Patents
- Publication number
- CN109545189A (Application number CN201811534792.5A)
- Authority
- CN
- China
- Prior art keywords
- pronunciation
- error detection
- spoken language
- phoneme
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 71
- 238000010801 machine learning Methods 0.000 title claims abstract description 23
- 238000012549 training Methods 0.000 claims abstract description 48
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 15
- 238000012937 correction Methods 0.000 claims abstract description 15
- 239000013598 vector Substances 0.000 claims description 26
- 238000012360 testing method Methods 0.000 claims description 16
- 230000006870 function Effects 0.000 claims description 14
- 238000012706 support-vector machine Methods 0.000 claims description 13
- 238000012545 processing Methods 0.000 claims description 12
- 238000001228 spectrum Methods 0.000 claims description 11
- 239000000284 extract Substances 0.000 claims description 8
- 238000000605 extraction Methods 0.000 claims description 7
- 238000007476 Maximum Likelihood Methods 0.000 claims description 6
- 230000003595 spectral effect Effects 0.000 claims description 6
- 238000000926 separation method Methods 0.000 claims description 5
- 238000006243 chemical reaction Methods 0.000 claims description 3
- 238000012163 sequencing technique Methods 0.000 claims description 3
- 230000002123 temporal effect Effects 0.000 claims description 3
- 238000010998 test method Methods 0.000 claims description 3
- 238000007689 inspection Methods 0.000 claims 1
- 238000011156 evaluation Methods 0.000 abstract description 4
- 238000000034 method Methods 0.000 description 14
- 210000002105 tongue Anatomy 0.000 description 13
- 230000002452 interceptive effect Effects 0.000 description 5
- 238000010586 diagram Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 230000001755 vocal effect Effects 0.000 description 3
- 206010013887 Dysarthria Diseases 0.000 description 2
- 238000003066 decision tree Methods 0.000 description 2
- 230000007547 defect Effects 0.000 description 2
- 238000003745 diagnosis Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000009432 framing Methods 0.000 description 2
- 238000010606 normalization Methods 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000007635 classification algorithm Methods 0.000 description 1
- 239000012141 concentrate Substances 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 239000006185 dispersion Substances 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000013526 transfer learning Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The present invention relates to a machine-learning-based spoken-language pronunciation error detection and correction system, comprising: a spoken pronunciation sample collection module, for collecting correctly pronounced phonemes and various types of mispronounced phonemes from whole-sentence or whole-passage speech; a pronunciation error detection model building module, for extracting acoustic features from the collected phonemes and labeling them by type to form a training sample set, from which a pronunciation error detection model is trained by a machine learning algorithm; and an online error detection and correction module, which uses the trained model to score whole sentences or passages read aloud by a learner, detect phoneme-level errors, and correct pronunciation. The invention can evaluate spoken pronunciation online, identify pronunciation errors, and provide correction suggestions.
Description
Technical field
The present invention relates to the field of online spoken-language learning, and in particular to a machine-learning-based spoken-language pronunciation error detection and correction system.
Background technique
During language learning, limits on qualified teachers and learning environments mean that in-class oral training time is insufficient, after-class oral practice receives no feedback, and many spoken-language teachers themselves pronounce non-standardly. These factors make pronunciation a major difficulty for foreign-language learners, and many learners are willing to pay high tuition for a foreign teacher to correct their pronunciation. The rise of mobile online language learning has therefore spurred automatic pronunciation error detection systems.
Existing pronunciation error detection methods fall roughly into two categories. The first uses phonetic knowledge to find distinctive features: for example, an English learner whose mother tongue is Japanese may replace the pronunciation of "rice" with "lice", and cannot adjust articulation to correct the error because the phoneme /r/ does not exist in Japanese. For such typical error types, discriminative acoustic features such as formants can usually be extracted to detect and diagnose the pronunciation error. The second category identifies errors from the similarity between a speaker's pronunciation of a given text and the standard pronunciation under an acoustic model of the speaker's mother tongue, as used by Selina Parveen et al. in the paper "Bangla Pronunciation Error Detection System"; its similarity index is based on the confidence of automatic speech recognition (ASR). In general, the first category can detect the articulatory movement that causes an error, e.g., judging deviations in tongue height and front/back position from formants, but it is mainly limited to finding vowel errors in read-aloud words, and its fault tolerance is low: correct formant extraction is critical, yet noisy environments easily cause extraction errors. The second category cannot localize the articulatory error; it mainly targets substitution (misreading), omission and insertion errors in phonation, and therefore cannot give targeted pronunciation correction or improvement schemes.
In terms of applications, only a few products currently address pronunciation problems, and most offer a single function: playing back audio/video learning material while the student records read-aloud speech that the system then replays. Only a handful of applications give feedback on spoken pronunciation problems, and they have two shortcomings. The first is that the feedback is insufficient to solve the learner's root problem: one existing product's oral training function, for example, can only point out that the learner's pronunciation is not good enough after read-aloud, but the learner cannot learn where the pronunciation error lies or how to improve it, and so never receives the most valuable corrective feedback; the learner's oral ability often does not improve. The second is that the detected error types are usually limited to typical errors such as phoneme omission, misreading and insertion, lacking any judgment of the articulatory cause of the error or feedback on a correction scheme.
Summary of the invention
The technical problem to be solved by the present invention is to provide a machine-learning-based spoken-language pronunciation error detection and correction system that can evaluate spoken pronunciation online, identify pronunciation errors, and provide correction suggestions.
The technical solution adopted by the present invention is to provide a machine-learning-based spoken-language pronunciation error detection and correction system, comprising: a spoken pronunciation sample collection module, for collecting correctly pronounced phonemes and various types of mispronounced phonemes from whole-sentence or whole-passage speech; a pronunciation error detection model building module, for extracting acoustic features from the collected phonemes and labeling them by type to form a training sample set, from which a pronunciation error detection model is trained by a machine learning algorithm; and an online error detection and correction module, which uses the trained model to score whole sentences or passages read aloud by a learner, detect phoneme-level errors, and correct pronunciation.
The spoken pronunciation sample collection module divides speech into voiced segments, unvoiced segments and silent segments according to whether sound is being produced. Specifically, the prediction residual energy of the speech signal S(n) is defined, in the standard linear-prediction sense, as Ep = Σ e²(n), the energy of the first-order prediction residual over the frame, where N is the frame length; and the first reflection coefficient is defined as the normalized lag-1 autocorrelation k1 = Σ S(n)S(n−1) / Σ S²(n). Segmentation follows these rules: if the first reflection coefficient is greater than 0.2 and the prediction residual energy is greater than 2 times the system threshold θ, the current speech frame is defined as voiced; if the first reflection coefficient is greater than 0.3, the prediction residual energy is greater than the system threshold θ, and the previous frame of the current speech frame is a voiced frame, the current speech frame is also defined as voiced; if neither rule is satisfied, the current speech frame is defined as silent.
The spoken pronunciation sample collection module obtains the pronunciation phonemes by forced alignment, specifically: the text file is processed to remove punctuation; the audio file is converted to mono and passed through endpoint detection; word-to-sound conversion is applied to the text file and, according to a trained acoustic model, the text is expanded into a search space composed of hidden Markov model state sequences; features are extracted from the speech signal in the audio file and, frame by frame from front to back, the speech features are aligned against the search space composed of the corresponding hidden Markov model state sequence. Each frame of data is aligned by Viterbi dynamic programming, giving Q(t,s) = max_{s'} { p(x_t, s | s') · Q(t−1, s') }, where Q(t,s) is the best score of falling on a particular hidden Markov model state s in the search space at time t; p(x_t, s | s') is the probability of transitioning to state s and emitting the hidden sequence x_t given that the previous frame's state is s'; x_t is the hidden Markov state transition sequence; and s' is the previous-frame state of s. At time t, when a path reaches the active state s_we — the tail state node of the current sentence whose optimal end time τ is to be estimated — the number of path hypotheses on every active state s_i at that moment is counted with an indicator function δ(·), and all path hypotheses are ranked and counted by score; the paths on s_we are counted. If path hypothesis Q_k(t, s_we) ranks R_k(t, s_we) among all N(t) paths, the expected rank of the path hypotheses on s_we among the N(t) paths defines the state activity A(t, s_we); the moment at which A(t, s_we) is maximized is the alignment maximum-likelihood time t, and according to this time the speech-to-text alignment timing of the sentence is output. The phoneme separation tier of the alignment text table is then read according to this timing; from the start time and end time of each phoneme on the separation tier, phoneme cutting is performed to obtain the pronunciation phonemes.
In feature extraction, the pronunciation error detection model building module first divides the data into a training set and a test set, then extracts the MFCC features and formant features of each pronounced phoneme: the original speech signal is processed to obtain the time-domain signal of each speech frame; the time-domain signal is zero-padded to a sequence of length N, and a discrete Fourier transform yields the linear spectrum; the linear spectrum is passed through a mel-frequency filter bank to obtain the mel spectrum; the logarithm of the mel-spectrum energy gives the log spectrum S(m); and a discrete cosine transform takes S(m) into the cepstral domain, yielding the mel-frequency cepstral coefficients c(n).
When training, the pronunciation error detection model building module uses 7 categories: −1, 1, A, B, C, D and E, where −1 denotes an erroneous type, 1 denotes a correct type, and A, B, C, D, E respectively denote the error classes tongue position too far forward, too far back, tongue position too high, too low, and phoneme lengthened or shortened. When extracting the training sets, the samples of each category in turn are treated as one class and all remaining samples as the other, yielding 7 classifiers. When a support vector machine is used as the training classifier algorithm, the phoneme acoustic feature vector data are divided 4:1 into a training set and a test set; the training set provides the input vectors of the support vector machine, and the radial basis function (RBF) kernel is selected as the SVM kernel function.
In modeling, the pronunciation error detection model building module extracts features through unsupervised training of a deep belief network, with a support vector machine as the topmost layer. The preferred deep belief network model is determined from the size and dimensionality of the phoneme acoustic feature data set: the number of hidden layers is determined by adjusting the model parameters and comparing the outputs of each optimized variant; after the node count of the first hidden layer is fixed, the optimal number of hidden layers and nodes is determined step by step by testing, and the optimal model is then obtained by tuning the remaining parameters.
Beneficial effects
Owing to the above technical solution, the present invention has the following advantages and positive effects over the prior art: by building a pronunciation error detection and correction model with machine learning algorithms, it rapidly detects and diagnoses phoneme pronunciation errors in whole sentences or passages, filling the gap in real-time scoring and error-detection feedback correction in current online spoken-language learning products. It can quickly and effectively judge at which places in a read-aloud sentence which types of errors occur, and assist in correcting them.
Detailed description of the invention
Fig. 1 is the offline machine-learning classification error detection model training framework of the present invention;
Fig. 2 is the online pronunciation error detection and interactive correction workflow;
Fig. 3 is the flowchart of the forced-alignment phoneme separation algorithm.
Specific embodiment
The present invention is further described below with reference to specific embodiments. It should be understood that these embodiments are intended only to illustrate the invention and not to limit its scope. In addition, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
Embodiments of the present invention relate to a machine-learning-based spoken-language pronunciation error detection and correction system, comprising: a spoken pronunciation sample collection module, for collecting correctly pronounced phonemes and various types of mispronounced phonemes from whole-sentence or whole-passage speech; a pronunciation error detection model building module, for extracting acoustic features from the collected phonemes and labeling them by type to form a training sample set, from which a pronunciation error detection model is trained by a machine learning algorithm; and an online error detection and correction module, which uses the trained model to score whole sentences or passages read aloud by a learner, detect phoneme-level errors, and correct pronunciation.
The present invention first obtains atypical pronunciation error sample data from whole-sentence or whole-passage speech and trains an error detection classification model with machine-learning classification and recognition algorithms, identifying which kind of error a learner's pronunciation belongs to: a deviation in tongue height or front/back position, or a pronunciation error caused by a deviation in tone length. For the identified error type, correct interactive feedback and correction are then given, mainly adjustment schemes for the learner's mouth shape and tongue position and for the tone length of the pronunciation.
Fig. 1 shows the offline machine-learning classification error detection model training framework of the present invention, and Fig. 2 the online pronunciation error detection and interactive correction workflow. Referring to Figs. 1 and 2, the steps of the invention are as follows:
Part one: offline model training.
Step 1: obtain whole-sentence standard speech data. Read-aloud recordings of 615 native English speakers, sampled across countries and regions, genders and ages, serve as the standard pronunciation data.
Step 2: obtain whole-sentence non-standard speech data. Different categories of mispronunciation data are obtained from English learners with different mother tongues. The errors are divided into the following 6 categories: tongue position too far forward; too far back; tongue position too high; too low; phoneme lengthened; and phoneme shortened. Each category has 200 samples.
Step 3: forced alignment and phoneme separation.
1. According to whether sound is being produced, a segment of speech is divided into silent (S, Silent), unvoiced (U, Unvoiced) and voiced (V, Voiced) segments. The prediction residual energy of speech signal S(n) is defined, in the standard linear-prediction sense, as the energy of the first-order prediction residual over the frame, where N is the frame length; the first reflection coefficient is defined as the normalized lag-1 autocorrelation of the frame. The V/U/S segmentation rules are as follows:
(1) if the first reflection coefficient is greater than 0.2, and the prediction residual energy is greater than 2 times the threshold θ, the current speech frame is defined as V;
(2) if the first reflection coefficient is greater than 0.3, the prediction residual energy is greater than the threshold θ, and the previous frame of the current frame is a voiced frame, the current speech frame is defined as V;
(3) if neither of the two rules above is satisfied, the speech frame is defined as U.
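The V/U/S rules above can be sketched in Python. The source does not reproduce the residual-energy and reflection-coefficient formulas, so the standard linear-prediction definitions stand in for them here, and the threshold θ is illustrative.

```python
import numpy as np

def first_reflection_coefficient(frame):
    """Normalized lag-1 autocorrelation of the frame, a standard
    definition of the first reflection coefficient (assumption: the
    patent's exact formula is not reproduced in the source)."""
    num = np.sum(frame[1:] * frame[:-1])
    den = np.sum(frame * frame) + 1e-12
    return num / den

def residual_energy(frame):
    """Energy of the first-order linear-prediction residual
    e(n) = s(n) - k1*s(n-1), a stand-in for the patent's
    prediction-residual-energy quantity."""
    k1 = first_reflection_coefficient(frame)
    e = frame[1:] - k1 * frame[:-1]
    return np.sum(e * e)

def classify_frames(frames, theta):
    """Label each frame 'V' (voiced) or 'U' per the two rules in the
    text; rule (2) requires the previous frame to be voiced."""
    labels = []
    prev_voiced = False
    for frame in frames:
        k1 = first_reflection_coefficient(frame)
        ep = residual_energy(frame)
        if k1 > 0.2 and ep > 2 * theta:
            labels.append('V')
        elif k1 > 0.3 and ep > theta and prev_voiced:
            labels.append('V')
        else:
            labels.append('U')  # neither rule satisfied
        prev_voiced = labels[-1] == 'V'
    return labels
```

A low-frequency periodic frame has a reflection coefficient near 1 and is labeled voiced, while an all-zero frame falls through to the default label.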
2. Forced alignment, as shown in Fig. 3:
(1) The text file is processed to remove special punctuation and the English strings are segmented; the result is saved in UTF-8 format.
(2) The audio file is converted to mono at a 16000 Hz sample rate and passed through endpoint detection, whose purpose is to accurately detect the start point and end point of speech within the signal.
(3) Word-to-sound conversion is applied to the text and, according to a trained acoustic model, the text is expanded into a search space composed of hidden Markov model (HMM) state sequences.
(4) Features are extracted from the speech signal in the audio file and, frame by frame from front to back, the speech features are aligned against the search space composed of the corresponding HMM state sequence. Each frame of data is aligned by Viterbi dynamic programming, giving:
Q(t,s) = max_{s'} { p(x_t, s | s') · Q(t−1, s') }
where Q(t,s) is the best score of falling on a particular HMM state s in the search space at time t; p(x_t, s | s') is the probability of transitioning to state s and emitting the hidden sequence x_t given that the previous frame's state is s'; x_t is the HMM state transition sequence; and s' is the previous-frame state of s. s_we is the tail state node of the current sentence whose optimal end time τ is to be estimated.
At time t, when a path reaches the active state s_we, the number of path hypotheses on every active state s_i at that moment is counted with an indicator function δ(·), and all path hypotheses are ranked by their scores; the paths on s_we are counted. If path hypothesis Q_k(t, s_we) ranks R_k(t, s_we) among all N(t) paths, then the expected rank of the path hypotheses on s_we among the N(t) paths defines the state activity A(t, s_we). The moment at which A(t, s_we) is maximized is the alignment maximum-likelihood time t; according to this time, the speech-to-text alignment timing of the sentence is output.
In the forced alignment process, the whole-sentence pronunciation is aligned down to the word level and the phone level so that the acoustic features of different pronunciations can be extracted at the phone level in subsequent steps. After forced alignment, the phoneme separation tier of the alignment text table (TextGrid) is read according to the output speech-to-text alignment timing; from the start time and end time of each phoneme on that tier, phoneme cutting is performed to obtain the pronunciation phonemes.
Step 4: data normalization. Normalization confines the phoneme acoustic feature data to a fixed range; its purpose is to reduce the dispersion and variability of the data so that fluctuations are smaller, without affecting the original distribution of the data. This embodiment uses min-max (most-value) normalization.
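As a minimal sketch, min-max normalization of an acoustic-feature matrix (one row per phoneme sample, one column per feature) can be written as:

```python
import numpy as np

def minmax_normalize(X):
    """Min-max ("most-value") normalization per feature column:
    rescales each acoustic-feature dimension into [0, 1] without
    changing the shape of its distribution.  Constant columns are
    left at zero to avoid division by zero."""
    X = np.asarray(X, dtype=float)
    mn, mx = X.min(axis=0), X.max(axis=0)
    span = np.where(mx - mn == 0, 1, mx - mn)
    return (X - mn) / span
```

Each column then spans [0, 1] while relative distances within the column are preserved.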
Step 5: feature extraction. The data are first divided into a training set and a test set; then the MFCC features and formant features of each phoneme pronunciation obtained in Step 4 are extracted. The original speech signal S(n) is processed by pre-emphasis, framing, windowing and endpoint detection to obtain the time-domain signal x(n) of each speech frame.
The time-domain signal x(n) is zero-padded to a sequence of length N (N = 512 in this embodiment), and a discrete Fourier transform (DFT, or FFT) yields the linear spectrum X(k).
The linear spectrum is passed through a mel-frequency filter bank to obtain the mel spectrum. For better robustness to noise and spectral-estimation error, the logarithm of the mel-spectrum energy is taken to obtain S(m).
The log spectrum S(m) is transformed into the cepstral domain by a discrete cosine transform (DCT), giving the mel-frequency cepstral coefficients (MFCC parameters) c(n).
Since the speech signal is continuous in the time domain, the features extracted per frame reflect only that frame's characteristics. To let the features capture temporal continuity, frame-context dimensions are added: the 13 static coefficients are augmented with 13 first-difference (delta) coefficients and 13 acceleration (delta-delta) coefficients, plus a four-dimensional formant parameter, for 43 coefficients in total, forming a 43-dimensional feature vector.
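The pipeline of Step 5 (pre-emphasis, framing, windowing, DFT, mel filter bank, log, DCT) and the assembly of the 43-dimensional vectors can be sketched as follows. Frame and filter-bank sizes are illustrative, and formant tracking is left as a placeholder since the patent does not detail its extraction.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    imel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc_frames(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_ceps=13):
    """Pre-emphasis -> framing -> Hamming window -> zero-padded DFT ->
    mel filter bank -> log -> DCT, as in the pipeline above."""
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(sig) - frame_len) // hop
    fb = mel_filterbank(n_fft=n_fft, sr=sr)
    win = np.hamming(frame_len)
    out = []
    for i in range(n_frames):
        frame = sig[i * hop:i * hop + frame_len] * win
        spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2           # linear spectrum
        logmel = np.log(fb @ spec + 1e-10)                      # log mel spectrum S(m)
        out.append(dct(logmel, type=2, norm='ortho')[:n_ceps])  # cepstrum c(n)
    return np.array(out)

def delta(feat):
    """Simple first-order difference along the frame axis."""
    return np.vstack([feat[:1] * 0, np.diff(feat, axis=0)])

def feature_vectors_43(signal, formants=None):
    """13 MFCC + 13 delta + 13 delta-delta + 4 formant values = 43 dims
    per frame.  Formant tracking is not implemented here; pass real
    formants or accept a zero placeholder (an assumption)."""
    c = mfcc_frames(signal)
    d1, d2 = delta(c), delta(delta(c))
    f = np.zeros((len(c), 4)) if formants is None else formants
    return np.hstack([c, d1, d2, f])
```

One second of 16 kHz audio with a 25 ms frame and 10 ms hop yields 98 frames of 43-dimensional features.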
Step 6: training the pronunciation error detection model. Since this is a multi-class problem and support vector machines and decision-tree algorithms are binary classifiers, a multi-class classifier is constructed by combining multiple binary classifiers. Common combinations include one-versus-one and one-versus-rest; this embodiment uses one-versus-rest, abbreviated OVR. Its idea: during training, the samples of one category in turn form one class and all remaining samples form the other, so that k categories yield k classifiers; at classification time an unknown sample is assigned to the class with the largest classification function value. The specific steps are as follows.
This embodiment has 7 categories to divide (that is, 7 labels): −1, 1, A, B, C, D, E. −1 denotes an erroneous type and 1 a correct type; A, B, C, D, E respectively denote the error classes tongue position too far forward, too far back, tongue position too high, too low, and phoneme lengthened or shortened.
When constructing the training sets, the following are extracted in turn:
(1) the vectors labeled 1 as the positive set and all other vectors as the negative set;
(2) the vectors labeled -1 as the positive set and all other vectors as the negative set;
(3) the vectors labeled A as the positive set and all other vectors as the negative set;
(4) the vectors labeled B as the positive set and all other vectors as the negative set;
(5) the vectors labeled C as the positive set and all other vectors as the negative set;
(6) the vectors labeled D as the positive set and all other vectors as the negative set;
(7) the vectors labeled E as the positive set and all other vectors as the negative set.
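The seven positive/negative set constructions can be sketched as a simple label-remapping step (the labels below are toy data; only the 7 class symbols follow the embodiment):

```python
import numpy as np

# Toy phoneme labels; the 7 class symbols follow the embodiment
labels = np.array(["1", "-1", "A", "B", "1", "C", "D", "E", "A", "-1"])
classes = ["1", "-1", "A", "B", "C", "D", "E"]

# One binary target vector per class: the class's vectors form the positive
# set (+1), all remaining vectors the negative set (-1)
binary_targets = {c: np.where(labels == c, 1, -1) for c in classes}
```

Each entry of `binary_targets` is the training target for one of the seven one-versus-rest classifiers.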
When the support vector machine is used as the classifier training algorithm, the phoneme acoustic feature vectors are split into training and test sets at a ratio of 4:1; the training set serves as the input vectors of the support vector machine, whose kernel is the radial basis function (RBF) kernel. The range of the SVM penalty factor c is set to [0, 100], and the range of the kernel parameter g is set to [0, 1000]. Training is carried out separately with the 7 training sets, yielding 7 training result files.
At test time, each test vector is evaluated against all seven training results, so each test yields the values f1(x), f2(x), f3(x), f4(x), f5(x), f6(x), and f7(x); the final classification result is the class corresponding to the largest of these values.
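A hedged scikit-learn sketch of this one-versus-rest RBF-SVM scheme (the toy data and the particular C and gamma values are illustrative, not the embodiment's tuned values):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy stand-in for 43-dimensional phoneme feature vectors with 7 class labels
X = rng.standard_normal((140, 43))
y = np.repeat(np.arange(7), 20)
X[np.arange(140), y] += 4.0              # shift one coordinate so classes are separable
perm = rng.permutation(140)
X, y = X[perm], y[perm]

split = int(len(X) * 0.8)                # 4:1 train/test split
X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

# One RBF-kernel SVM per class: the class's samples are the positive set (+1),
# all remaining samples the negative set (-1)
models = []
for c in range(7):
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")   # illustrative C and gamma
    clf.fit(X_tr, np.where(y_tr == c, 1, -1))
    models.append(clf)

# An unknown sample is assigned to the class with the largest decision value
scores = np.column_stack([m.decision_function(X_te) for m in models])
pred = scores.argmax(axis=1)
accuracy = (pred == y_te).mean()
```

The `argmax` over the seven decision values implements the "largest f_i(x) wins" rule described above.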
In practice, the distribution of reading-error types is often uneven. In a given region, most learners' pronunciation errors may concentrate in one or two classes, making those error types common while others are rare; this leads to class imbalance when training on error types. To further explore an optimal modeling method, this work adopts the idea of transfer learning: features are extracted by unsupervised training of a deep belief network (DBN), with a support vector machine (SVM) as the topmost layer, i.e., a DBN+SVM classification model is built.
The preferred DBN model is determined according to the number and dimensionality of the training samples in the pronunciation-phoneme acoustic feature data set: the number of hidden layers, the per-layer learning rates, and the number of iterations are chosen, and the Boltzmann machines are trained with these parameters. The 200 groups of phoneme data per class are divided into a training set (160 groups) and a test set (40 groups); the training set is used to build the model, and the test set is used to evaluate the resulting pronunciation-phoneme error detection classification model. Eight hidden-layer node counts (100, 200, 300, 400, 500, 600, 700, and 800) were tried for modeling the features with a single hidden layer. By tuning the model parameters and comparing the tuned outputs, it was found that a hidden-layer size of 400 gives the best result. After fixing the first hidden layer's node count, the optimal number of hidden layers and node counts are determined step by step by testing, and the optimal model is then obtained by adjusting the remaining parameters. In the final phoneme pronunciation error detection classification model, the number of hidden layers is 5, the RBM iteration count is 50, the DBN network iteration count is 1000, the batch size is 64, and the weight learning rate is 0.000001. The test experiments are repeated several times and the mean is taken as the final result.
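A rough stand-in for the DBN+SVM idea, using a single restricted Boltzmann machine from scikit-learn as an (admittedly shallow) proxy for the multi-layer DBN feature extractor; the data and all hyperparameters here are toy values, not the embodiment's:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Toy 43-dimensional feature vectors for 7 phoneme error classes
X = rng.standard_normal((140, 43))
y = np.repeat(np.arange(7), 20)
X[np.arange(140), y] += 4.0
perm = rng.permutation(140)
X, y = X[perm], y[perm]
X_tr, y_tr, X_te, y_te = X[:112], y[:112], X[112:], y[112:]

# Unsupervised RBM feature extraction feeding an SVM classifier.
# BernoulliRBM expects inputs in [0, 1], hence the scaler; toy hyperparameters.
pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("rbm", BernoulliRBM(n_components=64, learning_rate=0.01,
                         batch_size=64, n_iter=10, random_state=0)),
    ("svm", SVC(kernel="rbf")),
])
pipe.fit(X_tr, y_tr)
pred = pipe.predict(X_te)
```

A full DBN would stack several such RBMs (the embodiment uses 5 hidden layers); the pipeline above shows only the structure of the unsupervised-features-plus-SVM design.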
Error detection can produce four types of result: 1) correct acceptance (CA), the number of correct pronunciations judged correct; 2) correct rejection (CR), the number of incorrect pronunciations judged incorrect; 3) false acceptance (FA), the number of incorrect pronunciations judged correct; and 4) false rejection (FR), the number of correct pronunciations judged incorrect. From these four counts, the correct acceptance rate (CAR) and the correct rejection rate (CRR) are computed. This embodiment uses CAR and CRR as the measures of recognition accuracy: CAR for correct pronunciations and CRR for incorrect pronunciations.
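Assuming the usual definitions CAR = CA/(CA+FR) and CRR = CR/(CR+FA) (the text names the rates but does not spell out the formulas), the four counts and two rates can be computed as:

```python
def detection_counts(truth, judged):
    """truth/judged: booleans per utterance; True means the pronunciation
    is (or is judged) correct."""
    ca = sum(t and j for t, j in zip(truth, judged))              # correct acceptance
    cr = sum((not t) and (not j) for t, j in zip(truth, judged))  # correct rejection
    fa = sum((not t) and j for t, j in zip(truth, judged))        # false acceptance
    fr = sum(t and (not j) for t, j in zip(truth, judged))        # false rejection
    return ca, cr, fa, fr

truth  = [True, True, True, False, False, False]
judged = [True, True, False, False, True, False]
ca, cr, fa, fr = detection_counts(truth, judged)
car = ca / (ca + fr)   # assumed: acceptance rate over truly correct pronunciations
crr = cr / (cr + fa)   # assumed: rejection rate over truly incorrect pronunciations
```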
Table 2: comparison of test-set CAR and CRR for the two classifier algorithms and the deep learning approach (DBN+SVM)
As can be seen from the results in Table 2, for phoneme pronunciation error detection the SVM-based and decision-tree-based classifiers differ little in recognition accuracy; both are stable at around 80% on the error types, which is a reasonably good recognition result. In practice, some pronunciation errors are common and well represented by error samples, and for these the classification works better, while other pronunciation error types are rare, and insufficient training data is likely the main reason their classification accuracy is relatively low. Table 2 also shows that DBN+SVM improves on the recognition accuracy of the two classifiers above by about 2 percentage points on average. Therefore, the model trained by the classification algorithm combining a deep belief network with a support vector machine is optimal.
Step 7: result evaluation and error correction. According to the classification result produced by the error detection classification model obtained with the machine learning algorithm, the system indicates which error class a test sample belongs to; the class of the pronunciation error is predicted by the model. The location and type of the learner's pronunciation error are fed back to the learner together with a proposed correction scheme.
Part Two: online pronunciation error detection and interactive correction
Step 1: obtaining pronunciation data
Fig. 2 is the workflow of the pronunciation error detection and interactive correction system. After logging into the system, the learner selects the sentence to practice and reads the whole sentence aloud according to the displayed text; the system records the learner's pronunciation.
Step 2: data processing and pronunciation error detection
The learner's pronunciation data obtained in Step 1 is preprocessed, including forced-alignment phoneme separation and feature extraction; the processing steps are the same as in the offline model training part. The processed data is fed into the trained pronunciation error detection model, which outputs the result for the learner's pronunciation.
Step 3: interactive correction. Based on the judgment the system gives on the pronunciation, the learner is told what is wrong, articulation corrections are given for each detected error, and the learner is prompted to read aloud again, correcting the pronunciation repeatedly until the phoneme is standard.
It is easy to see that the present invention, based on machine learning algorithms and on the fact that different pronunciation phonemes have different acoustic features, acquires and processes the foreign-language speech segment signals read aloud by different learners, obtains their 39+4-dimensional acoustic feature vectors in the frequency domain, and uses these as the input of the training model. By means of supervised or unsupervised learning networks, the extracted acoustic feature vectors are trained to generate an acoustic error detection model. The classification performance of the acoustic error detection model is verified on the test set; experiments show that the classification accuracy is high and meets the needs of analyzing ordinary learners' pronunciation error types, and pronunciation evaluation and correction schemes are given for the test-set verification results. The present invention not only points out where the learner's pronunciation is wrong, but on that basis further identifies what kind of error it is and feeds back to the learner how to improve, which can effectively raise the learner's articulation ability.
Claims (6)
1. A spoken-language pronunciation error detection and correction system based on machine learning, characterized by comprising: a spoken-pronunciation sample acquisition module for acquiring correctly pronounced phonemes and different types of mispronounced phonemes from whole sentences or whole paragraphs of spoken pronunciation; a pronunciation error detection model building module for extracting acoustic features from the acquired pronunciation phonemes and labeling their types to form the pronunciation error-type training sample set, and for generating a pronunciation error detection model by machine-learning training; and an online error detection and correction module for using the generated pronunciation error detection model to score the whole sentences or paragraphs read aloud by a learner and to perform phoneme error detection and pronunciation correction.
2. The spoken-language pronunciation error detection and correction system based on machine learning according to claim 1, characterized in that the spoken-pronunciation sample acquisition module divides the sound into voiced segments, pronunciation segments and silent segments, specifically: the prediction residual energy of the speech signal S(n) is defined as: (formula), where N is the frame length, and the first reflection coefficient is defined as: (formula); segmentation then follows these rules: if the first reflection coefficient is greater than 0.2 and the prediction residual energy is greater than 2 times the system threshold θ, the current speech frame is defined as a voiced segment; if the first reflection coefficient is greater than 0.3, the prediction residual energy is greater than the system threshold θ, and the frame preceding the current speech frame is a pronunciation frame, the current speech frame is defined as a pronunciation segment; if neither of the above two rules is satisfied, the current speech frame is defined as a silent segment.
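The three segmentation rules of claim 2 can be sketched as follows. The reflection-coefficient and residual-energy values below are placeholders (their defining formulas are not reproduced in the text), and treating any preceding non-silent frame as a "pronunciation frame" is an assumption:

```python
def label_frames(k1, energy, theta):
    """Classify each frame as 'voiced', 'pronunciation' or 'silence' per the
    three rules of claim 2. k1: first reflection coefficients per frame;
    energy: prediction residual energies per frame; theta: system threshold."""
    labels = []
    prev_pronounced = False  # assumption: any non-silent frame counts as a pronunciation frame
    for k, e in zip(k1, energy):
        if k > 0.2 and e > 2 * theta:
            labels.append("voiced")
            prev_pronounced = True
        elif k > 0.3 and e > theta and prev_pronounced:
            labels.append("pronunciation")
            prev_pronounced = True
        else:
            labels.append("silence")
            prev_pronounced = False
    return labels

frames = label_frames(k1=[0.5, 0.35, 0.35, 0.1],
                      energy=[3.0, 1.5, 1.5, 0.5],
                      theta=1.0)
```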
3. The spoken-language pronunciation error detection and correction system based on machine learning according to claim 1, characterized in that the spoken-pronunciation sample acquisition module obtains the pronunciation phonemes by means of forced alignment, specifically: the text file is processed to remove punctuation marks; the audio file is converted to mono and passed through endpoint detection; the text file undergoes word-to-sound conversion and, according to a trained acoustic model, is expanded into a search space composed of hidden Markov model state sequences; feature extraction is performed on the speech signal in the audio file, and the speech features are aligned, frame by frame from front to back, with the search space composed of the corresponding hidden Markov model state sequences; dynamic-warping Viterbi alignment is applied to each frame of data, giving Q(t, s) = max_{s'} { p(x_t, s | s') · Q(t-1, s') }, where Q(t, s) is the best score at time t on a particular hidden Markov model state s in the search space, p(x_t, s | s') is the probability that, given the previous frame's state s', the next frame transitions to state s with hidden-sequence observation x_t, x_t is the hidden Markov state transition sequence, and s' is the state preceding s; at time t, when a path reaches the active state s_we, where s_we is the suffix state node of the current sentence whose optimal end time τ is to be estimated, the number N(t) of path hypotheses on all active states s_i at that time is counted, δ(·) being the indicator function, and all path hypotheses are ranked by their scores; the paths on s_we are counted, and if a path hypothesis Q_k(t, s_we) has rank R_k(t, s_we) among all N(t) paths, the expectation of the ranks of the path hypotheses on s_we among the N(t) paths defines the state activity A(t, s_we); the time at which A(t, s_we) reaches its maximum is the alignment maximum-likelihood time t; according to this time, the speech-text alignment time information of the sentence is output; the phoneme separation level of the read text is obtained from the alignment time information, the start and end times of each phoneme are read from the phoneme separation level, and phoneme cutting is performed to obtain the pronunciation phonemes.
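The Viterbi recursion Q(t, s) = max_{s'} p(x_t, s | s') · Q(t-1, s') of claim 3 can be sketched in the log domain on a toy left-to-right HMM (the two-state model and its probabilities are illustrative only, not the claimed acoustic model):

```python
import numpy as np

def viterbi_align(log_emit, log_trans):
    # log_emit: (T, S) frame log-likelihoods; log_trans: (S, S) log transition probs.
    # Recursion: Q(t, s) = max_{s'} [ Q(t-1, s') + log_trans[s', s] ] + log_emit[t, s]
    T, S = log_emit.shape
    Q = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    Q[0] = log_emit[0]
    Q[0, 1:] = -np.inf                 # force a start in state 0 (left-to-right HMM)
    for t in range(1, T):
        cand = Q[t - 1][:, None] + log_trans   # (S, S): candidate scores s' -> s
        back[t] = cand.argmax(axis=0)
        Q[t] = cand.max(axis=0) + log_emit[t]
    # Backtrace from the final state to recover the frame-to-state alignment
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

log_emit = np.log(np.array([[0.9, 0.1],
                            [0.9, 0.1],
                            [0.1, 0.9],
                            [0.1, 0.9]]))
log_trans = np.log(np.array([[0.5, 0.5],
                             [1e-12, 1.0]]))   # left-to-right: no 1 -> 0 transitions
path = viterbi_align(log_emit, log_trans)
```

The recovered path gives, for each frame, the HMM state it aligns to, from which phoneme start and end times can be read off.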
4. The spoken-language pronunciation error detection and correction system based on machine learning according to claim 1, characterized in that the pronunciation error detection model building module, in feature extraction, first divides the data into a training data set and a test data set, and then extracts the MFCC features and formant features of each acquired pronunciation phoneme: the original speech signal is processed to obtain the time-domain signal of each speech frame; the time-domain signal is zero-padded to a sequence of length N and passed through a discrete Fourier transform to obtain the linear spectrum; the linear spectrum is passed through a Mel-frequency filter bank to obtain the Mel spectrum, whose logarithmic energy gives the log spectrum S(m); the log spectrum S(m) is transformed to the cepstral domain by a discrete cosine transform, yielding the Mel-frequency cepstral coefficients c(n).
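The MFCC chain of claim 4 (zero-padding, DFT, Mel filter bank, log, DCT) can be sketched for a single frame; the sample rate, FFT length, and filter counts below are illustrative assumptions:

```python
import numpy as np

def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    # Zero-pad the frame to length n_fft, then take the power of the linear spectrum
    padded = np.zeros(n_fft)
    padded[: len(frame)] = frame
    spec = np.abs(np.fft.rfft(padded)) ** 2

    # Triangular Mel filter bank between 0 Hz and the Nyquist frequency
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fbank[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - c, 1)

    log_mel = np.log(fbank @ spec + 1e-10)   # log spectrum S(m)

    # DCT-II of the log Mel energies gives the cepstral coefficients c(n)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1)) / (2 * n_mels))
    return dct @ log_mel

frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)   # 25 ms of a 440 Hz tone
c = mfcc_frame(frame)
```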
5. The spoken-language pronunciation error detection and correction system based on machine learning according to claim 1, characterized in that the pronunciation error detection model building module trains with 7 classes, namely -1, 1, A, B, C, D and E, where -1 denotes the error type, 1 denotes the correct type, and A, B, C, D and E denote the error subclasses tongue position too far forward, too far back, too high, too low, and phoneme lengthening/shortening, respectively; when constructing the training sets, the samples of one class are in turn taken as one class and all remaining samples as the other, so that 7 classifiers are obtained; when the support vector machine is used as the classifier training algorithm, the phoneme acoustic feature vectors are split into training and test sets at a ratio of 4:1, the training set serves as the input vectors of the support vector machine, and the radial basis function kernel is selected as the support vector machine's kernel.
6. The spoken-language pronunciation error detection and correction system based on machine learning according to claim 1, characterized in that the pronunciation error detection model building module, in modeling, extracts features by unsupervised training of a deep belief network, with a support vector machine as the topmost layer; the preferred deep belief network model is determined according to the number and dimensionality of the training samples in the pronunciation-phoneme acoustic feature data set; the number of hidden layers is determined by adjusting the model parameters and comparing the tuned outputs; after fixing the first hidden layer's node count, the optimal number of hidden layers and node counts are determined step by step by testing, and the optimal model is then obtained by adjusting the remaining parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811534792.5A CN109545189A (en) | 2018-12-14 | 2018-12-14 | A kind of spoken language pronunciation error detection and correcting system based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109545189A true CN109545189A (en) | 2019-03-29 |
Family
ID=65856297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811534792.5A Pending CN109545189A (en) | 2018-12-14 | 2018-12-14 | A kind of spoken language pronunciation error detection and correcting system based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109545189A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060136225A1 (en) * | 2004-12-17 | 2006-06-22 | Chih-Chung Kuo | Pronunciation assessment method and system based on distinctive feature analysis |
CN101651788A (en) * | 2008-12-26 | 2010-02-17 | 中国科学院声学研究所 | Alignment system of on-line speech text and method thereof |
CN103366759A (en) * | 2012-03-29 | 2013-10-23 | 北京中传天籁数字技术有限公司 | Speech data evaluation method and speech data evaluation device |
CN103383845A (en) * | 2013-07-08 | 2013-11-06 | 上海昭鸣投资管理有限责任公司 | Multi-dimensional dysarthria measuring system and method based on real-time vocal tract shape correction |
CN106297828A (en) * | 2016-08-12 | 2017-01-04 | 苏州驰声信息科技有限公司 | The detection method of a kind of mistake utterance detection based on degree of depth study and device |
CN107203777A (en) * | 2017-04-19 | 2017-09-26 | 北京协同创新研究院 | audio scene classification method and device |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110488675A (en) * | 2019-07-12 | 2019-11-22 | 国网上海市电力公司 | A kind of substation's Abstraction of Sound Signal Characteristics based on dynamic time warpping algorithm |
CN110415679A (en) * | 2019-07-25 | 2019-11-05 | 北京百度网讯科技有限公司 | Voice error correction method, device, equipment and storage medium |
CN110457670A (en) * | 2019-07-25 | 2019-11-15 | 天津大学 | A method of it reducing the space of a whole page before printing based on machine learning and handles error rate |
US11328708B2 (en) | 2019-07-25 | 2022-05-10 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Speech error-correction method, device and storage medium |
CN110415679B (en) * | 2019-07-25 | 2021-12-17 | 北京百度网讯科技有限公司 | Voice error correction method, device, equipment and storage medium |
CN110598208A (en) * | 2019-08-14 | 2019-12-20 | 清华大学深圳研究生院 | AI/ML enhanced pronunciation course design and personalized exercise planning method |
CN110556093A (en) * | 2019-09-17 | 2019-12-10 | 浙江核新同花顺网络信息股份有限公司 | Voice marking method and system |
CN110556093B (en) * | 2019-09-17 | 2021-12-10 | 浙江同花顺智富软件有限公司 | Voice marking method and system |
CN111292769A (en) * | 2020-03-04 | 2020-06-16 | 苏州驰声信息科技有限公司 | Method, system, device and storage medium for correcting pronunciation of spoken language |
CN111833859B (en) * | 2020-07-22 | 2024-02-13 | 科大讯飞股份有限公司 | Pronunciation error detection method and device, electronic equipment and storage medium |
CN111833859A (en) * | 2020-07-22 | 2020-10-27 | 科大讯飞股份有限公司 | Pronunciation error detection method and device, electronic equipment and storage medium |
CN112215018A (en) * | 2020-08-28 | 2021-01-12 | 北京中科凡语科技有限公司 | Automatic positioning method and device for correction term pair, electronic equipment and storage medium |
CN112215018B (en) * | 2020-08-28 | 2021-08-13 | 北京中科凡语科技有限公司 | Automatic positioning method and device for correction term pair, electronic equipment and storage medium |
TWI767532B (en) * | 2021-01-22 | 2022-06-11 | 賽微科技股份有限公司 | A wake word recognition training system and training method thereof |
WO2022168102A1 (en) * | 2021-02-08 | 2022-08-11 | Rambam Med-Tech Ltd. | Machine-learning-based speech production correction |
CN112967538B (en) * | 2021-03-01 | 2023-09-15 | 郑州铁路职业技术学院 | English pronunciation information acquisition system |
CN112967538A (en) * | 2021-03-01 | 2021-06-15 | 郑州铁路职业技术学院 | English pronunciation information acquisition system |
CN115148225A (en) * | 2021-03-30 | 2022-10-04 | 北京猿力未来科技有限公司 | Intonation scoring method, intonation scoring system, computing device and storage medium |
CN112863486A (en) * | 2021-04-23 | 2021-05-28 | 北京一起教育科技有限责任公司 | Voice-based spoken language evaluation method and device and electronic equipment |
CN114758647A (en) * | 2021-07-20 | 2022-07-15 | 无锡柠檬科技服务有限公司 | Language training method and system based on deep learning |
CN114783412A (en) * | 2022-04-21 | 2022-07-22 | 山东青年政治学院 | Spanish spoken language pronunciation training correction method and system |
CN114783412B (en) * | 2022-04-21 | 2022-11-15 | 山东青年政治学院 | Spanish spoken language pronunciation training correction method and system |
CN116340489A (en) * | 2023-03-27 | 2023-06-27 | 齐齐哈尔大学 | Japanese teaching interaction method and device based on big data |
CN116340489B (en) * | 2023-03-27 | 2023-08-22 | 齐齐哈尔大学 | Japanese teaching interaction method and device based on big data |
CN116805495A (en) * | 2023-08-17 | 2023-09-26 | 北京语言大学 | Pronunciation deviation detection and action feedback method and system based on large language model |
CN116805495B (en) * | 2023-08-17 | 2023-11-21 | 北京语言大学 | Pronunciation deviation detection and action feedback method and system based on large language model |
CN116894442A (en) * | 2023-09-11 | 2023-10-17 | 临沂大学 | Language translation method and system for correcting guide pronunciation |
CN116894442B (en) * | 2023-09-11 | 2023-12-05 | 临沂大学 | Language translation method and system for correcting guide pronunciation |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190329 |