CN107039049A - A kind of data assessment educational system - Google Patents
A kind of data assessment educational system
- Publication number
- CN107039049A (application CN201710390762.0A)
- Authority
- CN
- China
- Prior art keywords
- voice
- sound
- audio
- voice messaging
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses a data assessment educational system, comprising: a voice pre-processing module, which acquires surrounding voice information containing both the spoken voice addressed to the system and the sounds around the speaker of that voice; the module separates the surrounding voice information into first voice information containing the spoken voice and second voice information containing the remaining sounds, compares the sound level of the first voice information with that of the second and, according to the comparison result, reproduces the response voice to the spoken voice using either a first reproduction method or a second reproduction method whose reproduction directivity differs from the first; the oral-English audio file to be assessed is randomly divided into slices of equal length. The invention recognizes and processes speech by means of speech-recognition technology, achieves high assessment accuracy and is highly extensible.
Description
Technical field
The present invention relates to a system, and specifically to a data assessment educational system.
Background art
Spoken-language data assessment educational systems have already appeared on the market, but the current products all use the following method: the student's spoken audio is first converted into text by speech-recognition technology, feature analysis is then performed on the recognized text, and finally a machine-learning algorithm produces the student's spoken-language assessment result. The biggest problems with this method lie in the speech-recognition stage and the subsequent feature-analysis stage. First, a high-precision English speech-recognition engine is expensive to develop, and at present only large technology companies and research institutes comparable to Google possess one. Second, the recognition result determines everything that follows, yet current English speech-recognition technology is sufficiently accurate only for standard pronunciation; for beginners whose pronunciation is not yet accurate, such as Chinese learners, recognition remains unsatisfactory. Finally, the feature-analysis stage requires experts in the field of oral-English examination to design the features, which consumes considerable manpower and material resources, and the results are still poor.
Summary of the invention
It is an object of the invention to provide a data assessment educational system that solves the problems raised in the background art above.
To achieve the above object, the present invention provides the following technical scheme:
A data assessment educational system comprises: a voice pre-processing module, which acquires surrounding voice information containing the spoken voice addressed to the data assessment educational system and the sounds around the speaker of that voice, separates the surrounding voice information into first voice information containing the spoken voice and second voice information containing the sounds other than the spoken voice, compares the sound level of the first voice information with the sound level of the second and, according to the comparison result, reproduces the response voice to the spoken voice using either a first reproduction method or a second reproduction method whose reproduction directivity differs from the first; the oral-English audio file to be assessed is randomly divided into slices of equal length. A convolutional-neural-network analysis module applies a short-time Fourier transform to each obtained audio slice to generate the corresponding two-dimensional time-frequency image, and then performs high-level abstraction on each time-frequency image to obtain the high-level abstract features of the slice. An assessment and feedback module analyses the high-level abstract features of each audio slice with a machine-learning model to obtain a score for each slice, and then averages all the scores to obtain the final oral-English assessment score.
As a further scheme of the invention: the duration of each random audio slice is 10 s.
As a further scheme of the invention: the voice-signal processing module performs, for every audio slice, time-domain analysis, frequency-domain analysis and cepstrum-domain analysis in sequence; an acoustic-parameter analysis module analyses and computes the acoustic parameters of each slice, the acoustic parameters including Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients and line spectral pair coefficients.
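As a rough illustration of the first acoustic parameter named above, a minimal Mel-frequency cepstral coefficient (MFCC) computation for a single frame might look like this; the filter count, FFT size and coefficient count are conventional defaults, not values taken from the patent:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):          # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(frame, sr, n_coeffs=13, n_filters=26):
    """MFCCs for one windowed frame: power spectrum -> mel energies -> log -> DCT."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    energies = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    return dct(np.log(energies + 1e-10), type=2, norm='ortho')[:n_coeffs]
```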
As a further scheme of the invention: the second reproduction method is a reproduction method having directivity towards the speaker; when the sound level of the first voice information is higher than the sound level of the second voice information, the response voice is reproduced using the first reproduction method, and when the sound level of the first voice information is lower than the sound level of the second voice information, the response voice is reproduced using the second reproduction method.
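The sound-level comparison that selects between the two reproduction methods can be sketched as below; RMS level in dB is one plausible reading of "sound level", and the method names are placeholders, since the patent defines neither:

```python
import numpy as np

def rms_level_db(x):
    """Root-mean-square level in dB (relative, not calibrated SPL)."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms + 1e-12)

def choose_reproduction(speaker_voice, ambient_sound):
    """First (non-directional) method when the speaker dominates,
    second (speaker-directed) method when the surroundings are louder."""
    if rms_level_db(speaker_voice) > rms_level_db(ambient_sound):
        return "method_1"   # first reproduction method
    return "method_2"       # method with directivity towards the speaker
```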
As a further scheme of the invention: the voice-signal processing module comprises the following modules: a time-domain analysis module, which analyses and extracts the time-domain characteristic parameters of each audio slice; a frequency-domain analysis module, which extracts the spectrum, power spectrum and spectral envelope of each slice by means of band-pass filter banks, Fourier transform methods, frequency-domain pitch detection and time-frequency representation methods; and a cepstrum-domain analysis module, which analyses and extracts the cepstrum-domain characteristic parameters of each slice through homomorphic processing, effectively separating the glottal-excitation information from the vocal-tract response information: the glottal-excitation information is used to distinguish voiced from unvoiced sound and to determine the pitch period, while the vocal-tract response information is used to determine formants and serves the coding, synthesis and recognition of speech.
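Determining the pitch period from the glottal-excitation (high-quefrency) part of the cepstrum, as described above, is classically done by peak-picking the real cepstrum. A minimal sketch, with the 60-400 Hz search range an assumption rather than a value from the patent:

```python
import numpy as np

def pitch_period_cepstrum(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the pitch period as the quefrency of the largest real-cepstrum
    peak inside the plausible voice range [1/fmax, 1/fmin] seconds."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-10))
    lo, hi = int(sr / fmax), int(sr / fmin)
    peak = lo + int(np.argmax(cepstrum[lo:hi]))
    return peak / sr  # pitch period in seconds
```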
Compared with the prior art, the beneficial effects of the invention are as follows: the invention recognizes and processes speech by means of speech-recognition technology, achieves high assessment accuracy and is highly extensible.
Embodiment
The technical schemes in the embodiments of the present invention are described below clearly and completely. Evidently, the described embodiments are only some, rather than all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of the embodiments of the invention, without creative work, fall within the scope of protection of the invention.
In an embodiment of the present invention, a data assessment educational system comprises: a voice pre-processing module, which acquires surrounding voice information containing the spoken voice addressed to the data assessment educational system and the sounds around the speaker of that voice, separates the surrounding voice information into first voice information containing the spoken voice and second voice information containing the sounds other than the spoken voice, compares the sound level of the first voice information with the sound level of the second and, according to the comparison result, reproduces the response voice to the spoken voice using either a first reproduction method or a second reproduction method whose reproduction directivity differs from the first; the oral-English audio file to be assessed is randomly divided into slices of equal length. A convolutional-neural-network analysis module applies a short-time Fourier transform to each obtained audio slice to generate the corresponding two-dimensional time-frequency image, and then performs high-level abstraction on each time-frequency image to obtain the high-level abstract features of the slice. An assessment and feedback module analyses the high-level abstract features of each audio slice with a machine-learning model to obtain a score for each slice, and then averages all the scores to obtain the final oral-English assessment score.
According to this composition, surrounding voice information is obtained that contains the spoken voice addressed to the voice dialogue device and the sounds around the speaker of that voice. The surrounding voice information is separated into first voice information containing the spoken voice and second voice information containing the sounds other than the spoken voice. The sound level of the first voice information is compared with the sound level of the second. According to the comparison result, the response voice is reproduced using either the first reproduction method or the second reproduction method, whose reproduction directivity differs from the first. The response voice can therefore be reproduced with a method matched to the situation around the speaker.
The duration of each random audio slice is 10 s.
The voice-signal processing module performs, for every audio slice, time-domain analysis, frequency-domain analysis and cepstrum-domain analysis in sequence; the acoustic-parameter analysis module analyses and computes the acoustic parameters of each slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
The second reproduction method is a reproduction method having directivity towards the speaker; when the sound level of the first voice information is higher than the sound level of the second voice information, the response voice is reproduced using the first reproduction method, and when the sound level of the first voice information is lower than the sound level of the second voice information, the response voice is reproduced using the second reproduction method.
The voice-signal processing module comprises the following modules: a time-domain analysis module, which analyses and extracts the time-domain characteristic parameters of each audio slice; a frequency-domain analysis module, which extracts the spectrum, power spectrum and spectral envelope of each slice by means of band-pass filter banks, Fourier transform methods, frequency-domain pitch detection and time-frequency representation methods; and a cepstrum-domain analysis module, which analyses and extracts the cepstrum-domain characteristic parameters of each slice through homomorphic processing, effectively separating the glottal-excitation information from the vocal-tract response information: the glottal-excitation information is used to distinguish voiced from unvoiced sound and to determine the pitch period, while the vocal-tract response information is used to determine formants and serves the coding, synthesis and recognition of speech.
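The homomorphic separation described above (low quefrencies approximate the vocal-tract response and spectral envelope, high quefrencies the glottal excitation) can be sketched with a simple cepstral lifter; the cutoff of 30 samples is an illustrative assumption:

```python
import numpy as np

def homomorphic_split(frame, cutoff=30):
    """Lifter the real cepstrum: the low-quefrency part approximates the
    vocal-tract (spectral-envelope) component, used e.g. for formants; the
    high-quefrency remainder carries the glottal excitation, used for pitch."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    cep = np.fft.irfft(np.log(spectrum + 1e-10))
    low = np.zeros_like(cep)
    low[:cutoff] = cep[:cutoff]
    low[-(cutoff - 1):] = cep[-(cutoff - 1):]   # mirror half of the even cepstrum
    high = cep - low                            # high-quefrency excitation part
    envelope = np.exp(np.fft.rfft(low).real)    # smoothed spectral envelope
    return envelope, high
```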
It is apparent to those skilled in the art that the invention is not restricted to the details of the exemplary embodiments above, and that the invention can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the invention is defined by the appended claims rather than by the description above, and all changes falling within the meaning and range of equivalency of the claims are intended to be embraced therein. Moreover, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical scheme; this manner of narration is adopted only for clarity. Those skilled in the art should treat the specification as a whole, and the technical schemes in the various embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.
Claims (5)
1. A data assessment educational system, characterised by comprising: a voice pre-processing module, which acquires surrounding voice information containing the spoken voice addressed to the data assessment educational system and the sounds around the speaker of the spoken voice, separates the surrounding voice information into first voice information containing the spoken voice and second voice information containing the sounds other than the spoken voice, compares the sound level of the first voice information with the sound level of the second voice information and, according to the comparison result, reproduces the response voice to the spoken voice using one of a first reproduction method and a second reproduction method whose reproduction directivity differs from the first; the oral-English audio file to be assessed is randomly divided into slices of equal length; a convolutional-neural-network analysis module, which applies a short-time Fourier transform to each obtained audio slice to generate the corresponding two-dimensional time-frequency image and then performs high-level abstraction on each time-frequency image to obtain the high-level abstract features of the slice; and an assessment and feedback module, which analyses the high-level abstract features of each audio slice with a machine-learning model to obtain a score for each slice, and then averages all the scores to obtain the final oral-English assessment score.
2. The data assessment educational system according to claim 1, characterised in that the duration of each random audio slice is 10 s.
3. The data assessment educational system according to claim 1, characterised in that the voice-signal processing module performs, for every audio slice, time-domain analysis, frequency-domain analysis and cepstrum-domain analysis in sequence; and an acoustic-parameter analysis module analyses and computes the acoustic parameters of each slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
4. The data assessment educational system according to claim 1, characterised in that the second reproduction method is a reproduction method having directivity towards the speaker; when the sound level of the first voice information is higher than the sound level of the second voice information, the response voice is reproduced using the first reproduction method, and when the sound level of the first voice information is lower than the sound level of the second voice information, the response voice is reproduced using the second reproduction method.
5. The data assessment educational system according to claim 1, characterised in that the voice-signal processing module comprises: a time-domain analysis module, which analyses and extracts the time-domain characteristic parameters of each audio slice; a frequency-domain analysis module, which extracts the spectrum, power spectrum and spectral envelope of each slice by means of band-pass filter banks, Fourier transform methods, frequency-domain pitch detection and time-frequency representation methods; and a cepstrum-domain analysis module, which analyses and extracts the cepstrum-domain characteristic parameters of each slice through homomorphic processing, effectively separating the glottal-excitation information from the vocal-tract response information: the glottal-excitation information is used to distinguish voiced from unvoiced sound and to determine the pitch period, while the vocal-tract response information is used to determine formants and serves the coding, synthesis and recognition of speech.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710390762.0A CN107039049A (en) | 2017-05-27 | 2017-05-27 | A kind of data assessment educational system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710390762.0A CN107039049A (en) | 2017-05-27 | 2017-05-27 | A kind of data assessment educational system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107039049A true CN107039049A (en) | 2017-08-11 |
Family
ID=59539931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710390762.0A Pending CN107039049A (en) | 2017-05-27 | 2017-05-27 | A kind of data assessment educational system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107039049A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113593529A (en) * | 2021-07-09 | 2021-11-02 | 北京字跳网络技术有限公司 | Evaluation method and device for speaker separation algorithm, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106067996A (en) * | 2015-04-24 | 2016-11-02 | 松下知识产权经营株式会社 | Voice reproduction method, voice dialogue device |
CN106653055A (en) * | 2016-10-20 | 2017-05-10 | 北京创新伙伴教育科技有限公司 | On-line oral English evaluating system |
- 2017-05-27: application CN201710390762.0A filed; published as CN107039049A (status: Pending)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106067996A (en) * | 2015-04-24 | 2016-11-02 | 松下知识产权经营株式会社 | Voice reproduction method, voice dialogue device |
CN106653055A (en) * | 2016-10-20 | 2017-05-10 | 北京创新伙伴教育科技有限公司 | On-line oral English evaluating system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113593529A (en) * | 2021-07-09 | 2021-11-02 | 北京字跳网络技术有限公司 | Evaluation method and device for speaker separation algorithm, electronic equipment and storage medium |
CN113593529B (en) * | 2021-07-09 | 2023-07-25 | 北京字跳网络技术有限公司 | Speaker separation algorithm evaluation method, speaker separation algorithm evaluation device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170811 |