CN107039049A - A kind of data assessment educational system - Google Patents

A kind of data assessment educational system

Info

Publication number
CN107039049A
Authority
CN
China
Prior art keywords
voice
sound
audio
voice information
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710390762.0A
Other languages
Chinese (zh)
Inventor
杨高峰
孟军霞
朱炯圳
郭海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Renfeng Software Development Co Ltd
Original Assignee
Zhengzhou Renfeng Software Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Renfeng Software Development Co Ltd
Priority to CN201710390762.0A
Publication of CN107039049A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Quality & Reliability (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a data assessment educational system, comprising: a voice pre-processing module, which acquires surrounding speech information, the surrounding speech information containing both the spoken voice addressed to the data assessment educational system and the sounds around the speaker of that voice; the module separates the surrounding speech information into first voice information containing the spoken voice and second voice information containing the sound other than the spoken voice, compares the sound level of the first voice information with the sound level of the second voice information, and, according to the comparison result, reproduces the response voice to the spoken voice using one of a first reproduction method and a second reproduction method whose reproduced-voice directivity differs from that of the first; the spoken-English audio file to be evaluated is randomly divided into audio slices of equal length. The present invention processes speech through speech recognition technology; its assessment accuracy is high and its extensibility is strong.

Description

A kind of data assessment educational system
Technical field
The present invention relates to a system, and in particular to a data assessment educational system.
Background technology
Spoken-language data assessment educational systems have already appeared on the market, but current products all use the following method: the student's spoken audio is first transcribed into text using speech recognition technology, feature analysis is then performed on the recognized text, and finally a machine learning algorithm produces the student's spoken-language assessment result. The biggest problems with this method lie in the speech recognition stage and the subsequent feature analysis stage. First, a high-precision English speech recognition engine is expensive to develop; at present only large technology companies or research institutes of the scale of Google possess one. Second, the speech recognition result determines everything downstream, yet current English speech recognition is accurate enough only for standard pronunciation and remains unsatisfactory for learners whose pronunciation is not yet accurate (such as Chinese learners). Finally, the feature analysis stage requires experts in oral-English examination to design the features, which consumes considerable manpower and material resources, with poor results.
Summary of the invention
It is an object of the invention to provide a data assessment educational system that solves the problems raised in the background art above.
To achieve the above object, the present invention provides following technical scheme:
A data assessment educational system, comprising: a voice pre-processing module, which acquires surrounding speech information, the surrounding speech information containing the spoken voice addressed to the data assessment educational system and the sounds around the speaker of that voice; the module separates the surrounding speech information into first voice information containing the spoken voice and second voice information containing the sound other than the spoken voice, compares the sound level of the first voice information with the sound level of the second voice information, and, according to the comparison result, reproduces the response voice to the spoken voice using one of a first reproduction method and a second reproduction method whose reproduced-voice directivity differs from that of the first; the spoken-English audio file to be evaluated is randomly divided into audio slices of equal length. A convolutional neural network analysis module, which applies a short-time Fourier transform to each audio slice to generate a corresponding two-dimensional time-frequency image, and then performs high-level abstraction on the two-dimensional time-frequency images one by one to obtain the high-level abstract features of each audio slice. An assessment and feedback module, which analyses the high-level abstract features of the audio slices one by one with a machine learning model to obtain a score for each audio slice, and then averages all the scores to obtain the final spoken-English assessment score.
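As an illustrative sketch only (the specification discloses no concrete implementation), the slice-score-average pipeline described above might be expressed as follows; the sampling rate and the placeholder scoring model are assumptions introduced for the example:

```python
import numpy as np

SAMPLE_RATE = 16000      # assumed sampling rate (not specified in the patent)
SLICE_SECONDS = 10       # slice duration given as a "further scheme" below

def slice_audio(samples: np.ndarray, sr: int = SAMPLE_RATE,
                seconds: int = SLICE_SECONDS) -> list:
    """Divide a 1-D waveform into equal-length slices, dropping any remainder."""
    step = sr * seconds
    return [samples[i:i + step] for i in range(0, len(samples) - step + 1, step)]

def assess(samples: np.ndarray, score_slice) -> float:
    """Score every slice with the supplied model and average the results."""
    scores = [score_slice(s) for s in slice_audio(samples)]
    return float(np.mean(scores))

# Usage with a dummy scoring model that just measures mean signal energy;
# a real system would score each slice with the trained machine learning model.
audio = np.random.randn(SAMPLE_RATE * 35)           # 35 s of synthetic audio
final = assess(audio, score_slice=lambda s: float(np.mean(s ** 2)))
```

The remainder shorter than one slice is simply discarded here; the specification does not say how a trailing partial slice is handled.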
As a further scheme of the invention: the duration of each random audio slice is 10 s.
As a further scheme of the invention: the voice signal processing module performs time-domain analysis, frequency-domain analysis and cepstrum-domain analysis, in sequence, on all audio slices; the acoustic parameter analysis module analyses and computes the acoustic parameters of each audio slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
As a further scheme of the invention: the second reproduction method is a reproduction method with directivity toward the speaker; when the sound level of the first voice information is higher than the sound level of the second voice information, the response voice is reproduced using the first reproduction method, and when the sound level of the first voice information is lower than the sound level of the second voice information, the response voice is reproduced using the second reproduction method.
As a further scheme of the invention: the voice signal processing module comprises the following modules: a time-domain analysis module, which analyses and extracts the time-domain characteristic parameters of each audio slice; a frequency-domain analysis module, which extracts the spectrum, power spectrum and spectral envelope of each audio slice by the band-pass filter bank method, the Fourier transform method, frequency-domain pitch detection and time-frequency representation; and a cepstrum-domain analysis module, which analyses and extracts the cepstrum-domain characteristic parameters of each audio slice by homomorphic processing, effectively separating the glottal excitation information from the vocal tract response information: the glottal excitation information is used to judge voiced and unvoiced sounds and to determine the pitch period, while the vocal tract response information is used to determine formants, for speech coding, synthesis and recognition.
Compared with the prior art, the beneficial effects of the invention are: the present invention processes speech through speech recognition technology; its assessment accuracy is high and its extensibility is strong.
Embodiment
The technical scheme in the embodiments of the present invention is described below clearly and completely. Obviously, the described embodiments are only some, rather than all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative work fall within the scope protected by the present invention.
In an embodiment of the present invention, a data assessment educational system comprises: a voice pre-processing module, which acquires surrounding speech information, the surrounding speech information containing the spoken voice addressed to the data assessment educational system and the sounds around the speaker of that voice; the module separates the surrounding speech information into first voice information containing the spoken voice and second voice information containing the sound other than the spoken voice, compares the sound level of the first voice information with the sound level of the second voice information, and, according to the comparison result, reproduces the response voice to the spoken voice using one of a first reproduction method and a second reproduction method whose reproduced-voice directivity differs from that of the first; the spoken-English audio file to be evaluated is randomly divided into audio slices of equal length. A convolutional neural network analysis module, which applies a short-time Fourier transform to each audio slice to generate a corresponding two-dimensional time-frequency image, and then performs high-level abstraction on the two-dimensional time-frequency images one by one to obtain the high-level abstract features of each audio slice. An assessment and feedback module, which analyses the high-level abstract features of the audio slices one by one with a machine learning model to obtain a score for each audio slice, and then averages all the scores to obtain the final spoken-English assessment score.
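The short-time Fourier transform that turns an audio slice into a two-dimensional time-frequency image for the convolutional neural network can be sketched with plain NumPy; the window length and hop size below are conventional assumptions, not values taken from the specification:

```python
import numpy as np

def stft_image(x: np.ndarray, win: int = 400, hop: int = 160) -> np.ndarray:
    """Log-magnitude short-time Fourier transform: rows are frequency bins,
    columns are time frames, i.e. a 2-D time-frequency image."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window
              for i in range(0, len(x) - win + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1))   # (frames, bins)
    return np.log1p(spec).T                                # (bins, frames)

# A 1 kHz tone at 16 kHz should concentrate energy near bin 1000 / (16000/400) = 25.
sr = 16000
t = np.arange(sr) / sr
img = stft_image(np.sin(2 * np.pi * 1000 * t))
```

The resulting image would then be fed to the convolutional network for the layer-by-layer high-level abstraction the embodiment describes.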
According to the composition of the spoken voice, surrounding speech information is obtained that contains the spoken voice addressed to the voice dialogue device and the sounds around the speaker of that voice. The surrounding speech information is separated into first voice information containing the spoken voice and second voice information containing the sound other than the spoken voice. The sound level of the first voice information is compared with the sound level of the second voice information. According to the comparison result, the response voice is reproduced using one of a first reproduction method and a second reproduction method whose reproduced-voice directivity differs from that of the first. Thus, based on the comparison between the sound level of the first voice information, containing the spoken voice addressed to the voice dialogue device, and the sound level of the second voice information, containing the sound other than the spoken voice, the response voice can be reproduced with a method suited to the situation around the speaker.
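The sound-level comparison that selects between the two reproduction methods can be sketched as an RMS comparison in decibels; because the specification does not define how sound level is measured, the dB formulation and the example signals below are assumptions:

```python
import numpy as np

def sound_level_db(x: np.ndarray, eps: float = 1e-12) -> float:
    """RMS level in decibels (relative, since no reference level is given)."""
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + eps)

def choose_reproduction(voice_1: np.ndarray, voice_2: np.ndarray) -> str:
    """First method when the spoken voice dominates the surroundings,
    the directive second method otherwise."""
    if sound_level_db(voice_1) > sound_level_db(voice_2):
        return "first (non-directive) reproduction method"
    return "second (directive toward the speaker) reproduction method"

speech = 0.5 * np.random.randn(16000)    # louder: the spoken voice
noise = 0.05 * np.random.randn(16000)    # quieter: surrounding sound
method = choose_reproduction(speech, noise)
```

In other words, when the surroundings are louder than the speaker, the system would fall back to the reproduction method aimed directly at the speaker.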
The duration of each random audio slice is 10 s.
The voice signal processing module performs time-domain analysis, frequency-domain analysis and cepstrum-domain analysis, in sequence, on all audio slices. The acoustic parameter analysis module analyses and computes the acoustic parameters of each audio slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
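Of the acoustic parameters listed, the Mel-frequency cepstral coefficients can be sketched in plain NumPy as follows; the filter-bank size, FFT length and coefficient count are conventional defaults rather than values from the specification (a production system would typically use a library such as librosa):

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(frame: np.ndarray, n_coeffs=13, sr=16000):
    """MFCCs of one frame: power spectrum -> mel filter bank -> log -> DCT-II."""
    n_fft = 512
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft)) ** 2
    logmel = np.log(mel_filterbank(sr=sr) @ power + 1e-10)
    n = len(logmel)
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :]
                 * np.arange(n_coeffs)[:, None])
    return dct @ logmel

coeffs = mfcc(np.random.randn(400))      # one 25 ms frame at 16 kHz
```

The linear prediction cepstral coefficients and line spectral pair coefficients named alongside would be derived from linear predictive analysis rather than from the mel filter bank shown here.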
The second reproduction method is a reproduction method with directivity toward the speaker. When the sound level of the first voice information is higher than the sound level of the second voice information, the response voice is reproduced using the first reproduction method; when the sound level of the first voice information is lower than the sound level of the second voice information, the response voice is reproduced using the second reproduction method.
The voice signal processing module comprises the following modules: a time-domain analysis module, which analyses and extracts the time-domain characteristic parameters of each audio slice; a frequency-domain analysis module, which extracts the spectrum, power spectrum and spectral envelope of each audio slice by the band-pass filter bank method, the Fourier transform method, frequency-domain pitch detection and time-frequency representation; and a cepstrum-domain analysis module, which analyses and extracts the cepstrum-domain characteristic parameters of each audio slice by homomorphic processing, effectively separating the glottal excitation information from the vocal tract response information: the glottal excitation information is used to judge voiced and unvoiced sounds and to determine the pitch period, while the vocal tract response information is used to determine formants, for speech coding, synthesis and recognition.
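The homomorphic (cepstrum-domain) separation described above can be illustrated with a real cepstrum: the slowly varying vocal tract response occupies low quefrencies, while a periodic glottal excitation produces a peak at the quefrency equal to the pitch period. The synthetic signal and the voiced pitch-range bounds below are illustrative assumptions:

```python
import numpy as np

def real_cepstrum(x: np.ndarray) -> np.ndarray:
    """Real cepstrum via homomorphic processing: IFFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.fft(x)) + 1e-10
    return np.real(np.fft.ifft(np.log(spectrum)))

def pitch_period(x: np.ndarray, sr: int, fmin=60, fmax=400) -> int:
    """Pitch period in samples: quefrency of the cepstral peak in the voiced range."""
    c = real_cepstrum(x)
    lo, hi = sr // fmax, sr // fmin        # plausible pitch-period range
    return lo + int(np.argmax(c[lo:hi]))

# Synthetic voiced-like signal: 100 Hz impulse train through a decaying filter
# standing in for the vocal tract response.
sr, f0 = 16000, 100
excitation = np.zeros(sr // 2)
excitation[::sr // f0] = 1.0               # glottal pulses every 160 samples
tract = np.exp(-np.arange(64) / 8.0)       # crude vocal-tract impulse response
signal = np.convolve(excitation, tract)[:len(excitation)]
period = pitch_period(signal, sr)
```

Liftering the cepstrum below a cutoff quefrency would recover the vocal tract (formant) component, which is the separation the module exploits.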
It is obvious to a person skilled in the art that the invention is not restricted to the details of the above exemplary embodiments, and that the present invention can be realised in other specific forms without departing from its spirit or essential attributes. Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-restrictive; the scope of the present invention is limited by the appended claims rather than by the above description, and it is intended that all changes falling within the meaning and scope of equivalency of the claims be included in the present invention. Moreover, it should be understood that, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical scheme; this manner of narration is adopted only for clarity. A person skilled in the art should take the specification as a whole: the technical solutions in the various embodiments may also be suitably combined to form other embodiments understandable to those skilled in the art.

Claims (5)

1. A data assessment educational system, characterised by comprising: a voice pre-processing module, which acquires surrounding speech information, the surrounding speech information containing the spoken voice addressed to the data assessment educational system and the sounds around the speaker of that voice; the module separates the surrounding speech information into first voice information containing the spoken voice and second voice information containing the sound other than the spoken voice, compares the sound level of the first voice information with the sound level of the second voice information, and, according to the comparison result, reproduces the response voice to the spoken voice using one of a first reproduction method and a second reproduction method whose reproduced-voice directivity differs from that of the first; the spoken-English audio file to be evaluated is randomly divided into audio slices of equal length; a convolutional neural network analysis module, which applies a short-time Fourier transform to each audio slice to generate a corresponding two-dimensional time-frequency image, and then performs high-level abstraction on the two-dimensional time-frequency images one by one to obtain the high-level abstract features of each audio slice; and an assessment and feedback module, which analyses the high-level abstract features of the audio slices one by one with a machine learning model to obtain a score for each audio slice, and then averages all the scores to obtain the final spoken-English assessment score.
2. The data assessment educational system according to claim 1, characterised in that the duration of each random audio slice is 10 s.
3. The data assessment educational system according to claim 1, characterised in that the voice signal processing module performs time-domain analysis, frequency-domain analysis and cepstrum-domain analysis, in sequence, on all audio slices; and the acoustic parameter analysis module analyses and computes the acoustic parameters of each audio slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
4. The data assessment educational system according to claim 1, characterised in that the second reproduction method is a reproduction method with directivity toward the speaker; when the sound level of the first voice information is higher than the sound level of the second voice information, the response voice is reproduced using the first reproduction method, and when the sound level of the first voice information is lower than the sound level of the second voice information, the response voice is reproduced using the second reproduction method.
5. The data assessment educational system according to claim 1, characterised in that the voice signal processing module comprises the following modules: a time-domain analysis module, which analyses and extracts the time-domain characteristic parameters of each audio slice; a frequency-domain analysis module, which extracts the spectrum, power spectrum and spectral envelope of each audio slice by the band-pass filter bank method, the Fourier transform method, frequency-domain pitch detection and time-frequency representation; and a cepstrum-domain analysis module, which analyses and extracts the cepstrum-domain characteristic parameters of each audio slice by homomorphic processing, effectively separating the glottal excitation information from the vocal tract response information: the glottal excitation information is used to judge voiced and unvoiced sounds and to determine the pitch period, while the vocal tract response information is used to determine formants, for speech coding, synthesis and recognition.
CN201710390762.0A (filed 2017-05-27, priority 2017-05-27): A kind of data assessment educational system, published as CN107039049A (Pending)

Priority Applications (1)

Application Number: CN201710390762.0A; Priority Date: 2017-05-27; Filing Date: 2017-05-27; Title: A kind of data assessment educational system


Publications (1)

Publication Number: CN107039049A; Publication Date: 2017-08-11

Family

ID=59539931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710390762.0A Pending CN107039049A (en) 2017-05-27 2017-05-27 A kind of data assessment educational system

Country Status (1)

Country Link
CN (1) CN107039049A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106067996A (en) * 2015-04-24 2016-11-02 松下知识产权经营株式会社 Voice reproduction method, voice dialogue device
CN106653055A (en) * 2016-10-20 2017-05-10 北京创新伙伴教育科技有限公司 On-line oral English evaluating system


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113593529A (en) * 2021-07-09 2021-11-02 北京字跳网络技术有限公司 Evaluation method and device for speaker separation algorithm, electronic equipment and storage medium
CN113593529B (en) * 2021-07-09 2023-07-25 北京字跳网络技术有限公司 Speaker separation algorithm evaluation method, speaker separation algorithm evaluation device, electronic equipment and storage medium

Similar Documents

Muhammad et al. E-hafiz: Intelligent system to help muslims in recitation and memorization of Quran
CN101751919B (en) Spoken Chinese stress automatic detection method
Sinith et al. Emotion recognition from audio signals using Support Vector Machine
Koolagudi et al. Two stage emotion recognition based on speaking rate
Muhammad et al. Voice content matching system for quran readers
CN102655003B (en) Method for recognizing emotion points of Chinese pronunciation based on sound-track modulating signals MFCC (Mel Frequency Cepstrum Coefficient)
CN106057192A (en) Real-time voice conversion method and apparatus
Fukuda et al. Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition
Tóth et al. Speech emotion perception by human and machine
CN109102800A (en) A kind of method and apparatus that the determining lyrics show data
CN106548785A (en) A kind of method of speech processing and device, terminal unit
CN109300339A (en) A kind of exercising method and system of Oral English Practice
Pervaiz et al. Emotion recognition from speech using prosodic and linguistic features
Wester et al. Evaluating comprehension of natural and synthetic conversational speech
Nagano et al. Data augmentation based on vowel stretch for improving children's speech recognition
CN114550706A (en) Smart campus voice recognition method based on deep learning
Lanjewar et al. Speech emotion recognition: a review
KR20080018658A (en) Pronunciation comparation system for user select section
Hillenbrand et al. Perception of sinewave vowels
CN107039049A (en) A kind of data assessment educational system
Nagaraja et al. Mono and cross lingual speaker identification with the constraint of limited data
Hanani et al. Speech-based identification of social groups in a single accent of British English by humans and computers
Singhal et al. wspire: A parallel multi-device corpus in neutral and whispered speech
Mahmood et al. Multidirectional local feature for speaker recognition
Waghmare et al. A Comparative Study of the Various Emotional Speech Databases

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2017-08-11)