CN107039049A - A kind of data assessment educational system - Google Patents
A kind of data assessment educational system
- Publication number
- CN107039049A (application CN201710390762.0A)
- Authority
- CN
- China
- Prior art keywords
- voice
- sound
- audio
- voice messaging
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses a data assessment educational system, comprising: a voice pre-processing module, which acquires surrounding voice information containing both the spoken voice addressed to the system and the sounds around the speaker of that voice; the module separates the surrounding voice information into first voice information containing the spoken voice and second voice information containing the remaining sounds, compares the sound level of the first voice information with that of the second and, according to the comparison result, reproduces the response voice to the spoken voice using either a first reproduction method or a second reproduction method whose reproduction directivity differs from the first; the oral-English audio file to be assessed is randomly divided into slices of equal length. The invention recognizes and processes speech by means of speech-recognition technology, achieves high assessment accuracy and is highly extensible.
Description
Technical field
The present invention relates to a system, and specifically to a data assessment educational system.
Background art
Spoken-language data assessment educational systems have already appeared on the market, but the current products all use the following method: the student's spoken audio is first converted into text by speech-recognition technology, feature analysis is then performed on the recognized text, and finally a machine-learning algorithm produces the student's spoken-language assessment result. The biggest problems with this method lie in the speech-recognition stage and the subsequent feature-analysis stage. First, a high-precision English speech-recognition engine is expensive to develop, and at present only large technology companies and research institutes comparable to Google possess one. Second, the recognition result determines everything that follows, yet current English speech-recognition technology is sufficiently accurate only for standard pronunciation; for beginners whose pronunciation is not yet accurate, such as Chinese learners, recognition remains unsatisfactory. Finally, the feature-analysis stage requires experts in the field of oral-English examination to design the features, which consumes considerable manpower and material resources, and the results are still poor.
Summary of the invention
It is an object of the invention to provide a data assessment educational system that solves the problems raised in the background art above.
To achieve the above object, the present invention provides the following technical scheme:
A data assessment educational system comprises: a voice pre-processing module, which acquires surrounding voice information containing the spoken voice addressed to the data assessment educational system and the sounds around the speaker of that voice, separates the surrounding voice information into first voice information containing the spoken voice and second voice information containing the sounds other than the spoken voice, compares the sound level of the first voice information with the sound level of the second and, according to the comparison result, reproduces the response voice to the spoken voice using either a first reproduction method or a second reproduction method whose reproduction directivity differs from the first; the oral-English audio file to be assessed is randomly divided into slices of equal length. A convolutional-neural-network analysis module applies a short-time Fourier transform to each obtained audio slice to generate the corresponding two-dimensional time-frequency image, and then performs high-level abstraction on each time-frequency image to obtain the high-level abstract features of the slice. An assessment and feedback module analyses the high-level abstract features of each audio slice with a machine-learning model to obtain a score for each slice, and then averages all the scores to obtain the final oral-English assessment score.
As a further scheme of the invention: the duration of each random audio slice is 10 s.
As a further scheme of the invention: the voice-signal processing module performs, for every audio slice, time-domain analysis, frequency-domain analysis and cepstrum-domain analysis in sequence; an acoustic-parameter analysis module analyses and computes the acoustic parameters of each slice, the acoustic parameters including Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients and line spectral pair coefficients.
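As a rough illustration of the first acoustic parameter named above, a minimal Mel-frequency cepstral coefficient (MFCC) computation for a single frame might look like this; the filter count, FFT size and coefficient count are conventional defaults, not values taken from the patent:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):          # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(frame, sr, n_coeffs=13, n_filters=26):
    """MFCCs for one windowed frame: power spectrum -> mel energies -> log -> DCT."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    energies = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    return dct(np.log(energies + 1e-10), type=2, norm='ortho')[:n_coeffs]
```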
As a further scheme of the invention: the second reproduction method is a reproduction method having directivity towards the speaker; when the sound level of the first voice information is higher than the sound level of the second voice information, the response voice is reproduced using the first reproduction method, and when the sound level of the first voice information is lower than the sound level of the second voice information, the response voice is reproduced using the second reproduction method.
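The sound-level comparison that selects between the two reproduction methods can be sketched as below; RMS level in dB is one plausible reading of "sound level", and the method names are placeholders, since the patent defines neither:

```python
import numpy as np

def rms_level_db(x):
    """Root-mean-square level in dB (relative, not calibrated SPL)."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms + 1e-12)

def choose_reproduction(speaker_voice, ambient_sound):
    """First (non-directional) method when the speaker dominates,
    second (speaker-directed) method when the surroundings are louder."""
    if rms_level_db(speaker_voice) > rms_level_db(ambient_sound):
        return "method_1"   # first reproduction method
    return "method_2"       # method with directivity towards the speaker
```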
As a further scheme of the invention: the voice-signal processing module comprises the following modules: a time-domain analysis module, which analyses and extracts the time-domain characteristic parameters of each audio slice; a frequency-domain analysis module, which extracts the spectrum, power spectrum and spectral envelope of each slice by means of band-pass filter banks, Fourier transform methods, frequency-domain pitch detection and time-frequency representation methods; and a cepstrum-domain analysis module, which analyses and extracts the cepstrum-domain characteristic parameters of each slice through homomorphic processing, effectively separating the glottal-excitation information from the vocal-tract response information: the glottal-excitation information is used to distinguish voiced from unvoiced sound and to determine the pitch period, while the vocal-tract response information is used to determine formants and serves the coding, synthesis and recognition of speech.
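Determining the pitch period from the glottal-excitation (high-quefrency) part of the cepstrum, as described above, is classically done by peak-picking the real cepstrum. A minimal sketch, with the 60-400 Hz search range an assumption rather than a value from the patent:

```python
import numpy as np

def pitch_period_cepstrum(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the pitch period as the quefrency of the largest real-cepstrum
    peak inside the plausible voice range [1/fmax, 1/fmin] seconds."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-10))
    lo, hi = int(sr / fmax), int(sr / fmin)
    peak = lo + int(np.argmax(cepstrum[lo:hi]))
    return peak / sr  # pitch period in seconds
```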
Compared with the prior art, the beneficial effects of the invention are as follows: the invention recognizes and processes speech by means of speech-recognition technology, achieves high assessment accuracy and is highly extensible.
Embodiment
The technical schemes in the embodiments of the present invention are described below clearly and completely. Evidently, the described embodiments are only some, rather than all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of the embodiments of the invention, without creative work, fall within the scope of protection of the invention.
In an embodiment of the present invention, a data assessment educational system comprises: a voice pre-processing module, which acquires surrounding voice information containing the spoken voice addressed to the data assessment educational system and the sounds around the speaker of that voice, separates the surrounding voice information into first voice information containing the spoken voice and second voice information containing the sounds other than the spoken voice, compares the sound level of the first voice information with the sound level of the second and, according to the comparison result, reproduces the response voice to the spoken voice using either a first reproduction method or a second reproduction method whose reproduction directivity differs from the first; the oral-English audio file to be assessed is randomly divided into slices of equal length. A convolutional-neural-network analysis module applies a short-time Fourier transform to each obtained audio slice to generate the corresponding two-dimensional time-frequency image, and then performs high-level abstraction on each time-frequency image to obtain the high-level abstract features of the slice. An assessment and feedback module analyses the high-level abstract features of each audio slice with a machine-learning model to obtain a score for each slice, and then averages all the scores to obtain the final oral-English assessment score.
According to this composition, surrounding voice information is obtained that contains the spoken voice addressed to the voice dialogue device and the sounds around the speaker of that voice. The surrounding voice information is separated into first voice information containing the spoken voice and second voice information containing the sounds other than the spoken voice. The sound level of the first voice information is compared with the sound level of the second. According to the comparison result, the response voice is reproduced using either the first reproduction method or the second reproduction method, whose reproduction directivity differs from the first. The response voice can therefore be reproduced with a method matched to the situation around the speaker.
The duration of each random audio slice is 10 s.
The voice-signal processing module performs, for every audio slice, time-domain analysis, frequency-domain analysis and cepstrum-domain analysis in sequence; the acoustic-parameter analysis module analyses and computes the acoustic parameters of each slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
The second reproduction method is a reproduction method having directivity towards the speaker; when the sound level of the first voice information is higher than the sound level of the second voice information, the response voice is reproduced using the first reproduction method, and when the sound level of the first voice information is lower than the sound level of the second voice information, the response voice is reproduced using the second reproduction method.
The voice-signal processing module comprises the following modules: a time-domain analysis module, which analyses and extracts the time-domain characteristic parameters of each audio slice; a frequency-domain analysis module, which extracts the spectrum, power spectrum and spectral envelope of each slice by means of band-pass filter banks, Fourier transform methods, frequency-domain pitch detection and time-frequency representation methods; and a cepstrum-domain analysis module, which analyses and extracts the cepstrum-domain characteristic parameters of each slice through homomorphic processing, effectively separating the glottal-excitation information from the vocal-tract response information: the glottal-excitation information is used to distinguish voiced from unvoiced sound and to determine the pitch period, while the vocal-tract response information is used to determine formants and serves the coding, synthesis and recognition of speech.
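The homomorphic separation described above (low quefrencies approximate the vocal-tract response and spectral envelope, high quefrencies the glottal excitation) can be sketched with a simple cepstral lifter; the cutoff of 30 samples is an illustrative assumption:

```python
import numpy as np

def homomorphic_split(frame, cutoff=30):
    """Lifter the real cepstrum: the low-quefrency part approximates the
    vocal-tract (spectral-envelope) component, used e.g. for formants; the
    high-quefrency remainder carries the glottal excitation, used for pitch."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    cep = np.fft.irfft(np.log(spectrum + 1e-10))
    low = np.zeros_like(cep)
    low[:cutoff] = cep[:cutoff]
    low[-(cutoff - 1):] = cep[-(cutoff - 1):]   # mirror half of the even cepstrum
    high = cep - low                            # high-quefrency excitation part
    envelope = np.exp(np.fft.rfft(low).real)    # smoothed spectral envelope
    return envelope, high
```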
It is apparent to those skilled in the art that the invention is not restricted to the details of the exemplary embodiments above, and that the invention can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the invention is defined by the appended claims rather than by the description above, and all changes falling within the meaning and range of equivalency of the claims are intended to be embraced therein. Moreover, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical scheme; this manner of narration is adopted only for clarity. Those skilled in the art should treat the specification as a whole, and the technical schemes in the various embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.
Claims (5)
1. A data assessment educational system, characterised by comprising: a voice pre-processing module, which acquires surrounding voice information containing the spoken voice addressed to the data assessment educational system and the sounds around the speaker of the spoken voice, separates the surrounding voice information into first voice information containing the spoken voice and second voice information containing the sounds other than the spoken voice, compares the sound level of the first voice information with the sound level of the second voice information and, according to the comparison result, reproduces the response voice to the spoken voice using one of a first reproduction method and a second reproduction method whose reproduction directivity differs from the first; the oral-English audio file to be assessed is randomly divided into slices of equal length; a convolutional-neural-network analysis module, which applies a short-time Fourier transform to each obtained audio slice to generate the corresponding two-dimensional time-frequency image and then performs high-level abstraction on each time-frequency image to obtain the high-level abstract features of the slice; and an assessment and feedback module, which analyses the high-level abstract features of each audio slice with a machine-learning model to obtain a score for each slice, and then averages all the scores to obtain the final oral-English assessment score.
2. The data assessment educational system according to claim 1, characterised in that the duration of each random audio slice is 10 s.
3. The data assessment educational system according to claim 1, characterised in that the voice-signal processing module performs, for every audio slice, time-domain analysis, frequency-domain analysis and cepstrum-domain analysis in sequence; and an acoustic-parameter analysis module analyses and computes the acoustic parameters of each slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
4. The data assessment educational system according to claim 1, characterised in that the second reproduction method is a reproduction method having directivity towards the speaker; when the sound level of the first voice information is higher than the sound level of the second voice information, the response voice is reproduced using the first reproduction method, and when the sound level of the first voice information is lower than the sound level of the second voice information, the response voice is reproduced using the second reproduction method.
5. The data assessment educational system according to claim 1, characterised in that the voice-signal processing module comprises: a time-domain analysis module, which analyses and extracts the time-domain characteristic parameters of each audio slice; a frequency-domain analysis module, which extracts the spectrum, power spectrum and spectral envelope of each slice by means of band-pass filter banks, Fourier transform methods, frequency-domain pitch detection and time-frequency representation methods; and a cepstrum-domain analysis module, which analyses and extracts the cepstrum-domain characteristic parameters of each slice through homomorphic processing, effectively separating the glottal-excitation information from the vocal-tract response information: the glottal-excitation information is used to distinguish voiced from unvoiced sound and to determine the pitch period, while the vocal-tract response information is used to determine formants and serves the coding, synthesis and recognition of speech.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710390762.0A CN107039049A (en) | 2017-05-27 | 2017-05-27 | A kind of data assessment educational system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710390762.0A CN107039049A (en) | 2017-05-27 | 2017-05-27 | A kind of data assessment educational system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107039049A true CN107039049A (en) | 2017-08-11 |
Family
ID=59539931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710390762.0A Pending CN107039049A (en) | 2017-05-27 | 2017-05-27 | A kind of data assessment educational system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107039049A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113593529A (en) * | 2021-07-09 | 2021-11-02 | 北京字跳网络技术有限公司 | Evaluation method and device for speaker separation algorithm, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106067996A (en) * | 2015-04-24 | 2016-11-02 | 松下知识产权经营株式会社 | Voice reproduction method, voice dialogue device |
CN106653055A (en) * | 2016-10-20 | 2017-05-10 | 北京创新伙伴教育科技有限公司 | On-line oral English evaluating system |
- 2017-05-27: application CN201710390762.0A filed; published as CN107039049A (status: Pending)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106067996A (en) * | 2015-04-24 | 2016-11-02 | 松下知识产权经营株式会社 | Voice reproduction method, voice dialogue device |
CN106653055A (en) * | 2016-10-20 | 2017-05-10 | 北京创新伙伴教育科技有限公司 | On-line oral English evaluating system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113593529A (en) * | 2021-07-09 | 2021-11-02 | 北京字跳网络技术有限公司 | Evaluation method and device for speaker separation algorithm, electronic equipment and storage medium |
CN113593529B (en) * | 2021-07-09 | 2023-07-25 | 北京字跳网络技术有限公司 | Speaker separation algorithm evaluation method, speaker separation algorithm evaluation device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170811 |