CN113205729A - Foreign student-oriented speech evaluation method, device and system - Google Patents
- Publication number: CN113205729A (application number CN202110389484.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G09B19/06: Teaching of foreign languages
- G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/216: Parsing using statistical methods
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates
- G06Q10/06393: Score-carding, benchmarking or key performance indicator [KPI] analysis
- G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L25/87: Detection of discrete points within a voice signal
Abstract
The invention discloses a speech evaluation method, device and system for foreign students, comprising the following steps: acquiring the speech voice data of a lecturer; performing voice recognition on the speech voice data to obtain recognized text data; extracting features from the speech voice data and the recognized text data respectively to obtain quantifiable evaluation features, the quantifiable evaluation features comprising voice scoring features and speech piece (discourse) scoring features; and evaluating the speech voice data according to the quantifiable evaluation features to obtain an evaluation result of the speech. The invention provides an objective, professional, universal and systematic spoken-language ability evaluation standard for Chinese learners, and promotes both the improvement of learners' abilities and the development of the Chinese-language education industry.
Description
Technical Field
The invention relates to the Chinese-language education industry and to the technical field of speech signal processing, and in particular to a speech evaluation method, device and system for foreign students.
Background
During the current period of growth in Chinese-language education, most Chinese learners study systematically in schools, where both textbooks and teachers' vocabulary lean heavily toward written language, and spoken language is taught and used relatively little. As a result, two problems are common among Chinese learners: first, a lack of opportunity for spoken-language practice; second, some learners want to practice speaking but cannot keep it up because they receive no evaluation feedback. Platforms and resources for traditional spoken-Chinese practice are scarce on the market, because spoken language is a highly individualized subject without a uniform reference standard: each person expresses themselves differently, so the evaluation feedback differs accordingly.
Disclosure of Invention
The main object of the invention is to provide a speech evaluation method, device and system for foreign students, which provide an objective, professional, universal and systematic spoken-language ability evaluation standard for Chinese learners and promote both the improvement of learners' own abilities and the development of the Chinese-language education industry.
The invention adopts the following technical scheme:
in a first aspect, a speech evaluation method for foreign students includes:
acquiring speech data of a lecturer;
carrying out voice recognition on the speech voice data to obtain recognition text data;
extracting features from the speech voice data and the recognized text data respectively to obtain quantifiable evaluation features, the quantifiable evaluation features comprising voice scoring features and speech piece scoring features; the voice scoring features comprise a fluency feature, a validity feature, a speech rate feature and a voice basic-score feature; the speech piece scoring features comprise a subject-predicate-object syntactic analysis feature, a mixed sentence-pattern feature, a vocabulary-appropriateness feature, a language framework score feature, an emotional health feature, a topic conformity feature, a spoken-expression thinking feature and a speech piece basic-score feature;
and evaluating the speech data of the speech according to the quantifiable evaluation characteristics to obtain an evaluation result of the speech.
Preferably, the fluency feature is extracted as follows:
counting the number of times TD that pauses occur in the speech voice data; a pause is detected by applying a VAD algorithm to perform endpoint detection on the speech voice data, obtaining the endpoint positions of the voice, and calculating the duration between two adjacent endpoints: when this duration exceeds a set threshold h_t, a pause has occurred;
designing and outputting a speech fluency score S based on the number TD of pauses; the value of S is inversely related to the number of pauses, i.e. the fewer the pauses, the larger the value of S.
Preferably, the validity feature is extracted as follows:
counting the number of filler words in the recognized text data and the number of occurrences of invalid repeated speech text; specifically, counting the number I of filler words in the text based on a filler-word list, and counting the number of occurrences J of invalid repeated speech text based on a rule method;
taking the number of filler words and the number of occurrences of invalid repeated speech text as the validity feature.
Preferably, the method for extracting the speech rate features includes:
acquiring the text length L of the recognition text data and the audio length T of the speech voice data, and calculating the speech speed L/T;
and taking the speech rate L/T as the speech rate characteristic.
Preferably, the voice basic-score feature is extracted as follows:
obtaining the audio length T of the speech voice data, and computing the ratio of the audio length T to a preset threshold h_vt, where the preset threshold h_vt is the audio length at which the speaker obtains the full voice basic score;
taking the ratio of the audio length T to the preset threshold h_vt as the voice basic-score feature.
Preferably, the subject-predicate-object syntactic analysis feature is extracted as follows:
counting the number G of sentences, among the N sentences of the recognized text data, that conform to the standard grammatical structure, to obtain the sentence-structure standard rate G/N;
taking the sentence-structure standard rate G/N as the subject-predicate-object syntactic analysis feature.
Preferably, the mixed sentence-pattern feature is extracted as follows:
counting the number M of sentences, among the N sentences of the recognized text data, that conform to a standard sentence pattern, to obtain the sentence-pattern standard rate M/N;
taking the sentence-pattern standard rate M/N as the mixed sentence-pattern feature.
Preferably, the vocabulary-appropriateness feature is extracted as follows:
counting the number W of text errors in the N sentences of the recognized text data;
taking the number of text errors W as the vocabulary-appropriateness feature.
Preferably, the language framework score feature is extracted as follows:
splitting the recognized text data into an array with the sentence as the unit, segmenting each sentence into words, matching the word-segmented array against a language-frame dictionary, and counting the number F of sentences that conform to the language frame;
taking the number F of sentences conforming to the language frame as the language framework score feature.
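As an illustrative sketch only (not the patent's implementation), the frame-matching step can be approximated in Python by substring matching against a small, hypothetical frame dictionary; the patent matches word-segmented sentences against its own frame lexicon:

```python
import re

def framework_score(text, frame_dict):
    """Count sentences F that match the language-frame dictionary.

    Sketch: split the recognized text into a sentence array on Chinese
    and Western sentence stops, then treat a sentence as conforming if
    it contains any frame phrase.
    """
    sentences = [s for s in re.split(r"[。！？.!?]", text) if s.strip()]
    return sum(1 for s in sentences if any(p in s for p in frame_dict))

# Hypothetical frame phrases ("first", "second", "finally").
frames = ["首先", "其次", "最后"]
text = "首先我想说周末。其次我喜欢运动。今天天气很好。"
print(framework_score(text, frames))  # 2
```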
Preferably, the emotional health feature is extracted as follows:
first, word vectors are trained with word2vec on a large-scale corpus; the recognized text data is then segmented into words, and the word vector word_i corresponding to each word is looked up. The difference in direction between a word vector from the recognized text and a word vector from the violation dictionary is measured by the cosine distance, computed as cos(word1, word2) = (Σ_{k=1}^{D} word1_k · word2_k) / (√(Σ_{k=1}^{D} word1_k²) · √(Σ_{k=1}^{D} word2_k²)), where D is the word-vector dimension, word1_k is the k-th component of a text word vector, and word2_k is the k-th component of a word vector in the violation dictionary library;
taking the cosine distance as the emotional health feature.
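The cosine measure itself is standard; a minimal Python sketch over plain lists (real word vectors would come from a trained word2vec model, e.g. via gensim, which is not shown here):

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between two D-dimensional word vectors:
    dot(v1, v2) / (||v1|| * ||v2||)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return dot / (n1 * n2)

# Parallel vectors give similarity ~1; orthogonal vectors give 0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```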
Preferably, the topic conformity feature is extracted as follows:
from the number P of topic words matched in the recognized text data and the total number P_all of topic words, computing the topic conformity rate P/P_all between the recognized text data and the topic;
taking the topic conformity rate P/P_all as the topic conformity feature.
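A minimal sketch of the P/P_all ratio, assuming a hypothetical topic-word list and simple substring matching (the patent does not specify the matching rule):

```python
def topic_conformity(text, topic_words):
    """P / P_all: the fraction of the P_all topic words that are
    matched (here: found as substrings) in the recognized text."""
    matched = sum(1 for w in topic_words if w in text)
    return matched / len(topic_words)

# Hypothetical topic-word list for the topic "my weekend".
words = ["周末", "朋友", "电影", "公园"]
print(topic_conformity("这个周末我和朋友去看电影。", words))  # 3 of 4 -> 0.75
```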
Preferably, the spoken-expression thinking feature is extracted as follows:
matching the recognized text data, sentence by sentence, against a written-language dictionary, and counting the number Wr of written-language sentences;
taking the number Wr of written-language sentences as the spoken-expression thinking feature.
Preferably, the speech piece basic-score feature is extracted as follows:
taking the length of the recognized text data as the basis of the user's speech piece basic score Q: if the standard text length is L_standard, the standard speech piece basic score is score_text_basic, and the length of the recognized text data is L, the user's speech piece basic score is computed as Q = (L / L_standard) × score_text_basic;
taking the user's speech piece basic score Q as the speech piece basic-score feature.
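Read as a length-proportional scaling of the standard score, the step can be sketched as follows; the cap at the full standard score is my assumption, not stated in the text:

```python
def discourse_base_score(text_len, standard_len, standard_score, cap=True):
    """Q = (L / L_standard) * score_text_basic, optionally capped so a
    text longer than the standard cannot exceed the standard score
    (the cap is an assumption, not stated in the patent)."""
    q = text_len / standard_len * standard_score
    return min(q, standard_score) if cap else q

print(discourse_base_score(150, 200, 10.0))  # 7.5
print(discourse_base_score(260, 200, 10.0))  # capped at 10.0
```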
Preferably, evaluating the speech voice data according to the quantifiable evaluation features to obtain an evaluation result of the speech includes:
obtaining a score for each of the extracted features, namely the fluency, validity, speech rate, voice basic-score, subject-predicate-object syntactic analysis, mixed sentence-pattern, vocabulary-appropriateness, language framework score, emotional health, topic conformity, spoken-expression thinking and speech piece basic-score features, and summing them into a total score, thereby evaluating the speech voice data and obtaining the evaluation result of the speech.
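The aggregation step is a plain sum over per-feature scores; the feature names and score values below are illustrative only, not values from the patent:

```python
def total_score(feature_scores):
    """Sum the per-feature scores into the overall evaluation score."""
    return sum(feature_scores.values())

# Illustrative per-feature scores (not values from the patent).
scores = {
    "fluency": 8.0, "validity": 7.0, "speech_rate": 9.0,
    "voice_basic": 10.0, "syntax": 8.5, "discourse_basic": 7.5,
}
print(total_score(scores))  # 50.0
```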
In another aspect, a speech evaluation device for foreign students comprises:
the voice data acquisition module is used for acquiring the speech data of the speaker;
the speech data recognition module is used for carrying out speech recognition on the speech data of the speech to obtain recognition text data;
the feature extraction module is used for extracting features from the speech voice data and the recognized text data respectively to obtain quantifiable evaluation features, the quantifiable evaluation features comprising voice scoring features and speech piece scoring features; the voice scoring features comprise a fluency feature, a validity feature, a speech rate feature and a voice basic-score feature; the speech piece scoring features comprise a subject-predicate-object syntactic analysis feature, a mixed sentence-pattern feature, a vocabulary-appropriateness feature, a language framework score feature, an emotional health feature, a topic conformity feature, a spoken-expression thinking feature and a speech piece basic-score feature;
and the evaluation module is used for evaluating the speech data according to the quantifiable evaluation characteristics to obtain the evaluation result of the speech.
In another aspect, a speech evaluation system for foreign students includes:
the client is used for acquiring the speech data of the speaker;
the server is used for receiving the speech voice data sent by the client; performing voice recognition on the speech voice data to obtain recognized text data; extracting features from the speech voice data and the recognized text data respectively to obtain quantifiable evaluation features, the quantifiable evaluation features comprising voice scoring features and speech piece scoring features; the voice scoring features comprise a fluency feature, a validity feature, a speech rate feature and a voice basic-score feature; the speech piece scoring features comprise a subject-predicate-object syntactic analysis feature, a mixed sentence-pattern feature, a vocabulary-appropriateness feature, a language framework score feature, an emotional health feature, a topic conformity feature, a spoken-expression thinking feature and a speech piece basic-score feature; evaluating the speech voice data according to the quantifiable evaluation features to obtain an evaluation result of the speech; and sending the evaluation result to the client for display.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method gives Chinese learners an objective, professional, universal and systematic standard for spoken-language ability, with the computation of each evaluation feature designed by a corpus-based method; the invention promotes the improvement of Chinese learners' own abilities and the development of the Chinese-language education industry;
(2) After studying the spoken-practice needs of today's Chinese learners, the device and system of the invention were designed around the learning concept that "language is not learned but practiced". The invention is a spoken-language practice system that is convenient to operate, flexible to apply and integrated in design: with nothing more than a smartphone, practice can be carried out on a WeChat mini program anytime and anywhere. The invention meets the individual needs of today's Chinese learners and guides users' spoken language in a targeted manner; it differs from common spoken-dialogue practice, which is used casually and aims only at simple daily communication. The invention can not only train Chinese learners' spoken ability but also raise other skills such as their ability to think in Chinese; it aims to give students opportunities to speak Chinese, avoid the phenomenon of "mute Chinese", train their spoken expression, and ultimately develop all four language skills of listening, speaking, reading and writing.
The above description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the description of the technical means more comprehensible.
The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Fig. 1 is an interface schematic diagram of the speech evaluation method for foreign students according to an embodiment of the present invention;
Fig. 2 is a flowchart of the speech evaluation method for foreign students according to an embodiment of the present invention;
Fig. 3 is a block diagram of the speech evaluation device for foreign students according to an embodiment of the present invention;
Fig. 4 is a block diagram of the speech evaluation system for foreign students according to an embodiment of the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Referring to fig. 1 and 2, the speech evaluation method for foreign students of the present invention includes the following steps: (1) acquiring text data of the user's voice; (2) calculating the speech-ability evaluation indexes.
The method comprises the following steps: text data of a user's voice is acquired.
The method specifically comprises the following steps:
s201, obtaining speech data of a speaker;
s202, carrying out voice recognition on the speech voice data to obtain recognition text data.
In the invention, for the speech evaluation method for foreign students, scene corpora corresponding to three design dimensions are prepared: topics at different levels, prompts related to each topic, and model texts and written-expression corpora for each topic.
1. Topic design at different levels. 46 corpora are designed (the corpus can be expanded as needed; the following is only an example), with topics such as "my weekend" and "my vacation arrangement"; level one is simple, and the difficulty rises with each level. The topic design at different levels is shown in Table 1 below, where the Difficulty column represents the level and the Theme column represents the topic.
TABLE 1
2. Corpus design of topic-related prompts. Five sets of prompt corpora for the different topics are designed (the prompt corpora can be expanded as needed; the following is only an example). The Tips 1-5 columns represent the prompt corpora associated with the corresponding topics, as shown in Table 2 below.
TABLE 2
3. Model-text and written-expression corpus design for the topics. 46 sets of corpora are designed, shown schematically in the table below (the corpora can be expanded as needed; the following is only an example). The Model column is a model text and the Expression column is a written expression, used for similarity comparison between the user's speech and the text, as shown in Table 3 below.
TABLE 3
The designed short-speech content is presented to users, yielding three different session data sets containing the corresponding speech audio, text evaluation, speech scores and the like. Taking part of a user session data set as an example, with the topic "my weekend", see Table 4 below:
TABLE 4
As the table above shows, it contains the actual performance of three users in the speech evaluation system. Different users may choose different topics, and their language expression abilities and levels differ, so the total word counts and speech rates differ, the content differs markedly, and the per-feature scores and total scores differ as well.
The second step is that: and calculating the speech capability evaluation index.
The method specifically comprises the following steps:
s203, respectively carrying out feature extraction on the speech data and the recognition text data to obtain quantifiable evaluation features, wherein the quantifiable evaluation features comprise speech scoring features and speech piece scoring features; the voice scoring characteristics comprise fluency characteristics, effectiveness characteristics, speech speed characteristics and voice basic scoring characteristics; the sentence scoring characteristics comprise a principal and predicate analysis characteristic, a mixed sentence pattern characteristic, a proper vocabulary use characteristic, a language framework scoring characteristic, a healthy emotion characteristic, a consistent theme characteristic, a spoken language expression thinking characteristic and a sentence basic scoring characteristic;
and S204, evaluating the speech data according to the quantifiable evaluation characteristics to obtain the evaluation result of the speech.
In this embodiment, the fluency feature is extracted as follows:
counting the number of times pauses occur in the speech voice data; specifically, endpoint detection is performed on the speech voice data with a VAD algorithm, a threshold h_t is set, and a gap between endpoints longer than h_t indicates a pause;
outputting the speech fluency S as the fluency feature based on the number of pauses; the value range of S is [0,1], inversely related to the number of pauses, i.e. the fewer the pauses, the larger S.
Specifically, the fluency feature (index) measures a Chinese learner's thinking power and language-organization ability: it scores the fluency of the spoken text and the user's input voice, with the scoring dimensions based mainly on the pauses, speech rate and intonation of the voice. A VAD algorithm performs endpoint detection on the voice signal to obtain the endpoint positions, and the duration between two adjacent endpoints is computed; a duration exceeding the set threshold h_t indicates a pause. A fluency judgment model is obtained by training a recurrent neural network on an existing data set; given the user's input voice, it outputs the speech fluency S, whose value range is [0,1]; the larger the value, the better.
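A toy stand-in for the VAD step (a production VAD such as WebRTC's operates on audio frames; here pauses are runs of low-energy frames longer than a frame-count threshold standing in for h_t, and the mapping from the pause count TD to S is one arbitrary choice of the inverse relation, not the patent's trained model):

```python
def count_pauses(frame_energies, energy_thresh, min_pause_frames):
    """TD: number of runs of consecutive low-energy frames longer
    than min_pause_frames (a frame-count stand-in for h_t)."""
    pauses, run = 0, 0
    for e in frame_energies:
        if e < energy_thresh:
            run += 1
        else:
            if run > min_pause_frames:
                pauses += 1
            run = 0
    if run > min_pause_frames:
        pauses += 1
    return pauses

def fluency(td, k=0.1):
    """Map the pause count TD into (0, 1], decreasing in TD; one
    simple choice of the inverse relation described in the text."""
    return 1.0 / (1.0 + k * td)

energies = [0.9, 0.8, 0.0, 0.0, 0.0, 0.7, 0.9, 0.0, 0.8]
td = count_pauses(energies, 0.1, 2)  # one run of 3 low frames -> 1 pause
print(td, fluency(td))
```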
In this embodiment, the validity feature is extracted as follows:
counting the number of filler words in the recognized text data and the number of occurrences of invalid repeated speech text; specifically, counting the number I of filler words in the text based on a filler-word list, and counting the number of occurrences J of invalid repeated speech text based on a rule method;
taking the number of filler words and the number of occurrences of invalid repeated speech text as the validity feature.
Specifically, the validity feature (index) measures how often the Chinese learner produces filler words and invalid repeated speech while speaking. In spoken Chinese, when a learner's expression is not fluent, words such as "e (额)", "en (嗯)" and "o (哦)" are uttered; these are called filler words, and speech text repeated several times in a row is called invalid repeated speech text. For the text converted from the user's voice, the number I of filler words is counted against the filler-word list, and the number of occurrences J of invalid repeated speech text is counted by a rule method. Both counts range over [0, ∞); the smaller the value, the better.
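A sketch of both counts over an already word-segmented token list; the repetition rule here (the same token two or more times in a row) is one plausible reading of the unspecified "rule method":

```python
def validity_features(tokens, filler_words):
    """Return (I, J): I counts filler-word tokens; J counts runs in
    which the same token appears two or more times in a row (a simple
    rule-based notion of invalid repeated speech text)."""
    i = sum(1 for t in tokens if t in filler_words)
    j, run = 0, 1
    for prev, cur in zip(tokens, tokens[1:]):
        if cur == prev:
            run += 1
        else:
            if run >= 2:
                j += 1
            run = 1
    if run >= 2:
        j += 1
    return i, j

fillers = {"额", "嗯", "哦"}  # the filler words named in the text
toks = ["额", "我", "我", "喜欢", "嗯", "周末"]
print(validity_features(toks, fillers))  # (2, 1)
```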
In this embodiment, the speech rate feature is extracted as follows:
obtaining the text length L of the recognized text data and the audio length T of the speech voice data, and computing the speech rate L/T;
taking the speech rate L/T as the speech rate feature.
Specifically, the speech rate feature (index) measures the learner's spoken-Chinese proficiency. The user's evaluation text is obtained, the input text length L and input audio length T are obtained, and the speech rate L/T is computed. The speech rate ranges over [0, ∞); a value within [v_low, v_high] is preferred, where v_low is the required minimum speech rate threshold and v_high the required maximum speech rate threshold.
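The computation is a single ratio plus a band check; a minimal sketch with assumed units (characters per second) and assumed band limits:

```python
def speech_rate(text_len, audio_seconds):
    """Speech rate = recognized-text length L / audio length T."""
    return text_len / audio_seconds

def rate_in_band(rate, v_low, v_high):
    """True when the rate lies in the preferred band [v_low, v_high]."""
    return v_low <= rate <= v_high

r = speech_rate(240, 60.0)  # 240 characters over 60 s -> 4.0 per second
print(r, rate_in_band(r, 2.0, 5.0))
```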
In this embodiment, the method for extracting the speech basic feature includes:
obtaining the audio length T of the speech data, obtaining the audio length T and a preset threshold hvtThe ratio of (A) to (B); the preset threshold hvtThe audio length of the speech of the voice basic score can be obtained for the speaker;
the audio length T is compared with a preset threshold value hvtThe ratio of (a) is used as a speech basis feature.
Specifically, the speech basic score characteristic (index) measures the learner's basic spoken-language ability. A threshold h_vt is set for the length of the audio input by the user; when the user's audio length exceeds h_vt, the user is given the full speech basic score. The speech length takes values in [0, ∞), and larger values are better.
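A sketch of the ratio T / h_vt, assuming (as the text implies) the score is capped at full once the audio reaches the threshold length; the h_vt default is illustrative:

```python
def voice_basic_feature(audio_seconds, h_vt=30.0):
    """Ratio T / h_vt, capped at 1.0 (full score once the audio length
    reaches the threshold h_vt). The default h_vt is an assumed value."""
    return min(audio_seconds / h_vt, 1.0)
```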
In this embodiment, the method for extracting the subject-verb-object syntactic analysis characteristic includes:
counting the number G of sentences which accord with the grammar structure information standard in the N sentences of the recognized text data to obtain the text sentence structure standard rate G/N;
and taking the text sentence structure standard rate G/N as the subject-verb-object syntactic analysis characteristic.
Specifically, the subject-verb-object syntactic analysis characteristic (index) measures the user's command of Chinese grammar. The evaluation text input by the user is split into an array of sentences, and each sentence is further segmented into words. Each sentence is then parsed with a dependency syntax tree to determine its syntactic structure, i.e. the dependency relations between the words in the sentence, from which the sentence's subject-verb-object (plus attribute, adverbial and complement) structure is extracted. A dictionary of correct syntactic structures is compiled by an expert method combined with statistics. The structure information extracted sentence by sentence is matched against this dictionary for an inclusion relation: if a match is found, the sentence is regarded as meeting the syntactic structure standard; if not, the system judges that the syntactic structure is problematic. For the N sentences of text input by the user, the number G of sentences meeting the grammar structure standard is counted to obtain the text sentence structure standard rate G/N, which takes values in [0,1]; larger values are better.
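Assuming sentence structures have already been extracted by a dependency parser, the dictionary-matching step might look like the following sketch; the structure labels and dictionary entries are hypothetical placeholders:

```python
# G/N sketch: each sentence is represented by its extracted structure
# signature (e.g. dependency relation labels); signatures found in the
# correct-structure dictionary count toward G. Entries are illustrative.
CORRECT_STRUCTURES = {("SBV", "VOB"), ("SBV", "VOB", "ATT")}

def structure_standard_rate(sentence_structures):
    """Fraction G/N of sentences whose structure matches the dictionary."""
    n = len(sentence_structures)
    g = sum(1 for s in sentence_structures if tuple(s) in CORRECT_STRUCTURES)
    return g / n if n else 0.0
```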
In this embodiment, the method for extracting the features of the mixed sentence pattern includes:
counting the number M of sentences which accord with a standard sentence pattern in the N-sentence texts of the identification text data to obtain the standard rate M/N of the text sentence pattern;
and taking the standard rate M/N of the text sentence pattern as the mixed sentence pattern characteristic.
Specifically, the mixed sentence pattern characteristic (index) judges the user's ability to apply Chinese grammar. A dictionary of standard sentence patterns is compiled from the standard patterns given in the International Chinese Teaching General Course Outline. The text input by the user is analysed sentence by sentence: each sentence is segmented into words and analysed from three angles, namely subject-verb-object (attribute/adverbial/complement) structure, part of speech, and keywords. The information from these three dimensions is matched exactly against the standard-pattern dictionary; if a standard pattern is matched, the sentence is regarded as a standard sentence pattern, otherwise it is not. For the N sentences of text input by the user, the number M of sentences conforming to a standard pattern is counted to obtain the text sentence pattern standard rate M/N, which takes values in [0,1]; larger values are better.
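The three-dimension exact match against a standard-pattern dictionary can be sketched as follows; the pattern triples below are invented placeholders, not entries from the actual outline:

```python
# M/N sketch: a sentence is a triple (structure, POS sequence, keywords)
# and counts as standard only on an exact dictionary hit. The dictionary
# entry here is a hypothetical stand-in.
STANDARD_PATTERNS = {("SBV-VOB", "n-v-n", frozenset({"ba"}))}

def pattern_standard_rate(sentences):
    """Fraction M/N of (structure, pos_seq, keywords) triples that match."""
    n = len(sentences)
    m = sum(1 for (st, pos, kw) in sentences
            if (st, pos, frozenset(kw)) in STANDARD_PATTERNS)
    return m / n if n else 0.0
```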
In this embodiment, the method for extracting the vocabulary-use propriety characteristic includes:
counting the error quantity W of the text in the N sentences of text of the identification text data;
and taking the text error number W as the vocabulary-use propriety characteristic.
Specifically, the vocabulary-use propriety characteristic (index) measures the user's word-collocation ability. The text input by the user is corrected by an expert-dictionary method together with the open-source framework pycorrector. In the expert-dictionary method, experts in Chinese language education compile error-prone collocations, such as confusing the verbs in 'wearing clothes' and 'wearing a hat'; during processing, the program traverses the text for such wrong collocations, and each occurrence is counted as a text error. The open-source framework pycorrector detects the positions of miswritten characters with a language model and corrects them using pinyin-similarity features, stroke/Wubi edit-distance features, and language-model perplexity features. This technique uses the statistical language model toolkit kenlm, and jointly trains rnn_attention, rnn_crf, seq2seq_attention, transformer, conv_seq2seq and electra pre-trained models to obtain a deep learning model for text error correction, used to detect errors in the text input by the user. For the N sentences of text input by the user, the text error number W is counted; it takes values in [0, ∞), and smaller values are better.
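The expert-dictionary half of this index reduces to scanning for known bad collocations; the entries below are invented English stand-ins, and the pycorrector / language-model half is omitted:

```python
# Sketch of the expert-dictionary error count W. Entries are hypothetical
# placeholders for the error-prone collocations compiled by experts.
ERROR_COLLOCATIONS = ("dress a hat", "wear shoes on head")

def text_error_count(sentences):
    """W: number of sentences containing a known wrong collocation."""
    return sum(1 for s in sentences
               if any(bad in s for bad in ERROR_COLLOCATIONS))
```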
In this embodiment, the method for extracting the language framework score feature includes:
splitting the recognized text data into an array by taking sentences as units, dividing words by taking words as units, matching the array after word division with a language frame dictionary, and calculating the number F of sentences conforming to the language frame;
and taking the number F of sentences which accord with the language frame as the language frame score characteristic.
Specifically, the language framework score characteristic (index) measures the user's language logic ability. A language framework dictionary is compiled by an expert method, containing paired constructions such as 'though ... but ...'. The evaluation text input by the user is split into an array of sentences and segmented into words; the segmented array is matched against the language framework dictionary, and the number F of sentences conforming to a language framework is counted. F takes values in [0, N], and larger values are better.
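Treating each framework as an ordered pair of markers that must both appear in one sentence, a sketch might be (the frames shown are illustrative English analogues of the dictionary entries):

```python
# Language-framework count F: a sentence matches a frame if both markers
# occur in order within it. FRAMES is a hypothetical dictionary.
FRAMES = [("though", "but"), ("not only", "but also")]

def frame_sentence_count(sentences):
    """F: number of sentences containing at least one complete frame."""
    f = 0
    for s in sentences:
        for a, b in FRAMES:
            ia = s.find(a)
            if ia != -1 and s.find(b, ia + len(a)) != -1:
                f += 1
                break  # count each sentence at most once
    return f
```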
In this embodiment, the method for extracting the emotion health characteristic includes:
firstly, word vectors are trained on a large-scale corpus with word2vec; then word segmentation is performed on the recognition text data and the word vector word_i corresponding to each word is looked up; the difference in direction between a word vector from the recognition text data and a word vector from the violation dictionary library is measured by the cosine distance, calculated as cos θ = (Σ_{k=1}^{D} word1_k · word2_k) / (√(Σ_{k=1}^{D} word1_k²) · √(Σ_{k=1}^{D} word2_k²)), where D is the word vector dimension, word1_k is a component of the text word vector, and word2_k is a component of the word vector in the violation dictionary library;
and taking the cosine distance as the health characteristic of the emotion.
Specifically, the emotion health characteristic (index) judges whether the sentences expressed by the user conform to the socialist core values. A text health audit is performed on the text input by the user, covering dimensions such as politics, pornography, terrorism, malicious promotion, low-quality spam, and an official illegal-content library, and a violation dictionary library is compiled by an expert method. First, word vectors are trained on a large-scale corpus with word2vec; then the input text is segmented into words and the word vector word_i of each word is looked up. The difference in direction between a word vector from the input text and a word vector from the violation dictionary is measured by the cosine distance, calculated as cos θ = (Σ_{k=1}^{D} word1_k · word2_k) / (√(Σ_{k=1}^{D} word1_k²) · √(Σ_{k=1}^{D} word2_k²)), where D is the word vector dimension, word1_k is a component of the text word vector, and word2_k is a component of the violation-dictionary word vector. The cosine value lies in [0,1]; the larger the cosine, the smaller the angle between the two vectors, i.e. the more similar they are. When the similarity exceeds a threshold h_d, the word is judged to violate the socialist core values; smaller values are better.
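The cosine measure and threshold test can be sketched directly from the formula above; the threshold value h_d used here is an assumption:

```python
import math

def cosine(word1, word2):
    """Cosine of the angle between two word vectors, per the formula:
    sum of products over the product of the two vector norms."""
    num = sum(a * b for a, b in zip(word1, word2))
    den = (math.sqrt(sum(a * a for a in word1))
           * math.sqrt(sum(b * b for b in word2)))
    return num / den

def violates(word_vec, violation_vecs, h_d=0.9):
    """Flag a word whose best match in the violation dictionary exceeds
    the similarity threshold h_d (the default is illustrative)."""
    return max(cosine(word_vec, v) for v in violation_vecs) > h_d
```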
In this embodiment, the method for extracting the topic conformity characteristic includes:
according to the number P of subject words matched in the recognition text data and the total number P_all of subject words, counting the topic coincidence rate P/P_all of the recognition text data with the topic;
and taking the topic coincidence rate P/P_all as the topic conformity characteristic.
Specifically, the topic conformity characteristic (index) judges whether the statements made by the user are on the topic given by the system. Subject words for each topic are compiled by an expert method and stored as dictionary information. The evaluation text input by the user is matched exactly against the expert-listed subject words; from the number P of subject words matched in the user's evaluation text and the total number P_all of subject words, the topic coincidence rate P/P_all of the text with the topic is counted. It takes values in [0,1], and larger values are better.
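The coincidence rate P / P_all is a simple set overlap; the topic words below are illustrative:

```python
def topic_coincidence(text_words, topic_words):
    """P / P_all: fraction of the topic's subject words found in the text."""
    topic_set = set(topic_words)
    p = len(topic_set & set(text_words))
    return p / len(topic_set) if topic_set else 0.0
```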
In this embodiment, the method for extracting the spoken-language expression thinking characteristic includes:
analyzing and matching the text data with a written language dictionary in the unit of sentences, and counting the number Wr of the sentences of the written language;
the sentence number Wr of the written language is taken as a spoken language expression thinking characteristic.
Specifically, the spoken-language expression thinking characteristic (index) judges the user's thinking ability. It is analysed from the dimensions of the user's question-reading time, the user's proportion of colloquial language, and the proportion of written language used; the question-reading time is captured by the applet front end, and written-language expressions used for summarising are compiled into a written-language dictionary by an expert method. The user's input text is matched sentence by sentence against the written-language dictionary, and the number of written-language sentences is counted to obtain the written-language usage Wr, where Wr is the number of written-language sentences. Wr takes values in [0, ∞), and larger values are better.
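Counting written-language sentences against a dictionary of markers might look like this; the marker list is a hypothetical stand-in for the expert-compiled written-language dictionary:

```python
# Wr sketch: a sentence counts as written-language if it contains any
# marker from the dictionary. Markers below are illustrative.
WRITTEN_MARKERS = ("in summary", "furthermore", "consequently")

def written_sentence_count(sentences):
    """Wr: number of sentences matching the written-language dictionary."""
    return sum(1 for s in sentences
               if any(m in s for m in WRITTEN_MARKERS))
```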
In this embodiment, the method for extracting the speech-piece basic score characteristic includes:
using the length of the recognition text data as the basis of the user's speech-piece basic score Q: for the recognition text data, if the standard text length is L_standard, the standard speech-piece basic score is score_text_basic, and the length of the recognition text data is L, the user's speech-piece basic score is calculated as Q = (L / L_standard) × score_text_basic;
and taking the user's speech-piece basic score Q as the speech-piece basic score characteristic.
Specifically, the speech-piece basic score characteristic (index) measures the learner's basic spoken discourse ability. The length of the text input by the user is taken as the basis of the speech-piece basic score Q: for the evaluation text input by the user, if the standard text length is L_standard, the standard speech-piece basic score is score_text_basic, and the input text length is L, the user's speech-piece basic score is calculated as Q = (L / L_standard) × score_text_basic. The text length takes values in [0, ∞), and larger values are better. The speech-piece basic score ensures that the user receives a certain baseline score.
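Assuming the proportional form Q = (L / L_standard) × score_text_basic with a cap at the full basic score (the cap and the default values of L_standard and score_text_basic are assumptions, since the original formula is not fully legible):

```python
def speech_piece_basic_score(L, L_standard=100, score_text_basic=10.0):
    """Q: text length relative to the standard length, scaled by the
    standard basic score and capped at that full score (assumed cap)."""
    return min(L / L_standard, 1.0) * score_text_basic
```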
For the user 162514089, based on the above feature extraction method, the related task completion capability index calculation values are respectively as follows:
(1) fluency: the index value range is [0,1], and the larger the value, the better. For the spoken voice audio input by the user 162514089, the fluency value is 0.83, which indicates that the user is fluent during expression and has good spoken language expression capability.
(2) Validity: this index is computed from the number of occurrences of pause words and invalid repeated speech texts during speaking; both counts take values in [0, ∞), and smaller values are better. For the spoken audio input by user 162514089, the pause-word count and the invalid-repeat count are both 0, indicating that the user's thinking was coherent and the expression smooth, i.e. good spoken expression ability.
(3) Speech rate: the index takes values in [0, ∞), and values within [v_low, v_high] are preferred, where v_low is the required minimum speech rate threshold and v_high is the required maximum speech rate threshold. For the spoken audio input by user 162514089, the speech rate is 4.30 words/second, close to that of a native Chinese speaker, so the user's spoken expression ability is good.
(4) The voice basic score is as follows: the voice basic score is measured through the voice length, the index value range is [0, ∞ ], and the larger the value is, the better the value is. For the spoken voice audio input by the user 162514089, the user voice length is 19.77 seconds, which indicates that the user expression content is not sufficient and fails to reach the suggested voice length.
(5) Subject-verb-object syntactic analysis: the index takes values in [0,1]. For the text converted from the voice audio input by user 162514089, the parsing value is 1.00, indicating that the user made no grammar errors, has rich grammar knowledge, and expresses in line with Chinese usage.
(6) Mixing sentence patterns: the index value field is [0,1], and the value of the mixed sentence pattern is 0.75 aiming at the text of the voice audio conversion input by the user 162514089, which indicates that the user uses some standard sentence patterns in the expression process and the expression mode is reasonable.
(7) Whether the vocabulary is used properly: the index value range is [0, ∞ ], and whether the vocabulary is properly used is 1.00 for the text of the voice audio conversion input by the user 162514089, which indicates that the user has no wrong word collocation and has a proper expression mode.
(8) Language framework: the index takes values in [0, N], where N is the number of sentences in the text. For the text converted from the voice audio input by user 162514089, the language framework value is 0, indicating that the user did not use framework constructions such as 'though ... but ...' in the expression.
(9) Emotion health: the index takes values in [0,1]. For the text converted from the voice audio input by user 162514089, the emotion health value is 0.99, indicating that the content expressed is healthy and free of political, pornographic, terrorist, malicious-promotion or similar content.
(10) Topic conformity: the index takes values in [0,1]. For the text converted from the voice audio input by user 162514089, the topic conformity value is 1.00, indicating that the user stuck closely to the topic and spoke on the basis of understanding it.
(11) Spoken-language expression thinking: the index takes values in [0, ∞). For the text converted from the voice audio input by user 162514089, the value is 2, indicating that the user thought while expressing and replaced some common words with higher-level written-language expressions, reflecting the richness of the user's vocabulary.
(12) Speech-piece basic score: this score is measured by the length of the text converted from the voice audio; the index takes values in [0, ∞). For user 162514089, the text length is 85, indicating that the expressed content is not sufficient and fails to reach the suggested text length.
In this embodiment, evaluating the speech data according to the quantifiable evaluation feature to obtain an evaluation result of the speech, including:
corresponding scores are obtained based on the extracted fluency characteristic, validity characteristic, speech rate characteristic, speech basic score characteristic, subject-verb-object syntactic analysis characteristic, mixed sentence pattern characteristic, vocabulary-use propriety characteristic, language framework score characteristic, emotion health characteristic, topic conformity characteristic, spoken-language expression thinking characteristic and speech-piece basic score characteristic, and a total score is calculated by summation, thereby evaluating the speech voice data and obtaining the speech evaluation result.
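The final step is a plain sum over the per-feature scores; the actual per-feature score scales come from the patent's Table 5, so unit weights are assumed here:

```python
def total_score(feature_scores):
    """Overall evaluation: sum the per-feature scores. feature_scores maps
    feature name -> score; unit weights are an assumption (the real
    weighting scheme is given by the patent's Table 5)."""
    return sum(feature_scores.values())
```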
Specifically, the following table 5 shows the total score evaluation method designed by the present invention.
TABLE 5
Referring to fig. 3, a speech evaluation apparatus for foreign students comprises:
a voice data obtaining module 301, configured to obtain speech data of a speaker;
a voice data recognition module 302, configured to perform voice recognition on the speech data to obtain recognition text data;
the feature extraction module 303 is configured to perform feature extraction on the speech voice data and the recognition text data respectively to obtain quantifiable evaluation features, where the quantifiable evaluation features include speech scoring features and speech-piece scoring features; the speech scoring features include a fluency characteristic, a validity characteristic, a speech rate characteristic and a speech basic score characteristic; the speech-piece scoring features include a subject-verb-object syntactic analysis characteristic, a mixed sentence pattern characteristic, a vocabulary-use propriety characteristic, a language framework score characteristic, an emotion health characteristic, a topic conformity characteristic, a spoken-language expression thinking characteristic and a speech-piece basic score characteristic;
and the evaluating module 304 is used for evaluating the speech data according to the quantifiable evaluating characteristics to obtain a speech evaluating result.
Referring to fig. 4, a speech evaluation system for foreign students comprises:
a client 401, configured to obtain speech data of a speaker;
a server 402, configured to receive the speech voice data sent by the client; perform voice recognition on the speech voice data to obtain recognition text data; respectively extract features of the speech voice data and the recognition text data to obtain quantifiable evaluation features, where the quantifiable evaluation features include speech scoring features and speech-piece scoring features; the speech scoring features include a fluency characteristic, a validity characteristic, a speech rate characteristic and a speech basic score characteristic; the speech-piece scoring features include a subject-verb-object syntactic analysis characteristic, a mixed sentence pattern characteristic, a vocabulary-use propriety characteristic, a language framework score characteristic, an emotion health characteristic, a topic conformity characteristic, a spoken-language expression thinking characteristic and a speech-piece basic score characteristic; evaluate the speech voice data according to the quantifiable evaluation features to obtain a speech evaluation result; and send the evaluation result to the client for display.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.
Claims (10)
1. A speech evaluation method for foreign students, characterized by comprising the following steps:
acquiring speech data of a lecturer;
carrying out voice recognition on the speech voice data to obtain recognition text data;
respectively extracting features of the speech voice data and the recognition text data to obtain quantifiable evaluation features, wherein the quantifiable evaluation features comprise speech scoring features and speech-piece scoring features; the speech scoring features comprise a fluency characteristic, a validity characteristic, a speech rate characteristic and a speech basic score characteristic; the speech-piece scoring features comprise a subject-verb-object syntactic analysis characteristic, a mixed sentence pattern characteristic, a vocabulary-use propriety characteristic, a language framework score characteristic, an emotion health characteristic, a topic conformity characteristic, a spoken-language expression thinking characteristic and a speech-piece basic score characteristic;
and evaluating the speech data of the speech according to the quantifiable evaluation characteristics to obtain an evaluation result of the speech.
2. The speech evaluation method for foreign students according to claim 1, wherein the method for extracting the fluency characteristic comprises:
counting the number TD of pauses occurring in the speech voice data; a pause means: endpoint detection is performed on the speech voice data by a VAD algorithm to obtain the endpoint positions of the speech, and when the interval between two adjacent endpoints exceeds a set threshold h_t, a pause is deemed to have occurred;
designing the output speech fluency S based on the number TD of pauses; the value of the speech fluency S is inversely related to the number of pauses, i.e. the fewer the pauses, the larger the value of the speech fluency S;
the method for extracting the validity characteristic comprises:
counting the number of pause words in the recognition text data and the number of occurrences of invalid repeated speech text; specifically, counting the number I of pause words in the text based on a pause-word list, and counting the number of occurrences J of invalid repeated speech text in the text based on a rule method;
and taking the number of pause words and the number of occurrences of invalid repeated speech text as the validity characteristic.
3. The speech evaluation method for foreign students according to claim 1, wherein the method for extracting the speech rate characteristic comprises:
acquiring the text length L of the recognition text data and the audio length T of the speech voice data, and calculating the speech speed L/T;
taking the speech rate L/T as a speech rate characteristic;
the method for extracting the voice basic feature comprises the following steps:
obtaining the audio length T of the speech voice data, and obtaining the ratio of the audio length T to a preset threshold h_vt; the preset threshold h_vt is the speech audio length at which the speaker obtains the full speech basic score;
and taking the ratio of the audio length T to the preset threshold h_vt as the speech basic score characteristic.
4. The speech evaluation method for foreign students according to claim 1, wherein the method for extracting the subject-verb-object syntactic analysis characteristic comprises:
counting the number G of sentences which accord with the grammar structure information standard in the N sentences of the recognized text data to obtain the text sentence structure standard rate G/N;
taking the text sentence structure standard rate G/N as the subject-verb-object syntactic analysis characteristic;
the method for extracting the mixed sentence pattern features comprises the following steps:
counting the number M of sentences which accord with a standard sentence pattern in the N-sentence texts of the identification text data to obtain the standard rate M/N of the text sentence pattern;
and taking the standard rate M/N of the text sentence pattern as the mixed sentence pattern characteristic.
5. The speech evaluation method for foreign students according to claim 1, wherein the method for extracting the vocabulary-use propriety characteristic comprises:
counting the error quantity W of the text in the N sentences of text of the identification text data;
taking the text error number W as the vocabulary-use propriety characteristic;
the method for extracting the language framework score features comprises the following steps:
splitting the recognized text data into an array by taking sentences as units, dividing words by taking words as units, matching the array after word division with a language frame dictionary, and calculating the number F of sentences conforming to the language frame;
and taking the number F of sentences which accord with the language frame as the language frame score characteristic.
6. The speech evaluation method for foreign students according to claim 1, wherein the method for extracting the emotion health characteristic comprises:
firstly, word vectors are trained on a large-scale corpus with word2vec; then word segmentation is performed on the recognition text data and the word vector word_i corresponding to each word is looked up; the difference in direction between a word vector from the recognition text data and a word vector from the violation dictionary library is measured by the cosine distance, calculated as cos θ = (Σ_{k=1}^{D} word1_k · word2_k) / (√(Σ_{k=1}^{D} word1_k²) · √(Σ_{k=1}^{D} word2_k²)), where D is the word vector dimension, word1_k is a component of the text word vector, and word2_k is a component of the word vector in the violation dictionary library;
taking the cosine distance as the health characteristic of the emotion;
the method for extracting whether the theme conforms to the characteristics comprises the following steps:
according to the number P of subject words matched in the recognition text data and the total number P_all of subject words, counting the topic coincidence rate P/P_all of the recognition text data with the topic;
and taking the topic coincidence rate P/P_all as the topic conformity characteristic.
7. The speech evaluation method for foreign students according to claim 1, wherein the method for extracting the spoken-language expression thinking characteristic comprises:
analyzing and matching the text data with a written language dictionary in the unit of sentences, and counting the number Wr of the sentences of the written language;
taking the sentence number Wr of the written language as the thinking characteristic of the spoken language expression;
the method for extracting the basic feature of the speech piece comprises the following steps:
taking the length of the recognition text data as the basis of the user's speech-piece basic score Q: for the recognition text data, if the standard text length is L_standard, the standard speech-piece basic score is score_text_basic, and the length of the recognition text data is L, the user's speech-piece basic score is calculated as Q = (L / L_standard) × score_text_basic;
and taking the user's speech-piece basic score Q as the speech-piece basic score characteristic.
8. The speech evaluation method for foreign students according to claim 1, wherein evaluating the speech voice data according to the quantifiable evaluation features to obtain the speech evaluation result comprises:
obtaining corresponding scores based on the extracted fluency characteristic, validity characteristic, speech rate characteristic, speech basic score characteristic, subject-verb-object syntactic analysis characteristic, mixed sentence pattern characteristic, vocabulary-use propriety characteristic, language framework score characteristic, emotion health characteristic, topic conformity characteristic, spoken-language expression thinking characteristic and speech-piece basic score characteristic, and calculating a total score by summation, thereby evaluating the speech voice data and obtaining the speech evaluation result.
9. A speech evaluation apparatus for foreign students, characterized by comprising:
the voice data acquisition module is used for acquiring the speech data of the speaker;
the speech data recognition module is used for carrying out speech recognition on the speech data of the speech to obtain recognition text data;
the feature extraction module is used for respectively extracting features of the speech voice data and the recognition text data to obtain quantifiable evaluation features, wherein the quantifiable evaluation features comprise speech scoring features and speech-piece scoring features; the speech scoring features comprise a fluency characteristic, a validity characteristic, a speech rate characteristic and a speech basic score characteristic; the speech-piece scoring features comprise a subject-verb-object syntactic analysis characteristic, a mixed sentence pattern characteristic, a vocabulary-use propriety characteristic, a language framework score characteristic, an emotion health characteristic, a topic conformity characteristic, a spoken-language expression thinking characteristic and a speech-piece basic score characteristic;
and the evaluation module is used for evaluating the speech data according to the quantifiable evaluation characteristics to obtain the evaluation result of the speech.
10. A speech evaluation system for foreign students, characterized by comprising:
the client is used for acquiring the speech data of the speaker;
the server is used for receiving the speech voice data sent by the client; carrying out voice recognition on the speech voice data to obtain recognition text data; respectively extracting features of the speech voice data and the recognition text data to obtain quantifiable evaluation features, wherein the quantifiable evaluation features comprise speech scoring features and speech-piece scoring features; the speech scoring features comprise a fluency characteristic, a validity characteristic, a speech rate characteristic and a speech basic score characteristic; the speech-piece scoring features comprise a subject-verb-object syntactic analysis characteristic, a mixed sentence pattern characteristic, a vocabulary-use propriety characteristic, a language framework score characteristic, an emotion health characteristic, a topic conformity characteristic, a spoken-language expression thinking characteristic and a speech-piece basic score characteristic; evaluating the speech voice data according to the quantifiable evaluation features to obtain a speech evaluation result; and sending the evaluation result to the client for display.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110389484.3A CN113205729A (en) | 2021-04-12 | 2021-04-12 | Foreign student-oriented speech evaluation method, device and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110389484.3A CN113205729A (en) | 2021-04-12 | 2021-04-12 | Foreign student-oriented speech evaluation method, device and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113205729A true CN113205729A (en) | 2021-08-03 |
Family
ID=77026561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110389484.3A Pending CN113205729A (en) | 2021-04-12 | 2021-04-12 | Foreign student-oriented speech evaluation method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113205729A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114187544A (en) * | 2021-11-30 | 2022-03-15 | 厦门大学 | College English speaking multi-mode automatic scoring method |
CN117787921A (en) * | 2024-02-27 | 2024-03-29 | 北京烽火万家科技有限公司 | Intelligent education training management method and identity anti-counterfeiting method for intelligent education training |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090258333A1 (en) * | 2008-03-17 | 2009-10-15 | Kai Yu | Spoken language learning systems |
CN101740024A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Method for automatic evaluation based on generalized fluent spoken language fluency |
CN101739868A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Automatic evaluation and diagnosis method of text reading level for oral test |
CN101826263A (en) * | 2009-03-04 | 2010-09-08 | 中国科学院自动化研究所 | Objective standard based automatic oral evaluation system |
CN106847263A (en) * | 2017-01-13 | 2017-06-13 | 科大讯飞股份有限公司 | Speech level evaluation method and apparatus and system |
JP2018045062A (en) * | 2016-09-14 | 2018-03-22 | Kddi株式会社 | Program, device and method automatically grading from dictation voice of learner |
CN108122561A (en) * | 2017-12-19 | 2018-06-05 | 广东小天才科技有限公司 | Spoken language voice evaluation method based on electronic equipment and electronic equipment |
CN109147765A (en) * | 2018-11-16 | 2019-01-04 | 安徽听见科技有限公司 | Audio quality comprehensive evaluating method and system |
CN109785698A (en) * | 2017-11-13 | 2019-05-21 | 上海流利说信息技术有限公司 | Method, apparatus, electronic equipment and medium for spoken language proficiency evaluation and test |
CN110069784A (en) * | 2019-05-05 | 2019-07-30 | 广东电网有限责任公司 | A kind of voice quality inspection methods of marking, device, terminal and can storage medium |
CN110136721A (en) * | 2019-04-09 | 2019-08-16 | 北京大米科技有限公司 | A kind of scoring generation method, device, storage medium and electronic equipment |
CN110675292A (en) * | 2019-09-23 | 2020-01-10 | 浙江优学智能科技有限公司 | Child language ability evaluation method based on artificial intelligence |
WO2020149621A1 (en) * | 2019-01-14 | 2020-07-23 | 김주혁 | English speaking evaluation system and method |
CN111833853A (en) * | 2020-07-01 | 2020-10-27 | 腾讯科技(深圳)有限公司 | Voice processing method and device, electronic equipment and computer readable storage medium |
US20210050001A1 (en) * | 2019-08-16 | 2021-02-18 | Ponddy Education Inc. | Systems and Methods for Comprehensive Chinese Speech Scoring and Diagnosis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105845134B (en) | Spoken language evaluation method and system for freely reading question types | |
CN103594087B (en) | Method and system for improving oral evaluation performance | |
CN110797010A (en) | Question-answer scoring method, device, equipment and storage medium based on artificial intelligence | |
CN106558252B (en) | Spoken language practice method and device realized by computer | |
CN103151042A (en) | Full-automatic oral language evaluating management and scoring system and scoring method thereof | |
CN101551947A (en) | Computer system for assisting spoken language learning | |
US20140141392A1 (en) | Systems and Methods for Evaluating Difficulty of Spoken Text | |
Shen et al. | CECOS: A Chinese-English code-switching speech database | |
KR20160008949A (en) | Apparatus and method for foreign language learning based on spoken dialogue | |
CN108280065B (en) | Foreign text evaluation method and device | |
Inoue et al. | A Study of Objective Measurement of Comprehensibility through Native Speakers' Shadowing of Learners' Utterances. | |
CN113205729A (en) | Foreign student-oriented speech evaluation method, device and system | |
KR100995847B1 (en) | Language training method and system based sound analysis on internet | |
CN110148413B (en) | Voice evaluation method and related device | |
Loukina et al. | Automated scoring across different modalities | |
Liao et al. | A prototype of an adaptive Chinese pronunciation training system | |
CN110675292A (en) | Child language ability evaluation method based on artificial intelligence | |
Yoon et al. | Word-embedding based content features for automated oral proficiency scoring | |
CN112668883A (en) | Speech practice system integrating Chinese pronunciation and discourse evaluation | |
Yoon et al. | A comparison of grammatical proficiency measures in the automated assessment of spontaneous speech | |
WO2019075827A1 (en) | Voice evaluation method and device | |
CN112309429A (en) | Method, device and equipment for detecting loss of plosion, and computer-readable storage medium | |
Shufang | Design of an automatic English pronunciation error correction system based on radio magnetic pronunciation recording devices | |
Zechner et al. | Automatic scoring of children’s read-aloud text passages and word lists | |
CN114241835A (en) | Student spoken language quality evaluation method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210803 |