CN109785698B

CN109785698B - Method, device, electronic equipment and medium for oral language level evaluation

Info

Publication number: CN109785698B
Application number: CN201711111300.7A
Authority: CN
Inventors: 林晖
Original assignee: Shanghai Liulishuo Information Technology Co ltd
Current assignee: Shanghai Liulishuo Information Technology Co ltd
Priority date: 2017-11-13
Filing date: 2017-11-13
Publication date: 2021-11-23
Anticipated expiration: 2037-11-13
Also published as: CN109785698A

Abstract

An embodiment of the invention provides a method for evaluating spoken language proficiency, comprising: randomly extracting a question to be evaluated from a question bank; collecting voice data to be evaluated for the question to be evaluated; obtaining corresponding text data to be evaluated and pronunciation characteristics to be evaluated from the voice data to be evaluated; obtaining a first semantic relevance between the text data to be evaluated and the question to be evaluated; and obtaining a scoring result according to the first semantic relevance and the pronunciation characteristics to be evaluated. The method solves the prior-art problem that semantic relevance cannot be calculated directly from the question text and the voice data, and enables users to take spoken language tests or examinations over the Internet, thereby greatly increasing test and examination efficiency and improving the user experience. Embodiments of the invention also provide a medium, a device for spoken language proficiency evaluation, and electronic equipment.

Description

Method, device, electronic equipment and medium for oral language level evaluation

Technical Field

The embodiment of the invention relates to the field of computer-aided education, in particular to a method, a device, an electronic device and a medium for spoken language proficiency evaluation.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

Currently, spoken language assessment is mostly performed manually, but manual assessment has the following disadvantages:

1. Scoring is subjective: scores depend mainly on each examiner's personal judgment and vary greatly from one examiner to another.

2. Labor cost is high: most manual spoken assessment must be booked in advance and held at a central location, and is heavily constrained by time, region, number of people, monetary cost and the like.

3. Professionalism is low: the professional qualifications and level of the examiners and instructors at testing institutions are hard to guarantee.

4. Low efficiency, poor repeatability: most manual evaluation is one-to-many, or a small number of examiners face a large number of examinees, so the share of time spent on actual evaluation is low; moreover, examinees cannot repeatedly review their own answers to compare evaluation results.

There are also methods in which a program analyzes and evaluates a user's voice.

Disclosure of Invention

However, existing program-based evaluation has the following characteristics or disadvantages:

1. Accuracy is insufficient: program-based evaluation on the market is affected by recording equipment, environment, user accent and the like, and its success rate, namely its accuracy, in recognizing user voice is low; most accent-imitation software even relies only on user likes to select good answers (as shown in FIG. 8) without providing any evaluation.

2. The scoring dimension is single: scoring is mostly based only on speech length and fluency, and does not take into account the user's pronunciation, grammar, pauses, vocabulary, semantic relevance and the like.

3. Scoring efficiency is low: the scoring process is inefficient, and a long time passes between the start of scoring and the generation of the assessment report.

4. Analysis content is deficient: only a spoken language score is provided, and content such as overall level evaluation, comparison of spoken level across users, per-dimension evaluation, analysis of incorrectly answered questions, standard pronunciation, and directions for improvement is lacking.

On the other hand, prior-art methods are mainly aimed at scoring tests or test questions that have standard reference answers, but spoken language tests (such as IELTS) contain a large number of subjective questions that have no standard reference answers, and how to score them by machine is a technical problem to be solved urgently.

Therefore, in the prior art, how to realize machine scoring for subjective spoken language test questions without standard reference answers, and how to score and evaluate an examinee's spoken language comprehensively from a plurality of different scoring dimensions, is a pressing technical problem.

Therefore, an improved technical solution for spoken language proficiency evaluation is much needed. An embodiment of the invention randomly extracts a question to be evaluated from a question bank; collects voice data to be evaluated for the question to be evaluated; obtains corresponding text data to be evaluated from the voice data to be evaluated; and obtains a first semantic relevance between the text data to be evaluated and the question to be evaluated, so as to obtain a scoring result according to the first semantic relevance.

In this context, embodiments of the present invention are intended to provide a method, medium, apparatus, and electronic device for spoken language proficiency assessment.

In a first aspect of the embodiments of the present invention, there is provided a method for spoken language proficiency evaluation, including: randomly extracting a question to be evaluated from a question bank; collecting voice data to be evaluated for the question to be evaluated; obtaining corresponding text data to be evaluated and pronunciation characteristics to be evaluated from the voice data to be evaluated; obtaining a first semantic relevance between the text data to be evaluated and the question to be evaluated; and obtaining a scoring result according to the first semantic relevance and the pronunciation characteristics to be evaluated.

In one embodiment of the invention, the method further comprises: acquiring a corresponding evaluation dimension and scoring standard according to the type of the question to be evaluated.

In a further embodiment of the present invention, the evaluation dimension includes a grammar evaluation dimension and/or a vocabulary evaluation dimension and/or a pronunciation evaluation dimension and/or a fluency evaluation dimension, and the corresponding scoring standard includes a grammar scoring standard and/or a vocabulary scoring standard and/or a pronunciation scoring standard and/or a fluency scoring standard, and the method further includes: obtaining a grammar score according to the text data to be evaluated and the grammar scoring standard; and/or obtaining a vocabulary score according to the text data to be evaluated and the vocabulary scoring standard; and/or obtaining a pronunciation score according to the pronunciation characteristics to be evaluated and the pronunciation scoring standard; and/or obtaining a fluency score according to the pronunciation characteristics to be evaluated and the fluency scoring standard.

In yet another embodiment of the present invention, the method further comprises: obtaining the scoring result according to the grammar score and/or the vocabulary score and/or the pronunciation score and/or the fluency score.
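
As a rough illustration of combining whichever dimension scores are available into one scoring result, the sketch below takes a weighted average over the dimensions that were actually scored. The function name `total_score`, the 0 to 100 scale and the equal default weights are assumptions made for illustration; the patent does not specify how the dimension scores are combined.

```python
from typing import Dict, Optional

DIMENSIONS = ("grammar", "vocabulary", "pronunciation", "fluency")


def total_score(scores: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the dimension scores that are present.

    The and/or wording of the embodiment means any subset of dimensions may
    be scored, so weights are renormalized over the dimensions supplied.
    """
    unknown = set(scores) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if not scores:
        raise ValueError("at least one dimension score is required")
    weights = weights or {d: 1.0 for d in DIMENSIONS}  # assumed equal weights
    weight_sum = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / weight_sum


# Only grammar and fluency were scored for this (hypothetical) answer.
print(total_score({"grammar": 80, "fluency": 60}))  # 70.0
```

Renormalizing over the supplied dimensions keeps the result on the same scale regardless of how many dimensions a given question type uses.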

In yet another embodiment of the present invention, the method further comprises: obtaining a second semantic relevance between the text data to be evaluated and the standard answer of the question to be evaluated; and obtaining the scoring result according to the second semantic relevance.

In still another embodiment of the present invention, the method further comprises: analyzing the scoring result to obtain an analysis result; and generating a comprehensive evaluation report according to the scoring result and the analysis result.

In a second aspect of the embodiments of the present invention, there is provided a medium having a program stored thereon, the program, when executed by a processor, implementing the steps of the above method embodiments, for example: randomly extracting a question to be evaluated from a question bank; collecting voice data to be evaluated for the question to be evaluated; obtaining corresponding text data to be evaluated and pronunciation characteristics to be evaluated from the voice data to be evaluated; obtaining a first semantic relevance between the text data to be evaluated and the question to be evaluated; and obtaining a scoring result according to the first semantic relevance and the pronunciation characteristics to be evaluated.

In a third aspect of the embodiments of the present invention, there is provided an apparatus for spoken language proficiency evaluation, including: a question extraction module for randomly extracting a question to be evaluated from a question bank; a voice acquisition module for collecting voice data to be evaluated for the question to be evaluated; a voice recognition module for obtaining corresponding text data to be evaluated and pronunciation characteristics to be evaluated from the voice data to be evaluated; a first relevance calculation module for obtaining a first semantic relevance between the text data to be evaluated and the question to be evaluated; and a scoring module for obtaining a scoring result according to the first semantic relevance and the pronunciation characteristics to be evaluated.

In one embodiment of the invention, the apparatus further comprises: a dimension standard acquisition module for acquiring a corresponding evaluation dimension and scoring standard according to the type of the question to be evaluated.

In a further embodiment of the invention, the evaluation dimension comprises a grammar evaluation dimension and/or a vocabulary evaluation dimension and/or a pronunciation evaluation dimension and/or a fluency evaluation dimension, and the corresponding scoring criteria comprise grammar scoring criteria and/or vocabulary scoring criteria and/or pronunciation scoring criteria and/or fluency scoring criteria. The scoring module further comprises a grammar scoring unit and/or a vocabulary scoring unit and/or a pronunciation scoring unit and/or a fluency scoring unit.

The grammar scoring unit is used for obtaining a grammar score according to the text data to be evaluated and the grammar scoring standard.

The vocabulary scoring unit is used for obtaining a vocabulary score according to the text data to be evaluated and the vocabulary scoring standard.

The pronunciation scoring unit is used for obtaining a pronunciation score according to the pronunciation characteristics to be evaluated and the pronunciation scoring standard.

The fluency scoring unit is used for obtaining a fluency score according to the pronunciation characteristics to be evaluated and the fluency scoring standard.

In yet another embodiment of the present invention, the scoring module further comprises a total scoring unit. The total scoring unit is used for obtaining the scoring result according to the grammar score and/or the vocabulary score and/or the pronunciation score and/or the fluency score.

In yet another embodiment of the present invention, the apparatus further comprises a second relevance calculation module. The second relevance calculation module is used for obtaining a second semantic relevance between the text data to be evaluated and the standard answer of the question to be evaluated. The scoring module is further used for obtaining the scoring result according to the second semantic relevance.

In yet another embodiment of the present invention, the apparatus further comprises an analysis module and a report generation module. The analysis module is used for analyzing the scoring result to obtain an analysis result. The report generation module is used for generating a comprehensive evaluation report according to the scoring result and the analysis result.

In a fourth aspect of embodiments of the present invention, there is provided an electronic apparatus, mainly including: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, wherein, when the computer program is executed, the following instructions are executed: randomly extracting a question to be evaluated from a question bank; collecting voice data to be evaluated for the question to be evaluated; obtaining corresponding text data to be evaluated and pronunciation characteristics to be evaluated from the voice data to be evaluated; obtaining a first semantic relevance between the text data to be evaluated and the question to be evaluated; and obtaining a scoring result according to the first semantic relevance and the pronunciation characteristics to be evaluated.

According to the method, the medium, the device and the electronic equipment for spoken language proficiency evaluation provided by the embodiments of the invention, a question to be evaluated is randomly extracted from a question bank; voice data to be evaluated is collected for the question to be evaluated; corresponding text data to be evaluated and pronunciation characteristics to be evaluated are obtained from the voice data to be evaluated; and a first semantic relevance between the text data to be evaluated and the question to be evaluated is obtained, the first semantic relevance and the pronunciation characteristics to be evaluated being used to obtain the user's scoring result.

In addition, according to some embodiments of the invention, the method for spoken language proficiency evaluation obtains the user's score by calculating the first semantic relevance between the question to be evaluated and the voice data to be evaluated, together with the pronunciation characteristics to be evaluated. This solves the prior-art problem that semantic relevance cannot be calculated directly from the question text and the voice data, allows machine scoring to be applied to subjective test questions without standard reference answers, broadens the application range and improves the accuracy of machine scoring, and facilitates the adoption of machine scoring in various spoken language examinations or evaluations. Meanwhile, according to other embodiments of the invention, the method also comprehensively evaluates and scores the user's voice along dimensions such as pronunciation, fluency, grammar and vocabulary, addressing several problems of conventional spoken language proficiency tests, namely vague standards, highly subjective evaluation, a low degree of intelligent automation, and scoring algorithms that lack scientific rigor. It is an important breakthrough in the field of spoken language evaluation.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:

FIG. 1 schematically illustrates an application scenario in which embodiments of the present invention may be implemented;

FIG. 2 schematically illustrates a flow diagram of a method for spoken language proficiency evaluation in accordance with an embodiment of the present invention;

FIG. 3 schematically illustrates a flow diagram of a method for spoken language proficiency evaluation in accordance with yet another embodiment of the present invention;

FIG. 4 schematically illustrates an architectural diagram for spoken language proficiency evaluation in accordance with an embodiment of the present invention;

FIG. 5 schematically illustrates an architecture of an apparatus for spoken language proficiency evaluation according to an embodiment of the present invention;

FIG. 6 schematically shows a structural diagram of an electronic device according to an embodiment of the invention;

FIG. 7 schematically shows a schematic view of a medium according to an embodiment of the invention;

FIG. 8 is a schematic diagram of a prior-art interface for spoken language proficiency evaluation.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as an apparatus, method or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, or entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.

According to the embodiment of the invention, a method, a device, equipment and a medium for spoken language proficiency evaluation are provided.

In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense. The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.

Summary of the Invention

The inventor has found that, in the prior art, spoken language examinations such as English speaking examinations are mainly scored by converting the examinee's recorded answer into text and comparing the converted text with the text of a standard reference answer provided in advance. Typically, keywords are extracted from the standard reference answer in advance and matched against the converted text, and the more keywords are matched, the higher the score. Such a method has at least the following problems. On the one hand, many spoken language examinations, such as IELTS, contain a large number of subjective test questions that have no standard reference answer, so the prior-art scheme cannot be used to score them by machine. On the other hand, even for test questions that do have standard reference answers, because scoring generally relies on keyword matching, an examinee can very likely obtain a high score by memorizing in advance some generic paragraphs that contain the keywords of the standard reference answer, even though the answer does not actually fit the scenario set by the question. Examinees who merely recite can thus also obtain high scores, which defeats the original purpose of setting subjective questions in a spoken language examination. Meanwhile, the evaluation dimensions and scoring standards of prior-art machine scoring are too limited, and evaluation and scoring cannot be performed along multiple dimensions.
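
The weakness of keyword matching described above can be illustrated with a minimal sketch. The function `keyword_match_score`, the keyword list and the sample answers are all hypothetical and serve only to show why a memorized generic paragraph can score as well as a genuine answer under this kind of scheme.

```python
import re
from typing import List


def keyword_match_score(answer_text: str, keywords: List[str]) -> float:
    """Fraction of reference keywords found in the answer (prior-art style)."""
    if not keywords:
        return 0.0
    tokens = set(re.findall(r"[a-z']+", answer_text.lower()))
    hits = sum(1 for kw in keywords if kw.lower() in tokens)
    return hits / len(keywords)


# Hypothetical keywords extracted from a standard reference answer.
keywords = ["success", "effort", "goal"]

# A genuine answer that addresses the question.
on_topic = "I reached my goal through effort, and that success changed me."
# A memorized generic paragraph that merely contains the keywords.
memorized = "Success needs effort and a goal, as everyone knows, in every field."

# Both answers match every keyword, so keyword matching cannot tell them apart.
print(keyword_match_score(on_topic, keywords), keyword_match_score(memorized, keywords))
```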

Therefore, to address the technical problem of inaccurate scoring in the prior art, the invention provides a method, a medium, a device and electronic equipment for spoken language proficiency evaluation, which randomly extract a question to be evaluated from a question bank; collect voice data to be evaluated for the question to be evaluated; obtain corresponding text data to be evaluated and pronunciation characteristics to be evaluated from the voice data to be evaluated; and obtain a first semantic relevance between the text data to be evaluated and the question to be evaluated. The first semantic relevance and the pronunciation characteristics to be evaluated can be used to obtain a scoring result, so that the user can take a spoken language test over the Internet, which greatly improves test efficiency and the user experience. Meanwhile, the method can be applied to machine scoring of subjective test questions without standard reference answers, which broadens the application range and improves the accuracy of machine scoring and facilitates its adoption in various spoken language examinations or evaluations. In addition, according to other embodiments of the invention, the method also comprehensively evaluates and scores the user's voice along several dimensions such as pronunciation, fluency, grammar and vocabulary.

Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.

Application Scenario Overview

Referring initially to FIG. 1, an application scenario in which embodiments of the present invention may be implemented is schematically illustrated.

In FIG. 1, terminal device 1, terminal device 2, … terminal device n each have installed an application program capable of accessing pages provided by an online spoken language proficiency evaluation provider (for example, English Liulishuo). For example, where terminal device 1 is a desktop or notebook computer, it has installed an application program such as an application client or a browser capable of accessing the pages provided by the provider; where terminal device 2 is a smartphone or tablet computer, it has installed an application program such as an APP (Application) or a browser capable of accessing those pages.

Different users can use the application programs installed on their terminal devices to access the pages provided by the online spoken language proficiency evaluation provider on a corresponding server, and can thereby view the information the provider offers, for example a question to be evaluated that is randomly extracted from a question bank, the voice data to be evaluated collected for that question, and the corresponding text data to be evaluated and pronunciation characteristics to be evaluated obtained from the voice data. Furthermore, according to their actual needs and the spoken test information they have learned, users can carry out the corresponding spoken language proficiency evaluation flow on those pages and obtain the corresponding scoring results from the provider. However, those skilled in the art will fully appreciate that the applicable scenarios for embodiments of the present invention are not limited in any way by this framework.

A scenario applying the method for spoken language proficiency evaluation in the embodiment of the present invention may include clients (e.g., terminal device 1, terminal device 2, … terminal device n shown in FIG. 1) and a server, which are communicatively connected (wirelessly and/or by wire).

The client of the invention may be a computer, a tablet computer, a high-end smartphone or the like, and has an independent audio and video playback function and an independent audio input device. The client is mainly responsible for interaction between the user and the system. It collects voice information (for example, a recording plug-in can be called from a web page to record and generate an audio file in wav format), plays the test voice and the standard voice, which are stored locally on the client and on the server respectively, transmits the wav audio file to the server, and displays the corpus text, the scoring result, and the comprehensive evaluation report, such as pronunciation feedback and guidance. The client can be used by the examinee to take the spoken language proficiency evaluation, including test question delivery, evaluation and answer submission; it processes the examinee's answer audio and transmits it to the server, and it can convert the format of the answer audio and extract its characteristics. After the evaluation is finished, the examinee's result, namely the scoring result (which may also include a comprehensive evaluation report), can also be published on the client. The answer results uploaded by the client may comprise either or both of the spoken evaluation results of read-aloud questions (objective test questions) and the spoken evaluation results of spontaneous spoken expression questions (subjective test questions).

The server is mainly responsible for collating and collecting evaluation results, distributing test papers, and automatic machine scoring. It outputs evaluation information to the client through a communication module, provides test papers to the client at specified times and controls the evaluation time, collects the examinees' answer audio from the client, recognizes, decodes and scores the examinees' answers, and feeds the evaluation results back to the client through the communication module promptly after scoring is finished. The server provides functions such as corpus collection, voice signal preprocessing, voice recognition and pronunciation quality scoring. Depending on the number of examinees and the computational load, the server may be built as a computer cluster of several high-performance computers to speed up scoring and decoding. After the evaluation is finished, the examinees' answer information and scores are analyzed and processed centrally, and information such as total scores, individual item scores and rankings is compiled, so that teachers and students can query it at any time.

The system may contain three roles with different permissions: examinees, teachers and administrators. Examinees are mainly responsible for taking evaluations and answering questions; teachers are mainly responsible for composing test papers, issuing evaluations, managing evaluations and checking evaluation results; administrators are mainly responsible for managing evaluations, controlling the timing of test paper distribution, and maintaining the whole evaluation system.

Exemplary Method

A method for spoken language proficiency evaluation according to an exemplary embodiment of the present invention is described below with reference to FIGS. 2-4 in conjunction with the application scenario illustrated in FIG. 1. It should be noted that the above application scenario is illustrated merely to facilitate understanding of the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any applicable scenario.

The method of the embodiment of the invention can comprise the following steps: step S200, step S210, step S220, step S230, and step S240; optionally, the method according to the embodiment of the present invention may further include: step S300, step S310, and step S320.

Referring to FIG. 2, a flow chart of a method for spoken language proficiency evaluation according to an embodiment of the present invention is schematically shown. The method is generally performed on a device capable of running a computer program, for example a desktop computer or a server, or alternatively a notebook computer or even a tablet computer.

In step S200, a question to be evaluated is randomly extracted from the question bank.

By way of example, the spoken language proficiency evaluation according to the embodiments of the present invention may be an evaluation in any language, such as English, Chinese, French, German or Russian, and may be a mock spoken language test taken through an online website or an application program, or a formal spoken language test. In the following embodiments, English speaking evaluation such as the IELTS test is taken as an example for illustration, but the disclosure is not limited thereto. Correspondingly, different question banks can be provided for different languages and different types of spoken language test; for example, the IELTS test has an IELTS question bank, and when an examinee or user logs in to the system, a question to be evaluated is randomly selected from that question bank.
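
Step S200 can be sketched as a random draw from a per-test question bank. The bank name `english_speaking`, the sample questions and the function `draw_question` are hypothetical; the patent does not describe how the question bank is stored.

```python
import random
from typing import Dict, List, Optional

# Hypothetical in-memory question banks, keyed by test type.
QUESTION_BANKS: Dict[str, List[str]] = {
    "english_speaking": [
        "Describe an achievement you are proud of.",
        "Talk about a place you would like to visit.",
        "Read the following passage aloud.",
    ],
}


def draw_question(bank_name: str, rng: Optional[random.Random] = None) -> str:
    """Randomly extract one question to be evaluated from the named bank."""
    rng = rng or random.Random()
    return rng.choice(QUESTION_BANKS[bank_name])


print(draw_question("english_speaking"))
```

Passing a seeded `random.Random` makes the draw reproducible, which can be convenient when testing the rest of the pipeline.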

In step S210, voice data to be evaluated for the question to be evaluated is collected.

In the embodiment of the invention, again taking the IELTS test as an example, after the system randomly selects the question to be evaluated for the current examinee or user from the corresponding question bank, the examinee or user starts to answer, and the voice data to be evaluated for each question is recorded by the client and can be uploaded to the server.

In a preferred embodiment, the method may further comprise: preprocessing the voice data to be evaluated collected for the question to be evaluated, so as to convert it into a data format that meets the requirements of the machine scoring system.
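
One possible first step of such preprocessing is checking whether a recording already has the format the scoring system expects. The sketch below uses Python's standard `wave` module; the target format (mono, 16-bit, 16 kHz) and the helper names are assumptions, since the patent does not state the required format.

```python
import io
import wave

# Assumed target format; the patent only says the data must meet the
# requirements of the machine scoring system.
TARGET = {"channels": 1, "sample_width": 2, "sample_rate": 16000}


def inspect_wav(data: bytes) -> dict:
    """Read basic parameters from an in-memory wav file."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width": w.getsampwidth(),  # bytes per sample
            "sample_rate": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }


def meets_target(info: dict) -> bool:
    """True if the recording already matches the assumed target format."""
    return all(info[key] == value for key, value in TARGET.items())


def make_silence(sample_rate: int, channels: int, seconds: float) -> bytes:
    """Build a silent 16-bit wav file in memory, used here only as sample input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * channels * int(sample_rate * seconds))
    return buf.getvalue()


info = inspect_wav(make_silence(16000, 1, 1.0))
print(info, meets_target(info))
```

A recording that fails the check would then need resampling or channel mixing, which is outside the scope of this sketch.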

In step S220, corresponding text data to be evaluated and pronunciation characteristics to be evaluated are obtained according to the speech data to be evaluated.

In the embodiment of the invention, the speech data to be evaluated can be converted into the corresponding text data to be evaluated and the pronunciation characteristics to be evaluated by adopting an automatic speech recognition technology. Specific automatic speech recognition techniques can be found in the prior art and will not be described further herein.

In step S230, a first semantic relevance between the text data to be evaluated and the question to be evaluated is obtained.

As an example, the method of the embodiment of the present invention may further include: dividing the questions to be evaluated into subjective test questions and objective test questions according to question type. An objective test question is a test question with a standard reference answer, that is, the text data to be evaluated from the user's or examinee's answer must be fully consistent with the corresponding standard reference answer to obtain a full or high score. An example is a read-aloud question in a spoken language examination, where a piece of English material is given and the examinee reads it aloud. A subjective test question is a test question without a standard reference answer, which each examinee or user can answer freely, or a question for which one or more reference answers are provided but the examinee's answer is not required to match them. An example is a spoken language examination question asking the examinee to describe in English something that the examinee considers a success.

In a preferred embodiment, the method comprises: aiming at the subjective test questions, acquiring text data to be evaluated of the subjective test questions; and calculating a first semantic correlation degree between the text data to be evaluated of the subjective test question and the corresponding question to be evaluated.

As an example, the first semantic relevance may be obtained by: calculating a semantic relevance score between each word in the text data to be evaluated of the subjective test question and each word in the corresponding question to be evaluated; calculating a semantic relevance score between each word in the text data to be evaluated of the subjective test question and each sentence in the corresponding question to be evaluated; taking the maximum value or the average value of these scores as the semantic relevance between the word and the sentence; and calculating a first semantic relevance score between the text data to be evaluated of the subjective test question and the corresponding question to be evaluated.
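By way of illustration only, the word-to-word, word-to-sentence and overall relevance steps described above may be sketched as follows. The disclosure does not specify how the word-level score is computed or how the final score is aggregated; the toy word vectors, the cosine measure, and the averaging over answer words are all assumptions made for this sketch.

```python
import math

# Hypothetical toy word vectors; a real system would load trained embeddings.
TOY_VECTORS = {
    "success": [0.9, 0.1, 0.0],
    "achievement": [0.8, 0.2, 0.1],
    "describe": [0.1, 0.9, 0.0],
    "won": [0.7, 0.1, 0.3],
    "banana": [0.0, 0.1, 0.9],
}


def word_relevance(a, b):
    """Cosine similarity between two words (assumed measure); 0.0 if unknown."""
    va, vb = TOY_VECTORS.get(a), TOY_VECTORS.get(b)
    if va is None or vb is None:
        return 0.0
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def word_sentence_relevance(word, sentence, use_max=True):
    """Maximum or average of the word-to-word scores against one sentence."""
    scores = [word_relevance(word, w) for w in sentence]
    if not scores:
        return 0.0
    return max(scores) if use_max else sum(scores) / len(scores)


def first_semantic_relevance(answer_words, question_sentences, use_max=True):
    """Average, over answer words, of the best sentence score (assumed aggregation)."""
    if not answer_words or not question_sentences:
        return 0.0
    per_word = [
        max(word_sentence_relevance(w, s, use_max) for s in question_sentences)
        for w in answer_words
    ]
    return sum(per_word) / len(per_word)


question = [["describe", "success"]]
on_topic = first_semantic_relevance(["won", "achievement"], question)
off_topic = first_semantic_relevance(["banana"], question)
```

With these toy vectors an on-topic answer receives a higher relevance than an off-topic one, which is the property the scoring step relies on.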

In step S240, a scoring result is obtained according to the first semantic relevance and the pronunciation feature to be scored.

As an example, in the embodiment of the present invention, the higher the first semantic relevance is, the higher the corresponding score is; conversely, the lower the first semantic relevance, the lower the corresponding score. However, the present disclosure is not limited thereto.

According to the method for spoken language proficiency evaluation provided by the embodiment of the invention, the spoken language score of the examinee can be obtained by calculating the relevance score between the question to be evaluated and the voice data of the examinee's answer. The method can therefore be applied to machine evaluation of subjective test questions that have no standard reference answer, and a user can take online mock evaluations or examinations over the Internet. This greatly improves the efficiency and accuracy of spoken language proficiency evaluation, and avoids the prior-art situation in which examinees cope with an examination by memorizing keywords or stock texts by rote.

Although the above embodiment is described using subjective test questions as an example, the method according to the embodiment of the present invention may also be applied to objective test questions. In that case, even if the examinee's answer partially matches the keywords given in the scoring standard, no score or no high score is awarded when the answer as a whole does not match the question.

By way of example, the pronunciation characteristics to be assessed may include pronunciation accuracy, pronunciation fluency, and the like of the examinee, which are not limited by the present disclosure.

Further, the pronunciation characteristics to be assessed may include: fundamental frequency features, formants, speech rate, average energy, etc.

In another embodiment of the present invention, the method may further include: acquiring the corresponding evaluation dimensions and scoring standards according to the type of the question to be evaluated.

For example, test questions can be classified into grammar test questions, vocabulary test questions, pronunciation test questions, fluency test questions, and other types. Correspondingly, different evaluation dimensions and scoring standards are set for different types of questions to be evaluated.

In a further embodiment of the present invention, the evaluation dimension may include a grammar evaluation dimension and/or a vocabulary evaluation dimension and/or a pronunciation evaluation dimension and/or a fluency evaluation dimension, and the corresponding scoring criteria may include a grammar scoring criterion and/or a vocabulary scoring criterion and/or a pronunciation scoring criterion and/or a fluency scoring criterion.

In a preferred embodiment, the method may further comprise: obtaining grammar scores according to the text data to be evaluated and the grammar score standard; and/or acquiring a vocabulary score according to the text data to be evaluated and the vocabulary score standard; and/or acquiring pronunciation scores according to the pronunciation characteristics to be assessed and the pronunciation score standard; and/or acquiring fluency score according to the pronunciation characteristics to be assessed and the fluency score standard.

For example, the grammar score may be based on the number of tenses used in the text data to be evaluated, whether each tense is used correctly, and the like; in the above test question requiring the examinee to describe in English something that he or she considers a success, the past tense is mainly used. As another example, the vocabulary score may be based on the richness of the vocabulary in the text data to be evaluated, whether the vocabulary is used appropriately, and the like. In general, the richer the vocabulary, the higher the vocabulary score, although the disclosure is not so limited.

Pronunciation scoring mainly examines whether the content of the spoken sentences is complete and accurate, whether the pronunciation is clear and fluent, and whether there are pronunciation errors. Specifically, the pronunciation score can be obtained by calculating pronunciation accuracy; methods for calculating pronunciation accuracy can be found in the prior art and are not described in detail herein. For example, a deep learning algorithm may be used to evaluate the pronunciation accuracy of the speech segments, so as to obtain the pronunciation score of the voice data to be evaluated.
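As a minimal illustration of the vocabulary richness mentioned above, the following sketch uses the type-token ratio (distinct words divided by total words). The disclosure does not name a particular richness measure; the type-token ratio and the simple tokenizer are assumptions chosen for this example.

```python
import re


def type_token_ratio(text):
    """Distinct words divided by total words; 0.0 for empty text.

    This is an assumed richness measure, not one named in the disclosure.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


rich = type_token_ratio("I organized a charity concert and raised funds")
poor = type_token_ratio("I did it and I did it and I did it")
```

The ratio depends on text length, so a practical vocabulary score would probably normalize for length or combine it with other features.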

Specifically, the fluency score may be obtained from a speech rate feature, a short pause duration feature, and the like. The speech rate feature can be obtained by: counting, according to the pronunciation characteristics to be evaluated, the number of frames corresponding to each phoneme in the voice data to be evaluated; and taking the ratio of the total number of phonemes to the total duration of all phonemes as the speech rate feature. The short pause duration feature can be obtained by: counting, using the pronunciation characteristics to be evaluated, the number of frames corresponding to each phoneme in the voice data to be evaluated and the total number of frames of the audio; and taking the ratio of the total duration of all short pauses in the audio to the total pronunciation duration as the short pause duration feature.

In yet another embodiment of the present invention, the method may further include: obtaining the scoring result according to the grammar score and/or the vocabulary score and/or the pronunciation score and/or the fluency score.
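The two fluency features described above may be sketched as follows, purely as an illustration. The input format (a list of label and frame-count pairs from a phoneme alignment), the 10 ms frame shift, the pause label "sil", and the reading of "total pronunciation duration" as the summed duration of all phonemes are assumptions not stated in the disclosure.

```python
FRAME_SHIFT_S = 0.01  # assumed frame shift of 10 ms
PAUSE_LABEL = "sil"   # assumed label marking a short pause in the alignment


def fluency_features(alignment, frame_shift_s=FRAME_SHIFT_S):
    """Compute speech rate and pause ratio from (label, n_frames) pairs."""
    phone_frames = [n for label, n in alignment if label != PAUSE_LABEL]
    pause_frames = [n for label, n in alignment if label == PAUSE_LABEL]
    speech_s = sum(phone_frames) * frame_shift_s  # duration of all phonemes
    pause_s = sum(pause_frames) * frame_shift_s   # duration of all short pauses
    if speech_s == 0:
        return {"speech_rate": 0.0, "pause_ratio": 0.0}
    return {
        "speech_rate": len(phone_frames) / speech_s,  # phonemes per second
        "pause_ratio": pause_s / speech_s,            # pause time per speech time
    }


alignment = [("HH", 8), ("AH", 12), ("sil", 10), ("L", 10), ("OW", 20)]
features = fluency_features(alignment)
```

In this toy alignment, four phonemes span 50 frames (0.5 s) and one pause spans 10 frames (0.1 s).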

In yet another embodiment of the present invention, the method may further include: acquiring a second semantic correlation degree between the text data to be evaluated and the standard answer of the question to be evaluated; and acquiring the scoring result according to the second semantic relevance.

As an example, the second semantic relevance may include a semantic similarity and a grammar structure similarity.

Specifically, the semantic similarity may be obtained by: calculating a semantic similarity score between each word in the text data to be evaluated and each word in the standard reference answer; calculating a semantic similarity score between each word in the text data to be evaluated and each sentence in the standard reference answer; taking the maximum value or the average value of these scores as the similarity score between the word and the sentence; and calculating a similarity score between the text data to be evaluated and the standard reference answer.

Specifically, the grammar structure similarity may be obtained by: establishing a grammar sequence vector for each sentence of the text data to be evaluated; calculating a grammar structure similarity score between each sentence in the text data to be evaluated and each sentence in the standard reference answer, and taking the maximum of these scores as the grammar structure similarity score of that sentence; and calculating the grammar structure similarity feature between the examinee's answer and the standard reference answer as a weighted average of the grammar structure similarity scores of the sentences in the text data to be evaluated.
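The grammar structure similarity steps above may be illustrated by the following sketch. The disclosure does not define the grammar sequence vector, the sentence-level similarity measure, or the weights; here each sentence is represented by a hypothetical part-of-speech tag sequence, similarity is measured with Python's `difflib.SequenceMatcher`, and sentences are weighted by their length. All three choices are assumptions.

```python
from difflib import SequenceMatcher


def sequence_similarity(tags_a, tags_b):
    """Similarity in [0, 1] between two tag sequences (assumed measure)."""
    return SequenceMatcher(None, tags_a, tags_b).ratio()


def grammar_structure_similarity(answer_sents, reference_sents):
    """Length-weighted average of each answer sentence's best match."""
    if not answer_sents or not reference_sents:
        return 0.0
    total_weight = 0
    weighted_sum = 0.0
    for sent in answer_sents:
        # Best score of this sentence against every reference sentence.
        best = max(sequence_similarity(sent, ref) for ref in reference_sents)
        weighted_sum += best * len(sent)  # weight = sentence length (assumption)
        total_weight += len(sent)
    return weighted_sum / total_weight if total_weight else 0.0


reference = [["PRON", "VERB", "DET", "NOUN"]]
same = grammar_structure_similarity([["PRON", "VERB", "DET", "NOUN"]], reference)
different = grammar_structure_similarity([["NOUN", "NOUN", "ADJ"]], reference)
```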

Compared with traditional spoken language evaluation, which is limited to read-aloud questions, the method for spoken language proficiency evaluation of the embodiment of the invention can be used not only for read-aloud questions but also for questions requiring spontaneous spoken expression. The scoring is more comprehensive and fair: the pronunciation accuracy and fluency of the examinee can be examined while the examinee speaks spontaneously, which better reflects the examinee's actual spoken language level. The text of the examinee's answer is no longer restricted, and the question types that can be scored automatically are no longer limited to read-aloud questions, so the examinee's ability to understand, use and express the language in spontaneous speech can be examined. Both the semantic relevance and the grammar level of the examinee's speech can thus be examined. The evaluation runs efficiently while making full use of the resources of the whole system, which greatly improves the efficiency of organizing spoken language evaluations and saves considerable manpower and material resources. In addition, the scoring dimensions of the method provided by the embodiment of the invention are scientific and diverse, and the scoring process is efficient.

In yet another embodiment of the present invention, the method may further include: analyzing the grading result to obtain an analysis result; and generating a comprehensive evaluation report according to the grading result and the analysis result.

By way of example, the comprehensive evaluation report in the embodiment of the invention may include an overall level assessment and a horizontal comparison based on big data, together with a detailed analysis of specific problems (comparison of erroneous answers with example audio, analysis of causes, suggestions for improvement, etc.).

A specific example of the method for spoken language proficiency evaluation according to the embodiment of the present invention is described below with reference to fig. 3 and 4.

In step S300, recording and resource allocation are performed. Referring to fig. 4, the recording process and resource allocation phase may include the following steps.

S301: after the user's recording is collected, the voice data input by the user is obtained. The user's speech content can be quickly recognized by ASR (Automatic Speech Recognition) technology, and information corresponding to the speech, such as the text of the answer and the user's pronunciation characteristics, can be output.

S302: after the questions are extracted, a corresponding configuration file is generated according to the question category and the scoring standard. The configuration file determines the number of analyzers (Analyzer) required, the number of features (Feature) corresponding to each analyzer or scoring dimension, and the content of the feedback (Feedback).

By way of example, the configuration file in the embodiment of the present invention is based on question types and specifies in advance the dimensions for which each question type requires feedback; for example, a question aimed specifically at grammar practice receives no score for the pronunciation dimension. The configuration file contains various entries that describe which features to extract.

S303: the resource manager (Resource Manager) may manage four kinds of content: Model (the scoring model and other algorithm models), Data (external resources used in the evaluation process, such as dictionaries), Question Database, and Other Resources.
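As a purely hypothetical illustration of such a configuration, the following sketch keys the configuration by question type and records, for each type, the analyzers, the features of each analyzer, and the dimensions to be fed back. The disclosure does not give a file format; the dictionary layout and every key and feature name below are assumptions.

```python
# Hypothetical configuration keyed by question type; all names are illustrative.
CONFIG = {
    "grammar_practice": {
        "analyzers": {"grammar": ["tense_count", "tense_errors"]},
        "feedback": ["grammar"],  # no pronunciation feedback for grammar practice
    },
    "free_speech": {
        "analyzers": {
            "grammar": ["tense_count", "tense_errors"],
            "vocabulary": ["richness"],
            "pronunciation": ["accuracy"],
            "fluency": ["speech_rate", "pause_ratio"],
        },
        "feedback": ["grammar", "vocabulary", "pronunciation", "fluency"],
    },
}


def analyzer_plan(question_type):
    """Return (number of analyzers, features per analyzer, feedback dimensions)."""
    entry = CONFIG[question_type]
    feature_counts = {name: len(feats) for name, feats in entry["analyzers"].items()}
    return len(entry["analyzers"]), feature_counts, entry["feedback"]


count, feature_counts, feedback = analyzer_plan("grammar_practice")
```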

The scoring model is a statistical model containing a large number of parameters, and may be built with any one or more of, for example, linear regression, case-based reasoning, association rule learning, a neural network, or a support vector machine. Other algorithm models may include, for example, audio quality detection. Other resources include, for example, an IELTS vocabulary list and a question bank.

Specifically, the scoring model may be obtained as follows. A plurality of examinees are selected and the following five steps are performed, after which the extracted features are combined with teacher scores to train the automatic scoring model: collecting the examinees' answer audio; extracting acoustic features from the answer audio to obtain an acoustic model, and obtaining a language model according to the question information and training text; decoding the answer audio according to the established acoustic model and language model to obtain a recognition result; extracting features from the recognition result; and training the scoring model according to the extracted features. The scoring model obtained by this training can then be used for automatic scoring.
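The final training step, in which extracted features are fitted to teacher scores, may be illustrated with the simplest of the model families listed above, linear regression. The sketch below fits a single feature by ordinary least squares; the choice of a single speech rate feature and the training data are invented for illustration and do not come from the disclosure.

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("feature has no variance")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply the fitted parameters to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Invented training data: one feature value and one teacher score per examinee.
speech_rates = [2.0, 3.0, 4.0, 5.0]
teacher_scores = [4.0, 5.0, 6.0, 7.0]
model = fit_linear(speech_rates, teacher_scores)
predicted = predict(model, 3.5)
```

A practical scoring model would use many features per scoring dimension and would be validated against held-out teacher scores.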

In step S310, analysis processing is performed. The analysis processing stage may comprise the following steps.

S311: the text corresponding to the speech and the user pronunciation characteristics generated in step S300 are preprocessed (Pre-processing): the information is sorted and filtered, and the filtered result is submitted to the analyzers (Analyzer) for analysis. Preprocessing includes, for example, punctuating the recognized text and performing syntactic analysis on it.

The preprocessing may include: randomly dividing the spoken English audio file to be evaluated into slices of equal length, for example 5 seconds each; then performing pre-emphasis, framing, windowing and endpoint detection on all audio slices; performing time domain analysis (extracting time domain characteristic parameters of the audio slices, which may include short-time energy, short-time average amplitude, short-time average zero-crossing rate, short-time autocorrelation coefficients and short-time average amplitude difference), frequency domain analysis (extracting the spectrum, power spectrum, cepstrum and spectral envelope of the audio slices by a band-pass filter bank method, a short-time Fourier transform method, a frequency domain pitch detection method or a time-frequency representation method) and cepstral domain analysis (further separating glottal excitation information from vocal tract response information, where the glottal excitation information is used for voiced/unvoiced decisions and for finding the pitch period, and the vocal tract response information is used for finding formants and for speech coding, synthesis and recognition); and analyzing and calculating the acoustic parameters of the audio slices, including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
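The first part of this preprocessing chain (slicing, pre-emphasis, framing, windowing, and two of the time domain parameters) may be sketched as follows. The sketch slices sequentially from the start of the audio and uses a pre-emphasis coefficient of 0.97 and a Hamming window; these choices, and the toy sample rate and frame sizes, are assumptions, and endpoint detection and the frequency and cepstral domain analyses are omitted.

```python
import math


def slice_audio(samples, sample_rate, slice_s=5.0):
    """Cut the signal into consecutive slices; the last one may be shorter."""
    size = int(sample_rate * slice_s)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def pre_emphasis(samples, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed value."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[i] - alpha * samples[i - 1] for i in range(1, len(samples))
    ]


def frame_and_window(samples, frame_len, hop):
    """Split into overlapping frames and apply a Hamming window to each."""
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Toy signal: 12 s of a 5 Hz sine at an assumed sample rate of 100 Hz.
rate = 100
signal = [math.sin(2 * math.pi * 5 * t / rate) for t in range(12 * rate)]
slices = slice_audio(signal, rate)
frames = frame_and_window(pre_emphasis(slices[0]), frame_len=25, hop=10)
energies = [short_time_energy(f) for f in frames]
```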

S312, generating a corresponding analyzer and required analysis characteristics according to the resource management program in the step S300 and the configuration file generated according to the question type.

Referring to fig. 4, for example, analyzer 1 (Analyzer 1) has feature instance 1, feature instance 2, … feature instance n; analyzer 2 (Analyzer 2) has feature instance 1, feature instance 2, … feature instance n; …; and analyzer m (Analyzer m) has feature instance 1, feature instance 2, … feature instance n, where m and n are both positive integers greater than or equal to 1.

S313: the preprocessed voice files and features are input into an algorithm analysis system for feature analysis, and different features of a plurality of questions are analyzed in parallel (multithreading); after processing is finished, the resulting feedback features are put into a feature manager (Feature Manager) for organization.

Here, feature analysis is the process of scoring by means of algorithm models. The feature manager may organize features by feedback type, for example by the type of grammar error to which a feature belongs.

For example, as shown in fig. 4, Feature 1(Feature 1): name, value, info; feature 2(Feature 2): name, value, info; … feature n (feature n): name, value, info.
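The parallel analysis and the feature manager may be illustrated by the sketch below, in which each analyzer runs in its own thread and returns features as name, value and info records that the feature manager groups by feedback type. The two analyzers and the use of the info field as the grouping key are hypothetical stand-ins, not the analyzers of the embodiment.

```python
from concurrent.futures import ThreadPoolExecutor


def grammar_analyzer(text):
    """Hypothetical analyzer producing one grammar-related feature."""
    return [{"name": "word_count", "value": len(text.split()), "info": "grammar"}]


def vocabulary_analyzer(text):
    """Hypothetical analyzer producing one vocabulary-related feature."""
    words = text.lower().split()
    return [{"name": "distinct_words", "value": len(set(words)), "info": "vocabulary"}]


class FeatureManager:
    """Collects features and groups them by feedback type (the info field)."""

    def __init__(self):
        self.by_type = {}

    def add(self, feature):
        self.by_type.setdefault(feature["info"], []).append(feature)


def run_analyzers(text, analyzers):
    """Run every analyzer in its own thread and gather the features."""
    manager = FeatureManager()
    with ThreadPoolExecutor(max_workers=len(analyzers)) as pool:
        for features in pool.map(lambda analyze: analyze(text), analyzers):
            for feature in features:
                manager.add(feature)
    return manager


manager = run_analyzers("I won the the race", [grammar_analyzer, vocabulary_analyzer])
```

Results are added to the manager from the calling thread only, so the manager itself needs no locking in this sketch.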

In step S320, scoring and evaluation feedback are performed. The scoring and evaluation feedback stage may include the following steps.

S321: scoring (Scoring) is performed according to the configuration file of step S300 and the feature manager of step S310; a score is obtained for each scoring dimension, and a total score for the user's answer is computed from the dimension scores according to the corresponding algorithm.

Specifically, the scoring is performed based on the features extracted in step S310.

For example, for each IELTS scoring dimension, the corresponding feature values are fed into the corresponding scoring model to calculate the result for that dimension. The scoring model is a set of parameters, which may belong to an artificial neural network or another statistical learning model. After the scores of the four dimensions are obtained, the total score may be the average of the four dimension scores. For example: Fluency & Coherence 5, Lexical Resource 6, Grammar 6, Pronunciation 4, Overall 5.0.

S322: according to the configuration file, the feature manager and an evaluation feedback table, a feedback manager (Feedback Manager) extracts the overall evaluation and the corresponding analysis from the user's scores and the features held in the feature manager.

S323: a final evaluation report, that is, the final feedback (Feedbacks), is generated according to the configuration file and the scores.
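The averaging of the four dimension scores in the example above (5, 6, 6 and 4) may be sketched as follows. The arithmetic mean of these scores is 5.25, whereas the example reports an overall score of 5.0; the disclosure does not state a rounding rule, so rounding down to the nearest half point is an assumption adopted here only because it reproduces the reported value.

```python
import math


def mean_score(dimension_scores):
    """Arithmetic mean of the per-dimension scores."""
    return sum(dimension_scores.values()) / len(dimension_scores)


def round_down_to_half(score):
    """Round down to the nearest 0.5 (assumed rule, not stated in the source)."""
    return math.floor(score * 2) / 2


scores = {
    "Fluency & Coherence": 5,
    "Lexical Resource": 6,
    "Grammar": 6,
    "Pronunciation": 4,
}
mean = mean_score(scores)
overall = round_down_to_half(mean)
```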

The method of the embodiment of the invention can bring the following benefits to the products that apply it (such as the Yasi flow book, the Zhongkaihui book and the like):

1. Providing professional and detailed spoken language evaluation reports:

By applying the method in the evaluation process of the relevant product, an individual evaluation report can be generated for each user. The report can give a scientific and complete evaluation covering the overall pronunciation assessment and horizontal comparison, the score and assessment for each dimension, playback of recordings of specific errors with comparison against standard answers, error type analysis, improvement suggestions and the like, so that every user can fully understand all aspects of his or her spoken language ability.

2. The spoken language evaluation efficiency is greatly improved:

A. Evaluation of every dimension of long voice recordings is completed in a very short time:

In the flow of the method of the embodiment of the invention, the evaluation engine can run the feature analyzers for several evaluation dimensions at the same time, so that evaluation, scoring and evaluation feedback for the several dimensions are completed simultaneously. Long recordings, such as the complete recording of an IELTS mock examination or all the recordings for any set of questions of the middle and high school examinations, can be evaluated and the analysis report displayed within 30 minutes.

B. The evaluation process is highly compatible and easy to iterate:

The method of the embodiment of the invention is compatible with many different types of spoken language questions and with recorded spoken answers of various lengths, and can be updated whenever the scoring model or scoring standard is iterated, thereby avoiding the inefficiency and added cost of having to replace the scoring process and algorithm because question types or scoring standards differ.

C. The user experience is improved, and the core competitiveness of the product is formed:

Products applying the method of the embodiment of the invention are professional, efficient and simple to operate in evaluation, practice, mock examinations and similar content, which greatly improves the user experience, clearly distinguishes them from other products on the market, and forms an important part of their core competitiveness.

Exemplary devices

Having described the method of the exemplary embodiment of the present invention, the apparatus for spoken language proficiency evaluation of the exemplary embodiment of the present invention will next be described with reference to fig. 5.

Referring to fig. 5, a structural diagram of an apparatus for spoken language proficiency evaluation according to an embodiment of the present invention is schematically shown. The apparatus is generally disposed in a device that can run a computer program; for example, the apparatus in the embodiment of the present invention may be disposed in a desktop computer or a server, and of course it may also be disposed in a notebook computer or even a tablet computer.

The device of the embodiment of the invention mainly comprises: a question extraction module 500, a voice collection module 510, a voice recognition module 520, a first relevancy calculation module 530 and a scoring module 540. The following describes each module included in the apparatus.

The question extraction module 500 can be used for randomly extracting questions to be tested from the question bank.

The voice collection module 510 can be used for collecting the voice data to be evaluated for the question to be evaluated.

The voice recognition module 520 may be configured to obtain corresponding text data to be evaluated and pronunciation characteristics to be evaluated according to the voice data to be evaluated.

The first relevancy calculation module 530 may be configured to obtain a first semantic relevancy between the text data to be evaluated and the question to be evaluated.

The scoring module 540 may be configured to obtain a scoring result according to the first semantic relevance and the pronunciation feature to be scored.
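The cooperation of modules 500 to 540 may be pictured with the following sketch, in which the recognition, relevance and scoring modules are passed in as functions. The stub implementations (treating the audio bytes as text, word overlap as relevance, and an equal-weight combination as the score) are placeholders invented for this illustration and are not the modules of the embodiment; the voice collection module 510 is represented simply by the audio argument.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class EvaluationDevice:
    question_bank: List[str]
    recognize: Callable[[bytes], Tuple[str, Dict[str, float]]]  # module 520
    relevance: Callable[[str, str], float]                      # module 530
    score: Callable[[float, Dict[str, float]], float]           # module 540

    def extract_question(self, rng=random):
        """Module 500: draw a question at random from the question bank."""
        return rng.choice(self.question_bank)

    def evaluate(self, question, audio):
        """Run recognition, relevance calculation and scoring in sequence."""
        text, pronunciation = self.recognize(audio)
        first_relevance = self.relevance(text, question)
        return self.score(first_relevance, pronunciation)


def stub_recognize(audio):
    # Placeholder: pretend the audio bytes are already the transcript.
    return audio.decode(), {"accuracy": 0.8}


def stub_relevance(text, question):
    # Placeholder: share of question words that also occur in the answer.
    q_words = set(question.split())
    return len(q_words & set(text.split())) / len(q_words)


def stub_score(first_relevance, pronunciation):
    # Placeholder: equal-weight combination of relevance and accuracy.
    return 0.5 * first_relevance + 0.5 * pronunciation["accuracy"]


device = EvaluationDevice(
    ["describe a success"], stub_recognize, stub_relevance, stub_score
)
question = device.extract_question()
result = device.evaluate(question, b"a success story")
```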

In a preferred embodiment of the present invention, the apparatus may further include: a dimension standard acquisition module, which is used for acquiring the corresponding evaluation dimensions and scoring standards according to the type of the question to be evaluated.

In a further preferred embodiment of the present invention, the evaluation dimension may comprise a grammar evaluation dimension and/or a vocabulary evaluation dimension and/or a pronunciation evaluation dimension and/or a fluency evaluation dimension, and the corresponding scoring criteria comprise grammar scoring criteria and/or vocabulary scoring criteria and/or pronunciation scoring criteria and/or fluency scoring criteria. The scoring module 540 may further include a grammar scoring unit and/or a vocabulary scoring unit and/or a pronunciation scoring unit and/or a fluency scoring unit.

The grammar scoring unit may be configured to obtain a grammar score according to the text data to be scored and the grammar scoring standard.

The vocabulary scoring unit may be configured to obtain a vocabulary score according to the text data to be scored and the vocabulary scoring criteria.

The pronunciation scoring unit can be used for acquiring pronunciation scores according to the pronunciation characteristics to be scored and the pronunciation scoring standard.

The fluency scoring unit can be used for acquiring fluency scores according to the pronunciation characteristics to be scored and the fluency scoring standard.

In still another preferred embodiment of the present invention, the scoring module 540 may further include a total scoring unit. Wherein, the total scoring unit may be configured to obtain the scoring result according to the grammar score and/or the vocabulary score and/or the pronunciation score and/or the fluency score.

In still another preferred embodiment of the present invention, the apparatus may further include a second relevancy calculation module. The second relevancy calculation module can be used for obtaining a second semantic relevance between the text data to be evaluated and the standard answer of the question to be evaluated. The scoring module 540 is further configured to obtain the scoring result according to the second semantic relevance.

In still another preferred embodiment of the present invention, the apparatus may further include an analysis module and a report generation module. The analysis module may be configured to analyze the scoring result to obtain an analysis result. The report generation module can be used for generating a comprehensive evaluation report according to the grading result and the analysis result.

The specific operations performed by each module and/or unit may refer to the descriptions of each step in the above method embodiments, and are not repeated here.

FIG. 6 illustrates a block diagram of an exemplary computer system/server 60 suitable for use in implementing embodiments of the present invention. The computer system/server 60 shown in FIG. 6 is only an example and should not be taken to limit the scope of use and functionality of embodiments of the present invention in any way.

As shown in fig. 6, the computer system/server 60 is embodied in the form of a general purpose electronic device. The components of computer system/server 60 may include, but are not limited to: one or more processors or processing units 601, a system memory 602, and a bus 603 that couples various system components including the system memory 602 and the processing unit 601.

Computer system/server 60 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 60 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 602 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)6021 and/or cache memory 6022. The computer system/server 60 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, ROM 6023 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, but typically referred to as a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 603 by one or more data media interfaces. At least one program product may be included in system memory 602 with a set (e.g., at least one) of program modules configured to perform the functions of embodiments of the present invention.

A program/utility 6025 having a set (at least one) of program modules 6024 may be stored, for example, in the system memory 602, and such program modules 6024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment. Program modules 6024 generally carry out the functions and/or methodologies of embodiments of the invention as described herein.

The computer system/server 60 may also communicate with one or more external devices 604, such as a keyboard, pointing device, display, etc. Such communication may occur via input/output (I/O) interfaces 605. Also, the computer system/server 60 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 606. As shown in FIG. 6, network adapter 606 communicates with other modules of computer system/server 60, such as processing unit 601, via bus 603. It should be appreciated that although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with computer system/server 60.

The processing unit 601 executes various functional applications and data processing, for example, instructions for implementing the steps in the above-described method embodiments, by executing computer programs stored in the system memory 602; in particular, the processing unit 601 may execute a computer program stored in the system memory 602, and when the computer program is executed, the following instructions are executed: randomly extracting questions to be tested from a question bank; collecting voice data to be evaluated aiming at the to-be-evaluated question; acquiring corresponding text data to be evaluated and pronunciation characteristics to be evaluated according to the voice data to be evaluated; acquiring a first semantic correlation degree between the text data to be evaluated and the question to be evaluated; and obtaining a scoring result according to the first semantic relevance and the pronunciation characteristics to be scored.

The specific operations performed by the instructions may be referred to in the description of the steps in the above method embodiments, and the description is not repeated here.

A specific example of the medium according to the embodiment of the present invention is shown in fig. 7.

The medium of fig. 7 is an optical disc 700, on which a computer program (i.e., a program product) is stored, which when executed by a processor, implements the steps described in the above method embodiments, for example, randomly extracting a test question from a question bank; collecting voice data to be evaluated aiming at the to-be-evaluated question; acquiring corresponding text data to be evaluated and pronunciation characteristics to be evaluated according to the voice data to be evaluated; acquiring a first semantic correlation degree between the text data to be evaluated and the question to be evaluated; obtaining a scoring result according to the first semantic relevance and the pronunciation characteristics to be scored; the specific implementation of each step is not repeated here.

It should be noted that although several modules and/or units of the apparatus for spoken language proficiency evaluation are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the modules and/or units described above may be embodied in one module and/or unit. Conversely, the features and functions of one module and/or unit described above may be further divided so as to be embodied by a plurality of modules and/or units.

Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be broken down into multiple steps for execution.

While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. Nor does the division into aspects mean that features in these aspects cannot be combined to advantage; that division is for convenience of presentation only. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A method for spoken language proficiency assessment, comprising:

randomly extracting a question to be tested from a question bank;

collecting voice data to be evaluated for the question to be tested;

acquiring corresponding text data to be evaluated and pronunciation characteristics to be evaluated according to the voice data to be evaluated;

acquiring a first semantic relevance between the text data to be evaluated and the question to be tested;

and obtaining a scoring result according to the first semantic relevance and the pronunciation characteristics to be evaluated.

2. The method of claim 1, further comprising: acquiring a corresponding evaluation dimension and scoring standard according to the type of the question to be tested.

3. The method according to claim 2, wherein the evaluation dimension comprises a grammar evaluation dimension and/or a vocabulary evaluation dimension and/or a pronunciation evaluation dimension and/or a fluency evaluation dimension, and the corresponding scoring standard comprises a grammar scoring standard and/or a vocabulary scoring standard and/or a pronunciation scoring standard and/or a fluency scoring standard, the method further comprising:

obtaining a grammar score according to the text data to be evaluated and the grammar scoring standard; and/or

obtaining a vocabulary score according to the text data to be evaluated and the vocabulary scoring standard; and/or

obtaining a pronunciation score according to the pronunciation characteristics to be evaluated and the pronunciation scoring standard; and/or

obtaining a fluency score according to the pronunciation characteristics to be evaluated and the fluency scoring standard.

4. The method of claim 3, further comprising: obtaining the scoring result according to the grammar score and/or the vocabulary score and/or the pronunciation score and/or the fluency score.
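As an illustration of how a scoring result might be obtained from whichever dimension scores are available, the following sketch takes a weighted average over the dimensions present. The claims do not specify a combination rule; the weighted average, the function name `combine_scores`, and the default weight of 1.0 are assumptions made for this example.

```python
def combine_scores(dimension_scores, weights=None):
    """Weighted average over whichever dimension scores were computed.

    dimension_scores: dict such as {"grammar": 80.0, "fluency": 60.0};
        any non-empty subset of the four dimensions may be present,
        mirroring the "and/or" wording.
    weights: optional dict of hypothetical per-dimension weights;
        a dimension without an entry defaults to a weight of 1.0.
    """
    if not dimension_scores:
        raise ValueError("at least one dimension score is required")
    weights = weights or {}
    total_weight = sum(weights.get(d, 1.0) for d in dimension_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(s * weights.get(d, 1.0)
                   for d, s in dimension_scores.items())
    return weighted / total_weight
```

Because the average is taken only over the dimensions actually supplied, a question type that is scored on fewer dimensions still yields a result on the same scale.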

5. The method of any of claims 1 to 4, further comprising:

acquiring a second semantic relevance between the text data to be evaluated and a standard answer to the question to be tested;

and obtaining the scoring result according to the second semantic relevance.

6. The method of claim 1, further comprising:

analyzing the scoring result to obtain an analysis result;

and generating a comprehensive evaluation report according to the scoring result and the analysis result.
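The analysis and report steps can be pictured with a short sketch. What the analysis consists of is not defined in this section, so the example below simply assumes that per-dimension scores are available, identifies the strongest and weakest dimensions, and flags those below a hypothetical pass mark; `analyze`, `build_report`, and `pass_mark` are names invented for illustration.

```python
def analyze(dimension_scores, pass_mark=60.0):
    """Derive a simple analysis result from per-dimension scores.
    pass_mark is a hypothetical threshold, not taken from the source."""
    weakest = min(dimension_scores, key=dimension_scores.get)
    strongest = max(dimension_scores, key=dimension_scores.get)
    needs_work = sorted(d for d, s in dimension_scores.items()
                        if s < pass_mark)
    return {"strongest": strongest,
            "weakest": weakest,
            "needs_work": needs_work}


def build_report(total_score, dimension_scores):
    """Merge the scoring result and the analysis result into one report."""
    analysis = analyze(dimension_scores)
    lines = [f"Overall score: {total_score:.1f}"]
    lines += [f"  {d}: {s:.1f}" for d, s in sorted(dimension_scores.items())]
    lines.append(f"Strongest dimension: {analysis['strongest']}")
    lines.append(f"Weakest dimension: {analysis['weakest']}")
    if analysis["needs_work"]:
        lines.append("Needs practice: " + ", ".join(analysis["needs_work"]))
    return "\n".join(lines)
```

Keeping `analyze` separate from `build_report` mirrors the two claimed steps: the analysis result can be stored or inspected on its own before it is rendered into a report.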

7. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, carries out the method of any one of claims 1 to 6.

8. An apparatus for spoken language proficiency assessment, comprising:

a question extraction module, configured to randomly extract a question to be tested from a question bank;

a voice acquisition module, configured to collect voice data to be evaluated for the question to be tested;

a voice recognition module, configured to acquire corresponding text data to be evaluated and pronunciation characteristics to be evaluated according to the voice data to be evaluated;

a first relevance calculation module, configured to acquire a first semantic relevance between the text data to be evaluated and the question to be tested;

and a scoring module, configured to obtain a scoring result according to the first semantic relevance and the pronunciation characteristics to be evaluated.

9. The apparatus of claim 8, further comprising:

a dimension and standard acquisition module, configured to acquire a corresponding evaluation dimension and scoring standard according to the type of the question to be tested.

10. The apparatus according to claim 9, wherein the evaluation dimension comprises a grammar evaluation dimension and/or a vocabulary evaluation dimension and/or a pronunciation evaluation dimension and/or a fluency evaluation dimension, and the corresponding scoring standard comprises a grammar scoring standard and/or a vocabulary scoring standard and/or a pronunciation scoring standard and/or a fluency scoring standard, and wherein the scoring module further comprises:

a grammar scoring unit, configured to obtain a grammar score according to the text data to be evaluated and the grammar scoring standard; and/or

a vocabulary scoring unit, configured to obtain a vocabulary score according to the text data to be evaluated and the vocabulary scoring standard; and/or

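a pronunciation scoring unit, configured to obtain a pronunciation score according to the pronunciation characteristics to be evaluated and the pronunciation scoring standard; and/or

a fluency scoring unit, configured to obtain a fluency score according to the pronunciation characteristics to be evaluated and the fluency scoring standard.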

11. The apparatus of claim 10, wherein the scoring module further comprises:

a total scoring unit, configured to obtain the scoring result according to the grammar score and/or the vocabulary score and/or the pronunciation score and/or the fluency score.

12. The apparatus of any of claims 8 to 11, further comprising:

a second relevance calculation module, configured to acquire a second semantic relevance between the text data to be evaluated and a standard answer to the question to be tested;

wherein the scoring module is further configured to obtain the scoring result according to the second semantic relevance.

13. The apparatus of claim 8, further comprising:

an analysis module, configured to analyze the scoring result to obtain an analysis result;

and a report generation module, configured to generate a comprehensive evaluation report according to the scoring result and the analysis result.

14. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program stored in the memory, wherein, when the computer program is executed, the following instructions are executed:

randomly extracting a question to be tested from a question bank;

collecting voice data to be evaluated for the question to be tested;