CN112801829B - Method and device for correlation of test question prediction network model - Google Patents


Info

Publication number
CN112801829B
CN112801829B (Application CN202011627643.0A)
Authority
CN
China
Prior art keywords
target
question
word
network model
english
Prior art date
Legal status
Active
Application number
CN202011627643.0A
Other languages
Chinese (zh)
Other versions
CN112801829A (en)
Inventor
胡阳
付瑞吉
王士进
魏思
胡国平
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202011627643.0A priority Critical patent/CN112801829B/en
Publication of CN112801829A publication Critical patent/CN112801829A/en
Application granted granted Critical
Publication of CN112801829B publication Critical patent/CN112801829B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00Teaching not covered by other main groups of this subclass


Abstract

The application discloses a method and device related to a test question prediction network model. The training method of the test question prediction network model comprises the following steps: acquiring question bank data; extracting the correct answers from the question bank data to build a survey dictionary; inputting the question bank data into a first preset network model, which divides each word in the sample test questions of the question bank data into a finite set of common word units; detecting whether each common word unit exists in the survey dictionary, so as to assign a corresponding detection feature value based on the detection result; and training on the contextual relationship between sentence pairs in the sample test questions together with the detection feature values to obtain the test question prediction network model. In this way, the training method of the test question prediction network model can control the difficulty and quality of the generated test questions, while being efficient and inexpensive to implement.

Description

Method and device for correlation of test question prediction network model
Technical Field
The application relates to the technical field of computer-assisted teaching, in particular to a method and device related to a test question prediction network model.
Background
Nowadays, with the development and maturation of artificial intelligence technology, fields such as natural language understanding, data mining and personalized learning have also advanced, so personalized and accurate recommendation of test questions is now widely applied in scenarios such as teachers' daily teaching and students' daily practice. In a personalized learning system in particular, in order to improve students' learning efficiency, level and scores in English within limited daily study time, terminal data acquisition devices in scenarios such as PC, tablet and web-based reading examinations collect exercise data, detect and analyze each student's weak knowledge points, and finally deliver personalized resource recommendation. Personalized recommendation and test question analysis technologies thus play an important role in reducing students' workload while improving their efficiency. For example, when a machine recommends various test questions to a student through an intelligent terminal device, the student's ability level can be tested and evaluated; based on the evaluation result and on information such as the text content, answers and labels of the questions the student answered incorrectly, the machine can identify the student's weak points and deliver accurate, targeted knowledge recommendations.
However, the existing way of producing English test question resources relies entirely on the teaching and research experience of English teachers and the knowledge points they have mastered. Because teachers' levels are uneven, the quality of the question content is hard to control and the labor cost is high, and errors such as spelling mistakes, grammar mistakes and mismatched knowledge-point content easily occur, resulting in considerable wasted cost.
Disclosure of Invention
The application provides a method and device related to a test question prediction network model, which can effectively solve the problems of the traditional way of producing English test question resources, which relies entirely on the teaching and research experience of English teachers: high labor cost and susceptibility to errors such as spelling mistakes, grammar mistakes and mismatched knowledge-point content, all of which waste cost.
In order to solve the technical problems, the application adopts a technical scheme that: the training method of the test question prediction network model comprises the following steps: acquiring question bank data; the question library data comprises sample questions and corresponding option answers, wherein the option answers comprise correct answers and wrong answers; extracting correct answers to establish a survey dictionary base; inputting the question bank data into a first preset network model so as to divide each word in a sample test question of the question bank data into a group of limited public word units through the first preset network model; detecting whether each public word unit exists in a research dictionary library or not so as to give out a corresponding detection characteristic value based on a current detection result; and training based on the context relation and the detection characteristic value between every two sentences in the sample test questions to obtain the test question prediction network model.
After the step of inputting the question bank data into the first preset network model to divide each word in the sample questions of the question bank data into a group of limited public word units through the first preset network model, the method further comprises the following steps of: converting each public word unit into a word feature vector, and encoding the position information of the word feature vector into a position feature vector; and obtaining the context relation between every two sentences in the sample test question according to the word feature vector and the position feature vector.
The step of obtaining the test question prediction network model based on the contextual relation between every two sentences in the sample test questions and the detection characteristic value training comprises the following steps: acquiring a target word or a target phrase in the sample test question based on the context relation and the detection characteristic value between every two sentences in the sample test question; hiding the target word or the target phrase; and predicting replacement items of the target words or the target phrase according to the context of the target words or the target phrase to generate test question answer options of the sample test questions so as to train and obtain a test question prediction network model.
The step of acquiring the target word or the target phrase in the sample test question based on the context relation and the detection characteristic value between every two sentences in the sample test question comprises the following steps: and acquiring the prediction probability of each word or phrase suitable for being hidden in the sample test question based on the context relation and the detection characteristic value between every two sentences in the sample test question, so as to determine the word or phrase with the prediction probability exceeding a set threshold value as a target word or target phrase.
In order to solve the technical problems, the application adopts another technical scheme that: the method for generating English test questions comprises the following steps: acquiring a target English text; carrying out format analysis on the target English text through a preset test question template to determine the matching question type of the target English text; performing test question prediction on the target English text after format analysis through a test question prediction network model so as to correspondingly generate the target English text into English test questions of corresponding matching questions; the test question prediction network model is obtained by training the test question prediction network model training method according to any one of the above.
The step of predicting the test questions by using the test question predicting network model to correspondingly generate the target English text into the English test questions of the corresponding matching questions comprises the following steps: acquiring target sentences suitable for generating English test questions in the target English text after format analysis through a test question prediction network model; hiding target words or target phrases in the target sentences; and predicting the replacement item according to the context of the target word or the target phrase to generate test question answer options, so that the target English text is correspondingly generated into the English test questions of the corresponding matching questions.
The step of obtaining target sentences suitable for generating English test questions in the target English text after format analysis through the test question prediction network model comprises the following steps of: acquiring a first probability suitable for generating English test questions in each sentence in the target English text after format analysis through a test question prediction network model, and determining the sentence with the first probability exceeding a first threshold value as a target sentence; the step of hiding the target word or the target phrase in the target sentence includes: and acquiring a second probability that each word or phrase in the target sentence is suitable for being hidden, determining the word or phrase with the second probability exceeding a second threshold value as the target word or target phrase, and hiding.
Before the step of predicting replacement items according to the context of the target word or target phrase to generate test question answer options, so that the target English text is correspondingly generated into English test questions of the corresponding matching question type, the method further comprises: obtaining a real-question answer word bank. The step of predicting replacement items according to the context of the target word or target phrase to generate test question answer options then comprises: predicting the replacement items according to the context of the target word or target phrase and the real-question answer word bank to generate test question answer options, so that the target English text is correspondingly generated into English test questions of the corresponding matching question type.
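Combining predicted replacements with a real-question answer word bank, as described above, might be sketched as follows; the function name, bank contents and candidate list are illustrative assumptions rather than the patent's actual implementation:

```python
def build_answer_options(correct, predicted_candidates, answer_bank, n_options=4):
    """Keep the correct answer and fill the remaining option slots with
    predicted replacements that also appear in the real-question answer bank."""
    distractors = [c for c in predicted_candidates
                   if c in answer_bank and c != correct]
    return [correct] + distractors[:n_options - 1]

# Illustrative data: a small answer bank and hypothetical model predictions.
answer_bank = {"however", "therefore", "although", "because", "unless"}
candidates = ["therefore", "quickly", "although", "because"]
print(build_answer_options("however", candidates, answer_bank))
# ['however', 'therefore', 'although', 'because']
```

Filtering through the answer bank keeps distractors within the style of real examination options ("quickly" is dropped above because it never appears as a real-question answer).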
In order to solve the technical problems, the application adopts another technical scheme that: the intelligent terminal comprises a memory and a processor which are mutually coupled, wherein the memory stores program data, and the processor is used for executing the program data to realize the training method of the test question prediction network model or the English test question generation method.
In order to solve the technical problems, the application adopts another technical scheme that: there is provided a computer-readable storage medium storing program data executable to implement the training method of the test question prediction network model as set forth in any one of the above, or the generation method of the english test questions as set forth in any one of the above.
The beneficial effects of the application are as follows: compared with the prior art, the training method of the test question prediction network model of the application obtains question bank data comprising sample test questions and corresponding option answers, where the option answers include correct answers and wrong answers, and extracts the correct answers to build a survey dictionary; inputs the question bank data into a first preset network model, which divides each word in the sample test questions into a finite set of common word units; converts each common word unit into a word feature vector and encodes its position information into a position feature vector; obtains the contextual relationship between sentence pairs in the sample test questions from the word feature vectors and position feature vectors; detects whether each common word unit exists in the survey dictionary, so as to assign a corresponding detection feature value based on the detection result; and trains on the contextual relationships and detection feature values to obtain the test question prediction network model. This avoids processing English test question resources manually, keeps the difficulty and quality of the generated test questions well under control, and processes the questions more efficiently, thereby greatly reducing implementation cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a schematic flow chart of a first embodiment of a training method of a test question prediction network model of the present application;
FIG. 2 is a schematic diagram of the specific flow of S15 in FIG. 1;
FIG. 3 is a schematic flow chart of a second embodiment of the training method of the test question prediction network model of the present application;
FIG. 4 is a schematic structural diagram of the test question prediction network model in a specific application scenario of the training method of the test question prediction network model in FIG. 3;
FIG. 5 is a flowchart of sentence prediction in a specific application scenario of the training method of the test question prediction network model in FIG. 3;
FIG. 6 is a schematic flow chart of a first embodiment of a method for generating English test questions according to the present application;
FIG. 7 is a schematic flow chart of the method for generating English test questions in FIG. 6 in a specific application scenario;
FIG. 8 is a schematic flow chart of a second embodiment of the method for generating English test questions of the present application;
FIG. 9 is a flowchart of a third embodiment of the method for generating English test questions according to the present application;
fig. 10 is a schematic flow chart of a fourth embodiment of the method for generating english questions of the application;
FIG. 11 is a schematic structural diagram of an embodiment of a smart terminal according to the present application;
fig. 12 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, based on the embodiments of the application, which are obtained by a person of ordinary skill in the art without making any inventive effort, are within the scope of the application.
Referring to fig. 1, fig. 1 is a flowchart of a first embodiment of a training method of a test question prediction network model according to the present application. The embodiment comprises the following steps:
S11: and acquiring question bank data.
In order to overcome the defects of producing English test question resources by relying on the teaching and research experience of English teachers, and to make the richness, variability and non-repetition of recommended question resources meet actual demand, the application provides a method and system for producing brand-new English test questions with machine-model assistance. Specifically, a test question prediction network model is trained on Internet big data, so that predictive analysis can be performed on massive acquired data such as English articles published on the Internet by journals, news outlets and primary and secondary school teachers, and brand-new English test questions can then be generated fully automatically.
It can be understood that before the test question prediction network model is used for generating the english test questions, the test question prediction network model needs to be trained first, so that the test question prediction network model obtained through final training can effectively generate the english test questions meeting actual requirements.
Specifically, in this embodiment, the intelligent terminal first obtains test question data in an existing test question library, for example, obtains question library data locally stored in the intelligent terminal, or obtains the well-arranged question library data on the internet, to be used as a training sample of this time.
The question library data specifically comprises sample questions and corresponding option answers, and the option answers further comprise correct answers and wrong answers.
It can be understood that the sample test questions in the question bank data may include several question types commonly used in the prior art, such as single-choice questions, cloze (complete filling), grammar filling and short-text error correction. Each of these question types has its own examination characteristics; they mainly examine points such as vocabulary, grammar and phrases, and their main form is to blank out words in sentences of the original text and supply corresponding wrong-answer candidates, so as to examine students' mastery of those vocabulary, grammar and phrase points.
S12: correct answers are extracted to build a survey dictionary base.
Further, after the question bank data is obtained, correct answers in the option answers are extracted from the question bank data so as to establish a survey dictionary bank, and the correct answers are used as training labels.
It can be understood that the survey dictionary contains the examination points targeted this time; it specifically comprises examined words and examined phrases, so as to cover all forms that may appear in the answer options of typical test questions.
S13: the method comprises the steps of inputting question bank data into a first preset network model, and dividing each word in a sample test question of the question bank data into a group of limited public word units through the first preset network model.
Still further, the obtained question bank data is input into the first preset network model as a training sample, so that the first preset network model firstly divides each word in the sample questions of the question bank data into a group of limited public word units, and therefore the split public word units can obtain a compromise balance between the validity of the words and the flexibility of the characters. For example, "play" is split into "play" and "ing" so that it more conforms to the rule set as the test question option.
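The splitting step can be sketched with a greedy longest-match-first subword tokenizer in the style of WordPiece. The small vocabulary and the `##` continuation marker below are illustrative assumptions, not the patent's actual implementation:

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first split of one word into common subword units."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:                 # continuation pieces carry the ## prefix
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:                 # no piece covers this span
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens

vocab = {"play", "##ing", "##ed", "dog", "cute"}   # illustrative vocabulary
print(wordpiece_tokenize("playing", vocab))        # ['play', '##ing']
```

Splitting "playing" into "play" + "##ing" is exactly the kind of balance between word validity and character flexibility described above.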
S14: and detecting whether each public word unit exists in the examination dictionary library or not so as to give a corresponding detection characteristic value based on the current detection result.
Specifically, whether each of the divided common word units exists in a pre-established investigation dictionary library is detected to give a corresponding detection feature value based on the current detection result, for example, the feature value of the common word unit existing in the investigation dictionary library is set to 1, and the feature value of the common word unit not existing in the investigation dictionary library is set to 0.
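A minimal sketch of this detection step, assuming illustrative dictionary contents (the function name and entries are not from the patent):

```python
def detection_features(units, survey_dict):
    """Assign 1 to each common word unit found in the survey dictionary, else 0."""
    return [1 if unit in survey_dict else 0 for unit in units]

survey_dict = {"play", "##ing", "however"}          # illustrative entries
units = ["my", "dog", "likes", "play", "##ing"]
print(detection_features(units, survey_dict))       # [0, 0, 0, 1, 1]
```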
S15: and training based on the context relation and the detection characteristic value between every two sentences in the sample test questions to obtain the test question prediction network model.
Further, training the first preset network model through the context relation and the detection characteristic value between every two sentences in the obtained sample test questions to obtain a test question prediction network model through training.
Specifically, in an embodiment, referring to fig. 2, fig. 2 is a schematic flowchart of step S15 in fig. 1, and step S15 of the training method of the test question prediction network model of the present application further specifically includes the following steps:
S151: and acquiring a target word or a target phrase in the sample test question based on the context relation and the detection characteristic value between every two sentences in the sample test question.
Specifically, the first preset network model predicts, based on the contextual relationship between sentence pairs in the sample test questions and the detection feature values indicating whether each common word unit exists in the survey dictionary, the probability that each word in the sample test questions is suitable for masking, so as to determine the words or phrases suitable for masking as target words or target phrases.
S152: the target word or target phrase is masked.
Further, the obtained target word or target phrase is blanked out and masked; for example, the target word in the sample test question is replaced by a [MASK] token.
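The masking step might look like the following sketch, where the token list and target positions are hypothetical:

```python
def mask_tokens(tokens, target_indices, mask_token="[MASK]"):
    """Replace the tokens at the target positions with the mask token."""
    return [mask_token if i in target_indices else tok
            for i, tok in enumerate(tokens)]

tokens = ["he", "likes", "play", "##ing", "football"]
print(mask_tokens(tokens, {2, 3}))
# ['he', 'likes', '[MASK]', '[MASK]', 'football']
```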
S153: and predicting replacement items of the target words or the target phrase according to the context of the target words or the target phrase to generate test question answer options of the sample test questions so as to train and obtain a test question prediction network model.
And further, predicting the replacement item according to the context of the target word or the target phrase, namely, searching all words or phrases which are related to the context of the target word or the target phrase and are suitable for being filled in the position of the target word or the target phrase so as to correspondingly generate test question answer options of the sample test questions, and further training to obtain a test question prediction network model.
Specifically, in one embodiment, S151 of the training method of the test question prediction network model of the present application further specifically includes the following step: acquiring, based on the contextual relationship between sentence pairs in the sample test questions and the detection feature values, the prediction probability that each word or phrase in the sample test questions is suitable for masking, so as to determine the words or phrases whose prediction probability exceeds a set threshold as target words or target phrases.
In the process of selecting words or phrases suitable for masking so as to generate the sample test questions, the probability that each word in the sample test questions should be masked can be predicted based on the contextual relationship between sentence pairs and on the detection feature values indicating whether each common word unit exists in the survey dictionary; the words or phrases whose prediction probability exceeds the set threshold are then determined as target words or target phrases.
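The threshold-based selection described above can be sketched as follows; the probabilities stand in for hypothetical model outputs and the threshold value is an assumption:

```python
def select_targets(tokens, probs, threshold=0.5):
    """Return (index, token) pairs whose predicted mask-suitability
    strictly exceeds the threshold."""
    return [(i, t) for i, (t, p) in enumerate(zip(tokens, probs)) if p > threshold]

tokens = ["he", "likes", "playing", "football"]
probs = [0.05, 0.20, 0.85, 0.60]            # hypothetical model outputs
print(select_targets(tokens, probs))        # [(2, 'playing'), (3, 'football')]
```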
In some specific embodiments, the test question prediction network model can be further fine-tuned based on the Transformer. Before the Transformer, seq2seq (general encoder-decoder) tasks were mainly handled by an encoder + decoder framework built from recurrent networks such as the RNN (Recurrent Neural Network)/LSTM (Long Short-Term Memory network) or from CNNs (convolutional neural networks), with an attention mechanism applied on top; such models represented the best neural-feature performance in the field of NLP (natural language processing). The Transformer-based BERT model not only performs excellently on neural word-vector features, but also structurally includes a sentence-level NSP task, which matches the prediction of whether the current sentence is one to be blanked.
After BERT is pre-trained on a huge corpus, it can be applied to each NLP task. A Next Sentence Prediction (NSP) pre-training objective is introduced, which is good at sentence- or paragraph-matching tasks: its task is to determine whether sentence B is the continuation of sentence A. In this embodiment, the prediction label (tag) indicates whether the sentence is one to be blanked. For the NSP task, the conditional probability is expressed as p = softmax(C·Wᵀ), where C is the vector at the [CLS] symbol in the BERT output and W is a learnable weight matrix. For other tasks, corresponding predictions can also be made based on BERT's output information.
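The NSP probability p = softmax(C·Wᵀ) can be illustrated in plain Python on toy values; the [CLS] vector and weight matrix below are illustrative numbers, not trained parameters:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nsp_probabilities(cls_vector, weight_rows):
    """p = softmax(C . W^T): one logit per NSP class (is-next / not-next)."""
    logits = [sum(w * c for w, c in zip(row, cls_vector)) for row in weight_rows]
    return softmax(logits)

cls_vector = [0.3, -0.1, 0.8]            # toy [CLS] output C
weight_rows = [[1.0, 0.0, 0.5],          # toy 2 x hidden weight matrix W
               [-0.5, 0.2, 0.1]]
print(nsp_probabilities(cls_vector, weight_rows))
```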
Referring to fig. 3, fig. 3 is a flowchart illustrating a training method of a test question prediction network model according to a second embodiment of the present application. The training method of the test question prediction network model of the present embodiment is a flowchart of a refinement embodiment of the training method of the test question prediction network model in fig. 1, and includes the following steps:
S31: and acquiring question bank data.
S32: correct answers are extracted to build a survey dictionary base.
S33: the method comprises the steps of inputting question bank data into a first preset network model, and dividing each word in a sample test question of the question bank data into a group of limited public word units through the first preset network model.
The S31, S32 and S33 are the same as S11, S12 and S13 in fig. 1, and specific reference is made to S11, S12 and S13 and the related text descriptions thereof, which are not repeated herein.
S34: each common word unit is converted into a word feature vector and its position information is encoded into a position feature vector.
Specifically, each common word unit is converted into a word feature vector, i.e. each common word unit is itself represented as a feature vector of a specific dimension. It should be noted that the deletion, modification and addition operations in error-correction questions also need to be represented as part of the word vector.
Further, the location information of each common word unit is encoded as a location feature vector.
S35: and obtaining the context relation between every two sentences in the sample test question according to the word feature vector and the position feature vector.
Still further, the context relationship between every two sentences in the corresponding sample test question is obtained from the word feature vectors and position feature vectors, so as to distinguish the two sentences, for example, whether B is the continuation of A. For sentences that occur in pairs, the feature value of the first sentence may be set to 0 and the feature value of the second sentence to 1.
S36: and detecting whether each public word unit exists in the examination dictionary library or not so as to give a corresponding detection characteristic value based on the current detection result.
S37: and training based on the context relation and the detection characteristic value between every two sentences in the sample test questions to obtain the test question prediction network model.
S36 and S37 are the same as S14 and S15 in fig. 1; refer to S14 and S15 and their related text descriptions, which are not repeated here.
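The detection feature value assignment of S36 can be sketched in a few lines; the function and dictionary names below are illustrative, not the patent's API:

```python
def detection_features(word_units, answer_dictionary):
    """Assign 1 to each common word unit found in the examination dictionary
    library, and 0 otherwise (a minimal sketch of the S36 detection step)."""
    return [1 if unit in answer_dictionary else 0 for unit in word_units]

# toy dictionary built from extracted correct answers
answer_dictionary = {"like", "play", "cute"}
features = detection_features(["my", "dog", "is", "cute"], answer_dictionary)
```

The resulting 0/1 vector is what is later combined with the sentence-pair context relationship during training.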
In a specific embodiment, the test question prediction network model is trained using the conventional BERT (Bidirectional Encoder Representations from Transformers) natural language processing model. For convenience of explanation, as shown in fig. 4, fig. 4 is a schematic structural diagram of the test question prediction network model in a specific application scenario of the training method of fig. 3. The description is as follows:
Input: [CLS] my dog is cute [SEP] he likes play ##ing [SEP];
WordPiece: wordPiece refers to dividing each complete word into a finite set of common subword units to enable a trade-off between word validity and character flexibility. For example, "play" in the example is split into "play" and "ing";
Word embedding (Token Embedding): word embedding refers to representing a word itself as a feature vector of a specific dimension. It should be noted that the deletion, modification, and addition marks in short-text error-correction questions must also be included as part of the word vector;
Position embedding (Position Embedding): position embedding refers to encoding the position information of words into feature vectors, and is a vital link for introducing the positional relationships of words into the network model;
Segmentation embedding (Segment Embedding): used for distinguishing two sentences, for example whether B is the continuation of A (dialogue scene, question-answer scene, etc.). For sentences that occur in pairs, the feature value of the first sentence is 0 and the feature value of the second sentence is 1.
Answer embedding (Answer Embedding): used for judging whether a word exists in the established dictionary library of test question answer words and phrases; if it exists, the feature value is 1, otherwise the feature value is 0.
[CLS] indicates that the feature is used for the classification model, that is, for a sentence with [CLS] marked at its front end, the model judges whether the sentence is suitable as a hollowed-out sentence. [SEP] is a separator symbol used to separate two sentences in the input corpus.
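Taken together, the model input for each token is the element-wise sum of the token, position, segment, and answer embeddings described above. A NumPy sketch with illustrative dimensions:

```python
import numpy as np

def input_representation(token_emb, position_emb, segment_emb, answer_emb):
    """Element-wise sum of the four embeddings; all arrays share the shape
    (n_tokens, hidden)."""
    return token_emb + position_emb + segment_emb + answer_emb

n_tokens, hidden = 3, 8
rng = np.random.default_rng(0)
tok = rng.normal(size=(n_tokens, hidden))   # Token Embedding
pos = rng.normal(size=(n_tokens, hidden))   # Position Embedding
seg = np.zeros((n_tokens, hidden))          # Segment Embedding: all sentence A -> 0
ans = np.zeros((n_tokens, hidden))          # Answer Embedding
ans[1] = 1.0                                # token 1 found in the answer dictionary
x = input_representation(tok, pos, seg, ans)
```

This is the same additive scheme BERT uses for its three standard embeddings, extended with the answer embedding introduced in this embodiment.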
During training, the conventional BERT is improved by adding the answer feature library, so that more appropriate words or phrases in the sample test questions are selected and hidden, making the correspondingly generated English test questions more appropriate and effective.
Further, the MLM (masked language model) of BERT randomly removes some words from all the single-choice, grammar gap-filling, cloze, and short-text error-correction questions in the input corpus question bank, and then predicts the removed words from context; in the BERT experiments, 15% of the WordPiece tokens (common subword unit marks) are removed by a random mask. When training the model, a sentence is input into the model multiple times for parameter learning; after the words to be masked are determined, 80% of them are directly replaced by [MASK], 10% are replaced by random other words, and 10% retain the original token. The above is the conventional MLM training method of BERT. After introducing the answer feature library improvement, for the training set, once the words to be masked are determined, 50% are replaced by [MASK], 10% are replaced by random other words, 20% are replaced by wrong candidate answers from the test questions (since English test questions are mainly multiple-choice, each question typically has no fewer than 4 wrong candidates), and the remaining 20% retain the original token.
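The modified masking scheme (50% [MASK], 10% random word, 20% wrong candidate answer, 20% unchanged) can be sketched as follows; the random-word vocabulary and candidate lists are illustrative:

```python
import random

def corrupt_token(token, wrong_candidates, rng):
    """Apply the improved masking scheme: 50% [MASK], 10% a random other word,
    20% a wrong candidate answer from the question, 20% keep the original token."""
    r = rng.random()
    if r < 0.5:
        return "[MASK]"
    if r < 0.6:
        return rng.choice(["happy", "small", "runs"])   # random other word
    if r < 0.8:
        return rng.choice(wrong_candidates)             # wrong option of the question
    return token                                        # keep original token

rng = random.Random(42)
outputs = [corrupt_token("like", ["likes", "liked", "liking"], rng)
           for _ in range(1000)]
mask_share = outputs.count("[MASK]") / len(outputs)
```

Over many draws, roughly half of the outputs are [MASK], with the remainder split among random words, wrong candidates, and the untouched original.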
An example is as follows: in "my dog is cute, I like playing.", it is assumed that "like" is the answer option of a sample test question, and "likes", "liking", etc. are wrong answers. The corresponding processing results are as follows:
50%:my dog is cute,I like playing.->my dog is cute,I[Mask]playing.
10%:my dog is cute,I like playing.->my dog is cute,I happy playing.
20%:my dog is cute,I like playing.->my dog is cute,I liked playing.
20%:my dog is cute,I like playing.->my dog is cute,I like playing.
Therefore, by taking the potentially confusable answers after hollowing out the sample test questions as important factors in training, among the TopN probability words predicted by the model, Top1 can be set as the correct answer and Top2-5 as the wrong candidates. The probability values corresponding to Top1-5 decrease in order; that is, the word or phrase with the largest predicted probability is set as the correct answer, and the other alternatives are set as wrong candidates.
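The Top1/Top2-5 selection described above can be sketched directly; the prediction probabilities below are illustrative:

```python
def build_options(predictions):
    """Sort predicted (word, probability) pairs in descending order; Top1
    becomes the correct answer, Top2-5 become the wrong candidates."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    correct = ranked[0][0]
    distractors = [word for word, _ in ranked[1:5]]
    return correct, distractors

preds = [("likes", 0.08), ("like", 0.62), ("liked", 0.15),
         ("liking", 0.07), ("loves", 0.05), ("hate", 0.03)]
answer, wrong = build_options(preds)
```

With these toy probabilities, "like" is selected as the correct answer and the next four words become the distractors.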
In some specific embodiments, in order to ensure that the English test questions generated by the test question prediction network model are of higher quality, real test questions may be used as training samples; for example, a certain number of high-quality questions, such as high-school entrance examination and college entrance examination questions, may be selected as training samples and added to the training. It can be understood that a library of real high-quality test questions has characteristic patterns in difficulty, number of blanks, passage length, and genre suitability. By selecting such high-quality questions as training samples and extracting template features through BERT preprocessing, the test question prediction network model can effectively absorb the hidden information of high-quality questions, and when this information is fed into training together, the hollowing-out prediction results become more reasonable: machine predictions often produce more blanks than real questions do, and adding the test question templates effectively optimizes the posterior probability of the hollowed-out blanks, further making the prediction results more reasonable.
In some specific embodiments, the training process further includes selecting high-quality questions, such as high-school entrance examination and college entrance examination questions, according to passage readability, genre, and difficulty coefficient, and adding them to the question bank data as training samples, so that the corresponding hollowing-out difficulty, readability, and so on are incorporated into the test question prediction network model, making the hollowing-out prediction results more reasonable.
Fig. 5 is a schematic flowchart of sentence prediction in a specific application scenario of the training method of the test question prediction network model in fig. 3; fig. 5 shows that only one output layer needs to be added on top of BERT to complete fine-tuning for the specific task. In fig. 5, Tok represents a token, E represents an embedding vector, Tok_i (i a positive integer) denotes the i-th token whose feature vector is obtained after BERT preprocessing, and M_i represents the feature vector obtained after BERT processing of the i-th token of the corresponding predicted question-type template.
The task of Choice Sentence Prediction is to determine whether sentence A is a sentence containing a hollowed-out answer. If a sentence in the training data is a sentence with a hollowed-out answer, its tag is 1, otherwise 0, and this relationship is stored in the [CLS] symbol of fig. 3. In the supervised training of the BERT model, this tag is used directly as the label to predict whether a test sentence can serve as a hollowed-out sentence.
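A minimal sketch of the 0/1 label assignment for Choice Sentence Prediction (the set of hollowed sentences and all names are illustrative):

```python
def csp_label(sentence, hollowed_sentences):
    """Supervision label for Choice Sentence Prediction: 1 if the sentence has
    a hollowed-out answer in the training data, else 0."""
    return 1 if sentence in hollowed_sentences else 0

hollowed = {"I like playing."}
labels = [csp_label(s, hollowed)
          for s in ["My dog is cute.", "I like playing."]]
```

These labels are what the [CLS]-based classifier is trained against.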
Compared with the prior art, the training method of the test question prediction network model obtains question bank data, where the question bank data includes sample test questions and corresponding option answers, and the option answers include correct answers and wrong answers; extracts the correct answers to build an examination dictionary library; inputs the question bank data into a first preset network model to divide each word in the sample test questions into a group of limited common word units through the first preset network model; converts each common word unit into a word feature vector and encodes its position information into a position feature vector; obtains the context relationship between every two sentences in the sample test questions according to the word feature vectors and position feature vectors; detects whether each common word unit exists in the examination dictionary library, so as to give a corresponding detection feature value based on the current detection result; and trains the test question prediction network model based on the context relationship between every two sentences in the sample test questions and the detection feature values. In this way, manual processing of English test question resources can be avoided, the difficulty and quality of the correspondingly generated test questions are well controlled, and the processing efficiency is higher, so that the implementation cost is reduced to a greater extent.
Referring to fig. 6, fig. 6 is a flow chart of a first embodiment of the method for generating english questions according to the application. The embodiment comprises the following steps:
S61: and obtaining the target English text.
Specifically, an English text suitable for generating test questions, for example one of appropriate length, suitable genre, and moderate difficulty, is selected from the question bank data stored locally on the intelligent terminal or from question bank data already organized on the Internet, so as to serve as the target English text for generating English test questions.
S62: and carrying out format analysis on the target English text through a preset test question template to determine the matching question type of the target English text.
Further, format analysis is performed on the target English text through the preset test question templates; for example, the target English text is input into a preset test question template to detect whether it matches the test question format corresponding to that template, thereby determining the matching question type of the target English text.
The matching question types include single-choice questions, cloze (complete gap-filling), grammar gap-filling, and short-text error correction, and the preset test question templates can be understood as the format rules corresponding to each question type. When the target English text is determined to meet the format requirements of one or more question types, those question types are determined to be the matching question types of the target English text. For example, when the obtained target English text meets the format requirements of cloze questions and grammar gap-filling questions, the matching question types of the target English text are cloze and grammar gap-filling.
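The format matching against question-type templates can be sketched as follows; the word-count rules are hypothetical stand-ins for the patent's preset test question templates:

```python
def match_question_types(text, templates):
    """Return every question type whose format rule the text satisfies.
    Each rule here is a hypothetical (min_words, max_words) range."""
    n_words = len(text.split())
    return [name for name, (lo, hi) in templates.items() if lo <= n_words <= hi]

templates = {
    "cloze":            (150, 400),
    "grammar_filling":  (100, 300),
    "single_choice":    (10, 2000),
    "error_correction": (80, 150),
}
text = " ".join(["word"] * 200)   # a 200-word stand-in passage
matched = match_question_types(text, templates)
```

A 200-word text matches the cloze, grammar gap-filling, and single-choice rules but not the shorter error-correction range, so one text can yield several question types.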
Therefore, the preset test question templates comprise a plurality of templates, such as single-choice generation templates, cloze generation templates, grammar gap-filling generation templates, and short-text error-correction generation templates, and the target English text may correspond to one or several of them, so as to generate one or more corresponding test questions.
S63: and carrying out test question prediction on the target English text after format analysis through a test question prediction network model so as to correspondingly generate the target English text into English test questions of corresponding matching questions.
Further, after the matching question type of the target English text is determined, the target English text after format analysis is subjected to question prediction through a question prediction network model, so that the target English text is correspondingly generated into English questions of corresponding question types which are correspondingly matched.
The test question prediction network model is obtained by the training method of the test question prediction network model according to any one of the above embodiments, which will not be described again here.
In a specific embodiment, as shown in fig. 7, fig. 7 is a flow chart of a specific application scenario of the method for generating english examination questions in fig. 6, which specifically includes the following steps:
Step 1: and acquiring reliable English chapter text from Internet reliable journals, articles, news and the like.
Step 2: and analyzing the content of the acquired English chapter text to select English text to be predicted, and analyzing the English text in a format by using each test question template aiming at the English text.
Step 3: and using a test question prediction engine aiming at the English text after format analysis, loading a test question template to predict the English test questions which are suitable for generation.
Step 4: corresponding English texts are correspondingly generated into English test questions of the matching question types, such as one or more of single choice questions, complete form filling, grammar filling and short text correction.
Referring to fig. 8, fig. 8 is a flow chart of a second embodiment of the method for generating english questions according to the application. The method for generating english examination questions of the present embodiment is a flowchart of a refinement of the method for generating english examination questions of fig. 6, and includes the following steps:
S81: English passage text is obtained from the Internet or a local database.
Specifically, in order to ensure the richness of resources for generating English test questions, various English passage texts, such as journals, news, and English articles published on the Internet by primary and secondary school teachers, may be continuously obtained from the Internet or a local database.
S82: And carrying out content analysis on the English passage text to screen out target English text suitable for generating English test questions.
Further, since the data on the Internet and in local storage are vast and not every English passage is suitable for processing into English test questions, after a large number of English passages are obtained, the passages that do not meet the specification are filtered out so as to select suitable ones as target English texts; that is, the English passages undergo content analysis to select appropriate texts of moderate difficulty and suitable genre, which are determined as the target English texts suitable for generating English test questions.
S83: and obtaining the target English text.
S84: and carrying out format analysis on the target English text through a preset test question template to determine the matching question type of the target English text.
S85: and carrying out test question prediction on the target English text after format analysis through a test question prediction network model so as to correspondingly generate the target English text into English test questions of corresponding matching questions.
S83, S84, and S85 are the same as S61, S62, and S63 in fig. 6, and refer to S61, S62, and S63 and their related text descriptions, which are not repeated here.
Referring to fig. 9, fig. 9 is a flow chart of a third embodiment of the method for generating english questions according to the application. The method for generating english examination questions of the present embodiment is a flowchart of a refinement of the method for generating english examination questions of fig. 6, and includes the following steps:
S91: and obtaining the target English text.
S92: and carrying out format analysis on the target English text through a preset test question template to determine the matching question type of the target English text.
In the embodiment, S91 and S92 are the same as S61 and S62 in fig. 6, and specific reference is made to S61 and S62 and the related text descriptions thereof, which are not repeated here.
S93: and obtaining target sentences suitable for generating English test questions in the target English text after format analysis through the test question prediction network model.
Specifically, inputting the target English text after format analysis into a trained test question prediction network model, and processing the target English text through the test question prediction network model to obtain a target sentence which is suitable for generating English test questions in the target English text.
It can be understood that the method of processing the target english text by the test question prediction network model to obtain the target sentence is the same as the training process corresponding to the test question prediction network model, and will not be described herein.
S94: and hiding the target word or the target phrase in the target sentence.
Further, the target word or the target phrase in the obtained target sentence is hidden, for example, the target word in the target sentence is replaced by a [ mask ] character.
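The hollowing-out step can be sketched as a simple token replacement (a minimal, index-based illustration):

```python
def hollow_out(sentence_tokens, target_index):
    """Hide the target word by replacing it with a [MASK] placeholder."""
    tokens = list(sentence_tokens)     # copy so the original stays intact
    tokens[target_index] = "[MASK]"
    return tokens

masked = hollow_out(["I", "like", "playing", "."], 1)
```

The masked position is what the model subsequently predicts replacements for when generating the answer options.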
S95: and predicting the replacement item according to the context of the target word or the target phrase to generate test question answer options, so that the target English text is correspondingly generated into the English test questions of the corresponding matching questions.
Still further, the test question prediction network model predicts replacement items according to the context of the target word or target phrase; that is, it searches for all words or phrases that are associated with the context of the target word or target phrase and are suitable for filling its position, so as to generate the corresponding test question answer options, thereby generating from the target English text the English test questions of the corresponding matched question type.
Referring to fig. 10, fig. 10 is a flowchart illustrating a fourth embodiment of the method for generating english questions according to the application. The method for generating english examination questions of the present embodiment is a flowchart of a refinement of the method for generating english examination questions of fig. 9, and includes the following steps:
S101: And obtaining the target English text.
S102: and carrying out format analysis on the target English text through a preset test question template to determine the matching question type of the target English text.
In the embodiment, S101 and S102 are the same as S91 and S92 in fig. 9, and specific reference is made to S91 and S92 and the related text descriptions thereof, which are not repeated here.
S103: and acquiring a first probability suitable for generating English test questions in each sentence in the target English text after format analysis through the test question prediction network model, and determining the sentence with the first probability exceeding a first threshold value as a target sentence.
Specifically, inputting the target English text after format analysis into a trained test question prediction network model, and predicting a first probability suitable for generating English test questions in each sentence in the target English text through the test question prediction network model, so as to determine sentences in which the first probability exceeds a first threshold value as target sentences.
It can be understood that not every sentence in the target English text is suitable as a test question stem, so the sentences in the target English text need to be screened to obtain target sentences meeting the requirements. The first probability is a quantized value of how suitable each sentence in the target English text is as a target sentence; thus, when the first probability corresponding to a sentence exceeds the first threshold, the sentence is determined to be suitable as a target sentence, and sentences suitable for hollowing out can be selected from the target English text.
The first threshold is specifically set by a test question prediction network model according to requirements.
S104: and acquiring a second probability that each word or phrase in the target sentence is suitable for being hidden, determining the word or phrase with the second probability exceeding a second threshold value as the target word or target phrase, and hiding.
Further, the second probability that each word or phrase in the target sentence is suitable for being hidden is predicted through the test question prediction network model, so that the word or phrase in which the second probability exceeds the second threshold value is further determined to be the target word or the target phrase, and the obtained target word or the target phrase is hidden, for example, the obtained target word or the target phrase is replaced by a [ mask ] character.
It can likewise be understood that not every word or phrase in the target sentence is suitable as a test question option, so the words or phrases in the target sentence need to be screened to obtain target words or target phrases meeting the requirements. The second probability is a quantized value of how suitable each word or phrase in the target sentence is as the target word or target phrase; thus, when the second probability corresponding to a word or phrase exceeds the second threshold, that word or phrase is determined to be suitable as the target word or target phrase, and words or phrases suitable for hollowing out can be selected from the target sentence.
The second threshold is specifically set by the test question prediction network model according to the requirement.
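The two-stage, threshold-based selection of S103 and S104 can be sketched together; the thresholds, probabilities, and all names below are illustrative:

```python
def select_targets(sentence_probs, word_probs, t1=0.7, t2=0.6):
    """Two-stage selection: keep sentences whose first probability exceeds the
    first threshold t1, then within each kept sentence keep the words whose
    second probability exceeds the second threshold t2."""
    targets = {}
    for sent, p1 in sentence_probs.items():
        if p1 > t1:
            targets[sent] = [w for w, p2 in word_probs.get(sent, []) if p2 > t2]
    return targets

sentence_probs = {"s1": 0.85, "s2": 0.40}
word_probs = {"s1": [("like", 0.9), ("dog", 0.3)],
              "s2": [("runs", 0.8)]}
picked = select_targets(sentence_probs, word_probs)
```

Here "s2" fails the sentence-level threshold, so its words are never considered, while only "like" survives the word-level threshold within "s1".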
S105: and predicting the replacement item according to the context of the target word or the target phrase to generate test question answer options, so that the target English text is correspondingly generated into the English test questions of the corresponding matching questions.
In the embodiment, S105 is the same as S95 in fig. 9, please refer to S95 and the related text descriptions thereof, and the detailed description is omitted herein.
Specifically, in an embodiment, before S105, the method further includes: obtaining a real-question answer word stock. Accordingly, S105 further includes: predicting the replacement item according to the context of the target word or target phrase and the real-question answer word stock to generate the test question answer options, so that the target English text is correspondingly generated into English test questions of the corresponding matched question type.
It can be understood that, in order to ensure the quality of the finally obtained English test questions, introducing the real-question answer word stock makes the replacement of the obtained target word or target phrase more reasonable, which effectively improves the quality of the finally produced English test questions.
Based on the general inventive concept, the present application also provides an intelligent terminal; referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of the intelligent terminal of the present application. The intelligent terminal 111 includes a memory 1111 and a processor 1112 coupled to each other, where the memory 1111 stores program data, and the processor 1112 is configured to execute the program data to implement the training method of the test question prediction network model as described in any one of the above, or the generating method of the English test questions as described in any one of the above.
Based on the general inventive concept, the present application also provides a computer readable storage medium, referring to fig. 12, fig. 12 is a schematic structural diagram of an embodiment of the computer readable storage medium of the present application. Wherein the computer-readable storage medium 121 stores program data 1211, the program data 1211 being executable to implement the training method of the test question prediction network model as described in any one of the above, or the generating method of the english test questions as described in any one of the above.
In one embodiment, the computer readable storage medium 121 may be a memory chip, a hard disk or a removable hard disk in the terminal, or other readable and writable storage means such as a flash disk, an optical disk, etc., and may also be a server, etc.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of processors or memories is merely a division of logical functions, and there may be other divisions in actual implementation; for example, the functions implemented by multiple processors and memories may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling or direct coupling or connection shown or discussed may be indirect coupling or connection through some interfaces, devices, or elements, and may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the object of the present embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the present application.

Claims (10)

1. The training method of the test question prediction network model is characterized by comprising the following steps of:
Acquiring question bank data; the question library data comprises sample questions and corresponding option answers, wherein the option answers comprise correct answers and wrong answers;
Extracting the correct answers to establish an examination dictionary library;
Inputting the question bank data into a first preset network model to divide each word in the sample test questions of the question bank data into a group of limited public word units through the first preset network model;
Detecting whether each common word unit exists in the examination dictionary library, so as to give a corresponding detection characteristic value based on a current detection result; wherein the detected feature value of a common word unit existing in the examination dictionary library is 1, and the detected feature value of a common word unit not existing in the examination dictionary library is 0;
And training to obtain the test question prediction network model based on the context relation between every two sentences in the sample test questions and the detection characteristic value.
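The detection feature of claim 1 reduces to a dictionary-membership indicator. The sketch below shows that rule in isolation; the dictionary contents are illustrative stand-ins for the correct answers extracted from the question bank.

```python
# Hypothetical sketch of claim 1's detection feature: a common word
# unit found in the examination dictionary database gets value 1,
# otherwise 0. The dictionary entries here are made up.
exam_dictionary = {"ambiguous", "candidate", "obtain"}

def detection_features(word_units, dictionary):
    """Return the 0/1 detection feature value for each word unit."""
    return [1 if u in dictionary else 0 for u in word_units]

units = ["the", "candidate", "was", "ambiguous"]
print(detection_features(units, exam_dictionary))  # [0, 1, 0, 1]
```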
2. The method according to claim 1, wherein, after the step of inputting the question bank data into the first preset network model to divide each word in the sample test questions into a limited set of common word units, and before the step of detecting whether each common word unit exists in the examination dictionary database to assign a corresponding detection feature value, the method further comprises:
converting each common word unit into a word feature vector, and encoding the position information of the word feature vector into a position feature vector;
and acquiring the context relation between every two sentences in the sample test questions according to the word feature vectors and the position feature vectors.
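Claim 2's pairing of word feature vectors with position feature vectors can be sketched as follows. The sinusoidal positional encoding is an assumption (Transformer-style), since the claim only states that position information is encoded; the word vectors are toy deterministic values standing in for learned embeddings.

```python
import math

def positional_encoding(pos, dim):
    # Sinusoidal position vector (an assumed encoding; the claim does
    # not fix a particular scheme).
    vec = []
    for i in range(dim):
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec

def embed(units, dim=8):
    # Toy deterministic word feature vectors from character codes;
    # a real model would use learned embeddings.
    word_vecs = [[(ord(u[k % len(u)]) % 7) / 7.0 for k in range(dim)]
                 for u in units]
    # Add the position feature vector to each word feature vector.
    return [[w + p for w, p in zip(wv, positional_encoding(i, dim))]
            for i, wv in enumerate(word_vecs)]

out = embed(["he", "plays", "chess"])
print(len(out), len(out[0]))  # 3 8
```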
3. The training method of the test question prediction network model according to claim 1, wherein the step of training the test question prediction network model based on the context relation between every two sentences in the sample test questions and the detection feature values comprises:
acquiring a target word or a target phrase in the sample test questions based on the context relation between every two sentences in the sample test questions and the detection feature values;
hiding the target word or the target phrase;
and predicting replacement items for the target word or the target phrase according to its context so as to generate test question answer options for the sample test questions, whereby the test question prediction network model is obtained through training.
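The hide-and-predict step of claim 3 amounts to cloze construction: blank the target word, then assemble answer options from the correct word plus predicted replacements. The sketch below substitutes a fixed candidate pool for the model's context-based replacement prediction; the sentence and pool are illustrative.

```python
import random

def make_cloze(sentence_words, target_idx, candidate_pool,
               n_options=4, seed=0):
    """Hide the target word and build answer options: the correct word
    plus distractors drawn from a pool (a stand-in for the model's
    context-based replacement prediction)."""
    answer = sentence_words[target_idx]
    stem = list(sentence_words)
    stem[target_idx] = "____"
    rng = random.Random(seed)
    distractors = rng.sample(
        [w for w in candidate_pool if w != answer], n_options - 1)
    options = distractors + [answer]
    rng.shuffle(options)
    return " ".join(stem), options, answer

stem, options, answer = make_cloze(
    ["she", "quickly", "finished", "her", "homework"], 1,
    ["slowly", "rarely", "badly", "quickly", "never"])
print(stem)               # she ____ finished her homework
print(answer in options)  # True
```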
4. The training method of the test question prediction network model according to claim 3, wherein the step of acquiring the target word or the target phrase in the sample test questions based on the context relation between every two sentences in the sample test questions and the detection feature values comprises:
acquiring, based on the context relation between every two sentences in the sample test questions and the detection feature values, the prediction probability that each word or phrase in the sample test questions is suitable to be hidden, and determining a word or phrase whose prediction probability exceeds a set threshold as the target word or the target phrase.
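Claim 4's selection rule is a simple threshold over per-word probabilities. The sketch below shows it with made-up scores standing in for the model's output.

```python
def select_targets(scores, threshold=0.6):
    """Keep words whose predicted "suitable to hide" probability exceeds
    the set threshold; the scores passed in are illustrative, not model
    output."""
    return [w for w, p in scores.items() if p > threshold]

scores = {"the": 0.05, "elaborate": 0.82, "plan": 0.4, "collapsed": 0.71}
print(select_targets(scores))  # ['elaborate', 'collapsed']
```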
5. A method for generating English test questions, characterized by comprising the following steps:
acquiring a target English text;
performing format analysis on the target English text through a preset test question template to determine a matching question type of the target English text;
and performing test question prediction on the format-analyzed target English text through a test question prediction network model so as to generate from the target English text an English test question of the corresponding matching question type, wherein the test question prediction network model is obtained by the training method of the test question prediction network model according to any one of claims 1-4.
6. The method for generating English test questions according to claim 5, wherein the step of performing test question prediction on the format-analyzed target English text through the test question prediction network model so as to generate from the target English text an English test question of the corresponding matching question type comprises:
acquiring, through the test question prediction network model, a target sentence in the format-analyzed target English text that is suitable for generating the English test question;
hiding a target word or a target phrase in the target sentence;
and predicting replacement items for the target word or the target phrase according to its context so as to generate test question answer options, thereby generating from the target English text an English test question of the corresponding matching question type.
7. The method for generating English test questions according to claim 6, wherein the step of acquiring, through the test question prediction network model, the target sentence in the format-analyzed target English text that is suitable for generating the English test question comprises:
acquiring, through the test question prediction network model, a first probability that each sentence in the format-analyzed target English text is suitable for generating the English test question, and determining a sentence whose first probability exceeds a first threshold as the target sentence;
and the step of hiding the target word or the target phrase in the target sentence comprises:
acquiring a second probability that each word or phrase in the target sentence is suitable to be hidden, determining a word or phrase whose second probability exceeds a second threshold as the target word or the target phrase, and hiding it.
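The two-stage selection in claim 7 can be sketched as a sentence-level filter followed by a word-level filter. All probabilities below are illustrative stand-ins for model output, and the thresholds are arbitrary example values.

```python
def pick_sentence_and_word(sentences, first_threshold=0.5,
                           second_threshold=0.6):
    """Stage 1: keep sentences whose first probability exceeds the first
    threshold. Stage 2: within each kept sentence, keep words whose
    second probability exceeds the second threshold."""
    picked = []
    for sent_prob, words in sentences:
        if sent_prob <= first_threshold:
            continue                       # sentence not suitable
        targets = [w for w, p in words if p > second_threshold]
        picked.append(targets)
    return picked

data = [
    (0.8, [("although", 0.7), ("it", 0.1), ("rained", 0.65)]),
    (0.3, [("hello", 0.9)]),               # filtered out at stage 1
]
print(pick_sentence_and_word(data))  # [['although', 'rained']]
```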
8. The method for generating English test questions according to claim 6, wherein, before the step of predicting replacement items according to the context of the target word or the target phrase so as to generate test question answer options, thereby generating from the target English text an English test question of the corresponding matching question type, the method further comprises:
obtaining a word bank of answers to real test questions;
and the step of predicting replacement items according to the context of the target word or the target phrase so as to generate test question answer options, thereby generating from the target English text an English test question of the corresponding matching question type, comprises:
predicting replacement items for the target word or the target phrase according to its context and the word bank of answers to real test questions so as to generate test question answer options, thereby generating from the target English text an English test question of the corresponding matching question type.
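Claim 8 constrains the distractors to a bank of answers drawn from real test questions. The sketch below draws distractors from such a bank, preferring words of similar length to the correct answer; the bank contents and the length-similarity rule are assumptions for the example.

```python
import random

def distractors_from_bank(answer, real_answer_bank, k=3, seed=1):
    """Draw k distractors from a bank of real-question answers,
    excluding the correct answer and keeping words of similar length
    (an illustrative similarity rule, not specified by the patent)."""
    similar = [w for w in real_answer_bank
               if w != answer and abs(len(w) - len(answer)) <= 2]
    return random.Random(seed).sample(similar, k)

bank = ["apply", "imply", "reply", "supply", "comply", "occupy"]
ds = distractors_from_bank("apply", bank)
print(len(ds), "apply" not in ds)  # 3 True
```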
9. An intelligent terminal, characterized in that the intelligent terminal comprises a memory and a processor coupled to each other;
the memory stores program data;
and the processor is configured to execute the program data to implement the training method of the test question prediction network model according to any one of claims 1-4, or the method for generating English test questions according to any one of claims 5-8.
10. A computer-readable storage medium, characterized in that the storage medium stores program data, and the program data can be executed to implement the training method of the test question prediction network model according to any one of claims 1-4 or the method for generating English test questions according to any one of claims 5-8.
CN202011627643.0A 2020-12-31 2020-12-31 Method and device for correlation of test question prediction network model Active CN112801829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011627643.0A CN112801829B (en) 2020-12-31 2020-12-31 Method and device for correlation of test question prediction network model


Publications (2)

Publication Number Publication Date
CN112801829A CN112801829A (en) 2021-05-14
CN112801829B true CN112801829B (en) 2024-04-30

Family

ID=75807917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011627643.0A Active CN112801829B (en) 2020-12-31 2020-12-31 Method and device for correlation of test question prediction network model

Country Status (1)

Country Link
CN (1) CN112801829B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536663A (en) * 2018-04-18 2018-09-14 深圳市鹰硕技术有限公司 Automatically generate the method and device of cloze test examination question
CN109086273A (en) * 2018-08-14 2018-12-25 北京粉笔未来科技有限公司 Method, apparatus and terminal device based on neural network answer grammer gap-filling questions
CN109977428A (en) * 2019-03-29 2019-07-05 北京金山数字娱乐科技有限公司 A kind of method and device that answer obtains
CN110633730A (en) * 2019-08-07 2019-12-31 中山大学 Deep learning machine reading understanding training method based on course learning
CN110717324A (en) * 2019-09-06 2020-01-21 暨南大学 Judgment document answer information extraction method, device, extractor, medium and equipment
CN110750959A (en) * 2019-10-28 2020-02-04 腾讯科技(深圳)有限公司 Text information processing method, model training method and related device
CN110781672A (en) * 2019-10-30 2020-02-11 北京爱学习博乐教育科技有限公司 Question bank production method and system based on machine intelligence
CN110852087A (en) * 2019-09-23 2020-02-28 腾讯科技(深圳)有限公司 Chinese error correction method and device, storage medium and electronic device
CN111680515A (en) * 2020-05-21 2020-09-18 平安国际智慧城市科技股份有限公司 Answer determination method and device based on AI (Artificial Intelligence) recognition, electronic equipment and medium
CN112069295A (en) * 2020-09-18 2020-12-11 科大讯飞股份有限公司 Similar question recommendation method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175148A1 (en) * 2018-12-04 2020-06-04 George Mason University Collaborative context-aware visual authentication question protocol


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Supporting Context-Aware Mobile Service Adaptation with Scalable Context Discovery Platform;Daqing ZHANG 等;《2005 IEEE 61st Vehicular Technology Conference》;20051231;1-5 *
SVM-based automatic classification of geography test questions; Zhu Liuying et al.; Application Research of Computers; September 2018; Vol. 35, No. 9; 2707-2710 *


Similar Documents

Publication Publication Date Title
CN111177326B (en) Key information extraction method and device based on fine labeling text and storage medium
CN107230174B (en) Online interactive learning system and method based on network
CN110442718B (en) Statement processing method and device, server and storage medium
Azpiazu et al. Multiattentive recurrent neural network architecture for multilingual readability assessment
CN112069295B (en) Similar question recommendation method and device, electronic equipment and storage medium
CN114218379B (en) Attribution method for question answering incapacity of intelligent question answering system
CN113312899B (en) Text classification method and device and electronic equipment
CN114595327A (en) Data enhancement method and device, electronic equipment and storage medium
CN111897954A (en) User comment aspect mining system, method and storage medium
CN114661872A (en) Beginner-oriented API self-adaptive recommendation method and system
CN110852071B (en) Knowledge point detection method, device, equipment and readable storage medium
CN110765241B (en) Super-outline detection method and device for recommendation questions, electronic equipment and storage medium
CN112364743A (en) Video classification method based on semi-supervised learning and bullet screen analysis
Nassiri et al. Arabic L2 readability assessment: Dimensionality reduction study
CN115080690A (en) NLP-based automatic correction method and system for test paper text
CN112667819A (en) Entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method and device
CN112801829B (en) Method and device for correlation of test question prediction network model
CN107274077B (en) Course first-order and last-order computing method and equipment
CN113392196B (en) Question retrieval method and system based on multi-mode cross comparison
CN113157932B (en) Metaphor calculation and device based on knowledge graph representation learning
CN114139535A (en) Keyword sentence making method and device, computer equipment and readable medium
CN114116967A (en) Data cleaning method and device, electronic equipment and computer readable medium
CN115617959A (en) Question answering method and device
CN113627155A (en) Data screening method, device, equipment and storage medium
Singh Twitter Sentiment Analysis Using Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant