CN113392187A - Automatic scoring and error correction recommendation method for subjective questions - Google Patents
- Publication number
- CN113392187A, CN202110672735.9A
- Authority
- CN
- China
- Prior art keywords
- question
- answer
- test paper
- algorithm
- title
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/319—Inverted lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention provides an automatic scoring and error correction recommendation method for subjective questions, comprising the following steps: step S1, establishing a question bank containing questions and their corresponding standard answers, knowledge point labels and question numbers; step S2, establishing a multi-bit inverted index table for the question numbers and establishing a multi-bit inverted index total library; step S3, receiving an answer sheet picture to be scored and dividing it into a question area and an answer area according to a target segmentation algorithm; step S4, recognizing the question text in the question area by OCR to obtain the test paper question, finding the matching question in the question bank according to a multi-bit search algorithm and a question matching algorithm, and extracting its standard answer and knowledge point label; step S5, recognizing the answer text in the answer area by OCR to obtain the test paper answer, calculating the similarity between the test paper answer and the standard answer according to an answer matching algorithm, and reporting where the test paper answer falls short of the standard answer; and step S6, recommending similar questions according to a recommendation strategy to consolidate the knowledge points.
Description
Technical Field
The invention belongs to the technical field of automatic scoring, error correction and recommendation, and particularly relates to an automatic scoring and error correction recommendation method for subjective questions.
Background
In recent years, students' workloads have grown and their learning tasks keep increasing. When doing homework, and especially when they meet a question they cannot solve, students at school can ask a teacher for help, but at home, where parents' ability to help may be limited, they are often at a loss.
With the development of science and technology, and supported by big data and artificial intelligence, researchers can use related technologies to correct homework automatically. The prior art with application number 202010603637.5 discloses a method, a device, an electronic device and a storage medium for automatically correcting homework, the method comprising: receiving a homework picture to be corrected sent by an intelligent terminal; inputting the homework picture into a pre-trained text detection model to generate the question information and answer information of a target question; performing OCR recognition on the question information and the answer information respectively to obtain a question text and an answer text; searching a resource library with the question text to obtain the answer analysis corresponding to the original question; comparing the answer analysis with the answer text to obtain their similarity; and, when the similarity is greater than or equal to a preset threshold, marking the answer to the target question as correct, and when the similarity is below the preset threshold, marking the answer as wrong and returning the answer analysis to the intelligent terminal. Although this technology can correct homework automatically, it gives no specific algorithms for steps such as searching and comparing; for objective questions it reports only whether the answer is right or wrong, without giving the reason, so students are still left confused; and its effectiveness on subjective questions is limited.
Disclosure of Invention
The present invention is made to solve the above problems, and an object of the present invention is to provide an automatic scoring and error correction recommendation method for subjective questions.
The invention provides an automatic scoring and error correction recommendation method for subjective questions, characterized by comprising the following steps:
step S1, establishing a question bank, wherein the question bank comprises questions and the standard answers, knowledge point labels and question numbers corresponding to the questions;
step S2, establishing a multi-bit inverted index table for the question numbers in the question bank, and establishing a multi-bit inverted index total library;
step S3, receiving an answer sheet picture to be scored, and dividing it into a question area and an answer area according to a target segmentation algorithm;
step S4, recognizing the question text in the question area by OCR to obtain the test paper question, finding the question that matches the test paper question in the question bank according to a multi-bit search algorithm and a question matching algorithm, and extracting the corresponding standard answer and knowledge point label;
step S5, recognizing the answer text in the answer area by OCR to obtain the test paper answer, calculating the similarity between the test paper answer and the standard answer according to an answer matching algorithm, and reporting where the test paper answer falls short of the standard answer;
step S6, searching the question bank, according to a recommendation strategy, for questions similar to the test paper question in order to consolidate the knowledge points,
wherein, in step S5, the answer matching algorithm comprises the following steps:
step S5-1, recognizing the answer text in the answer area by OCR to obtain the test paper answer, denoted Daan, and denoting the standard answer returned in step S4 as Biaozhun;
step S5-2, generating sentence vectors for the test paper answer Daan and the standard answer Biaozhun with a BERT or XLNet pre-trained model, and calculating the similarity $\mathrm{Sim}_1$ between the two vectors with the cosine similarity algorithm, where $0 \le \mathrm{Sim}_1 \le 1$;
step S5-3, extracting a keyword set $G_1$ from the test paper answer Daan and a keyword set $G_2$ from the standard answer Biaozhun with the TextRank algorithm, and calculating the similarity $\mathrm{Sim}_2$ of the two keyword sets, where $0 \le \mathrm{Sim}_2 \le 1$; $\mathrm{Sim}_2$ is calculated by formula (1):
[Formula (1) appears as an image in the source and is not reproduced here.]
step S5-4, fusing the similarity $\mathrm{Sim}_1$ and the similarity $\mathrm{Sim}_2$ to obtain the similarity Sim, calculated as follows:
$\mathrm{Sim} = \mathrm{Sim}_1 \times a + \mathrm{Sim}_2 \times b$ (2)
step S5-5, mapping the similarity Sim onto the score assigned to the test paper question and returning the score of the test paper answer; in addition, the elements that are in keyword set $G_2$ but not in keyword set $G_1$ are also returned, representing the points in which the test paper answer falls short of the standard answer,
where, in formula (2), a and b are the weights of $\mathrm{Sim}_1$ and $\mathrm{Sim}_2$ respectively, satisfying $a + b = 1$ and $a \ge b$.
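As a worked illustration of formula (2), the values below are chosen purely for the example: the weights a = 0.6 and b = 0.4, the two similarities and the 10-point full score are assumptions, and the linear mapping from Sim to a score is only one possible reading of step S5-5, which the text does not specify.

```latex
\begin{aligned}
\mathrm{Sim} &= \mathrm{Sim}_1 \times a + \mathrm{Sim}_2 \times b
              = 0.9 \times 0.6 + 0.5 \times 0.4 = 0.54 + 0.20 = 0.74, \\
\text{score} &= \mathrm{Sim} \times 10 = 7.4 .
\end{aligned}
```

The chosen weights satisfy both stated constraints, a + b = 1 and a ≥ b, so the sentence-vector similarity never counts for less than the keyword similarity.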
The automatic scoring and error correction recommendation method for subjective questions provided by the invention may also have the following feature: in step S1, the questions, standard answers and knowledge point labels are drawn from various books and Internet resources, and the question number is generated from the question by the following steps:
step S1-1, performing word segmentation on the text of the question;
step S1-2, using the MD5 algorithm as a pseudo-random number generator, and calculating the weight of each word obtained by segmentation with the TF-IDF algorithm or the BM25 algorithm;
and step S1-3, generating a hash value for the text of the question with a 128-bit simhash algorithm from the pseudo-random number generator and the word weights, and taking the hash value as the question number.
The automatic scoring and error correction recommendation method for subjective questions provided by the invention may also have the following feature: in step S2, the steps of establishing the multi-bit inverted index table and the multi-bit inverted index total library are as follows:
step S2-1, denoting the question number of a question in the question bank as hash, and dividing hash into M segments to obtain M sub-question numbers, as shown in formula (3),
$\mathrm{hash} = [\mathrm{hash}_1, \mathrm{hash}_2, \ldots, \mathrm{hash}_i, \ldots, \mathrm{hash}_M],\ i \in [1, M]$ (3)
step S2-2, taking combinations of $(M - \alpha)$ of the M sub-question numbers $\mathrm{hash}_i$, $i \in [1, M]$, of the question number hash to establish the multi-bit inverted index table, where $\alpha$ is the similarity threshold and $M > \alpha$; each question number hash then has $C_M^{M-\alpha}$ indexes pointing to it, labeled index 1, index 2, ..., index $C_M^{M-\alpha}$ from top to bottom. The multi-bit inverted index table for $\alpha = 3$ and $M = 4$ is shown in formula (4),
[Formula (4) appears as an image in the source and is not reproduced here.]
and the multi-bit inverted index table for $\alpha = 3$ and $M = 5$ is shown in formula (5),
[Formula (5) appears as an image in the source and is not reproduced here.]
step S2-3, aggregating the multi-bit inverted index tables of all question numbers in the question bank to construct the multi-bit inverted index total library,
wherein, in formula (3), $\mathrm{hash}_i$, $i \in [1, M]$, denotes a sub-question number of the question number hash.
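The number of indexes per question number can be checked by counting the ways of choosing M − α of the M segments, assuming that this is what the combination step in S2-2 means. The last line records why such an index cannot miss a near match: if two question numbers hash and hash' differ in at most α bits, those bits fall in at most α segments, so at least M − α segments agree exactly and the index built from those positions is shared.

```latex
\begin{aligned}
M = 4,\ \alpha = 3 &: \quad \binom{M}{M-\alpha} = \binom{4}{1} = 4, \\
M = 5,\ \alpha = 3 &: \quad \binom{M}{M-\alpha} = \binom{5}{2} = 10, \\
H(\mathrm{hash}, \mathrm{hash}') \le \alpha
  &\ \Longrightarrow\ \#\{\, i : \mathrm{hash}_i = \mathrm{hash}'_i \,\} \ge M - \alpha .
\end{aligned}
```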
The automatic scoring and error correction recommendation method for subjective questions provided by the invention may also have the following feature: in step S3, the target segmentation algorithm comprises the following steps:
step S3-1, collecting a number of answer sheet pictures and manually labeling the question areas and answer areas on them, taking the regions of printed text as question areas and the regions of handwritten text as answer areas;
step S3-2, using the collected answer sheet pictures and the manual labels as training data, training a binary classification model that distinguishes printed text from handwritten text, using a deep learning technique together with transfer learning;
step S3-3, inputting the answer sheet picture to be scored into the trained binary classification model, and dividing it according to printed and handwritten text to obtain the question area and the answer area,
wherein the deep learning technique is a convolutional neural network (CNN), a recurrent neural network (RNN), an LSTM, or a GRU.
The automatic scoring and error correction recommendation method for subjective questions provided by the invention may also have the following feature: in step S4, the multi-bit search algorithm comprises the following steps:
step S4-1-1, recognizing the question text in the question area by OCR to obtain the test paper question, and processing the test paper question by the same question number generation steps as in step S1 to obtain the test paper question number, denoted Thash;
step S4-1-2, following the steps for establishing the multi-bit inverted index table in step S2, likewise dividing the test paper question number Thash into M segments and establishing the multi-bit inverted index table of Thash;
step S4-1-3, using the multi-bit inverted index table of Thash, searching the multi-bit inverted index total library for question numbers that have the same index number and the same index value as Thash, to obtain a question number set,
and the question matching algorithm comprises the following steps:
step S4-2-1, calculating the Hamming distance H between the test paper question number Thash and each question number in the question number set, one by one;
step S4-2-2, if exactly one question number in the question number set satisfies H ≤ α, returning the standard answer and knowledge point label corresponding to that question number;
step S4-2-3, if several question numbers in the question number set satisfy H ≤ α, returning the standard answer and knowledge point label corresponding to the question number with the smallest Hamming distance H;
step S4-2-4, if several question numbers in the question number set satisfy H ≤ α and more than one of them has the smallest Hamming distance H, returning the standard answer and knowledge point label of any one of those question numbers;
and step S4-2-5, if no question number matching the test paper question number is found, outputting an exception message stating that no correct answer is currently available for the question, recording the test paper question, and adding it to the question bank once a question bank expert has provided the correct standard answer and knowledge point label.
The automatic scoring and error correction recommendation method for subjective questions provided by the invention may also have the following feature: in step S6, the recommendation strategy comprises the following steps:
step S6-1, denoting the knowledge point label returned in step S4 as Tags;
step S6-2, searching the question bank for questions whose knowledge point labels are similar to Tags, and returning one of them at random to consolidate the knowledge points.
Action and Effect of the invention
According to the automatic scoring and error correction recommendation method for subjective questions of the invention, for question matching, the simhash algorithm generates the question numbers so as to ensure retrieval accuracy, and retrieval through the multi-bit inverted index table effectively improves retrieval efficiency; for answer matching, similarity is calculated with a pre-trained model such as BERT or XLNet and the cosine similarity algorithm, and the TextRank keyword algorithm supplements the similarity measure so as to ensure the accuracy of the answer matching result, while the points in which the student's test paper answer falls short of the standard answer are reported, helping the student identify his or her own problems; for the recommendation strategy, similar questions are recommended according to the knowledge point labels, helping the student consolidate the knowledge points. The method therefore addresses the situation in which students working alone are at a loss when faced with difficult questions, addresses time-consuming question matching and unreasonable answer matching, and uses big data and artificial intelligence algorithms to help students improve their knowledge.
Drawings
Fig. 1 is a flowchart of an automatic scoring and error correction recommendation method for subjective questions according to an embodiment of the present invention;
Fig. 2 is a flowchart of step S1 in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the multi-bit inverted index table established when α = 3 and M = 4 in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the multi-bit inverted index table established when α = 3 and M = 5 in an embodiment of the present invention;
Fig. 5 is a flowchart of step S4 in an embodiment of the present invention.
Detailed Description
In order to make the technical means and functions of the present invention easy to understand, the present invention is specifically described below with reference to the embodiments and the accompanying drawings.
<Example>
Fig. 1 is a flowchart of an automatic scoring and error correction recommendation method for subjective questions according to an embodiment of the present invention.
As shown in Fig. 1, the automatic scoring and error correction recommendation method for subjective questions of this embodiment comprises the following steps:
Step S1, establishing a question bank, wherein the question bank comprises questions and the standard answers, knowledge point labels and question numbers corresponding to the questions.
Fig. 2 is a flowchart of step S1 in an embodiment of the present invention.
As shown in Fig. 2, in step S1, the questions, standard answers and knowledge point labels are drawn from various books and Internet resources, and the question number is generated from the question by the following steps:
step S1-1, performing word segmentation on the text of the question;
step S1-2, using the MD5 algorithm as a pseudo-random number generator, and calculating the weight of each word obtained by segmentation with the TF-IDF algorithm or the BM25 algorithm;
and step S1-3, generating a hash value for the text of the question with a 128-bit simhash algorithm from the pseudo-random number generator and the word weights, and taking the hash value as the question number.
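Steps S1-1 to S1-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: whitespace splitting stands in for a real word segmenter, the smoothed TF-IDF formula and the names `token_hash`, `tf_idf_weights` and `simhash128` are hypothetical, and BM25 could be substituted for the weighting.

```python
import hashlib
import math
from collections import Counter

HASH_BITS = 128  # an MD5 digest is 128 bits, the same width as the simhash


def token_hash(token: str) -> int:
    """Step S1-2: MD5 serves as the pseudo-random bit source for a word."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest(), "big")


def tf_idf_weights(tokens, doc_freq, n_docs):
    """Step S1-2: smoothed TF-IDF weight per word (the smoothing is an assumption).

    doc_freq maps a word to the number of bank questions that contain it.
    """
    counts = Counter(tokens)
    return {
        word: (count / len(tokens))
        * (math.log((1 + n_docs) / (1 + doc_freq.get(word, 0))) + 1.0)
        for word, count in counts.items()
    }


def simhash128(weights) -> int:
    """Step S1-3: every word adds +weight or -weight to each bit position."""
    votes = [0.0] * HASH_BITS
    for word, weight in weights.items():
        h = token_hash(word)
        for bit in range(HASH_BITS):
            votes[bit] += weight if (h >> bit) & 1 else -weight
    fingerprint = 0
    for bit, vote in enumerate(votes):
        if vote > 0:  # a positive total sets the bit in the question number
            fingerprint |= 1 << bit
    return fingerprint


# Step S1-1 stand-in: whitespace splitting instead of a real word segmenter.
tokens = "solve the quadratic equation by factoring".split()
question_number = simhash128(tf_idf_weights(tokens, doc_freq={}, n_docs=1))
print(format(question_number, "032x"))  # 128 bits printed as 32 hex digits
```

Because each word only nudges the per-bit totals, two question texts that share most of their heavily weighted words tend to produce question numbers that differ in few bits, which is the property the later Hamming-distance matching relies on.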
Step S2, establishing a multi-bit inverted index table for the question numbers in the question bank, and establishing a multi-bit inverted index total library.
In step S2, the steps of establishing the multi-bit inverted index table and the multi-bit inverted index total library are as follows:
step S2-1, denoting the question number of a question in the question bank as hash, and dividing hash into M segments to obtain M sub-question numbers, as shown in formula (3),
$\mathrm{hash} = [\mathrm{hash}_1, \mathrm{hash}_2, \ldots, \mathrm{hash}_i, \ldots, \mathrm{hash}_M],\ i \in [1, M]$ (3)
step S2-2, taking combinations of $(M - \alpha)$ of the M sub-question numbers $\mathrm{hash}_i$, $i \in [1, M]$, of the question number hash to establish the multi-bit inverted index table, where $\alpha$ is the similarity threshold and $M > \alpha$; each question number hash then has $C_M^{M-\alpha}$ indexes pointing to it, labeled index 1, index 2, ..., index $C_M^{M-\alpha}$ from top to bottom. The multi-bit inverted index table for $\alpha = 3$ and $M = 4$ is shown in formula (4),
[Formula (4) appears as an image in the source and is not reproduced here.]
Fig. 3 is a schematic diagram of the multi-bit inverted index table established when α = 3 and M = 4 in an embodiment of the present invention.
As shown in Fig. 3, when α = 3 and M = 4, the question number is divided into 4 sub-question numbers, and a total of four indexes, index 1 to index 4, point to the question number.
The multi-bit inverted index table for $\alpha = 3$ and $M = 5$ is shown in formula (5),
[Formula (5) appears as an image in the source and is not reproduced here.]
Fig. 4 is a schematic diagram of the multi-bit inverted index table established when α = 3 and M = 5 in an embodiment of the present invention.
As shown in Fig. 4, when α = 3 and M = 5, the question number is divided into 5 sub-question numbers, and a total of ten indexes, index 1 to index 10, point to the question number.
Step S2-3, aggregating the multi-bit inverted index tables of all question numbers in the question bank to construct the multi-bit inverted index total library,
wherein, in formula (3), $\mathrm{hash}_i$, $i \in [1, M]$, denotes a sub-question number of the question number hash.
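The index construction in steps S2-1 to S2-3 might look like the sketch below. Several details are assumptions made for illustration: the function names, the `(index number, segment values)` key layout, and the way the 128 bits are split when M does not divide 128 evenly (the leading segments simply receive one extra bit), none of which the text specifies.

```python
from collections import defaultdict
from itertools import combinations

HASH_BITS = 128


def split_segments(qid: int, m: int, bits: int = HASH_BITS):
    """Step S2-1: cut a question number into m segments, most significant first."""
    base, extra = divmod(bits, m)
    widths = [base + 1 if i < extra else base for i in range(m)]
    segments, shift = [], bits
    for width in widths:
        shift -= width
        segments.append((qid >> shift) & ((1 << width) - 1))
    return segments


def index_keys(qid: int, m: int, alpha: int):
    """Step S2-2: one key per way of choosing (m - alpha) of the m segments."""
    segments = split_segments(qid, m)
    return [
        (index_no, tuple(segments[p] for p in positions))
        for index_no, positions in enumerate(
            combinations(range(m), m - alpha), start=1
        )
    ]


def build_total_library(question_numbers, m: int, alpha: int):
    """Step S2-3: merge the per-question tables into one inverted index."""
    library = defaultdict(set)
    for qid in question_numbers:
        for key in index_keys(qid, m, alpha):
            library[key].add(qid)
    return library


bank = [0x0123456789ABCDEF0123456789ABCDEF, 0xFEDCBA9876543210FEDCBA9876543210]
library = build_total_library(bank, m=4, alpha=3)
print(len(index_keys(bank[0], 4, 3)), len(index_keys(bank[0], 5, 3)))  # 4 10
```

The two printed counts agree with the four indexes of Fig. 3 and the ten indexes of Fig. 4. Looking up a test paper question number then amounts to generating its keys with the same `m` and `alpha` and collecting every question number stored under a matching key.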
Step S3, receiving the answer sheet picture to be scored, and dividing it into a question area and an answer area according to a target segmentation algorithm.
In step S3, the target segmentation algorithm comprises the following steps:
step S3-1, collecting a number of answer sheet pictures and manually labeling the question areas and answer areas on them, taking the regions of printed text as question areas and the regions of handwritten text as answer areas, wherein the question areas and answer areas are mainly rectangular;
step S3-2, using the collected answer sheet pictures and the manual labels as training data, training a binary classification model that distinguishes printed text from handwritten text, using a deep learning technique together with transfer learning;
step S3-3, inputting the answer sheet picture to be scored into the trained binary classification model, and dividing it according to printed and handwritten text to obtain the question area and the answer area,
wherein the deep learning technique is a convolutional neural network (CNN), a recurrent neural network (RNN), an LSTM, or a GRU.
Fig. 5 is a flowchart of step S4 in an embodiment of the present invention.
As shown in Fig. 5, in step S4, the question text in the question area is recognized by OCR to obtain the test paper question, the question that matches the test paper question is found in the question bank according to the multi-bit search algorithm and the question matching algorithm, and the corresponding standard answer and knowledge point label are extracted.
In step S4, the multi-bit search algorithm comprises the following steps:
step S4-1-1, recognizing the question text in the question area by OCR to obtain the test paper question, and processing the test paper question by the same question number generation steps as in step S1 to obtain the test paper question number, denoted Thash;
step S4-1-2, following the steps for establishing the multi-bit inverted index table in step S2, likewise dividing the test paper question number Thash into M segments and establishing the multi-bit inverted index table of Thash;
step S4-1-3, using the multi-bit inverted index table of Thash, searching the multi-bit inverted index total library for question numbers that have the same index number and the same index value as Thash, to obtain a question number set,
and the question matching algorithm comprises the following steps:
step S4-2-1, calculating the Hamming distance H between the test paper question number Thash and each question number in the question number set, one by one;
step S4-2-2, if exactly one question number in the question number set satisfies H ≤ α, returning the standard answer and knowledge point label corresponding to that question number;
step S4-2-3, if several question numbers in the question number set satisfy H ≤ α, returning the standard answer and knowledge point label corresponding to the question number with the smallest Hamming distance H;
step S4-2-4, if several question numbers in the question number set satisfy H ≤ α and more than one of them has the smallest Hamming distance H, returning the standard answer and knowledge point label of any one of those question numbers;
and step S4-2-5, if no question number matching the test paper question number is found, outputting an exception message stating that no correct answer is currently available for the question, recording the test paper question, and adding it to the question bank once a question bank expert has provided the correct standard answer and knowledge point label.
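The question matching steps S4-2-1 to S4-2-5 can be illustrated with the short sketch below, which takes the candidate question number set from the index search as given. The names `hamming` and `match_question`, the shape of `bank`, the `unmatched_log` list and the 4-bit toy question numbers are all hypothetical and chosen only to keep the example readable.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two question numbers."""
    return bin(a ^ b).count("1")


def match_question(thash, candidates, bank, alpha, unmatched_log, question_text=""):
    """Steps S4-2-1 to S4-2-5 for a candidate set returned by the index search.

    bank maps question number -> (standard answer, knowledge point labels).
    Returns that pair, or None when nothing lies within distance alpha.
    """
    distances = [(hamming(thash, qid), qid) for qid in candidates]
    within = [(h, qid) for h, qid in distances if h <= alpha]
    if not within:
        # S4-2-5: report the miss and keep the question for later expert review.
        unmatched_log.append(question_text)
        return None
    # S4-2-2 to S4-2-4: smallest distance wins; equal distances are broken by
    # the smaller question number, which is one arbitrary but repeatable choice.
    _, best = min(within)
    return bank[best]


bank = {
    0b1010: ("standard answer A", ["fractions"]),
    0b1111: ("standard answer B", ["decimals"]),
    0b0000: ("standard answer C", ["geometry"]),
}
log = []
print(match_question(0b1011, set(bank), bank, alpha=3, unmatched_log=log))
print(match_question(0b1011, set(bank), bank, alpha=0, unmatched_log=log,
                     question_text="Q7"))
```

In the first call two candidates tie at distance 1, so the tie-break applies; in the second call no candidate is within distance 0, so `None` is returned and "Q7" is recorded for review.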
Step S5, recognizing the answer text in the answer area by OCR to obtain the test paper answer, calculating the similarity between the test paper answer and the standard answer according to an answer matching algorithm, and reporting where the test paper answer falls short of the standard answer.
In step S5, the answer matching algorithm comprises the following steps:
step S5-1, recognizing the answer text in the answer area by OCR to obtain the test paper answer, denoted Daan, and denoting the standard answer returned in step S4 as Biaozhun;
step S5-2, generating sentence vectors for the test paper answer Daan and the standard answer Biaozhun with a BERT or XLNet pre-trained model, and calculating the similarity $\mathrm{Sim}_1$ between the two vectors with the cosine similarity algorithm, where $0 \le \mathrm{Sim}_1 \le 1$;
step S5-3, extracting a keyword set $G_1$ from the test paper answer Daan and a keyword set $G_2$ from the standard answer Biaozhun with the TextRank algorithm, and calculating the similarity $\mathrm{Sim}_2$ of the two keyword sets, where $0 \le \mathrm{Sim}_2 \le 1$; $\mathrm{Sim}_2$ is calculated by formula (1):
[Formula (1) appears as an image in the source and is not reproduced here.]
step S5-4, fusing the similarity $\mathrm{Sim}_1$ and the similarity $\mathrm{Sim}_2$ to obtain the similarity Sim, calculated as follows:
$\mathrm{Sim} = \mathrm{Sim}_1 \times a + \mathrm{Sim}_2 \times b$ (2)
step S5-5, mapping the similarity Sim onto the score assigned to the test paper question and returning the score of the test paper answer; in addition, the elements that are in keyword set $G_2$ but not in keyword set $G_1$ are also returned, representing the points in which the test paper answer falls short of the standard answer,
where, in formula (2), a and b are the weights of $\mathrm{Sim}_1$ and $\mathrm{Sim}_2$ respectively, satisfying $a + b = 1$ and $a \ge b$.
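Steps S5-2 to S5-5 can be sketched as below, starting from sentence vectors and keyword sets that are assumed to have been produced already (by a BERT or XLNet model and by TextRank respectively). Three choices here are assumptions rather than the patent's method: Jaccard overlap stands in for formula (1), which is not available in this text; negative cosine values are clamped to 0 so that the similarity stays in [0, 1]; and the score mapping is linear. All function names are hypothetical.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 0.0 if norm == 0 else dot / norm


def keyword_similarity(g1: set, g2: set) -> float:
    """Jaccard overlap, an ASSUMED stand-in for the unavailable formula (1)."""
    if not g1 and not g2:
        return 1.0
    return len(g1 & g2) / len(g1 | g2)


def grade(vec_daan, vec_biaozhun, g1, g2, full_marks, a=0.6, b=0.4):
    """Fuse both similarities with formula (2) and map the result to a score."""
    if abs(a + b - 1.0) > 1e-9 or a < b:
        raise ValueError("weights must satisfy a + b = 1 and a >= b")
    sim1 = max(0.0, cosine(vec_daan, vec_biaozhun))  # clamp: assumption
    sim2 = keyword_similarity(g1, g2)
    sim = sim1 * a + sim2 * b                        # formula (2)
    score = round(sim * full_marks, 1)               # linear mapping: assumption
    missing = sorted(g2 - g1)                        # in G2 but not in G1
    return sim, score, missing


sim, score, missing = grade(
    vec_daan=[1.0, 0.0],
    vec_biaozhun=[1.0, 0.0],
    g1={"photosynthesis", "light"},
    g2={"photosynthesis", "light", "chlorophyll", "glucose"},
    full_marks=10,
)
print(score, missing)  # 8.0 ['chlorophyll', 'glucose']
```

In this toy input the sentence vectors are identical, so the lost marks come entirely from the keyword term, and the two keywords present only in the standard answer are reported as the answer's shortcomings.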
Step S6, searching the question bank, according to a recommendation strategy, for questions similar to the test paper question in order to consolidate the knowledge points.
In step S6, the recommendation strategy comprises the following steps:
step S6-1, denoting the knowledge point label returned in step S4 as Tags;
step S6-2, searching the question bank for questions whose knowledge point labels are similar to Tags, and returning one of them at random to consolidate the knowledge points.
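A minimal sketch of the recommendation strategy follows. The text does not define when two knowledge point labels count as similar, so sharing at least one label is assumed here; the function name `recommend`, the `exclude` parameter (used to avoid returning the question just answered) and the layout of `bank` are likewise hypothetical.

```python
import random


def recommend(bank, tags, exclude=None, rng=random):
    """Steps S6-1 and S6-2: pick one random question that shares a label with tags.

    bank maps question number -> {"text": ..., "tags": set of labels}.
    Returns a question number, or None if no question qualifies.
    """
    wanted = set(tags)
    pool = [
        qid for qid, entry in bank.items()
        if qid != exclude and wanted & set(entry["tags"])
    ]
    return rng.choice(pool) if pool else None


bank = {
    101: {"text": "Add 1/2 and 1/3.", "tags": {"fractions"}},
    102: {"text": "Simplify 6/8.", "tags": {"fractions", "simplification"}},
    103: {"text": "Find the area of a circle of radius 2.", "tags": {"geometry"}},
}
pick = recommend(bank, tags={"fractions"}, exclude=101, rng=random.Random(0))
print(pick, bank[pick]["text"])  # 102 Simplify 6/8.
```

Passing a seeded `random.Random` instance makes the choice repeatable for testing, while the default module-level generator gives the random selection the step describes.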
Actions and effects of the embodiment
According to the automatic scoring and error correction recommendation method for subjective questions of this embodiment, for question matching, the simhash algorithm generates the question numbers so as to ensure retrieval accuracy, and retrieval through the multi-bit inverted index table effectively improves retrieval efficiency; for answer matching, similarity is calculated with a pre-trained model such as BERT or XLNet and the cosine similarity algorithm, and the TextRank keyword algorithm supplements the similarity measure so as to ensure the accuracy of the answer matching result, while the points in which the student's test paper answer falls short of the standard answer are reported, helping the student identify his or her own problems; for the recommendation strategy, similar questions are recommended according to the knowledge point labels, helping the student consolidate the knowledge points. The method therefore addresses the situation in which students working alone are at a loss when faced with difficult questions, addresses time-consuming question matching and unreasonable answer matching, and uses big data and artificial intelligence algorithms to help students improve their knowledge.
The above embodiments are preferred examples of the present invention, and are not intended to limit the scope of the present invention.
Claims (6)
1. An automatic scoring and error correction recommendation method for subjective questions is characterized by comprising the following steps:
step S1, establishing a question bank, wherein the question bank comprises questions, and standard answers, knowledge point labels and question numbers corresponding to the questions;
step S2, establishing a multi-bit inverted index table for each question number in the question bank, and establishing a multi-bit inverted index total library;
step S3, receiving an answer sheet picture to be scored, and dividing it into a question area and an answer area according to a target segmentation algorithm;
step S4, recognizing the question text in the question area by using OCR technology to obtain a test paper question, finding the question matching the test paper question in the question bank according to a multi-bit search algorithm and a question matching algorithm, and extracting the corresponding standard answer and knowledge point label;
step S5, recognizing the answer text in the answer area by using OCR technology to obtain a test paper answer, calculating the similarity between the test paper answer and the standard answer according to an answer matching algorithm, and identifying the deficiencies of the test paper answer compared with the standard answer;
step S6, searching the question bank, according to a recommendation strategy, for questions similar to the test paper question so as to consolidate the knowledge points,
wherein, in step S5, the answer matching algorithm comprises the following specific steps:
step S5-1, recognizing the answer text in the answer area by using OCR technology to obtain the test paper answer, denoted Daan, and denoting the standard answer returned in step S4 as Biaozhun;
step S5-2, generating sentence vectors of the test paper answer Daan and the standard answer Biaozhun by using a BERT pre-trained model or an XLNet pre-trained model, and calculating the similarity Sim_1 between the two vectors by using a cosine similarity algorithm, where 0 ≤ Sim_1 ≤ 1;
step S5-3, extracting a keyword set G_1 from the test paper answer Daan and a keyword set G_2 from the standard answer Biaozhun by using the TextRank algorithm, and calculating the similarity Sim_2 of the two keyword sets according to formula (1), where 0 ≤ Sim_2 ≤ 1;
step S5-4, fusing the similarity Sim_1 and the similarity Sim_2 to obtain the overall similarity Sim according to formula (2):
Sim = Sim_1 × a + Sim_2 × b (2)
step S5-5, mapping the similarity Sim onto the score assigned to the test paper question and returning the score of the test paper answer; in addition, the elements that belong to the keyword set G_2 but not to the keyword set G_1 are also returned, representing the features that the test paper answer lacks compared with the standard answer,
in formula (2), a and b are the weights of the similarity Sim_1 and the similarity Sim_2 respectively, satisfying a + b = 1 and a ≥ b.
2. The automatic scoring and error correction recommendation method for subjective questions according to claim 1, wherein:
in step S1, the questions, the standard answers, and the knowledge point labels are drawn from various books and internet resources, and the question numbers are generated from the questions by the following specific steps:
step S1-1, performing word segmentation on the text of the question;
step S1-2, using the MD5 algorithm as a pseudo-random number generator, and using the TF-IDF algorithm or the BM25 algorithm to calculate the weight of each word obtained by the segmentation;
and step S1-3, generating a hash value for the text of the question from the pseudo-random number generator and the word weights by using a 128-bit simhash algorithm, and taking the hash value as the question number.
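Steps S1-2 and S1-3 can be illustrated with the following Python sketch of a 128-bit simhash that uses MD5 as the per-word hash. It is a sketch under assumptions: word segmentation and the TF-IDF or BM25 weighting are taken as already done, so the function receives `(word, weight)` pairs directly, and the function name is hypothetical.

```python
import hashlib


def simhash128(weighted_tokens):
    """Compute a 128-bit simhash from (token, weight) pairs.

    Each token is hashed with MD5 (128 bits). For every bit position the
    token's weight is added if the bit is 1 and subtracted if it is 0; the
    final fingerprint has a 1 wherever the accumulated sum is positive.
    """
    acc = [0.0] * 128
    for token, weight in weighted_tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(128):
            acc[bit] += weight if (h >> bit) & 1 else -weight
    value = 0
    for bit in range(128):
        if acc[bit] > 0:
            value |= 1 << bit
    return value


if __name__ == "__main__":
    # The weights below are illustrative placeholders, not real TF-IDF values.
    question = [("triangle", 2.3), ("area", 1.7), ("calculate", 0.4)]
    print(f"{simhash128(question):032x}")
```

Because heavily weighted words dominate each bit, two questions that differ only in low-weight words tend to receive fingerprints with a small Hamming distance.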
3. The automatic scoring and error correction recommendation method for subjective questions according to claim 2, wherein:
in step S2, the specific steps of establishing the multi-bit inverted index table and the multi-bit inverted index total library are as follows:
step S2-1, denoting the question number of a given question in the question bank as hash, and dividing the question number hash into M segments to obtain M sub-question numbers, as shown in formula (3),
hash = [hash_1, hash_2, …, hash_i, …, hash_M], i ∈ [1, M] (3)
step S2-2, forming combinations of (M−α) segments from the M sub-question numbers hash_i, i ∈ [1, M], of the question number hash to establish the multi-bit inverted index table, wherein α is the similarity threshold and M > α, each question number hash is pointed to by a plurality of indexes, and the indexes are labeled in sequence from top to bottom as index 1, index 2, and so on; the multi-bit inverted index table for α = 3 and M = 4 is shown in formula (4),
the multi-bit inverted index table for α = 3 and M = 5 is shown in formula (5),
step S2-3, aggregating the multi-bit inverted index tables of all question numbers in the question bank to construct the multi-bit inverted index total library,
wherein, in formula (3), hash_i, i ∈ [1, M], denotes a sub-question number of the question number hash.
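The index construction in steps S2-1 to S2-3 can be sketched in Python as below. Several points are assumptions rather than statements from the text: the segments are taken from the most significant bits downward, a hash whose length is not divisible by M is split into segments differing by at most one bit, each index key is a choice of (M−α) segment positions together with their values (giving C(M, M−α) keys per question number), and all function names are hypothetical. The rationale is the pigeonhole principle: if two hashes differ in at most α bits, at most α segments differ, so at least M−α segments are identical and the two hashes share at least one key.

```python
from collections import defaultdict
from itertools import combinations


def split_hash(h, m, bits=128):
    """Split a `bits`-wide integer into m segments, most significant first."""
    base, extra = divmod(bits, m)
    segments, shift = [], bits
    for i in range(m):
        width = base + (1 if i < extra else 0)
        shift -= width
        segments.append((h >> shift) & ((1 << width) - 1))
    return segments


def index_keys(h, m, alpha, bits=128):
    """All (positions, values) keys built from (m - alpha) of the m segments."""
    segs = split_hash(h, m, bits)
    return [
        (pos, tuple(segs[i] for i in pos))
        for pos in combinations(range(m), m - alpha)
    ]


def build_index(hashes, m, alpha, bits=128):
    """Aggregate the keys of every question number into one inverted index."""
    index = defaultdict(set)
    for h in hashes:
        for key in index_keys(h, m, alpha, bits):
            index[key].add(h)
    return index


def candidates(index, query, m, alpha, bits=128):
    """Question numbers sharing at least one key with the query hash."""
    found = set()
    for key in index_keys(query, m, alpha, bits):
        found |= index.get(key, set())
    return found


if __name__ == "__main__":
    stored = int("0123456789abcdef0123456789abcdef", 16)
    idx = build_index([stored], m=4, alpha=3)
    print(len(index_keys(stored, 4, 3)), len(index_keys(stored, 5, 3)))
    print(stored in candidates(idx, stored ^ 0b111, 4, 3))
```

The lookup only narrows the search; the candidates it returns still have to be checked against the threshold α with an exact Hamming distance.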
4. The automatic scoring and error correction recommendation method for subjective questions according to claim 1, wherein:
in step S3, the target segmentation algorithm comprises the following specific steps:
step S3-1, collecting a plurality of answer sheet pictures and manually annotating the question areas and the answer areas on them, wherein, during manual annotation, areas consisting of printed text are taken as the question areas and areas consisting of handwriting are taken as the answer areas;
step S3-2, using the collected answer sheet pictures and the manual annotations as training data, and training a binary classification model that distinguishes printed text from handwriting by using a deep learning technology together with a transfer learning technology;
step S3-3, inputting the answer sheet picture to be scored into the trained binary classification model, and dividing the answer sheet picture according to printed text and handwriting to obtain the question area and the answer area,
wherein the deep learning technology is a convolutional neural network (CNN), a recurrent neural network (RNN), an LSTM, or a GRU.
5. The automatic scoring and error correction recommendation method for subjective questions according to claim 1, wherein:
in step S4, the multi-bit search algorithm comprises the following specific steps:
step S4-1-1, recognizing the question text in the question area by using OCR technology to obtain the test paper question, and processing the test paper question in the same way as in the specific generation steps of the question number in step S1 to obtain a test paper question number, denoted Thash;
step S4-1-2, following the specific steps for establishing the multi-bit inverted index table in step S2, likewise dividing the test paper question number Thash into M segments and establishing the multi-bit inverted index table of the test paper question number Thash;
step S4-1-3, according to the multi-bit inverted index table of the test paper question number Thash, searching the multi-bit inverted index total library for question numbers whose index number and index value are the same as those of the test paper question number Thash, to obtain a question number set,
the question matching algorithm comprises the following specific steps:
step S4-2-1, calculating the Hamming distance H between the test paper question number Thash and each question number in the question number set one by one;
step S4-2-2, if exactly one question number in the question number set satisfies H ≤ α, taking the standard answer and the knowledge point label corresponding to that question number as return values;
step S4-2-3, if a plurality of question numbers in the question number set satisfy H ≤ α, taking the standard answer and the knowledge point label corresponding to the question number with the smallest Hamming distance H as return values;
step S4-2-4, if a plurality of question numbers in the question number set satisfy H ≤ α and several of them share the smallest Hamming distance H, taking the standard answer and the knowledge point label corresponding to any one of those question numbers as return values;
and step S4-2-5, if no question number matching the test paper question number is found, outputting an exception message indicating that no correct answer is available for the question for the time being, recording the corresponding test paper question, and adding the correct standard answer and knowledge point label to the question bank after a question bank expert provides them.
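The question matching steps S4-2-1 to S4-2-5 reduce to a nearest-neighbour choice under a Hamming distance threshold, which the following Python sketch illustrates. The function names are hypothetical, and returning `None` stands in for the exception message and expert follow-up of step S4-2-5, which are outside the scope of the sketch.

```python
import random


def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def match_question(thash, candidate_hashes, alpha, rng=random):
    """Return the candidate closest to `thash` within distance alpha, or None.

    When several candidates share the smallest distance, one of them is
    chosen arbitrarily (step S4-2-4).
    """
    within = [(hamming(thash, h), h) for h in candidate_hashes]
    within = [(d, h) for d, h in within if d <= alpha]
    if not within:
        return None  # step S4-2-5: no match, defer to a question bank expert
    best = min(d for d, _ in within)
    ties = [h for d, h in within if d == best]
    return rng.choice(ties)


if __name__ == "__main__":
    print(match_question(0b1010, [0b1011, 0b0101], alpha=3))
```

The returned question number would then be used to look up the standard answer and knowledge point label in the question bank.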
6. The automatic scoring and error correction recommendation method for subjective questions according to claim 1, wherein:
in step S6, the recommendation strategy comprises the following specific steps:
step S6-1, denoting the knowledge point label returned in step S4 as Tags;
step S6-2, searching the question bank for questions whose knowledge point labels are similar to the knowledge point label Tags, and randomly returning one of them for knowledge point consolidation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110672735.9A CN113392187A (en) | 2021-06-17 | 2021-06-17 | Automatic scoring and error correction recommendation method for subjective questions |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113392187A (en) | 2021-09-14 |
Family
ID=77621762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110672735.9A Pending CN113392187A (en) | 2021-06-17 | 2021-06-17 | Automatic scoring and error correction recommendation method for subjective questions |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113392187A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108172050A (en) * | 2017-12-26 | 2018-06-15 | 科大讯飞股份有限公司 | Mathematics subjective item answer result corrects method and system |
CN110363194A (en) * | 2019-06-17 | 2019-10-22 | 深圳壹账通智能科技有限公司 | Intelligently reading method, apparatus, equipment and storage medium based on NLP |
CN111310458A (en) * | 2020-03-20 | 2020-06-19 | 广东工业大学 | Subjective question automatic scoring method based on multi-feature fusion |
CN111753767A (en) * | 2020-06-29 | 2020-10-09 | 广东小天才科技有限公司 | Method and device for automatically correcting operation, electronic equipment and storage medium |
CN111897982A (en) * | 2020-06-17 | 2020-11-06 | 昆明理工大学 | Medical CT image storage and retrieval method |
CN112560429A (en) * | 2020-12-23 | 2021-03-26 | 信雅达科技股份有限公司 | Intelligent training detection method and system based on deep learning |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115774996A (en) * | 2022-12-05 | 2023-03-10 | 英仕互联(北京)信息技术有限公司 | Question-following generation method and device for intelligent interview and electronic equipment |
CN116595129A (en) * | 2023-06-12 | 2023-08-15 | 广州市南方人力资源评价中心有限公司 | Subjective question scoring method and device based on knowledge point labeling |
CN116595129B (en) * | 2023-06-12 | 2023-10-27 | 广州市南方人力资源评价中心有限公司 | Subjective question scoring method and device based on knowledge point labeling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11508251B2 (en) | Method and system for intelligent identification and correction of questions | |
CN111753767B (en) | Method and device for automatically correcting operation, electronic equipment and storage medium | |
CN107169485B (en) | Mathematical formula identification method and device | |
Yahya et al. | Automatic classification of questions into Bloom's cognitive levels using support vector machines | |
CN113392187A (en) | Automatic scoring and error correction recommendation method for subjective questions | |
CN112559781B (en) | Image retrieval system and method | |
Rasyidi et al. | Classification of handwritten Javanese script using random forest algorithm | |
CN110968708A (en) | Method and system for labeling education information resource attributes | |
CN111914550A (en) | Knowledge graph updating method and system for limited field | |
Agarwal et al. | Autoeval: A nlp approach for automatic test evaluation system | |
CN112966518B (en) | High-quality answer identification method for large-scale online learning platform | |
Belaid et al. | Administrative document analysis and structure | |
CN111783697A (en) | Wrong question detection and target recommendation system and method based on convolutional neural network | |
JP7293658B2 (en) | Information processing device, information processing method and program | |
Lu et al. | Automatic scoring system for handwritten examination papers based on YOLO algorithm | |
CN113792574B (en) | Cross-dataset expression recognition method based on metric learning and teacher student model | |
Wu et al. | A self-relevant cnn-svm model for problem classification in k-12 question-driven learning | |
Saha et al. | Adopting computer-assisted assessment in evaluation of handwritten answer books: An experimental study | |
Brummerloh et al. | Boromir at Touché 2022: Combining Natural Language Processing and Machine Learning Techniques for Image Retrieval for Arguments. | |
Maniar et al. | Generation and grading of arduous MCQs using NLP and OMR detection using OpenCV | |
CN110825930A (en) | Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence | |
Hu et al. | A new intelligent learning diagnosis method constructed based on concept map | |
Srihari et al. | Automated scoring of handwritten essays based on latent semantic analysis | |
Negi et al. | An artificially intelligent machine for answer scripts evaluation during pandemic to support the online methodology of teaching and evaluation | |
Krisnadi et al. | A multiple-choice test recognition system based on android and RBFNN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210914 |