CN107301165A

CN107301165A - Item difficulty analysis method and system

Info

Publication number: CN107301165A
Application number: CN201610237683.1A
Authority: CN
Inventors: 张丹; 苏喻; 陈志刚; 邓晓栋; 魏思; 胡国平; 胡郁
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2016-04-14
Filing date: 2016-04-14
Publication date: 2017-10-27

Abstract

The invention discloses an item difficulty analysis method and system. The method includes: obtaining a test item to be analyzed; extracting item features of the test item, the item features including question-stem features, solution features, and answer features; and obtaining a difficulty score for the test item from its item features and a pre-built item difficulty prediction model. Because the method predicts the difficulty score from the item features of the test item rather than from information such as the item's historical score distribution, it can evaluate the difficulty of cold-start data, that is, items that no student has yet attempted.

Description

Item difficulty analysis method and system

Technical field

The present invention relates to the field of electronic education, and in particular to an item difficulty analysis method and system.

Background

In recent years, with the continued advance of computer technology and the informatization of education, computer technology and artificial intelligence have gradually been applied to everyday teaching activities. Difficulty is one of the key indicators for evaluating a test item, and it plays an important role in selecting questions when building an item bank, in assessing student ability, and in personalized learning.

Existing item difficulty analysis methods mainly fall into three groups: (1) methods based on annotation by human experts, (2) methods based on simple statistics, and (3) evaluation methods based on models from the education field. (1) In expert annotation, human experts in the item's field assess the item subjectively to obtain its difficulty. The drawbacks are that annotation is difficult and costly and that subjective factors weigh heavily, which easily leads to inconsistent evaluation standards among experts. (2) Methods based on simple statistics use the historical record of students' attempts at an item to compute its correct-answer rate, and use that rate as the basis for the item's difficulty coefficient, as in classical test theory (CTT). The drawbacks are that they place high demands on the distribution and number of student answer samples, that scoring across different examinations is subjective, and that the resulting difficulty in most cases deviates from the evaluation of human experts. (3) Evaluation methods based on education-field models, such as item response theory, estimate item difficulty directly from the student answer matrix. These methods model the student answer samples as a whole, and the resulting difficulty is considerably more accurate than that from simple statistics, but they cannot evaluate the difficulty of cold-start data, that is, items that no student has yet attempted.

Summary of the invention

Embodiments of the present invention provide an item difficulty analysis method and system to solve the problem that existing item difficulty evaluation methods cannot evaluate the difficulty of cold-start data, that is, items that no student has yet attempted.

To this end, embodiments of the present invention provide the following technical solutions:

An item difficulty analysis method, including:

obtaining a test item to be analyzed;

extracting item features of the test item to be analyzed, the item features including question-stem features, solution features, and answer features;

obtaining a difficulty score for the test item to be analyzed from its item features and a pre-built item difficulty prediction model.

Preferably, the question-stem features, the solution features, and the answer features each include semantic features of formulas and any one or more of the following: literal features of the text and semantic features of the text.

Preferably, obtaining the semantic features of the formulas includes the steps of:

performing formula recognition on the question stem, the solution, and the answer respectively, and extracting the formulas from each;

building in advance probabilistic context-free grammar (PCFG) models for the formulas of the question stem, the solution, and the answer;

parsing the formulas with the respective PCFG models to obtain syntax trees of the formula characters of the question stem, the solution, and the answer;

traversing the syntax tree of the characters of each formula to obtain the semantic features of the formulas of the question stem, the solution, and the answer.

Preferably, obtaining the literal features of the text includes the steps of:

performing formula recognition on the question stem, the solution, and the answer respectively, and extracting the text from each;

performing word segmentation on the text of the question stem, the solution, and the answer respectively;

obtaining the literal features of the text of the question stem, the solution, and the answer from the respective segmentation results.

Preferably, the method also includes:

after the literal features of the text of the question stem, the solution, and the answer are obtained, optimizing those literal features to obtain optimized literal features of the text of the question stem, the solution, and the answer.

Preferably, the method also includes:

extracting attribute features of the test item to be analyzed, the attribute features including any one or more of the following: question-stem length, solution length, answer length, item category, question type, grade, reputation of the item's source school, number of knowledge points the item covers, the knowledge points the item covers, and position of the item in the test paper;

obtaining the difficulty score of the test item to be analyzed from its item features and the pre-built item difficulty prediction model then includes:

obtaining the difficulty score of the test item to be analyzed from its item features, its attribute features, and the pre-built item difficulty prediction model.

Preferably, building the item difficulty prediction model in advance includes:

collecting a training corpus for building the item difficulty prediction model;

extracting the item features of the training corpus;

training the item difficulty prediction model on the item features of the training corpus.

Preferably, when attribute features are also extracted, building the item difficulty prediction model in advance includes:

collecting a training corpus for building the item difficulty prediction model;

extracting the item features and attribute features of the training corpus;

training the item difficulty prediction model on the item features and attribute features of the training corpus.

Preferably, the item difficulty prediction model is a regression model.

Correspondingly, the present invention also provides an item difficulty analysis system, including:

an acquisition module, configured to obtain a test item to be analyzed;

an item feature extraction module, configured to extract item features of the test item to be analyzed, the item features including question-stem features, solution features, and answer features;

a difficulty prediction module, configured to obtain a difficulty score for the test item to be analyzed from its item features and a pre-built item difficulty prediction model.

Preferably, the item feature extraction module includes:

a formula extraction unit, configured to perform formula recognition on the question stem, the solution, and the answer respectively, and to extract the formulas from each;

a grammar model construction unit, configured to build in advance probabilistic context-free grammar (PCFG) models for the formulas of the question stem, the solution, and the answer;

a syntax tree acquisition unit, configured to parse the formulas with the respective PCFG models to obtain syntax trees of the formula characters of the question stem, the solution, and the answer;

a formula semantic feature acquisition unit, configured to traverse the syntax tree of the characters of each formula to obtain the semantic features of the formulas of the question stem, the solution, and the answer.

Preferably, the item feature extraction module also includes:

a text extraction unit, configured to perform formula recognition on the question stem, the solution, and the answer respectively, and to extract the text from each;

a word segmentation unit, configured to perform word segmentation on the text of the question stem, the solution, and the answer respectively;

a literal feature acquisition unit, configured to obtain the literal features of the text of the question stem, the solution, and the answer from the respective segmentation results.

Preferably, the system also includes:

a feature optimization module, configured to optimize the literal features of the text of the question stem, the solution, and the answer after the item feature extraction module has obtained them, to obtain optimized literal features of the text of the question stem, the solution, and the answer.

Preferably, the system also includes:

an attribute feature extraction module, configured to extract attribute features of the test item to be analyzed, the attribute features including any one or more of the following: question-stem length, solution length, answer length, item category, question type, grade, reputation of the item's source school, number of knowledge points the item covers, the knowledge points the item covers, and position of the item in the test paper;

the difficulty prediction module is then specifically configured to obtain the difficulty score of the test item to be analyzed from its item features, its attribute features, and the pre-built item difficulty prediction model.

Preferably, the system also includes:

a first pre-modeling module, configured to build the item difficulty prediction model in advance, including:

a collection unit, configured to collect a training corpus for building the item difficulty prediction model;

a first extraction unit, configured to extract the item features of the training corpus;

a first training unit, configured to train the item difficulty prediction model on the item features of the training corpus.

Preferably, the system also includes:

a second pre-modeling module, configured to build the item difficulty prediction model in advance, including:

a second extraction unit, configured to extract the item features and attribute features of the training corpus;

a second training unit, configured to train the item difficulty prediction model on the item features and attribute features of the training corpus.

With the item difficulty analysis method and system provided by embodiments of the present invention, after the test item to be analyzed is obtained, its item features are extracted, including question-stem features, solution features, and answer features, and its difficulty score is then obtained from those item features and an item difficulty prediction model built in advance on item features. Because the method obtains the difficulty from the item features of the test item, rather than predicting it from information such as the item's historical score distribution, it can evaluate the difficulty of cold-start data, that is, items that no student has yet attempted.

Further, the method also extracts attribute features of the test item to be analyzed, and obtains its difficulty score from the attribute features, the item features, and an item difficulty prediction model built in advance on both attribute features and item features, which can further improve the accuracy of the predicted difficulty score.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for the embodiments are briefly introduced below. The drawings described below are clearly only some of the embodiments recorded in the present invention, and a person of ordinary skill in the art can obtain other drawings from them.

Fig. 1 is a flow chart of an item difficulty analysis method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of the syntax tree of a formula provided by an embodiment of the present invention;

Fig. 3 is a structural diagram of an item difficulty analysis system provided by an embodiment of the present invention.

Detailed description of the embodiments

To help those skilled in the art better understand the solutions of the embodiments of the present invention, existing item difficulty analysis methods are first introduced briefly. In the prior art, methods based on education-field models perform noticeably better than statistical methods and expert annotation, but they still depend heavily on data. For example, they cannot handle cold-start data, that is, they cannot evaluate the difficulty of items that no student has yet attempted.

The item difficulty analysis method and system provided by the present invention extract the item features of the test item to be analyzed and obtain its difficulty score from those features. The process does not require modeling student answer samples to obtain the difficulty score, so it can evaluate the difficulty of cold-start data, that is, items that no student has yet attempted.

Embodiments of the present invention are described in further detail below with reference to the drawings and specific implementations.

As shown in Fig. 1, which is a flow chart of an item difficulty analysis method provided by an embodiment of the present invention, the method includes the following steps:

Step S01: obtain a test item to be analyzed.

In this embodiment, the test item to be analyzed may be any question in an item bank from which questions are to be drawn, whether or not it has been used before. By controlling the difficulty score of each question drawn, the total difficulty score of the questions in the resulting test paper can be kept within a preset difficulty range. The test item to be analyzed may also be any question in a validation item bank whose difficulty score is known: the method provided by the present invention predicts the question's difficulty score, and the prediction is compared with the known score as a self-check. Questions with known difficulty scores can further serve as a benchmark for adjusting the model parameters and other settings used by the method, so that the difficulty scores it produces are more accurate.

Note that if the test item to be analyzed exists as an image, optical character recognition (OCR) can be used to convert the image data into text data.

Step S02: extract item features of the test item to be analyzed, the item features including question-stem features, solution features, and answer features.

In this embodiment, the question-stem features, the solution features, and the answer features each include semantic features of formulas and any one or more of the following: literal features of the text and semantic features of the text. Results are best when the semantic features of formulas, the literal features of the text, and the semantic features of the text are all used together.

Obtaining the semantic features of the formulas may include the steps of: performing formula recognition on the question stem, the solution, and the answer respectively, and extracting the formulas from each; building in advance probabilistic context-free grammar (PCFG) models for the formulas of the question stem, the solution, and the answer; parsing the formulas with the respective PCFG models to obtain syntax trees of the formula characters; and traversing the syntax tree of the characters of each formula to obtain the semantic features of the formulas of the question stem, the solution, and the answer.

Obtaining the literal features of the text may include the steps of: performing formula recognition on the question stem, the solution, and the answer respectively, and extracting the text from each (for example, a formula recognition model is built in advance; after the text to be recognized is received, formula recognition features are extracted for each character in it; formula recognition is performed on the text using the extracted features and the pre-built formula recognition model; and the text outside the recognized formulas is extracted); performing word segmentation on the text of the question stem, the solution, and the answer respectively; and obtaining the literal features of the text of each from the segmentation results. The semantic features of the text can be obtained with existing techniques. The text and the formulas of the test item to be analyzed can of course be extracted at the same time, for example by performing formula recognition on the test item and treating everything that is not a formula as text; no limitation is imposed here.

Further, the method may also include: after the literal features of the text of the question stem, the solution, and the answer are obtained, optimizing those literal features to obtain optimized literal features of the text of the question stem, the solution, and the answer. Specifically, the literal features can be optimized by methods such as information gain, text feature selection based on feature frequency, or mutual information. For example, optimized vector space model (VSM) features of the question stem, the solution, and the answer are obtained respectively, and the VSM features of each are then reduced in dimension, for example with a restricted Boltzmann machine (RBM), to obtain the optimized literal features of the text of the question stem, the solution, and the answer.
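Feature selection by information gain can be illustrated with a minimal sketch. It assumes each document is reduced to a set of terms and carries a coarse difficulty label; the toy data, the `min_gain` threshold, and all function names are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Label entropy minus the expected entropy after splitting on term presence."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = 0.0
    for part in (with_term, without_term):
        if part:
            conditional += len(part) / len(labels) * entropy(part)
    return entropy(labels) - conditional


def select_terms(docs, labels, min_gain=0.01):
    """Drop terms with very small gain; return the rest sorted by gain, descending."""
    vocab = set().union(*docs)
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    kept = [(g, t) for g, t in scored if g >= min_gain]
    return [t for g, t in sorted(kept, key=lambda p: (-p[0], p[1]))]


# Hypothetical toy data: one term set per question stem, one label per stem.
docs = [
    {"function", "derivative"},
    {"function", "equation"},
    {"derivative", "extremum"},
    {"equation", "root"},
]
labels = ["hard", "easy", "hard", "easy"]
selected = select_terms(docs, labels)
```

In this toy data "function" appears equally often under both labels, so its gain is zero and it is dropped, while "derivative" and "equation" separate the labels perfectly and rank first.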

In addition, to further improve the accuracy of the predicted difficulty score, the method provided by this embodiment also includes: extracting attribute features of the test item to be analyzed, the attribute features including any one or more of the following: question-stem length, solution length, answer length, item category, question type, grade, reputation of the item's source school, number of knowledge points the item covers, the knowledge points the item covers, and position of the item in the test paper. Obtaining the difficulty score from the item features and the pre-built item difficulty prediction model then includes: obtaining the difficulty score of the test item to be analyzed from its item features, its attribute features, and the pre-built item difficulty prediction model. Considering both the item features and the attribute features gives a more comprehensive analysis of the test item and helps improve the accuracy of the difficulty prediction.

In practical applications, the item features include the question-stem features, solution features, and answer features of the test item. Because all three are extracted in the same way, extraction of the question-stem features is described in detail as an example:

In a specific embodiment, the text and formulas in the question stem are first recognized. Recognition can use formula recognition features of each character in the text to be recognized. When these features are extracted, a sliding window can be used to extract feature information for the current character and several characters before and after it, together with a context pattern feature of the current character that describes how the brackets before and after it match. This effectively avoids character ambiguity and mismatched brackets. Because this formula recognition process takes the sequential order of characters into account, it can improve recognition accuracy for ambiguous characters. Next, semantic understanding is performed on the text and the formulas of the question stem respectively to obtain the literal features of the text, the semantic features of the text, and the semantic features of the formulas of the question stem.
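The sliding-window extraction described above might look like the following sketch. The patent does not specify the feature layout, so the window size, the padding token, and the use of bracket nesting depth as a stand-in for the context pattern feature are all assumptions made for illustration; the resulting per-character feature dictionaries would be fed to a separately trained formula recognition model, which is not shown.

```python
def bracket_context(text):
    """Nesting depth of open brackets before each character.

    Returns -1 from the point where a closing bracket has no matching opener,
    a simplified stand-in for the patent's bracket-matching context feature.
    """
    pairs = {")": "(", "]": "[", "}": "{"}
    openers = set(pairs.values())
    stack, depths, broken = [], [], False
    for ch in text:
        depths.append(-1 if broken else len(stack))
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            if stack and stack[-1] == pairs[ch]:
                stack.pop()
            else:
                broken = True
    return depths


def char_features(text, window=2, pad="<pad>"):
    """One feature dict per character: the character, its neighbours, bracket depth."""
    depths = bracket_context(text)
    features = []
    for i, ch in enumerate(text):
        feats = {"char": ch, "bracket_depth": depths[i]}
        for offset in range(-window, window + 1):
            if offset == 0:
                continue
            j = i + offset
            feats[f"char[{offset:+d}]"] = text[j] if 0 <= j < len(text) else pad
        features.append(feats)
    return features


feats = char_features("f(x)=2")
```

For the character `x` in `f(x)=2`, the window sees `(` on the left and `)` on the right and records a bracket depth of 1, which is the kind of context that can help separate a function argument from ordinary text.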

(1) For text:

1. Extracting the literal features of the text may include the following steps:

a) The text of the question stem can be uniformly converted to TXT format. For example, information about a test item to be analyzed that is obtained in image format can be converted to TXT format by OCR;

b) Perform word segmentation on the TXT content, which may specifically include steps such as sentence splitting, word segmentation, and stop-word removal;

c) Obtain the literal features of the text;

The literal features of the text can be unigram (single-character) or bigram (two-character) feature vectors of the segmentation result, or both can be obtained together. Taking "二元一次函数" (linear function in two variables) as an example, the unigram form is 二/元/一/次/函/数 and the bigram form is 二元/元一/一次/次函/函数; trigrams can also be used. This embodiment takes the case where unigrams and bigrams are used together as the example. The segmentation result is then vectorized to obtain the literal features of the text, for example vector space model (VSM) features.

d) Optimize the result of step c) using information gain;

Feature optimization by information gain is a way of selecting useful features. In filtering problems, it measures how much information knowing whether a feature appears in text related to a given topic contributes to predicting that topic. Computing information gain identifies features that occur frequently in positive samples but rarely in negative samples, as well as features that occur frequently in negative samples but rarely in positive samples. Information gain is an entropy-based evaluation measure, defined as the amount of information a feature term provides for the classification as a whole: the difference between the entropy when the feature is not considered and the entropy after it is considered.

Specifically, the information gain of each feature term is computed on the training data, terms with very small information gain are deleted, and the remaining terms are sorted by information gain in descending order; the details are not repeated here. The optimized text features, that is, the unigrams and/or bigrams of the segmentation result, are each represented in VSM form.

e) Reduce the dimension of the VSM features with a restricted Boltzmann machine (RBM) to obtain the optimized literal features of the text of the question stem.

Specifically, taking the case where unigrams and bigrams are used together as the example, unigram and bigram RBM models are trained separately on the VSM features of the question stems of all items in the item bank. During training, the inputs are the VSM features representing unigrams and bigrams respectively, and the outputs are the corresponding dimension-reduced unigram and bigram text feature vectors. During prediction, the inputs are the model and the VSM features, and the output is the dimension-reduced text feature vector, that is, the optimized literal features of the text of the question stem.

Note that at difficulty prediction time, the dimension-reduced unigram and/or bigram text feature vectors need to be merged to serve as input to the ridge regression model.
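Steps c) and the final merge can be sketched as follows, under two simplifying assumptions: n-grams are taken at character level, as in the patent's own example, and the VSM weight is a raw count. The information-gain filtering and the RBM dimension reduction are omitted, so the vectors merged here are the unreduced ones. The sample stems and function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """All contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocab(texts, n):
    """Map every n-gram seen in the corpus to a fixed vector index."""
    grams = sorted({g for t in texts for g in char_ngrams(t, n)})
    return {g: i for i, g in enumerate(grams)}


def vsm_vector(text, vocab, n):
    """Count-based vector space model representation of one text."""
    vec = [0] * len(vocab)
    for gram, count in Counter(char_ngrams(text, n)).items():
        if gram in vocab:
            vec[vocab[gram]] = count
    return vec


# Hypothetical question stems standing in for an item bank.
stems = ["二元一次函数", "一次函数的图像"]
uni_vocab = build_vocab(stems, 1)
bi_vocab = build_vocab(stems, 2)

# Merge the unigram and bigram vectors into one input vector for the regressor.
merged = vsm_vector(stems[0], uni_vocab, 1) + vsm_vector(stems[0], bi_vocab, 2)
```

Keeping separate vocabularies for unigrams and bigrams mirrors the patent's choice to train the two RBM models separately and to concatenate their outputs only at prediction time.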

2. Extracting the semantic features of the text may include the following steps:

In this embodiment, the semantic features of the text can be obtained by methods such as the semantic differential method or grammar network analysis, for example with a semantic differential scale. Alternatively, a directed-graph grammar network library can be built from sentence grammar rules set by the user and used to obtain the semantic features; no limitation is imposed here. Note that the literal features and the semantic features of the text can be extracted at the same time or one after the other, in no particular order.

(2) For formulas:

a) Determine the grammar of mathematical formulas and build a probabilistic context-free grammar (PCFG) model of mathematical formulas. The model can be a statistical model, and the training algorithm can be, for example, the expectation-maximization (EM) algorithm;

b) Parse the formula with the PCFG model to obtain the syntax tree of the formula's characters. For example, the syntax tree obtained by parsing the formula string "f'(x, y)=a*\sqrt{x}+\frac{1}{2}*x*y" is shown in Fig. 2;

c) Traverse the syntax tree to obtain the semantic features of the formula.

Note that parsing a formula string often admits several grammar reduction paths, so several syntax trees can be obtained. When the parse yields several syntax trees, the tree with the highest probability is selected as the final parse result, where the probability of a syntax tree is the product of the probabilities of the grammar rules used in it. The semantic features of formulas can also be extracted with existing techniques, for example semantic parsing by regular-expression matching; no limitation is imposed here.
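The tree selection rule and the traversal step can be sketched as follows. The grammar, its rule probabilities, and the two candidate trees for the string `f(x)` are hypothetical; the patent does not list its grammar or say which semantic features the traversal collects, so label counts and tree depth are used here purely as illustrative features. Producing the candidate trees (the parsing itself) is assumed to be done elsewhere.

```python
from collections import Counter

# Hypothetical rule probabilities for a toy formula grammar (not from the patent).
RULE_PROB = {
    ("EXPR", ("FUNC", "(", "EXPR", ")")): 0.2,  # function application
    ("EXPR", ("EXPR", "EXPR")): 0.1,            # implicit multiplication
    ("EXPR", ("(", "EXPR", ")")): 0.2,
    ("EXPR", ("VAR",)): 0.5,
    ("FUNC", ("f",)): 1.0,
    ("VAR", ("f",)): 0.3,
    ("VAR", ("x",)): 0.7,
}


def rules_used(tree):
    """Yield every (lhs, rhs) rule of a tree written as (label, [children])."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules_used(child)


def tree_probability(tree):
    """Probability of a tree: the product of the probabilities of its rules."""
    p = 1.0
    for rule in rules_used(tree):
        p *= RULE_PROB[rule]
    return p


def best_tree(candidates):
    """Pick the most probable tree among the candidate parses."""
    return max(candidates, key=tree_probability)


def tree_features(tree):
    """Traverse a tree, counting nonterminal labels and measuring its depth."""
    label, children = tree
    counts, depth = Counter({label: 1}), 1
    for child in children:
        if not isinstance(child, str):
            sub_counts, sub_depth = tree_features(child)
            counts += sub_counts
            depth = max(depth, 1 + sub_depth)
    return counts, depth


# Two candidate parses of "f(x)": f applied to x, or f multiplied by (x).
APPLY = ("EXPR", [("FUNC", ["f"]), "(", ("EXPR", [("VAR", ["x"])]), ")"])
TIMES = ("EXPR", [("EXPR", [("VAR", ["f"])]),
                  ("EXPR", ["(", ("EXPR", [("VAR", ["x"])]), ")"])])

chosen = best_tree([APPLY, TIMES])
label_counts, depth = tree_features(chosen)
```

Under these assumed probabilities the function-application reading scores 0.2 × 1.0 × 0.5 × 0.7 = 0.07 and the implicit-multiplication reading scores 0.00105, so the former is chosen.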

Then, the attribute features of the test item to be analyzed are extracted. The attribute features can of course also be extracted before the item features. Among the attribute features:

Question-stem length is the length of the item's question stem, for example the number of characters and/or words in the stem after word segmentation, the complexity of the syntax tree, and so on;

Solution length is the length of the item's solution;

Answer length is the length of the item's answer;

Position of the item in the test paper indicates which question of its question type the item is within a test paper. Because questions within one question type may appear in order of increasing difficulty, the position can be quantified and normalized to the range 0 to 1 as the item's ordinal number within its question type divided by the total number of questions of that type in the paper. For example, if a paper has 5 fill-in-the-blank questions and the item is the 2nd, its position in the paper is 2/5.

Item category is the kind of setting in which the item appears, such as college entrance examination mock papers, past college entrance examination papers, competition topics, or exercises synchronized with coursework;

Question type is the type of the question, such as multiple choice, fill in the blank, or worked problem;

Reputation of the source school indicates whether the item's source school is among the national top 100, a provincial key school, a city key school, and so on. The reputation of a school is obtained as follows:

a) Search online for lists of elite schools;

b) Filter all entries of Baidu Baike for elite schools to obtain well-known schools;

c) Build a table mapping well-known schools to reputation from the elite-school lists and the well-known schools;

d) Perform a fuzzy search for the item's source school in that table to find its reputation; if it is not found, the source school is not a well-known school;

Number of knowledge points is the number of knowledge points the item covers;

Knowledge points covered are all the knowledge points the item covers.
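Several of the attribute features above reduce to simple computations, sketched below. The dictionary keys, the reputation table, and the similarity cutoff are hypothetical, and the fuzzy search is approximated with the standard-library `difflib.get_close_matches`, which is one possible choice rather than the method named in the patent.

```python
import difflib


def position_in_paper(ordinal, total_of_type):
    """Ordinal within the question type divided by the count of that type (0 to 1]."""
    if not 1 <= ordinal <= total_of_type:
        raise ValueError("ordinal must be between 1 and total_of_type")
    return ordinal / total_of_type


def school_reputation(school, table, cutoff=0.8):
    """Fuzzy lookup in the school-to-reputation table; unknown if nothing is close."""
    matches = difflib.get_close_matches(school, list(table), n=1, cutoff=cutoff)
    return table[matches[0]] if matches else "not well known"


def attribute_features(item, table):
    """Collect the attribute features of one item into a flat dictionary."""
    return {
        "stem_length": len(item["stem"]),
        "solution_length": len(item["solution"]),
        "answer_length": len(item["answer"]),
        "category": item["category"],
        "question_type": item["question_type"],
        "grade": item["grade"],
        "school_reputation": school_reputation(item["school"], table),
        "num_knowledge_points": len(item["knowledge_points"]),
        "knowledge_points": sorted(item["knowledge_points"]),
        "position": position_in_paper(item["ordinal"], item["total_of_type"]),
    }


# Hypothetical reputation table and item record.
TABLE = {
    "Example No. 1 High School": "national top 100",
    "Sample City Key School": "city key school",
}
item = {
    "stem": "Solve x + y = 3 and x - y = 1.",
    "solution": "Add the two equations to get 2x = 4.",
    "answer": "x = 2, y = 1",
    "category": "exercise",
    "question_type": "fill in the blank",
    "grade": 8,
    "school": "Example No.1 High School",  # spelled slightly differently on purpose
    "knowledge_points": {"linear equations", "elimination"},
    "ordinal": 2,
    "total_of_type": 5,
}
feats = attribute_features(item, TABLE)
```

Categorical values such as question type would still need to be encoded numerically, for instance one-hot, before they can be passed to a regression model.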

In other embodiments, analysis conditions can also be preset according to the subject to which the test item belongs. For example, for mathematics, whether answering requires auxiliary lines, knowledge of higher mathematics, and so on; for physics, whether answering requires multiple reference objects, reference frames, and so on.

Note that in practical applications the test items to be analyzed can also be classified, for example by grade: senior high school difficulty, including first-year, second-year, and college entrance examination difficulty; the corresponding junior high school difficulty; and so on. They can also be classified by proficiency in a field: entry level, improvement level, proficient level, and so on. This can further improve the accuracy of the item difficulty score.

Step S03: obtain the difficulty score of the examination question to be analyzed according to the topic feature of the examination question to be analyzed and the item difficulty forecast model built in advance.

In the present embodiment, building the item difficulty forecast model in advance includes: collecting a training corpus for building the item difficulty forecast model; extracting the topic feature of the training corpus; and training the item difficulty forecast model based on the topic feature of the training corpus.

In particular, when the method provided by the present invention also extracts the attribute feature of the examination question to be analyzed, obtaining the difficulty score of the examination question to be analyzed according to its topic feature and the item difficulty forecast model built in advance includes: obtaining the difficulty score of the examination question to be analyzed according to its topic feature, its attribute feature and the item difficulty forecast model built in advance. In this case, building the item difficulty forecast model in advance includes: collecting a training corpus for building the item difficulty forecast model; extracting the topic feature and attribute feature of the training corpus; and training the item difficulty forecast model based on the topic feature and attribute feature of the training corpus.

In practical applications, the item difficulty forecast model is a regression model. Specifically, the item difficulty forecast model is built in advance and trained in a regression-based manner; its input is a feature vector, and its output is the difficulty score of the examination question to be analyzed.

In a specific embodiment, building the item difficulty forecast model in advance includes:

1. Collect the training corpus for building the item difficulty forecast model

A specified number of corpus entries containing test question information, for example 160,000 entries, are collected from the question bank as the training corpus of the difficulty forecast model. The test question information includes the topic face (the question text), the parsing (the worked solution), the answer and the examination question attributes. The training corpus can be collected in several ways: entries that themselves contain difficulty information, and entries that contain comments on the examination question in which wording related to difficulty appears, such as "this question is rather difficult" or "this question is of average difficulty". Such wording is mapped to the corresponding difficulty, and the entry is then used as training corpus.
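The labeling rule described above (use the difficulty an entry already carries, otherwise map difficulty-related wording in its comments to a difficulty) can be sketched as follows. The phrase table, the numeric difficulty values and the dictionary keys `difficulty` and `comment` are assumptions made for illustration; the text does not specify them.

```python
# Hypothetical mapping from comment wording to a difficulty value in [0, 1].
# The first two phrases follow the examples in the text; the numeric values
# and the third phrase are assumed.
WORDING_TO_DIFFICULTY = {
    "rather difficult": 0.8,
    "average difficulty": 0.5,
    "relatively easy": 0.2,
}


def label_from_comment(comment):
    """Return the difficulty implied by wording in a comment, or None."""
    text = comment.lower()
    for phrase, difficulty in WORDING_TO_DIFFICULTY.items():
        if phrase in text:
            return difficulty
    return None


def build_training_corpus(entries):
    """Keep only entries for which a difficulty label can be obtained.

    An explicit "difficulty" field takes precedence; otherwise the label is
    derived from the wording of the "comment" field.
    """
    corpus = []
    for entry in entries:
        label = entry.get("difficulty")
        if label is None and entry.get("comment"):
            label = label_from_comment(entry["comment"])
        if label is not None:
            corpus.append((entry, label))
    return corpus


entries = [
    {"id": 1, "difficulty": 0.7},
    {"id": 2, "comment": "This question is rather difficult."},
    {"id": 3, "comment": "A standard exercise."},  # no usable wording: skipped
]
for entry, label in build_training_corpus(entries):
    print(entry["id"], label)
```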

2. Extract the topic feature and attribute feature of the training corpus

The topic feature includes the topic region feature, the parsing feature and the answer feature, and all three are extracted in the same way. The topic region feature, the parsing feature and the answer feature each include: the semantic feature of formulas, the literal feature of text, and the semantic feature of text. For details, refer to the feature extraction part of step S02, which is not repeated here.

3. Train the item difficulty forecast model based on the attribute feature and topic feature of the training corpus

Training is regression-based, and either a linear regression model or a nonlinear regression model can be used. The present embodiment finally uses a ridge regression model. Its input is a feature vector comprising the topic feature and attribute feature of the examination question, 5K dimensions in total; its output is the difficulty value. Model training is carried out with the ridge regression algorithm.
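As a rough illustration of the ridge regression step, the sketch below fits the weights by solving the regularized normal equations (X^T X + alpha*I) w = X^T y, which minimize the squared prediction error plus alpha times the squared norm of the weights. It is a toy under stated assumptions: the three-dimensional feature vectors, the difficulty labels and the value of `alpha` are invented, a constant 1 is appended to each vector as a bias term, and a plain Gaussian elimination is used where a production system with 5K-dimensional inputs would rely on a numerical library.

```python
def ridge_fit(X, y, alpha=1.0):
    """Solve (X^T X + alpha*I) w = X^T y for the weight vector w."""
    n = len(X[0])
    # Normal-equation matrix A = X^T X + alpha*I and right-hand side b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / A[i][i]
    return w


def ridge_predict(w, x):
    """Difficulty value predicted for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Invented toy data: [formula complexity, normalized position, bias term].
X_train = [[0.9, 0.8, 1.0], [0.2, 0.2, 1.0], [0.5, 0.6, 1.0], [0.1, 0.4, 1.0]]
y_train = [0.85, 0.25, 0.55, 0.30]

weights = ridge_fit(X_train, y_train, alpha=0.1)
print(ridge_predict(weights, [0.7, 0.5, 1.0]))
```

A larger `alpha` shrinks the weights more strongly, which can help when the feature dimension is large relative to the amount of labeled corpus.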

Finally, the difficulty score of the examination question to be analyzed is obtained according to its topic feature, its attribute feature and the item difficulty forecast model built in advance. For example, consider a college entrance examination simulation question obtained from the Huanggang Mijuan (《黄冈密卷》) papers. The question comprises three parts: topic face, parsing and answer. The semantic features of the formulas it uses are relatively complex (for example, the syntax tree is relatively complex), and the parsing contains keywords such as "this question is rather difficult". The topic feature and attribute feature of the examination question to be analyzed are extracted and then input into the item difficulty forecast model trained in advance, which yields the difficulty score of the examination question to be analyzed. If no student has yet attempted the question, the prior art can only have experts label its difficulty manually, and labeling by experts alone is too subjective. The method provided by the present invention can instead analyze the difficulty of the examination question to be analyzed through its topic feature (including the semantic feature of formulas and any one or more of the following: the literal feature of text and the semantic feature of text) and its attribute feature (features such as whether the question comes from an elite school, the parsing length and the grade, which indicate the approximate difficulty range of the question), and can therefore effectively solve the above problem.

The item difficulty analysis method provided by the embodiment of the present invention extracts the topic feature of the examination question to be analyzed, and then obtains the difficulty score of the examination question to be analyzed according to that topic feature and the item difficulty forecast model built in advance. Because the method predicts the difficulty score from the topic feature of the examination question to be analyzed, it can evaluate item difficulty in the cold-data case, that is, for questions that no student has attempted.

Correspondingly, an embodiment of the present invention also provides an item difficulty analysis system. Fig. 3 is a structural schematic diagram of the system.

In this embodiment, the system includes：

Acquisition module 301, for obtaining examination question to be analyzed；

Topic feature extraction module 302, for extracting the topic feature of the examination question to be analyzed, the topic feature including the topic region feature, the parsing feature and the answer feature;

Difficulty prediction module 303, for obtaining the difficulty score of the examination question to be analyzed according to the topic feature of the examination question to be analyzed and the item difficulty forecast model built in advance.

In the present embodiment, the topic region feature, the parsing feature and the answer feature each include: the semantic feature of formulas and any one or more of the following: the literal feature of text and the semantic feature of text.

Specifically, the topic feature extraction module 302 can include:

It should be noted that, to improve the accuracy of item difficulty prediction, the literal feature of the text also needs to be extracted. Correspondingly, the topic feature extraction module 302 can also include:

Of course, the topic feature extraction module 302 can also include a unit for acquiring the semantic feature of the text, which is not described in detail here.

Further, to improve the accuracy of the literal feature of the text acquired by the system and to reduce the amount of computation in subsequent feature processing, the system can also include:

Feature optimization module 404, for performing feature optimization on the literal features of the text of the topic face, parsing and answer after the topic feature extraction module 302 has acquired them, to obtain the optimized literal features of the text of the topic face, parsing and answer.

In addition, to further improve the accuracy of the item difficulty analyzed by the system, the system also includes:

Attribute feature extraction module 505, for extracting the attribute feature of the examination question to be analyzed, the attribute feature of the examination question to be analyzed including any one or more of the following: topic face length, parsing length, answer length, examination question type, examination question format, grade, examination question source school popularity, number of knowledge points included in the examination question, knowledge points included in the examination question, and position of the examination question in the paper;

The difficulty prediction module 303 is specifically used for obtaining the difficulty score of the examination question to be analyzed according to the topic feature and attribute feature of the examination question to be analyzed and the item difficulty forecast model built in advance.

In practical applications, the system also includes modules for building and training the model:

First pre-modeling module 606, for building the item difficulty forecast model in advance, including:

First extraction unit, for extracting the topic feature of the training corpus;

In particular, when the system also includes the attribute feature extraction module 505, the system also includes:

Second pre-modeling module 607, for building the item difficulty forecast model in advance, including:

It should be noted that the system provided by the present invention can include both the topic feature extraction module 302 and the attribute feature extraction module 505, and that the two modules may run simultaneously or only one of them may run. For example, when the examination question to be analyzed has no attribute-related information, only the topic feature and the item difficulty forecast model built in advance from topic features are used. The system provided by the present invention can switch automatically between the topic feature extraction module 302 and the attribute feature extraction module 505 to extract whichever features the examination question to be analyzed has; the difficulty prediction module 303 then selects the suitable item difficulty forecast model according to the available features to obtain the difficulty score of the examination question to be analyzed.
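The switching behaviour described above can be pictured as choosing between two pre-built models according to which features are available. The sketch below is a hypothetical illustration only: the function names are invented, and the two models are represented as plain callables with made-up weights rather than the trained forecast models of the embodiment.

```python
def select_model(topic_features, attribute_features,
                 topic_only_model, topic_and_attribute_model):
    """Pick the forecast model that matches the available features.

    Returns the chosen model together with the feature vector to feed it.
    """
    if attribute_features:
        # Attribute information exists: use the model trained on both kinds.
        return topic_and_attribute_model, list(topic_features) + list(attribute_features)
    # No attribute information: fall back to the topic-feature-only model.
    return topic_only_model, list(topic_features)


def predict_difficulty(topic_features, attribute_features,
                       topic_only_model, topic_and_attribute_model):
    """Difficulty score from whichever model suits the available features."""
    model, vector = select_model(topic_features, attribute_features,
                                 topic_only_model, topic_and_attribute_model)
    return model(vector)


# Stand-in models with invented weights, used only to exercise the selection.
def topic_only(vector):
    return 0.5 * vector[0] + 0.5 * vector[1]


def topic_and_attribute(vector):
    return 0.25 * vector[0] + 0.25 * vector[1] + 0.5 * vector[2]


print(predict_difficulty([0.5, 0.25], [], topic_only, topic_and_attribute))
print(predict_difficulty([0.5, 0.25], [1.0], topic_only, topic_and_attribute))
```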

Of course, the system can further include a storage module (not shown) for storing relevant information such as the item difficulty forecast model built in advance, model parameters, corpus, topic features and attribute features, for example elite school information. This facilitates automatic computer processing of the examination question to be analyzed.

The item difficulty analysis system provided by the embodiment of the present invention first obtains the examination question to be analyzed through the acquisition module 301, then extracts the topic feature of the examination question to be analyzed through the topic feature extraction module 302, and then, through the difficulty prediction module 303, obtains the difficulty score of the examination question to be analyzed according to its topic feature and the item difficulty forecast model built in advance. Because the difficulty prediction module 303 obtains the difficulty score from the topic feature of the examination question to be analyzed, the system provided by the present invention can evaluate item difficulty in the cold-data case, that is, for questions that no student has attempted.

The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly, and for the relevant parts reference can be made to the description of the method embodiment. The system embodiment described above is only schematic: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.

The embodiments of the present invention have been described in detail above, and specific embodiments have been used herein to illustrate the present invention. The description of the above embodiments is only intended to help understand the method and system of the present invention. Meanwhile, for a person of ordinary skill in the art, changes may be made to the specific embodiments and the scope of application in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. An item difficulty analysis method, characterized by comprising:

Obtain examination question to be analyzed；

2. The method according to claim 1, characterized in that the topic region feature, the parsing feature and the answer feature each include: the semantic feature of formulas and any one or more of the following: the literal feature of text and the semantic feature of text.

3. The method according to claim 2, characterized in that obtaining the semantic feature of the formula includes the steps of:

4. The method according to claim 2, characterized in that obtaining the literal feature of the text includes the steps of:

5. The method according to claim 4, characterized in that the method also includes:

6. The method according to any one of claims 1 to 5, characterized in that the method also includes:

7. The method according to any one of claims 1 to 5, characterized in that building the item difficulty forecast model in advance includes:

Collect the training corpus for building item difficulty forecast model；

Extract the topic feature of the training corpus；

8. The method according to claim 6, characterized in that building the item difficulty forecast model in advance includes:

Collect the training corpus for building the item difficulty forecast model;

Extract the topic feature and attribute feature of the training corpus;

9. The method according to any one of claims 1 to 5 and 8, characterized in that the item difficulty forecast model is a regression model.

10. An item difficulty analysis system, characterized by comprising:

Acquisition module, for obtaining examination question to be analyzed；

11. The system according to claim 10, characterized in that the topic region feature, the parsing feature and the answer feature each include: the semantic feature of formulas and any one or more of the following: the literal feature of text and the semantic feature of text.

12. The system according to claim 11, characterized in that the topic feature extraction module includes:

13. The system according to claim 11, characterized in that the topic feature extraction module also includes:

14. The system according to claim 13, characterized in that the system also includes:

15. The system according to any one of claims 10 to 14, characterized in that the system also includes:

16. The system according to any one of claims 10 to 14, characterized in that the system also includes:

First extraction unit, for extracting the topic feature of the training corpus;

17. system according to claim 15, it is characterised in that the system also includes：