CN109255028B

CN109255028B - Teaching quality comprehensive evaluation method based on teaching evaluation data credibility

Info

Publication number: CN109255028B
Application number: CN201810990031.4A
Authority: CN
Inventors: 田锋; 王媛媛; 吴凡; 陈妍; 杨子晨; 籍伟华; 郑庆华
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2018-08-28
Filing date: 2018-08-28
Publication date: 2021-08-13
Anticipated expiration: 2038-08-28
Also published as: CN109255028A

Abstract

The invention discloses a comprehensive teaching quality evaluation method based on the credibility of teaching evaluation data. The method extracts features from multi-source evaluation data, historical evaluation data and evaluation subject behavior data, uses a machine learning method to analyze the credibility of each single evaluation record, adaptively adjusts the teaching quality evaluation index weights based on course type and evaluation subject attention, and then performs a multi-index fused comprehensive evaluation of course teaching quality based on these two results. The method improves the authenticity and effectiveness of teaching evaluation and provides support for improving teaching quality.

Description

Teaching quality comprehensive evaluation method based on teaching evaluation data credibility

Technical Field

The invention belongs to the field of comprehensive teaching quality evaluation, and particularly relates to a comprehensive teaching quality evaluation method based on the credibility of teaching evaluation data.

Background

Teaching evaluation plays an important role in improving the teaching quality of colleges and universities, building the teaching staff and cultivating talent, and is an indispensable component of a university's teaching quality assurance system. The evaluation subjects for course quality in colleges and universities are generally students, supervisors and peers. At present, colleges and universities generally evaluate a course by combining multi-level (two-level) index scoring with text comments. However, course scores are calculated mainly as a weighted combination of index scores, while the text comments are only displayed with simple statistics and are not fully mined.

Current teaching quality evaluation assumes that the evaluation content is true and reliable. In reality, however, evaluation subjects exhibit various behavioral biases during evaluation, which distort the evaluation information so that it cannot truly reflect a teacher's teaching level and effectiveness. Students are mostly passive and perfunctory when evaluating, and undesirable behaviors such as negligent random evaluation, self-interested evaluation and compromise high-score evaluation exist; students do not actively participate in the evaluation activity, so that it gradually becomes a formality. In peer and supervisor evaluation, interpersonal factors likewise reduce the credibility of the evaluations. Therefore, an evaluation mode premised on all evaluations being true and reliable is ill-suited to the current complex evaluation environment.

Meanwhile, traditional teaching evaluation assumes that different subjects place the same weight on every evaluation index dimension. In practice, because evaluation subjects differ in role and professional level, they focus on different aspects when evaluating a course. For example, supervisors and peers pay more attention to the students' head-up rate than students do, while students pay more attention to the amount of homework than supervisors do. Existing methods do not consider the influence of these differing degrees of attention on the evaluation results.

Disclosure of Invention

The invention aims to overcome the above defects and provides a comprehensive teaching quality evaluation method based on the credibility of teaching evaluation data. The method improves the authenticity and effectiveness of teaching evaluation and provides support for improving teaching quality.

In order to achieve the above object, the present invention comprises the steps of:

Step one, classifying the credibility grade of single teaching evaluation data: first, the corresponding features are extracted from newly acquired teaching evaluation data and taken as input; then, the established credibility grade classification model is called according to the evaluation subject to obtain the credibility grade C_instances of the evaluation data;

Step two, based on the multi-level teaching evaluation index system, for each type of course, adaptively adjusting the original evaluation index weights of the different subjects according to the respective attention degrees of the different evaluation subjects;

Step three, calculating the final classroom teaching quality score of each course from the evaluation credibility grades and the index weights adaptively adjusted based on course type and evaluation subject attention, arranging all course scores in descending order, and selecting the top L% as excellent courses.
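The ranking at the end of step three can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_excellent`, the sample scores and the choice to round the cut-off up (so that at least one course is always selected) are assumptions.

```python
import math


def select_excellent(course_scores, top_percent):
    """Return the course ids ranked in the top `top_percent` percent by score."""
    # Sort courses by final teaching quality score, best first.
    ranked = sorted(course_scores.items(), key=lambda kv: kv[1], reverse=True)
    # Assumption: round the cut-off up so at least one course is kept.
    n_keep = max(1, math.ceil(len(ranked) * top_percent / 100))
    return [course for course, _ in ranked[:n_keep]]


# Hypothetical final scores for four courses.
scores = {"calculus": 0.91, "physics": 0.78, "history": 0.85, "chemistry": 0.66}
excellent = select_excellent(scores, top_percent=50)
```

Here `top_percent` plays the role of L in the text.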

In the first step, a specific method for constructing a single teaching evaluation credibility grade classification model is as follows:

First, a single teaching evaluation comprises index scores and text content; each evaluation record targets a single class session or a course, and the evaluation sources comprise student evaluation, supervisor evaluation and peer evaluation. Features are extracted from the comment text and index scores of the single teaching evaluation, and from the evaluation behavior of the evaluator;

Second, a training data set is constructed and the credibility grades of the training data are labeled; credibility is divided into four grades ranging from credible to not credible, and the training data are labeled by combining rules with manual annotation. Then, using the features extracted in the first step, a random forest method is used to train evaluation credibility classification models on the labeled multi-source evaluation data, yielding a student evaluation credibility classification model and a supervisor-and-peer evaluation credibility classification model.

In the first step, the specific method for extracting the features is as follows:

1) Extracting comment text features F_text = (len, num_sub, same_text, correlation, repetition_rate, emo_intensity, emo_polarity): features are extracted from each comment text to describe the statistical regularities and main content of the comment text. The specifically extracted features comprise:

Comment text length feature len: the total number of characters appearing in each comment text;

Number of topics appearing in the comment text num_sub: comment topic words are extracted from the evaluation text, and the number of distinct topic words is counted;

Same comment text number same_text: the number of comments, among all comment texts, whose content is identical to that of this comment;

Comment text self-repetition rate repetition_rate: the ratio of the number of repeated characters in the comment text to the total number of characters in the comment text;

Correlation between comment text and evaluation indexes, correlation: the degree to which the words appearing in the comment text coincide with the wording of the evaluation indexes;

Comment emotion intensity emo_intensity: the intensity of the positive, negative or neutral emotional polarity expressed by the comment text;

Comment emotion polarity emo_polarity: the emotional tendency of the comment text, which is positive, negative or neutral;

2) Extracting evaluation scoring features F_index = (attitude, content, standard, method, effect): features are extracted from each evaluation score in the multi-source evaluation data set; all primary indexes are extracted as feature items according to the course evaluation index scale, and the value of each feature item is converted from the evaluation grade given on that index;

3) Extracting evaluator behavior features: features are extracted from the behavior of the evaluator corresponding to a single teaching evaluation, covering the evaluator's personal state, behavior pattern and all historical evaluation behavior before the current semester. The evaluation behavior features comprise F_action = (hist_word, hist_polarity, hist_intensity, hist_score, hist_variance, hist_consistency, duplicate_rate, relationship):

Historical average comment word count hist_word: the ratio of the total word count of a single evaluator's historical course comment texts to the number of that evaluator's comments;

Historical comment emotion polarity mean hist_polarity: reflects the emotional polarity tendency of the evaluator's historical comments;

Historical comment emotion intensity mean hist_intensity: reflects the emotional intensity tendency of the evaluator's historical comments;

Historical score mean hist_score: reflects the evaluator's historical scoring tendency;

Historical score variance hist_variance: reflects the fluctuation of the evaluator's historical scores;

Historical evaluation consistency hist_consistency: reflects the degree of difference between the evaluator's scores and the average of all evaluators' scores;

Repetition rate of comment content across all courses duplicate_rate: among a single evaluator's evaluations of all courses within the semester, the number of evaluations with identical content divided by the total number of evaluations;

Historical familiarity between the evaluator and the evaluated teacher, relationship: the familiarity between the two persons, obtained from the historical contact records of the evaluator and the evaluated teacher, which reflects the objectivity of the evaluation;

Additional evaluation behavior features are extracted for students to assist in judging the credibility of student evaluations. The extracted features comprise F_student = (max_num, all_num, avg_num, attendance_rate, grade, score, major, span):

Maximum number of evaluations for a single course max_num: the maximum number of times the evaluator evaluated any one course during the teaching period of the semester;

Total number of comments all_num: the total number of comments the evaluator made on all selected courses during the teaching period of the semester;

Average number of evaluations per course avg_num: the evaluator's total number of comments in the semester divided by the total number of selected courses;

Student attendance rate attendance_rate: the evaluator's attendance rate in the evaluated course during its teaching period;

Student grade grade: the year (grade) of the evaluator;

Student score band score: describes the evaluator's academic performance in the evaluated course;

Consistency between student major and course, major: whether the college of the evaluator's major is the same as the college offering the evaluated course;

Course selection span span: the span of the courses selected by the evaluator in the semester, expressed as the number of selected courses whose offering unit is not the evaluator's own college divided by the total number of courses selected in the semester.
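The count-based student features above (max_num, all_num, avg_num and span) can be sketched in a few lines. This is a hedged illustration only: the function name, the input layout (one list entry per comment, a mapping from selected course to offering college) and the sample data are assumptions, and attendance_rate, grade, score and major are omitted because they come directly from student records.

```python
from collections import Counter


def student_behavior_features(evaluated_courses, selected_courses, own_college):
    """
    evaluated_courses: list of course ids, one entry per comment made this semester.
    selected_courses: dict of course id -> offering college, for every selected course.
    own_college: the college of the student's own major.
    """
    per_course = Counter(evaluated_courses)
    all_num = len(evaluated_courses)                 # total number of comments
    max_num = max(per_course.values(), default=0)    # most comments on one course
    n_selected = len(selected_courses)
    avg_num = all_num / n_selected if n_selected else 0.0
    # Courses offered by a college other than the student's own.
    outside = sum(1 for college in selected_courses.values() if college != own_college)
    span = outside / n_selected if n_selected else 0.0
    return {"max_num": max_num, "all_num": all_num, "avg_num": avg_num, "span": span}


# Hypothetical student from the "Science" college.
evaluated = ["math", "math", "math", "art", "physics"]
selected = {"math": "Science", "physics": "Science", "art": "Humanities", "law": "Law"}
features = student_behavior_features(evaluated, selected, own_college="Science")
```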

In the second step, the specific method for adaptively adjusting the original evaluation index weights of the different subjects is as follows: first, based on semantic similarity, the evaluation topic words are mapped many-to-one onto secondary index core words, and word frequency statistics are computed on the secondary index core words; then the secondary index core words are mapped one-to-one onto the primary evaluation indexes, and the top k secondary index core words corresponding to each primary index are selected to calculate the attention of the different subjects to each primary index; finally, the weights of the different evaluation subjects on each evaluation index of that course type are adaptively adjusted according to the attention.

The specific method for adaptively adjusting, through the attention, the weights of the different evaluation subjects on each evaluation index of that course type is as follows:

First, a short-text topic extraction model is called to extract the topic words contained in each comment text; all topic words are then mapped many-to-one onto the constructed dictionary D_index according to the semantic similarity of the words, finally forming a data set {W_stu, W_sup, W_exp} whose content is secondary index core words and whose label is the evaluation subject category. Here W_stu is the secondary index core words in student evaluations, W_sup is the secondary index core words in supervisor evaluations, and W_exp is the secondary index core words in peer evaluations; stu denotes students, sup denotes supervisors and exp denotes peers. The constructed dictionary is a one-to-one mapping from secondary index core words to primary indexes, and the secondary index core words are multiple core words extracted from each secondary evaluation index;

Second, on the data set constructed in the first step, taking the evaluation subject category as the label, the TFIDF value of each secondary index core word within each subject is calculated by the TFIDF method and recorded as $\mathrm{TFIDF}^{class}_{word}$;

Third, the secondary index core words contained in each subject are mapped to the primary indexes; the secondary index core words under each primary index are then arranged in descending order of TFIDF value, and the top k are taken as that subject's evaluation attention hotspot words on the corresponding primary index and added to the set $\mathrm{topk\_k}$;

Fourth, the TFIDF value of each evaluation subject on each primary index is calculated as the sum of the TFIDF values of the attention hotspot words contained in that primary index:

$$\mathrm{TFIDF}^{class}_{index} = \sum_{word(i) \in \mathrm{topk\_k}} \mathrm{TFIDF}^{class}_{word(i)}, \quad index \in \{attitude, content, ability, method, effect\}, \quad class \in \{stu, sup, exp\}$$

wherein index is a primary evaluation index, attitude refers to teaching attitude, content refers to teaching content, ability refers to teaching ability, method refers to teaching method, effect refers to teaching effect, class refers to the evaluation subject, stu refers to students, sup refers to supervisors, exp refers to peers, topk_k refers to the top k attention hotspot words, and word(i) refers to the i-th attention hotspot word;

The primary index TFIDF values of each subject are then normalized;

Fifth, the TFIDF values of all evaluation subjects on a single primary index are summed, and the TFIDF value of each evaluation subject is divided by this sum to give that subject's attention on the primary index:

$$A^{class}_{index} = \frac{\mathrm{TFIDF}^{class}_{index}}{\sum_{c \in \{stu, sup, exp\}} \mathrm{TFIDF}^{c}_{index}}$$

This yields the attention degrees of the different evaluation subjects on a single primary index;
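The second to fifth steps can be sketched end to end as below. This is an illustrative sketch under stated assumptions rather than the patented computation: the text does not give the exact TFIDF variant, so a smoothed IDF (the same form as scikit-learn's default `smooth_idf`) is assumed; the per-subject normalization is assumed to divide by the subject's sum over all primary indexes; and the core words, the word-to-index dictionary and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_by_subject(core_words):
    """core_words: dict of subject -> list of secondary index core words (with repeats)."""
    n_docs = len(core_words)
    # Document frequency: in how many subjects each core word occurs.
    doc_freq = Counter(w for words in core_words.values() for w in set(words))
    result = {}
    for subject, words in core_words.items():
        counts = Counter(words)
        total = len(words)
        result[subject] = {
            # Assumed smoothed IDF, so words used by every subject keep a non-zero weight.
            w: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, c in counts.items()
        }
    return result


def attention(core_words, word_to_index, k):
    """Return dict of primary index -> {subject: attention}."""
    tfidf = tfidf_by_subject(core_words)
    indexes = sorted(set(word_to_index.values()))
    index_tfidf = {}
    for subject, values in tfidf.items():
        per_index = {}
        for index in indexes:
            # Top-k hotspot words of this subject under this primary index.
            ranked = sorted(
                (v for w, v in values.items() if word_to_index[w] == index),
                reverse=True,
            )
            per_index[index] = sum(ranked[:k])
        # Assumed normalization: divide by the subject's total over all indexes.
        subject_total = sum(per_index.values())
        index_tfidf[subject] = {
            i: (v / subject_total if subject_total else 0.0) for i, v in per_index.items()
        }
    result = {}
    for index in indexes:
        total = sum(index_tfidf[s][index] for s in index_tfidf)
        result[index] = {
            s: (index_tfidf[s][index] / total if total else 0.0) for s in index_tfidf
        }
    return result


# Hypothetical core words per evaluation subject and their primary indexes.
core_words = {
    "stu": ["homework", "homework", "humor", "exam", "homework"],
    "sup": ["head-up rate", "head-up rate", "preparation", "humor"],
    "exp": ["preparation", "syllabus", "syllabus", "head-up rate"],
}
word_to_index = {
    "homework": "content", "exam": "content", "syllabus": "content",
    "humor": "method", "head-up rate": "effect", "preparation": "attitude",
}
att = attention(core_words, word_to_index, k=2)
```

For each primary index, the attention values of the subjects sum to 1, so a subject that never mentions an index (here, students on teaching effect) receives an attention of 0 for it.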

Sixth, the index weight of each subject is adjusted according to the initial subject evaluation weight CW and the calculated subject attention A, which completes the attention-based adaptive adjustment of the subject index weights.
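The exact rule that combines CW and A is not recoverable from this text. Purely as an assumption, one plausible rule is sketched below: each subject's initial weight is scaled by its attention and the results are renormalized to sum to 1 across subjects. The function name `adjust_weights` and the sample values are hypothetical.

```python
def adjust_weights(initial_weights, attention):
    """
    initial_weights: dict of subject -> initial evaluation weight CW on one primary index.
    attention: dict of subject -> attention A on the same primary index.
    Assumed rule (NOT given in the text): scale CW by A, then renormalize to sum to 1.
    """
    raw = {s: initial_weights[s] * attention[s] for s in initial_weights}
    total = sum(raw.values())
    if total == 0:
        # No subject pays attention to this index: keep the initial weights.
        return dict(initial_weights)
    return {s: v / total for s, v in raw.items()}


# Hypothetical initial weights and attention on one primary index.
cw = {"stu": 0.5, "sup": 0.3, "exp": 0.2}
a = {"stu": 0.2, "sup": 0.5, "exp": 0.3}
adjusted = adjust_weights(cw, a)
```

Under this assumed rule, equal attention across subjects leaves the initial weights unchanged, while a subject with higher attention gains weight on that index.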

In the third step, the concrete steps of calculating the final classroom teaching quality score of the course are as follows:

First, for all evaluations of the course on a given evaluation index, the credibility grade C_instances of each evaluation record is taken as a weight, and the index scores S of the evaluators belonging to the same evaluation subject are averaged with these weights to give that subject's credibility-weighted score on the index:

$$G^{class}_{index} = \frac{\sum_{instances \in class} C_{instances} \cdot S^{instances}_{index}}{\sum_{instances \in class} C_{instances}}$$

wherein instances refers to a single teaching evaluation, index refers to a primary evaluation index, attitude refers to teaching attitude, content refers to teaching content, ability refers to teaching ability, method refers to teaching method, effect refers to teaching effect, class refers to the evaluation subject, stu refers to students, sup refers to supervisors, and exp refers to peers;

Second, using the scores G of the different evaluation subjects on each index calculated in the first step, and taking each evaluation subject's single-index weight R as the weight, the final score IS of a single index is calculated as a weighted combination;

Third, the index scores are combined using the weight W_index of each index specified in the original educational administration system; if no weights are set, the index scores are directly averaged. This gives the comprehensive teaching score, which is the final classroom teaching quality score of the course.
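The three scoring stages can be sketched as follows. This is a hedged illustration: it assumes the credibility grade C has already been mapped to a number in [0, 1], and it normalizes by the sum of weights at every stage so the result stays in the score range even if R or W_index do not sum to 1. The function names and sample data are hypothetical.

```python
def credible_score(records):
    """records: list of (credibility C, index score S) for one subject on one index."""
    total_c = sum(c for c, _ in records)
    # Credibility-weighted average G; records with C = 0 contribute nothing.
    return sum(c * s for c, s in records) / total_c if total_c else 0.0


def index_score(subject_scores, subject_weights):
    """Combine per-subject scores G into the index score IS using weights R."""
    total_r = sum(subject_weights[s] for s in subject_scores)
    return sum(subject_weights[s] * g for s, g in subject_scores.items()) / total_r


def course_score(index_scores, index_weights=None):
    """Combine index scores with W_index, or average them when no weights are set."""
    if not index_weights:
        return sum(index_scores.values()) / len(index_scores)
    total_w = sum(index_weights[i] for i in index_scores)
    return sum(index_weights[i] * v for i, v in index_scores.items()) / total_w


# Hypothetical evaluations: index -> subject -> list of (C, S) pairs.
evaluations = {
    "attitude": {"stu": [(1.0, 0.75), (0.0, 1.0)], "sup": [(1.0, 0.5)]},
    "content": {"stu": [(1.0, 1.0)], "sup": [(0.5, 0.5), (0.5, 1.0)]},
}
r = {"attitude": {"stu": 0.5, "sup": 0.5}, "content": {"stu": 0.5, "sup": 0.5}}

index_scores = {
    index: index_score(
        {subject: credible_score(recs) for subject, recs in by_subject.items()},
        r[index],
    )
    for index, by_subject in evaluations.items()
}
final_unweighted = course_score(index_scores)
final_weighted = course_score(index_scores, {"attitude": 1, "content": 3})
```

In this example the student record with credibility 0 is ignored entirely, which is the intended effect of weighting by credibility.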

Compared with the prior art, the method extracts features from multi-source evaluation data, historical evaluation data and evaluation subject behavior data, uses a machine learning method to analyze the credibility of each single evaluation record, adaptively adjusts the teaching quality evaluation index weights based on course type and evaluation subject attention, and then performs a multi-index fused comprehensive evaluation of course teaching quality based on these two results. The method improves the authenticity and effectiveness of teaching evaluation and provides support for improving teaching quality.

Drawings

FIG. 1 is a block diagram of the method for classifying the credibility grade of a single piece of teaching evaluation data according to the present invention;

FIG. 2 is a flow chart of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to fig. 2, the present invention comprises the steps of:

The first step: credibility grade classification of single teaching evaluation data:

(1) Multi-source evaluation data feature extraction: comment text features, evaluation scoring features and evaluation behavior features are extracted from student evaluation data, supervisor evaluation data and peer evaluation data.

(2) Using the labeled data, teaching evaluation credibility grade classification models are established separately for the different evaluation subjects, namely a student evaluation credibility grade classification model and a supervisor-and-peer evaluation credibility grade classification model.

(3) The constructed models are used to classify the credibility grade of new single teaching evaluation data.

The second step: adaptive adjustment of teaching quality evaluation index weights based on course type and evaluation subject attention:

(1) The topic words in the comment texts are extracted and mapped to secondary evaluation index core words according to semantic similarity, forming a new data set with the evaluation subject category as the label and the secondary index core words as the content.

(2) The TFIDF values of all secondary index core words are first calculated, and the words are mapped onto the 5 primary indexes. The sum of the TFIDF values of the top k secondary index core words of each primary index is then taken as the TFIDF value of that primary index, from which the attention of the different subjects to each primary index is calculated. Finally, the weights of the different evaluation subjects on each evaluation index of that course type are adaptively adjusted according to the attention.

The third step: multi-index fused comprehensive evaluation of teaching quality:

For a course, the credibility-weighted scores of the different evaluation subjects on each index are first calculated; the final index scores are then calculated from these scores and the adaptively adjusted index weights of the evaluation subjects for the course; finally, the comprehensive teaching quality score of the course is calculated by weighting the multiple indexes.

In the first step, features are extracted from the teaching evaluation data, and a random forest method is used to train credibility grade classification models on the multi-source teaching evaluation data labeled with credibility grades. Features are extracted from new data, the trained model classifies its credibility grade, and the credibility of the evaluation is output, as shown in fig. 1.

Constructing the single teaching evaluation credibility grade classification model in the first step involves two key steps: feature extraction and training of the evaluation credibility grade classification model.

The calculation process of feature extraction is as follows:

First, evaluation data comprises an index scoring part and a text comment part. After a piece of teaching evaluation data from an evaluation subject is obtained, comment text features are extracted from the evaluation text and index scoring features are extracted from the index scoring data. Then, features of the evaluator's evaluation behavior are extracted, drawing on the evaluator's current-semester evaluation data, historical evaluation data and personal data. Combining these, each evaluation record produced by an evaluator carries three types of features.

(1) Comment text feature extraction

The comment text is the evaluation subject's textual description of the evaluated course during teaching evaluation, and may include overall evaluation, feedback and other content. Traditional teaching evaluation does not make full use of the comment text: in most cases the comments are simply collated and fed back directly to the teacher, ignoring their statistical and semantic information. Nor does it consider the problem of low credibility in comment texts. In the invention, statistical and semantic features of the comment text are extracted as part of the credibility calculation.

Comment text length feature len: perfunctory and false comments tend to be short and carry little descriptive information. The length of the comment text is used to reflect the amount of information in the evaluator's comment: the total number of characters in each comment text is counted and normalized to a value in the interval [0,1] by dividing the comment's character count by the maximum character count among all comments.

Number of topics appearing in the comment text num_sub: the more evaluation topics appear in the comment text, the more detailed the description and the higher the credibility is likely to be. Comment topic words are extracted from the evaluation text and the number of distinct topic words is counted. The topic count is normalized to a value in [0,1] by dividing the number of topics in the comment by the maximum number of topics among all comments.

Same comment text ratio same_text: the number of comments among all comment texts whose content is identical to this comment, divided by the total number of comment texts. This ratio is used to detect copying, pasting and plagiarism of other texts, and lies between 0 and 1.

Comment text self-repetition rate repetition_rate: the ratio of the number of repeated words in the comment text to the comment text length len; it lies between 0 and 1.

Correlation between comment text and evaluation indexes, correlation: a comment should be written in the evaluator's own words and should not coincide exactly with the wording of the scoring indexes as though reciting a fixed standard; when it does, the comment has most likely been copied and pasted, and its credibility may be downgraded. The words in the comment text are compared with the evaluation index wording, the number of consecutive identical characters shared by the comment text and the evaluation indexes is found and divided by the comment text length len, and the result is recorded as the correlation between the comment text and the evaluation indexes; it lies between 0 and 1.

Comment emotion intensity emo_intensity: extremely negative or extremely positive evaluations generally carry more personal likes and dislikes, which compromises objectivity and may reduce the credibility of the comment to some extent, so comment emotion intensity is included as a feature. Using the HowNet Chinese degree adverb dictionary, the emotional intensity of each degree adverb appearing in the comment text (such as 'very', 'quite' or 'a little') is looked up, and the intensities of all degree adverbs are weighted and averaged to give the comment emotion intensity. There are 4 intensity levels, namely mild, moderately strong, strong and extreme; the highest level is 1, the lowest is 0, and the remaining levels are evenly distributed between 0 and 1.

Comment emotion polarity emo_polarity: the emotional tendency expressed by the comment text, which is a positive polarity expressing satisfaction, a negative polarity expressing dissatisfaction, or a neutral polarity expressing no emotion. The emotion words appearing in the comment text are categorized using the 7 emotion categories of the Chinese affective lexicon ontology of Dalian University of Technology, and the most frequent emotional polarity is taken as the polarity of the comment. Positive polarity is recorded as 1, neutral polarity as 0.5 and negative polarity as 0.

In total, 7-dimensional comment text features are extracted: F_text = (len, num_sub, correlation, same_text, repetition_rate, emo_intensity, emo_polarity).
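The purely statistical text features (len, same_text, repetition_rate and correlation) can be sketched without any language resources. This is a minimal sketch under stated assumptions: same_text counts the comment itself, repetition_rate counts every character occurrence beyond the first, and correlation uses the longest run of consecutive shared characters against any single index wording. The function names and sample comments are hypothetical, and the topic and emotion features are omitted because they depend on external models and lexicons.

```python
from collections import Counter


def longest_common_substring(a, b):
    """Length of the longest run of consecutive characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def text_features(comment, all_comments, index_wordings):
    """all_comments is assumed to include `comment` itself."""
    length = len(comment)
    max_len = max((len(c) for c in all_comments), default=0)
    n_same = sum(1 for c in all_comments if c == comment)
    # Assumption: each character occurrence after the first counts as repeated.
    repeated = sum(n - 1 for n in Counter(comment).values())
    shared = max((longest_common_substring(comment, w) for w in index_wordings), default=0)
    return {
        "len": length / max_len if max_len else 0.0,
        "same_text": n_same / len(all_comments) if all_comments else 0.0,
        "repetition_rate": repeated / length if length else 0.0,
        "correlation": shared / length if length else 0.0,
    }


# Hypothetical comments and index wordings.
comments = ["good", "good", "the teacher explains proofs clearly"]
index_wordings = ["explains clearly", "well prepared"]
short = text_features(comments[0], comments, index_wordings)
detailed = text_features(comments[2], comments, index_wordings)
```

In this toy data the duplicated one-word comment gets a high same_text ratio and a low normalized length, the pattern the text associates with perfunctory comments.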

(2) Index scoring feature extraction

The index scoring features are produced together with each comment text record and evaluate the level of a course on several primary indexes. Conventionally, the index ratings are combined with different weights into a grade score, from which the final evaluation score of the course is calculated. In the invention, the weighted index score is not used as a feature; instead the grade given on each primary index is used directly, and all primary indexes are taken as features.

For each evaluation record, all primary indexes are extracted as feature items according to the course evaluation index scale, and the value of each feature item is converted from the evaluation grade given on that index. The evaluation indexes handled by the method are 5 primary indexes: teaching attitude, teaching content, teaching level, teaching method and teaching effect. The index evaluation grades are divided into five levels; the best level is 1, the lowest is 0, and the remaining levels are evenly distributed between 0 and 1.

In total, 5-dimensional evaluation index scoring features are extracted: F_index = (attitude, content, standard, method, effect).
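The even mapping of grades onto [0, 1] can be sketched as below. The grade labels in `GRADES` and the function names are hypothetical; only the five-level scale and the end points 0 and 1 come from the text.

```python
def level_to_value(level, n_levels):
    """Map level 0 (lowest) .. n_levels - 1 (best) evenly onto [0, 1]."""
    if n_levels < 2 or not 0 <= level < n_levels:
        raise ValueError("level out of range")
    return level / (n_levels - 1)


# Hypothetical grade labels, ordered from lowest to best.
GRADES = ["poor", "fair", "average", "good", "excellent"]
INDEXES = ("attitude", "content", "standard", "method", "effect")


def index_features(ratings):
    """ratings: dict of primary index -> grade label; returns the F_index tuple."""
    return tuple(level_to_value(GRADES.index(ratings[i]), len(GRADES)) for i in INDEXES)


ratings = {
    "attitude": "excellent", "content": "good", "standard": "average",
    "method": "fair", "effect": "poor",
}
f_index = index_features(ratings)
```

The same helper covers the four emotion intensity levels described earlier: with `n_levels=4` the levels map to 0, 1/3, 2/3 and 1.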

(3) Evaluation and education behavior feature extraction

When teaching evaluation is carried out, the evaluation subject's personal state, historical evaluation state and behavior pattern all affect the evaluation behavior and thus the credibility of the evaluation, so evaluation behavior features are extracted from these three aspects. The personal state mainly describes personal information, the academic state of students and so on; the historical evaluation state describes the evaluator's evaluation level in past courses; and the behavior pattern describes the behavioral characteristics exhibited during evaluation. Historical evaluation data in the invention refers to course evaluation data generated before the current evaluation semester. Because students can evaluate every class session of a course, the number of evaluations each student makes for each course selected in the semester differs. In addition, students' personal information, including grade, attendance rate and so on, is included in the description of the current-semester evaluation state, so students have more evaluation behavior features than supervisors and peers. The common evaluation behavior features comprise:

Historical average comment word count hist_word: the ratio of the total word count of a single evaluator's historical course comment texts to the number of that evaluator's comments, normalized to a value in [0,1]. The normalization takes the largest average word count among all evaluators and divides the single evaluator's historical average comment word count by this maximum;

Historical comment emotion polarity mean hist_polarity: the sum of the emotion polarities of a single evaluator's historical comment texts divided by the total number of comments, a value in [0,1];

Historical comment emotion intensity mean hist_intensity: the sum of the emotion intensities of a single evaluator's historical comment texts divided by the total number of comments, a value in [0,1];

Historical score mean hist_score: the score of each course in the historical evaluations is first normalized to a value in [0,1]; the scores of all courses are then summed and divided by the total number of courses to give the historical score mean;

Historical score variance hist_variance: the score of each course in the historical evaluations is first normalized to a value in [0,1]; the evaluator's historical score mean $\bar{S}$ is then subtracted from each course score $S_i$, the differences are squared and summed, and the sum is divided by the total number n of courses to obtain the historical score variance:

$$\mathrm{hist\_variance} = \frac{1}{n}\sum_{i=1}^{n}\left(S_i - \bar{S}\right)^2$$

Historical evaluation consistency hist_consistency: reflects the degree of difference between the evaluator and all other evaluators. For each course in the historical evaluations, the course score is first normalized to [0,1] and the difference between the evaluator's course score $S_i$ and the average score of that course is calculated; the score differences of all courses are then summed and divided by the total number n of courses to obtain the evaluator's historical evaluation consistency. The value lies in [0,1];

Comment content repetition rate over all courses duplicate_rate: among all course evaluations made during the course period of the semester, the number of evaluations with identical content divided by the total number of evaluations.

Historical familiarity between evaluator and evaluated teacher relationship: the familiarity between the two is obtained from the historical contact records of the evaluator and the evaluated teacher, and is classified as unfamiliar, relatively familiar or quite familiar. Specifically, a student who has never selected a course of the evaluated teacher is unfamiliar, one who has selected a course once is relatively familiar, and one who has selected courses more than once is quite familiar. A teacher who has never attended a class of the evaluated teacher is unfamiliar, one who has attended once is relatively familiar, and one who has attended more than once is quite familiar. Quite familiar is recorded as 1, relatively familiar as 0.5 and unfamiliar as 0.

The evaluation behavior features extracted for supervisors and peers are 8-dimensional: F_action = (hist_word, hist_polarity, hist_intensity, hist_score, hist_variance, hist_consistency, duplicate_rate, relationship).
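The three score-based history features can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `history_score_features` and the input layout are hypothetical, and the absolute difference is used for `hist_consistency` so that the result stays in [0,1], which the text implies but does not state.

```python
from statistics import fmean


def history_score_features(scores, course_means):
    """Return (hist_score, hist_variance, hist_consistency) for one evaluator.

    scores[i]       -- the evaluator's score S_i for course i, already in [0, 1]
    course_means[i] -- average score of course i over all evaluators, in [0, 1]
    """
    n = len(scores)
    if n == 0 or n != len(course_means):
        raise ValueError("scores and course_means must be non-empty and aligned")
    hist_score = fmean(scores)
    # Squared deviations from the evaluator's own mean, divided by n.
    hist_variance = sum((s - hist_score) ** 2 for s in scores) / n
    # Assumption: absolute differences, so the value stays in [0, 1].
    hist_consistency = sum(abs(s - m) for s, m in zip(scores, course_means)) / n
    return hist_score, hist_variance, hist_consistency


mean, var, cons = history_score_features([0.8, 0.6, 1.0], [0.7, 0.7, 0.9])
```

A small `hist_consistency` means the evaluator's scores stay close to the course averages; a large `hist_variance` means the evaluator discriminates strongly between courses.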

In practice, a student can evaluate a course multiple times during the course period, which compensates for the one-sidedness of a single end-of-term evaluation. In addition to the common features above, the following features are additionally extracted for student evaluation subjects:

Maximum number of course evaluations max_num: the maximum number of times a single evaluator evaluates one course during the course period of the semester, normalized. The normalization divides this maximum by the total number of evaluations allowed for a single course, giving a value in [0,1].

Total number of comments all_num: the total number of times a single evaluator comments on all selected courses during the course period of the semester, normalized. The normalization divides the evaluator's total number of comments by the maximum total number of comments allowed over all courses, giving a value in [0,1].

Average number of course evaluations avg_num: the evaluator's total number of comments divided by the total number of selected courses, normalized. The normalization divides the average number of evaluations per course by the maximum number of comments allowed for a course, giving a value in [0,1].

Student attendance rate attendance_rate: the evaluating student's attendance rate during the course period of the evaluated course.

Student grade grade: the grade of the evaluating student, normalized by dividing by the highest grade, 4.

Student score band score: describes the student's academic ability in the evaluated course. Performance in the evaluated course is divided into 5 levels (excellent, good, medium, poor and failing) and normalized.

Consistency of the student's major with the course major: 1 if the college of the student's major is the same as the college offering the evaluated course, and 0 otherwise. This reflects the relevance of the student's evaluation to the student's major; for example, students can evaluate courses offered by their own college more thoroughly than courses offered by other colleges.

Student course selection span span: among all courses selected by the student in the semester, the number of courses not offered by the student's own college divided by the total number of selected courses. It reflects the span of the student's course selection across the whole university: the closer the value is to 1, the more of the student's courses are offered by other colleges; the closer it is to 0, the more of them are offered by the student's own college.

The evaluation behavior features additionally extracted for students are 8-dimensional: F_student = (max_num, all_num, avg_num, attendance_rate, grade, score, major, span).
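The count-based student features can be sketched as follows. The function name, the argument names and the two limits `max_per_course` and `max_total` are assumptions made for this illustration; the text only states that such maxima are used for normalization.

```python
def student_count_features(evals_per_course, own_college_flags,
                           max_per_course, max_total):
    """Sketch of max_num, all_num, avg_num and span for one student.

    evals_per_course  -- evaluations the student made for each selected course
    own_college_flags -- True where the course is offered by the student's college
    max_per_course    -- evaluations allowed for a single course (assumed known)
    max_total         -- maximum total evaluations over all courses (assumed known)
    """
    n_courses = len(evals_per_course)
    if n_courses == 0 or n_courses != len(own_college_flags):
        raise ValueError("inputs must be non-empty and aligned")
    total = sum(evals_per_course)
    return {
        "max_num": max(evals_per_course) / max_per_course,
        "all_num": total / max_total,
        "avg_num": (total / n_courses) / max_per_course,
        # Share of selected courses offered outside the student's own college.
        "span": sum(1 for own in own_college_flags if not own) / n_courses,
    }


features = student_count_features(
    evals_per_course=[4, 2, 0, 2],
    own_college_flags=[True, True, False, False],
    max_per_course=8,
    max_total=32,
)
```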

In conclusion, when the evaluation behavior features are extracted, 16-dimensional features are extracted for student evaluation subjects and 8-dimensional features for supervisor and peer evaluation subjects.

In the first step, the credibility grade classification model training process is as follows:

Different evaluation subjects differ greatly in comment behavior and in comment volume: students make many evaluations while supervisors and peers make few, evaluation preferences differ markedly, and the average length of supervisor and peer evaluations is clearly greater than that of student evaluations. Two evaluation credibility models are therefore established, one per subject type: a student evaluation credibility level classification model and a supervisor and peer evaluation credibility level classification model.

To train the classifiers, a training data set must be constructed. After the teaching evaluation data is obtained, the credibility of each single evaluation is judged by a combination of rules and manual labeling, and the corresponding credibility class label is assigned to the feature information of that evaluation.

Specifically, the rule-based method judges a teaching evaluation to be untrustworthy according to rules that include: whether a large number of evaluation texts are identical to it; whether the comment contains no more than 3 words; whether the comment is word-for-word identical to the text of an existing evaluation index; whether the comment has disordered word order and unclear expression; whether the comment is irrelevant to course evaluation; and the like. The manual labeling method performs a second judgment, on top of the rule-based judgment, on the evaluations identified as credible, in order to find the evaluations missed by the rule-based method. Several annotators label each comment, and the credibility level they determine is taken as the credibility level of the comment, which reduces the error rate caused by individual mistakes.
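Three of the rules above are mechanical enough to sketch in Python. This is a minimal illustration, not the screening procedure of the invention: the function name, the `max_copies` threshold standing in for "a large number" of identical texts, and whitespace word counting (suitable only for the English toy data) are all assumptions. The rules on disordered wording and irrelevance need language analysis and are left out.

```python
from collections import Counter


def rule_based_untrustworthy(comments, index_texts, max_words=3, max_copies=5):
    """Return one boolean per comment: True if a simple rule flags it.

    max_copies is a hypothetical threshold for mass-duplicated texts.
    """
    normalized = [c.strip() for c in comments]
    copies = Counter(normalized)
    index_set = {t.strip() for t in index_texts}
    flags = []
    for text in normalized:
        too_short = len(text.split()) <= max_words       # at most 3 words
        copied_index = text in index_set                 # copies an index text
        mass_duplicate = copies[text] >= max_copies      # many identical texts
        flags.append(too_short or copied_index or mass_duplicate)
    return flags


flags = rule_based_untrustworthy(
    comments=[
        "good",
        "The teacher explains proofs step by step and answers questions after class",
        "Teaching content is rich and well organized",
    ],
    index_texts=["Teaching content is rich and well organized"],
)
```

Comments that pass these rules would still go through the manual second judgment described above.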

(1) Student evaluation credibility level classification model;

The student evaluation credibility classification model takes as input the three kinds of features extracted in Step1 together with the features extracted specifically for student subjects, 28 dimensions in total, takes the labeled credibility of the comment data as output, and is built on this data set as a random-forest-based student evaluation credibility level classification model.

(2) Supervisor and peer evaluation credibility level classification model;

The supervisor and peer evaluation credibility classification model takes as input the three kinds of features extracted in Step1, 20 dimensions in total, takes the labeled credibility of the comment data as output, and is built on this data set as a random-forest-based supervisor and peer evaluation credibility level classification model.

When new comment data is obtained, the corresponding features are extracted according to the feature extraction process in Step1 and used as input; the trained credibility classification model corresponding to the evaluation subject is called, and its output is the credibility level C_instances of the comment data.
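The routing of a new comment to the model of its evaluation subject might look like the sketch below. The names `predict_credibility`, `EXPECTED_DIMS` and `ConstantModel` are hypothetical. The models are only assumed to offer a `predict` method that takes a list of feature vectors, which is the convention followed by scikit-learn classifiers such as `RandomForestClassifier`; a stand-in model is used here so that the example runs without any trained model.

```python
EXPECTED_DIMS = {"stu": 28, "sup": 20, "exp": 20}  # dimensions stated in the text


class ConstantModel:
    """Stand-in for a trained classifier; always predicts the same level."""

    def __init__(self, level):
        self.level = level

    def predict(self, rows):
        return [self.level for _ in rows]


def predict_credibility(subject, features, student_model, supervisor_peer_model):
    """Call the classification model that matches the evaluation subject."""
    if subject not in EXPECTED_DIMS:
        raise ValueError(f"unknown evaluation subject: {subject!r}")
    if len(features) != EXPECTED_DIMS[subject]:
        raise ValueError(
            f"{subject} expects {EXPECTED_DIMS[subject]} features, got {len(features)}"
        )
    model = student_model if subject == "stu" else supervisor_peer_model
    return model.predict([features])[0]


level = predict_credibility("stu", [0.0] * 28, ConstantModel(2), ConstantModel(1))
```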

In the second step, the teaching quality evaluation index weights are adaptively adjusted based on the course type and the attention of the evaluation subjects: within the multi-level teaching evaluation index system, for each type of course, the original evaluation index weights of the different subjects are adaptively adjusted according to the different attention that student, supervisor and peer subjects pay to the indexes in their evaluations.

For courses of the same type, the teaching quality evaluation indexes are adaptively adjusted according to the attention of the different evaluation subjects as follows: first, based on semantic similarity, evaluation topic words are mapped to secondary index core words in a many-to-one manner, and word frequency statistics are computed for the secondary index core words; the secondary index core words are then mapped to primary evaluation indexes in a one-to-one manner; the top k secondary index core words of each primary index are selected to calculate the attention of the different subjects to each primary index; finally, the weights of the different evaluation subjects on each evaluation index for that type of course are adaptively adjusted through the attention.

The specific calculation process for the same class of courses is as follows:

Step 1: short text topic extraction and secondary index core word mapping

First, a short text topic extraction model is called to extract the topic words contained in each comment text. All topic words are then mapped, according to word semantic similarity, onto the secondary index core words of the constructed dictionary D_index. Finally, a new data set W = {W_stu, W_sup, W_exp} is formed, whose content is the secondary index core words and whose labels are the evaluation subject categories.
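The many-to-one mapping from topic words to core words can be pictured with the sketch below. The text does not say how semantic similarity is measured, so cosine similarity over word vectors is an assumption, and the two-dimensional vectors are toy values invented for the example, not real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_to_core_words(topic_words, core_words, vectors):
    """Map each topic word to its most similar core word (many-to-one)."""
    mapped = []
    for word in topic_words:
        if word not in vectors:
            continue  # no vector available; skipped in this sketch
        best = max(core_words, key=lambda c: cosine(vectors[word], vectors[c]))
        mapped.append(best)
    return mapped


# Toy vectors, purely illustrative.
vectors = {
    "homework": [1.0, 0.0], "assignment": [0.9, 0.1], "exercise": [0.8, 0.3],
    "multimedia": [0.0, 1.0], "slides": [0.1, 0.9],
}
W_stu = map_to_core_words(["assignment", "slides", "exercise"],
                          ["homework", "multimedia"], vectors)
```

The resulting list keeps repeated core words, so it can be used directly for the word frequency statistics of the next step.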

The evaluation index dictionary is established as follows: first, the primary indexes are extracted from the evaluation scoring index tables of the different course types, and several core words are extracted from each secondary evaluation index to serve as secondary index core words. Then, following the original mapping between primary and secondary indexes, a one-to-one mapping from secondary index core words to primary indexes is formed. The evaluation index dictionary D_index is constructed from the existing multi-level course evaluation index system of Xi'an Jiaotong University, as in Table 1 below: there are 82 secondary index core words in total, mapped to 5 primary indexes (only some of the secondary evaluation index core words are listed).

TABLE 1 Evaluation index dictionary

Primary evaluation index	Secondary evaluation index core words
Teaching attitude	attitude, question answering, homework, tutoring, one-to-one
Teaching content	link, arrangement, rigorous, clear
Teaching level	key points, theory, connection, practice
Teaching means	multimedia, technology, teaching resources, blackboard writing
Teaching effect	inspiration, thinking, classroom atmosphere

Step 2: teaching quality evaluation index weight self-adaptive adjustment based on evaluation subject attention

(1) On the data set constructed in Step 1, with the evaluation subject category as the label, the TFIDF method is used to calculate and record the TFIDF value of each secondary index core word within each subject category.

The specific calculation process is as follows:

First step: for a secondary index core word word(i), the number of times it occurs in the W_stu data set divided by the total number of secondary index core word occurrences in the W_stu data set is recorded as the word frequency of word(i) in W_stu.

Second step: for the secondary index core word word(i), its presence in each of the three data sets W_stu, W_sup and W_exp is recorded as 1 if it appears and 0 if it does not. The total number of data sets in which it appears is divided by 3 to obtain the inverse document frequency of word(i).

Third step: the TFIDF value of the secondary index core word word(i) for the student evaluation category is calculated.

Fourth step: following the first, second and third steps, the TFIDF values of all secondary index core words on the W_sup and W_exp data sets are calculated in the same way.
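The four steps can be sketched in Python as below. Two points are assumptions. First, the TFIDF value is taken as the product of the word frequency and the inverse document frequency, which is the usual definition but is not spelled out in the text. Second, the default `idf_fn` follows the wording of the second step literally (number of data sets containing the word, divided by 3); since the formula itself is not reproduced in the text, a different function, for example a logarithmic one, can be passed in. The function and parameter names are hypothetical.

```python
from collections import Counter


def core_word_tfidf(datasets, idf_fn=lambda df, n_sets: df / n_sets):
    """Return tfidf[category][word] for every core word of every category.

    datasets -- maps a subject category to its list of core-word occurrences
    idf_fn   -- receives (number of data sets containing the word, number of
                data sets); the default is a literal reading of the text
    """
    counts = {cat: Counter(words) for cat, words in datasets.items()}
    vocab = set().union(*counts.values())
    n_sets = len(datasets)
    # Presence is recorded as 1 or 0 per data set and summed.
    df = {w: sum(1 for c in counts.values() if w in c) for w in vocab}
    tfidf = {}
    for cat, counter in counts.items():
        total = sum(counter.values())
        tfidf[cat] = {
            w: (counter[w] / total) * idf_fn(df[w], n_sets) for w in counter
        }
    return tfidf


tfidf = core_word_tfidf({
    "stu": ["homework", "homework", "multimedia", "atmosphere"],
    "sup": ["theory", "multimedia"],
    "exp": ["theory"],
})
```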

(2) First, the secondary index core words contained in each subject's data are mapped to the primary indexes; then the secondary index core words under each primary index are sorted in descending order of TFIDF value, and the core words with the top_k TFIDF values are taken as the subject's evaluation attention hot words on the corresponding primary index and added to a set.

This gives, for each subject, an attitude hot word set, a content hot word set, a level hot word set, a method hot word set and an effect hot word set.
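The selection of hot words for one evaluation subject can be sketched as follows. The function name and the toy TFIDF values are hypothetical; `core_to_primary` stands for the mapping held in the dictionary D_index.

```python
def hot_words(tfidf, core_to_primary, top_k):
    """Group one subject's core words by primary index and keep the top_k by TFIDF.

    tfidf           -- maps core word to its TFIDF value for this subject
    core_to_primary -- maps core word to its primary index
    """
    grouped = {}
    for word, value in tfidf.items():
        grouped.setdefault(core_to_primary[word], []).append((word, value))
    return {
        index: [w for w, _ in sorted(items, key=lambda p: p[1], reverse=True)[:top_k]]
        for index, items in grouped.items()
    }


stu_hot = hot_words(
    tfidf={"homework": 0.30, "tutoring": 0.10, "attitude": 0.20,
           "multimedia": 0.25, "blackboard": 0.05},
    core_to_primary={"homework": "attitude", "tutoring": "attitude",
                     "attitude": "attitude", "multimedia": "method",
                     "blackboard": "method"},
    top_k=2,
)
```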

(3) The TFIDF value of each evaluation subject category on each primary index is calculated. The value is the sum of the TFIDF values of the hot words contained in the corresponding primary index, with

index = {attitude, content, ability, method, effect};

and the primary index TFIDF values of each subject are then normalized.

(4) The attention of the different evaluation subjects on a single primary index is calculated. The TFIDF values of all evaluation subjects on a single primary index are summed, and the TFIDF value of each evaluation subject is divided by this sum to give the attention A of that subject on the primary index:

$$A_{class}^{index} = \frac{TFIDF_{class}^{index}}{\sum_{c \in \{stu,\, sup,\, exp\}} TFIDF_{c}^{index}}$$
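Items (3) and (4) can be sketched together. The per-subject normalization mentioned in item (3) is not reproduced here because its formula is not given in the text; the sketch goes straight from the summed hot-word TFIDF values to the attention. Names and toy values are hypothetical.

```python
def subject_attention(hot_tfidf):
    """Return attention[subject][index].

    hot_tfidf[subject][index] -- list of TFIDF values of that subject's hot
                                 words under the given primary index
    """
    totals = {
        s: {i: sum(values) for i, values in per_index.items()}
        for s, per_index in hot_tfidf.items()
    }
    indexes = {i for per_index in totals.values() for i in per_index}
    attention = {s: {} for s in totals}
    for i in indexes:
        # Sum of all subjects' TFIDF values on this primary index.
        denom = sum(totals[s].get(i, 0.0) for s in totals)
        for s in totals:
            attention[s][i] = totals[s].get(i, 0.0) / denom if denom else 0.0
    return attention


A = subject_attention({
    "stu": {"attitude": [0.3, 0.1]},
    "sup": {"attitude": [0.2]},
    "exp": {"attitude": [0.4]},
})
```

For every primary index the attention values of the three subjects sum to 1, so they can be read as shares of attention.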

(5) The subject index weights are adaptively adjusted according to the attention. The subject index weights are adjusted from the initial subject evaluation weights CW and the calculated subject attention A.

According to the teaching evaluation scheme of Xi'an Jiaotong University, the initial subject evaluation index weights CW are as follows: when the student participation rate in evaluation reaches 50% and the number of participants is more than 15, student evaluation accounts for 70%, peer evaluation for 20% and supervisor evaluation for 10%; otherwise, student evaluation accounts for 50%, peer evaluation for 30% and supervisor evaluation for 20%.
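The rule for the initial weights CW is simple enough to write down directly. The function name and the representation of the participation rate as a fraction are assumptions; the adjustment of CW by the attention A is not sketched because its formula is not reproduced in the text.

```python
def initial_subject_weights(participation_rate, participants):
    """Return the initial weights CW for students (stu), peers (exp) and supervisors (sup).

    participation_rate -- share of students who took part, as a fraction in [0, 1]
    participants       -- number of students who took part
    """
    if participation_rate >= 0.5 and participants > 15:
        return {"stu": 0.7, "exp": 0.2, "sup": 0.1}
    return {"stu": 0.5, "exp": 0.3, "sup": 0.2}


CW = initial_subject_weights(participation_rate=0.6, participants=20)
```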

The evaluation subject index weights of all courses of the same type are calculated according to the 5 steps above.

The course type classification table of Xi'an Jiaotong University is adopted directly, as shown in Table 2 below; it contains 52 course types:

Table 2 Course type table

In the third step, each course is comprehensively evaluated: the comprehensive teaching quality score of a single course is calculated from the comment credibility and the adaptively adjusted single-index weights of the different evaluation subjects obtained in the previous two steps. All course scores are sorted in descending order and the top L% are selected as excellent courses. The comprehensive score of each course is calculated as follows:

Step 1: calculate the credibility-weighted score of each evaluation subject on each index. All evaluations of the course are selected. Once the credibility level of a piece of evaluation data has been calculated, both its index scores and its text comment are considered to have that credibility. Therefore, for a given evaluation index, the credibility level C_instances of each piece of comment data is used as a weight to compute the weighted average of the index scores S of the evaluators belonging to the same evaluation subject, giving the weighted credible score G of that evaluation subject on that evaluation index.
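A minimal sketch of the credibility-weighted score G for one evaluation subject and one index follows. It assumes that the weighted average divides by the sum of the credibility levels and that the levels are non-negative numbers; the function name and the fallback value for the case where no evaluation carries any weight are also assumptions.

```python
def weighted_credible_score(scores, credibilities):
    """Credibility-weighted mean of the index scores S from one evaluation subject.

    scores        -- index scores S, one per piece of evaluation data
    credibilities -- credibility levels C_instances, aligned with scores
    """
    if len(scores) != len(credibilities):
        raise ValueError("scores and credibilities must be aligned")
    total_weight = sum(credibilities)
    if total_weight == 0:
        return 0.0  # assumption: no credible evaluation yields a score of 0
    return sum(c * s for c, s in zip(credibilities, scores)) / total_weight


G_stu = weighted_credible_score(scores=[90, 60, 80], credibilities=[1.0, 0.0, 0.5])
```

In the example the second evaluation has credibility 0 and therefore does not influence the score at all.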

step 2: a weighted confidence score on a single index is calculated. And according to the weight of different subjects on each index and the index credibility score which are adjusted in a self-adaptive manner, calculating the final score of each index in a weighting manner.

According to the scores G of different evaluation subjects on each index calculated at Step1, taking the weight R of a single index of the evaluation subject as a weight, and calculating the final score IS of the single index in a weighting manner:
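The weighting of Step 2 can be sketched as a weighted sum over the evaluation subjects. The sketch assumes that the adjusted weights R of the three subjects on an index sum to 1, which the text does not state explicitly; the function name and the numbers are hypothetical.

```python
def index_score(G, R):
    """Final score IS of one index from per-subject scores G and weights R.

    G -- maps subject (stu, sup, exp) to its weighted credible score on the index
    R -- maps subject to its adaptively adjusted weight on the index
    """
    missing = set(G) - set(R)
    if missing:
        raise ValueError(f"no weight for subjects: {sorted(missing)}")
    return sum(R[s] * G[s] for s in G)


IS_attitude = index_score(
    G={"stu": 80.0, "sup": 90.0, "exp": 70.0},
    R={"stu": 0.6, "sup": 0.1, "exp": 0.3},
)
```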

Step 3: calculate the comprehensive teaching quality score. The final score CS of the course is calculated as the weighted average of the individual index scores, where the weight of each index follows the weight W_index specified in the original educational administration system. If no weight is set, the index scores are averaged directly to obtain the comprehensive teaching score.

The specified weights are: teaching attitude 20%, teaching content 20%, teaching level 20%, teaching method 20% and teaching effect 20%.

The comprehensive teaching quality scores of all types of courses are calculated according to the steps above. All course scores are sorted in descending order, and the top L% are selected as excellent courses.
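Step 3 and the final selection can be sketched as follows. The function names are hypothetical, and rounding the number of excellent courses up with `math.ceil` is an assumption, since the text does not say how a fractional count is handled.

```python
import math


def composite_score(index_scores, index_weights=None):
    """Comprehensive score CS: weighted average of the index scores IS.

    Falls back to a plain average when no weights W_index are set.
    """
    if not index_weights:
        return sum(index_scores.values()) / len(index_scores)
    total = sum(index_weights[i] for i in index_scores)
    return sum(index_weights[i] * v for i, v in index_scores.items()) / total


def excellent_courses(course_scores, top_percent):
    """Sort courses by CS in descending order and keep the top L percent."""
    ranked = sorted(course_scores, key=course_scores.get, reverse=True)
    keep = math.ceil(len(ranked) * top_percent / 100)  # assumption: round up
    return ranked[:keep]


W_index = {"attitude": 0.2, "content": 0.2, "ability": 0.2,
           "method": 0.2, "effect": 0.2}
CS = composite_score({"attitude": 80, "content": 90, "ability": 85,
                      "method": 75, "effect": 70}, W_index)
best = excellent_courses({"A": 88, "B": 92, "C": 75, "D": 81}, top_percent=50)
```

With the equal weights specified above, the weighted average coincides with the plain average of the five index scores.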

Claims

1. The teaching quality comprehensive evaluation method based on the teaching evaluation data credibility is characterized by comprising the following steps of:

step one, classifying the credibility level of each single piece of teaching evaluation data: first extracting the corresponding features from newly obtained teaching evaluation data and taking the features as input; then calling the established credibility level classification model according to the evaluation subject, to obtain the credibility level C_instances of the evaluation data of the different evaluation subjects;

Step two, based on a multi-level teaching evaluation index system structure, aiming at each type of course, according to respective attention degrees of different evaluation subjects, carrying out self-adaptive adjustment on the evaluation index weights of the original different subjects, wherein the specific method comprises the following steps:

firstly, based on semantic similarity, mapping evaluation topic words to secondary index core words in a many-to-one manner and carrying out word frequency statistics on the secondary index core words; then mapping the secondary index core words to primary evaluation indexes in a one-to-one manner, and selecting the top k secondary index core words of each primary index to calculate the attention of the different subjects to each primary index; and finally adaptively adjusting, through the attention, the weights of the different evaluation subjects on each evaluation index for that type of course;

firstly, calling a short text topic extraction model to extract the topic words contained in each comment text; then mapping all topic words in a many-to-one manner, according to word semantic similarity, onto the secondary index core words of the constructed dictionary D_index; finally forming a data set W = {W_stu, W_sup, W_exp} whose content is the secondary index core words and whose labels are the evaluation subject categories; wherein W_stu is the secondary index core words in student evaluations, W_sup is the secondary index core words in supervisor evaluations, W_exp is the secondary index core words in peer evaluations, stu denotes student, sup denotes supervisor, and exp denotes peer; the constructed dictionary is a one-to-one mapping from secondary index core words to primary indexes, and the secondary index core words are several core words extracted from each secondary evaluation index;

class = {stu, sup, exp}, index = {attitude, content, ability, method, effect};

normalizing the primary index TFIDF of each main body;

completing self-adaptive adjustment of the weight of the main index according to the attention degree;

and step three, for each course, calculating the final classroom teaching quality score of the course from the evaluation credibility scores and the index weights adaptively adjusted based on the course type and the attention of the evaluation subjects, sorting all course scores in descending order, and selecting the top L% as excellent courses.

2. The comprehensive teaching quality evaluation method based on teaching evaluation data credibility of claim 1, characterized in that in the first step, a specific method for constructing a single teaching evaluation credibility grade classification model is as follows:

3. The comprehensive teaching quality evaluation method based on the teaching evaluation data credibility of claim 2 is characterized in that in the first step, the specific method for extracting the characteristics of the comment behavior of the reviewer is as follows:

1) extracting comment text features F_text = (len, num_sub, same_text, correlation, repetition_rate, emo_intensity, emo_polarity): features are extracted from each comment text to describe the statistical regularities and main content of the comment text, wherein the specifically extracted features comprise:

grade of student: the grade of the rater;

4. The comprehensive teaching quality evaluation method based on the teaching evaluation data credibility as claimed in claim 1, wherein in the third step, the specific step of calculating the final classroom teaching quality score of the lesson is as follows:

index = {attitude, content, ability, method, effect};

and obtaining the final classroom teaching quality score of the course.