CN110119770B - Decision tree model construction method, device, electronic equipment and medium - Google Patents

Decision tree model construction method, device, electronic equipment and medium

Info

Publication number
CN110119770B
CN110119770B (application no. CN201910349851.XA)
Authority
CN
China
Prior art keywords
answer
answer text
word
text
decision tree
Prior art date
Legal status
Active
Application number
CN201910349851.XA
Other languages
Chinese (zh)
Other versions
CN110119770A (en)
Inventor
金戈 (Jin Ge)
徐亮 (Xu Liang)
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910349851.XA
Publication of CN110119770A
Application granted
Publication of CN110119770B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present application provide a method, an apparatus, an electronic device and a medium for constructing a decision tree model. The method comprises the following steps: constructing a bag-of-words model by using training text; establishing a first decision tree model according to a first feature value of each answer text comprised in the bag-of-words model and an answer score label set for each answer text, and obtaining importance values of the word features of each answer text output by the first decision tree model; screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text; and establishing, according to second feature values of each answer text obtained from the keyword features and the answer score labels set for each answer text, a second decision tree model for answer score prediction. By adopting the method and the apparatus, the accuracy of score prediction can be improved while the interpretability of the model is ensured.

Description

Decision tree model construction method, device, electronic equipment and medium
Technical Field
The present application relates to the field of deep learning, and in particular, to a method and apparatus for constructing a decision tree model, an electronic device, and a medium.
Background
With the development of science and technology, intelligent scoring systems have been developed to spare the effort of manual scoring, and they are increasingly widely used in schools, enterprises and other institutions. Relevant personnel can manually formulate rules in an intelligent scoring system, and the system then scores answers according to those hand-crafted rules; however, the scoring accuracy achievable in this way is limited. To improve scoring accuracy, some practitioners instead score answers with a logistic regression machine learning method. Although logistic regression can achieve higher scoring accuracy, the model obtained in this way has low interpretability.
Disclosure of Invention
The embodiments of the present application provide a decision tree model construction method and apparatus, an electronic device, and a medium, which can improve score prediction accuracy while ensuring the interpretability of the model.
In a first aspect, an embodiment of the present application provides a method for constructing a decision tree model, including:
constructing a bag-of-words model by using training text, the bag-of-words model comprising a first feature value of each answer text in the training text;
establishing a first decision tree model according to the first feature value of each answer text and an answer score label set for each answer text, and obtaining importance values of the word features of each answer text output by the first decision tree model;
screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features;
and establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction.
Optionally, after the establishing the second decision tree model, the method further includes:
when answer score prediction needs to be performed on a target answer text, using the target answer text as input data of the second decision tree model;
and outputting score result information of the target answer text through the second decision tree model.
Optionally, the screening out, according to the importance values of the word features of each answer text, of keyword features satisfying the preset condition from the word features of each answer text includes:
screening out, according to the importance values of the word features of each answer text, first word features whose importance values are greater than or equal to a preset value from the word features of each answer text;
receiving a deletion instruction, and deleting second word features from the first word features according to the deletion instruction;
and determining the first word features remaining after the deletion as the keyword features satisfying the preset condition.
Optionally, the establishing a first decision tree model according to the first feature value of each answer text and the answer scoring label set for each answer text includes:
inputting the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model so as to train the first initial decision tree model;
and taking the trained first initial decision tree model as the first decision tree model.
Optionally, the establishing a second decision tree model according to the second feature value of each answer text and the answer scoring label set for each answer text includes:
inputting the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model so as to train the second initial decision tree model;
and taking the trained second initial decision tree model as the second decision tree model.
Optionally, the establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text includes:
determining the length of each answer text;
and establishing a second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score labels set for each answer text.
Optionally, the constructing the bag-of-words model by using training text includes:
constructing a dictionary by using the training text, the dictionary comprising the word features of each answer text in the training text;
counting whether each word feature in the dictionary appears in each answer text;
and determining the first feature value of each answer text according to the statistical result, and generating a bag-of-words model comprising the first feature value of each answer text.
In a second aspect, an embodiment of the present application provides a decision tree model building apparatus, including:
the construction unit is used for constructing a bag-of-words model by using training text; the bag-of-words model comprises a first feature value of each answer text in the training text;
the construction unit is further used for establishing a first decision tree model according to the first characteristic value of each answer text and the answer scoring label set for each answer text, and obtaining the importance value of the word characteristic of each answer text output by the first decision tree model;
the processing unit is used for screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features;
The construction unit is further configured to establish a second decision tree model according to the second feature values of the answer texts and the answer score labels set for each answer text, so as to be used for answer score prediction.
In a third aspect, an embodiment of the present application provides an electronic device including a processor, an input device, an output device, and a memory, which are connected to each other, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method according to the first aspect.
In summary, the electronic device may construct a bag-of-words model by using the training text and establish a first decision tree model according to the bag-of-words model and the answer score labels set for each answer text, so as to obtain the importance values of the word features of each answer text output by the first decision tree model and screen out keyword features satisfying a preset condition; the electronic device may then establish, according to the second feature value of each answer text obtained from the keyword features and the answer score label set for each answer text, a second decision tree model for answer score prediction, thereby improving score prediction accuracy while ensuring the interpretability of the model.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a decision tree model construction method provided by an embodiment of the application;
FIG. 2 is a flow chart of another method for constructing a decision tree model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a decision tree model building apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
Referring to fig. 1, a flow chart of a decision tree model construction method according to an embodiment of the present application is shown. The method may be applied to an electronic device, which may be a terminal or a server. Specifically, the method may include:
s101, constructing a bag-of-words model by using the training text.
The bag-of-words model comprises a first feature value of each answer text in the training text. The first feature value may be a feature vector. The first feature value of each answer text is determined from the numerical values of the word features of that answer text; each numerical value is determined according to whether the word feature appears in the corresponding answer text, or alternatively according to the number of times the word feature appears in the corresponding answer text, which is not limited in the embodiments of the present application.
In one embodiment, the electronic device constructing a bag-of-words model using training text may include: the electronic device constructs a dictionary by using the training text, the dictionary comprising the word features of each answer text in the training text; the electronic device counts whether each word feature in the dictionary appears in each answer text; and the electronic device determines the first feature value of each answer text according to the statistical result and generates a bag-of-words model comprising the first feature value of each answer text.
For example, the training text includes answer text 1 and answer text 2, where answer text 1 is 'The capital of China is Beijing' and answer text 2 is 'The capital of the UK is London'. The dictionary constructed from the training text contains seven word features (the segmented words of the two sentences): China, UK, of, capital, is, Beijing, London. Using 0 and 1 to indicate whether each of the 7 words appears in answer text 1 and answer text 2 (1 for present, 0 for absent), the first feature value of answer text 1 is determined from the statistical result to be (1,0,1,1,1,1,0) and the first feature value of answer text 2 to be (0,1,1,1,1,0,1), and a bag-of-words model comprising the first feature values of answer text 1 and answer text 2 is generated.
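The following minimal Python sketch (illustrative only, not part of the patent) reproduces this binary bag-of-words step; the token lists stand in for the output of a word-segmentation step, with English glosses in place of the segmented Chinese words:

    # Tokenized answer texts (assumed to come from a word-segmentation step).
    answer_1 = ["China", "of", "capital", "is", "Beijing"]   # answer text 1
    answer_2 = ["UK", "of", "capital", "is", "London"]       # answer text 2

    # Dictionary of word features, in the order used in the example above.
    dictionary = ["China", "UK", "of", "capital", "is", "Beijing", "London"]

    def binary_features(tokens):
        # First feature value: 1 if the word feature appears in the text, else 0.
        present = set(tokens)
        return [1 if word in present else 0 for word in dictionary]

    print(binary_features(answer_1))  # [1, 0, 1, 1, 1, 1, 0]
    print(binary_features(answer_2))  # [0, 1, 1, 1, 1, 0, 1]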
In addition to the generation of the bag-of-words model in the above manner, the bag-of-words model may also be generated by counting the number of times each word feature in the dictionary appears in each answer text.
In one embodiment, the electronic device constructing a bag-of-words model by using training text may further include: the electronic device constructs, from the training text, a dictionary comprising the word features of each answer text in the training text (e.g., word 1, word 2, ..., word N); counts the number of times each word feature in the dictionary appears in each answer text; and determines the first feature value of each answer text according to the counting result, thereby generating a bag-of-words model comprising the first feature value of each answer text.
The two statistical methods differ as follows: if, for example, a word appears twice in answer text 3, the first method yields a statistic of 1 for that word (indicating that the word appears in answer text 3), while the second method yields a statistic of 2 (indicating that the word appears twice in answer text 3). Of course, besides raw counts, a frequency-based statistic may also be adopted, which is not detailed here.
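As a sketch of the count-based variant (again illustrative, reusing the dictionary from the previous sketch):

    from collections import Counter

    def count_features(tokens):
        # Count-based variant: number of occurrences of each dictionary word.
        counts = Counter(tokens)
        return [counts[word] for word in dictionary]

    # A word that appears twice yields 2 here, where the binary variant yields 1.
    print(count_features(["capital", "capital", "is", "Beijing"]))  # [0, 0, 0, 2, 1, 1, 0]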
In one embodiment, the electronic device constructing a dictionary by using training text may include: the electronic device preprocesses the training text to obtain the dictionary. The preprocessing includes, but is not limited to, word segmentation, stop-word removal, and the like, which is not detailed here.
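The patent does not name a particular segmenter or stop-word list; as one possible realization, the sketch below uses the open-source jieba segmenter and a hypothetical stop-word set:

    import jieba  # open-source Chinese word segmenter; an assumed choice, not named in the patent

    STOP_WORDS = {"了", "吗", "呢"}  # hypothetical stop-word set

    def preprocess(raw_answer):
        # Segment the raw answer text into words, then remove stop words.
        return [w for w in jieba.lcut(raw_answer) if w not in STOP_WORDS]

    print(preprocess("中国的首都是北京"))  # e.g. ['中国', '的', '首都', '是', '北京']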
S102, establishing a first decision tree model according to the first characteristic value of each answer text and the answer scoring label set for each answer text, and obtaining the importance value of the word characteristic of each answer text output by the first decision tree model.
In one embodiment, the electronic device establishing a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text includes: the electronic device inputs the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model so as to train the first initial decision tree model, and takes the trained first initial decision tree model as the first decision tree model. For example, the first decision tree model may be a decision tree model with a maximum depth of 10 and a minimum of 100 samples per leaf node. An answer score label may be a score, such as 90 points, or a grade, such as excellent, good, or medium.
The first decision tree model can compute the importance values of the word features of each answer text and output them; the higher the importance value, the greater the influence of the word feature on the score. An importance value may be expressed as, but is not limited to, a number, a letter, or the like.
The first decision tree model may also output the word features of each answer text sorted in descending order of their importance values.
The first decision tree model may also output classification results for each answer text, e.g., distinguishing good answers from bad answers.
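The patent does not prescribe a library; as one plausible realization of step S102, the following sketch trains a scikit-learn decision tree with the example hyperparameters above (maximum depth 10, at least 100 samples per leaf) and reads out the per-word importance values; the training data here are stand-in values:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    num_answers, num_words = 1000, 50

    # Stand-in training data: binary first feature values and answer score labels.
    X = rng.integers(0, 2, size=(num_answers, num_words))  # bag-of-words matrix
    y = rng.integers(0, 2, size=num_answers)               # e.g. pass/fail labels

    # First initial decision tree model, using the hyperparameters from the example.
    first_tree = DecisionTreeClassifier(max_depth=10, min_samples_leaf=100, random_state=0)
    first_tree.fit(X, y)  # training yields the first decision tree model

    # Importance value of each word feature; higher means more influence on the score.
    importance = first_tree.feature_importances_
    order = np.argsort(importance)[::-1]  # word indices in descending importance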
And S103, screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features.
In one embodiment, the electronic device screening out keyword features satisfying a preset condition from the word features of each answer text according to the importance values of the word features includes: the electronic device screens out, from the word features of each answer text according to their importance values, first word features whose importance values are greater than or equal to a preset value; and the electronic device determines the first word features as the keyword features satisfying the preset condition.
For example, the electronic device outputs importance values of 1000 word features, and the electronic device may screen 500 word features having importance values greater than or equal to a preset value from the 1000 word features, and determine the 500 word features as keyword features satisfying a preset condition.
In one embodiment, the screening may also involve manual pruning: the electronic device screens out, from the word features of each answer text according to their importance values, first word features whose importance values are greater than or equal to a preset value; the electronic device receives a deletion instruction and deletes second word features from the first word features according to the deletion instruction; and the electronic device determines the first word features remaining after the deletion as the keyword features satisfying the preset condition. The second word features may be word features with low interpretability or a low contribution.
For example, the electronic device outputs importance values of 1000 word features, and the electronic device may screen 500 word features whose importance values are greater than or equal to a preset value from the 1000 word features, delete 50 word features of the 500 word features after receiving a deletion instruction for 50 word features having lower interpretability, and determine the remaining 450 word features as keyword features satisfying a preset condition.
In one embodiment, the screening may also be rank-based: the electronic device screens out, from the word features of each answer text according to their importance values, the word features ranked within a preset number, and determines those top-ranked word features as the keyword features satisfying the preset condition.
For example, the electronic device outputs importance values of 1000 word features sorted in descending order of importance; the electronic device may screen out the top 500 word features from the 1000 word features and determine those 500 word features as the keyword features satisfying the preset condition.
In one embodiment, the electronic device may screen out, from the word features of each answer text according to their importance values, the word features ranked within a preset number; the electronic device then receives a deletion instruction, deletes third word features from the top-ranked word features according to the deletion instruction, and determines the word features remaining after the deletion as the keyword features satisfying the preset condition. The third word features may be the same as or different from the second word features, depending on the actual situation, and are likewise word features with low interpretability or a low contribution.
In one embodiment, the electronic device obtaining the second feature value of each answer text according to the keyword features may include: the electronic device deletes, from the first feature value of each answer text, the numerical values of the word features other than the keyword features, so as to obtain the second feature value of each answer text. Direct deletion speeds up modeling and reduces the workload of the electronic device.
In addition to deletion, the electronic device may also recompute the statistics. In one embodiment, the electronic device obtaining the second feature value of each answer text according to the keyword features may include: counting whether each keyword feature appears in each answer text and determining the second feature value of each answer text according to the statistical result; or counting the number of times each keyword feature appears in each answer text and determining the second feature value of each answer text according to those counts.
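Continuing the scikit-learn sketch above (all variable names assumed), step S103 can be realized by thresholding the importance values and keeping only the corresponding columns of the feature matrix:

    # Keep word features whose importance value is >= a preset value;
    # order[:k] from the previous sketch would implement the top-k variant instead.
    preset_value = 0.01
    keep = np.where(importance >= preset_value)[0]  # indices of first word features

    # Optional manual deletion step: drop features flagged by a deletion
    # instruction (the flagged indices here are hypothetical).
    deleted_by_instruction = {3, 7}
    keep = np.array([i for i in keep if i not in deleted_by_instruction], dtype=int)

    # Second feature value of each answer text: the first feature values restricted
    # to the keyword-feature columns (the "direct deletion" variant).
    X2 = X[:, keep]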
And S104, establishing a second decision tree model according to the second characteristic value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction.
Specifically, the electronic device establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text may include: the electronic device inputs the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model so as to train the second initial decision tree model, and takes the trained second initial decision tree model as the second decision tree model. The second initial decision tree model may differ from the first initial decision tree model; for example, the second decision tree model may be a decision tree model with a maximum depth of 5 and a minimum of 100 samples per leaf node.
In one embodiment, when answer score prediction needs to be performed on a target answer text, the target answer text is used as input data of the second decision tree model, and score result information of the target answer text is output through the second decision tree model. The target answer text may be an answer text to be predicted, for example a new answer text; the score result information may include information such as a score.
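Continuing the same sketch, step S104 trains the shallower second tree on the keyword features and then scores a target answer text (the target text here is a stand-in):

    # Second initial decision tree model: shallower, hence easier to interpret
    # (maximum depth 5, from the example above).
    second_tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=100, random_state=0)
    second_tree.fit(X2, y)

    # Answer score prediction: compute the target text's second feature value
    # over the keyword features, then run it through the second tree.
    target_first = rng.integers(0, 2, size=(1, num_words))  # stand-in target answer text
    target_second = target_first[:, keep]
    predicted_score = second_tree.predict(target_second)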
In the embodiment shown in fig. 1, the electronic device may construct a bag-of-words model by using the training text and establish a first decision tree model according to the bag-of-words model and the answer score labels set for each answer text, so as to obtain the importance values of the word features of each answer text output by the first decision tree model and screen out keyword features satisfying a preset condition; the electronic device may then establish, according to the second feature value of each answer text obtained from the keyword features and the answer score label set for each answer text, a second decision tree model for answer score prediction, thereby improving score prediction accuracy while ensuring the interpretability of the model.
Referring to fig. 2, a flow chart of another decision tree model construction method according to an embodiment of the application is shown. Specifically, the method may include:
S201, constructing a bag-of-words model by using training text;
S202, establishing a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text, and obtaining the importance values of the word features of each answer text output by the first decision tree model;
S203, screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features.
For steps S201 to S203, reference may be made to steps S101 to S103 in the embodiment of fig. 1, which are not repeated here.
S204, determining the length of each answer text;
S205, a second decision tree model is established according to the length of each answer text, the second characteristic value of each answer text and answer scoring labels set for each answer text, and the second decision tree model is used for answer scoring prediction.
In the embodiment of the present application, the electronic device may establish the second decision tree model directly from the second feature value of each answer text and the answer score label set for each answer text, or it may additionally introduce the length of each answer text when establishing the second decision tree model. Introducing the length of each answer text can effectively improve the score prediction accuracy.
Specifically, the electronic device establishing a second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score label set for each answer text includes: the electronic device inputs the length of each answer text, the second feature value of each answer text, and the answer score label set for each answer text into a second initial decision tree model so as to train the second initial decision tree model, and takes the trained second initial decision tree model as the second decision tree model.
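Continuing the same sketch, the length feature can simply be appended as an extra column of the second feature matrix before training; this is an assumed realization, not one mandated by the text:

    # Length of each answer text, e.g. its word count (stand-in values here).
    lengths = rng.integers(5, 200, size=(num_answers, 1))

    # Append the length as one extra column alongside the keyword features.
    X2_with_length = np.hstack([X2, lengths])

    second_tree_len = DecisionTreeClassifier(max_depth=5, min_samples_leaf=100, random_state=0)
    second_tree_len.fit(X2_with_length, y)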
In the embodiment shown in fig. 2, the electronic device may construct a bag-of-words model by using the training text and establish a first decision tree model according to the bag-of-words model and the answer score labels set for each answer text, so as to obtain the importance values of the word features of each answer text output by the first decision tree model and screen out keyword features satisfying a preset condition; the electronic device may then establish, according to the length of each answer text, the second feature value of each answer text obtained from the keyword features, and the answer score label set for each answer text, a second decision tree model for answer score prediction, thereby improving score prediction accuracy while ensuring the interpretability of the model.
Referring to fig. 3, a schematic structural diagram of a decision tree model building apparatus according to an embodiment of the present application is shown. The device can be applied to electronic equipment. Specifically, the apparatus may include:
a construction unit 31 for constructing a bag-of-words model by using training text; the bag-of-words model comprises a first feature value of each answer text in the training text;
The construction unit 31 is further configured to establish a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text, and obtain an importance value of the word feature of each answer text output by the first decision tree model;
the processing unit 32 is configured to screen out keyword features meeting a preset condition from the word features of each answer text according to the importance values of the word features of each answer text, and obtain a second feature value of each answer text according to the keyword features;
the construction unit 31 is further configured to establish a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction.
In an alternative embodiment, the processing unit 32 is further configured to, after the second decision tree model is built, use the target answer text as the input data of the second decision tree model when answer score prediction is required for the target answer text; and outputting scoring result information of the target answer text through the second decision tree model.
In an alternative embodiment, the processing unit 32 screens out keyword features meeting a preset condition from the word features of each answer text according to the importance values of the word features of each answer text, specifically screens out first word features with importance values greater than or equal to a preset value from the word features of each answer text according to the importance values of the word features of each answer text; receiving a deleting instruction, and deleting a second word feature from the first word features according to the deleting instruction; and determining the first word characteristic with the deleting operation as the keyword characteristic meeting the preset condition.
In an alternative embodiment, the construction unit 31 establishes a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text, specifically, inputs the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model to train the first initial decision tree model; and taking the trained first initial decision tree model as a first decision tree model.
In an alternative embodiment, the construction unit 31 establishes a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, specifically, inputs the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model to train the second initial decision tree model; and taking the trained second initial decision tree model as a second decision tree model.
In an alternative embodiment, the construction unit 31 establishes a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, specifically by determining the length of each answer text, and establishing the second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score labels set for each answer text.
In an alternative embodiment, the construction unit 31 constructs the bag-of-words model using training text, in particular by constructing a dictionary using the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary appears in each answer text; and determining the first feature value of each answer text according to the statistical result, and generating a bag-of-words model comprising the first feature value of each answer text.
In the embodiment shown in fig. 3, the electronic device may construct a bag-of-words model by using the training text and establish a first decision tree model according to the bag-of-words model and the answer score labels set for each answer text, so as to obtain the importance values of the word features of each answer text output by the first decision tree model and screen out keyword features satisfying a preset condition; the electronic device may then establish, according to the second feature value of each answer text obtained from the keyword features and the answer score label set for each answer text, a second decision tree model for answer score prediction, thereby improving score prediction accuracy while ensuring the interpretability of the model.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device described in the present embodiment may include: one or more processors 1000, one or more input devices 2000, one or more output devices 3000, and memory 4000. The processor 1000, input device 2000, output device 3000, and memory 4000 may be connected by a bus or other means.
The input device 2000 and the output device 3000 may be standard wired or wireless communication interfaces.
The processor 1000 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 4000 may be a high-speed RAM or a non-volatile memory, such as a disk memory. The memory 4000 is used to store a set of program codes, and the input device 2000, the output device 3000, and the processor 1000 can call the program codes stored in the memory 4000. Specifically:
A processor 1000 for: constructing a bag-of-words model by using training text, the bag-of-words model comprising a first feature value of each answer text in the training text; establishing a first decision tree model according to the first feature value of each answer text and an answer score label set for each answer text, and obtaining the importance values of the word features of each answer text output by the first decision tree model; screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features; and establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction.
Optionally, the processor 1000 is further configured to, after the second decision tree model is established, use the target answer text as input data of the second decision tree model when answer score prediction needs to be performed on the target answer text; and outputting scoring result information of the target answer text through the second decision tree model.
Optionally, the processor 1000 screens out keyword features meeting a preset condition from the word features of each answer text according to the importance values of the word features of each answer text, specifically screens out first word features with importance values greater than or equal to a preset value from the word features of each answer text according to the importance values of the word features of each answer text; receiving a deletion instruction through the input device 2000, and deleting a second word feature from the first word features according to the deletion instruction; and determining the first word characteristic with the deleting operation as the keyword characteristic meeting the preset condition.
Optionally, the processor 1000 establishes a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text, specifically, inputs the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model to train the first initial decision tree model; and taking the trained first initial decision tree model as a first decision tree model.
Optionally, the processor 1000 establishes a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, specifically, inputs the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model to train the second initial decision tree model; and taking the trained second initial decision tree model as a second decision tree model.
Optionally, the processor 1000 establishes a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, specifically by determining the length of each answer text, and establishing the second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score labels set for each answer text.
Optionally, the processor 1000 builds a bag-of-words model using training text, in particular by building a dictionary using the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary appears in each answer text; and determining the first feature value of each answer text according to the statistical result, and generating a bag-of-words model comprising the first feature value of each answer text.
In a specific implementation, the processor 1000, the input device 2000, and the output device 3000 described in the embodiments of the present application may perform the implementations described in the embodiments of fig. 1 and fig. 2, and may also perform the implementations of the apparatus described in the embodiments of the present application, which are not repeated here.
The functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The above disclosure describes only preferred embodiments of the present application, which of course do not limit the scope of the application; those skilled in the art will appreciate that equivalent changes made within the scope of the claims still fall within the scope of the present application.

Claims (9)

1. A method for constructing a decision tree model, comprising:
constructing a bag-of-words model by using training text, the bag-of-words model comprising a first feature value of each answer text in the training text;
establishing a first decision tree model according to the first feature value of each answer text and an answer score label set for each answer text, and obtaining importance values of the word features of each answer text output by the first decision tree model;
screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features;
establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction;
wherein the screening out, according to the importance values of the word features of each answer text, of keyword features satisfying the preset condition from the word features of each answer text comprises the following steps:
screening out, according to the importance values of the word features of each answer text, first word features whose importance values are greater than or equal to a preset value from the word features of each answer text;
receiving a deletion instruction, and deleting second word features from the first word features according to the deletion instruction; the second word features are word features with low interpretability or a low contribution;
and determining the first word features remaining after the deletion as the keyword features satisfying the preset condition.
2. The method of claim 1, wherein after the establishing the second decision tree model, the method further comprises:
when answer score prediction needs to be performed on a target answer text, using the target answer text as input data of the second decision tree model;
and outputting score result information of the target answer text through the second decision tree model.
3. The method according to any one of claims 1-2, wherein the establishing a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text includes:
inputting the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model so as to train the first initial decision tree model;
and taking the trained first initial decision tree model as the first decision tree model.
4. A method as claimed in claim 3, wherein said building a second decision tree model based on the second feature values of the respective answer texts and answer scoring labels provided for each answer text comprises:
inputting the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model so as to train the second initial decision tree model;
and taking the trained second initial decision tree model as the second decision tree model.
5. The method according to any one of claims 1-2, wherein the establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text includes:
determining the length of each answer text;
and establishing a second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score labels set for each answer text.
6. The method of claim 1, wherein constructing a bag of words model using training text comprises:
constructing a dictionary by using the training text, the dictionary comprising the word features of each answer text in the training text;
counting whether each word feature in the dictionary appears in each answer text;
and determining the first feature value of each answer text according to the statistical result, and generating a bag-of-words model comprising the first feature value of each answer text.
7. A decision tree model building apparatus, comprising:
the construction unit is used for constructing a bag-of-words model by using training text; the bag-of-words model comprises a first feature value of each answer text in the training text;
the construction unit is further used for establishing a first decision tree model according to the first characteristic value of each answer text and the answer scoring label set for each answer text, and obtaining the importance value of the word characteristic of each answer text output by the first decision tree model;
the processing unit is used for screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features;
the construction unit is further configured to establish a second decision tree model according to the second feature values of the answer texts and the answer score labels set for each answer text, so as to be used for answer score prediction;
wherein, in screening out, according to the importance values of the word features of each answer text, keyword features satisfying the preset condition from the word features of each answer text, the processing unit is specifically configured to:
according to the importance values of the word features of the answer texts, first word features with the importance values larger than or equal to a preset value are screened out from the word features of the answer texts;
receiving a deletion instruction, and deleting second word features from the first word features according to the deletion instruction; the second word features are word features with low interpretability or a low contribution;
and determining the first word features remaining after the deletion as the keyword features satisfying the preset condition.
8. An electronic device comprising a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-6.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-6.
CN201910349851.XA 2019-04-28 2019-04-28 Decision tree model construction method, device, electronic equipment and medium Active CN110119770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910349851.XA CN110119770B (en) 2019-04-28 2019-04-28 Decision tree model construction method, device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910349851.XA CN110119770B (en) 2019-04-28 2019-04-28 Decision tree model construction method, device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN110119770A CN110119770A (en) 2019-08-13
CN110119770B true CN110119770B (en) 2024-05-14

Family

ID=67521599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910349851.XA Active CN110119770B (en) 2019-04-28 2019-04-28 Decision tree model construction method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN110119770B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395855A (en) * 2020-12-03 2021-02-23 中国联合网络通信集团有限公司 Comment-based evaluation method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073568A (en) * 2016-11-10 2018-05-25 腾讯科技(深圳)有限公司 keyword extracting method and device
CN109472305A (en) * 2018-10-31 2019-03-15 国信优易数据有限公司 Answer quality determines model training method, answer quality determination method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150199913A1 (en) * 2014-01-10 2015-07-16 LightSide Labs, LLC Method and system for automated essay scoring using nominal classification

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073568A (en) * 2016-11-10 2018-05-25 腾讯科技(深圳)有限公司 keyword extracting method and device
CN109472305A (en) * 2018-10-31 2019-03-15 国信优易数据有限公司 Answer quality determines model training method, answer quality determination method and device

Also Published As

Publication number Publication date
CN110119770A (en) 2019-08-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant