CN110119770B - Decision tree model construction method, device, electronic equipment and medium - Google Patents

Decision tree model construction method, device, electronic equipment and medium

Info

Publication number
CN110119770B
CN110119770B (application no. CN201910349851.XA)
Authority
CN
China
Prior art keywords
answer
answer text
word
text
decision tree
Prior art date
Legal status
Active
Application number
CN201910349851.XA
Other languages
Chinese (zh)
Other versions
CN110119770A (en)
Inventor
金戈 (Jin Ge)
徐亮 (Xu Liang)
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910349851.XA
Publication of CN110119770A
Application granted
Publication of CN110119770B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present application provide a method, an apparatus, an electronic device and a medium for constructing a decision tree model. The method comprises the following steps: constructing a bag-of-words model by using training text; establishing a first decision tree model according to a first feature value of each answer text comprised in the bag-of-words model and an answer score label set for each answer text, and obtaining importance values of the word features of each answer text output by the first decision tree model; screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text; and establishing, according to second feature values of each answer text obtained from the keyword features and the answer score labels set for each answer text, a second decision tree model for answer score prediction. By adopting the method and the apparatus, the accuracy of score prediction can be improved while the interpretability of the model is ensured.

Description

Decision tree model construction method, device, electronic equipment and medium
Technical Field
The present application relates to the field of deep learning, and in particular, to a method and apparatus for constructing a decision tree model, an electronic device, and a medium.
Background
With the development of science and technology, intelligent scoring systems have been developed to spare the effort of manual scoring, and they are increasingly widely used in schools, enterprises and other institutions. Relevant personnel can manually formulate rules in an intelligent scoring system, and the system then scores answers according to those hand-crafted rules; however, the scoring accuracy achievable in this way is limited. To improve scoring accuracy, some practitioners instead score answers with a logistic regression machine learning method. Although logistic regression can achieve higher scoring accuracy, the model obtained in this way has low interpretability.
Disclosure of Invention
The embodiments of the present application provide a decision tree model construction method and apparatus, an electronic device, and a medium, which can improve score prediction accuracy while ensuring the interpretability of the model.
In a first aspect, an embodiment of the present application provides a method for constructing a decision tree model, including:
constructing a bag-of-words model by using training text, the bag-of-words model comprising a first feature value of each answer text in the training text;
establishing a first decision tree model according to the first feature value of each answer text and an answer score label set for each answer text, and obtaining importance values of the word features of each answer text output by the first decision tree model;
screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features;
and establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction.
Optionally, after the establishing the second decision tree model, the method further includes:
when answer score prediction needs to be performed on a target answer text, using the target answer text as input data of the second decision tree model;
and outputting score result information of the target answer text through the second decision tree model.
Optionally, the screening out, according to the importance values of the word features of each answer text, of keyword features satisfying the preset condition from the word features of each answer text includes:
screening out, according to the importance values of the word features of each answer text, first word features whose importance values are greater than or equal to a preset value from the word features of each answer text;
receiving a deletion instruction, and deleting second word features from the first word features according to the deletion instruction;
and determining the first word features remaining after the deletion as the keyword features satisfying the preset condition.
Optionally, the establishing a first decision tree model according to the first feature value of each answer text and the answer scoring label set for each answer text includes:
inputting the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model so as to train the first initial decision tree model;
and taking the trained first initial decision tree model as the first decision tree model.
Optionally, the establishing a second decision tree model according to the second feature value of each answer text and the answer scoring label set for each answer text includes:
inputting the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model so as to train the second initial decision tree model;
and taking the trained second initial decision tree model as the second decision tree model.
Optionally, the establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text includes:
determining the length of each answer text;
and establishing a second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score labels set for each answer text.
Optionally, the constructing the bag-of-words model by using training text includes:
constructing a dictionary by using the training text, the dictionary comprising the word features of each answer text in the training text;
counting whether each word feature in the dictionary appears in each answer text;
and determining the first feature value of each answer text according to the statistical result, and generating a bag-of-words model comprising the first feature value of each answer text.
In a second aspect, an embodiment of the present application provides a decision tree model building apparatus, including:
the construction unit is used for constructing a bag-of-words model by using training text; the bag-of-words model comprises a first feature value of each answer text in the training text;
the construction unit is further used for establishing a first decision tree model according to the first characteristic value of each answer text and the answer scoring label set for each answer text, and obtaining the importance value of the word characteristic of each answer text output by the first decision tree model;
the processing unit is used for screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features;
The construction unit is further configured to establish a second decision tree model according to the second feature values of the answer texts and the answer score labels set for each answer text, so as to be used for answer score prediction.
In a third aspect, an embodiment of the present application provides an electronic device including a processor, an input device, an output device, and a memory, which are connected to each other, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method according to the first aspect.
In summary, the electronic device may construct a bag-of-words model by using the training text and establish a first decision tree model according to the bag-of-words model and the answer score labels set for each answer text, so as to obtain the importance values of the word features of each answer text output by the first decision tree model and screen out keyword features satisfying a preset condition; the electronic device may then establish, according to the second feature value of each answer text obtained from the keyword features and the answer score label set for each answer text, a second decision tree model for answer score prediction, thereby improving score prediction accuracy while ensuring the interpretability of the model.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a decision tree model construction method provided by an embodiment of the application;
FIG. 2 is a flow chart of another method for constructing a decision tree model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a decision tree model building apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
Referring to fig. 1, a flow chart of a decision tree model construction method according to an embodiment of the present application is shown. The method may be applied to an electronic device, which may be a terminal or a server. Specifically, the method may include:
s101, constructing a bag-of-words model by using the training text.
The bag-of-words model comprises a first feature value of each answer text in the training text. The first feature value may be a feature vector. The first feature value of each answer text is determined from the numerical values of the word features of that answer text; each numerical value is determined according to whether the word feature appears in the corresponding answer text, or alternatively according to the number of times the word feature appears in the corresponding answer text, which is not limited in the embodiments of the present application.
In one embodiment, the electronic device constructing a bag-of-words model using training text may include: the electronic device constructs a dictionary by using the training text, the dictionary comprising the word features of each answer text in the training text; the electronic device counts whether each word feature in the dictionary appears in each answer text; and the electronic device determines the first feature value of each answer text according to the statistical result and generates a bag-of-words model comprising the first feature value of each answer text.
For example, the training text includes answer text 1 and answer text 2, where answer text 1 is 'The capital of China is Beijing' and answer text 2 is 'The capital of the UK is London'. The dictionary constructed from the training text contains seven word features (the segmented words of the two sentences): China, UK, of, capital, is, Beijing, London. Using 0 and 1 to indicate whether each of the 7 words appears in answer text 1 and answer text 2 (1 for present, 0 for absent), the first feature value of answer text 1 is determined from the statistical result to be (1,0,1,1,1,1,0) and the first feature value of answer text 2 to be (0,1,1,1,1,0,1), and a bag-of-words model comprising the first feature values of answer text 1 and answer text 2 is generated.
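The following minimal Python sketch (illustrative only, not part of the patent) reproduces this binary bag-of-words step; the token lists stand in for the output of a word-segmentation step, with English glosses in place of the segmented Chinese words:

    # Tokenized answer texts (assumed to come from a word-segmentation step).
    answer_1 = ["China", "of", "capital", "is", "Beijing"]   # answer text 1
    answer_2 = ["UK", "of", "capital", "is", "London"]       # answer text 2

    # Dictionary of word features, in the order used in the example above.
    dictionary = ["China", "UK", "of", "capital", "is", "Beijing", "London"]

    def binary_features(tokens):
        # First feature value: 1 if the word feature appears in the text, else 0.
        present = set(tokens)
        return [1 if word in present else 0 for word in dictionary]

    print(binary_features(answer_1))  # [1, 0, 1, 1, 1, 1, 0]
    print(binary_features(answer_2))  # [0, 1, 1, 1, 1, 0, 1]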
In addition to the generation of the bag-of-words model in the above manner, the bag-of-words model may also be generated by counting the number of times each word feature in the dictionary appears in each answer text.
In one embodiment, the electronic device constructing a bag-of-words model by using training text may further include: the electronic device constructs, from the training text, a dictionary comprising the word features of each answer text in the training text (e.g., word 1, word 2, ..., word N); counts the number of times each word feature in the dictionary appears in each answer text; and determines the first feature value of each answer text according to the counting result, thereby generating a bag-of-words model comprising the first feature value of each answer text.
The two statistical methods differ as follows: if, for example, a word appears twice in answer text 3, the first method yields a statistic of 1 for that word (indicating that the word appears in answer text 3), while the second method yields a statistic of 2 (indicating that the word appears twice in answer text 3). Of course, besides raw counts, a frequency-based statistic may also be adopted, which is not detailed here.
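As a sketch of the count-based variant (again illustrative, reusing the dictionary from the previous sketch):

    from collections import Counter

    def count_features(tokens):
        # Count-based variant: number of occurrences of each dictionary word.
        counts = Counter(tokens)
        return [counts[word] for word in dictionary]

    # A word that appears twice yields 2 here, where the binary variant yields 1.
    print(count_features(["capital", "capital", "is", "Beijing"]))  # [0, 0, 0, 2, 1, 1, 0]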
In one embodiment, the electronic device constructing a dictionary by using training text may include: the electronic device preprocesses the training text to obtain the dictionary. The preprocessing includes, but is not limited to, word segmentation, stop-word removal, and the like, which is not detailed here.
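The patent does not name a particular segmenter or stop-word list; as one possible realization, the sketch below uses the open-source jieba segmenter and a hypothetical stop-word set:

    import jieba  # open-source Chinese word segmenter; an assumed choice, not named in the patent

    STOP_WORDS = {"了", "吗", "呢"}  # hypothetical stop-word set

    def preprocess(raw_answer):
        # Segment the raw answer text into words, then remove stop words.
        return [w for w in jieba.lcut(raw_answer) if w not in STOP_WORDS]

    print(preprocess("中国的首都是北京"))  # e.g. ['中国', '的', '首都', '是', '北京']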
S102, establishing a first decision tree model according to the first characteristic value of each answer text and the answer scoring label set for each answer text, and obtaining the importance value of the word characteristic of each answer text output by the first decision tree model.
In one embodiment, the electronic device establishing a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text includes: the electronic device inputs the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model so as to train the first initial decision tree model, and takes the trained first initial decision tree model as the first decision tree model. For example, the first decision tree model may be a decision tree model with a maximum depth of 10 and a minimum of 100 samples per leaf node. An answer score label may be a score, such as 90 points, or a grade, such as excellent, good, or medium.
The first decision tree model can compute the importance values of the word features of each answer text and output them; the higher the importance value, the greater the influence of the word feature on the score. An importance value may be expressed as, but is not limited to, a number, a letter, or the like.
The first decision tree model may also output the word features of each answer text sorted in descending order of their importance values.
The first decision tree model may also output classification results for each answer text, e.g., distinguishing good answers from bad answers.
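The patent does not prescribe a library; as one plausible realization of step S102, the following sketch trains a scikit-learn decision tree with the example hyperparameters above (maximum depth 10, at least 100 samples per leaf) and reads out the per-word importance values; the training data here are stand-in values:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    num_answers, num_words = 1000, 50

    # Stand-in training data: binary first feature values and answer score labels.
    X = rng.integers(0, 2, size=(num_answers, num_words))  # bag-of-words matrix
    y = rng.integers(0, 2, size=num_answers)               # e.g. pass/fail labels

    # First initial decision tree model, using the hyperparameters from the example.
    first_tree = DecisionTreeClassifier(max_depth=10, min_samples_leaf=100, random_state=0)
    first_tree.fit(X, y)  # training yields the first decision tree model

    # Importance value of each word feature; higher means more influence on the score.
    importance = first_tree.feature_importances_
    order = np.argsort(importance)[::-1]  # word indices in descending importance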
And S103, screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features.
In one embodiment, the electronic device screening out keyword features satisfying a preset condition from the word features of each answer text according to the importance values of the word features includes: the electronic device screens out, from the word features of each answer text according to their importance values, first word features whose importance values are greater than or equal to a preset value; and the electronic device determines the first word features as the keyword features satisfying the preset condition.
For example, the electronic device outputs importance values of 1000 word features, and the electronic device may screen 500 word features having importance values greater than or equal to a preset value from the 1000 word features, and determine the 500 word features as keyword features satisfying a preset condition.
In one embodiment, the screening may also involve manual pruning: the electronic device screens out, from the word features of each answer text according to their importance values, first word features whose importance values are greater than or equal to a preset value; the electronic device receives a deletion instruction and deletes second word features from the first word features according to the deletion instruction; and the electronic device determines the first word features remaining after the deletion as the keyword features satisfying the preset condition. The second word features may be word features with low interpretability or a low contribution.
For example, the electronic device outputs importance values of 1000 word features, and the electronic device may screen 500 word features whose importance values are greater than or equal to a preset value from the 1000 word features, delete 50 word features of the 500 word features after receiving a deletion instruction for 50 word features having lower interpretability, and determine the remaining 450 word features as keyword features satisfying a preset condition.
In one embodiment, the screening may also be rank-based: the electronic device screens out, from the word features of each answer text according to their importance values, the word features ranked within a preset number, and determines those top-ranked word features as the keyword features satisfying the preset condition.
For example, the electronic device outputs importance values of 1000 word features sorted in descending order of importance; the electronic device may screen out the top 500 word features from the 1000 word features and determine those 500 word features as the keyword features satisfying the preset condition.
In one embodiment, the electronic device may screen out, from the word features of each answer text according to their importance values, the word features ranked within a preset number; the electronic device then receives a deletion instruction, deletes third word features from the top-ranked word features according to the deletion instruction, and determines the word features remaining after the deletion as the keyword features satisfying the preset condition. The third word features may be the same as or different from the second word features, depending on the actual situation, and are likewise word features with low interpretability or a low contribution.
In one embodiment, the electronic device obtaining the second feature value of each answer text according to the keyword features may include: the electronic device deletes, from the first feature value of each answer text, the numerical values of the word features other than the keyword features, so as to obtain the second feature value of each answer text. Direct deletion speeds up modeling and reduces the workload of the electronic device.
In addition to deletion, the electronic device may also recompute the statistics. In one embodiment, the electronic device obtaining the second feature value of each answer text according to the keyword features may include: counting whether each keyword feature appears in each answer text and determining the second feature value of each answer text according to the statistical result; or counting the number of times each keyword feature appears in each answer text and determining the second feature value of each answer text according to those counts.
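Continuing the scikit-learn sketch above (all variable names assumed), step S103 can be realized by thresholding the importance values and keeping only the corresponding columns of the feature matrix:

    # Keep word features whose importance value is >= a preset value;
    # order[:k] from the previous sketch would implement the top-k variant instead.
    preset_value = 0.01
    keep = np.where(importance >= preset_value)[0]  # indices of first word features

    # Optional manual deletion step: drop features flagged by a deletion
    # instruction (the flagged indices here are hypothetical).
    deleted_by_instruction = {3, 7}
    keep = np.array([i for i in keep if i not in deleted_by_instruction], dtype=int)

    # Second feature value of each answer text: the first feature values restricted
    # to the keyword-feature columns (the "direct deletion" variant).
    X2 = X[:, keep]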
And S104, establishing a second decision tree model according to the second characteristic value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction.
Specifically, the electronic device establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text may include: the electronic device inputs the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model so as to train the second initial decision tree model, and takes the trained second initial decision tree model as the second decision tree model. The second initial decision tree model may differ from the first initial decision tree model; for example, the second decision tree model may be a decision tree model with a maximum depth of 5 and a minimum of 100 samples per leaf node.
In one embodiment, when answer score prediction needs to be performed on a target answer text, the target answer text is used as input data of the second decision tree model, and score result information of the target answer text is output through the second decision tree model. The target answer text may be an answer text to be predicted, for example a new answer text; the score result information may include information such as a score.
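Continuing the same sketch, step S104 trains the shallower second tree on the keyword features and then scores a target answer text (the target text here is a stand-in):

    # Second initial decision tree model: shallower, hence easier to interpret
    # (maximum depth 5, from the example above).
    second_tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=100, random_state=0)
    second_tree.fit(X2, y)

    # Answer score prediction: compute the target text's second feature value
    # over the keyword features, then run it through the second tree.
    target_first = rng.integers(0, 2, size=(1, num_words))  # stand-in target answer text
    target_second = target_first[:, keep]
    predicted_score = second_tree.predict(target_second)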
In the embodiment shown in fig. 1, the electronic device may construct a bag-of-words model by using the training text and establish a first decision tree model according to the bag-of-words model and the answer score labels set for each answer text, so as to obtain the importance values of the word features of each answer text output by the first decision tree model and screen out keyword features satisfying a preset condition; the electronic device may then establish, according to the second feature value of each answer text obtained from the keyword features and the answer score label set for each answer text, a second decision tree model for answer score prediction, thereby improving score prediction accuracy while ensuring the interpretability of the model.
Referring to fig. 2, a flow chart of another decision tree model construction method according to an embodiment of the application is shown. Specifically, the method may include:
S201, constructing a bag-of-words model by using training text;
S202, establishing a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text, and obtaining the importance values of the word features of each answer text output by the first decision tree model;
S203, screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features.
For steps S201 to S203, reference may be made to steps S101 to S103 in the embodiment of fig. 1, which are not repeated here.
S204, determining the length of each answer text;
S205, a second decision tree model is established according to the length of each answer text, the second characteristic value of each answer text and answer scoring labels set for each answer text, and the second decision tree model is used for answer scoring prediction.
In the embodiment of the present application, the electronic device may establish the second decision tree model directly from the second feature value of each answer text and the answer score label set for each answer text, or it may additionally introduce the length of each answer text when establishing the second decision tree model. Introducing the length of each answer text can effectively improve the score prediction accuracy.
Specifically, the electronic device establishing a second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score label set for each answer text includes: the electronic device inputs the length of each answer text, the second feature value of each answer text, and the answer score label set for each answer text into a second initial decision tree model so as to train the second initial decision tree model, and takes the trained second initial decision tree model as the second decision tree model.
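Continuing the same sketch, the length feature can simply be appended as an extra column of the second feature matrix before training; this is an assumed realization, not one mandated by the text:

    # Length of each answer text, e.g. its word count (stand-in values here).
    lengths = rng.integers(5, 200, size=(num_answers, 1))

    # Append the length as one extra column alongside the keyword features.
    X2_with_length = np.hstack([X2, lengths])

    second_tree_len = DecisionTreeClassifier(max_depth=5, min_samples_leaf=100, random_state=0)
    second_tree_len.fit(X2_with_length, y)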
In the embodiment shown in fig. 2, the electronic device may construct a bag-of-words model by using the training text and establish a first decision tree model according to the bag-of-words model and the answer score labels set for each answer text, so as to obtain the importance values of the word features of each answer text output by the first decision tree model and screen out keyword features satisfying a preset condition; the electronic device may then establish, according to the length of each answer text, the second feature value of each answer text obtained from the keyword features, and the answer score label set for each answer text, a second decision tree model for answer score prediction, thereby improving score prediction accuracy while ensuring the interpretability of the model.
Referring to fig. 3, a schematic structural diagram of a decision tree model building apparatus according to an embodiment of the present application is shown. The device can be applied to electronic equipment. Specifically, the apparatus may include:
a construction unit 31 for constructing a bag-of-words model by using training text; the bag-of-words model comprises a first feature value of each answer text in the training text;
The construction unit 31 is further configured to establish a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text, and obtain an importance value of the word feature of each answer text output by the first decision tree model;
the processing unit 32 is configured to screen out keyword features meeting a preset condition from the word features of each answer text according to the importance values of the word features of each answer text, and obtain a second feature value of each answer text according to the keyword features;
the construction unit 31 is further configured to establish a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction.
In an alternative embodiment, the processing unit 32 is further configured to, after the second decision tree model is built, use the target answer text as the input data of the second decision tree model when answer score prediction is required for the target answer text; and outputting scoring result information of the target answer text through the second decision tree model.
In an alternative embodiment, the processing unit 32 screens out keyword features meeting a preset condition from the word features of each answer text according to the importance values of the word features of each answer text, specifically screens out first word features with importance values greater than or equal to a preset value from the word features of each answer text according to the importance values of the word features of each answer text; receiving a deleting instruction, and deleting a second word feature from the first word features according to the deleting instruction; and determining the first word characteristic with the deleting operation as the keyword characteristic meeting the preset condition.
In an alternative embodiment, the construction unit 31 establishes a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text, specifically, inputs the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model to train the first initial decision tree model; and taking the trained first initial decision tree model as a first decision tree model.
In an alternative embodiment, the construction unit 31 establishes a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, specifically, inputs the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model to train the second initial decision tree model; and taking the trained second initial decision tree model as a second decision tree model.
In an alternative embodiment, the construction unit 31 establishes a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, specifically by determining the length of each answer text, and establishing the second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score labels set for each answer text.
In an alternative embodiment, the construction unit 31 constructs the bag-of-words model using training text, in particular by constructing a dictionary using the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary appears in each answer text; and determining the first feature value of each answer text according to the statistical result, and generating a bag-of-words model comprising the first feature value of each answer text.
In the embodiment shown in fig. 3, the electronic device may construct a bag-of-words model by using the training text and establish a first decision tree model according to the bag-of-words model and the answer score labels set for each answer text, so as to obtain the importance values of the word features of each answer text output by the first decision tree model and screen out keyword features satisfying a preset condition; the electronic device may then establish, according to the second feature value of each answer text obtained from the keyword features and the answer score label set for each answer text, a second decision tree model for answer score prediction, thereby improving score prediction accuracy while ensuring the interpretability of the model.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device described in the present embodiment may include: one or more processors 1000, one or more input devices 2000, one or more output devices 3000, and memory 4000. The processor 1000, input device 2000, output device 3000, and memory 4000 may be connected by a bus or other means.
The input device 2000 and the output device 3000 may be standard wired or wireless communication interfaces.
The processor 1000 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 4000 may be a high-speed RAM or a non-volatile memory, such as a disk memory. The memory 4000 is used to store a set of program codes, and the input device 2000, the output device 3000, and the processor 1000 can call the program codes stored in the memory 4000. Specifically:
A processor 1000 for: constructing a bag-of-words model by using training text, the bag-of-words model comprising a first feature value of each answer text in the training text; establishing a first decision tree model according to the first feature value of each answer text and an answer score label set for each answer text, and obtaining the importance values of the word features of each answer text output by the first decision tree model; screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features; and establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction.
Optionally, the processor 1000 is further configured to, after the second decision tree model is established, use the target answer text as input data of the second decision tree model when answer score prediction needs to be performed on the target answer text; and outputting scoring result information of the target answer text through the second decision tree model.
Optionally, the processor 1000 screens out keyword features meeting a preset condition from the word features of each answer text according to the importance values of the word features of each answer text, specifically screens out first word features with importance values greater than or equal to a preset value from the word features of each answer text according to the importance values of the word features of each answer text; receiving a deletion instruction through the input device 2000, and deleting a second word feature from the first word features according to the deletion instruction; and determining the first word characteristic with the deleting operation as the keyword characteristic meeting the preset condition.
Optionally, the processor 1000 establishes a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text, specifically, inputs the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model to train the first initial decision tree model; and taking the trained first initial decision tree model as a first decision tree model.
Optionally, the processor 1000 establishes a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, specifically, inputs the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model to train the second initial decision tree model; and taking the trained second initial decision tree model as a second decision tree model.
Optionally, the processor 1000 establishes a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, specifically by determining the length of each answer text, and establishing the second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score labels set for each answer text.
Optionally, the processor 1000 builds a bag-of-words model using training text, in particular by building a dictionary using the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary appears in each answer text; and determining the first feature value of each answer text according to the statistical result, and generating a bag-of-words model comprising the first feature value of each answer text.
In a specific implementation, the processor 1000, the input device 2000, and the output device 3000 described in the embodiments of the present application may perform the implementations described in the embodiments of fig. 1 and fig. 2, and may also perform the implementations of the apparatus described in the embodiments of the present application, which are not repeated here.
The functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The above disclosure describes only preferred embodiments of the present application, which of course do not limit the scope of the application; those skilled in the art will appreciate that equivalent changes made within the scope of the claims still fall within the scope of the present application.

Claims (9)

1. A method for constructing a decision tree model, comprising:
constructing a bag-of-words model by using training text, the bag-of-words model comprising a first feature value of each answer text in the training text;
establishing a first decision tree model according to the first feature value of each answer text and an answer score label set for each answer text, and obtaining importance values of the word features of each answer text output by the first decision tree model;
screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features;
establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text, so as to be used for answer score prediction;
wherein the screening out, according to the importance values of the word features of each answer text, of keyword features satisfying the preset condition from the word features of each answer text comprises the following steps:
screening out, according to the importance values of the word features of each answer text, first word features whose importance values are greater than or equal to a preset value from the word features of each answer text;
receiving a deletion instruction, and deleting second word features from the first word features according to the deletion instruction; the second word features are word features with low interpretability or a low contribution;
and determining the first word features remaining after the deletion as the keyword features satisfying the preset condition.
2. The method of claim 1, wherein after the establishing the second decision tree model, the method further comprises:
when answer score prediction needs to be performed on a target answer text, using the target answer text as input data of the second decision tree model;
and outputting score result information of the target answer text through the second decision tree model.
3. The method according to any one of claims 1-2, wherein the establishing a first decision tree model according to the first feature value of each answer text and the answer score label set for each answer text includes:
inputting the first feature value of each answer text and the answer score label set for each answer text into a first initial decision tree model so as to train the first initial decision tree model;
and taking the trained first initial decision tree model as the first decision tree model.
4. A method as claimed in claim 3, wherein said building a second decision tree model based on the second feature values of the respective answer texts and answer scoring labels provided for each answer text comprises:
inputting the second feature value of each answer text and the answer score label set for each answer text into a second initial decision tree model so as to train the second initial decision tree model;
and taking the trained second initial decision tree model as the second decision tree model.
5. The method according to any one of claims 1-2, wherein the establishing a second decision tree model according to the second feature value of each answer text and the answer score label set for each answer text includes:
determining the length of each answer text;
and establishing a second decision tree model according to the length of each answer text, the second feature value of each answer text, and the answer score labels set for each answer text.
6. The method of claim 1, wherein constructing a bag of words model using training text comprises:
constructing a dictionary by using the training text, the dictionary comprising the word features of each answer text in the training text;
counting whether each word feature in the dictionary appears in each answer text;
and determining the first feature value of each answer text according to the statistical result, and generating a bag-of-words model comprising the first feature value of each answer text.
7. A decision tree model building apparatus, comprising:
the construction unit is used for constructing a bag-of-words model by using training text; the bag-of-words model comprises a first feature value of each answer text in the training text;
the construction unit is further used for establishing a first decision tree model according to the first characteristic value of each answer text and the answer scoring label set for each answer text, and obtaining the importance value of the word characteristic of each answer text output by the first decision tree model;
the processing unit is used for screening out, according to the importance values of the word features of each answer text, keyword features satisfying a preset condition from the word features of each answer text, and obtaining a second feature value of each answer text according to the keyword features;
the construction unit is further configured to establish a second decision tree model according to the second feature values of the answer texts and the answer score labels set for each answer text, so as to be used for answer score prediction;
wherein, in screening out, according to the importance values of the word features of each answer text, keyword features satisfying the preset condition from the word features of each answer text, the processing unit is specifically configured to:
according to the importance values of the word features of the answer texts, first word features with the importance values larger than or equal to a preset value are screened out from the word features of the answer texts;
receiving a deletion instruction, and deleting second word features from the first word features according to the deletion instruction; the second word features are word features with low interpretability or a low contribution;
and determining the first word features remaining after the deletion as the keyword features satisfying the preset condition.
8. An electronic device comprising a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-6.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-6.
CN201910349851.XA 2019-04-28 2019-04-28 Decision tree model construction method, device, electronic equipment and medium Active CN110119770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910349851.XA CN110119770B (en) 2019-04-28 2019-04-28 Decision tree model construction method, device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910349851.XA CN110119770B (en) 2019-04-28 2019-04-28 Decision tree model construction method, device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN110119770A CN110119770A (en) 2019-08-13
CN110119770B true CN110119770B (en) 2024-05-14

Family

ID=67521599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910349851.XA Active CN110119770B (en) 2019-04-28 2019-04-28 Decision tree model construction method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN110119770B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395855A (en) * 2020-12-03 2021-02-23 中国联合网络通信集团有限公司 Comment-based evaluation method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073568A (en) * 2016-11-10 2018-05-25 腾讯科技(深圳)有限公司 keyword extracting method and device
CN109472305A (en) * 2018-10-31 2019-03-15 国信优易数据有限公司 Answer quality determines model training method, answer quality determination method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150199913A1 (en) * 2014-01-10 2015-07-16 LightSide Labs, LLC Method and system for automated essay scoring using nominal classification

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073568A (en) * 2016-11-10 2018-05-25 腾讯科技(深圳)有限公司 keyword extracting method and device
CN109472305A (en) * 2018-10-31 2019-03-15 国信优易数据有限公司 Answer quality determines model training method, answer quality determination method and device

Also Published As

Publication number Publication date
CN110119770A (en) 2019-08-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant