CN114330716A - University student employment prediction method based on CART decision tree - Google Patents

University student employment prediction method based on CART decision tree Download PDF

Info

Publication number
CN114330716A
Authority
CN
China
Prior art keywords
employment
data
decision tree
student
attributes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111608264.1A
Other languages
Chinese (zh)
Inventor
党向盈
鲍蓉
姜代红
徐玮玮
佟恒乐
王晓雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuzhou University of Technology
Original Assignee
Xuzhou University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuzhou University of Technology filed Critical Xuzhou University of Technology
Priority to CN202111608264.1A priority Critical patent/CN114330716A/en
Publication of CN114330716A publication Critical patent/CN114330716A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a CART decision tree-based college student employment prediction method, aiming to provide a method for predicting the employment situation of college students. Firstly, the data information of college students is preprocessed to form a standard basic attribute data set for data mining; then, the correlation between the basic student attributes in the data set and the employment prediction target attribute is determined by Pearson correlation analysis, and the basic attributes related to the employment prediction target attribute are taken as the feature vector for constructing the employment prediction model; finally, based on the training set, the Gini coefficients of the feature variables are calculated and the college student employment prediction model is constructed with the CART decision tree algorithm. The method can predict the employment situation of college students from their information data set, provide intelligent services for the employment management departments of colleges and universities, guide students to seek employment reasonably, and help improve the employment rate of college students.

Description

University student employment prediction method based on CART decision tree
Technical Field
The invention relates to the technical field of artificial intelligence and big data analysis, and in particular to a CART decision tree-based college student employment prediction model that predicts the employment situation of college students from historical college student employment data.
Background
The number of college graduates reached 8.3 million in 2019 and exceeded 8.4 million in 2020. Together with roughly 300,000 students returning from study abroad, graduates of previous years who have not yet found work, and people re-entering the job market, nearly ten million people competed for employment opportunities in 2020. The domestic employment pressure is enormous: the newly added labor force far exceeds the newly created jobs, and with the continued expansion of higher education, the growing share of highly educated talent has sharply intensified employment competition among contemporary college students. Secondly, the structural imbalance of employment is also serious. In terms of employment regions, most college students are willing to work in first- and second-tier cities but reluctant to go to third- and fourth-tier cities. Moreover, a student's performance at school, such as grades and whether the student served as a student cadre, influences his or her employment situation, and the employment situation also differs markedly across disciplines, with some majors faring noticeably better than others.
Decision trees are a supervised learning model in the field of artificial intelligence; the three main types are ID3, C4.5 and CART (Classification And Regression Tree), which are typical classification and prediction algorithms. A node in a decision tree represents a test on an attribute, each branch represents one of the possible values of that attribute, and a leaf node represents the prediction result of the rule path from the root node to that leaf. A decision tree may have a single output or multiple outputs. Decision tree models are often employed in data mining to accomplish predictive mining tasks. Building a decision tree involves several steps: first, a suitable decision tree algorithm must be selected according to the prepared data set and the mining goal, since different algorithms suit different mining tasks; then, for the constructed decision tree model, a test set of a certain size must be held out from the original data set and used to evaluate the accuracy of the model and to analyze whether it meets the mining purpose.
Existing college student employment prediction methods give little consideration to the influence of the relevant attributes in college student employment information. The present method uses the CART algorithm, a classical decision tree algorithm, to construct the employment prediction model. The prediction function is realized by first cleaning the collected data and performing correlation analysis and other preprocessing, and then constructing and training a model based on the CART decision tree; the constructed model can be used to predict the employment area, position, salary and so on of college students, and thus further supports recommendation of suitable positions to them.
Disclosure of Invention
In order to overcome the shortcomings of existing college student employment prediction techniques, the invention provides a CART decision tree-based college student employment prediction method: the attributes related to the employment situation are determined from the basic attributes in the college student information data, and a CART decision tree prediction model capable of predicting the employment situation of college students is constructed.
The technical scheme adopted by the invention is as follows: a college student employment prediction method based on a CART decision tree comprises the following steps:
S1: preprocessing the information data of college students;
collecting the raw data of college students, constructing a basic student attribute set, and standardizing each data item to form a standardized data set, wherein the basic college student attribute set is recorded as N = {n_1, n_2, …, n_c}, in which n_i is the i-th basic attribute and c is the number of basic attributes;
S2: determining the relevant attributes influencing the college student employment prediction target;
setting the college student employment prediction target attribute set as Y = {y_1, y_2, …, y_|Y|}, where |Y| is the number of values of the prediction target attribute and y_u is a prediction target attribute value;
calculating the Pearson correlation coefficient λ_{i,u} between element n_i of N and element y_u of Y as:

λ_{i,u} = cov(n_i, y_u) / (σ_{n_i} · σ_{y_u})

where cov(n_i, y_u) is the covariance of n_i and y_u, and σ_{n_i} and σ_{y_u} are the standard deviations of n_i and y_u respectively;
setting the Pearson correlation coefficient threshold as h: when λ_{i,u} is not less than h, n_i is defined as correlated with Y; otherwise n_i is defined as uncorrelated with Y; on this basis the basic student attributes correlated with Y are collected, and the relevant attributes influencing the employment prediction target Y are recorded as the feature vector X = {x_1, x_2, …, x_m}, where m is the number of feature variables and m ≤ c; each x_i takes K_i possible values, recorded as V(x_i) = {v_i^1, v_i^2, …, v_i^{K_i}};
S3: constructing a university student employment prediction model based on the CART decision tree;
setting the basic attribute data of college students as α groups, of which r groups form the training set S and the remaining α − r groups form the test set; the training set S is used to construct the employment prediction model, and the test set is used to verify the accuracy of the employment prediction model;
calculating the Gini coefficient Gini(S, x_i = v_i^k) in the training set S for each feature x_i and each of its values v_i^k; solving the Gini coefficients of the basic student attributes in the training set S in this way, setting the Gini coefficient threshold as l, and then constructing the college student employment decision tree, i.e. the employment prediction model, based on Gini(S, x_i = v_i^k).
Preferably, in step S3, 70% of the data is set as the training set, and 30% of the data is set as the test set.
Preferably, in step S3, the method for solving the Gini coefficients of the basic student attributes in the training set S is as follows:
when x_i takes the value v_i^k, the corresponding subset is recorded as S_i^{k1}; when x_i does not take the value v_i^k, the corresponding subset is recorded as S_i^{k2}; S can thereby be divided into the two parts S_i^{k1} and S_i^{k2}, whose numbers of training samples are |S_i^{k1}| and |S_i^{k2}| respectively;
in S, when x_i = v_i^k, the probability that Y takes the value y_u is p_u^{k1}; when x_i ≠ v_i^k, the probability that Y takes the value y_u is p_u^{k2}; then the Gini coefficient of S_i^{k1} can be expressed as:

Gini(S_i^{k1}) = 1 − Σ_{u=1}^{|Y|} (p_u^{k1})²

similarly, the Gini coefficient of S_i^{k2} can be expressed as:

Gini(S_i^{k2}) = 1 − Σ_{u=1}^{|Y|} (p_u^{k2})²

from Gini(S_i^{k1}) and Gini(S_i^{k2}) it can be seen that, for S, the Gini coefficient of taking V(x_i) = v_i^k, recorded as Gini(S, x_i = v_i^k), can be expressed as:

Gini(S, x_i = v_i^k) = (|S_i^{k1}| / |S|) · Gini(S_i^{k1}) + (|S_i^{k2}| / |S|) · Gini(S_i^{k2})
preferably, the step S3 is based on
Figure BDA00034314579000000326
The method for constructing the university student employment CART decision tree comprises the following steps:
inputting: s, X ═ X1,x2,…,xm},l,m;
And (3) outputting: a decision tree T;
step 1: computing
Figure BDA00034314579000000327
If it is not
Figure BDA00034314579000000328
T is a single node tree; otherwise, turning to Step 2;
step2 for
Figure BDA0003431457900000041
Solving for their minimum value, noting that the minimum value is
Figure BDA0003431457900000042
Get
Figure BDA0003431457900000043
Is a cutting point of a binary tree;
step3 according to x in SiWhether the value is equal to
Figure BDA0003431457900000044
Divide S into two subsets
Figure BDA0003431457900000045
And
Figure BDA0003431457900000046
and will be
Figure BDA0003431457900000047
And
Figure BDA0003431457900000048
distributing the child nodes into two child nodes, wherein if the child node has a damping coefficient less than l, the child node is a leaf node, if the two child nodes are both leaf nodes, returning to the decision tree T, otherwise, performing Step 4;
step4 for non-leaf nodes, respectively in order
Figure BDA0003431457900000049
And
Figure BDA00034314579000000410
order to
Figure BDA00034314579000000411
And recursively calling Step1 to Step4 to generate a binary decision tree T.
After construction, the effectiveness of the built prediction model needs to be evaluated. The test set contains α − r samples in total; the basic attributes of each student are used as the input of the CART decision tree prediction model, the outputs for the target attribute are collected and compared with the true target attribute values in the test set, and if b of them agree, the accuracy of the prediction model can be expressed as:

SR = b / (α − r)

If the accuracy exceeds a certain threshold, the constructed prediction model is valid.
The invention has the following beneficial effects. Different from the prior art, the invention fully considers the basic attributes in college student employment information that are related to the employment prediction target, and adopts the Pearson correlation coefficient method to determine the relevant attributes influencing college student employment. A suitable Gini coefficient calculation method can be designed according to the characteristics of the college student information data, and a college student employment prediction model can be constructed based on the CART decision tree accordingly. The invention provides intelligent services for the employment management departments of colleges and universities, guides students to seek employment reasonably, helps improve the employment rate of college students, and can also provide intelligent position recommendation for recruitment platforms.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a general flowchart of a CART decision tree-based college student employment prediction method provided by the present invention;
FIG. 2 is a data fragment diagram of a student data information summary table;
FIG. 3 is a data fragment diagram after student data information summary table normalization processing;
FIG. 4 is a thermodynamic diagram of the correlation between student attributes of a data set;
FIG. 5 is a diagram of a decision tree model with employment area attributes as prediction targets;
FIG. 6 is the prediction tool page;
FIG. 7 is the prediction result display page.
Detailed Description
For further explanation of the details of the technical solutions of the present invention and their advantages, reference is now made to the detailed description of the embodiments taken in conjunction with the accompanying drawings.
Step S1: preprocessing of university student information data
1.1 college student data acquisition and integration
The raw data are collected from different administrative departments of the school, including the student basic information table, the per-semester student score tables and the student employment situation table. Each table contains many student attributes, such as student number, name, major, class, unit name and unit telephone, and the tables contain some repeated attributes, so the raw data need to be processed.
Different data attributes are first integrated into one data table to form the final standard data set. For example, the student basic information table, the student score table and the student employment situation table contain many identical attributes; the information in the different tables is merged using the attribute that uniquely identifies a student, namely the student number, and the duplicated attributes are deleted to form a college student data information summary table. Attributes potentially useful for employment tendency prediction are retained, while data such as the student number and mobile phone number are redundant, and these attributes are deleted during data preprocessing.
1.2 data normalization processing
In the data mining process, some attributes have a large number of different values; these attributes can be normalized so that their values fall into a limited and smaller value domain. For example, for the gender attribute, boys are labeled 1 and girls 0. For the place-of-origin attribute, according to the latest 2020 Chinese city tier classification, the place of origin is divided into five categories by the tier of the city of household registration, with first-, second-, third-, fourth- and fifth-tier cities represented by the numbers 1, 2, 3, 4 and 5 respectively. Other student attribute values are normalized in a similar manner.
The normalized "college student data information summary table" has c attributes such as major, place of origin, unit name, whether a student cadre, score, and so on. This set of basic college student attributes is recorded as N = {n_1, n_2, …, n_c}, where n_i is the i-th basic attribute and c is the number of basic attributes.
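As an illustrative sketch only (not part of the patent), this kind of encoding can be done with pandas; the column names and all code tables except the gender and city-tier rules stated above are assumptions.

import pandas as pd

# Hypothetical raw records; column names and values are assumptions for illustration.
raw = pd.DataFrame({
    "Gender": ["male", "female", "male"],
    "Origin": ["Beijing", "Xuzhou", "Luoyang"],
    "Leader": ["yes", "no", "yes"],
    "Score":  [88, 72, 91],
})

gender_code = {"male": 1, "female": 0}                   # coding stated in the text
city_tier   = {"Beijing": 1, "Xuzhou": 3, "Luoyang": 3}  # assumed city-tier lookup (1-5)
leader_code = {"yes": 1, "no": 0}                        # assumed binary coding

standardized = pd.DataFrame({
    "Gender": raw["Gender"].map(gender_code),
    "Origin": raw["Origin"].map(city_tier),
    "Leader": raw["Leader"].map(leader_code),
    "Score":  (raw["Score"] >= 80).astype(int),          # assumed score binarization rule
})
print(standardized)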
Step S2: determining relevant attributes affecting undergraduate employment prediction goals
The college student employment prediction target attribute can be the employment area, position, salary and so on, and the relevant student attributes differ for different prediction targets. The college student employment prediction target attribute set is written as Y = {y_1, y_2, …, y_|Y|}, where |Y| is the number of values of the prediction target attribute and y_u is a prediction target attribute value. For example, if the prediction target attribute Y is the employment area, the employment area has the attribute values y_1, y_2, y_3, y_4, y_5, corresponding to first-, second-, third-, fourth- and fifth-tier cities.
In order to find the student attributes that affect Y, the invention adopts the Pearson correlation coefficient method. Let cov(n_i, y_u) be the covariance of n_i and y_u, and let σ_{n_i} and σ_{y_u} be the standard deviations of n_i and y_u respectively. Then the correlation coefficient λ_{i,u} of n_i and y_u can be expressed as:

λ_{i,u} = cov(n_i, y_u) / (σ_{n_i} · σ_{y_u})    (1)
From the above formula, the value of λ_{i,u} always lies between −1.0 and 1.0; a coefficient close to 0 indicates that the variables are uncorrelated, while a coefficient close to 1 or −1 indicates a strong correlation.
The threshold is set to h. When λ_{i,u} is not less than h, n_i is defined as correlated with Y and is selected as a feature variable for constructing the employment prediction model; otherwise, n_i is not selected as a feature variable. The student attributes correlated with Y are recorded as X = {x_1, x_2, …, x_m}, where m is the number of feature variables and m ≤ c; each x_i takes K_i possible values, recorded as V(x_i) = {v_i^1, v_i^2, …, v_i^{K_i}}.
For example, the place-of-origin attribute is correlated with the employment area attribute of college students, and its values are first-, second-, third-, fourth- and fifth-tier cities, recorded as {1, 2, 3, 4, 5}.
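For illustration only (not part of the patent), step S2 can be sketched in Python roughly as follows; the data frame, column names and threshold h are assumptions, and the selection rule follows the λ_{i,u} ≥ h test described above.

import pandas as pd

def select_features(df: pd.DataFrame, target: str, h: float) -> list:
    """Return the columns whose Pearson correlation with the target is at least h."""
    feats = []
    for col in df.columns:
        if col == target:
            continue
        lam = df[col].corr(df[target])   # Pearson correlation coefficient, equation (1)
        if lam >= h:                     # rule as stated in the text; abs(lam) >= h is a common variant
            feats.append(col)
    return feats

# Tiny synthetic example (values are made up):
df = pd.DataFrame({
    "Origin":  [1, 3, 3, 2, 5, 1],
    "Gender":  [1, 0, 1, 0, 1, 0],
    "Score":   [1, 0, 1, 1, 0, 1],
    "Address": [1, 3, 3, 2, 4, 1],       # employment area, the prediction target
})
print(select_features(df, target="Address", h=0.5))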
Step S3 construction of university student employment prediction model based on CART decision tree
Let the number of records in the college student data information summary table be α, let S be the training set, and let the number of training records be r = α × u; the number of test records is then α − r. The training set is used to construct the employment prediction model and the test set is used to verify its accuracy; u is chosen according to the actual situation and is generally required to be larger than 70%.
The CART decision tree algorithm selects features with the Gini coefficient minimization criterion and generates a binary tree. Based on the value of x_i, the training set S can be divided into two parts: when x_i takes the value v_i^k, the subset is denoted S_i^{k1}; when x_i does not take the value v_i^k, the subset is denoted S_i^{k2}. S is thus divided into two parts, whose numbers of training samples are |S_i^{k1}| and |S_i^{k2}| respectively.
3.1 Determining the Gini coefficients of a student attribute on the training set
In the training set S, when x_i = v_i^k, the probability that Y takes the value y_u is denoted p_u^{k1}; when x_i ≠ v_i^k, the probability that Y takes the value y_u is denoted p_u^{k2}. Then the Gini coefficient of S_i^{k1} can be expressed as:

Gini(S_i^{k1}) = 1 − Σ_{u=1}^{|Y|} (p_u^{k1})²    (2)

Similarly, the Gini coefficient of S_i^{k2} can be expressed as:

Gini(S_i^{k2}) = 1 − Σ_{u=1}^{|Y|} (p_u^{k2})²    (3)

From Gini(S_i^{k1}) and Gini(S_i^{k2}) it can be seen that, for S, the Gini coefficient of splitting on x_i = v_i^k, recorded as Gini(S, x_i = v_i^k), can be expressed as:

Gini(S, x_i = v_i^k) = (|S_i^{k1}| / |S|) · Gini(S_i^{k1}) + (|S_i^{k2}| / |S|) · Gini(S_i^{k2})    (4)

The Gini coefficient characterizes the impurity of the attribute x_i: the smaller the Gini coefficient, the higher the purity and the stronger the discriminating power of the feature. Let the threshold of the Gini coefficient be l.
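As a sketch of how equations (2)–(4) translate into code (an illustration, not the patent's implementation; names such as gini_split and split_counts are assumed):

from collections import Counter

def gini(counts):
    """Gini coefficient of one subset, 1 - sum of squared class probabilities (eq. (2)/(3))."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(left_counts, right_counts):
    """Weighted Gini of the binary split S -> (S_i^k1, S_i^k2), equation (4)."""
    n1, n2 = sum(left_counts), sum(right_counts)
    return n1 / (n1 + n2) * gini(left_counts) + n2 / (n1 + n2) * gini(right_counts)

def split_counts(samples, labels, feature_index, value):
    """Class counts of the target Y on each side of the candidate cut point x_i == value."""
    left, right = Counter(), Counter()
    for x, y in zip(samples, labels):
        (left if x[feature_index] == value else right)[y] += 1
    return list(left.values()), list(right.values())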
3.2 college student employment decision tree construction algorithm
Inputting: s, X ═ X1,x2,…,xm},l,m
And (3) outputting: decision tree T
Step 1: computing
Figure BDA00034314579000000721
If it is not
Figure BDA00034314579000000722
T is a single node tree; otherwise, turning to Step 2;
step2 for
Figure BDA00034314579000000723
Solving for their minimum value, noting that the minimum value is
Figure BDA00034314579000000724
Get
Figure BDA00034314579000000725
Is a cutting point of a binary tree;
step3 according to x in SiWhether the value is equal to
Figure BDA00034314579000000726
Divide S into two subsets
Figure BDA00034314579000000727
And
Figure BDA00034314579000000728
and will be
Figure BDA00034314579000000729
And
Figure BDA00034314579000000730
distributing the data into two child nodes, if the child node has a damping coefficient less than l, the child node is a leaf node, ifIf the two child nodes are leaf nodes, returning to the decision tree T, otherwise, performing Step 4;
step4 for non-leaf nodes, respectively in order
Figure BDA0003431457900000081
And
Figure BDA0003431457900000082
order to
Figure BDA0003431457900000083
And recursively calling Step1 to Step4 to generate a binary decision tree T.
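Purely as an illustrative reading of Steps 1–4 (not the patent's own code), the recursion can be sketched as follows; it assumes the gini, gini_split and split_counts helpers from the sketch following section 3.1 are already defined, and the node representation and stopping details are assumptions where the text is unspecific.

from collections import Counter

def build_tree(samples, labels, features, l):
    majority = Counter(labels).most_common(1)[0][0]
    # Steps 1-2: compute Gini(S, x_i = v_i^k) for every feature/value pair
    splits = []
    for i in features:
        for v in set(x[i] for x in samples):
            left, right = split_counts(samples, labels, i, v)
            if left and right:                            # skip cut points that leave a side empty
                splits.append((gini_split(left, right), i, v))
    # Step 1: if no usable cut point remains, or every candidate Gini is below l, stop as a leaf
    if not splits or max(g for g, _, _ in splits) < l:
        return {"leaf": True, "label": majority}
    # Step 2: take the cut point with the minimum Gini coefficient
    _, i, v = min(splits, key=lambda t: t[0])
    # Step 3: divide S into the subsets with x_i == v and x_i != v
    eq = [(x, y) for x, y in zip(samples, labels) if x[i] == v]
    ne = [(x, y) for x, y in zip(samples, labels) if x[i] != v]
    # Step 4: recurse on each child; sufficiently pure children become leaves in their own call
    return {
        "leaf": False, "feature": i, "value": v,
        "eq": build_tree([x for x, _ in eq], [y for _, y in eq], features, l),
        "ne": build_tree([x for x, _ in ne], [y for _, y in ne], features, l),
    }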
After construction, the effectiveness of the built model needs to be evaluated. The test set contains α − r samples in total; the basic attributes of each student are used as the input of the CART decision tree prediction model, the outputs for the target attribute are collected and compared with the true target attribute values in the test set, and if b of them agree, the accuracy SR of the prediction model can be expressed as:

SR = b / (α − r)    (5)

If the accuracy exceeds a certain threshold, the constructed prediction model is valid.
Example analysis
1. Data pre-processing
The research object of this study is the employment-related data of the 2018 computer-major college students of a certain university.
Table 1 is the student basic information data table
Attribute name | Field name | Data type
Name | Name | Char(20)
Student number | ID | Char(20)
Gender | Gender | Char(20)
Political status | Politics | Char(20)
Place of origin | Origin | Char(20)
Whether a student cadre | Leader | Char(20)
Table 2 is the student achievement data table
Attribute name | Field name | Data type
Name | Name | Char(20)
Student number | ID | Char(20)
Major | Major | Char(20)
Class | Clbum | Char(20)
Score | Score | Char(20)
Table 3 is the student employment data table
Attribute name | Field name | Data type
Name | Name | Char(20)
Student number | ID | Char(20)
Gender | Gender | Char(20)
Employment situation | Job | Char(20)
Unit name | Firm | Char(20)
Unit address | Address | Char(20)
Job category | JobType | Char(20)
Unit type | FirmType | Char(20)
Unit telephone | Tel | Char(20)
The different kinds of student information come from different administrative departments of the school and are relatively scattered, so the received raw data need to be integrated. The different data are merged into a unified data table to form the final standard data set. The three data tables contain many identical attributes; the information in the different tables is merged using the uniquely identifying student attribute "student number", and the duplicated attributes are deleted to form the student data information summary table shown in Table 4.
Table 4 is the student data information summary table (rendered as an image in the original publication; it contains the 14 attributes obtained by merging Tables 1–3 on the student number).
The merged student data information table still contains redundant attributes, which are therefore deleted. The merged table has 14 attributes, of which the 6 attributes student number, name, major, class, unit name and unit telephone do not influence the employment tendency area of college students, so these 6 attributes are deleted. In addition, in order to describe the employment situation of students more intuitively and effectively, a "job matching" attribute is introduced to describe whether the major studied by the student matches the job category of the employment obtained. Finally, a student data information summary table containing 9 attributes is formed, as shown in Table 5.
Table 5 is the student data information summary table
Attribute name | Field name | Data type
Gender | Gender | Char(20)
Political status | Politics | Char(20)
Place of origin | Origin | Char(20)
Whether a student cadre | Leader | Char(20)
Score | Score | Char(20)
Employment situation | Job | Char(20)
Job category | JobType | Char(20)
Job matching | Match | Char(20)
Unit address | Address | Char(20)
According to the preprocessing of the university student data, a student data information summary table is obtained, and the data segments are shown in fig. 2.
In the data mining process, an attribute that has a large number of different values can hinder the mining; the values of such attributes can be normalized so that they fall into a limited and smaller value domain, which facilitates data analysis and mining and the generation of the decision tree. The 9 attributes involved are normalized as shown in Table 6.
TABLE 6 conversion and comparison table for attribute values
(The attribute value conversion table is rendered as an image in the original publication; it maps the values of the 9 attributes of Table 5 to the normalized numeric codes.)
Each attribute is converted according to the attribute value conversion table of Table 6 to form a standard data set; the converted data fragment is shown in FIG. 3.
2. Correlation analysis of employment attributes affecting college students
In order to establish a decision tree prediction model of the employment tendency areas of the college students, data mining needs to be carried out on information of the college students, the correlation between the basic attributes of the students and the attributes of the employment areas in the table 6 is determined, and then the basic attributes of the students with high correlation are determined.
Based on equation (1), the correlations between the basic student attributes and the prediction target attribute are calculated. FIG. 4 is a heat map of the correlations between the attributes of the data set, in which both the abscissa (from left to right) and the ordinate (from top to bottom) list "employment area", "place of origin", "whether a student cadre", "score", "job category", "job matching", "gender", "political status" and "employment situation". In the heat map, a darker color reflects a higher degree of correlation and a lighter color a lower degree, and the numerical value in each grid cell represents the degree of correlation between the two attributes of its coordinates.
Table 7 is the correlation interval table
Correlation coefficient interval | Meaning
0.8–1.0 | Very high correlation
0.5–0.8 | Moderate correlation
0.2–0.5 | Low correlation
0.0–0.2 | Very low or no correlation
From the analysis of FIG. 4 and Table 7 it can be seen that the correlations of "place of origin", "score", "job category", "gender", "job matching" and "whether a student cadre" with the college student "employment area" are relatively strong, so these six attributes are selected as the feature variables for constructing the prediction model of the college student employment area.
3. Construction of employment prediction model based on CART decision tree
The data set comprises 500 samples, of which 70%, i.e. 350 samples, form the training set. Each sample has 6 feature attributes, namely "place of origin", "score", "job category", "gender", "job matching" and "whether a student cadre", and 1 prediction target attribute, "unit address" (employment area).
The value set of "place of origin" is {1, 2, 3, 4, 5}; the value sets of "score", "job category", "gender", "job matching" and "whether a student cadre" are each {0, 1}; and the value set of "unit address" is {1, 2, 3, 4, 5}.
In the following, the attribute "place of origin" is taken as an example, and the Gini coefficients of all of its possible cut points are solved.
The given Gini coefficient threshold is 0.255; if the Gini coefficient of a node is smaller than this threshold, the node is a leaf node. For the attribute "place of origin", the number of samples with value "1" is 8, with value "2" is 97, with value "3" is 185, with value "4" is 50, and with value "5" is 10. For the attribute "unit address", the number of samples with value "1" is 51, with value "2" is 179, with value "3" is 81, with value "4" is 25, and with value "5" is 15. When the attribute "place of origin" takes the value "1", the numbers of corresponding samples whose prediction attribute "unit address" takes the values "1", "2", "3", "4" and "5" are 5, 3, 0, 0 and 0 respectively.
Based on equations (2)–(4), the Gini coefficient of the cut point "place of origin = 1" is calculated (the intermediate expressions are rendered as images in the original publication).
Repeating these steps gives:
Gini(D, place of origin = 2) = 0.561
Gini(D, place of origin = 3) = 0.512
Gini(D, place of origin = 4) = 0.533
Gini(D, place of origin = 5) = 0.568
From the analysis of the calculation results, since Gini(D, place of origin = 3) = 0.512 is the smallest, "place of origin = 3" is selected as the optimal cut point of the attribute "place of origin".
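As an illustrative cross-check only (not from the patent), equations (2)–(4) can be applied directly to the counts reported above for the cut point "place of origin = 1"; the class distribution of the complementary subset is inferred here by subtraction from the reported totals, so the result is merely indicative and need not reproduce the value shown in the original image.

d1  = [5, 3, 0, 0, 0]                    # 'unit address' counts when place of origin == 1 (given above)
tot = [51, 179, 81, 25, 15]              # 'unit address' counts over the whole training set (given above)
d2  = [t - a for t, a in zip(tot, d1)]   # inferred counts when place of origin != 1

def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

n1, n2 = sum(d1), sum(d2)
weighted = n1 / (n1 + n2) * gini(d1) + n2 / (n1 + n2) * gini(d2)
print(round(weighted, 3))                # weighted Gini of the candidate cut point, equation (4)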
The attributes "score", "job category", "gender", "job matching" and "whether a student cadre" are all binary and need no further segmentation. Their respective Gini coefficients are:
Gini(D, score = 0) = 0.587
Gini(D, job category = 0) = 0.625
Gini(D, gender = 0) = 0.598
Gini(D, job matching = 0) = 0.621
Gini(D, whether a student cadre = 0) = 0.579
From the analysis of the above calculation results, since Gini(D, place of origin = 3) = 0.512 is the minimum, "place of origin" is selected as the optimal feature and "place of origin = 3" as its optimal cut point; that is, the attribute "place of origin" is chosen as the root node and "place of origin = 3" as the optimal cut point of the root node. The root node generates two child nodes, one of which is a leaf node; for the other child node, the optimal feature and its optimal cut point are selected from "score", "job category", "gender", "job matching" and "whether a student cadre" using the same method, until the construction of the college student employment tendency area decision tree model is completed. The constructed decision tree prediction model of the college student employment area is shown in FIG. 5.
The effectiveness of the created prediction model is evaluated below. In this example, the number of training samples is 350 and the number of test samples is 150. Based on the decision tree prediction model of the college student employment area established above, the accuracy calculated from equation (5) is 67.67%.
The method is applied to actual undergraduate employment area prediction, a prediction tool page is shown in figure 6, and a prediction result display page is shown in figure 7.

Claims (5)

1. A college student employment prediction method based on a CART decision tree, characterized by comprising the following steps:
S1: preprocessing the information data of college students;
collecting the raw data of college students, constructing a basic student attribute set, and standardizing each data item to form a standardized data set, wherein the basic college student attribute set is recorded as N = {n_1, n_2, …, n_c}, in which n_i is the i-th basic attribute and c is the number of basic attributes;
S2: determining the relevant attributes influencing the college student employment prediction target;
setting the college student employment prediction target attribute set as Y = {y_1, y_2, …, y_|Y|}, where |Y| is the number of values of the prediction target attribute and y_u is a prediction target attribute value; letting the Pearson correlation coefficient between element n_i of N and element y_u of Y be λ_{i,u};
setting the Pearson correlation coefficient threshold as h: when λ_{i,u} is not less than h, n_i is defined as correlated with Y; otherwise n_i is defined as uncorrelated with Y; on this basis the basic student attributes correlated with Y are collected, and the relevant attributes influencing the employment prediction target Y are recorded as the feature vector X = {x_1, x_2, …, x_m}, where m is the number of feature variables and m ≤ c; each x_i takes K_i possible values, recorded as V(x_i) = {v_i^1, v_i^2, …, v_i^{K_i}};
S3: constructing a university student employment prediction model based on the CART decision tree;
setting the basic attribute data of college students as α groups, of which r groups form the training set S and the remaining α − r groups form the test set; the training set S is used to construct the employment prediction model, and the test set is used to verify the accuracy of the employment prediction model;
calculating the Gini coefficient Gini(S, x_i = v_i^k) in the training set S for each feature x_i and each of its values v_i^k; solving the Gini coefficients of the basic student attributes in the training set S in this way, setting the Gini coefficient threshold as l, and then constructing the college student employment decision tree, i.e. the employment prediction model, based on Gini(S, x_i = v_i^k).
2. The CART decision tree-based college student employment prediction method according to claim 1, wherein in step S2 the Pearson correlation coefficient λ_{i,u} of Y and N is calculated as:

λ_{i,u} = cov(n_i, y_u) / (σ_{n_i} · σ_{y_u})

where cov(n_i, y_u) is the covariance of n_i and y_u, and σ_{n_i} and σ_{y_u} are the standard deviations of n_i and y_u respectively.
3. The CART decision tree-based college student employment prediction method according to claim 1, wherein: in step S3, 70% of the data is set as the training set, and 30% of the data is set as the test set.
4. The CART decision tree-based college student employment prediction method according to claim 1 or 3, wherein in step S3 the method for solving the Gini coefficients of the basic student attributes in the training set S is as follows:
when x_i takes the value v_i^k, the corresponding subset is recorded as S_i^{k1}; when x_i does not take the value v_i^k, the corresponding subset is recorded as S_i^{k2}; S can thereby be divided into the two parts S_i^{k1} and S_i^{k2}, whose numbers of training samples are |S_i^{k1}| and |S_i^{k2}| respectively;
in S, when x_i = v_i^k, the probability that Y takes the value y_u is p_u^{k1}; when x_i ≠ v_i^k, the probability that Y takes the value y_u is p_u^{k2}; then the Gini coefficient of S_i^{k1} can be expressed as:

Gini(S_i^{k1}) = 1 − Σ_{u=1}^{|Y|} (p_u^{k1})²

similarly, the Gini coefficient of S_i^{k2} can be expressed as:

Gini(S_i^{k2}) = 1 − Σ_{u=1}^{|Y|} (p_u^{k2})²

from Gini(S_i^{k1}) and Gini(S_i^{k2}) it can be seen that, for S, the Gini coefficient of taking V(x_i) = v_i^k, recorded as Gini(S, x_i = v_i^k), can be expressed as:

Gini(S, x_i = v_i^k) = (|S_i^{k1}| / |S|) · Gini(S_i^{k1}) + (|S_i^{k2}| / |S|) · Gini(S_i^{k2})
5. The CART decision tree-based college student employment prediction method according to claim 1 or 3, wherein in step S3 the method for constructing the college student employment CART decision tree based on Gini(S, x_i = v_i^k) is as follows:
let the threshold of the Gini coefficient be l;
Input: S, X = {x_1, x_2, …, x_m}, l, m;
Output: decision tree T;
Step 1: compute Gini(S, x_i = v_i^k) for each feature x_i and each of its values v_i^k; if these Gini coefficients are all less than l, T is a single-node tree; otherwise go to Step 2;
Step 2: among the values Gini(S, x_i = v_i^k), find the minimum, recorded as Gini(S, x_i = v_i^{k*}), and take x_i = v_i^{k*} as the cut point of the binary tree;
Step 3: according to whether the value of x_i in S equals v_i^{k*}, divide S into the two subsets S_i^{k1} and S_i^{k2} and assign them to two child nodes; if the Gini coefficient of a child node is less than l, that child node is a leaf node; if both child nodes are leaf nodes, return the decision tree T, otherwise go to Step 4;
Step 4: for each non-leaf child node, let S = S_i^{k1} or S = S_i^{k2} respectively, and recursively call Step 1 to Step 4 to generate the binary decision tree T.
CN202111608264.1A 2021-12-24 2021-12-24 University student employment prediction method based on CART decision tree Pending CN114330716A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111608264.1A CN114330716A (en) 2021-12-24 2021-12-24 University student employment prediction method based on CART decision tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111608264.1A CN114330716A (en) 2021-12-24 2021-12-24 University student employment prediction method based on CART decision tree

Publications (1)

Publication Number Publication Date
CN114330716A true CN114330716A (en) 2022-04-12

Family

ID=81014001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111608264.1A Pending CN114330716A (en) 2021-12-24 2021-12-24 University student employment prediction method based on CART decision tree

Country Status (1)

Country Link
CN (1) CN114330716A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029379A (en) * 2022-12-31 2023-04-28 中国电子科技集团公司信息科学研究院 Method for constructing air target intention recognition model
CN116029379B (en) * 2022-12-31 2024-01-02 中国电子科技集团公司信息科学研究院 Method for constructing air target intention recognition model
CN116563067A (en) * 2023-05-15 2023-08-08 北京融信数联科技有限公司 Big data-based graduate crowd employment analysis method, system and medium
CN116563067B (en) * 2023-05-15 2024-03-15 北京融信数联科技有限公司 Big data-based graduate crowd employment analysis method, system and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination