CN112418274A - Decision tree generation method and device - Google Patents

Decision tree generation method and device

Info

Publication number
CN112418274A
CN112418274A (application CN202011205097.1A)
Authority
CN
China
Prior art keywords
decision tree
tree
splitting
skeleton
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011205097.1A
Other languages
Chinese (zh)
Inventor
李龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Advanced New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN202011205097.1A priority Critical patent/CN112418274A/en
Publication of CN112418274A publication Critical patent/CN112418274A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This specification discloses a decision tree generation method and apparatus. The method includes: acquiring a base decision tree, where the base decision tree is generated based on a first type of sample data; extracting a tree skeleton of the base decision tree, where the tree skeleton includes the splitting features of nodes and includes no splitting values or only some of the splitting values; and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.

Description

Decision tree generation method and device
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a method and an apparatus for generating a decision tree.
Background
A decision tree is a basic supervised learning model that recursively splits data in order to partition it. Generating a decision tree depends on a large number of labeled samples; when the number of samples is small, the trained decision tree performs poorly.
Disclosure of Invention
In view of the above, the present specification provides a decision tree generation method and apparatus.
Specifically, this specification is implemented through the following technical solutions:
a decision tree generation method, comprising:
acquiring a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
A decision tree generation apparatus comprising:
a basic obtaining unit, configured to obtain a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
a skeleton extraction unit, configured to extract a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and a target training unit, configured to train the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
A decision tree generation apparatus comprising:
a processor;
a memory for storing machine executable instructions;
wherein, by reading and executing machine-executable instructions stored by the memory that correspond to decision tree generation logic, the processor is caused to:
acquiring a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
From the above description, it can be seen that this specification can extract a tree skeleton from a base decision tree, migrate the tree skeleton to a scenario with little sample data, and train the tree skeleton based on the sample data of that scenario, thereby generating a reliable decision tree for the scenario with little sample data and solving the problem of model training in such scenarios.
Drawings
Fig. 1 is a flowchart illustrating a decision tree generation method according to an exemplary embodiment of the present disclosure.
FIG. 2 is a diagram of a base decision tree that is shown in an exemplary embodiment of the present description.
Fig. 3 is a schematic diagram of a tree skeleton according to an exemplary embodiment of the present specification.
Fig. 4 is a schematic structural diagram of a decision tree generating apparatus according to an exemplary embodiment of the present specification.
Fig. 5 is a block diagram illustrating a decision tree generation apparatus according to an exemplary embodiment of the present specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information without departing from the scope of this specification. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
The present specification provides a decision tree generation scheme, which may extract a tree skeleton from a decision tree in a scene with a large sample size, migrate the tree skeleton to a scene with a small sample size, and train the tree skeleton based on sample data of the scene, thereby training a more reliable decision tree for a scene with a small sample size.
Fig. 1 is a flowchart illustrating a decision tree generation method according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, the decision tree generating method may include the following steps:
step 102, a basic decision tree is obtained, and the basic decision tree is generated based on the first type of sample data.
In this embodiment, the first type of sample data comes from a first scenario. The first scenario is generally a scenario with a large sample size, and a decision tree oriented to a specified subject may be generated based on the first type of sample data; for ease of distinction, this decision tree is referred to as the base decision tree.
For example, the base decision tree may be generated using algorithms such as C4.5, C5, and the like.
For another example, a GBDT (Gradient Boosting Decision Tree) algorithm may be used to generate a base decision tree containing a single tree.
In this embodiment, the subject is generally a classification-determination subject, such as cash-out determination, abnormal-account determination, or money-laundering determination, which is not specifically limited in this specification.
In this embodiment, because the amount of the first type of sample data is large, the generated base decision tree is relatively reliable.
Step 104, a tree skeleton of the base decision tree is extracted, where the tree skeleton includes the splitting features of nodes and includes no splitting values or only some of the splitting values.
In this embodiment, some of the nodes and the branch paths between them may be extracted downward from the root node of the base decision tree, or all of the nodes and the branch paths between them may be extracted downward from the root node, to generate the tree skeleton.
The tree skeleton includes the splitting features of the extracted nodes, but it may include no splitting values, or may include the splitting values of only some of the splitting features, which is not specifically limited in this specification.
And 106, training the missing split values of the tree skeleton by using second type of sample data to obtain a target decision tree.
In this embodiment, the second type of sample data comes from a second scenario. The second scenario is a scenario with a smaller sample size and shares some features with the first scenario, for example, the total transaction amount in the last 3 days, the total number of transfers on the current day, and so on. A decision tree generated directly from the second type of sample data tends to overfit and is unreliable. In this step, the tree skeleton extracted in step 104 may be trained based on the second type of sample data to obtain the splitting values the skeleton lacks, and the skeleton may then be further extended to generate a target decision tree with the same subject for the second scenario.
From the above description, it can be seen that this specification can extract a tree skeleton from a base decision tree, migrate it to a scenario with little sample data, and train it based on the sample data of that scenario, thereby generating a more reliable decision tree for that scenario and solving the problem of model training when sample data is scarce.
The following describes a specific implementation of this specification in detail, taking cash-out determination as the specified subject.
Cash-out refers to withdrawing cash, and generally refers to obtaining cash benefits through illegal or fraudulent means, such as credit card cash-out, credit product cash-out, and the like.
In this embodiment, assume that the first scenario is an O2O (Online to Offline) scenario, for example, offline code-scanning payment, and that the second scenario is a money-receiving code scenario, for example, a user scanning a merchant's static two-dimensional code to pay.
In this embodiment, there are many cash-out determination samples in the O2O scenario, and based on the first type of sample data in the O2O scenario, a base decision tree for cash-out determination may be generated using algorithms such as C4.5 and C5.
Assume that the base decision tree trained in the O2O scenario is shown in fig. 2. Referring to fig. 2, node 1 is the root node of the base decision tree, nodes 2 to 7 are common tree nodes, and nodes 8 to 15 are leaf nodes.
The base decision tree includes several branch paths connecting the nodes; for example, path 12 connects root node 1 and common tree node 2, and path 13 connects root node 1 and common tree node 3.
The maximum depth of the base decision tree is 3. Depth can be understood as the distance from a node to the root node; for example, the distance from common tree node 2 to root node 1 is 1, so the depth of common tree node 2 is 1, and the distance from leaf node 8 to root node 1 is 3, so the depth of leaf node 8 is 3.
Node               | Splitting feature
Root node 1        | Total transaction amount in the last 10 days
Common tree node 2 | Total transaction amount in the last 5 days
Common tree node 3 | Number of transfer accounts in the last 5 days
Common tree node 4 | Number of transfer accounts in the last 8 days
Common tree node 5 | Number of transfer accounts in the last 3 days

TABLE 1
Each node in the base decision tree except the leaf nodes may represent a splitting feature. Referring to the example in Table 1, the splitting feature represented by root node 1 is the total transaction amount in the last 10 days, the splitting feature represented by common tree node 2 is the total transaction amount in the last 5 days, the splitting feature represented by common tree node 3 is the number of transfer accounts in the last 5 days, and so on.
Node               | Splitting feature                              | Splitting value
Root node 1        | Total transaction amount in the last 10 days   | 1000
Common tree node 2 | Total transaction amount in the last 5 days    | 500
Common tree node 3 | Number of transfer accounts in the last 5 days | 8
Common tree node 4 | Number of transfer accounts in the last 8 days | 12
Common tree node 5 | Number of transfer accounts in the last 3 days | 5

TABLE 2
Each splitting feature may correspond to a splitting value, and a unique branch path may be determined based on the splitting value and a branch-path selection policy. The selection policy may be preset; for example, the left branch path corresponds to a value less than or equal to the splitting value, and the right branch path corresponds to a value greater than the splitting value.
Referring to the example of Table 2, the splitting value of root node 1 (total transaction amount in the last 10 days) is 1000. When the total transaction amount in the last 10 days is less than or equal to 1000, branch path 12 is selected, the process jumps to common tree node 2, and the comparison of the total transaction amount in the last 5 days with its splitting value continues. When the total transaction amount in the last 10 days is greater than 1000, branch path 13 is selected, the process jumps to common tree node 3, and the comparison of the number of transfer accounts in the last 5 days with the splitting value 8 continues, and so on.
For example, assuming that the total transaction amount of an account is 950 in the last 10 days and 550 in the last 5 days, the path of the account in the base decision tree shown in fig. 2 is root node 1 - common tree node 2 - common tree node 5 ..., and so on.
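To make this traversal concrete, the following minimal Python sketch routes a sample through the tree of fig. 2 using the splitting values of Table 2. It is not part of the patent: the dict encoding and the feature names are illustrative assumptions, and only nodes 1 to 5 are encoded because Table 2 specifies no splitting features for the remaining nodes.

```python
# Illustrative sketch of routing a sample through the base decision tree of
# fig. 2 with the splitting values of Table 2. The dict encoding and feature
# names are assumptions; any node id absent from TREE is treated as a leaf.
TREE = {
    1: {"feature": "amount_10d", "split": 1000, "left": 2, "right": 3},
    2: {"feature": "amount_5d", "split": 500, "left": 4, "right": 5},
    3: {"feature": "transfers_5d", "split": 8, "left": 6, "right": 7},
    4: {"feature": "transfers_8d", "split": 12, "left": 8, "right": 9},
    5: {"feature": "transfers_3d", "split": 5, "left": 10, "right": 11},
}

def route(tree, sample):
    """Follow the branch-path policy: left if value <= split, right otherwise."""
    node, path = 1, [1]
    while node in tree:
        n = tree[node]
        node = n["left"] if sample[n["feature"]] <= n["split"] else n["right"]
        path.append(node)
    return path

# The account from the text: 950 in the last 10 days, 550 in the last 5 days.
example = {"amount_10d": 950, "amount_5d": 550, "transfers_3d": 4}
print(route(TREE, example))  # node 1 -> node 2 -> node 5 -> ...
```

With the values above the account goes left at node 1 (950 <= 1000), right at node 2 (550 > 500), and continues from node 5, matching the path described in the text.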
It should be noted that fig. 2 is only an exemplary illustration, and in practical applications, the generated basic decision tree is generally more complex than fig. 2.
In this embodiment, after the basic decision tree is generated, the extraction of the tree skeleton may be performed.
In one example, nodes at or below a specified depth and diverging paths between the nodes may be extracted downward from a root node of the base decision tree.
The specified depth is typically less than the maximum depth of the base decision tree, and may be preset, e.g., by a business person based on experience, etc.
Assuming the specified depth is 2, and taking the base decision tree shown in fig. 2 as an example, the nodes with depth less than or equal to 2 and the branch paths between them, i.e., nodes 1 to 7 and branch paths 12, 13, 24, 25, 36, and 37, can be extracted from the root node, resulting in the tree skeleton shown in fig. 3.
In this embodiment, the tree skeleton includes the splitting features represented by the extracted nodes, that is, it includes root node 1's splitting feature (total transaction amount in the last 10 days), common tree node 2's splitting feature (total transaction amount in the last 5 days), and so on.
The tree skeleton may include none of the splitting values, or may include the splitting values of some of the splitting features; for example, it may include only the splitting values of root node 1, common tree node 2, and common tree node 3. This specification does not specifically limit this.
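A depth-limited skeleton extraction of this kind can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the dict encoding of the tree is invented here so the sketch is self-contained, and the `keep_values` parameter models the optional "partial splitting values" case.

```python
# Sketch of extracting a tree skeleton: keep nodes down to a specified depth,
# keep their splitting features, and drop splitting values except for the node
# ids listed in keep_values. The dict encoding is an illustrative assumption.
TREE = {
    1: {"feature": "amount_10d", "split": 1000, "left": 2, "right": 3},
    2: {"feature": "amount_5d", "split": 500, "left": 4, "right": 5},
    3: {"feature": "transfers_5d", "split": 8, "left": 6, "right": 7},
    4: {"feature": "transfers_8d", "split": 12, "left": 8, "right": 9},
    5: {"feature": "transfers_3d", "split": 5, "left": 10, "right": 11},
}

def extract_skeleton(tree, root=1, max_depth=2, keep_values=frozenset()):
    skeleton = {}
    frontier = [(root, 0)]  # (node id, depth); the root has depth 0
    while frontier:
        node_id, depth = frontier.pop()
        if node_id not in tree or depth > max_depth:
            continue  # beyond the cut-off: becomes a leaf of the skeleton
        node = tree[node_id]
        skeleton[node_id] = {
            "feature": node["feature"],
            "split": node["split"] if node_id in keep_values else None,
            "left": node["left"],
            "right": node["right"],
        }
        frontier.append((node["left"], depth + 1))
        frontier.append((node["right"], depth + 1))
    return skeleton
```

With `max_depth=1` only nodes 1 to 3 survive; with `keep_values={1}` the skeleton retains root node 1's splitting value, illustrating the "partial splitting values" variant.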
In another example, all nodes of the base decision tree and the branching paths between all nodes may be extracted from the root node of the base decision tree downward to obtain the tree skeleton of the base decision tree.
The tree skeleton may not include the split value of each split feature, and may also include the split value of a partial split feature, which is not particularly limited in this specification.
In another example, the extraction of the tree skeleton may not be based on depth. Still taking fig. 2 as an example, root node 1 and common tree nodes 2 to 5 may be extracted.
In this embodiment, after the tree skeleton of the base decision tree is extracted, the tree skeleton may be trained using the second type of sample data in the money-receiving code scenario to obtain its missing splitting values.
Node               | Splitting feature                              | Splitting value
Root node 1        | Total transaction amount in the last 10 days   | 800
Common tree node 2 | Total transaction amount in the last 5 days    | 400
Common tree node 3 | Number of transfer accounts in the last 5 days | 7
Common tree node 4 | Number of transfer accounts in the last 8 days | 10
Common tree node 5 | Number of transfer accounts in the last 3 days | 4

TABLE 3
Taking the case where the tree skeleton includes no splitting values as an example, the splitting value of each splitting feature may be trained based on the second type of sample data in the money-receiving code scenario. Referring to the example of Table 3, the trained splitting value of root node 1 (total transaction amount in the last 10 days) is 800; according to the predetermined branch-path selection policy, when the total transaction amount in the last 10 days is less than or equal to 800, branch path 12 is selected, and so on.
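One simple way to fit the missing splitting values can be sketched as follows, under stated assumptions: the splitting features are fixed by the skeleton and only the thresholds are searched, using a Gini-based scan over midpoints. The patent does not prescribe this criterion; it is one plausible choice.

```python
# Sketch: fill in the missing splitting values of a skeleton using labeled
# second-scene samples. The Gini criterion and midpoint candidates are
# illustrative assumptions, not the patent's prescribed procedure.

def gini(labels):
    """Gini impurity of a binary (0/1) label list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split_value(values, labels):
    """Threshold (a midpoint between consecutive sorted values) minimizing
    the weighted Gini impurity of the <= / > partition."""
    pairs = sorted(zip(values, labels))
    best, best_score = pairs[0][0], float("inf")
    for i in range(len(pairs) - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue
        thr = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best, best_score = thr, score
    return best

def fit_skeleton(skeleton, node_id, samples, labels):
    """Recursively learn missing thresholds, routing samples down the tree."""
    if node_id not in skeleton or not samples:
        return
    node = skeleton[node_id]
    if node["split"] is None:
        node["split"] = best_split_value(
            [s[node["feature"]] for s in samples], labels)
    go_left = [s[node["feature"]] <= node["split"] for s in samples]
    fit_skeleton(skeleton, node["left"],
                 [s for s, g in zip(samples, go_left) if g],
                 [lab for lab, g in zip(labels, go_left) if g])
    fit_skeleton(skeleton, node["right"],
                 [s for s, g in zip(samples, go_left) if not g],
                 [lab for lab, g in zip(labels, go_left) if not g])
```

For a single-node skeleton with samples whose feature values are 100, 200, 900, 1000 and labels 0, 0, 1, 1, the scan picks the midpoint 550, which separates the two classes perfectly.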
In this embodiment, after the splitting values of the splitting features in the tree skeleton are obtained, the tree skeleton may continue to be extended by fitting based on the second type of sample data, determining the splitting features and splitting values of the extended nodes until the model converges, to obtain the target decision tree, thereby completing the training of the cash-out determination decision tree for the money-receiving code scenario.
In general, a leaf node is considered untrustworthy when it covers few black samples. Optionally, for the trained target decision tree, the confidence of each leaf node may be calculated using the second type of sample data in the second scenario, and the leaf nodes whose confidence does not satisfy the confidence condition may then be filtered out to simplify the target decision tree.
Taking the GBDT algorithm as an example, the leaf nodes of the target decision tree may be scored based on all of the second type of sample data, and for each leaf node the scores may be aggregated and used as that leaf node's confidence. Assuming the confidence condition is that the confidence ranks in the top 1%, the leaf nodes ranked in the top 1% by confidence may be retained and the remaining leaf nodes filtered out.
It should be noted that, in practical applications, to preserve the integrity of the target decision tree, the leaf nodes that do not satisfy the confidence condition may be kept rather than pruned, and simply not used when the target decision tree is applied.
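This filtering step can be sketched as follows. The confidence measure used here, a per-leaf count of black (label = 1) samples with a minimum-count condition, is an illustrative assumption; the patent's GBDT-based scoring is not reproduced. The dict tree encoding is likewise invented for the sketch.

```python
# Sketch: score each leaf of a trained target decision tree by the number of
# black (label = 1) second-scene samples that reach it, then keep only the
# leaves meeting a confidence condition. The scoring choice and tree encoding
# are illustrative assumptions.

def leaf_black_counts(tree, samples, labels, root=1):
    """Route every sample to a leaf and count black samples per leaf."""
    counts = {}
    for sample, label in zip(samples, labels):
        node = root
        while node in tree:
            n = tree[node]
            node = n["left"] if sample[n["feature"]] <= n["split"] else n["right"]
        counts[node] = counts.get(node, 0) + label
    return counts

def trusted_leaves(counts, min_black=2):
    """Leaves whose black-sample count satisfies the confidence condition."""
    return {leaf for leaf, c in counts.items() if c >= min_black}
```

A leaf that failed the condition would either be pruned or, as the text notes, retained in the tree but skipped at prediction time.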
Optionally, for a finance-related target decision tree with high interpretability requirements, this specification may automatically generate the model's decision rules.
In this example, for each leaf node of the trained target decision tree, the complete path from the root node to that leaf node may be obtained, and a decision rule corresponding to the target decision tree is then generated according to the splitting features and splitting values of the nodes on the complete path.
Referring to fig. 3, the target decision tree shown in fig. 3 includes 4 complete paths, which are node 1-node 2-node 4, node 1-node 2-node 5, node 1-node 3-node 6, and node 1-node 3-node 7.
Assuming that the splitting features and splitting values represented by the nodes are as shown in Table 2, a logical AND may be used to connect the conditions. Taking node 1 - node 2 - node 4 as an example, the corresponding decision rule is: the total transaction amount in the last 10 days is less than or equal to 1000 and the total transaction amount in the last 5 days is less than or equal to 500 and the number of transfer accounts in the last 8 days is less than or equal to 12.
In this way, the respective decision rules of the target decision tree can be automatically generated.
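The rule-generation step can be sketched as follows. The `<=` / `>` rendering follows the branch-path selection policy described earlier (left branch means less than or equal to the splitting value); the textual rule format and the dict tree encoding are illustrative assumptions.

```python
# Sketch: generate one conjunctive decision rule per root-to-leaf path by
# joining node conditions with a logical AND. The tree encoding and the rule
# text format are illustrative assumptions.

def decision_rules(tree, root=1):
    rules = {}

    def walk(node_id, conds):
        if node_id not in tree:  # reached a leaf of the (skeleton) tree
            rules[node_id] = " and ".join(conds)
            return
        n = tree[node_id]
        walk(n["left"], conds + ["%s <= %s" % (n["feature"], n["split"])])
        walk(n["right"], conds + ["%s > %s" % (n["feature"], n["split"])])

    walk(root, [])
    return rules

# Two-level example using Table 2's values for nodes 1 and 2:
tree = {
    1: {"feature": "amount_10d", "split": 1000, "left": 2, "right": 3},
    2: {"feature": "amount_5d", "split": 500, "left": 4, "right": 5},
}
print(decision_rules(tree)[4])  # amount_10d <= 1000 and amount_5d <= 500
```

Each leaf thus receives the conjunction of all conditions on its path, which is exactly the interpretable rule form described in the text.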
Corresponding to the embodiments of the foregoing decision tree generating method, the present specification also provides embodiments of a decision tree generating apparatus.
Embodiments of the decision tree generation apparatus of this specification can be applied to a server. The apparatus embodiments may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a logical apparatus, the apparatus is formed by the processor of the server in which it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. From a hardware perspective, fig. 4 is a hardware structure diagram of the server in which the decision tree generation apparatus of this specification is located; besides the processor, memory, network interface, and non-volatile memory shown in fig. 4, the server may also include other hardware according to its actual functions, which is not described again here.
Fig. 5 is a block diagram illustrating a decision tree generation apparatus according to an exemplary embodiment of the present specification.
Referring to fig. 5, the decision tree generating apparatus 400 can be applied to the server shown in fig. 4, and includes: a basis acquisition unit 401, a skeleton extraction unit 402, a target training unit 403, and a rule generation unit 404.
The basic obtaining unit 401 obtains a base decision tree, where the base decision tree is generated based on a first type of sample data;
the skeleton extraction unit 402 extracts a tree skeleton of the base decision tree, where the tree skeleton includes the splitting features of nodes and includes no splitting values or only some of the splitting values;
and the target training unit 403 trains the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
Optionally, the skeleton extraction unit 402 extracts, downward from the root node of the base decision tree, the nodes at or below a specified depth and the branch paths between them, where the specified depth is less than the maximum depth of the base decision tree.
Optionally, the skeleton extracting unit 402 extracts all nodes of the basic decision tree and branch paths between all nodes from a root node of the basic decision tree.
Optionally, the target training unit 403, after obtaining the missing splitting values of the tree skeleton through training with the second type of sample data, extends the tree skeleton based on the second type of sample data and determines the splitting features and splitting values of the extended nodes until convergence.
The rule generating unit 404 is configured to obtain, for each leaf node of the target decision tree, the complete path from the root node to that leaf node;
and to generate a decision rule corresponding to the target decision tree according to the splitting features and splitting values of the nodes on the complete path.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the embodiment of the foregoing decision tree generating method, this specification further provides a decision tree generating device, including: a processor and a memory for storing machine executable instructions. Wherein the processor and the memory are typically interconnected by means of an internal bus. In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
In this embodiment, the processor is caused to:
acquiring a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
Optionally, in extracting the tree skeleton of the base decision tree, the processor is caused to:
and extracting, downward from the root node of the base decision tree, the nodes at or below a specified depth and the branch paths between them, wherein the specified depth is less than the maximum depth of the base decision tree.
Optionally, in extracting the tree skeleton of the base decision tree, the processor is caused to:
and extracting all nodes of the basic decision tree and branch paths among all the nodes from the root node of the basic decision tree downwards.
Optionally, the processor is further caused to:
after the missing splitting values of the tree skeleton are obtained by training with the second type of sample data, extending the tree skeleton based on the second type of sample data, and determining the splitting features and splitting values of the extended nodes until convergence.
Optionally, the processor is further caused to:
for each leaf node of the target decision tree, acquiring the complete path from the root node to that leaf node;
and generating a decision rule corresponding to the target decision tree according to the splitting features and splitting values of the nodes on the complete path.
In correspondence with the aforementioned embodiments of the decision tree generation method, the present specification also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of:
acquiring a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
Optionally, extracting the tree skeleton of the basic decision tree includes:
and extracting, downward from the root node of the base decision tree, the nodes at or below a specified depth and the branch paths between them, wherein the specified depth is less than the maximum depth of the base decision tree.
Optionally, extracting the tree skeleton of the basic decision tree includes:
and extracting all nodes of the basic decision tree and branch paths among all the nodes from the root node of the basic decision tree downwards.
Optionally, the method further includes:
after the splitting values missing from the tree skeleton are obtained by training with the second type of sample data, extending the tree skeleton based on the second type of sample data, and determining the splitting characteristics and splitting values of the extended nodes until convergence.
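One possible reading of the extension step above, under stated assumptions: after the skeleton's splitting values are re-trained, impure leaves keep splitting on the second type of sample data until no candidate split reduces impurity, i.e. until convergence. The sketch below uses hypothetical names and a Gini criterion; it is not the specification's mandated algorithm:

```python
# Illustrative extension of a trained skeleton (an assumption, not the
# patent's exact algorithm): grow leaves on the second type of sample data
# until no split yields an impurity gain, i.e. until convergence.

class Node:
    def __init__(self, feature=None, value=None, left=None, right=None, label=None):
        self.feature, self.value = feature, value
        self.left, self.right, self.label = left, right, label

def gini(rows):
    if not rows:
        return 0.0
    p = sum(y for _, y in rows) / len(rows)
    return 1.0 - p * p - (1.0 - p) ** 2

def best_split(rows):
    """Search every (characteristic, value) pair for the split with the
    largest impurity reduction; returns (gain, feature, value, left, right)."""
    best, base = None, gini(rows)
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            l = [(xi, y) for xi, y in rows if xi[f] <= t]
            r = [(xi, y) for xi, y in rows if xi[f] > t]
            if not l or not r:
                continue
            gain = base - (len(l) * gini(l) + len(r) * gini(r)) / len(rows)
            if best is None or gain > best[0]:
                best = (gain, f, t, l, r)
    return best

def extend(leaf, rows, min_gain=1e-9):
    """Determine splitting characteristics and values of extended nodes,
    recursing until no split improves impurity (convergence)."""
    if not rows:
        return
    ys = [y for _, y in rows]
    leaf.label = max(set(ys), key=ys.count)
    found = best_split(rows)
    if found is None or found[0] <= min_gain:
        return
    _, leaf.feature, leaf.value, l, r = found
    leaf.left, leaf.right = Node(), Node()
    extend(leaf.left, l, min_gain)
    extend(leaf.right, r, min_gain)
```

In practice `extend` would be called on each leaf of the re-trained skeleton with the second-type samples that reach that leaf.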
Optionally, the method further includes:
for each leaf node of the target decision tree, acquiring the complete path from the root node to the leaf node;
and generating a judgment rule corresponding to the target decision tree according to the splitting characteristics and the splitting values of the nodes on the complete path.
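The rule-generation step above amounts to walking every complete root-to-leaf path and concatenating the splitting characteristic and splitting value at each node on the path. A minimal sketch, assuming a simple binary `Node` structure (hypothetical naming; the specification does not mandate this format):

```python
# Hypothetical rendering of the judgment-rule step: every complete
# root-to-leaf path becomes a list of "feature <=/> value" conditions
# plus the leaf's decision label.

class Node:
    def __init__(self, feature=None, value=None, left=None, right=None, label=None):
        self.feature, self.value = feature, value
        self.left, self.right, self.label = left, right, label

def extract_rules(node, path=()):
    """Yield (conditions, label) for each root-to-leaf path of the target tree."""
    if node.feature is None:                   # leaf: emit the accumulated path
        yield list(path), node.label
        return
    yield from extract_rules(node.left,
                             path + ("f%d <= %s" % (node.feature, node.value),))
    yield from extract_rules(node.right,
                             path + ("f%d > %s" % (node.feature, node.value),))
```

Joined with AND, each yielded pair reads as a judgment rule such as `IF f0 > 5.0 AND f1 > 3.0 THEN 2`.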
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (11)

1. A decision tree generation method for abnormal account determination, cash-out determination, or money laundering determination, the method comprising:
acquiring a basic decision tree, wherein the basic decision tree is generated based on first type of sample data in a first scene;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises splitting characteristics of nodes and does not comprise splitting values or comprises partial splitting values;
training the missing split value of the tree skeleton by using second type of sample data in a second scene to obtain a target decision tree;
wherein the number of the first type of sample data in the first scene is greater than the number of the second type of sample data in the second scene.
2. The method of claim 1, wherein extracting the tree skeleton of the basic decision tree comprises:
and extracting nodes with the depth less than or equal to a specified depth and a branching path between the nodes from the root node of the basic decision tree downwards, wherein the specified depth is less than the depth of the basic decision tree.
3. The method of claim 1, wherein extracting the tree skeleton of the basic decision tree comprises:
and extracting all nodes of the basic decision tree and branch paths among all the nodes from the root node of the basic decision tree downwards.
4. The method of claim 2 or 3, further comprising:
after the splitting values missing from the tree skeleton are obtained by training with the second type of sample data, extending the tree skeleton based on the second type of sample data, and determining the splitting characteristics and splitting values of the extended nodes until convergence.
5. The method of claim 1, further comprising:
for each leaf node of the target decision tree, acquiring the complete path from the root node to the leaf node;
and generating a judgment rule corresponding to the target decision tree according to the splitting characteristics and the splitting values of the nodes on the complete path.
6. A decision tree generation apparatus for abnormal account determination, cash-out determination, or money laundering determination, comprising:
the basic decision tree generating unit is used for generating a basic decision tree based on first type of sample data in a first scene;
a skeleton extraction unit, which extracts a tree skeleton of the basic decision tree, wherein the tree skeleton comprises splitting characteristics of nodes and does not comprise splitting values or comprises partial splitting values;
the target training unit is used for training the missing split values of the tree framework by using second type of sample data in a second scene to obtain a target decision tree;
wherein the number of the first type of sample data in the first scene is greater than the number of the second type of sample data in the second scene.
7. The apparatus of claim 6, wherein
the skeleton extraction unit extracts, from the root node of the basic decision tree downwards, nodes at depths less than or equal to a specified depth and the branching paths between the nodes, wherein the specified depth is less than the depth of the basic decision tree.
8. The apparatus of claim 6, wherein
the skeleton extraction unit extracts all nodes of the basic decision tree and the branch paths among the nodes from the root node of the basic decision tree downwards.
9. The apparatus of claim 7 or 8,
the target training unit trains with the second type of sample data to obtain the splitting values missing from the tree skeleton, extends the tree skeleton based on the second type of sample data, and determines the splitting characteristics and splitting values of the extended nodes until convergence.
10. The apparatus of claim 6, further comprising:
the rule generating unit is used for acquiring a complete path from a root node to each leaf node of the target decision tree;
and generating a judgment rule corresponding to the target decision tree according to the splitting characteristics and the splitting values of the nodes on the complete path.
11. A decision tree generation apparatus for abnormal account determination, cash-out determination, or money laundering determination, comprising:
a processor;
a memory for storing machine executable instructions;
wherein, by reading and executing machine-executable instructions stored by the memory that correspond to decision tree generation logic, the processor is caused to:
acquiring a basic decision tree, wherein the basic decision tree is generated based on first type of sample data in a first scene;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises splitting characteristics of nodes and does not comprise splitting values or comprises partial splitting values;
training the missing split value of the tree skeleton by using second type of sample data in a second scene to obtain a target decision tree;
wherein the number of the first type of sample data in the first scene is greater than the number of the second type of sample data in the second scene.
CN202011205097.1A 2018-09-21 2018-09-21 Decision tree generation method and device Pending CN112418274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011205097.1A CN112418274A (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811110423.3A CN109242034B (en) 2018-09-21 2018-09-21 Decision tree generation method and device
CN202011205097.1A CN112418274A (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811110423.3A Division CN109242034B (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Publications (1)

Publication Number Publication Date
CN112418274A true CN112418274A (en) 2021-02-26

Family

ID=65056548

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811110423.3A Active CN109242034B (en) 2018-09-21 2018-09-21 Decision tree generation method and device
CN202011205097.1A Pending CN112418274A (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201811110423.3A Active CN109242034B (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Country Status (3)

Country Link
CN (2) CN109242034B (en)
TW (1) TW202013266A (en)
WO (1) WO2020057301A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242034B (en) * 2018-09-21 2020-09-15 阿里巴巴集团控股有限公司 Decision tree generation method and device
CN111353600B (en) * 2020-02-20 2023-12-12 第四范式(北京)技术有限公司 Abnormal behavior detection method and device
CN111429282B (en) * 2020-03-27 2023-08-25 中国工商银行股份有限公司 Transaction anti-money-laundering method and device based on anti-money-laundering model migration
CN111401570B (en) * 2020-04-10 2022-04-12 支付宝(杭州)信息技术有限公司 Interpretation method and device for privacy tree model
CN112329874B (en) * 2020-11-12 2024-08-20 京东科技控股股份有限公司 Decision method and device for data service, electronic equipment and storage medium
CN112330054B (en) * 2020-11-23 2024-03-19 大连海事大学 Dynamic travel business problem solving method, system and storage medium based on decision tree
CN114399000A (en) * 2022-01-20 2022-04-26 中国平安人寿保险股份有限公司 Object interpretability feature extraction method, device, equipment and medium of tree model

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140214736A1 (en) * 2013-01-30 2014-07-31 Technion Research & Development Foundation Limited Training ensembles of randomized decision trees
US20140324871A1 (en) * 2013-04-30 2014-10-30 Wal-Mart Stores, Inc. Decision-tree based quantitative and qualitative record classification
CN105574544A (en) * 2015-12-16 2016-05-11 平安科技(深圳)有限公司 Data processing method and device
US20160162793A1 (en) * 2014-12-05 2016-06-09 Alibaba Group Holding Limited Method and apparatus for decision tree based search result ranking
CN106096748A (en) * 2016-04-28 2016-11-09 武汉宝钢华中贸易有限公司 Entrucking forecast model in man-hour based on cluster analysis and decision Tree algorithms
CN106682414A (en) * 2016-12-23 2017-05-17 中国科学院深圳先进技术研究院 Method and device for establishing timing sequence prediction model
US20170221075A1 (en) * 2016-01-29 2017-08-03 Sap Se Fraud inspection framework
US20180046939A1 (en) * 2016-08-10 2018-02-15 Paypal, Inc. Automated Machine Learning Feature Processing
CN108304936A (en) * 2017-07-12 2018-07-20 腾讯科技(深圳)有限公司 Machine learning model training method and device, facial expression image sorting technique and device
CN108491891A (en) * 2018-04-04 2018-09-04 桂林电子科技大学 A kind of online transfer learning method of multi-source based on decision tree local similarity
US20180260531A1 (en) * 2017-03-10 2018-09-13 Microsoft Technology Licensing, Llc Training random decision trees for sensor data processing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982336B (en) * 2011-09-02 2015-11-25 株式会社理光 Model of cognition generates method and system
CN104679777B (en) * 2013-12-02 2018-05-18 ***股份有限公司 A kind of method and system for being used to detect fraudulent trading
CN107203774A (en) * 2016-03-17 2017-09-26 阿里巴巴集团控股有限公司 The method and device that the belonging kinds of data are predicted
US11100421B2 (en) * 2016-10-24 2021-08-24 Adobe Inc. Customized website predictions for machine-learning systems
CN107135061B (en) * 2017-04-17 2019-10-22 北京科技大学 A kind of distributed secret protection machine learning method under 5g communication standard
CN109242034B (en) * 2018-09-21 2020-09-15 阿里巴巴集团控股有限公司 Decision tree generation method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ROLAND RIEKE, ET.AL: "Fraud Detection in Mobile Payments Utilizing Process Behavior Analysis", 2013 INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY, 6 September 2013 (2013-09-06), pages 662 - 669, XP032524240, DOI: 10.1109/ARES.2013.87 *
丁昱 (DING Yu): "Research on static risk classification of money laundering for individual clients of commercial banks" (商业银行个人客户洗钱静态风险分类方法研究), 金融理论与实践 (Finance Theory and Practice), no. 8, 10 August 2017 (2017-08-10), pages 59 - 63 *

Also Published As

Publication number Publication date
WO2020057301A1 (en) 2020-03-26
TW202013266A (en) 2020-04-01
CN109242034B (en) 2020-09-15
CN109242034A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
CN109242034B (en) Decision tree generation method and device
CN106709800B (en) Community division method and device based on feature matching network
CN107133865B (en) Credit score obtaining and feature vector value output method and device
CN110166438B (en) Account information login method and device, computer equipment and computer storage medium
CN112600810B Ethereum phishing fraud detection method and device based on graph classification
CN114119137B (en) Risk control method and apparatus
CN109741181A (en) A kind of transaction match method, system, server and medium based on intelligent contract
US20210243215A1 (en) Fraud detection based on analysis of frequency-domain data
CN105389488A (en) Identity authentication method and apparatus
CN102567534B (en) Interactive product user generated content intercepting system and intercepting method for the same
CN111428217A (en) Method and device for identifying cheat group, electronic equipment and computer readable storage medium
CN109685336A (en) Collection task distribution method, device, computer equipment and storage medium
CN109544262A Item recommendation method, device, electronic equipment, system and readable storage medium
CN111080306A (en) Transaction risk determination method, device, equipment and storage medium
CN110046648A (en) The method and device of business classification is carried out based at least one business disaggregated model
CN109598513B (en) Risk identification method and risk identification device
CN106842246B (en) Big Dipper chip on-line authentication method and device
CN107679862A (en) A kind of characteristic value of fraudulent trading model determines method and device
CN106294115A (en) The method of testing of a kind of application system animal migration and device
CN110717817A (en) Pre-loan approval method and device, electronic equipment and computer-readable storage medium
CN113792849B (en) Training method of character generation model, character generation method, device and equipment
CN108205757B (en) Method and device for verifying legality of electronic payment service
CN107025545A (en) A kind of transaction processing method and transaction system
Agrawal et al. A novel approach for credit card fraud detection
CN113487320A (en) Fraud transaction detection method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40046460

Country of ref document: HK