CN112418274A - Decision tree generation method and device - Google Patents

Decision tree generation method and device

Info

Publication number
CN112418274A
CN112418274A (application CN202011205097.1A)
Authority
CN
China
Prior art keywords
decision tree
tree
splitting
skeleton
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011205097.1A
Other languages
Chinese (zh)
Inventor
李龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Advanced New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN202011205097.1A priority Critical patent/CN112418274A/en
Publication of CN112418274A publication Critical patent/CN112418274A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This specification discloses a decision tree generation method and apparatus. The method includes: acquiring a base decision tree, where the base decision tree is generated based on a first type of sample data; extracting a tree skeleton of the base decision tree, where the tree skeleton includes the splitting features of nodes and includes no splitting values or only some of the splitting values; and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.

Description

Decision tree generation method and device
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a method and an apparatus for generating a decision tree.
Background
A decision tree is a basic supervised learning model that recursively splits data in order to partition it. Generating a decision tree depends on a large number of labeled samples; when the number of samples is small, the trained decision tree performs poorly.
Disclosure of Invention
In view of the above, the present specification provides a decision tree generation method and apparatus.
Specifically, this specification is implemented through the following technical solutions:
a decision tree generation method, comprising:
acquiring a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
A decision tree generation apparatus comprising:
a basic obtaining unit, configured to obtain a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
a skeleton extraction unit, configured to extract a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and a target training unit, configured to train the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
A decision tree generation apparatus comprising:
a processor;
a memory for storing machine executable instructions;
wherein, by reading and executing machine-executable instructions stored by the memory that correspond to decision tree generation logic, the processor is caused to:
acquiring a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
From the above description, it can be seen that this specification can extract a tree skeleton from a base decision tree, migrate the tree skeleton to a scenario with little sample data, and train the tree skeleton based on the sample data of that scenario, thereby generating a reliable decision tree for the scenario with little sample data and solving the problem of model training in such scenarios.
Drawings
Fig. 1 is a flowchart illustrating a decision tree generation method according to an exemplary embodiment of the present disclosure.
FIG. 2 is a diagram of a base decision tree that is shown in an exemplary embodiment of the present description.
Fig. 3 is a schematic diagram of a tree skeleton according to an exemplary embodiment of the present specification.
Fig. 4 is a schematic structural diagram of a decision tree generating apparatus according to an exemplary embodiment of the present specification.
Fig. 5 is a block diagram illustrating a decision tree generation apparatus according to an exemplary embodiment of the present specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information without departing from the scope of this specification. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
The present specification provides a decision tree generation scheme, which may extract a tree skeleton from a decision tree in a scene with a large sample size, migrate the tree skeleton to a scene with a small sample size, and train the tree skeleton based on sample data of the scene, thereby training a more reliable decision tree for a scene with a small sample size.
Fig. 1 is a flowchart illustrating a decision tree generation method according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, the decision tree generating method may include the following steps:
step 102, a basic decision tree is obtained, and the basic decision tree is generated based on the first type of sample data.
In this embodiment, the first type of sample data comes from a first scenario. The first scenario is generally a scenario with a large sample size, and a decision tree oriented to a specified subject may be generated based on the first type of sample data; for ease of distinction, this decision tree is referred to as the base decision tree.
For example, the base decision tree may be generated using algorithms such as C4.5, C5, and the like.
For another example, a GBDT (Gradient Boosting Decision Tree) algorithm may be used to generate a base decision tree containing a single tree.
In this embodiment, the subject is generally a classification-determination subject, such as cash-out determination, abnormal-account determination, or money-laundering determination, which is not specifically limited in this specification.
In this embodiment, because the amount of the first type of sample data is large, the generated base decision tree is relatively reliable.
Step 104, a tree skeleton of the base decision tree is extracted, where the tree skeleton includes the splitting features of nodes and includes no splitting values or only some of the splitting values.
In this embodiment, some of the nodes and the branch paths between them may be extracted downward from the root node of the base decision tree, or all of the nodes and the branch paths between them may be extracted downward from the root node, to generate the tree skeleton.
The tree skeleton includes the splitting features of the extracted nodes, but it may include no splitting values, or may include the splitting values of only some of the splitting features, which is not specifically limited in this specification.
And 106, training the missing split values of the tree skeleton by using second type of sample data to obtain a target decision tree.
In this embodiment, the second type of sample data comes from a second scenario. The second scenario is a scenario with a smaller sample size and shares some features with the first scenario, for example, the total transaction amount in the last 3 days, the total number of transfers on the current day, and so on. A decision tree generated directly from the second type of sample data tends to overfit and is unreliable. In this step, the tree skeleton extracted in step 104 may be trained based on the second type of sample data to obtain the splitting values the skeleton lacks, and the skeleton may then be further extended to generate a target decision tree with the same subject for the second scenario.
From the above description, it can be seen that this specification can extract a tree skeleton from a base decision tree, migrate it to a scenario with little sample data, and train it based on the sample data of that scenario, thereby generating a more reliable decision tree for that scenario and solving the problem of model training when sample data is scarce.
The following describes a specific implementation of this specification in detail, taking cash-out determination as the specified subject.
Cash-out refers to withdrawing cash, and generally refers to obtaining cash benefits through illegal or fraudulent means, such as credit card cash-out, credit product cash-out, and the like.
In this embodiment, assume that the first scenario is an O2O (Online to Offline) scenario, for example, offline code-scanning payment, and that the second scenario is a money-receiving code scenario, for example, a user scanning a merchant's static two-dimensional code to pay.
In this embodiment, there are many cash-out determination samples in the O2O scenario, and based on the first type of sample data in the O2O scenario, a base decision tree for cash-out determination may be generated using algorithms such as C4.5 and C5.
Assume that the base decision tree trained in the O2O scenario is shown in fig. 2. Referring to fig. 2, node 1 is the root node of the base decision tree, nodes 2 to 7 are common tree nodes, and nodes 8 to 15 are leaf nodes.
The base decision tree includes several branch paths connecting the nodes; for example, path 12 connects root node 1 and common tree node 2, and path 13 connects root node 1 and common tree node 3.
The maximum depth of the base decision tree is 3. Depth can be understood as the distance from a node to the root node; for example, the distance from common tree node 2 to root node 1 is 1, so the depth of common tree node 2 is 1, and the distance from leaf node 8 to root node 1 is 3, so the depth of leaf node 8 is 3.
Node               | Splitting feature
Root node 1        | Total transaction amount in the last 10 days
Common tree node 2 | Total transaction amount in the last 5 days
Common tree node 3 | Number of transfer accounts in the last 5 days
Common tree node 4 | Number of transfer accounts in the last 8 days
Common tree node 5 | Number of transfer accounts in the last 3 days

TABLE 1
Each node in the base decision tree except the leaf nodes may represent a splitting feature. Referring to the example in Table 1, the splitting feature represented by root node 1 is the total transaction amount in the last 10 days, the splitting feature represented by common tree node 2 is the total transaction amount in the last 5 days, the splitting feature represented by common tree node 3 is the number of transfer accounts in the last 5 days, and so on.
Node               | Splitting feature                              | Splitting value
Root node 1        | Total transaction amount in the last 10 days   | 1000
Common tree node 2 | Total transaction amount in the last 5 days    | 500
Common tree node 3 | Number of transfer accounts in the last 5 days | 8
Common tree node 4 | Number of transfer accounts in the last 8 days | 12
Common tree node 5 | Number of transfer accounts in the last 3 days | 5

TABLE 2
Each splitting feature may correspond to a splitting value, and a unique branch path may be determined based on the splitting value and a branch-path selection policy. The selection policy may be preset; for example, the left branch path corresponds to a value less than or equal to the splitting value, and the right branch path corresponds to a value greater than the splitting value.
Referring to the example of Table 2, the splitting value of root node 1 (total transaction amount in the last 10 days) is 1000. When the total transaction amount in the last 10 days is less than or equal to 1000, branch path 12 is selected, the process jumps to common tree node 2, and the comparison of the total transaction amount in the last 5 days with its splitting value continues. When the total transaction amount in the last 10 days is greater than 1000, branch path 13 is selected, the process jumps to common tree node 3, and the comparison of the number of transfer accounts in the last 5 days with the splitting value 8 continues, and so on.
For example, assuming that the total transaction amount of an account is 950 in the last 10 days and 550 in the last 5 days, the path of the account in the base decision tree shown in fig. 2 is root node 1 - common tree node 2 - common tree node 5 ..., and so on.
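To make this traversal concrete, the following minimal Python sketch routes a sample through the tree of fig. 2 using the splitting values of Table 2. It is not part of the patent: the dict encoding and the feature names are illustrative assumptions, and only nodes 1 to 5 are encoded because Table 2 specifies no splitting features for the remaining nodes.

```python
# Illustrative sketch of routing a sample through the base decision tree of
# fig. 2 with the splitting values of Table 2. The dict encoding and feature
# names are assumptions; any node id absent from TREE is treated as a leaf.
TREE = {
    1: {"feature": "amount_10d", "split": 1000, "left": 2, "right": 3},
    2: {"feature": "amount_5d", "split": 500, "left": 4, "right": 5},
    3: {"feature": "transfers_5d", "split": 8, "left": 6, "right": 7},
    4: {"feature": "transfers_8d", "split": 12, "left": 8, "right": 9},
    5: {"feature": "transfers_3d", "split": 5, "left": 10, "right": 11},
}

def route(tree, sample):
    """Follow the branch-path policy: left if value <= split, right otherwise."""
    node, path = 1, [1]
    while node in tree:
        n = tree[node]
        node = n["left"] if sample[n["feature"]] <= n["split"] else n["right"]
        path.append(node)
    return path

# The account from the text: 950 in the last 10 days, 550 in the last 5 days.
example = {"amount_10d": 950, "amount_5d": 550, "transfers_3d": 4}
print(route(TREE, example))  # node 1 -> node 2 -> node 5 -> ...
```

With the values above the account goes left at node 1 (950 <= 1000), right at node 2 (550 > 500), and continues from node 5, matching the path described in the text.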
It should be noted that fig. 2 is only an exemplary illustration, and in practical applications, the generated basic decision tree is generally more complex than fig. 2.
In this embodiment, after the basic decision tree is generated, the extraction of the tree skeleton may be performed.
In one example, nodes at or below a specified depth and diverging paths between the nodes may be extracted downward from a root node of the base decision tree.
The specified depth is typically less than the maximum depth of the base decision tree, and may be preset, e.g., by a business person based on experience, etc.
Assuming the specified depth is 2, and taking the base decision tree shown in fig. 2 as an example, the nodes with depth less than or equal to 2 and the branch paths between them, i.e., nodes 1 to 7 and branch paths 12, 13, 24, 25, 36, and 37, can be extracted from the root node, resulting in the tree skeleton shown in fig. 3.
In this embodiment, the tree skeleton includes the splitting features represented by the extracted nodes, that is, it includes root node 1's splitting feature (total transaction amount in the last 10 days), common tree node 2's splitting feature (total transaction amount in the last 5 days), and so on.
The tree skeleton may include none of the splitting values, or may include the splitting values of some of the splitting features; for example, it may include only the splitting values of root node 1, common tree node 2, and common tree node 3. This specification does not specifically limit this.
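A depth-limited skeleton extraction of this kind can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the dict encoding of the tree is invented here so the sketch is self-contained, and the `keep_values` parameter models the optional "partial splitting values" case.

```python
# Sketch of extracting a tree skeleton: keep nodes down to a specified depth,
# keep their splitting features, and drop splitting values except for the node
# ids listed in keep_values. The dict encoding is an illustrative assumption.
TREE = {
    1: {"feature": "amount_10d", "split": 1000, "left": 2, "right": 3},
    2: {"feature": "amount_5d", "split": 500, "left": 4, "right": 5},
    3: {"feature": "transfers_5d", "split": 8, "left": 6, "right": 7},
    4: {"feature": "transfers_8d", "split": 12, "left": 8, "right": 9},
    5: {"feature": "transfers_3d", "split": 5, "left": 10, "right": 11},
}

def extract_skeleton(tree, root=1, max_depth=2, keep_values=frozenset()):
    skeleton = {}
    frontier = [(root, 0)]  # (node id, depth); the root has depth 0
    while frontier:
        node_id, depth = frontier.pop()
        if node_id not in tree or depth > max_depth:
            continue  # beyond the cut-off: becomes a leaf of the skeleton
        node = tree[node_id]
        skeleton[node_id] = {
            "feature": node["feature"],
            "split": node["split"] if node_id in keep_values else None,
            "left": node["left"],
            "right": node["right"],
        }
        frontier.append((node["left"], depth + 1))
        frontier.append((node["right"], depth + 1))
    return skeleton
```

With `max_depth=1` only nodes 1 to 3 survive; with `keep_values={1}` the skeleton retains root node 1's splitting value, illustrating the "partial splitting values" variant.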
In another example, all nodes of the base decision tree and the branching paths between all nodes may be extracted from the root node of the base decision tree downward to obtain the tree skeleton of the base decision tree.
The tree skeleton may not include the split value of each split feature, and may also include the split value of a partial split feature, which is not particularly limited in this specification.
In another example, the extraction of the tree skeleton may not be based on depth. Still taking fig. 2 as an example, root node 1 and common tree nodes 2 to 5 may be extracted.
In this embodiment, after the tree skeleton of the base decision tree is extracted, the tree skeleton may be trained using the second type of sample data in the money-receiving code scenario to obtain its missing splitting values.
Node               | Splitting feature                              | Splitting value
Root node 1        | Total transaction amount in the last 10 days   | 800
Common tree node 2 | Total transaction amount in the last 5 days    | 400
Common tree node 3 | Number of transfer accounts in the last 5 days | 7
Common tree node 4 | Number of transfer accounts in the last 8 days | 10
Common tree node 5 | Number of transfer accounts in the last 3 days | 4

TABLE 3
Taking the case where the tree skeleton includes no splitting values as an example, the splitting value of each splitting feature may be trained based on the second type of sample data in the money-receiving code scenario. Referring to the example of Table 3, the trained splitting value of root node 1 (total transaction amount in the last 10 days) is 800; according to the predetermined branch-path selection policy, when the total transaction amount in the last 10 days is less than or equal to 800, branch path 12 is selected, and so on.
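One simple way to fit the missing splitting values can be sketched as follows, under stated assumptions: the splitting features are fixed by the skeleton and only the thresholds are searched, using a Gini-based scan over midpoints. The patent does not prescribe this criterion; it is one plausible choice.

```python
# Sketch: fill in the missing splitting values of a skeleton using labeled
# second-scene samples. The Gini criterion and midpoint candidates are
# illustrative assumptions, not the patent's prescribed procedure.

def gini(labels):
    """Gini impurity of a binary (0/1) label list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split_value(values, labels):
    """Threshold (a midpoint between consecutive sorted values) minimizing
    the weighted Gini impurity of the <= / > partition."""
    pairs = sorted(zip(values, labels))
    best, best_score = pairs[0][0], float("inf")
    for i in range(len(pairs) - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue
        thr = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best, best_score = thr, score
    return best

def fit_skeleton(skeleton, node_id, samples, labels):
    """Recursively learn missing thresholds, routing samples down the tree."""
    if node_id not in skeleton or not samples:
        return
    node = skeleton[node_id]
    if node["split"] is None:
        node["split"] = best_split_value(
            [s[node["feature"]] for s in samples], labels)
    go_left = [s[node["feature"]] <= node["split"] for s in samples]
    fit_skeleton(skeleton, node["left"],
                 [s for s, g in zip(samples, go_left) if g],
                 [lab for lab, g in zip(labels, go_left) if g])
    fit_skeleton(skeleton, node["right"],
                 [s for s, g in zip(samples, go_left) if not g],
                 [lab for lab, g in zip(labels, go_left) if not g])
```

For a single-node skeleton with samples whose feature values are 100, 200, 900, 1000 and labels 0, 0, 1, 1, the scan picks the midpoint 550, which separates the two classes perfectly.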
In this embodiment, after the splitting values of the splitting features in the tree skeleton are obtained, the tree skeleton may continue to be extended by fitting based on the second type of sample data, determining the splitting features and splitting values of the extended nodes until the model converges, to obtain the target decision tree, thereby completing the training of the cash-out determination decision tree for the money-receiving code scenario.
In general, a leaf node is considered untrustworthy when it covers few black samples. Optionally, for the trained target decision tree, the confidence of each leaf node may be calculated using the second type of sample data in the second scenario, and the leaf nodes whose confidence does not satisfy the confidence condition may then be filtered out to simplify the target decision tree.
Taking the GBDT algorithm as an example, the leaf nodes of the target decision tree may be scored based on all of the second type of sample data, and for each leaf node the scores may be aggregated and used as that leaf node's confidence. Assuming the confidence condition is that the confidence ranks in the top 1%, the leaf nodes ranked in the top 1% by confidence may be retained and the remaining leaf nodes filtered out.
It should be noted that, in practical applications, to preserve the integrity of the target decision tree, the leaf nodes that do not satisfy the confidence condition may be kept rather than pruned, and simply not used when the target decision tree is applied.
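This filtering step can be sketched as follows. The confidence measure used here, a per-leaf count of black (label = 1) samples with a minimum-count condition, is an illustrative assumption; the patent's GBDT-based scoring is not reproduced. The dict tree encoding is likewise invented for the sketch.

```python
# Sketch: score each leaf of a trained target decision tree by the number of
# black (label = 1) second-scene samples that reach it, then keep only the
# leaves meeting a confidence condition. The scoring choice and tree encoding
# are illustrative assumptions.

def leaf_black_counts(tree, samples, labels, root=1):
    """Route every sample to a leaf and count black samples per leaf."""
    counts = {}
    for sample, label in zip(samples, labels):
        node = root
        while node in tree:
            n = tree[node]
            node = n["left"] if sample[n["feature"]] <= n["split"] else n["right"]
        counts[node] = counts.get(node, 0) + label
    return counts

def trusted_leaves(counts, min_black=2):
    """Leaves whose black-sample count satisfies the confidence condition."""
    return {leaf for leaf, c in counts.items() if c >= min_black}
```

A leaf that failed the condition would either be pruned or, as the text notes, retained in the tree but skipped at prediction time.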
Optionally, for a finance-related target decision tree with high interpretability requirements, this specification may automatically generate the model's decision rules.
In this example, for each leaf node of the trained target decision tree, the complete path from the root node to that leaf node may be obtained, and a decision rule corresponding to the target decision tree is then generated according to the splitting features and splitting values of the nodes on the complete path.
Referring to fig. 3, the target decision tree shown in fig. 3 includes 4 complete paths, which are node 1-node 2-node 4, node 1-node 2-node 5, node 1-node 3-node 6, and node 1-node 3-node 7.
Assuming that the splitting features and splitting values represented by the nodes are as shown in Table 2, a logical AND may be used to connect the conditions. Taking node 1 - node 2 - node 4 as an example, the corresponding decision rule is: the total transaction amount in the last 10 days is less than or equal to 1000 and the total transaction amount in the last 5 days is less than or equal to 500 and the number of transfer accounts in the last 8 days is less than or equal to 12.
In this way, the respective decision rules of the target decision tree can be automatically generated.
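The rule-generation step can be sketched as follows. The `<=` / `>` rendering follows the branch-path selection policy described earlier (left branch means less than or equal to the splitting value); the textual rule format and the dict tree encoding are illustrative assumptions.

```python
# Sketch: generate one conjunctive decision rule per root-to-leaf path by
# joining node conditions with a logical AND. The tree encoding and the rule
# text format are illustrative assumptions.

def decision_rules(tree, root=1):
    rules = {}

    def walk(node_id, conds):
        if node_id not in tree:  # reached a leaf of the (skeleton) tree
            rules[node_id] = " and ".join(conds)
            return
        n = tree[node_id]
        walk(n["left"], conds + ["%s <= %s" % (n["feature"], n["split"])])
        walk(n["right"], conds + ["%s > %s" % (n["feature"], n["split"])])

    walk(root, [])
    return rules

# Two-level example using Table 2's values for nodes 1 and 2:
tree = {
    1: {"feature": "amount_10d", "split": 1000, "left": 2, "right": 3},
    2: {"feature": "amount_5d", "split": 500, "left": 4, "right": 5},
}
print(decision_rules(tree)[4])  # amount_10d <= 1000 and amount_5d <= 500
```

Each leaf thus receives the conjunction of all conditions on its path, which is exactly the interpretable rule form described in the text.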
Corresponding to the embodiments of the foregoing decision tree generating method, the present specification also provides embodiments of a decision tree generating apparatus.
Embodiments of the decision tree generation apparatus of this specification can be applied to a server. The apparatus embodiments may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a logical apparatus, the apparatus is formed by the processor of the server in which it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. From a hardware perspective, fig. 4 is a hardware structure diagram of the server in which the decision tree generation apparatus of this specification is located; besides the processor, memory, network interface, and non-volatile memory shown in fig. 4, the server may also include other hardware according to its actual functions, which is not described again here.
Fig. 5 is a block diagram illustrating a decision tree generation apparatus according to an exemplary embodiment of the present specification.
Referring to fig. 5, the decision tree generating apparatus 400 can be applied to the server shown in fig. 4, and includes: a basis acquisition unit 401, a skeleton extraction unit 402, a target training unit 403, and a rule generation unit 404.
The basic obtaining unit 401 obtains a base decision tree, where the base decision tree is generated based on a first type of sample data;
the skeleton extraction unit 402 extracts a tree skeleton of the base decision tree, where the tree skeleton includes the splitting features of nodes and includes no splitting values or only some of the splitting values;
and the target training unit 403 trains the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
Optionally, the skeleton extraction unit 402 extracts, downward from the root node of the base decision tree, the nodes at or below a specified depth and the branch paths between them, where the specified depth is less than the maximum depth of the base decision tree.
Optionally, the skeleton extracting unit 402 extracts all nodes of the basic decision tree and branch paths between all nodes from a root node of the basic decision tree.
Optionally, the target training unit 403, after obtaining the missing splitting values of the tree skeleton through training with the second type of sample data, extends the tree skeleton based on the second type of sample data and determines the splitting features and splitting values of the extended nodes until convergence.
The rule generating unit 404 is configured to obtain, for each leaf node of the target decision tree, the complete path from the root node to that leaf node;
and to generate a decision rule corresponding to the target decision tree according to the splitting features and splitting values of the nodes on the complete path.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the embodiment of the foregoing decision tree generating method, this specification further provides a decision tree generating device, including: a processor and a memory for storing machine executable instructions. Wherein the processor and the memory are typically interconnected by means of an internal bus. In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
In this embodiment, the processor is caused to:
acquiring a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
Optionally, in extracting the tree skeleton of the base decision tree, the processor is caused to:
and extracting, downward from the root node of the base decision tree, the nodes at or below a specified depth and the branch paths between them, wherein the specified depth is less than the maximum depth of the base decision tree.
Optionally, in extracting the tree skeleton of the base decision tree, the processor is caused to:
and extracting all nodes of the basic decision tree and branch paths among all the nodes from the root node of the basic decision tree downwards.
Optionally, the processor is further caused to:
after the missing splitting values of the tree skeleton are obtained by training with the second type of sample data, extending the tree skeleton based on the second type of sample data, and determining the splitting features and splitting values of the extended nodes until convergence.
Optionally, the processor is further caused to:
for each leaf node of the target decision tree, acquiring the complete path from the root node to that leaf node;
and generating a decision rule corresponding to the target decision tree according to the splitting features and splitting values of the nodes on the complete path.
In correspondence with the aforementioned embodiments of the decision tree generation method, the present specification also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of:
acquiring a base decision tree, wherein the base decision tree is generated based on a first type of sample data;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises the splitting features of nodes and comprises no splitting values or only some of the splitting values;
and training the missing splitting values of the tree skeleton with a second type of sample data to obtain a target decision tree.
Optionally, extracting the tree skeleton of the basic decision tree includes:
and extracting, downward from the root node of the base decision tree, the nodes at or below a specified depth and the branch paths between them, wherein the specified depth is less than the maximum depth of the base decision tree.
Optionally, extracting the tree skeleton of the basic decision tree includes:
and extracting all nodes of the basic decision tree and branch paths among all the nodes from the root node of the basic decision tree downwards.
Optionally, the method further includes:
after the splitting values missing from the tree skeleton are obtained by training with the second type of sample data, extending the tree skeleton based on the second type of sample data, and determining the splitting characteristics and splitting values of the extended nodes until convergence.
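One possible reading of the extension step above, under stated assumptions: after the skeleton's splitting values are re-trained, impure leaves keep splitting on the second type of sample data until no candidate split reduces impurity, i.e. until convergence. The sketch below uses hypothetical names and a Gini criterion; it is not the specification's mandated algorithm:

```python
# Illustrative extension of a trained skeleton (an assumption, not the
# patent's exact algorithm): grow leaves on the second type of sample data
# until no split yields an impurity gain, i.e. until convergence.

class Node:
    def __init__(self, feature=None, value=None, left=None, right=None, label=None):
        self.feature, self.value = feature, value
        self.left, self.right, self.label = left, right, label

def gini(rows):
    if not rows:
        return 0.0
    p = sum(y for _, y in rows) / len(rows)
    return 1.0 - p * p - (1.0 - p) ** 2

def best_split(rows):
    """Search every (characteristic, value) pair for the split with the
    largest impurity reduction; returns (gain, feature, value, left, right)."""
    best, base = None, gini(rows)
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            l = [(xi, y) for xi, y in rows if xi[f] <= t]
            r = [(xi, y) for xi, y in rows if xi[f] > t]
            if not l or not r:
                continue
            gain = base - (len(l) * gini(l) + len(r) * gini(r)) / len(rows)
            if best is None or gain > best[0]:
                best = (gain, f, t, l, r)
    return best

def extend(leaf, rows, min_gain=1e-9):
    """Determine splitting characteristics and values of extended nodes,
    recursing until no split improves impurity (convergence)."""
    if not rows:
        return
    ys = [y for _, y in rows]
    leaf.label = max(set(ys), key=ys.count)
    found = best_split(rows)
    if found is None or found[0] <= min_gain:
        return
    _, leaf.feature, leaf.value, l, r = found
    leaf.left, leaf.right = Node(), Node()
    extend(leaf.left, l, min_gain)
    extend(leaf.right, r, min_gain)
```

In practice `extend` would be called on each leaf of the re-trained skeleton with the second-type samples that reach that leaf.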
Optionally, the method further includes:
for each leaf node of the target decision tree, acquiring the complete path from the root node to the leaf node;
and generating a judgment rule corresponding to the target decision tree according to the splitting characteristics and the splitting values of the nodes on the complete path.
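The rule-generation step above amounts to walking every complete root-to-leaf path and concatenating the splitting characteristic and splitting value at each node on the path. A minimal sketch, assuming a simple binary `Node` structure (hypothetical naming; the specification does not mandate this format):

```python
# Hypothetical rendering of the judgment-rule step: every complete
# root-to-leaf path becomes a list of "feature <=/> value" conditions
# plus the leaf's decision label.

class Node:
    def __init__(self, feature=None, value=None, left=None, right=None, label=None):
        self.feature, self.value = feature, value
        self.left, self.right, self.label = left, right, label

def extract_rules(node, path=()):
    """Yield (conditions, label) for each root-to-leaf path of the target tree."""
    if node.feature is None:                   # leaf: emit the accumulated path
        yield list(path), node.label
        return
    yield from extract_rules(node.left,
                             path + ("f%d <= %s" % (node.feature, node.value),))
    yield from extract_rules(node.right,
                             path + ("f%d > %s" % (node.feature, node.value),))
```

Joined with AND, each yielded pair reads as a judgment rule such as `IF f0 > 5.0 AND f1 > 3.0 THEN 2`.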
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (11)

1. A decision tree generation method for abnormal account determination, cash-out determination, or money laundering determination, the method comprising:
acquiring a basic decision tree, wherein the basic decision tree is generated based on first type of sample data in a first scene;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises splitting characteristics of nodes and does not comprise splitting values or comprises partial splitting values;
training the missing split value of the tree skeleton by using second type of sample data in a second scene to obtain a target decision tree;
wherein the number of the first type of sample data in the first scene is greater than the number of the second type of sample data in the second scene.
2. The method of claim 1, wherein extracting the tree skeleton of the basic decision tree comprises:
and extracting nodes with the depth less than or equal to a specified depth and a branching path between the nodes from the root node of the basic decision tree downwards, wherein the specified depth is less than the depth of the basic decision tree.
3. The method of claim 1, wherein extracting the tree skeleton of the basic decision tree comprises:
and extracting all nodes of the basic decision tree and branch paths among all the nodes from the root node of the basic decision tree downwards.
4. The method of claim 2 or 3, further comprising:
after the splitting values missing from the tree skeleton are obtained by training with the second type of sample data, extending the tree skeleton based on the second type of sample data, and determining the splitting characteristics and splitting values of the extended nodes until convergence.
5. The method of claim 1, further comprising:
for each leaf node of the target decision tree, acquiring the complete path from the root node to the leaf node;
and generating a judgment rule corresponding to the target decision tree according to the splitting characteristics and the splitting values of the nodes on the complete path.
6. A decision tree generation apparatus for abnormal account determination, cash-out determination, or money laundering determination, comprising:
the basic decision tree generating unit is used for generating a basic decision tree based on first type of sample data in a first scene;
a skeleton extraction unit, which extracts a tree skeleton of the basic decision tree, wherein the tree skeleton comprises splitting characteristics of nodes and does not comprise splitting values or comprises partial splitting values;
the target training unit is used for training the missing split values of the tree framework by using second type of sample data in a second scene to obtain a target decision tree;
wherein the number of the first type of sample data in the first scene is greater than the number of the second type of sample data in the second scene.
7. The apparatus of claim 6, wherein
the skeleton extraction unit extracts, from the root node of the basic decision tree downwards, nodes at depths less than or equal to a specified depth and the branching paths between the nodes, wherein the specified depth is less than the depth of the basic decision tree.
8. The apparatus of claim 6, wherein
the skeleton extraction unit extracts all nodes of the basic decision tree and the branch paths among the nodes from the root node of the basic decision tree downwards.
9. The apparatus of claim 7 or 8,
the target training unit trains with the second type of sample data to obtain the splitting values missing from the tree skeleton, extends the tree skeleton based on the second type of sample data, and determines the splitting characteristics and splitting values of the extended nodes until convergence.
10. The apparatus of claim 6, further comprising:
the rule generating unit is used for acquiring a complete path from a root node to each leaf node of the target decision tree;
and generating a judgment rule corresponding to the target decision tree according to the splitting characteristics and the splitting values of the nodes on the complete path.
11. A decision tree generation apparatus for abnormal account determination, cash-out determination, or money laundering determination, comprising:
a processor;
a memory for storing machine executable instructions;
wherein, by reading and executing machine-executable instructions stored by the memory that correspond to decision tree generation logic, the processor is caused to:
acquiring a basic decision tree, wherein the basic decision tree is generated based on first type of sample data in a first scene;
extracting a tree skeleton of the base decision tree, wherein the tree skeleton comprises splitting characteristics of nodes and does not comprise splitting values or comprises partial splitting values;
training the missing split value of the tree skeleton by using second type of sample data in a second scene to obtain a target decision tree;
wherein the number of the first type of sample data in the first scene is greater than the number of the second type of sample data in the second scene.
CN202011205097.1A 2018-09-21 2018-09-21 Decision tree generation method and device Pending CN112418274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011205097.1A CN112418274A (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811110423.3A CN109242034B (en) 2018-09-21 2018-09-21 Decision tree generation method and device
CN202011205097.1A CN112418274A (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811110423.3A Division CN109242034B (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Publications (1)

Publication Number Publication Date
CN112418274A true CN112418274A (en) 2021-02-26

Family

ID=65056548

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811110423.3A Active CN109242034B (en) 2018-09-21 2018-09-21 Decision tree generation method and device
CN202011205097.1A Pending CN112418274A (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201811110423.3A Active CN109242034B (en) 2018-09-21 2018-09-21 Decision tree generation method and device

Country Status (3)

Country Link
CN (2) CN109242034B (en)
TW (1) TW202013266A (en)
WO (1) WO2020057301A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242034B (en) * 2018-09-21 2020-09-15 阿里巴巴集团控股有限公司 Decision tree generation method and device
CN111353600B (en) * 2020-02-20 2023-12-12 第四范式(北京)技术有限公司 Abnormal behavior detection method and device
CN111429282B (en) * 2020-03-27 2023-08-25 中国工商银行股份有限公司 Transaction anti-money-laundering method and device based on anti-money-laundering model migration
CN111401570B (en) * 2020-04-10 2022-04-12 支付宝(杭州)信息技术有限公司 Interpretation method and device for privacy tree model
CN112329874B (en) * 2020-11-12 2024-08-20 京东科技控股股份有限公司 Decision method and device for data service, electronic equipment and storage medium
CN112330054B (en) * 2020-11-23 2024-03-19 大连海事大学 Dynamic travel business problem solving method, system and storage medium based on decision tree
CN114399000A (en) * 2022-01-20 2022-04-26 中国平安人寿保险股份有限公司 Object interpretability feature extraction method, device, equipment and medium of tree model

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140214736A1 (en) * 2013-01-30 2014-07-31 Technion Research & Development Foundation Limited Training ensembles of randomized decision trees
US20140324871A1 (en) * 2013-04-30 2014-10-30 Wal-Mart Stores, Inc. Decision-tree based quantitative and qualitative record classification
CN105574544A (en) * 2015-12-16 2016-05-11 平安科技(深圳)有限公司 Data processing method and device
US20160162793A1 (en) * 2014-12-05 2016-06-09 Alibaba Group Holding Limited Method and apparatus for decision tree based search result ranking
CN106096748A (en) * 2016-04-28 2016-11-09 武汉宝钢华中贸易有限公司 Entrucking forecast model in man-hour based on cluster analysis and decision Tree algorithms
CN106682414A (en) * 2016-12-23 2017-05-17 中国科学院深圳先进技术研究院 Method and device for establishing timing sequence prediction model
US20170221075A1 (en) * 2016-01-29 2017-08-03 Sap Se Fraud inspection framework
US20180046939A1 (en) * 2016-08-10 2018-02-15 Paypal, Inc. Automated Machine Learning Feature Processing
CN108304936A (en) * 2017-07-12 2018-07-20 腾讯科技(深圳)有限公司 Machine learning model training method and device, facial expression image sorting technique and device
CN108491891A (en) * 2018-04-04 2018-09-04 桂林电子科技大学 A kind of online transfer learning method of multi-source based on decision tree local similarity
US20180260531A1 (en) * 2017-03-10 2018-09-13 Microsoft Technology Licensing, Llc Training random decision trees for sensor data processing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982336B (en) * 2011-09-02 2015-11-25 株式会社理光 Model of cognition generates method and system
CN104679777B (en) * 2013-12-02 2018-05-18 ***股份有限公司 A kind of method and system for being used to detect fraudulent trading
CN107203774A (en) * 2016-03-17 2017-09-26 阿里巴巴集团控股有限公司 The method and device that the belonging kinds of data are predicted
US11100421B2 (en) * 2016-10-24 2021-08-24 Adobe Inc. Customized website predictions for machine-learning systems
CN107135061B (en) * 2017-04-17 2019-10-22 北京科技大学 A kind of distributed secret protection machine learning method under 5g communication standard
CN109242034B (en) * 2018-09-21 2020-09-15 阿里巴巴集团控股有限公司 Decision tree generation method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ROLAND RIEKE, ET.AL: "Fraud Detection in Mobile Payments Utilizing Process Behavior Analysis", 2013 INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY, 6 September 2013 (2013-09-06), pages 662 - 669, XP032524240, DOI: 10.1109/ARES.2013.87 *
丁昱 (DING Yu): "Research on static risk classification of money laundering for individual clients of commercial banks" (商业银行个人客户洗钱静态风险分类方法研究), 金融理论与实践 (Finance Theory and Practice), no. 8, 10 August 2017 (2017-08-10), pages 59 - 63 *

Also Published As

Publication number Publication date
WO2020057301A1 (en) 2020-03-26
TW202013266A (en) 2020-04-01
CN109242034B (en) 2020-09-15
CN109242034A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
CN109242034B (en) Decision tree generation method and device
CN106709800B (en) Community division method and device based on feature matching network
CN107133865B (en) Credit score obtaining and feature vector value output method and device
CN110166438B (en) Account information login method and device, computer equipment and computer storage medium
CN112600810B Ethereum phishing fraud detection method and device based on graph classification
CN114119137B (en) Risk control method and apparatus
CN109741181A (en) A kind of transaction match method, system, server and medium based on intelligent contract
US20210243215A1 (en) Fraud detection based on analysis of frequency-domain data
CN105389488A (en) Identity authentication method and apparatus
CN102567534B (en) Interactive product user generated content intercepting system and intercepting method for the same
CN111428217A (en) Method and device for identifying cheat group, electronic equipment and computer readable storage medium
CN109685336A (en) Collection task distribution method, device, computer equipment and storage medium
CN109544262A Item recommendation method, device, electronic equipment, system and readable storage medium
CN111080306A (en) Transaction risk determination method, device, equipment and storage medium
CN110046648A (en) The method and device of business classification is carried out based at least one business disaggregated model
CN109598513B (en) Risk identification method and risk identification device
CN106842246B (en) Big Dipper chip on-line authentication method and device
CN107679862A (en) A kind of characteristic value of fraudulent trading model determines method and device
CN106294115A (en) The method of testing of a kind of application system animal migration and device
CN110717817A (en) Pre-loan approval method and device, electronic equipment and computer-readable storage medium
CN113792849B (en) Training method of character generation model, character generation method, device and equipment
CN108205757B (en) Method and device for verifying legality of electronic payment service
CN107025545A (en) A kind of transaction processing method and transaction system
Agrawal et al. A novel approach for credit card fraud detection
CN113487320A (en) Fraud transaction detection method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40046460

Country of ref document: HK