CN111738450A - Node analysis method, device and equipment based on model training and storage medium - Google Patents

Node analysis method, device and equipment based on model training and storage medium

Info

Publication number
CN111738450A
Authority
CN
China
Prior art keywords: nodes, node, gbdt, leaf, structures
Legal status
Granted
Application number
CN202010433477.4A
Other languages
Chinese (zh)
Other versions
CN111738450B (en)
Inventor
张�杰
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010433477.4A
Publication of CN111738450A
Application granted
Publication of CN111738450B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 — Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a node analysis method, apparatus, device, and storage medium based on model training, used to reduce the number of node judgments, improve the efficiency of resolving target leaf nodes, and achieve millisecond-level feedback. The node analysis method based on model training comprises: obtaining a plurality of GBDT structures from a preset gradient boosting decision tree (GBDT) model; traversing and merging the GBDT structures to generate a plurality of tree structures to be judged; obtaining a plurality of leaf node judgment conditions; and selecting a leaf node to be judged from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, then selecting, according to the judgment condition corresponding to each such node, either that leaf node or its corresponding sibling leaf node as a target leaf node, thereby obtaining a plurality of target leaf nodes. The invention further relates to blockchain technology: the plurality of target leaf nodes may be stored in a blockchain.

Description

Node analysis method, device and equipment based on model training and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a node analysis method, apparatus, device, and storage medium based on model training.
Background
Leaf nodes of a gradient boosting decision tree (GBDT) model can be used as input features for training a logistic regression model to improve its performance, a practice well established and validated in industry. The rough process has three steps. The first step: train a GBDT model. The second step: parse the leaf nodes of the GBDT model to generate features. The third step: feed the GBDT leaf node features into a logistic regression model for training.
In the prior art, the leaf nodes of the GBDT model are analyzed by traversing every tree in the model and checking whether each leaf node satisfies its condition. Typically the GBDT model is trained with a tree depth of 5 and 100 iterations, with 64 leaf nodes per tree, so this traversal requires 5 × 64 × 100 = 32,000 loop-condition judgments, which is very time-consuming and results in extremely low working efficiency.
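As a rough illustration (not the patent's code), the per-sample cost of the naive traversal under the stated settings can be counted directly; the constants below are taken from the paragraph above.

```python
# Hypothetical sketch of the naive cost estimate from the text above:
# depth-5 trees, 64 leaf nodes per tree, 100 boosting iterations.
TREE_DEPTH = 5
LEAVES_PER_TREE = 64
NUM_TREES = 100

def naive_judgment_count(depth: int, leaves: int, trees: int) -> int:
    """Condition judgments needed when every leaf of every tree is checked."""
    return depth * leaves * trees

print(naive_judgment_count(TREE_DEPTH, LEAVES_PER_TREE, NUM_TREES))  # 32000
```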
Disclosure of Invention
The invention mainly aims to solve the problems of high time consumption and low working efficiency when analyzing leaf nodes.
A first aspect of the present invention provides a node analysis method based on model training, comprising the following steps: obtaining a plurality of GBDT structures from a preset gradient boosting decision tree (GBDT) model, wherein each GBDT structure comprises a plurality of leaf nodes; traversing the GBDT structures, merging them, and generating a plurality of tree structures to be judged; obtaining a plurality of leaf node judgment conditions from the tree structures to be judged, wherein each judgment condition corresponds to one leaf node; and selecting a leaf node to be judged from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, and selecting, according to the judgment condition corresponding to each leaf node to be judged, either that leaf node or its corresponding sibling leaf node as a target leaf node, so as to obtain a plurality of target leaf nodes, wherein the leaf nodes to be judged comprise a first leaf node to be judged through an Nth leaf node to be judged, N being a positive integer.
Optionally, in a first implementation of the first aspect of the present invention, traversing the plurality of GBDT structures, merging them, and generating a plurality of tree structures to be judged comprises: traversing the GBDT structures and judging whether the leaf nodes of the GBDT structures include at least one group of identical nodes; and if they do, merging the child nodes under each group of identical nodes to generate a plurality of tree structures to be judged.
Optionally, in a second implementation of the first aspect, traversing the plurality of GBDT structures and judging whether their leaf nodes include at least one group of identical nodes comprises: traversing the GBDT structures, randomly selecting two of them, and judging whether the same layer of leaf nodes of the two randomly selected GBDT structures includes at least one group of identical nodes; if it does, judging that the plurality of GBDT structures include at least one group of identical nodes; or counting the number of nodes in each GBDT structure, selecting the two GBDT structures with the highest node counts to obtain two sequential GBDT structures, and judging whether the same layer of leaf nodes of the two sequential GBDT structures includes at least one group of identical nodes; if it does, judging that the plurality of GBDT structures include at least one group of identical nodes.
Optionally, in a third implementation of the first aspect, if the leaf nodes of the plurality of GBDT structures include at least one group of identical nodes, merging the child nodes under the identical nodes to generate a plurality of tree structures to be judged comprises: extracting a plurality of first identical child nodes and first identical sibling child nodes from either of the two randomly selected GBDT structures or either of the two sequential GBDT structures, and extracting a plurality of second identical child nodes and second identical sibling child nodes from the other GBDT structure; merging the second identical child nodes under the first identical child nodes and the second identical sibling child nodes under the first identical sibling child nodes to generate a to-be-integrated tree structure to be judged; obtaining a plurality of other to-be-integrated tree structures to be judged from the remaining GBDT structures; and integrating the to-be-integrated tree structures to obtain a plurality of tree structures to be judged.
Optionally, in a fourth implementation of the first aspect, merging the second identical child nodes under the first identical child nodes and the second identical sibling child nodes under the first identical sibling child nodes to generate the to-be-integrated tree structure to be judged comprises: integrating the first identical child nodes and first identical sibling child nodes into a target identical child node set, and the second identical child nodes and second identical sibling child nodes into a pre-merge child node set; judging whether the target identical child node set includes a first identical child node or first identical sibling child node that is identical to any of the second identical child nodes or second identical sibling child nodes; and if it does, deleting the corresponding second identical child node or second identical sibling child node from the pre-merge child node set, merging the remaining second identical child nodes under the node corresponding to the first identical child nodes, merging the remaining second identical sibling child nodes under the sibling node corresponding to the first identical sibling child nodes, and generating a plurality of tree structures to be judged.
Optionally, in a fifth implementation of the first aspect, selecting a leaf node to be judged from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, and selecting, according to the corresponding judgment condition, either that leaf node or its sibling leaf node as a target leaf node to obtain a plurality of target leaf nodes, wherein the leaf nodes to be judged comprise a first through an Nth leaf node to be judged and N is a positive integer, comprises: reading the plurality of leaf node judgment conditions for each of the tree structures to be judged; in any layer of nodes, randomly selecting one leaf node from a set of mutually sibling leaf nodes in a target tree structure to be judged as the first leaf node to be judged, and judging whether it satisfies its corresponding judgment condition; if it does, selecting a second leaf node to be judged in the next layer and judging whether it satisfies its corresponding judgment condition; if it does, continuing to judge the next layer until a plurality of target leaf nodes are determined among the leaf nodes to be judged; and if the first leaf node to be judged does not satisfy its corresponding judgment condition, determining the plurality of target leaf nodes based on the judgment conditions and the sibling leaf node corresponding to the first leaf node to be judged. The method further comprises uploading the plurality of target leaf nodes to a blockchain.
Optionally, in a sixth implementation of the first aspect, before obtaining the plurality of GBDT structures from the preset gradient boosting decision tree (GBDT) model, the node analysis method based on model training further comprises: obtaining an initial GBDT model and initializing it to obtain an estimated constant value, wherein the estimated constant value is the constant prediction that minimizes the loss function; and calculating negative gradient values of the loss function multiple times to obtain a plurality of target negative gradient values, and performing residual fitting one by one according to the target negative gradient values and the estimated constant value to generate the preset GBDT model.
A second aspect of the present invention provides a node analysis apparatus based on model training, comprising: a structure obtaining module, configured to obtain a plurality of GBDT structures from a preset gradient boosting decision tree (GBDT) model, each GBDT structure comprising a plurality of leaf nodes; a tree structure generating module, configured to traverse and merge the GBDT structures to generate a plurality of tree structures to be judged; a judgment condition obtaining module, configured to obtain a plurality of leaf node judgment conditions from the tree structures to be judged, each judgment condition corresponding to one leaf node; and a leaf node determining module, configured to select a leaf node to be judged from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, and to select, according to the judgment condition corresponding to each leaf node to be judged, either that leaf node or its corresponding sibling leaf node as a target leaf node to obtain a plurality of target leaf nodes, wherein the leaf nodes to be judged comprise a first through an Nth leaf node to be judged, N being a positive integer.
Optionally, in a first implementation of the second aspect, the tree structure generating module comprises: a judging unit, configured to traverse the plurality of GBDT structures and judge whether their leaf nodes include at least one group of identical nodes; and a merging unit, configured to merge the child nodes under the identical nodes to generate a plurality of tree structures to be judged if the leaf nodes of the GBDT structures include at least one group of identical nodes.
Optionally, in a second implementation of the second aspect, the judging unit is specifically configured to: traverse the GBDT structures, randomly select two of them, and judge whether the same layer of leaf nodes of the two randomly selected GBDT structures includes at least one group of identical nodes, and if so, judge that the plurality of GBDT structures include at least one group of identical nodes; or count the number of nodes in each GBDT structure, select the two GBDT structures with the highest node counts to obtain two sequential GBDT structures, judge whether the same layer of leaf nodes of the two sequential GBDT structures includes at least one group of identical nodes, and if so, judge that the plurality of GBDT structures include at least one group of identical nodes.
Optionally, in a third implementation of the second aspect, the merging unit comprises: a child node extracting subunit, configured to extract, if the leaf nodes of the plurality of GBDT structures include at least one group of identical nodes, a plurality of first identical child nodes and first identical sibling child nodes from either of the two randomly selected GBDT structures or either of the two sequential GBDT structures, and a plurality of second identical child nodes and second identical sibling child nodes from the other GBDT structure; a merging subunit, configured to merge the second identical child nodes under the first identical child nodes and the second identical sibling child nodes under the first identical sibling child nodes, generating a to-be-integrated tree structure to be judged; a tree structure determining subunit, configured to obtain a plurality of other to-be-integrated tree structures to be judged from the remaining GBDT structures; and an integrating subunit, configured to integrate the to-be-integrated tree structures to obtain a plurality of tree structures to be judged.
Optionally, in a fourth implementation of the second aspect, the merging subunit is specifically configured to: integrate the first identical child nodes and first identical sibling child nodes into a target identical child node set, and the second identical child nodes and second identical sibling child nodes into a pre-merge child node set; judge whether the target identical child node set includes a first identical child node or first identical sibling child node identical to any of the second identical child nodes or second identical sibling child nodes; and if it does, delete the corresponding second identical child node or second identical sibling child node from the pre-merge child node set, merge the remaining second identical child nodes under the node corresponding to the first identical child nodes, merge the remaining second identical sibling child nodes under the sibling node corresponding to the first identical sibling child nodes, and generate a plurality of tree structures to be judged.
Optionally, in a fifth implementation of the second aspect, the leaf node determining module is specifically configured to: read the plurality of leaf node judgment conditions for each of the tree structures to be judged; in any layer of nodes, randomly select one leaf node from a set of mutually sibling leaf nodes in a target tree structure to be judged as the first leaf node to be judged, and judge whether it satisfies its corresponding judgment condition; if it does, select a second leaf node to be judged in the next layer and judge whether it satisfies its corresponding judgment condition; if it does, continue judging the next layer until a plurality of target leaf nodes are determined among the leaf nodes to be judged; if the first leaf node to be judged does not satisfy its corresponding judgment condition, determine the plurality of target leaf nodes based on the judgment conditions and the sibling leaf node corresponding to the first leaf node to be judged; and upload the plurality of target leaf nodes to a blockchain.
Optionally, in a sixth implementation of the second aspect, the node analysis apparatus based on model training further comprises: an estimation module, configured to obtain an initial GBDT model and initialize it to obtain an estimated constant value, wherein the estimated constant value is the constant prediction that minimizes the loss function; and a residual fitting module, configured to calculate negative gradient values of the loss function multiple times to obtain a plurality of target negative gradient values, and to perform residual fitting one by one according to the target negative gradient values and the estimated constant value to generate the preset gradient boosting decision tree (GBDT) model.
A third aspect of the present invention provides a node analysis device based on model training, comprising: a memory storing instructions and at least one processor, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the node analysis device based on model training to perform the node analysis method based on model training described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the above-described method of node resolution based on model training.
In the technical solution provided by the invention, a plurality of GBDT structures are obtained from a preset gradient boosting decision tree (GBDT) model, each comprising a plurality of leaf nodes; the GBDT structures are traversed and merged to generate a plurality of tree structures to be judged; a plurality of leaf node judgment conditions are obtained from the tree structures to be judged, each corresponding to one leaf node; and a leaf node to be judged is selected from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, and either that leaf node or its corresponding sibling leaf node is selected as a target leaf node according to the corresponding judgment condition, yielding a plurality of target leaf nodes, wherein the leaf nodes to be judged comprise a first through an Nth leaf node to be judged, N being a positive integer. In the embodiments of the invention, the GBDT structures are merged, and nodes are judged using the principle that sibling nodes carry opposite conditions, which reduces the number of node judgments, improves the efficiency of resolving target leaf nodes, and achieves millisecond-level feedback.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a node parsing method based on model training in an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of a node parsing method based on model training in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a GBDT structure merging based on node analysis of model training in the embodiment of the present invention;
FIG. 4 is a schematic diagram of a determination process of node analysis based on model training in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a node analysis apparatus based on model training according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another embodiment of a node analysis apparatus based on model training according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an embodiment of a node analysis device based on model training in the embodiment of the present invention.
Detailed Description
The embodiments of the present invention provide a node analysis method, apparatus, device, and storage medium based on model training.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a specific flow of the embodiment of the present invention is described below, and referring to fig. 1, an embodiment of the node parsing method based on model training in the embodiment of the present invention includes:
101. obtaining a plurality of GBDT structures from a preset gradient boosting decision tree GBDT model, wherein each GBDT structure comprises a plurality of leaf nodes;
the server obtains a plurality of GBDT structures from a preset gradient lifting decision tree GBDT model, the GBDT model is arranged in an advertisement recommendation system with hundred million level Page View (PV), the GBDT model is a logistic regression model and is used for rapid data processing during online scoring, and a large number of GBDT structures are generated after the GBDT model is trained in the data processing process, so that the server obtains the plurality of GBDT structures from the preset GBDT model, and each GBDT structure comprises a plurality of leaf nodes.
It should be noted that the GBDT structures contain many identical leaf nodes; during data processing, the same nodes are judged repeatedly when analyzing leaf nodes, which lowers working efficiency.
It is to be understood that the executing subject of the present invention may be a node parsing apparatus based on model training, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
102. Traversing a plurality of GBDT structures, merging the GBDT structures, and generating a plurality of tree structures to be judged;
Because the preset GBDT model generates a large number of GBDT structures, the server traverses them, selects two GBDT structures as a group each time, and merges their nodes to obtain a plurality of tree structures to be judged.
It should be noted that the two GBDT structures may be two randomly selected GBDT structures, or two GBDT structures selected in a certain order.
Before merging nodes, the server must first determine which nodes can be merged and which cannot: in this embodiment, identical nodes in the same layer of the GBDT structures can be merged, while different nodes in the same layer cannot.
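The merge rule above can be sketched in miniature. This is an illustrative assumption about the data layout, not the patent's implementation: each GBDT structure is represented as a nested dict mapping a node's split condition to its children, and identical conditions in the same layer are merged by recursively uniting their subtrees.

```python
# Hypothetical sketch: merge two tree structures, combining identical
# same-layer nodes and keeping different same-layer nodes side by side.
def merge_structures(a: dict, b: dict) -> dict:
    merged = dict(a)
    for cond, subtree in b.items():
        if cond in merged:                 # identical node in the same layer
            merged[cond] = merge_structures(merged[cond], subtree)
        else:                              # different node: not merged
            merged[cond] = subtree
    return merged

t1 = {"age>30": {"income>5k": {}, "income<=5k": {}}}
t2 = {"age>30": {"city=SZ": {}, "city!=SZ": {}}}
print(merge_structures(t1, t2))
# {'age>30': {'income>5k': {}, 'income<=5k': {}, 'city=SZ': {}, 'city!=SZ': {}}}
```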
103. Acquiring a plurality of leaf node judgment conditions from a plurality of tree structures to be judged, wherein each leaf node judgment condition corresponds to one leaf node;
The server obtains a plurality of leaf node judgment conditions from the tree structures to be judged. In this embodiment, the judgment conditions are obtained from user characteristics stored in a preset storage medium for different users. In other embodiments, the server may also obtain the judgment conditions from characteristics entered by users. Each leaf node judgment condition corresponds to one leaf node.
For example, consider a class with several students of different heights and weights. The leaf node judgment conditions may be "height less than 160 cm", "height at least 160 cm and less than 165 cm", "weight less than 45 kg", and "weight at least 45 kg and less than 55 kg". The server then determines the students who meet these criteria, that is, the target leaf nodes, according to those leaf node judgment conditions.
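The height/weight example above can be sketched as follows. The student data and condition names are invented for illustration; each judgment condition is a predicate, and a student "reaches" a leaf node when the corresponding condition holds.

```python
# Hedged sketch of the example above (data and names are assumptions).
conditions = {
    "height<160": lambda s: s["height"] < 160,
    "160<=height<165": lambda s: 160 <= s["height"] < 165,
    "weight<45": lambda s: s["weight"] < 45,
    "45<=weight<55": lambda s: 45 <= s["weight"] < 55,
}

student = {"height": 162, "weight": 50}
matched = [name for name, cond in conditions.items() if cond(student)]
print(matched)  # ['160<=height<165', '45<=weight<55']
```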
104. Select a leaf node to be judged from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, and select, according to the judgment condition corresponding to each leaf node to be judged, either that leaf node or its corresponding sibling leaf node as a target leaf node, obtaining a plurality of target leaf nodes, wherein the leaf nodes to be judged comprise a first through an Nth leaf node to be judged, N being a positive integer.
The server determines the target leaf nodes from the leaf nodes and the tree structures to be judged. Since each tree structure to be judged is formed by merging multiple GBDT structures, the nodes carrying judgment conditions in a given layer occur in multiples of two, that is, in pairs: the two nodes of a pair share the same parent node but differ from each other, i.e., they are sibling nodes. The method selects target leaf nodes by exploiting a property of sibling nodes: their judgment conditions are opposites. For example, if the judgment condition of sibling node 1 is a > 10, the judgment condition of sibling node 2 is a ≤ 10. Because sibling conditions are opposite, when judging whether a leaf node is a target leaf node, only one leaf node of each sibling pair needs to be evaluated, which reduces the number of judgments.
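The sibling-pair shortcut can be shown in a minimal sketch (an assumption-based illustration, not the patent's code): because a node's condition and its sibling's condition are opposites, a single comparison decides which node of the pair is hit.

```python
# One comparison replaces two: `node` holds the condition "a > threshold",
# and its sibling holds the opposite condition "a <= threshold".
def pick_from_pair(value: float, threshold: float,
                   node: str, sibling: str) -> str:
    return node if value > threshold else sibling

print(pick_from_pair(12, 10, "node1", "node2"))  # node1
print(pick_from_pair(7, 10, "node1", "node2"))   # node2
```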
In the embodiment of the invention, the GBDT structures are merged and nodes are judged using the principle that sibling nodes carry opposite conditions, reducing the number of node judgments, improving the efficiency of resolving target leaf nodes, and achieving millisecond-level feedback.
Referring to fig. 2, another embodiment of the node parsing method based on model training according to the embodiment of the present invention includes:
201. Obtain an initial GBDT model and initialize it to obtain an estimated constant value, wherein the estimated constant value is the constant prediction that minimizes the loss function;
the server obtains an initial GBDT model for initialization, resulting in estimated constant values that minimize the loss function.
When the initial GBDT model is initialized, it is a tree with only a root node whose gamma value is a constant; after initialization, the server obtains the constant estimate that minimizes the loss function, i.e., the estimated constant value.
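For squared-error loss, this initial constant has a closed form: the mean of the labels minimizes the sum of squared errors, since F0 = argmin over gamma of the sum of (y_i − gamma)². A sketch under that assumption (the loss function used by the patent is not specified):

```python
# Sketch assuming squared-error loss: the estimated constant value that
# minimizes sum((y_i - gamma)^2) is simply the mean of the labels.
def initial_constant(y: list[float]) -> float:
    return sum(y) / len(y)

y = [1.0, 2.0, 3.0, 6.0]
print(initial_constant(y))  # 3.0
```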
202. Calculating negative gradient values of the loss function multiple times to obtain a plurality of target negative gradient values, and performing residual fitting one by one according to the plurality of target negative gradient values and the estimated constant value to generate a preset gradient boosting decision tree GBDT model;
The server calculates the negative gradient values of the loss function multiple times to obtain a plurality of target negative gradient values, and then performs residual fitting according to each target negative gradient value and the estimated constant value to obtain the preset GBDT model.
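A minimal sketch of steps 201–202, under the assumption of a squared-error loss (where the negative gradient equals the residual y − F(x)) and depth-1 trees; the patent's preset GBDT model is not limited to this configuration.

```python
import numpy as np

def best_stump(x, r):
    """Fit a depth-1 regression tree (stump) to the residuals r over feature x."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

def fit_gbdt(x, y, n_trees=10, lr=0.5):
    """Steps 201-202: initialize with the loss-minimizing constant (the mean,
    for squared loss), then repeatedly fit trees to the negative gradient,
    which for squared loss is the residual y - F(x)."""
    f0 = float(y.mean())                 # estimated constant value (step 201)
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        neg_grad = y - pred              # target negative gradient values
        tree = best_stump(x, neg_grad)
        trees.append(tree)
        pred = pred + lr * tree(x)       # residual fitting, tree by tree
    return f0, trees

def predict(f0, trees, x, lr=0.5):
    out = np.full(len(x), f0)
    for tree in trees:
        out = out + lr * tree(x)
    return out
```

In this toy setting each stump splits the data perfectly, so every boosting round shrinks the remaining residual by the learning rate, and a handful of trees recover the targets almost exactly.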
203. Obtaining a plurality of GBDT structures from a preset gradient boosting decision tree GBDT model, wherein each GBDT structure comprises a plurality of leaf nodes;
the server obtains a plurality of GBDT structures from a preset gradient boosting decision tree GBDT model.
It should be noted that a GBDT model contains many identical leaf nodes across its trees; during data processing, repeatedly judging the same node while parsing the leaf nodes makes the work inefficient.
204. Traversing a plurality of GBDT structures, merging the GBDT structures, and generating a plurality of tree structures to be judged;
When the server merges nodes, it must first be clear which nodes can be merged and which cannot. In this embodiment, identical nodes in the same layer of the GBDT structures can be merged, while different nodes cannot be merged.
Specifically, in one embodiment, the server traverses the plurality of GBDT structures and randomly selects two of them. Starting from the root node, it judges layer by layer whether the same layer of leaf nodes of the two randomly selected GBDT structures includes at least one group of identical nodes. If so, the identical nodes are merged, completing one merge of GBDT structures and generating a tree structure to be judged; the remaining GBDT structures among the plurality of GBDT structures are merged in the same manner, so that a plurality of tree structures to be judged are generated. In another embodiment, the server may instead order the plurality of GBDT structures, as follows: the server reads the number of nodes in each GBDT structure, sorts the structures in descending order of node count, and selects the two highest-ranked structures to obtain two sequential GBDT structures; it then judges identical nodes downward from the root node of these two sequential GBDT structures.
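The node-count ordering of the second embodiment can be sketched as follows, assuming a nested-dict tree representation (`cond`, `left`, `right`) that the patent does not specify:

```python
def count_nodes(tree):
    """Node count of a tree stored as nested dicts {'cond', 'left', 'right'}."""
    if tree is None:
        return 0
    return 1 + count_nodes(tree.get("left")) + count_nodes(tree.get("right"))

def pick_two_sequential(structures):
    """Sort the GBDT structures in descending order of node count and return
    the two highest-ranked ones -- the 'two sequential GBDT structures'."""
    ranked = sorted(structures, key=count_nodes, reverse=True)
    return ranked[0], ranked[1]
```

Merging the largest structures first maximizes the chance of finding identical same-layer nodes early.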
The specific process of merging the same nodes is as follows:
Referring to FIG. 3, from a group of identical nodes in the two randomly selected GBDT structures or the two sequential GBDT structures, the server extracts a plurality of first identical child nodes (the nodes b ≤ 30 and b > 30), a plurality of first sibling identical child nodes (the nodes c ≥ 1 and c < 1), a plurality of second identical child nodes (the nodes b ≤ 30 and b > 30), and a plurality of second sibling identical child nodes (the nodes d ≥ 5.5 and d < 5.5). The first identical child nodes are the children of a first identical node (the node a > 10) in the group of identical nodes, the first sibling identical child nodes are the children of the sibling identical node (the node a ≤ 10) of the first identical node, and the first identical node is located in either of the two randomly selected GBDT structures or in one of the two sequential GBDT structures (tree1); the second identical child nodes are the children of a second identical node (the node a > 10) in the group of identical nodes, the second sibling identical child nodes are the children of the sibling identical node (the node a ≤ 10) of the second identical node, and the second identical node is located in the other of the two randomly selected GBDT structures or the two sequential GBDT structures (tree2).
The server integrates the first identical child nodes and the first sibling identical child nodes to obtain a target identical child node set (comprising the nodes b ≤ 30, b > 30, c ≥ 1 and c < 1), and integrates the second identical child nodes and the second sibling identical child nodes to obtain a pre-merge identical child node set (comprising the nodes b ≤ 30, b > 30, d ≥ 5.5 and d < 5.5). The server judges whether the target identical child node set includes nodes that are the same as nodes in the pre-merge identical child node set; if so, it deletes the corresponding child nodes (the nodes b ≤ 30 and b > 30) or sibling child nodes from the pre-merge identical child node set, merges the remaining second identical child nodes under the first identical node, or merges the remaining second sibling identical child nodes (the nodes d ≥ 5.5 and d < 5.5) under the first sibling identical node, thereby completing the merging of nodes and generating the tree structure to be judged 3. In another embodiment, if the target identical child node set includes a first sibling identical child node that is the same as one in the pre-merge identical child node set, the duplicate second sibling identical child node is deleted directly, and the plurality of second identical child nodes are then merged under the first identical node, completing the merging of nodes and generating the tree structure to be judged.
It should be noted that, due to the characteristics of GBDT structures, if two GBDT structures share an identical node, the sibling nodes corresponding to that identical node are also identical. Therefore, when merging the child nodes under identical nodes, the child nodes under their sibling nodes must also be merged. If the two GBDT structures do not include identical nodes, the server continues to judge whether the next layer of nodes includes identical nodes, until the layer of nodes above the target leaf nodes has been judged.
For example, assuming tree1 and tree2 are the two randomly selected GBDT structures, it can be seen from the figure that the nodes a > 10 and a ≤ 10 are siblings of each other, and the node a > 10 in tree1 is the same as the node a > 10 in tree2. Due to the characteristics of GBDT structures, the sibling of a > 10 in tree1 and the sibling of a > 10 in tree2 are also the same, that is, the node a ≤ 10 in tree1 is the same as the node a ≤ 10 in tree2. The server merges tree1 and tree2: since tree1 already contains the nodes b ≤ 30 and b > 30 that appear in tree2, those nodes are deleted from tree2; tree1 does not contain the nodes d ≥ 5.5 and d < 5.5, so they are merged directly under the node a ≤ 10 in tree1, yielding the tree structure to be judged 3.
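The tree1/tree2 merge in this example can be sketched as follows; the flat `{branch condition: [child conditions]}` representation is an illustrative assumption, not the patent's data structure.

```python
# Each GBDT structure is modelled as {branch condition: [child conditions]};
# tree1 and tree2 both split on a > 10 / a <= 10 at the root, as in FIG. 3.
tree1 = {"a>10": ["b<=30", "b>30"], "a<=10": ["c>=1", "c<1"]}
tree2 = {"a>10": ["b<=30", "b>30"], "a<=10": ["d>=5.5", "d<5.5"]}

def merge_trees(t1, t2):
    """Merge two GBDT structures sharing identical root nodes: children that
    already exist in t1 (here b<=30 and b>30) are deleted from t2's side,
    and the remaining children (d>=5.5, d<5.5) are merged under t1's branch."""
    merged = {}
    for branch in t1:
        assert branch in t2, "identical nodes imply identical sibling nodes"
        merged[branch] = t1[branch] + [c for c in t2[branch] if c not in t1[branch]]
    return merged
```

Running `merge_trees(tree1, tree2)` produces the tree structure to be judged 3: the a ≤ 10 branch carries c ≥ 1, c < 1, d ≥ 5.5 and d < 5.5 together.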
205. Acquiring a plurality of leaf node judgment conditions from a plurality of tree structures to be judged, wherein each leaf node judgment condition corresponds to one leaf node;
The server acquires a plurality of leaf node judgment conditions from the plurality of tree structures to be judged. In this embodiment, the leaf node judgment conditions are obtained by reading the characteristics of different users from a preset storage medium. In other embodiments, the server may also obtain the leaf node judgment conditions from user-input characteristics. Each leaf node judgment condition corresponds to one leaf node.
206. Selecting leaf nodes to be judged from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, selecting target leaf nodes to be judged or brother leaf nodes corresponding to the target leaf nodes to be judged according to leaf node judging conditions corresponding to the leaf nodes to be judged to obtain a plurality of target leaf nodes, wherein the plurality of leaf nodes to be judged comprise first leaf nodes to be judged to Nth leaf nodes to be judged, and N is a positive integer.
And the server determines a plurality of target leaf nodes according to the plurality of leaf nodes and the plurality of tree structures to be judged. Since each tree structure to be judged is formed by merging a plurality of GBDT structures, the nodes containing leaf node judgment conditions in the same layer always occur in multiples of 2, that is, they appear in pairs. Each such pair consists of sibling nodes sharing the same parent node. The method selects the plurality of target leaf nodes according to a characteristic of sibling nodes: siblings carry opposite judgment conditions. For example, if the judgment condition of sibling node 1 is a > 10, the judgment condition of sibling node 2 is a ≤ 10. Because sibling conditions are mutually exclusive, only one leaf node of each sibling pair needs to be evaluated when determining whether a leaf node is a target leaf node, which reduces the number of judgments.
Specifically, referring to fig. 4, the server reads a plurality of judgment conditions (a > 10, a ≤ 10, c ≥ 1, and d < 5.5) from each tree structure to be judged and judges layer by layer from the root node. In each layer of leaf nodes, the server selects either one of a pair of sibling nodes (a > 10 and a ≤ 10) for judgment, that is, whether the first leaf node to be judged (a ≤ 10) meets its corresponding judgment condition. If so, it judges either node of the next sibling pairs (c ≥ 1, c < 1, d ≥ 5.5 and d < 5.5), that is, whether the second leaf node to be judged meets its corresponding judgment condition; if so, two target leaf nodes (c ≥ 1 and d < 5.5) are determined. In other embodiments, the depth of the tree structure to be judged may be greater than 2; in that case the server continues judging the layer below the second leaf node to be judged, and after multiple such iterative judgments determines the plurality of target leaf nodes. If, in some layer, one node of a sibling pair (the node a > 10) does not meet its corresponding judgment condition, the iterative judgment proceeds from the other leaf node of the pair (the node a ≤ 10), so that the plurality of target leaf nodes are determined.
For example, the recursion process of tree1 is: judge a > 10, then judge c ≥ 1; with 2 judgments the target leaf node satisfying a ≤ 10 AND c ≥ 1 is obtained. The recursion process of tree2 is: judge a > 10, then judge d ≥ 5.5; with 2 judgments the target leaf node satisfying a ≤ 10 AND d < 5.5 is obtained. Before tree1 and tree2 are merged, 4 judgments are therefore needed to obtain the leaf nodes satisfying the leaf node judgment conditions. As shown in FIG. 7, the recursion process of the merged tree structure to be judged is: judge a > 10, then c ≥ 1, then d ≥ 5.5; with 3 judgments of leaf node judgment conditions, the leaf nodes satisfying a ≤ 10 AND c ≥ 1 and a ≤ 10 AND d < 5.5 are obtained. That is, for a GBDT structure with depth 2 and 2 iterations, judging on the merged tree structure to be judged requires 1 judgment fewer than parsing the leaf nodes of each tree separately.
Assuming the GBDT structure has a set depth of 5, 100 iterations, and 64 leaf nodes per tree, 32,000 loop judgments would be needed to determine the plurality of target leaf nodes. After the plurality of GBDT structures are merged and the leaf nodes are judged using the principle that sibling conditions are opposite, at least half of the loop judgments can be eliminated.
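The sibling-pair judgment strategy can be sketched as follows; the condition strings, the feature dict, and the pairing of adjacent children in the merged structure are illustrative assumptions.

```python
import re

def eval_cond(cond, features):
    """Evaluate a condition string such as 'a>10' against a feature dict
    (simplified parser; real conditions may be richer than this)."""
    name, op, val = re.match(r"([a-z]+)(<=|>=|<|>)(.+)", cond).groups()
    x, v = features[name], float(val)
    return {"<=": x <= v, ">=": x >= v, "<": x < v, ">": x > v}[op]

def pick_targets(merged_tree, features):
    """Test only one condition per sibling pair: when it fails, the sibling
    holds by the opposite-condition property, with no second test."""
    tests = 1
    cond, sibling = list(merged_tree)      # the two opposite root branches
    branch = cond if eval_cond(cond, features) else sibling
    targets = []
    children = merged_tree[branch]         # merged children, sibling pairs adjacent
    for i in range(0, len(children), 2):
        tests += 1
        a, b = children[i], children[i + 1]
        targets.append(a if eval_cond(a, features) else b)
    return targets, tests
```

On the merged structure of the example, with features a = 5, c = 2, d = 3, this yields the target leaf nodes c ≥ 1 and d < 5.5 after 3 tests, matching the count given in the text.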
The server also uploads the target leaf nodes to the blockchain. Corresponding digest information is obtained from each target leaf node; specifically, the digest information is obtained by hashing the target leaf node, for example with the SHA-256 algorithm. Uploading the digest information to the blockchain ensures its security and fair transparency for the user. The user equipment may download the digest information from the blockchain to verify whether a target leaf node has been tampered with.
The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and so on.
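A minimal sketch of the digest computation for the blockchain upload; the patent names the SHA-256 (`sha256s`) hash, while the JSON serialization of the leaf node is an assumption for illustration.

```python
import hashlib
import json

def leaf_digest(target_leaf):
    """Digest information for one target leaf node, obtained by hashing a
    canonical JSON serialization with SHA-256. The serialization format is
    an assumption; the patent only names the hash algorithm."""
    payload = json.dumps(target_leaf, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

A verifier recomputes the digest from the downloaded leaf node and compares it with the on-chain value: any tampering changes the hash.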
In the embodiment of the invention, a plurality of GBDT structures are merged, and nodes are judged by applying the principle that sibling nodes carry opposite conditions, so that the number of node judgments is reduced, the efficiency of parsing the target leaf nodes is improved, and millisecond-level feedback is achieved.
The node analysis method based on model training in the embodiment of the present invention has been described above; the node analysis device based on model training in the embodiment of the present invention is described below. Referring to fig. 5, an embodiment of the node analysis device based on model training includes:
a structure obtaining module 501, configured to obtain multiple GBDT structures from a preset gradient boosting decision tree GBDT model, where each GBDT structure includes multiple leaf nodes;
a tree structure generating module 502, configured to traverse the multiple GBDT structures, merge the multiple GBDT structures, and generate multiple tree structures to be determined;
a judgment condition obtaining module 503, configured to obtain a plurality of leaf node judgment conditions from the plurality of tree structures to be judged, where each leaf node judgment condition corresponds to one leaf node;
the leaf node determining module 504 is configured to select a leaf node to be determined from each layer of leaf nodes to obtain a plurality of leaf nodes to be determined, and select a target leaf node to be determined or a brother leaf node corresponding to the target leaf node to be determined as a target leaf node according to a leaf node determining condition corresponding to each leaf node to be determined to obtain a plurality of target leaf nodes, where the plurality of leaf nodes to be determined include a first leaf node to be determined to an nth leaf node to be determined, and N is a positive integer.
In the embodiment of the invention, a plurality of GBDT structures are merged, and nodes are judged by applying the principle that sibling nodes carry opposite conditions, so that the number of node judgments is reduced, the efficiency of parsing the target leaf nodes is improved, and millisecond-level feedback is achieved.
Referring to fig. 6, another embodiment of a node parsing apparatus based on model training according to an embodiment of the present invention includes:
a structure obtaining module 501, configured to obtain multiple GBDT structures from a preset gradient boosting decision tree GBDT model, where each GBDT structure includes multiple leaf nodes;
a tree structure generating module 502, configured to traverse the multiple GBDT structures, merge the multiple GBDT structures, and generate multiple tree structures to be determined;
a judgment condition obtaining module 503, configured to obtain a plurality of leaf node judgment conditions from the plurality of tree structures to be judged, where each leaf node judgment condition corresponds to one leaf node;
the leaf node determining module 504 is configured to select a leaf node to be determined from each layer of leaf nodes to obtain a plurality of leaf nodes to be determined, and select a target leaf node to be determined or a brother leaf node corresponding to the target leaf node to be determined as a target leaf node according to a leaf node determining condition corresponding to each leaf node to be determined to obtain a plurality of target leaf nodes, where the plurality of leaf nodes to be determined include a first leaf node to be determined to an nth leaf node to be determined, and N is a positive integer.
Optionally, the tree structure generating module 502 includes:
a determining unit 5021, configured to traverse the plurality of GBDT structures, and determine whether a plurality of leaf nodes in the plurality of GBDT structures include at least one group of identical nodes;
the merging unit 5022 is configured to merge multiple child nodes under at least one group of identical nodes to generate multiple tree structures to be determined if multiple leaf nodes in the multiple GBDT structures include at least one group of identical nodes.
Optionally, the judging unit 5021 may be further specifically configured to:
traversing the GBDT structures, randomly selecting two GBDT structures, and judging whether leaf nodes of the same layer of the two randomly selected GBDT structures comprise at least one group of same nodes;
if the same layer of leaf nodes of the two randomly selected GBDT structures at least comprise one group of same nodes, judging that the plurality of GBDT structures comprise at least one group of same nodes;
or,
respectively counting the number of nodes of the plurality of GBDT structures, selecting two GBDT structures with the highest number of nodes to obtain two sequential GBDT structures, and judging whether leaf nodes at the same layer of the two sequential GBDT structures comprise at least one group of same nodes;
and if the same layer of leaf nodes of the two sequential GBDT structures at least comprise one group of same nodes, judging that the plurality of GBDT structures comprise at least one group of same nodes.
Optionally, the merging unit 5022 includes:
a child node extracting subunit 50221, configured to extract a plurality of first identical child nodes and a plurality of first identical sibling child nodes from the two randomly selected GBDT structures or any one of the two sequential GBDT structures, and extract a plurality of second identical child nodes and a plurality of second identical sibling child nodes from another GBDT structure, if at least one group of identical nodes is included in the plurality of leaf nodes in the plurality of GBDT structures;
a merging subunit 50222, configured to merge the second same child nodes into the first same child nodes, merge the second same sibling child nodes into the first same sibling child nodes, and generate a tree structure to be judged to be integrated;
the tree structure determination subunit 50223 is configured to obtain a plurality of other tree structures to be determined to be integrated according to other GBDT structures in the plurality of GBDT structures;
the integrating subunit 50224 is configured to integrate the tree structure to be determined to be integrated with the plurality of other tree structures to be determined to be integrated, so as to obtain a plurality of tree structures to be determined.
Optionally, the merging subunit 50222 may also be specifically configured to:
integrating the first same child nodes and the first same brother child nodes into a target same child node set, and integrating the second same child nodes and the second same brother child nodes into a pre-merged same child node set;
determining whether the target identical child node set includes a first identical child node or a first identical sibling child node that is identical to the plurality of second identical child nodes or the plurality of second identical sibling child nodes;
if yes, deleting the corresponding second same child node or second same brother child node from the pre-merged same child node set, merging the remaining second same child nodes under the same node corresponding to the first same child node, merging the remaining second same brother child nodes under the same brother node corresponding to the first same brother child node, and generating a plurality of tree structures to be judged.
Optionally, the leaf node determining module 504 may be further specifically configured to:
reading a plurality of leaf node judgment conditions for each tree structure to be judged in the plurality of tree structures to be judged;
in any layer of nodes, randomly selecting a leaf node from leaf nodes which are brothers in a target tree structure to be judged as a first leaf node to be judged, and judging whether the first leaf node to be judged meets a corresponding leaf node judgment condition;
if the first leaf node to be judged meets the corresponding leaf node judgment condition, selecting a second leaf node to be judged in the next layer of nodes and judging whether the second leaf node to be judged meets the corresponding leaf node judgment condition;
if the second leaf node to be judged meets the corresponding leaf node judgment condition, continuing to judge the next layer of nodes until a plurality of target leaf nodes are determined in the plurality of leaf nodes to be judged;
if the first leaf node to be judged does not meet the corresponding leaf node judgment condition, determining a plurality of target leaf nodes based on the plurality of leaf node judgment conditions and the brother leaf node corresponding to the first leaf node to be judged;
further comprising uploading the plurality of target leaf nodes into a blockchain.
Optionally, the node analysis device based on model training further includes:
an estimating module 505, configured to obtain an initial GBDT model, initialize the initial GBDT model to obtain an estimated constant value, where the estimated constant value is a predicted constant value corresponding to a minimum loss function;
and a residual fitting module 506, configured to calculate negative gradient values of the loss function multiple times to obtain multiple target negative gradient values, and perform residual fitting one by one according to the multiple target negative gradient values and the estimated constant values to generate a preset gradient lifting decision tree GBDT model.
In the embodiment of the invention, a plurality of GBDT structures are merged, and nodes are judged by applying the principle that sibling nodes carry opposite conditions, so that the number of node judgments is reduced, the efficiency of parsing the target leaf nodes is improved, and millisecond-level feedback is achieved.
Fig. 5 and fig. 6 describe the node analysis device based on model training in the embodiment of the present invention in detail from the perspective of modular functional entities; the node analysis device based on model training in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 7 is a schematic structural diagram of a node analysis device based on model training according to an embodiment of the present invention. The node analysis device 700 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 710 (e.g., one or more processors), a memory 720, and one or more storage media 730 (e.g., one or more mass storage devices) storing an application 733 or data 732. The memory 720 and the storage medium 730 may be transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations on the node analysis device 700. Furthermore, the processor 710 may be configured to communicate with the storage medium 730 and execute the series of instruction operations in the storage medium 730 on the node analysis device 700.
The node analysis device 700 based on model training may also include one or more power supplies 740, one or more wired or wireless network interfaces 750, one or more input/output interfaces 760, and/or one or more operating systems 731, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will appreciate that the configuration shown in fig. 7 does not limit the node analysis device based on model training, which may include more or fewer components than shown, combine some components, or arrange components differently.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the model-training based node parsing method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A node analysis method based on model training is characterized by comprising the following steps:
obtaining a plurality of GBDT structures from a preset gradient boosting decision tree GBDT model, wherein each GBDT structure comprises a plurality of leaf nodes;
traversing the GBDT structures, merging the GBDT structures, and generating a plurality of tree structures to be judged;
acquiring a plurality of leaf node judgment conditions from the tree structures to be judged, wherein each leaf node judgment condition corresponds to one leaf node;
selecting leaf nodes to be judged from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, and selecting target leaf nodes to be judged or brother leaf nodes corresponding to the target leaf nodes to be judged according to leaf node judging conditions corresponding to the leaf nodes to be judged to obtain a plurality of target leaf nodes, wherein the plurality of leaf nodes to be judged comprise first leaf nodes to be judged to Nth leaf nodes to be judged, and N is a positive integer.
2. The model-training-based node parsing method of claim 1, wherein traversing the plurality of GBDT structures, merging the plurality of GBDT structures, and generating a plurality of tree structures to be determined comprises:
traversing the plurality of GBDT structures, and judging whether a plurality of leaf nodes in the plurality of GBDT structures comprise at least one group of same nodes;
and if the plurality of leaf nodes in the plurality of GBDT structures comprise at least one group of same nodes, combining a plurality of child nodes under the at least one group of same nodes to generate a plurality of tree structures to be judged.
3. The model training-based node parsing method of claim 2, wherein traversing the plurality of GBDT structures and determining whether a plurality of leaf nodes in the plurality of GBDT structures include at least one group of identical nodes comprises:
traversing the GBDT structures, randomly selecting two GBDT structures, and judging whether leaf nodes of the same layer of the two randomly selected GBDT structures comprise at least one group of same nodes;
if the same layer of leaf nodes of the two randomly selected GBDT structures at least comprise one group of same nodes, judging that the plurality of GBDT structures comprise at least one group of same nodes;
or,
respectively counting the number of nodes of the plurality of GBDT structures, selecting two GBDT structures with the highest number of nodes to obtain two sequential GBDT structures, and judging whether leaf nodes at the same layer of the two sequential GBDT structures comprise at least one group of same nodes;
and if the same layer of leaf nodes of the two sequential GBDT structures at least comprise one group of same nodes, judging that the plurality of GBDT structures comprise at least one group of same nodes.
4. The method according to claim 3, wherein, if the plurality of leaf nodes in the plurality of GBDT structures comprise at least one group of identical nodes, merging the plurality of child nodes under the at least one group of identical nodes to generate the plurality of tree structures to be judged comprises:
if the plurality of leaf nodes in the plurality of GBDT structures comprise at least one group of identical nodes, extracting a plurality of first identical child nodes and a plurality of first identical sibling child nodes from either one of the two randomly selected GBDT structures or the two sequential GBDT structures, and extracting a plurality of second identical child nodes and a plurality of second identical sibling child nodes from the other GBDT structure;
merging the second identical child nodes into the first identical child nodes, merging the second identical sibling child nodes into the first identical sibling child nodes, and generating a to-be-integrated tree structure to be judged;
obtaining a plurality of other to-be-integrated tree structures to be judged from the other GBDT structures in the plurality of GBDT structures;
and integrating the to-be-integrated tree structure to be judged with the plurality of other to-be-integrated tree structures to be judged to obtain the plurality of tree structures to be judged.
5. The method according to claim 4, wherein merging the second identical child nodes into the first identical child nodes, merging the second identical sibling child nodes into the first identical sibling child nodes, and generating the to-be-integrated tree structure to be judged comprises:
integrating the plurality of first identical child nodes and the plurality of first identical sibling child nodes into a target identical child node set, and integrating the plurality of second identical child nodes and the plurality of second identical sibling child nodes into a pre-merge child node set;
determining whether the target identical child node set includes a first identical child node or a first identical sibling child node that is identical to any of the plurality of second identical child nodes or the plurality of second identical sibling child nodes;
and if so, deleting the corresponding second identical child node or second identical sibling child node from the pre-merge child node set, merging the remaining second identical child nodes under the identical node corresponding to the first identical child nodes, merging the remaining second identical sibling child nodes under the sibling node corresponding to the first identical sibling child nodes, and generating the plurality of tree structures to be judged.
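The deduplicate-then-merge step of claim 5 reduces to set operations once node identity is captured by a hashable key. The sketch below is a hypothetical simplification in which sets of keys stand in for the real child-node objects:

```python
def merge_child_sets(target_set, pre_merge_set):
    """Claim 5 sketch: drop from the pre-merge child node set any child
    that already exists in the target identical child node set, then
    merge the remaining children under the target."""
    duplicates = target_set & pre_merge_set   # children already present
    remaining = pre_merge_set - duplicates    # children still to merge
    return target_set | remaining             # merged result
```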
6. The node analysis method based on model training according to claim 1, wherein selecting leaf nodes to be judged from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, and selecting, according to the leaf node judgment condition corresponding to each leaf node to be judged, a target leaf node to be judged or the sibling leaf node corresponding to the target leaf node to be judged as a target leaf node to obtain a plurality of target leaf nodes, the plurality of leaf nodes to be judged comprising a first leaf node to be judged through an Nth leaf node to be judged, where N is a positive integer, comprises:
reading a plurality of leaf node judgment conditions for each of the plurality of tree structures to be judged;
in any layer of nodes, randomly selecting one leaf node from the sibling leaf nodes of a target tree structure to be judged as the first leaf node to be judged, and determining whether the first leaf node to be judged satisfies its corresponding leaf node judgment condition;
if the first leaf node to be judged satisfies the corresponding leaf node judgment condition, selecting a second leaf node to be judged in the next layer of nodes and determining whether the second leaf node to be judged satisfies its corresponding leaf node judgment condition;
if the second leaf node to be judged satisfies the corresponding leaf node judgment condition, continuing to judge the next layer of nodes until the plurality of target leaf nodes are determined among the plurality of leaf nodes to be judged;
if the first leaf node to be judged does not satisfy the corresponding leaf node judgment condition, determining the plurality of target leaf nodes based on the plurality of leaf node judgment conditions and the sibling leaf node corresponding to the first leaf node to be judged;
and uploading the plurality of target leaf nodes to a blockchain.
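The layer-by-layer walk of claim 6 — test one child's judgment condition and fall back to its sibling on failure — can be sketched as below. The node representation and the `conditions` mapping (node key to predicate over an input sample) are hypothetical encodings chosen for illustration:

```python
def pick_target_leaves(tree, conditions, sample):
    """Claim 6 sketch: at each layer, test the judgment condition of one
    child node; keep it as a target leaf if the condition holds, otherwise
    take its sibling, then descend into the chosen node."""
    targets = []
    node = tree
    while node.get("left") and node.get("right"):
        first, sibling = node["left"], node["right"]
        cond = conditions.get(first["key"], lambda s: False)
        chosen = first if cond(sample) else sibling
        targets.append(chosen["key"])  # target leaf node for this layer
        node = chosen
    return targets
```

The claimed method additionally uploads the resulting target leaf nodes to a blockchain; that storage step is independent of the traversal shown here.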
7. The node analysis method based on model training according to any one of claims 1-6, wherein before obtaining the plurality of GBDT structures from the pre-set gradient boosting decision tree GBDT model, each GBDT structure comprising a plurality of leaf nodes, the node analysis method based on model training further comprises:
acquiring an initial GBDT model and initializing the initial GBDT model to obtain an estimated constant value, wherein the estimated constant value is the predicted constant value that minimizes the loss function;
and calculating the negative gradient values of the loss function a plurality of times to obtain a plurality of target negative gradient values, and performing residual fitting one by one according to the plurality of target negative gradient values and the estimated constant value to generate the pre-set gradient boosting decision tree GBDT model.
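The training procedure of claim 7 is the standard gradient boosting recipe. A minimal sketch for squared loss, where the loss-minimizing estimated constant is the mean of the targets and the negative gradient is simply the residual, using depth-1 regression stumps as the per-round trees (the patent does not fix the base learner; the stump is an assumption for brevity):

```python
import numpy as np

def fit_gbdt(x, y, n_trees=20, lr=0.1):
    """GBDT sketch for squared loss: initialize with the constant that
    minimizes sum((y - c)^2), i.e. the mean of y, then repeatedly fit
    the negative gradient (the residual) with a one-split stump."""
    f0 = y.mean()                        # estimated constant value
    pred = np.full_like(y, f0, dtype=float)
    stumps = []
    for _ in range(n_trees):
        residual = y - pred              # negative gradient of squared loss
        best = None
        for t in np.unique(x):           # exhaustive single-split search
            left, right = residual[x <= t], residual[x > t]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = (((left - left.mean()) ** 2).sum() +
                   ((right - right.mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, t, left.mean(), right.mean())
        _, t, lv, rv = best
        stumps.append((t, lv, rv))       # one fitted tree per round
        pred += lr * np.where(x <= t, lv, rv)
    return f0, stumps, pred
```

Each stored stump corresponds to one GBDT structure that the method of claim 1 would later extract from the trained model.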
8. A node analysis device based on model training, characterized in that the node analysis device based on model training comprises:
a structure obtaining module, configured to obtain a plurality of GBDT structures from a pre-set gradient boosting decision tree GBDT model, each GBDT structure comprising a plurality of leaf nodes;
a tree structure generating module, configured to traverse the plurality of GBDT structures, merge the plurality of GBDT structures, and generate a plurality of tree structures to be judged;
a judgment condition obtaining module, configured to obtain a plurality of leaf node judgment conditions from the plurality of tree structures to be judged, each leaf node judgment condition corresponding to one leaf node;
a leaf node determining module, configured to select leaf nodes to be judged from each layer of leaf nodes to obtain a plurality of leaf nodes to be judged, and to select, according to the leaf node judgment condition corresponding to each leaf node to be judged, a target leaf node to be judged or the sibling leaf node corresponding to the target leaf node to be judged as a target leaf node to obtain a plurality of target leaf nodes, wherein the plurality of leaf nodes to be judged comprise a first leaf node to be judged through an Nth leaf node to be judged, and N is a positive integer.
9. A node analysis device based on model training, characterized in that the node analysis device based on model training comprises: a memory having instructions stored therein and at least one processor, the memory and the at least one processor being interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the node analysis device based on model training to perform the node analysis method based on model training according to any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the node analysis method based on model training according to any one of claims 1-7.
CN202010433477.4A 2020-05-21 2020-05-21 Node analysis method, device, equipment and storage medium based on model training Active CN111738450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010433477.4A CN111738450B (en) 2020-05-21 2020-05-21 Node analysis method, device, equipment and storage medium based on model training


Publications (2)

Publication Number Publication Date
CN111738450A true CN111738450A (en) 2020-10-02
CN111738450B CN111738450B (en) 2024-05-28

Family

ID=72647524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010433477.4A Active CN111738450B (en) 2020-05-21 2020-05-21 Node analysis method, device, equipment and storage medium based on model training

Country Status (1)

Country Link
CN (1) CN111738450B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130204902A1 (en) * 2012-02-03 2013-08-08 Apple Inc. Enhanced B-Trees with Record Merging
US20130232104A1 (en) * 2011-08-02 2013-09-05 Cavium, Inc. Duplication in decision trees
CN103902591A (en) * 2012-12-27 2014-07-02 中国科学院深圳先进技术研究院 Decision tree classifier establishing method and device
CN107590263A (en) * 2017-09-22 2018-01-16 辽宁工程技术大学 A kind of distributed big data sorting technique based on multi-variable decision tree-model
CN110990829A (en) * 2019-11-21 2020-04-10 支付宝(杭州)信息技术有限公司 Method, device and equipment for training GBDT model in trusted execution environment


Also Published As

Publication number Publication date
CN111738450B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
WO2019238109A1 (en) Fault root cause analysis method and apparatus
Shonkwiler Parallel genetic algorithms.
Orr Recent advances in radial basis function networks
CN112417028B (en) Wind speed time sequence characteristic mining method and short-term wind power prediction method
CN110458187A (en) A kind of malicious code family clustering method and system
CN111260220B (en) Group control equipment identification method and device, electronic equipment and storage medium
CN110928986B (en) Legal evidence ordering and recommending method, legal evidence ordering and recommending device, legal evidence ordering and recommending equipment and storage medium
CN110069690B (en) Method, device and medium for topic web crawler
US7991617B2 (en) Optimum design management apparatus from response surface calculation and method thereof
CN111325223A (en) Deep learning model training method and device and computer readable storage medium
CN113377964B (en) Knowledge graph link prediction method, device, equipment and storage medium
CN110858805A (en) Method and device for predicting network traffic of cell
Iqbal et al. Reusing extracted knowledge in genetic programming to solve complex texture image classification problems
CN113761026A (en) Feature selection method, device, equipment and storage medium based on conditional mutual information
CN111984842B (en) Bank customer data processing method and device
CN112257332B (en) Simulation model evaluation method and device
Shaw et al. The critical‐item, upper bounds, and a branch‐and‐bound algorithm for the tree knapsack problem
CN111738450A (en) Node analysis method, device and equipment based on model training and storage medium
CN112529543A (en) Method, device and equipment for verifying mutual exclusion relationship of workflow and storage medium
CN115130043B (en) Database-based data processing method, device, equipment and storage medium
CN107944045B (en) Image search method and system based on t distribution Hash
US20130204818A1 (en) Modeling method of neuro-fuzzy system
CN113571198B (en) Conversion rate prediction method, conversion rate prediction device, conversion rate prediction equipment and storage medium
CN116366312A (en) Web attack detection method, device and storage medium
CN115688853A (en) Process mining method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant