CN110084376A - Method and device for automatic binning of data - Google Patents
Method and device for automatic binning of data
- Publication number: CN110084376A
- Application number: CN201910362666.4A
- Authority: CN (China)
- Prior art keywords: binning, initial vector, condition, objective function, substitution
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Optimization (AREA)
- Software Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Computing Systems (AREA)
- Operations Research (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Complex Calculations (AREA)
Abstract
The present invention relates to the technical field of data processing, and in particular to a method and device for automatically binning data. The method includes: obtaining basic feature data and a binning condition input by a user; substituting the binning condition into a predefined function to obtain an objective function; determining an initial vector according to the binning condition; and substituting the initial vector into the objective function to determine a search direction for the basic feature data. Then, taking the initial vector as the reference point, the initial vector is adjusted along the search direction and substituted into the objective function to obtain a corresponding function value. When the difference between the next function value and the current function value is less than a preset convergence precision, the adjusted initial vector corresponding to the next function value is taken as a cut point. Finally, the basic feature data input by the user is binned according to the multiple cut points determined. This scheme enables fast binning that minimizes the degree of correlation between bins, which in turn makes it easier to score the data input by the user objectively.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for automatically binning data.
Background
With the development and spread of big data and artificial intelligence technology, more and more financial institutions are paying greater attention to machine learning and are gradually shifting from traditional management based on manual decisions to intelligent, data-driven decision-making. This is especially true in personal banking, such as credit card and consumer finance business, where small single amounts, high demand frequency, and strict timeliness requirements mean that traditional manual approval cannot meet business needs. Risk management using machine learning, in particular the scorecard model based on logistic regression, is gradually being adopted by banks because it is easy to interpret, quick to iterate, mature, and stable. In building a scorecard, binning is a particularly important step: it can improve model stability and computational performance. However, how to bin automatically and how to optimize the binning process have long been open problems in machine learning modeling.
The main binning methods include equal-frequency binning, equal-width binning, and automatic binning. Equal-frequency binning divides the data by share of records; for example, every 10% of the data forms one bin. Equal-width binning divides the range between a feature's minimum and maximum into equal intervals; for example, if the span between the oldest and youngest age is 50 years, every 10 years forms one bin, giving 5 bins. The disadvantage of both is that they weaken the influence of differences in feature values on the response variable.
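The two traditional schemes can be sketched as follows. This is a minimal illustration, assuming NumPy; the function names and the sample ages are hypothetical and are not taken from the patent.

```python
import numpy as np

def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equal-width intervals."""
    lo, hi = float(np.min(values)), float(np.max(values))
    return np.linspace(lo, hi, n_bins + 1)[1:-1]

def equal_frequency_edges(values, n_bins):
    """Interior cut points at evenly spaced quantiles, so bins hold similar record counts."""
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(values, quantiles)

# Hypothetical ages with a 50-year span (20 to 70), skewed toward younger users.
ages = np.array([20, 21, 22, 23, 25, 28, 30, 35, 45, 70])
width_edges = equal_width_edges(ages, 5)      # one bin every 10 years
freq_edges = equal_frequency_edges(ages, 5)   # roughly 20% of records per bin
```

Neither function looks at the response variable, which is the weakness the paragraph above points out.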
Widely used automatic binning methods currently include decision-tree-based automatic binning and chi-square binning (Chi-Merge). The core idea of decision-tree-based binning rests on entropy and information gain: it finds the point that maximizes the feature's information gain across the split, and achieves automatic binning by repeatedly splitting the child nodes. The core idea of chi-square binning is to merge categories step by step based on the feature's chi-square value, iterating until a termination condition is reached.
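The entropy-based split at the heart of decision-tree binning can be sketched as below. This is an illustrative single-split search, assuming NumPy and non-negative integer class labels; the function names are hypothetical, and a full tree-based binner would apply the search recursively to each child node.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of an array of non-negative integer class labels."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels) / labels.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def best_split(values, labels):
    """Return (cut, gain) for the single cut point with the largest information gain."""
    values, labels = np.asarray(values), np.asarray(labels)
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            continue  # no cut is possible between equal feature values
        left, right = labels[:i], labels[i:]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best_gain:
            best_gain, best_cut = gain, (values[i - 1] + values[i]) / 2
    return best_cut, best_gain

cut, gain = best_split([20, 22, 25, 40, 45, 50], [1, 1, 1, 0, 0, 0])
```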
Both classes of automatic binning method are overly sensitive to the iteration termination conditions, such as tree depth and minimum bin size, and are therefore prone to overfitting. In addition, both offer limited support for constraints (for example, requiring that a certain class of data form one bin, or specifying bin intervals), so they cannot fully meet the binning requirements that arise in practical modeling.
Summary of the invention
The purpose of the present invention is to provide a method for automatically binning data, so that data can be binned quickly and effectively and the degree of correlation between adjacent bins is minimized, thereby achieving automatic binning.
To achieve the above purpose, the technical solutions adopted in the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides a method for automatically binning data, the method comprising: obtaining basic feature data and a binning condition input by a user; substituting the binning condition into a predefined function to obtain an objective function; determining an initial vector according to the binning condition, substituting the initial vector into the objective function, and determining a search direction for the basic feature data; taking the initial vector as the reference point, adjusting the initial vector along the search direction and substituting it into the objective function to obtain a corresponding function value; when the difference between the next function value and the current function value is less than a preset convergence precision, determining the adjusted initial vector corresponding to the next function value as a cut point; and binning the basic feature data input by the user according to the multiple cut points determined.
In a second aspect, an embodiment of the present invention further provides a device for automatically binning data, the device comprising: a transceiver module, configured to obtain basic feature data and a binning condition input by a user; and a processing module, configured to substitute the binning condition into a predefined function to obtain an objective function; determine an initial variable according to the binning condition, substitute the initial variable into the objective function, and determine a search direction for the basic feature data; taking the initial vector as the reference point, adjust the initial vector along the search direction and substitute it into the objective function to obtain a corresponding function value; when the difference between the next function value and the current function value is less than a preset convergence precision, determine the adjusted initial vector corresponding to the next function value as a cut point; and bin the basic feature data input by the user according to the multiple cut points determined.
In the method and device for automatically binning data provided by the embodiments of the present invention, the method includes: obtaining basic feature data and a binning condition input by a user; substituting the binning condition into a predefined function to obtain an objective function; determining an initial vector according to the binning condition; and substituting the initial vector into the objective function to determine a search direction for the basic feature data. Then, taking the initial vector as the reference point, the initial vector is adjusted along the search direction and substituted into the objective function to obtain a corresponding function value. When the difference between the next function value and the current function value is less than a preset convergence precision, the adjusted initial vector corresponding to the next function value is taken as a cut point. Finally, the basic feature data input by the user is binned according to the multiple cut points determined. This scheme enables fast binning that minimizes the degree of correlation between bins, which in turn makes it easier to score the data input by the user objectively.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a flow diagram of a method for automatically binning data provided by an embodiment of the present invention.
Fig. 2 shows a functional block diagram of a device for automatically binning data provided by an embodiment of the present invention.
Reference numerals: 200 - device for automatically binning data; 210 - transceiver module; 220 - processing module.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings. In addition, in the description of the present invention, the terms "first", "second", and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
In personal banking, such as credit card and consumer finance business, small single amounts and high demand frequency mean that manual review brings a heavy workload. At present, banks and financial institutions mostly use a scorecard model to score each item of basic data input by the user, and decide from the scoring result whether to handle financial business for the user; this can quickly improve the efficiency of handling personal banking business. Binning the data is an important step in the scorecard model: binning divides the data input by the user into multiple groups, and the scorecard model then scores the data of each group according to certain logic to obtain the final scoring result. It follows that dividing the data, through binning, into groups with as low a degree of correlation as possible helps the subsequent scorecard model score the data, so that the final scoring result is more accurate. This scheme provides a method for automatically binning data that minimizes the degree of correlation between adjacent bins and thereby achieves a good binning result.
Referring to Fig. 1, which is a flow diagram of a method for automatically binning data provided by an embodiment of the present invention, the method includes:
S110: obtain the basic feature data and binning condition input by the user.
Specifically, the basic feature data input by the user includes the user's basic information, such as age, height, weight, and income. The binning condition includes the number of bins and the proportion of data in each bin. For example, if the number of bins is 5 and the proportion of data in each bin is 10%, the basic feature data input by the user is divided into 5 bins, and each bin contains no less than 10% of the total data.
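A binning condition of this kind can be checked against a candidate set of cut points as sketched below. This is a minimal illustration, assuming NumPy; the function name, its parameters, and the sample data are hypothetical.

```python
import numpy as np

def satisfies_binning_condition(values, cut_points, n_bins, min_ratio):
    """True if the cut points yield n_bins bins, each holding at least min_ratio of the data."""
    values = np.asarray(values)
    cut_points = np.sort(np.asarray(cut_points, dtype=float))
    if len(cut_points) != n_bins - 1:
        return False  # n_bins bins require exactly n_bins - 1 cuts
    bin_index = np.digitize(values, cut_points)          # bin indices 0 .. n_bins - 1
    counts = np.bincount(bin_index, minlength=n_bins)    # records per bin
    return bool(np.all(counts / len(values) >= min_ratio))

data = np.arange(100)  # hypothetical feature values 0..99
ok = satisfies_binning_condition(data, [20, 40, 60, 80], n_bins=5, min_ratio=0.10)
too_small = satisfies_binning_condition(data, [2, 40, 60, 80], n_bins=5, min_ratio=0.10)
```

In the second call the first bin holds only 2% of the records, so the 10% minimum is violated.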
S120: substitute the binning condition into a predefined function to obtain the objective function.
Specifically, the binning condition includes the number of bins and the proportion of data in each bin; these are substituted into the predefined function to obtain the objective function, which is expressed as:
[formula not reproduced in the source text]
where min f(x) denotes minimizing the degree of correlation; s.t. denotes the constraints; Ci(x) - m denotes the constraint on the number of bins, with m the number of bins; Ci(x) - p denotes the minimum-proportion constraint for each bin; and Ci(x) denotes the constraint function of x.
To solve the above problem, the nonlinear optimization needs to be reduced to a quadratic programming problem. To do so, the Lagrangian function of the objective function is first formed, and a second-order approximation of the Lagrangian function is then taken to obtain the quadratic programming problem.
In the first step, the Lagrangian function of the objective function is formed as:
L(x) = f(x) + λG(x) + μS(x)
where L(x) denotes the Lagrangian function; G(x) = Ci(x) - m is the bin-number constraint; S(x) = Ci(x) - p is the per-bin proportion constraint; λ is the Lagrange factor; and μ is the bin-proportion factor.
In the second step, a second-order approximation of the Lagrangian function is taken, from which the optimal solution of the original nonlinear optimization can be found. This gives the quadratic programming problem, calculated as follows:
[formula not reproduced in the source text]
where Hk denotes the Hessian matrix at the k-th iteration, i.e. the second derivative of the objective function; xk denotes a specific value of x; and d denotes the search direction of the variable.
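For reference, the optimization described in S120 can be written out as follows. This is a sketch under stated assumptions rather than the patent's own formulas: the relation signs (equality for the bin-number constraint, "greater than or equal" for the per-bin proportion) are inferred from the description of the binning condition, and the quadratic subproblem is the textbook sequential quadratic programming form with linearized constraints, using H_k as defined in the text.

```latex
% Constrained problem (relation signs are assumptions)
\min_{x} f(x) \quad \text{s.t.} \quad G(x) = C_i(x) - m = 0, \qquad S(x) = C_i(x) - p \ge 0

% Lagrangian, as given in the text
L(x) = f(x) + \lambda\, G(x) + \mu\, S(x)

% Textbook SQP subproblem at the iterate x_k (assumed form)
\min_{d} \; \nabla f(x_k)^{\mathsf T} d + \tfrac{1}{2}\, d^{\mathsf T} H_k\, d
\quad \text{s.t.} \quad G(x_k) + \nabla G(x_k)^{\mathsf T} d = 0, \qquad
S(x_k) + \nabla S(x_k)^{\mathsf T} d \ge 0
```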
S130: determine an initial vector according to the binning condition, substitute the initial vector into the objective function, and determine the search direction for the basic feature data.
Specifically, the binning condition includes the number of bins. If the user inputs 5 bins, the initial vector xk can be defined as x1 to x4; that is, the basic feature data input by the user is cut 4 times to obtain 5 groups of data. The initial vector so determined is then substituted into the objective function, now converted into the quadratic programming problem above, to determine the search direction for the basic feature data. The search direction is determined as follows:
First, the first derivative of the quadratic programming problem is taken to obtain the gradient vector, calculated as:
[formula not reproduced in the source text]
where gk denotes the gradient vector.
Second, the second derivative of the quadratic programming problem is taken to obtain the Hessian matrix.
Because computing the Hessian matrix requires differentiating the original function at different xk, the amount of computation is reduced as follows: when the number of bins for the basic feature data is less than a predetermined threshold (such as 100), Newton's method is used to solve for an approximate optimal solution of the Hessian matrix; when the number of bins is greater than the predetermined threshold (such as 100), the BFGS algorithm is used instead. The approximate optimal solution of the Hessian matrix is then used as the result of taking the second derivative of the quadratic programming problem.
With Newton's method, the approximate optimal solution of the Hessian matrix is obtained as:
[formula not reproduced in the source text]
With the BFGS algorithm, the approximate optimal solution of the Hessian matrix is obtained as follows:
Let yk = gk+1 - gk and sk = xk+1 - xk.
The Hessian matrix during iteration can be approximated by Bk, i.e. H ≈ B:
Bk+1 = Bk + ΔBk
where Bk is initially the identity matrix, i.e. a matrix with ones on the diagonal, and ΔBk denotes the increment applied to Bk.
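The text does not spell out the increment ΔBk. The sketch below uses the standard BFGS correction, which is an assumption about what the patent intends; the function name and the sample vectors are hypothetical, and NumPy is assumed.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Return B_{k+1} = B_k + dB_k using the standard BFGS correction.

    s = x_{k+1} - x_k and y = g_{k+1} - g_k, as defined in the text.
    """
    Bs = B @ s
    ys = float(y @ s)
    if ys <= 1e-12:
        return B  # curvature condition fails; keep the previous approximation
    delta_B = np.outer(y, y) / ys - np.outer(Bs, Bs) / float(s @ Bs)
    return B + delta_B

B0 = np.eye(2)                 # start from the identity matrix
s0 = np.array([1.0, 0.0])      # hypothetical step
y0 = np.array([2.0, 1.0])      # hypothetical gradient change
B1 = bfgs_update(B0, s0, y0)
```

After the update the approximation satisfies the secant condition B1 @ s0 == y0 and stays symmetric, which is what makes it a usable stand-in for the Hessian matrix.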
Finally, a direction vector is calculated from the gradient vector and the Hessian matrix according to a predetermined rule; the direction vector represents the search direction for the basic feature data.
It is calculated as:
[formula not reproduced in the source text]
where Hk denotes the Hessian matrix, gk denotes the gradient vector, and dk denotes the direction vector, which is the search direction for the basic feature data.
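One common rule for combining the two quantities is the Newton direction, dk = -Hk^(-1) gk. The sketch below assumes that rule, which the text does not confirm; the function name and sample values are hypothetical.

```python
import numpy as np

def search_direction(H, g):
    """Newton-type direction d = -H^{-1} g, computed by solving H d = -g (assumed rule)."""
    return -np.linalg.solve(H, g)

H_k = np.array([[2.0, 0.0], [0.0, 4.0]])  # hypothetical Hessian (or BFGS approximation)
g_k = np.array([2.0, 4.0])                # hypothetical gradient
d_k = search_direction(H_k, g_k)
```

Solving the linear system avoids forming the inverse explicitly, which is cheaper and numerically safer.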
S140: taking the initial vector as the reference point, adjust the initial vector along the search direction and substitute it into the objective function to obtain the corresponding function value.
Specifically, the user also inputs an iteration step size and a number of iterations. The iteration step size is denoted αk and can be set from 1 to 1000, with a default of 1; the number of iterations is denoted k and can be set to any number greater than 1, with a default of 10. Then, taking the initial vector as the reference point, the initial vector is adjusted along the search direction. For example, if the initial vector xk is x1 to x4, the step size is added to each value of the initial vector along its search direction, and the adjusted initial vector is substituted into the objective function to obtain the corresponding function value. The operation stops when the difference between the calculated function value and the function value corresponding to the initial vector meets the condition, or when the number of iterations is reached.
S150: when the difference between the next function value and the current function value is less than the preset convergence precision, determine the adjusted initial vector corresponding to the next function value as a cut point.
Specifically, substituting the adjusted initial vector into the objective function gives a function value referred to as the next function value, and substituting the initial vector into the objective function gives a function value referred to as the current function value. If the difference between the next function value and the current function value is less than the preset convergence precision, the degree of correlation between the current groups is minimal, and the adjusted initial vector corresponding to the next function value is taken as the cut point. If instead the difference is greater than the preset convergence precision, the initial vector is reassigned: αk + xk is used as the new initial vector (that is, the previous initial vector plus one step becomes the new initial vector), the above algorithm is repeated with the reassigned initial vector to determine the search direction, and the function values obtained from the objective function are compared again to redetermine the cut point.
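The loop formed by S130 to S150 can be sketched as follows. Several things here are assumptions made for illustration: the update is written as x + alpha * d (the step applied along the search direction), the direction is the Newton direction, the constraints are omitted, and the objective is a toy quadratic standing in for the patent's correlation measure. All names are hypothetical.

```python
import numpy as np

def iterate_cut_points(f, grad, hess, x0, alpha=1.0, max_iter=10, tol=1e-6):
    """Move the cut-point vector along the search direction until f stops changing."""
    x = np.asarray(x0, dtype=float)
    current = f(x)                                  # current function value
    for _ in range(max_iter):
        d = -np.linalg.solve(hess(x), grad(x))      # search direction (assumed rule)
        x_next = x + alpha * d                      # adjusted vector
        nxt = f(x_next)                             # next function value
        if abs(nxt - current) < tol:                # preset convergence precision reached
            return x_next
        x, current = x_next, nxt                    # reassign and repeat
    return x                                        # iteration limit reached

# Toy objective with a known minimizer, used only to exercise the loop.
target = np.array([25.0, 35.0, 45.0, 60.0])
f = lambda x: float(np.sum((x - target) ** 2))
grad = lambda x: 2.0 * (x - target)
hess = lambda x: 2.0 * np.eye(len(x))

cuts = iterate_cut_points(f, grad, hess, x0=[30.0, 40.0, 50.0, 60.0])
```

The defaults mirror the text: a step size of 1 and 10 iterations.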
S160: bin the basic feature data input by the user according to the multiple cut points determined.
Specifically, each cut point corresponds to a position at which the basic feature data is split, so the basic feature data input by the user can be binned according to the multiple cut points determined, yielding multiple groups of data that satisfy the number of bins and the bin proportion input by the user. The degree of correlation between the resulting groups is low, which makes it easier for the scorecard model to score based on the grouped data and improves computational accuracy.
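Applying the cut points to the data is a simple lookup. A minimal sketch, assuming NumPy; the function name, the ages, and the cut points are hypothetical.

```python
import numpy as np

def assign_bins(values, cut_points):
    """Map each value to a bin index; k sorted cut points give k + 1 bins."""
    return np.digitize(values, np.sort(cut_points))

ages = np.array([18, 22, 27, 31, 38, 44, 52, 67])
cut_points = np.array([25, 35, 45, 60])   # 4 cuts, so 5 bins
bin_index = assign_bins(ages, cut_points)
```

Each value lands in the bin whose lower cut point it is greater than or equal to, so a value exactly on a cut point belongs to the bin above it.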
It can be seen that, with the method for automatically binning data provided by the present invention, the user only needs to input basic data and constraints such as the basic feature data, binning condition, iteration step size, and number of iterations. The configured algorithm then calculates the optimal cut points to complete the binning of the basic feature data, so that subsequent models can score based on the binned data. The beneficial effects of the scheme are mainly twofold:
1. It makes up for the failure of traditional equal-frequency and equal-width methods to consider the influence of variable values on the response variable. When binning with these traditional methods, differences within the feature interval are ignored. Take the relationship between age and delinquency: with ages spanning 20 to 50, the equal-width method makes one bin every 5 years, yet in practice the delinquency rate is often higher among younger people.
2. It makes up for the sensitivity of traditional automatic binning to preset parameters, which leads to overfitting. With the SQP method, the user only needs to set the step size and the number of iterations; the IV optimization process is completed automatically by the algorithm, reducing the dependence on the modeler's experience.
Referring to Fig. 2, which is a functional block diagram of a device 200 for automatically binning data provided by an embodiment of the present invention, the device includes a transceiver module 210 and a processing module 220.
The transceiver module 210 is configured to obtain the basic feature data and binning condition input by the user.
In this embodiment of the present invention, S110 can be executed by the transceiver module 210.
The processing module 220 is configured to substitute the binning condition into a predefined function to obtain an objective function; determine an initial variable according to the binning condition, substitute the initial variable into the objective function, and determine a search direction for the basic feature data; taking the initial vector as the reference point, adjust the initial vector along the search direction and substitute it into the objective function to obtain a corresponding function value; when the difference between the next function value and the current function value is less than a preset convergence precision, determine the adjusted initial vector corresponding to the next function value as a cut point; and bin the basic feature data input by the user according to the multiple cut points determined.
In this embodiment of the present invention, S120 to S160 can be executed by the processing module 220.
Since these steps have been described in detail in the section on the method for automatically binning data, they are not repeated here.
In conclusion, in the method and device for automatically binning data provided by the embodiments of the present invention, the method includes: obtaining basic feature data and a binning condition input by a user; substituting the binning condition into a predefined function to obtain an objective function; determining an initial vector according to the binning condition; and substituting the initial vector into the objective function to determine a search direction for the basic feature data. Then, taking the initial vector as the reference point, the initial vector is adjusted along the search direction and substituted into the objective function to obtain a corresponding function value. When the difference between the next function value and the current function value is less than a preset convergence precision, the adjusted initial vector corresponding to the next function value is taken as a cut point. Finally, the basic feature data input by the user is binned according to the multiple cut points determined. This scheme enables fast binning that minimizes the degree of correlation between bins, which in turn makes it easier to score the data input by the user objectively.
In the several embodiments provided in this application, it should be understood that the disclosed device and method can also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram can represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes can occur in an order different from that marked in the drawings. For example, two consecutive boxes can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention can be integrated together to form an independent part, each module can exist separately, or two or more modules can be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method for automatically binning data, characterized in that the method comprises:
obtaining basic feature data and a binning condition input by a user;
substituting the binning condition into a predefined function to obtain an objective function;
determining an initial vector according to the binning condition, substituting the initial vector into the objective function, and determining a search direction for the basic feature data;
taking the initial vector as a reference point, adjusting the initial vector along the search direction and substituting it into the objective function to obtain a corresponding function value;
when the difference between the next function value and the current function value is less than a preset convergence precision, determining the adjusted initial vector corresponding to the next function value as a cut point;
binning the basic feature data input by the user according to the plurality of determined cut points.
2. The method of claim 1, characterized in that, after substituting the binning condition into the predefined function to obtain the objective function, the method further comprises the steps of:
solving a Lagrangian function for the objective function;
performing a second-order approximation on the Lagrangian function to obtain a quadratic programming problem.
3. The method of claim 2, characterized in that the step of determining an initial variable according to the binning condition, substituting the initial variable into the objective function, and determining the search direction for the basic feature data comprises:
determining the initial variable according to the number of bins included in the binning condition, and substituting the initial variable into the quadratic programming problem;
taking the first derivative of the quadratic programming problem to obtain a gradient vector;
taking the second derivative of the quadratic programming problem to obtain a Hessian matrix;
computing a direction vector from the gradient vector and the Hessian matrix according to a predetermined rule, the direction vector characterizing the search direction for the basic feature data.
4. The method of claim 3, characterized in that the step of taking the second derivative of the quadratic programming problem to obtain the Hessian matrix comprises:
when the number of bins is less than a predetermined threshold, solving for an approximate optimal solution of the Hessian matrix using Newton's method;
when the number of bins is greater than the predetermined threshold, solving for an approximate optimal solution of the Hessian matrix using the BFGS algorithm.
5. The method of claim 1, characterized in that the step of taking the initial vector as a reference point, adjusting the initial vector along the search direction, and substituting it into the objective function to obtain a corresponding function value comprises:
obtaining an iteration step size and a number of iterations input by the user;
adjusting the initial vector according to the iteration step size and substituting it into the objective function to obtain a corresponding function value, and stopping the computation once the number of iterations is reached.
6. A device for automatically binning data, characterized in that the device comprises:
a transceiver module, configured to obtain basic feature data and a binning condition input by a user;
a processing module, configured to: substitute the binning condition into a predefined function to obtain an objective function; determine an initial variable according to the binning condition, substitute the initial variable into the objective function, and determine a search direction for the basic feature data; taking the initial vector as a reference point, adjust the initial vector along the search direction and substitute it into the objective function to obtain a corresponding function value; when the difference between the next function value and the current function value is less than a preset convergence precision, determine the adjusted initial vector corresponding to the next function value as a cut point; and bin the basic feature data input by the user according to the plurality of determined cut points.
7. The device of claim 6, characterized in that the processing module is further configured to: solve a Lagrangian function for the objective function;
and perform a second-order approximation on the Lagrangian function to obtain a quadratic programming problem.
8. The device of claim 7, characterized in that the processing module is specifically configured to: determine the initial variable according to the number of bins included in the binning condition, and substitute the initial variable into the quadratic programming problem;
take the first derivative of the quadratic programming problem to obtain a gradient vector;
take the second derivative of the quadratic programming problem to obtain a Hessian matrix;
compute a direction vector from the gradient vector and the Hessian matrix according to a predetermined rule, the direction vector characterizing the search direction for the basic feature data.
9. The device of claim 8, characterized in that the processing module is specifically configured to: when the number of bins is less than a predetermined threshold, solve for an approximate optimal solution of the Hessian matrix using Newton's method;
when the number of bins is greater than the predetermined threshold, solve for an approximate optimal solution of the Hessian matrix using the BFGS algorithm.
10. The device of claim 6, characterized in that the processing module is specifically configured to: obtain an iteration step size and a number of iterations input by the user;
adjust the initial vector according to the iteration step size and substitute it into the objective function to obtain a corresponding function value, and stop the computation once the number of iterations is reached.
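The iterative search recited in claims 1 and 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claims do not spell out the predefined function, so the objective below is a hypothetical stand-in (the squared distance between the cut points and equal-frequency quantile targets). The names `quantile_targets`, `find_cut_points`, and `assign_bins`, as well as the default step size, iteration limit, and tolerance, are assumptions made for illustration. The Lagrangian and quadratic-programming steps of claims 2 to 4 are omitted; because the Hessian of this stand-in objective is the identity matrix, the Newton direction reduces to the negative gradient.

```python
from bisect import bisect_right


def quantile_targets(data, n_bins):
    """Hypothetical binning condition: aim for equal-frequency bins."""
    s = sorted(data)
    return [s[(len(s) * k) // n_bins] for k in range(1, n_bins)]


def objective(cuts, targets):
    # Stand-in objective (assumption): half the squared distance
    # between the cut points and the quantile targets.
    return 0.5 * sum((c - t) ** 2 for c, t in zip(cuts, targets))


def gradient(cuts, targets):
    # First derivative of the stand-in objective.
    return [c - t for c, t in zip(cuts, targets)]


def find_cut_points(data, n_bins, step=0.5, max_iter=200, tol=1e-10):
    """Iterate from an initial vector until successive objective values converge."""
    targets = quantile_targets(data, n_bins)
    lo, hi = min(data), max(data)
    # Initial vector derived from the number of bins: evenly spaced cut points.
    cuts = [lo + (hi - lo) * k / n_bins for k in range(1, n_bins)]
    current = objective(cuts, targets)
    for _ in range(max_iter):  # user-supplied iteration limit (claim 5)
        # Search direction: the Hessian here is the identity,
        # so the Newton direction is simply the negative gradient.
        direction = [-g for g in gradient(cuts, targets)]
        # Adjust the vector along the search direction by the step size.
        cuts = [c + step * d for c, d in zip(cuts, direction)]
        nxt = objective(cuts, targets)
        # Stop when the change in objective value is below the convergence precision.
        if abs(nxt - current) < tol:
            break
        current = nxt
    return cuts


def assign_bins(data, cuts):
    """Bin each value by counting how many cut points lie at or below it."""
    return [bisect_right(cuts, x) for x in data]
```

For example, with `data = list(range(1, 101))` and `n_bins=4`, the evenly spaced starting vector (25.75, 50.5, 75.25) moves toward the targets 26, 51, and 76, and `assign_bins` then places 25 values in each bin. A real objective with constraints would need the Lagrangian and Hessian handling described in claims 2 to 4 rather than this identity-Hessian shortcut.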
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910362666.4A CN110084376B (en) | 2019-04-30 | 2019-04-30 | Method and device for automatically separating data into boxes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910362666.4A CN110084376B (en) | 2019-04-30 | 2019-04-30 | Method and device for automatically separating data into boxes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110084376A (en) | 2019-08-02 |
CN110084376B (en) | 2021-05-14 |
Family
ID=67418143
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910362666.4A Active CN110084376B (en) | 2019-04-30 | 2019-04-30 | Method and device for automatically separating data into boxes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110084376B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909085A (en) * | 2019-11-25 | 2020-03-24 | 深圳前海微众银行股份有限公司 | Data processing method, device, equipment and storage medium |
CN112819034A (en) * | 2021-01-12 | 2021-05-18 | 平安科技(深圳)有限公司 | Data binning threshold calculation method and device, computer equipment and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050079508A1 (en) * | 2003-10-10 | 2005-04-14 | Judy Dering | Constraints-based analysis of gene expression data |
US20130117165A1 (en) * | 2009-12-11 | 2013-05-09 | International Business Machines Corporation | Merchandise hierarchy refinement by incorporation of product correlation |
CN104537067A (en) * | 2014-12-30 | 2015-04-22 | 广东电网有限责任公司信息中心 | Box separation method based on k-means clustering |
CN106547758A (en) * | 2015-09-17 | 2017-03-29 | 阿里巴巴集团控股有限公司 | A kind of method and apparatus of data branch mailbox |
CN107169511A (en) * | 2017-04-27 | 2017-09-15 | 华南理工大学 | Clustering ensemble method based on mixing clustering ensemble selection strategy |
CN108399255A (en) * | 2018-03-06 | 2018-08-14 | 中国银行股份有限公司 | A kind of input data processing method and device of Classification Data Mining model |
CN108984790A (en) * | 2018-07-31 | 2018-12-11 | 蜜小蜂智慧(北京)科技有限公司 | A kind of data branch mailbox method and device |
CN109063222A (en) * | 2018-11-04 | 2018-12-21 | 吉铁磊 | A kind of self-adapting data searching method based on big data |
US20190102680A1 (en) * | 2017-09-30 | 2019-04-04 | Nec Corporation | Method, device and system for estimating causality among observed variables |
CN109636591A (en) * | 2018-12-28 | 2019-04-16 | 浙江工业大学 | A kind of credit scoring card development approach based on machine learning |
Non-Patent Citations (4)
Title |
---|
IVAN OLIVEIRA ET AL: "SAS/OR:Rigorous constrained optimized binning for credit scoring", 《DATA MINING AND PREDICTIVE MODELING》 * |
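ZEQIANG ZHANG ET AL: "Improved Ant Colony Optimization for One-Dimensional Bin Packing Problem with Precedence Constraints", 《THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC 2007)》 * |
傅涛等 (Fu Tao et al.): "基于分箱统计的FCM算法及其在网络入侵检测中的应用" [FCM algorithm based on binning statistics and its application in network intrusion detection], 《计算机科学》 [Computer Science] * |
王洁松 (Wang Jiesong): "基于特征匹配与分箱技术的分布式网络入侵协同检测***研究及实现" [Research and implementation of a distributed collaborative network intrusion detection *** based on feature matching and binning technology], 《中国硕士学位论文全文数据库信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] * |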
Also Published As
Publication number | Publication date |
---|---|
CN110084376B (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104798043B (en) | A kind of data processing method and computer system | |
CN109214449A (en) | A kind of electric grid investment needing forecasting method | |
CN104346698B (en) | Based on the analysis of the food and drink member big data of cloud computing and data mining and checking system | |
CN106022473A (en) | Construction method for gene regulatory network by combining particle swarm optimization (PSO) with genetic algorithm | |
CN110084376A (en) | To the method and device of the automatic branch mailbox of data | |
CN110489556A (en) | Quality evaluating method, device, server and storage medium about follow-up record | |
CN109543693A (en) | Weak labeling data noise reduction method based on regularization label propagation | |
CN110335075A (en) | Intelligent marketing system and its working method suitable for the consumer finance | |
CN115965154A (en) | Knowledge graph-based digital twin machining process scheduling method | |
CN110222129A (en) | A kind of credit appraisal algorithm based on relevant database | |
CN113656707A (en) | Financing product recommendation method, system, storage medium and equipment | |
CN107491841A (en) | Nonlinear optimization method and storage medium | |
CN107316081A (en) | A kind of uncertain data sorting technique based on extreme learning machine | |
CN111967973A (en) | Bank client data processing method and device | |
CN111984842A (en) | Bank client data processing method and device | |
CN116611911A (en) | Credit risk prediction method and device based on support vector machine | |
CN110866694A (en) | Power grid construction project financial evaluation system and method | |
CN113779933A (en) | Commodity encoding method, electronic device and computer-readable storage medium | |
Sun et al. | Asynchronous parallel surrogate optimization algorithm based on ensemble surrogating model and stochastic response surface method | |
CN112199518A (en) | Knowledge graph recommendation-driven production technology route map configuration method in production technology | |
Diao et al. | Optimization of Management Mode of Small‐and Medium‐Sized Enterprises Based on Decision Tree Model | |
Thanassoulis | SELECTING A SUITABLE SOLUTION METHOD FOR A MULTI OBJECTIVE PROGRAMMING CAPITAL BUDGETING PROBLEM. | |
CN105989434A (en) | Transaction information management method and system | |
Lin | Method of Enterprise Information Software System (EISS) Monitoring Based on Grey Analysis and Data Clustering | |
CN115713099B (en) | Model design method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||