CN110084376A - Method and device for automatic binning of data - Google Patents

Method and device for automatic binning of data

Info

Publication number
CN110084376A
Authority
CN
China
Prior art keywords
binning
initial vector
condition
objective function
brings
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910362666.4A
Other languages
Chinese (zh)
Other versions
CN110084376B (en)
Inventor
李骥东
何智福
蓝科
覃进学
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sefon Software Co Ltd
Priority to CN201910362666.4A
Publication of CN110084376A
Application granted
Publication of CN110084376B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention relates to the technical field of data processing, and in particular to a method and device for automatic binning of data. The method comprises: obtaining the basic feature data and the binning condition input by a user; substituting the binning condition into a predefined function to obtain an objective function; determining an initial vector according to the binning condition; and substituting the initial vector into the objective function to determine a search direction over the basic feature data. Taking the initial vector as the base point, the initial vector is then adjusted along the search direction and substituted into the objective function to obtain the corresponding function value. When the difference between the next function value and the current function value is less than a preset convergence precision, the adjusted initial vector corresponding to the next function value is determined as the cut point. Finally, the basic feature data input by the user is binned according to the multiple cut points determined. This scheme bins data quickly and minimizes the degree of association between bins, which makes it easier to score the data input by the user objectively.

Description

Method and device for automatic binning of data
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for automatic binning of data.
Background
With the development and spread of big data and artificial intelligence technology, more and more financial institutions are paying greater attention to machine learning and are gradually moving from traditional management based on manual decisions to intelligent, data-driven decision-making. This is especially true in retail banking, such as credit cards and consumer finance, where small individual amounts, high demand frequency, and strict timeliness requirements mean that traditional manual approval cannot meet business needs. Machine learning methods for risk management, in particular scorecard models based on logistic regression, are being adopted by more and more banks because they are easy to interpret, quick to iterate, mature, and stable. Binning is a particularly important step in building a scorecard: it can improve model stability and computational performance. How to bin automatically, and how to optimize the binning process, has however long been an open problem in machine learning modeling.
The main binning methods include equal-frequency binning, equal-width binning, and automatic binning. Equal-frequency binning bins by share of the data; for example, every 10% of the data forms one bin. Equal-width binning divides the range between the feature's minimum and maximum evenly; for example, if the span between the oldest and youngest age is 50 years, every 10 years forms one bin, giving 5 bins. The drawback is that this weakens the influence of differences in feature values on the response variable.
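The two fixed-rule schemes described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the sample ages are hypothetical and do not come from the patent.

```python
def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior cut points chosen so that each bin holds roughly the same number of samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_bins] for i in range(1, n_bins)]


# Hypothetical ages spanning 50 years, as in the example in the text.
ages = [20, 22, 23, 25, 30, 31, 35, 40, 55, 70]
width_edges = equal_width_edges(ages, 5)     # one bin per 10 years
freq_edges = equal_frequency_edges(ages, 5)  # two samples per bin
```

Neither function looks at a response variable, which is the weakness the text points out.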
Widely used automatic binning methods include decision-tree-based binning and chi-square binning (ChiMerge). The core idea of decision-tree-based binning rests on entropy and information gain: it finds the point that maximizes the feature's information gain across the split, and bins automatically by repeatedly splitting child nodes. The core idea of chi-square binning is to merge categories step by step based on the feature's chi-square value, iterating until a termination condition is reached.
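The information-gain criterion behind decision-tree binning can be sketched for a single split as below. This is an illustrative sketch for binary labels only; `entropy`, `best_split`, and the sample data are hypothetical names and values, not taken from the patent or from any particular library.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(values, labels):
    """Return (cut, gain) for the single cut point with the largest information gain."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(ys)
    base = entropy(ys)
    best_gain, best_cut = 0.0, None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no boundary between identical feature values
        left, right = ys[:i], ys[i:]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_cut = gain, (xs[i - 1] + xs[i]) / 2
    return best_cut, best_gain


cut, gain = best_split([20, 25, 30, 40, 50, 60], [1, 1, 1, 0, 0, 0])
```

A full tree-based binner would apply `best_split` recursively to each side until a stopping rule such as depth or minimum bin size is hit, which is where the sensitivity discussed in the text comes from.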
Both classes of automatic binning method are overly sensitive to the iteration termination conditions, such as tree depth and minimum bin size, which easily leads to overfitting. Both are also limited in the constraints they can support (for example, requiring that a certain class of data form one bin, or specifying bin intervals), so they cannot fully meet the binning requirements that arise in practical modeling.
Summary of the invention
The purpose of the present invention is to provide a method for automatic binning of data that bins data quickly and effectively, so that the degree of association between two adjacent bins is minimized, thereby achieving automatic binning.
To achieve the above purpose, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for automatic binning of data, the method comprising: obtaining the basic feature data and the binning condition input by a user; substituting the binning condition into a predefined function to obtain an objective function; determining an initial vector according to the binning condition, substituting the initial vector into the objective function, and determining a search direction over the basic feature data; taking the initial vector as the base point, adjusting the initial vector along the search direction and substituting it into the objective function to obtain the corresponding function value; when the difference between the next function value and the current function value is less than a preset convergence precision, determining the adjusted initial vector corresponding to the next function value as the cut point; and binning the basic feature data input by the user according to the multiple cut points determined.
In a second aspect, an embodiment of the present invention further provides a device for automatic binning of data, the device comprising: a transceiver module, configured to obtain the basic feature data and the binning condition input by a user; and a processing module, configured to substitute the binning condition into a predefined function to obtain an objective function; determine an initial vector according to the binning condition, substitute the initial vector into the objective function, and determine a search direction over the basic feature data; taking the initial vector as the base point, adjust the initial vector along the search direction and substitute it into the objective function to obtain the corresponding function value; when the difference between the next function value and the current function value is less than a preset convergence precision, determine the adjusted initial vector corresponding to the next function value as the cut point; and bin the basic feature data input by the user according to the multiple cut points determined.
The embodiments of the present invention provide a method and device for automatic binning of data. The method comprises: obtaining the basic feature data and the binning condition input by a user; substituting the binning condition into a predefined function to obtain an objective function; determining an initial vector according to the binning condition; and substituting the initial vector into the objective function to determine a search direction over the basic feature data. Taking the initial vector as the base point, the initial vector is then adjusted along the search direction and substituted into the objective function to obtain the corresponding function value. When the difference between the next function value and the current function value is less than a preset convergence precision, the adjusted initial vector corresponding to the next function value is determined as the cut point. Finally, the basic feature data input by the user is binned according to the multiple cut points determined. This scheme bins data quickly and minimizes the degree of association between bins, which makes it easier to score the data input by the user objectively.
To make the above purposes, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a flow diagram of a method for automatic binning of data provided by an embodiment of the present invention.
Fig. 2 shows a functional block diagram of a device for automatic binning of data provided by an embodiment of the present invention.
Reference numerals: 200 - device for automatic binning of data; 210 - transceiver module; 220 - processing module.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In the description of the present invention, the terms "first", "second", and so on are used only to distinguish descriptions and should not be understood as indicating or implying relative importance.
In retail banking, such as credit cards and consumer finance, individual amounts are small and demand is frequent, so manual review entails a heavy workload. At present, banks and financial institutions mostly use a scorecard model to score each item of basic data input by a user and decide from the score whether to handle the user's financial business, which can quickly improve the efficiency of handling retail banking business. Binning the data is an important step in a scorecard model: binning divides the data input by the user into multiple groups, and the scorecard model then scores the data of each group according to certain logic to obtain the final score. Binning the data into groups whose degree of association is as low as possible therefore helps the subsequent scorecard model score the data and makes the final score more accurate. This scheme provides a method for automatic binning of data, which bins data automatically so that the degree of association between two adjacent bins is minimized, achieving a good binning result.
Referring to Fig. 1, which is a flow diagram of a method for automatic binning of data provided by an embodiment of the present invention, the method includes:
S110: obtain the basic feature data and the binning condition input by a user.
Specifically, the basic feature data input by the user includes the user's basic information, such as age, height, weight, and income. The binning condition includes the number of bins and the proportion of data in each bin. For example, if the number of bins is 5 and the per-bin proportion is 10%, the basic feature data input by the user is divided into 5 bins, and each bin contains no less than 10% of the total data.
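The binning condition of S110 can be represented and checked as in the sketch below. The class `BinningCondition`, the function `satisfies`, and the sample data are hypothetical names introduced for illustration; the patent does not prescribe a data structure.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class BinningCondition:
    n_bins: int       # number of bins, e.g. 5
    min_ratio: float  # minimum share of the data per bin, e.g. 0.10


def satisfies(values, cut_points, cond):
    """True if the cut points give cond.n_bins bins, each with at least min_ratio of the data."""
    if len(cut_points) != cond.n_bins - 1:
        return False
    cuts = sorted(cut_points)
    counts = [0] * cond.n_bins
    for v in values:
        # A value equal to a cut point is placed in the bin above that cut.
        counts[bisect_right(cuts, v)] += 1
    return all(c / len(values) >= cond.min_ratio for c in counts)


cond = BinningCondition(n_bins=5, min_ratio=0.10)
ages = list(range(20, 70))                         # 50 hypothetical samples
ok = satisfies(ages, [30, 40, 50, 60], cond)       # 10 samples per bin
too_small = satisfies(ages, [21, 22, 23, 24], cond)  # first bin holds 1 sample
```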
S120: substitute the binning condition into a predefined function to obtain the objective function.
Specifically, the binning condition includes the number of bins and the proportion of data in each bin; these are substituted into the predefined function to obtain the objective function, which is expressed as:
where $\min f(x)$ denotes minimizing the degree of association; s.t. denotes the constraints; $C_i(x) - m$ is the constraint on the number of bins, with $m$ the number of bins; $C_i(x) - p$ is the constraint on the minimum proportion per bin; and $C_i(x)$ is the constraint function of $x$.
To solve the above problem, the nonlinear optimization must be simplified into a quadratic programming problem. This requires first forming the Lagrangian function of the objective function and then taking a second-order approximation of the Lagrangian function, which yields the quadratic programming problem.
In the first step, the Lagrangian function of the objective function is formed as:
$$L(x) = f(x) + \lambda G(x) + \mu S(x)$$
where $L(x)$ is the Lagrangian function; $G(x) = C_i(x) - m$ is the constraint on the number of bins; $S(x) = C_i(x) - p$ is the per-bin proportion constraint; $\lambda$ is the Lagrange factor; and $\mu$ is the bin proportion factor.
In the second step, a second-order approximation of the Lagrangian function is taken; solving the resulting quadratic programming problem gives the optimal solution of the original nonlinear optimization. It is computed as follows:
where $H_k$ denotes the Hessian matrix at the $k$-th iteration, i.e. the second derivative of the objective function; $x_k$ denotes a particular value of $x$; and $d$ denotes the search direction of the variable.
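For orientation, the quadratic programming subproblem of textbook sequential quadratic programming (SQP) is shown below. This is the standard form, offered as a likely reading and not as the patent's own formula, which is not reproduced in this text. Treating the bin-number constraint $G$ as an equality and the per-bin proportion constraint $S$ as an inequality is an assumption.

```latex
\min_{d}\ \tfrac{1}{2}\, d^{\top} H_k\, d + \nabla f(x_k)^{\top} d
\quad \text{s.t.} \quad
G(x_k) + \nabla G(x_k)^{\top} d = 0, \qquad
S(x_k) + \nabla S(x_k)^{\top} d \ge 0
```

Here the objective is the second-order Taylor model of the Lagrangian around $x_k$ and the constraints are linearized at $x_k$, so each iteration only has to solve a quadratic program in $d$.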
S130: determine an initial vector according to the binning condition, substitute the initial vector into the objective function, and determine the search direction over the basic feature data.
Specifically, the binning condition includes the number of bins. If the user specifies 5 bins, the initial vector $x_k$ can be defined as $x_1$ to $x_4$; that is, the basic feature data input by the user is cut 4 times, yielding 5 groups of data. The initial vector so determined is then substituted into the objective function, as converted into the quadratic programming problem above, to determine the search direction over the basic feature data. The search direction is determined as follows:
First, the first derivative of the quadratic programming problem is taken to obtain the gradient vector.
It is computed as:
where $g_k$ denotes the gradient vector.
Second, the second derivative of the quadratic programming problem is taken to obtain the Hessian matrix.
Computing the Hessian matrix requires differentiating the original function at different $x_k$. To reduce the amount of computation, when the number of bins for the basic feature data is less than a predetermined threshold (such as 100), Newton's method is used to solve for an approximate optimal solution of the Hessian matrix; when the number of bins is greater than the predetermined threshold (such as 100), the BFGS algorithm is used instead. The approximate optimal solution of the Hessian matrix is then taken as the result of the second derivative of the quadratic programming problem.
With Newton's method, the approximate optimal solution of the Hessian matrix is obtained as:
With the BFGS algorithm, the approximate optimal solution of the Hessian matrix is obtained as:
Let $y_k = g_{k+1} - g_k$ and $s_k = x_{k+1} - x_k$.
During the iteration, the Hessian matrix can be approximated by $B_k$, i.e. $H \approx B$:
$$B_{k+1} = B_k + \Delta B_k$$
where $B_k$ is initialized as the identity matrix, i.e. the matrix with ones on the diagonal, and $\Delta B_k$ denotes the increment of $B_k$.
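The text does not spell out the increment of the Hessian approximation. The sketch below uses the standard BFGS update, B + y yᵀ/(yᵀs) − (B s)(B s)ᵀ/(sᵀ B s), as an assumption about what that increment is; the function name `bfgs_update` and the sample vectors are hypothetical.

```python
def bfgs_update(B, s, y):
    """One standard BFGS update of a symmetric Hessian approximation B.

    s = x_{k+1} - x_k and y = g_{k+1} - g_k, following the notation in the text.
    """
    n = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    ys = sum(y[i] * s[i] for i in range(n))
    sBs = sum(s[i] * Bs[i] for i in range(n))
    if ys <= 0:
        # Curvature condition fails: skip the update so B stays positive definite.
        return [row[:] for row in B]
    return [
        [B[i][j] + y[i] * y[j] / ys - Bs[i] * Bs[j] / sBs for j in range(n)]
        for i in range(n)
    ]


B0 = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix as the starting approximation
B1 = bfgs_update(B0, s=[1.0, 0.0], y=[2.0, 0.0])
```

After the update, `B1` maps `s` to `y` (the secant condition), which is what lets it stand in for the Hessian without computing second derivatives.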
Finally, the direction vector is computed from the gradient vector and the Hessian matrix according to a predetermined rule; the direction vector characterizes the search direction over the basic feature data.
It is computed as:
where $H_k$ denotes the Hessian matrix, $g_k$ denotes the gradient vector, and $d_k$ denotes the direction vector, which is the search direction over the basic feature data.
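The rule that combines the gradient and the Hessian is not reproduced in this text. A common choice, assumed here, is the Newton direction obtained by solving H_k d_k = −g_k. The sketch below does this with plain Gaussian elimination; `solve`, `search_direction`, and the sample matrix are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A must be non-singular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def search_direction(H, g):
    """Newton direction d with H d = -g (assumed form of the direction rule)."""
    return solve(H, [-gi for gi in g])


d = search_direction([[2.0, 0.0], [0.0, 4.0]], [2.0, 4.0])
```

Solving the linear system avoids forming the inverse of the Hessian explicitly, which is both cheaper and numerically safer.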
S140: taking the initial vector as the base point, adjust the initial vector along the search direction and substitute it into the objective function to obtain the corresponding function value.
Specifically, the user also inputs an iteration step size and a number of iterations. The step size is denoted $\alpha_k$; it can be set from 1 to 1000, with a default of 1. The number of iterations is denoted $k$; it can be set to any number greater than 1, with a default of 10. The initial vector is then adjusted along the search direction, taking the initial vector as the base point. For example, if the initial vector $x_k$ is $x_1$ to $x_4$, the step size is added to each value of the initial vector along its search direction, and the adjusted initial vector is substituted into the objective function to obtain the corresponding function value. The computation stops when the difference between the computed function value and the function value corresponding to the initial vector meets the condition, or when the number of iterations is reached.
S150: when the difference between the next function value and the current function value is less than the preset convergence precision, determine the adjusted initial vector corresponding to the next function value as the cut point.
Specifically, substituting the adjusted initial vector into the objective function gives a function value called the next function value, and substituting the initial vector into the objective function gives a function value called the current function value. If the difference between the next function value and the current function value is less than the preset convergence precision, the degree of association between the current groups is minimal, and the adjusted initial vector corresponding to the next function value is taken as the cut point. If instead the difference is greater than the preset convergence precision, the initial vector is reassigned: $\alpha_k + x_k$ becomes the new initial vector (that is, the previous initial vector plus one step size). The above algorithm is repeated with the reassigned initial vector to determine the search direction, and the function values obtained from the objective function are compared again, so as to redetermine the cut point.
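The loop formed by S140 and S150 can be sketched as below: move along the search direction, compare the next and current function values, and stop on convergence or when the iteration budget runs out. The function `refine`, the toy objective, and the toy direction are hypothetical stand-ins; the patent's actual objective (degree of association) is not reproduced here.

```python
def refine(f, direction, x0, step=1.0, max_iter=10, tol=1e-6):
    """Iterate x <- x + step * d until |f(next) - f(current)| < tol or max_iter is reached."""
    x = list(x0)
    current_value = f(x)
    for _ in range(max_iter):
        d = direction(x)
        candidate = [xi + step * di for xi, di in zip(x, d)]
        next_value = f(candidate)
        converged = abs(next_value - current_value) < tol
        x, current_value = candidate, next_value
        if converged:
            break
    return x


# Toy problem: a quadratic bowl whose minimum plays the role of the ideal cut points.
target = [30.0, 40.0]


def toy_objective(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def toy_direction(x):
    # Newton direction for the quadratic above: points straight at the minimum.
    return [ti - xi for xi, ti in zip(x, target)]


cuts = refine(toy_objective, toy_direction, [25.0, 45.0])
```

The defaults mirror the text (step size 1, 10 iterations); the tolerance value is an arbitrary placeholder for the preset convergence precision.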
S160: bin the basic feature data input by the user according to the multiple cut points determined.
Specifically, each cut point corresponds to a position at which the basic feature data is split, so the basic feature data input by the user can be binned according to the multiple cut points determined, yielding groups of data that satisfy the number of bins and the per-bin proportion input by the user. The degree of association between the resulting groups is low, which makes it easier for the scorecard model to score on the grouped data and improves computational accuracy.
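Once the cut points are known, S160 amounts to a lookup. The sketch below uses `bisect_right` from the Python standard library; the function name `assign_bins` and the sample values are hypothetical, and placing a value equal to a cut point in the upper bin is a convention chosen here, not one stated in the text.

```python
from bisect import bisect_right


def assign_bins(values, cut_points):
    """Map each value to a bin index in 0..len(cut_points) using sorted cut points."""
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


# Four cut points give five bins, as in the 5-bin example in the text.
bins = assign_bins([18, 30, 45, 61], [30, 40, 50, 60])
```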
It can be seen that, with the method for automatic binning of data provided by the present invention, the user only needs to input basic data and constraints such as the basic feature data, binning condition, iteration step size, and number of iterations; the configured algorithm then computes the optimal cut points and completes the binning of the basic feature data, so that subsequent models can score on the binned data. The beneficial effects of the scheme fall mainly into two aspects:
1. It makes up for the failure of the traditional equal-frequency and equal-width methods to consider the influence of variable values on the response variable. Binning with those methods ignores differences within a feature interval. Take the relationship between age and delinquency: with ages spanning 20 to 50, the equal-width method puts every 5 years in one bin, yet in practice the delinquency rate is often higher among the young.
2. It makes up for the sensitivity of traditional automatic binning to preset parameters, which leads to overfitting. With the SQP method, the user only needs to set the step size and the number of iterations; the IV optimization is completed automatically by the algorithm, reducing dependence on the modeler's experience.
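The text mentions IV without expanding it. Reading it as information value, the usual scorecard measure of how well a set of bins separates good from bad accounts, is an assumption, as is the formula below. The function name and the counts are hypothetical.

```python
import math


def information_value(good_counts, bad_counts):
    """Information value of a binning, given per-bin counts of good and bad accounts."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        good_share = good / total_good
        bad_share = bad / total_bad
        # Weight of evidence of the bin; assumes both counts are non-zero.
        woe = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe
    return iv


iv_flat = information_value([50, 50], [50, 50])   # bins carry no signal
iv_split = information_value([80, 20], [20, 80])  # bins separate the classes
```

A binning whose bins all have the same good/bad mix scores zero, while one that separates the classes scores higher, which is why IV is a natural quantity for a binning optimizer to track.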
Referring to Fig. 2, which is a functional block diagram of a device 200 for automatic binning of data provided by an embodiment of the present invention, the device includes a transceiver module 210 and a processing module 220.
The transceiver module 210 is configured to obtain the basic feature data and the binning condition input by a user.
In the embodiments of the present invention, S110 may be performed by the transceiver module 210.
The processing module 220 is configured to substitute the binning condition into a predefined function to obtain an objective function; determine an initial vector according to the binning condition, substitute the initial vector into the objective function, and determine a search direction over the basic feature data; taking the initial vector as the base point, adjust the initial vector along the search direction and substitute it into the objective function to obtain the corresponding function value; when the difference between the next function value and the current function value is less than a preset convergence precision, determine the adjusted initial vector corresponding to the next function value as the cut point; and bin the basic feature data input by the user according to the multiple cut points determined.
In the embodiments of the present invention, S120 to S160 may be performed by the processing module 220.
Since these steps have already been described in detail in the method section, they are not repeated here.
In conclusion, the embodiments of the present invention provide a method and device for automatic binning of data. The method comprises: obtaining the basic feature data and the binning condition input by a user; substituting the binning condition into a predefined function to obtain an objective function; determining an initial vector according to the binning condition; and substituting the initial vector into the objective function to determine a search direction over the basic feature data. Taking the initial vector as the base point, the initial vector is then adjusted along the search direction and substituted into the objective function to obtain the corresponding function value. When the difference between the next function value and the current function value is less than a preset convergence precision, the adjusted initial vector corresponding to the next function value is determined as the cut point. Finally, the basic feature data input by the user is binned according to the multiple cut points determined. This scheme bins data quickly and minimizes the degree of association between bins, which makes it easier to score the data input by the user objectively.
In several embodiments provided herein, it should be understood that disclosed device and method can also pass through Other modes are realized.The apparatus embodiments described above are merely exemplary, for example, flow chart and block diagram in attached drawing Show the device of multiple embodiments according to the present invention, the architectural framework in the cards of method and computer program product, Function and operation.In this regard, each box in flowchart or block diagram can represent the one of a module, section or code Part, a part of the module, section or code, which includes that one or more is for implementing the specified logical function, to be held Row instruction.It should also be noted that function marked in the box can also be to be different from some implementations as replacement The sequence marked in attached drawing occurs.For example, two continuous boxes can actually be basically executed in parallel, they are sometimes It can execute in the opposite order, this depends on the function involved.It is also noted that every in block diagram and or flow chart The combination of box in a box and block diagram and or flow chart can use the dedicated base for executing defined function or movement It realizes, or can realize using a combination of dedicated hardware and computer instructions in the system of hardware.
In addition, each functional module in each embodiment of the present invention can integrate one independent portion of formation together Point, it is also possible to modules individualism, an independent part can also be integrated to form with two or more modules.
It, can be with if the function is realized and when sold or used as an independent product in the form of software function module It is stored in a computer readable storage medium.Based on this understanding, technical solution of the present invention is substantially in other words The part of the part that contributes to existing technology or the technical solution can be embodied in the form of software products, the meter Calculation machine software product is stored in a storage medium, including some instructions are used so that a computer equipment (can be a People's computer, server or network equipment etc.) it performs all or part of the steps of the method described in the various embodiments of the present invention. It should be noted that, in this document, relational terms such as first and second and the like are used merely to an entity or behaviour Make with another entity or operate distinguish, without necessarily requiring or implying between these entities or operation there are it is any this The actual relationship of kind or sequence.Moreover, the terms "include", "comprise" or its any other variant are intended to nonexcludability Include so that include a series of elements process, method, article or equipment not only include those elements, but also Including other elements that are not explicitly listed, or further include for this process, method, article or equipment intrinsic want Element.In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that including described want There is also other identical elements in the process, method, article or equipment of element.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; for those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope. It should also be noted that similar reference numerals and letters denote similar items in the drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for automatically binning data, characterized in that the method comprises:
obtaining basic feature data and a binning condition input by a user;
substituting the binning condition into a predefined function to obtain an objective function;
determining an initial vector according to the binning condition, substituting the initial vector into the objective function, and determining a search direction for the basic feature data;
taking the initial vector as a reference point, adjusting the initial vector along the search direction and substituting it into the objective function to obtain a corresponding function value;
when the difference between the next function value and the current function value is less than a preset convergence precision, determining the adjusted initial vector corresponding to the next function value as a cut point;
binning the basic feature data input by the user according to the multiple cut points thus determined.
2. The method according to claim 1, characterized in that, after the step of substituting the binning condition into the predefined function to obtain the objective function, the method comprises the steps of:
solving for the Lagrangian function of the objective function;
performing a second-order approximation of the Lagrangian function to obtain a quadratic programming problem.
3. The method according to claim 2, characterized in that the step of determining an initial variable according to the binning condition, substituting the initial variable into the objective function, and determining a search direction for the basic feature data comprises:
determining the initial variable according to the number of bins included in the binning condition, and substituting the initial variable into the quadratic programming problem;
taking the first derivative of the quadratic programming problem to obtain a gradient vector;
taking the second derivative of the quadratic programming problem to obtain a Hessian matrix;
calculating a direction vector from the gradient vector and the Hessian matrix according to a predefined rule, the direction vector characterizing the search direction for the basic feature data.
4. The method according to claim 3, characterized in that the step of taking the second derivative of the quadratic programming problem to obtain the Hessian matrix comprises:
when the number of bins is less than a predetermined threshold, solving for an approximate optimal solution of the Hessian matrix using the Newton algorithm;
when the number of bins is greater than the predetermined threshold, solving for an approximate optimal solution of the Hessian matrix using the BFGS algorithm.
5. The method according to claim 1, characterized in that the step of taking the initial vector as a reference point, adjusting the initial vector along the search direction and substituting it into the objective function to obtain a corresponding function value comprises:
obtaining an iteration step size and a number of iterations input by the user;
adjusting the initial vector according to the iteration step size and substituting it into the objective function to obtain the corresponding function value, and stopping the computation once the number of iterations is reached.
6. A device for automatically binning data, characterized in that the device comprises:
a transceiver module, configured to obtain basic feature data and a binning condition input by a user;
a processing module, configured to: substitute the binning condition into a predefined function to obtain an objective function; determine an initial variable according to the binning condition, substitute the initial variable into the objective function, and determine a search direction for the basic feature data; take the initial vector as a reference point, adjust the initial vector along the search direction, and substitute it into the objective function to obtain a corresponding function value; when the difference between the next function value and the current function value is less than a preset convergence precision, determine the adjusted initial vector corresponding to the next function value as a cut point; and bin the basic feature data input by the user according to the multiple cut points thus determined.
7. The device according to claim 6, characterized in that the processing module is further configured to: solve for the Lagrangian function of the objective function;
and perform a second-order approximation of the Lagrangian function to obtain a quadratic programming problem.
8. The device according to claim 7, characterized in that the processing module is specifically configured to: determine the initial variable according to the number of bins included in the binning condition, and substitute the initial variable into the quadratic programming problem;
take the first derivative of the quadratic programming problem to obtain a gradient vector;
take the second derivative of the quadratic programming problem to obtain a Hessian matrix;
calculate a direction vector from the gradient vector and the Hessian matrix according to a predefined rule, the direction vector characterizing the search direction for the basic feature data.
9. The device according to claim 8, characterized in that the processing module is specifically configured to: when the number of bins is less than a predetermined threshold, solve for an approximate optimal solution of the Hessian matrix using the Newton algorithm;
when the number of bins is greater than the predetermined threshold, solve for an approximate optimal solution of the Hessian matrix using the BFGS algorithm.
10. The device according to claim 6, characterized in that the processing module is specifically configured to: obtain an iteration step size and a number of iterations input by the user;
adjust the initial vector according to the iteration step size and substitute it into the objective function to obtain the corresponding function value, and stop the computation once the number of iterations is reached.
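
The flow recited in the claims (an initial vector derived from the number of bins, a search direction computed from the gradient and Hessian, Newton or BFGS chosen by a bin-count threshold, a user-supplied step size and iteration count, and a stop once successive objective values differ by less than the convergence precision) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the claims do not give the predefined function, so `make_objective` uses a stand-in quadratic objective that keeps cut points near equal-frequency quantiles, and `search_cut_points`, `bin_data`, `bin_threshold`, and the step-halving safeguard are hypothetical and not taken from the patent.

```python
import numpy as np

def make_objective(targets, lam=0.1):
    """Stand-in objective (an assumption): stay close to the target quantiles
    and penalise large gaps between neighbouring cut points."""
    n = len(targets)
    D = np.diff(np.eye(n), axis=0)            # first-difference matrix, shape (n-1, n)
    H = np.eye(n) + lam * D.T @ D             # constant Hessian of this quadratic

    def f(c):
        return 0.5 * np.sum((c - targets) ** 2) + 0.5 * lam * np.sum((D @ c) ** 2)

    def grad(c):
        return (c - targets) + lam * D.T @ (D @ c)

    def hess(c):
        return H

    return f, grad, hess

def search_cut_points(f, grad, hess, x0, n_bins, bin_threshold=10,
                      step=1.0, max_iter=200, tol=1e-12):
    """Iterate from x0 until successive objective values differ by less than tol."""
    x = np.asarray(x0, dtype=float)
    fx, g = f(x), grad(x)
    eye = np.eye(len(x))
    B = eye.copy()                            # BFGS inverse-Hessian approximation
    use_newton = n_bins < bin_threshold       # few bins: Newton, many bins: BFGS
    for _ in range(max_iter):
        # Search direction from the gradient and the (approximate) Hessian.
        d = -np.linalg.solve(hess(x), g) if use_newton else -B @ g
        t = step
        # Step-halving safeguard (an assumption, not part of the claims).
        while f(x + t * d) > fx and t > 1e-12:
            t *= 0.5
        x_new = x + t * d
        f_new, g_new = f(x_new), grad(x_new)
        if not use_newton:
            s, y = x_new - x, g_new - g
            sy = s @ y
            if sy > 1e-16:                    # keep the approximation positive definite
                rho = 1.0 / sy
                B = ((eye - rho * np.outer(s, y)) @ B @ (eye - rho * np.outer(y, s))
                     + rho * np.outer(s, s))
        converged = abs(f_new - fx) < tol     # convergence-precision test
        x, fx, g = x_new, f_new, g_new
        if converged:
            break
    return x

def bin_data(data, n_bins, **kwargs):
    """Derive cut points for n_bins bins and assign each value a bin index."""
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    f, grad, hess = make_objective(np.quantile(data, qs))
    # Initial vector: equal-width cut points determined by the number of bins.
    x0 = np.linspace(data.min(), data.max(), n_bins + 1)[1:-1]
    cuts = search_cut_points(f, grad, hess, x0, n_bins, **kwargs)
    return cuts, np.digitize(data, cuts)
```

Calling `bin_data(values, 5)` takes the Newton branch under the assumed default threshold of 10, while `bin_data(values, 20)` takes the BFGS branch; both return the cut points together with a bin index for every input value.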
CN201910362666.4A 2019-04-30 2019-04-30 Method and device for automatically separating data into boxes Active CN110084376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910362666.4A CN110084376B (en) 2019-04-30 2019-04-30 Method and device for automatically separating data into boxes

Publications (2)

Publication Number Publication Date
CN110084376A true CN110084376A (en) 2019-08-02
CN110084376B CN110084376B (en) 2021-05-14

Family

ID=67418143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910362666.4A Active CN110084376B (en) 2019-04-30 2019-04-30 Method and device for automatically separating data into boxes

Country Status (1)

Country Link
CN (1) CN110084376B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909085A (en) * 2019-11-25 2020-03-24 深圳前海微众银行股份有限公司 Data processing method, device, equipment and storage medium
CN112819034A (en) * 2021-01-12 2021-05-18 平安科技(深圳)有限公司 Data binning threshold calculation method and device, computer equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050079508A1 (en) * 2003-10-10 2005-04-14 Judy Dering Constraints-based analysis of gene expression data
US20130117165A1 (en) * 2009-12-11 2013-05-09 International Business Machines Corporation Merchandise hierarchy refinement by incorporation of product correlation
CN104537067A (en) * 2014-12-30 2015-04-22 广东电网有限责任公司信息中心 Binning method based on k-means clustering
CN106547758A (en) * 2015-09-17 2017-03-29 阿里巴巴集团控股有限公司 Method and apparatus for data binning
CN107169511A (en) * 2017-04-27 2017-09-15 华南理工大学 Clustering ensemble method based on a hybrid clustering ensemble selection strategy
CN108399255A (en) * 2018-03-06 2018-08-14 中国银行股份有限公司 Input data processing method and device for a classification data mining model
CN108984790A (en) * 2018-07-31 2018-12-11 蜜小蜂智慧(北京)科技有限公司 Data binning method and device
CN109063222A (en) * 2018-11-04 2018-12-21 吉铁磊 Adaptive data search method based on big data
US20190102680A1 (en) * 2017-09-30 2019-04-04 Nec Corporation Method, device and system for estimating causality among observed variables
CN109636591A (en) * 2018-12-28 2019-04-16 浙江工业大学 Credit scorecard development method based on machine learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
IVAN OLIVEIRA ET AL: "SAS/OR:Rigorous constrained optimized binning for credit scoring", 《DATA MINING AND PREDICTIVE MODELING》 *
ZEQIANG ZHANG ET AL: "Improved Ant Colony Optimization for One-Dimensional Bin Packing Problem with Precedence Constraints", 《 THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC 2007)》 *
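Fu Tao et al.: "FCM algorithm based on binning statistics and its application in network intrusion detection", 《Computer Science》 *
Wang Jiesong: "Research and implementation of a distributed network intrusion cooperative detection system based on feature matching and binning technology", 《China Master's Theses Full-text Database, Information Science and Technology》 *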

Also Published As

Publication number Publication date
CN110084376B (en) 2021-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant