CN107679547A - Data processing method, apparatus, and electronic device for a binary classification model - Google Patents

Data processing method, apparatus, and electronic device for a binary classification model

Info

Publication number
CN107679547A
Authority
CN
China
Prior art keywords
roc curve
sample
adjusted
classification model
axis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710733129.7A
Other languages
Chinese (zh)
Inventor
宋博文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201710733129.7A priority Critical patent/CN107679547A/en
Publication of CN107679547A publication Critical patent/CN107679547A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification disclose a data processing method, apparatus, and electronic device for a binary classification model. The method includes: adjusting the ROC curve corresponding to a binary classification model based on the density function of a specified indicator of the samples corresponding to the model, and determining a weighted AUC after the adjustment, for use in evaluating the binary classification model.

Description

Data processing method, apparatus, and electronic device for a binary classification model
Technical field
This specification relates to the field of computer software technology, and in particular to a data processing method, apparatus, and electronic device for a binary classification model.
Background
Binary classification models are commonly used in the field of intelligent recognition. The receiver operating characteristic (ROC) curve is often used as the evaluation criterion for a binary classification model.
In the prior art, the ROC curve corresponding to a binary classification model can be drawn, and the area under that ROC curve (Area Under ROC Curve, AUC) can then be calculated. In general, the larger the AUC, the better the binary classification model is considered to be.
In view of the prior art, a scheme that evaluates binary classification models more accurately is needed.
Summary of the invention
The embodiments of this specification provide a data processing method, apparatus, and electronic device for a binary classification model, to solve the following technical problem: a scheme that evaluates binary classification models more accurately is needed.
To solve the above technical problem, the embodiments of this specification are implemented as follows:
A data processing method for a binary classification model provided by an embodiment of this specification includes:
Obtaining a binary classification model and training data containing multiple samples, where the binary classification model is used to calculate a score for each sample in order to determine whether the sample is positive or negative, and the determination serves as the basis for deciding whether to perform a specified event for the sample;
Estimating the density function of a specified indicator of the samples as an ROC curve adjustment factor, where the specified indicator reflects the execution effect of the specified event;
Obtaining the adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor;
Determining the weighted AUC corresponding to the binary classification model according to the adjusted ROC curve.
A data processing apparatus for a binary classification model provided by an embodiment of this specification includes:
A first acquisition module, which obtains a binary classification model and training data containing multiple samples, where the binary classification model is used to calculate a score for each sample in order to determine whether the sample is positive or negative, and the determination serves as the basis for deciding whether to perform a specified event for the sample;
An estimation module, which estimates the density function of a specified indicator of the samples as an ROC curve adjustment factor, where the specified indicator reflects the execution effect of the specified event;
A second acquisition module, which obtains the adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor;
A determination module, which determines the weighted AUC corresponding to the binary classification model according to the adjusted ROC curve.
An electronic device provided by an embodiment of this specification includes:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can:
Obtain a binary classification model and training data containing multiple samples, where the binary classification model is used to calculate a score for each sample in order to determine whether the sample is positive or negative, and the determination serves as the basis for deciding whether to perform a specified event for the sample;
Estimate the density function of a specified indicator of the samples as an ROC curve adjustment factor, where the specified indicator reflects the execution effect of the specified event;
Obtain the adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor;
Determine the weighted AUC corresponding to the binary classification model according to the adjusted ROC curve.
At least one of the above technical solutions adopted by the embodiments of this specification can achieve the following beneficial effect: by adjusting the ROC curve corresponding to a binary classification model based on the density function of a specified indicator of the samples corresponding to the model, and determining a weighted AUC after the adjustment, the binary classification model can be evaluated more accurately based on the weighted AUC.
Brief description of the drawings
To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in this specification; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an overall architecture involved in the scheme of this specification in a practical application scenario;
Fig. 2 is a schematic flowchart of a data processing method for a binary classification model provided by an embodiment of this specification;
Fig. 3 is a schematic flowchart of a tuning scheme for the binary classification model provided by an embodiment of this specification;
Fig. 4a is a schematic diagram of the standard ROC curve corresponding to binary classification model 1 and binary classification model 2 in a practical application scenario provided by an embodiment of this specification;
Fig. 4b is a schematic diagram of the adjusted ROC curve corresponding to binary classification model 1 in a practical application scenario provided by an embodiment of this specification;
Fig. 4c is a schematic diagram of the adjusted ROC curve corresponding to binary classification model 2 in a practical application scenario provided by an embodiment of this specification;
Fig. 5 is a schematic structural diagram of a data processing apparatus for a binary classification model, corresponding to Fig. 2, provided by an embodiment of this specification.
Detailed description
The embodiments of this specification provide a data processing method, apparatus, and electronic device for a binary classification model.
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this specification without creative effort shall fall within the scope of protection of this application.
Fig. 1 is a schematic diagram of an overall architecture involved in the scheme of this specification in a practical application scenario. The architecture mainly involves two parts: the device on which the binary classification model and the training data reside, and the weighted AUC computing device. The weighted AUC computing device obtains the binary classification model and the training data, adjusts the ROC curve corresponding to the model according to the training data, and calculates the weighted AUC as the basis for evaluating the model. In practice, the device holding the model and the training data and the weighted AUC computing device may also be the same device, which reduces the amount of data transmitted over the network.
Based on the above overall architecture, the scheme of this specification is described in detail below.
Fig. 2 is a schematic flowchart of a data processing method for a binary classification model provided by an embodiment of this specification. The flow may be executed by, among others, the following devices acting as a server or a terminal: personal computers, medium-sized computers, computer clusters, mobile phones, tablet computers, smart wearable devices, in-vehicle devices, and so on.
The flow in Fig. 2 may include the following steps:
S202: Obtain a binary classification model and training data containing multiple samples. The binary classification model is used to calculate a score for each sample in order to determine whether the sample is positive or negative, and the determination serves as the basis for deciding whether to perform a specified event for the sample.
In the embodiments of this specification, in general, when a sample is determined to be a positive sample, the decision may be to perform the specified event for the sample; conversely, when a sample is determined to be a negative sample, the decision may be not to perform the specified event for the sample.
It should be noted that the positive/negative determination may serve only as a reference; whether the specified event is actually performed for a sample does not necessarily depend on that determination. Before the binary classification model is trained, whether the specified event was performed for a sample may already be known.
S204: Estimate the density function of a specified indicator of the samples as the ROC curve adjustment factor, where the specified indicator reflects the execution effect of the specified event.
S206: Obtain the adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor.
A standard ROC curve is drawn from multiple samples that all carry the same weight. In practice, however, the downstream effect of classifying different samples correctly or incorrectly (for example, the execution effect of the specified event) may differ. When evaluating the binary classification model, the difference in weight between samples can therefore also be taken into account, which helps evaluate the model more accurately.
Following this idea, in the embodiments of this specification, a parameter of the sample that relates to this downstream effect can be selected as the specified indicator. Different samples are given possibly different weights according to the specified indicator, the standard ROC curve is adjusted based on those weights, and the binary classification model is evaluated according to the adjusted ROC curve.
For example, the density function of the specified indicator of the samples can be estimated and used as the ROC curve adjustment factor; alternatively, the specific value of the specified indicator can be used directly as the ROC curve adjustment factor. The advantage of the former is that it further takes into account how likely each specific value of the specified indicator is to occur across samples, which helps make the samples more representative. The advantage of the latter is that it requires less computation and costs less to implement. The embodiments of this specification are described mainly in terms of the former.
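The first option can be sketched as follows. The specification does not say which density estimator to use, so this sketch assumes a Gaussian kernel density estimate with a fixed bandwidth; the name `gaussian_kde_factor`, the bandwidth value, and the indicator values are all hypothetical.

```python
import math


def gaussian_kde_factor(values, bandwidth=0.1):
    """Build f(p), a Gaussian kernel density estimate over `values`.

    The estimator and the fixed bandwidth are assumptions made for this
    sketch; the source only asks for "a density function".
    """
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def f(p):
        return norm * sum(
            math.exp(-0.5 * ((p - v) / bandwidth) ** 2) for v in values
        )

    return f


# Hypothetical values of the specified indicator, one per sample.
indicator = [0.9, 0.85, 0.8, 0.3, 0.2]
f = gaussian_kde_factor(indicator)

# The estimated density is higher where indicator values cluster.
print(f(0.85) > f(0.5))
```

The returned function `f` plays the role of the ROC curve adjustment factor; the second option would simply use the indicator value itself instead of `f(p)`.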
S208: Determine the weighted AUC corresponding to the binary classification model according to the adjusted ROC curve.
The area under a standard ROC curve is called the AUC. Similarly, for ease of description, the area under the adjusted ROC curve is called the weighted AUC in the embodiments of this specification. In general, the larger the weighted AUC, the better the corresponding binary classification model is considered to be.
With the method of Fig. 2, the ROC curve corresponding to a binary classification model is adjusted based on the density function of the specified indicator of the samples corresponding to the model, and the weighted AUC is determined after the adjustment. The model can be evaluated more accurately based on the weighted AUC, so the above technical problem can be solved in part or in full.
Based on the method of Fig. 2, the embodiments of this specification also provide some specific implementations and extensions of the method, which are described below.
As mentioned above, the weighted AUC can be used to evaluate a binary classification model. The parameters of the model can then be adjusted according to the evaluation result to obtain a better binary classification model that meets expectations.
Specifically, for the flow in Fig. 2, the parameters of the binary classification model can also be adjusted one or more times. For the model after each parameter adjustment, the adjusted ROC curve corresponding to that model is obtained according to the model, the training data, and the ROC curve adjustment factor, and the corresponding weighted AUC is determined. A binary classification model that meets expectations is then selected according to the weighted AUCs determined each time.
For example, a grid search can be used to exhaustively search the parameter space for the parameters that maximize the weighted AUC; the binary classification model corresponding to those parameters is the selection result.
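A minimal sketch of such an exhaustive search is given below. The names `grid_search` and `toy_weighted_auc` and the parameter names `a` and `b` are hypothetical; in a real setting the evaluation callable would configure the model with the given parameters and return its weighted AUC, which the toy function only imitates.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Try every parameter combination and keep the best-scoring one.

    param_grid: dict mapping parameter name -> list of candidate values.
    evaluate:   callable taking a dict of parameters and returning the
                weighted AUC of the model configured with them.
    """
    names = sorted(param_grid)
    best_params, best_value = None, float("-inf")
    for combo in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        value = evaluate(params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value


# Toy stand-in for "configure the model, then compute its weighted AUC".
def toy_weighted_auc(params):
    return 0.5 - (params["a"] - 2) ** 2 * 0.01 - (params["b"] - 0.1) ** 2


best, value = grid_search(
    {"a": [1, 2, 3], "b": [0.01, 0.1, 1.0]}, toy_weighted_auc
)
print(best, value)
```

The cost grows with the product of the candidate counts, which is the usual trade-off of an exhaustive search.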
In line with the above, an embodiment of this specification provides a schematic flowchart of a tuning scheme for a binary classification model, as shown in Fig. 3.
The flow in Fig. 3 mainly includes the following steps:
Use the binary classification model to calculate the score of each sample in the training data, and estimate the density function of the specified indicator of the samples;
According to the scores and the density function, adjust the parameters of the binary classification model to find the parameters that maximize the weighted AUC corresponding to the model, and output the binary classification model corresponding to those parameters.
Next, the way the adjusted ROC curve is obtained is described. In general, a standard ROC curve can be drawn in the ROC curve coordinate system by moving, sample by sample, in the direction of the horizontal axis and/or the vertical axis in the corresponding movement mode, where the horizontal axis represents the false positive rate (False Positive Rate, FPR) and the vertical axis represents the true positive rate (True Positive Rate, TPR). In the scheme of this specification, because an ROC curve adjustment factor is introduced, the movement mode is adjusted accordingly, based on the ROC curve adjustment factor, when the adjusted ROC curve is drawn.
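As a baseline, the standard (unadjusted) drawing procedure can be sketched in plain Python as below. The function names `roc_points` and `auc` and the four-sample data are illustrative assumptions; tied scores are not handled, for brevity.

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points of a standard ROC curve.

    scores: model scores, higher means more likely positive.
    labels: 1 for an actual positive sample, 0 for an actual negative one.
    Assumes at least one positive and one negative sample, no tied scores.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    x, y = 0.0, 0.0
    points = [(x, y)]
    for i in order:
        if labels[i] == 1:
            y += 1.0 / n_pos   # move up: every positive has equal weight
        else:
            x += 1.0 / n_neg   # move right: every negative has equal weight
        points.append((x, y))
    return points


def auc(points):
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(auc(roc_points(scores, labels)))
```

Each sample contributes exactly one move, upward for an actual positive and rightward for an actual negative; this equal-step movement is what the adjustment factor later modifies.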
Specifically, for step S206, obtaining the adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor may include:
Obtaining the score, calculated by the binary classification model, of each sample contained in the training data; obtaining the actual result corresponding to the determination, where the actual result indicates whether the sample is actually a positive sample or a negative sample; and obtaining the adjusted ROC curve corresponding to the binary classification model according to the scores, the actual results, and the ROC curve adjustment factor.
Further, obtaining the adjusted ROC curve corresponding to the binary classification model according to the scores, the actual results, and the ROC curve adjustment factor may specifically include:
Sorting the samples contained in the training data according to their scores;
Drawing the adjusted ROC curve corresponding to the binary classification model by moving from the start point to the end point in the ROC curve coordinate system, one move per sample, according to the sort order, the actual results, and the ROC curve adjustment factor. In general, a higher score indicates that the corresponding sample is more likely to be positive; the samples can be sorted by score from high to low, and the moves are made for each sample in turn following the sorted order;
During the movement, the ROC curve adjustment factor is used to adjust the movement mode in the horizontal-axis direction and/or the vertical-axis direction.
Take adjusting the movement mode in the vertical-axis direction as an example. The adjustment to the movement mode is made relative to the movement mode used when drawing a standard ROC curve;
Using the ROC curve adjustment factor to adjust the movement mode in the horizontal-axis direction and/or the vertical-axis direction may specifically include:
Keeping the movement mode in the horizontal-axis direction the same as the movement mode in the horizontal-axis direction when drawing a standard ROC curve; and, for each sample determined to be a positive sample according to the actual result, using the ROC curve adjustment factor to adjust the current movement mode in the vertical-axis direction.
The movement mode in the vertical-axis direction can be adjusted in several ways; for example, the movement speed or the movement distance can be adjusted.
For example, using the ROC curve adjustment factor to adjust the current movement mode in the vertical-axis direction may specifically include:
For the sample determined to be a positive sample, adjusting the current movement mode in the vertical-axis direction so that the move in the vertical-axis direction covers a specified distance, where the specified distance is calculated using the ROC curve adjustment factor.
There can also be several formulas for calculating the specified distance. For example, the specified distance may be: [formula missing from the source text]. As another example, the specified distance may be: [formula missing from the source text]; and so on;
Here, p denotes the specified indicator of the sample determined to be a positive sample; f(p) denotes the ROC curve adjustment factor corresponding to p (substituting the value of the sample's specified indicator p into f(p) gives the probability that this value occurs); n_p denotes the actual number of positive samples among the samples; and λ1 and λ2 are adjustable coefficients.
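The adjusted drawing procedure can be sketched as follows. Because the distance formulas themselves do not survive in this text, the sketch leaves the vertical distance as a caller-supplied function. The function names, the example distance `by_indicator` (the standard step 1/n_p scaled by the indicator value), and the four-sample data are hypothetical and are not the formula or the data of the specification.

```python
def adjusted_roc_points(scores, labels, indicator, distance):
    """Draw an adjusted ROC curve by making one move per sample.

    scores:    model scores, higher means more likely positive.
    labels:    actual results, 1 for positive and 0 for negative.
    indicator: value p of the specified indicator for each sample.
    distance:  hypothetical callable (p, n_pos) -> vertical move for an
               actual positive sample; NOT the formula from the source.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    x, y = 0.0, 0.0
    points = [(x, y)]
    for i in order:
        if labels[i] == 1:
            y += distance(indicator[i], n_pos)  # adjusted vertical move
        else:
            x += 1.0 / n_neg                    # same as the standard curve
        points.append((x, y))
    return points


def area_under(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Assumed distance: scale the standard step 1/n_pos by the indicator value.
def by_indicator(p, n_pos):
    return p / n_pos


labels = [1, 0, 1, 0]
indicator = [0.8, 0.5, 0.4, 0.5]
model_1 = [0.9, 0.8, 0.7, 0.6]   # ranks the high-indicator positive first
model_2 = [0.7, 0.8, 0.9, 0.6]   # ranks the low-indicator positive first

auc_1 = area_under(adjusted_roc_points(model_1, labels, indicator, by_indicator))
auc_2 = area_under(adjusted_roc_points(model_2, labels, indicator, by_indicator))
print(auc_1 > auc_2)
```

With the standard step `1.0 / n_pos` as the distance, both toy models produce the same curve and the same area; with the assumed indicator-scaled distance, the model that ranks the high-indicator positive sample first obtains the larger area.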
In the embodiments of this specification, besides the above scheme of adjusting the movement mode, other schemes for drawing the adjusted ROC curve can also be used. For example, the ROC curve coordinate point [FPR_i, TPR_i] corresponding to each sample i can be computed first, each coordinate point can then be adjusted using the ROC curve adjustment factor (for example, by adjusting TPR_i), and the adjusted ROC curve is obtained by connecting adjacent coordinate points with line segments.
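A rough sketch of this coordinate-based alternative is shown below. The text does not say how each point is adjusted, so scaling TPR_i by a per-sample factor is purely an assumption of this sketch, as are the names `standard_coordinates` and `adjust_and_connect`.

```python
def standard_coordinates(scores, labels):
    """Return (i, FPR_i, TPR_i) for each sample i, in descending-score order."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    coords = []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        coords.append((i, fp / n_neg, tp / n_pos))
    return coords


def adjust_and_connect(coords, factor):
    """Scale each TPR_i by factor[i] (an assumed adjustment), then connect
    the adjusted points, starting at the origin, with line segments and
    return the points together with the area under them."""
    points = [(0.0, 0.0)]
    points += [(fpr, tpr * factor[i]) for i, fpr, tpr in coords]
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, area


coords = standard_coordinates([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
_, unadjusted_area = adjust_and_connect(coords, [1.0, 1.0, 1.0, 1.0])
_, halved_area = adjust_and_connect(coords, [0.5, 0.5, 0.5, 0.5])
print(unadjusted_area, halved_area)
```

With every factor equal to 1 the standard curve is recovered, which gives a simple sanity check for whatever adjustment is actually chosen.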
To make the above easier to understand, the implementation and effect of the method are illustrated below with an example from a practical application scenario.
The scenario is international chargebacks (refused payments), specifically the question of requesting payment again (re-presentment). A binary classification model can be used to classify cases in order to identify the cases for which payment can be requested again. Following the idea above, it is necessary to consider not only which chargeback cases payment should be requested for again, but also indicators such as the success rate of requesting payment again and the case amount.
For example, even if the binary classification model correctly identifies the cases for which payment can be requested again, if the success rate of requesting payment again for those cases is very low, the resources spent on the requests are still wasted.
As another example, correctly identifying ten 1-dollar cases (with a success rate of 0.9 for requesting payment again) is worth less than correctly identifying one 100-dollar case (with a success rate of 0.8 for requesting payment again).
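One way to read this comparison is in terms of expected recovered amount, which is an interpretation made here rather than a calculation stated in the text. Under that assumption, and with the hypothetical helper `expected_recovery`, the arithmetic is:

```python
def expected_recovery(count, amount, success_rate):
    """Expected amount recovered from `count` cases of equal `amount`,
    assuming each request succeeds independently with `success_rate`."""
    return count * amount * success_rate


small_cases = expected_recovery(10, 1.0, 0.9)   # ten 1-dollar cases
large_case = expected_recovery(1, 100.0, 0.8)   # one 100-dollar case
print(small_cases, large_case)
```

The ten small cases are expected to recover about 9 dollars in total, while the single large case is expected to recover about 80 dollars, even though its success rate is lower.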
When the above method is applied in this scenario, the specified event is requesting payment (more specifically, requesting payment again), the specified indicator can be, for example, the payment request success rate (more specifically, the success rate of requesting payment again), and the samples are the cases. If the payment request succeeds, the corresponding sample is actually a positive sample; otherwise, the corresponding sample is actually a negative sample.
Assume 8 samples are used, with the related data shown in Table 1 below:
Table 1
From the data in Table 1, the standard ROC curves corresponding to the binary classification models can be drawn. The standard ROC curves corresponding to binary classification model 1 and binary classification model 2 are identical, as shown in Fig. 4a.
The AUC of the ROC curve in Fig. 4a is 0.75, which makes it hard to distinguish between binary classification model 1 and binary classification model 2 in terms of the scores they calculate for sample 1 and sample 3.
With the scheme of this specification, the ROC curve is adjusted based on the payment request success rate and the weighted AUC is obtained. The adjusted ROC curves distinguish better between the scores that binary classification model 1 and binary classification model 2 calculate for sample 1 and sample 3.
Fig. 4b shows the adjusted ROC curve corresponding to binary classification model 1, and Fig. 4c shows the adjusted ROC curve corresponding to binary classification model 2.
From the figures, the weighted AUC corresponding to binary classification model 1 is calculated to be 0.475, and the weighted AUC corresponding to binary classification model 2 is 0.43125. With the weighted AUC as the evaluation criterion, binary classification model 1 is better than binary classification model 2. This is also easy to understand from the perspective of the payment request success rate, because binary classification model 1 tends to give relatively high scores to samples with a high payment request success rate.
Based on the same idea, the embodiments of this specification also provide a corresponding apparatus, as shown in Fig. 5.
Fig. 5 is a schematic structural diagram of a data processing apparatus for a binary classification model, corresponding to Fig. 2, provided by an embodiment of this specification. Dashed boxes denote optional modules. The apparatus may reside on the executing entity of the flow in Fig. 2 and includes:
A first acquisition module 501, which obtains a binary classification model and training data containing multiple samples, where the binary classification model is used to calculate a score for each sample in order to determine whether the sample is positive or negative, and the determination serves as the basis for deciding whether to perform a specified event for the sample;
An estimation module 502, which estimates the density function of a specified indicator of the samples as a receiver operating characteristic (ROC) curve adjustment factor, where the specified indicator reflects the execution effect of the specified event;
A second acquisition module 503, which obtains the adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor;
A determination module 504, which determines, according to the adjusted ROC curve, the weighted area under the ROC curve (weighted AUC) corresponding to the binary classification model.
Optionally, the apparatus further includes:
A screening module 505, which adjusts the parameters of the binary classification model one or more times and, for the model after each parameter adjustment, obtains the adjusted ROC curve corresponding to that model according to the model, the training data, and the ROC curve adjustment factor, and determines the corresponding weighted AUC;
and selects a binary classification model that meets expectations according to the weighted AUCs determined each time.
Optionally, the second acquisition module 503 obtaining the adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor specifically includes:
The second acquisition module 503 obtains the score, calculated by the binary classification model, of each sample contained in the training data;
Obtains the actual result corresponding to the determination, where the actual result indicates whether the sample is actually a positive sample or a negative sample;
Obtains the adjusted ROC curve corresponding to the binary classification model according to the scores, the actual results, and the ROC curve adjustment factor.
Optionally, the second acquisition module 503 obtaining the adjusted ROC curve corresponding to the binary classification model according to the scores, the actual results, and the ROC curve adjustment factor specifically includes:
The second acquisition module 503 sorts the samples contained in the training data according to their scores;
Draws the adjusted ROC curve corresponding to the binary classification model by moving from the start point to the end point in the ROC curve coordinate system, one move per sample, according to the sort order, the actual results, and the ROC curve adjustment factor;
Wherein the horizontal axis of the ROC curve coordinate system represents FPR and the vertical axis represents TPR; during the movement, the ROC curve adjustment factor is used to adjust the movement mode in the horizontal-axis direction and/or the vertical-axis direction.
Optionally, the adjustment made by the second acquisition module 503 to the movement mode is made relative to the movement mode used when drawing a standard ROC curve;
The second acquisition module 503 using the ROC curve adjustment factor to adjust the movement mode in the horizontal-axis direction and/or the vertical-axis direction specifically includes:
The second acquisition module 503 keeps the movement mode in the horizontal-axis direction the same as the movement mode in the horizontal-axis direction when drawing a standard ROC curve;
For each sample determined to be a positive sample according to the actual result, uses the ROC curve adjustment factor to adjust the current movement mode in the vertical-axis direction.
Optionally, the second acquisition module 503 using the ROC curve adjustment factor to adjust the current movement mode in the vertical-axis direction specifically includes:
For the sample determined to be a positive sample, the second acquisition module 503 adjusts the current movement mode in the vertical-axis direction so that the move in the vertical-axis direction covers a specified distance, where the specified distance is calculated using the ROC curve adjustment factor.
Optionally, the specified distance is: [formula missing from the source text]
Here, p denotes the specified indicator of the sample determined to be a positive sample, f(p) denotes the ROC curve adjustment factor corresponding to p, and n_p denotes the actual number of positive samples among the samples.
Optionally, the specified event includes requesting payment, and the specified indicator includes the payment request success rate;
If the payment request succeeds, the corresponding sample is actually a positive sample; otherwise, the corresponding sample is actually a negative sample.
Based on the same idea, an embodiment of this specification further provides a corresponding electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
obtain a binary classification model and training data including a plurality of samples, where the binary classification model is used to calculate a score for each sample in order to classify the sample as positive or negative, and the classification result serves as the basis for deciding whether to execute a specified event for the sample;
estimate a density function of a specified indicator of the samples as an ROC curve adjustment factor, where the specified indicator reflects the execution effect of the specified event;
obtain an adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor; and
determine, according to the adjusted ROC curve, a weighted AUC corresponding to the binary classification model.
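The density-estimation step can be illustrated with a small sketch. The text here does not say how the density function is estimated, so a Gaussian kernel density estimate is assumed purely as one possible choice; the function name `gaussian_kde_factor` and the `bandwidth` parameter are hypothetical.

```python
import math
from typing import Callable, Sequence


def gaussian_kde_factor(values: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Build f(p), a Gaussian kernel density estimate of the specified indicator.

    values    -- observed values of the specified indicator, one per sample
    bandwidth -- kernel width (assumed tuning parameter, not from the source)
    """
    if len(values) == 0 or bandwidth <= 0:
        raise ValueError("need at least one value and a positive bandwidth")
    # Normalising constant so that the estimate integrates to 1.
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(p: float) -> float:
        return norm * sum(math.exp(-0.5 * ((p - v) / bandwidth) ** 2) for v in values)

    return density
```

The returned callable can then be evaluated at each sample's indicator value to obtain that sample's adjustment factor.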
Based on the same idea, an embodiment of this specification further provides a corresponding non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured as:
a first acquisition module, which obtains a binary classification model and training data including a plurality of samples, where the binary classification model is used to calculate a score for each sample in order to classify the sample as positive or negative, and the classification result serves as the basis for deciding whether to execute a specified event for the sample;
an estimation module, which estimates a density function of a specified indicator of the samples as an ROC curve adjustment factor, where the specified indicator reflects the execution effect of the specified event;
a second acquisition module, which obtains an adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor; and
a determining module, which determines, according to the adjusted ROC curve, a weighted AUC corresponding to the binary classification model.
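Once the vertices of an adjusted ROC curve are available, the weighted AUC can be read as the area under that curve. The following is a minimal sketch, assuming the vertices are given as (FPR, TPR) pairs ordered along the curve and using the trapezoidal rule; the function name `area_under_curve` is hypothetical, and the source does not prescribe an integration method.

```python
from typing import Sequence, Tuple


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a piecewise-linear curve of (FPR, TPR) vertices."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Vertical segments have x1 == x0 and contribute nothing.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For a staircase-shaped curve the trapezoidal rule is exact, since every segment is either vertical or horizontal.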
The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, and non-volatile computer storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiment; for the relevant parts, refer to the description of the method embodiment.
The apparatus, electronic device, and non-volatile computer storage medium provided in the embodiments of this specification correspond to the method, and therefore have beneficial technical effects similar to those of the corresponding method. Because the beneficial technical effects of the method have been described in detail above, those of the corresponding apparatus, electronic device, and non-volatile computer storage medium are not repeated here.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). However, as technology has developed, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly carried out with "logic compiler" software, which is similar to the software compiler used in program development. The source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. A person skilled in the art should also understand that a hardware circuit implementing a logical method flow can be readily obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, a controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. A person skilled in the art also knows that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, an ASIC, a programmable logic controller, an embedded microcontroller, and the like. Such a controller can therefore be regarded as a hardware component, and the apparatuses it contains for implementing various functions can be regarded as structures within the hardware component. Alternatively, the apparatuses for implementing various functions can be regarded both as software modules that implement the method and as structures within the hardware component.
The systems, apparatuses, modules, or units described in the foregoing embodiments can be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For ease of description, the above apparatus is described in terms of units divided by function. Of course, when this specification is implemented, the functions of the units can be implemented in one or more pieces of software and/or hardware.
A person skilled in the art should understand that the embodiments of this specification can be provided as a method, a system, or a computer program product. Accordingly, the embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This specification is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this specification. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, as well as combinations of flows and/or blocks in the flowcharts and/or block diagrams. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, product, or device. Without further limitation, an element preceded by "includes a ..." does not exclude the existence of additional identical elements in the process, method, product, or device that includes the element.
This specification can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for the relevant parts, refer to the description of the method embodiment.
The foregoing is merely an embodiment of this specification and is not intended to limit this application. For a person skilled in the art, this application can have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (17)

1. A data processing method for a binary classification model, comprising:
obtaining a binary classification model and training data comprising a plurality of samples, wherein the binary classification model is used to calculate a score for each sample in order to classify the sample as positive or negative, and the classification result serves as the basis for deciding whether to execute a specified event for the sample;
estimating a density function of a specified indicator of the samples as a receiver operating characteristic (ROC) curve adjustment factor, wherein the specified indicator reflects the execution effect of the specified event;
obtaining an adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor; and
determining, according to the adjusted ROC curve, a weighted area under the ROC curve (AUC) corresponding to the binary classification model.
2. The method according to claim 1, further comprising:
performing one or more parameter adjustments on the binary classification model and, for the binary classification model after each parameter adjustment, obtaining an adjusted ROC curve corresponding to that binary classification model according to that binary classification model, the training data, and the ROC curve adjustment factor, and determining the corresponding weighted AUC; and
selecting, according to each determined weighted AUC, a binary classification model that meets expectations.
3. The method according to claim 1, wherein obtaining the adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor specifically comprises:
obtaining the score, calculated by the binary classification model, of each sample included in the training data;
obtaining a ground-truth result corresponding to the classification result, wherein the ground-truth result indicates whether the sample is actually a positive sample or a negative sample; and
obtaining the adjusted ROC curve corresponding to the binary classification model according to the scores, the ground-truth results, and the ROC curve adjustment factor.
4. The method according to claim 3, wherein obtaining the adjusted ROC curve corresponding to the binary classification model according to the scores, the ground-truth results, and the ROC curve adjustment factor specifically comprises:
sorting the samples included in the training data according to the scores; and
drawing the adjusted ROC curve corresponding to the binary classification model by moving, in an ROC curve coordinate system, from a start point to an end point with one move per sample, according to the sorted order, the ground-truth results, and the ROC curve adjustment factor;
wherein the abscissa axis of the ROC curve coordinate system represents the false positive rate (FPR) and the ordinate axis represents the true positive rate (TPR), and during the movement the ROC curve adjustment factor is used to adjust the manner of movement in the abscissa direction and/or the ordinate direction.
5. The method according to claim 4, wherein the adjustment of the manner of movement takes as its baseline the manner of movement used when drawing a standard ROC curve; and
using the ROC curve adjustment factor to adjust the manner of movement in the abscissa direction and/or the ordinate direction specifically comprises:
keeping the manner of movement in the abscissa direction consistent with the manner of movement in the abscissa direction used when drawing a standard ROC curve; and
for each sample that is determined, according to the ground-truth result, to be a positive sample, using the ROC curve adjustment factor to adjust the current manner of movement in the ordinate direction.
6. The method according to claim 5, wherein using the ROC curve adjustment factor to adjust the current manner of movement in the ordinate direction specifically comprises:
for the sample determined to be a positive sample, adjusting the current manner of movement in the ordinate direction so as to move a specified distance in the ordinate direction, the specified distance being calculated using the ROC curve adjustment factor.
7. The method according to claim 6, wherein the specified distance is:
where p denotes the specified indicator of the sample determined to be a positive sample, f(p) denotes the ROC curve adjustment factor corresponding to p, and n_p denotes the actual number of positive samples among the samples.
8. The method according to any one of claims 1 to 7, wherein the specified event comprises a payment request, and the specified indicator comprises a payment request success rate; and
if the payment request succeeds, the corresponding sample is actually a positive sample; otherwise, the corresponding sample is actually a negative sample.
9. A data processing apparatus for a binary classification model, comprising:
a first acquisition module, which obtains a binary classification model and training data comprising a plurality of samples, wherein the binary classification model is used to calculate a score for each sample in order to classify the sample as positive or negative, and the classification result serves as the basis for deciding whether to execute a specified event for the sample;
an estimation module, which estimates a density function of a specified indicator of the samples as a receiver operating characteristic (ROC) curve adjustment factor, wherein the specified indicator reflects the execution effect of the specified event;
a second acquisition module, which obtains an adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor; and
a determining module, which determines, according to the adjusted ROC curve, a weighted area under the ROC curve (AUC) corresponding to the binary classification model.
10. The apparatus according to claim 9, further comprising:
a screening module, which performs one or more parameter adjustments on the binary classification model and, for the binary classification model after each parameter adjustment, obtains an adjusted ROC curve corresponding to that binary classification model according to that binary classification model, the training data, and the ROC curve adjustment factor, and determines the corresponding weighted AUC; and
selects, according to each determined weighted AUC, a binary classification model that meets expectations.
11. The apparatus according to claim 9, wherein the second acquisition module obtaining the adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor specifically comprises:
the second acquisition module obtaining the score, calculated by the binary classification model, of each sample included in the training data;
obtaining a ground-truth result corresponding to the classification result, wherein the ground-truth result indicates whether the sample is actually a positive sample or a negative sample; and
obtaining the adjusted ROC curve corresponding to the binary classification model according to the scores, the ground-truth results, and the ROC curve adjustment factor.
12. The apparatus according to claim 11, wherein the second acquisition module obtaining the adjusted ROC curve corresponding to the binary classification model according to the scores, the ground-truth results, and the ROC curve adjustment factor specifically comprises:
the second acquisition module sorting the samples included in the training data according to the scores; and
drawing the adjusted ROC curve corresponding to the binary classification model by moving, in an ROC curve coordinate system, from a start point to an end point with one move per sample, according to the sorted order, the ground-truth results, and the ROC curve adjustment factor;
wherein the abscissa axis of the ROC curve coordinate system represents the false positive rate (FPR) and the ordinate axis represents the true positive rate (TPR), and during the movement the ROC curve adjustment factor is used to adjust the manner of movement in the abscissa direction and/or the ordinate direction.
13. The apparatus according to claim 12, wherein the adjustment that the second acquisition module makes to the manner of movement takes as its baseline the manner of movement used when drawing a standard ROC curve; and
the second acquisition module using the ROC curve adjustment factor to adjust the manner of movement in the abscissa direction and/or the ordinate direction specifically comprises:
the second acquisition module keeping the manner of movement in the abscissa direction consistent with the manner of movement in the abscissa direction used when drawing a standard ROC curve; and
for each sample that is determined, according to the ground-truth result, to be a positive sample, using the ROC curve adjustment factor to adjust the current manner of movement in the ordinate direction.
14. The apparatus according to claim 13, wherein the second acquisition module using the ROC curve adjustment factor to adjust the current manner of movement in the ordinate direction specifically comprises:
for the sample determined to be a positive sample, the second acquisition module adjusting the current manner of movement in the ordinate direction so as to move a specified distance in the ordinate direction, the specified distance being calculated using the ROC curve adjustment factor.
15. The apparatus according to claim 14, wherein the specified distance is:
where p denotes the specified indicator of the sample determined to be a positive sample, f(p) denotes the ROC curve adjustment factor corresponding to p, and n_p denotes the actual number of positive samples among the samples.
16. The apparatus according to any one of claims 9 to 15, wherein the specified event comprises a payment request, and the specified indicator comprises a payment request success rate; and
if the payment request succeeds, the corresponding sample is actually a positive sample; otherwise, the corresponding sample is actually a negative sample.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
obtain a binary classification model and training data comprising a plurality of samples, wherein the binary classification model is used to calculate a score for each sample in order to classify the sample as positive or negative, and the classification result serves as the basis for deciding whether to execute a specified event for the sample;
estimate a density function of a specified indicator of the samples as a receiver operating characteristic (ROC) curve adjustment factor, wherein the specified indicator reflects the execution effect of the specified event;
obtain an adjusted ROC curve corresponding to the binary classification model according to the binary classification model, the training data, and the ROC curve adjustment factor; and
determine, according to the adjusted ROC curve, a weighted area under the ROC curve (AUC) corresponding to the binary classification model.
CN201710733129.7A 2017-08-24 2017-08-24 A kind of data processing method for being directed to two disaggregated models, device and electronic equipment Pending CN107679547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710733129.7A CN107679547A (en) 2017-08-24 2017-08-24 A kind of data processing method for being directed to two disaggregated models, device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710733129.7A CN107679547A (en) 2017-08-24 2017-08-24 A kind of data processing method for being directed to two disaggregated models, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN107679547A true CN107679547A (en) 2018-02-09

Family

ID=61134244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710733129.7A Pending CN107679547A (en) 2017-08-24 2017-08-24 A kind of data processing method for being directed to two disaggregated models, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN107679547A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447125A (en) * 2018-09-28 2019-03-08 北京达佳互联信息技术有限公司 Processing method, device, electronic equipment and the storage medium of disaggregated model
CN109598304A (en) * 2018-12-04 2019-04-09 北京字节跳动网络技术有限公司 Disaggregated model calibration method, device, equipment and readable medium
CN109598304B (en) * 2018-12-04 2019-11-08 北京字节跳动网络技术有限公司 Disaggregated model calibration method, device, equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201019

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201019

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right
RJ01 Rejection of invention patent application after publication

Application publication date: 20180209

RJ01 Rejection of invention patent application after publication