CN109615020A - Characteristic analysis method, device, equipment and medium based on machine learning model - Google Patents
- Publication number
- CN109615020A (application number CN201811588694.XA)
- Authority
- CN
- China
- Prior art keywords
- sample
- training
- model
- characteristic analysis
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a feature analysis method, device, equipment and medium based on a machine learning model. The method comprises: determining a second training sample set based on an acquired target sample and a first training sample set, where the target sample has a preset sample class determined by a classification model and the classification model is trained on the first training sample set; training a feature analysis model according to preset training rules and the second training sample set; inputting a prediction sample into the classification model to obtain the sample class of the prediction sample; and, when the sample class of the prediction sample is detected to be the same as the preset sample class, inputting the prediction sample into the feature analysis model to obtain the feature analysis result of the prediction sample. When business classification is performed with a machine learning model, the invention enables feature analysis of an individual sample without changing the classification model algorithm.
Description
Technical field
The present invention relates to the field of machine learning technology, and in particular to a feature analysis method, device, equipment and medium based on a machine learning model.
Background art
When business samples are classified with a machine learning model, each business sample has multiple features, and each feature contributes differently to the sample's classification result. The feature importance of a sample characterizes how much each of its features matters to the current decision when the classification model assigns the sample to a particular class.
Among current machine learning models, simple algorithms such as decision trees can reveal the feature importance of an individual sample from the classification result, but their classification performance is poor, so they are rarely used. Models with better classification performance, such as SVM (Support Vector Machine) and neural networks, only let the user learn from the output classification result which class an individual business sample belongs to. The user cannot tell which features of the sample the model mainly relied on to assign it to the current class, that is, the user cannot obtain the feature importance of the individual sample under that decision, unless the algorithm's source code is fully opened and modified, which requires very deep algorithmic knowledge.
Summary of the invention
The main purpose of the present invention is to provide a feature analysis method, device, equipment and medium based on a machine learning model, aiming to realize feature analysis of an individual sample without changing the classification model algorithm. The user can then learn not only the classification result of a business sample but also the feature importance of the business sample under that classification result, which helps the user make better business judgments based on the sample's feature importance.
To achieve the above object, the present invention provides a feature analysis method based on a machine learning model, comprising the following steps:
Determining a second training sample set based on an acquired target sample and a first training sample set, where the target sample has a preset sample class determined by a classification model, and the classification model is trained on the first training sample set;
Training a feature analysis model according to preset training rules and the second training sample set;
Inputting a prediction sample into the classification model to obtain the sample class of the prediction sample;
When the sample class of the prediction sample is detected to be the same as the preset sample class, inputting the prediction sample into the feature analysis model to obtain the feature analysis result of the prediction sample.
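The control flow of the last two steps (classify first, explain only on a class match) can be sketched as follows. This is a minimal sketch, not the patented implementation: `classify_then_explain`, `toy_classifier` and `toy_explainer` are hypothetical names, and the two toy functions only stand in for the trained classification model and the trained feature analysis model.

```python
from typing import Callable, Optional, Sequence, Tuple


def classify_then_explain(
    sample: Sequence[float],
    classify: Callable[[Sequence[float]], str],
    preset_class: str,
    explain: Callable[[Sequence[float]], dict],
) -> Tuple[str, Optional[dict]]:
    """Classify a prediction sample; explain it only if it falls in the preset class."""
    sample_class = classify(sample)        # class from the unchanged classification model
    if sample_class != preset_class:       # different class: no feature analysis
        return sample_class, None
    return sample_class, explain(sample)   # same class: run the feature analysis model


# Hypothetical stand-ins; a real system would call the trained models instead.
def toy_classifier(x: Sequence[float]) -> str:
    return "suspicious" if x[0] + x[1] > 1000 else "normal"


def toy_explainer(x: Sequence[float]) -> dict:
    # Placeholder scores; a real result would come from the feature analysis model.
    return {"amount_in": 0.7, "amount_out": 0.2, "high_risk_txns": 0.1}


print(classify_then_explain([900, 400, 2], toy_classifier, "suspicious", toy_explainer))
print(classify_then_explain([10, 5, 0], toy_classifier, "suspicious", toy_explainer))
```

Because the classifier is passed in as a callable, its algorithm and source code are never touched, which is the property the method aims for.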
Optionally, the second training sample set includes a plurality of second training samples, and the step of determining the second training sample set based on the acquired target sample and the first training sample set includes:
Acquiring the target sample, the first training sample set and a plurality of initial training samples;
Multiplying each initial training sample by the standard deviation of the first training sample set, adding the target sample, and taking the sum as a second training sample;
Determining the second training sample set from the plurality of second training samples thus obtained.
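A minimal NumPy sketch of this sample-generation step follows. It assumes the initial training samples are drawn from a standard normal distribution (the text does not say how they are obtained) and uses the per-feature standard deviation of the first training sample set; `build_second_training_set` and its parameters are hypothetical names.

```python
import numpy as np


def build_second_training_set(target, first_train, n_samples=500, seed=0):
    """Generate samples spread around `target`.

    Each initial sample is multiplied by the per-feature standard deviation of
    the first training set, then the target sample is added. Drawing the
    initial samples from N(0, 1) is an assumption.
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    first_train = np.asarray(first_train, dtype=float)
    std = first_train.std(axis=0)                             # per-feature std of the first set
    initial = rng.standard_normal((n_samples, target.size))   # assumed initial training samples
    return initial * std + target                             # scale, then shift onto the target


first_train = np.array([[100.0, 20.0, 0.0], [5000.0, 4800.0, 3.0], [250.0, 90.0, 1.0]])
target = np.array([4000.0, 3900.0, 2.0])
second = build_second_training_set(target, first_train, n_samples=5)
print(second.shape)  # (5, 3)
```

Scaling by the first set's standard deviation keeps the spread of the generated samples proportional to each feature's natural range, so large-valued features such as transfer amounts are not perturbed on the same scale as small counts.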
Optionally, the step of training the feature analysis model according to the preset training rules and the second training sample set includes:
Calculating the Euclidean distance between the target sample and each second training sample;
Calculating an input weight coefficient for each second training sample according to a preset formula and the Euclidean distance corresponding to that sample;
Obtaining the predicted value output by the classification model for each second training sample;
Performing ridge regression training with the plurality of second training samples, their corresponding input weight coefficients and their predicted values as inputs, and taking the trained result as the feature analysis model.
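These training steps can be sketched with NumPy as a locally weighted ridge regression. The "preset formula" is not given, so this sketch assumes an exponential kernel, w = sqrt(exp(-d² / width²)), in the style of LIME, with the width defaulting to the median distance; `fit_feature_analysis_model`, `kernel_width` and `alpha` are hypothetical names. The `predictions` argument stands for the classification model's outputs on the second training samples.

```python
import numpy as np


def fit_feature_analysis_model(target, second_train, predictions, kernel_width=None, alpha=1.0):
    """Fit a weighted ridge regression around `target`; returns (coef, intercept)."""
    target = np.asarray(target, dtype=float)
    X = np.asarray(second_train, dtype=float)
    y = np.asarray(predictions, dtype=float)

    # Euclidean distance between the target sample and each second training sample.
    dist = np.linalg.norm(X - target, axis=1)

    # Input weight coefficients from an assumed exponential kernel: closer samples
    # get larger weights. Default width: the median distance (1.0 if that is zero).
    if kernel_width is None:
        kernel_width = float(np.median(dist)) or 1.0
    weights = np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))

    # Weighted ridge regression with an unpenalised intercept:
    # minimise sum_i w_i * (y_i - b - x_i . beta)^2 + alpha * ||beta||^2.
    w_sum = weights.sum()
    x_mean = weights @ X / w_sum
    y_mean = weights @ y / w_sum
    Xc, yc = X - x_mean, y - y_mean
    A = Xc.T @ (weights[:, None] * Xc) + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ (weights * yc))
    intercept = y_mean - x_mean @ coef
    return coef, intercept


rng = np.random.default_rng(0)
target = np.array([1.0, 2.0])
X = target + rng.standard_normal((200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0   # stand-in for the classifier's predicted values
coef, intercept = fit_feature_analysis_model(target, X, y, alpha=1e-6)
print(np.round(coef, 3), round(float(intercept), 3))
```

Centering with the weighted means before solving keeps the intercept out of the ridge penalty, so only the feature coefficients, which are later read as importances, are shrunk.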
Optionally, before the step of inputting the prediction sample into the feature analysis model to obtain its feature analysis result when the sample class of the prediction sample is detected to be the same as the preset sample class, the method further includes:
Performing accuracy verification on the feature analysis model based on the second training sample set;
Judging whether the feature analysis model passes the accuracy verification, and if so, proceeding to the step of inputting the prediction sample into the feature analysis model to obtain its feature analysis result when the sample class of the prediction sample is detected to be the same as the preset sample class.
Optionally, the step of performing accuracy verification on the feature analysis model based on the second training sample set includes:
Obtaining, from the feature analysis model, several model features that meet a preset condition;
Training the feature analysis model on those model features to obtain a plurality of first predicted values;
Inputting each of the second training samples in the second training sample set into the classification model to obtain a plurality of second predicted values;
Performing accuracy verification on the feature analysis model according to the plurality of first predicted values and the plurality of second predicted values.
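The verification steps leave both the "preset condition" and the comparison criterion open. The sketch below assumes the condition is "the `top_k` features with the largest absolute coefficients" and the criterion is a weighted R² between the first and second predicted values at or above a hypothetical `min_r2`; `weighted_ridge` and `verify_accuracy` are illustrative names. The classifier's outputs on the second training samples are passed in as an array, so the classification model itself is not modified.

```python
import numpy as np


def weighted_ridge(X, y, weights, alpha=1.0):
    """Weighted ridge regression with an unpenalised intercept; returns (coef, intercept)."""
    w_sum = weights.sum()
    x_mean = weights @ X / w_sum
    y_mean = weights @ y / w_sum
    Xc, yc = X - x_mean, y - y_mean
    A = Xc.T @ (weights[:, None] * Xc) + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ (weights * yc))
    return coef, y_mean - x_mean @ coef


def verify_accuracy(X, second_preds, weights, coef, top_k=2, alpha=1.0, min_r2=0.8):
    """Return (passed, r2, selected_feature_indices)."""
    X = np.asarray(X, dtype=float)
    second_preds = np.asarray(second_preds, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Model features meeting the assumed preset condition: largest |coefficient|.
    selected = np.argsort(-np.abs(coef))[:top_k]
    # Retrain on the selected features only; its outputs are the first predicted values.
    sub_coef, sub_intercept = weighted_ridge(X[:, selected], second_preds, weights, alpha)
    first_preds = X[:, selected] @ sub_coef + sub_intercept
    # Weighted R^2 of the first predicted values against the classifier's (second) ones.
    y_mean = weights @ second_preds / weights.sum()
    ss_res = weights @ (second_preds - first_preds) ** 2
    ss_tot = weights @ (second_preds - y_mean) ** 2
    r2 = 1.0 - ss_res / ss_tot
    return bool(r2 >= min_r2), float(r2), selected


rng = np.random.default_rng(1)
X = rng.standard_normal((300, 3))
second_preds = 4.0 * X[:, 0] + 0.05 * X[:, 1] - 3.0 * X[:, 2]  # stand-in classifier outputs
weights = np.ones(len(X))
coef, _ = weighted_ridge(X, second_preds, weights, alpha=1e-3)
passed, r2, selected = verify_accuracy(X, second_preds, weights, coef, top_k=2, alpha=1e-3)
print(passed, round(r2, 4), selected.tolist())
```

If a few features reproduce the classifier's outputs well near the target, the explanation built on them is considered trustworthy; a low R² suggests the local linear model is a poor surrogate there.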
In addition, the present invention further provides a feature analysis device based on a machine learning model, comprising:
An extraction module, configured to determine a second training sample set based on an acquired target sample and a first training sample set, where the target sample has a preset sample class determined by a classification model, and the classification model is trained on the first training sample set;
A training module, configured to train a feature analysis model according to preset training rules and the second training sample set;
A determination module, configured to input a prediction sample into the classification model to obtain the sample class of the prediction sample;
An analysis module, configured to input the prediction sample into the feature analysis model to obtain its feature analysis result when the sample class of the prediction sample is detected to be the same as the preset sample class.
Optionally, the second training sample set includes a plurality of second training samples, and the extraction module includes:
A first acquisition unit, configured to acquire the target sample, the first training sample set and a plurality of initial training samples;
A processing unit, configured to multiply each initial training sample by the standard deviation of the first training sample set, add the target sample, and take the sum as a second training sample;
A determination unit, configured to determine the second training sample set from the plurality of second training samples thus obtained.
Optionally, the training module includes:
A first calculation unit, configured to calculate the Euclidean distance between the target sample and each second training sample;
A second calculation unit, configured to calculate the input weight coefficient of each second training sample according to a preset formula and the Euclidean distance corresponding to that sample;
A second acquisition unit, configured to obtain the predicted value output by the classification model for each second training sample;
A training unit, configured to perform ridge regression training with the plurality of second training samples, their corresponding input weight coefficients and their predicted values as inputs, and to take the trained result as the feature analysis model.
Optionally, the device further includes:
A verification module, configured to perform accuracy verification on the feature analysis model based on the second training sample set;
A judgment module, configured to judge whether the feature analysis model passes the accuracy verification and, once it does, to send the judgment result "passed" to the analysis module;
The analysis module is further configured to, after receiving the judgment result "passed" from the judgment module, input the prediction sample into the feature analysis model to obtain its feature analysis result when the sample class of the prediction sample is detected to be the same as the preset sample class.
Optionally, the verification module includes:
An extraction unit, configured to obtain, from the feature analysis model, several model features that meet a preset condition;
A third calculation unit, configured to train the feature analysis model on those model features to obtain a plurality of first predicted values;
A fourth calculation unit, configured to input each of the second training samples in the second training sample set into the classification model to obtain a plurality of second predicted values;
A verification unit, configured to perform accuracy verification on the feature analysis model according to the plurality of first predicted values and the plurality of second predicted values.
In addition, the present invention further provides feature analysis equipment based on a machine learning model. The equipment includes a memory, a processor, and a feature analysis program based on a machine learning model that is stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the feature analysis method based on a machine learning model described above.
In addition, the present invention further provides a medium applied to a computer. A feature analysis program based on a machine learning model is stored on the medium; when executed by a processor, the program implements the steps of the feature analysis method based on a machine learning model described above.
The present invention determines a second training sample set based on an acquired target sample and a first training sample set, where the target sample has a preset sample class determined by a classification model trained on the first training sample set; trains a feature analysis model according to preset training rules and the second training sample set; inputs a prediction sample into the classification model to obtain its sample class; and, when the sample class of the prediction sample is detected to be the same as the preset sample class, inputs the prediction sample into the feature analysis model to obtain its feature analysis result. Because the feature analysis model is trained on a second training sample set derived from the target sample and the first training sample set, the important features of an individual prediction sample can be analyzed without modifying the source code of the classification model. This solves the prior-art problem that, when samples are classified with a model that classifies well but has a complex algorithm, the user can only learn from the model's output which class an individual business sample belongs to, but cannot tell which features of the sample the model mainly relied on to assign it to the current class, i.e. cannot obtain the feature importance of the individual sample under that decision. Without modifying the classification model algorithm, and thus without intruding on it, the present invention increases the reference value of the model's classification results and better assists the business in making judgments and taking action based on those results.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the feature analysis method based on a machine learning model according to the present invention;
Fig. 3 is a schematic flowchart of a second embodiment of the feature analysis method based on a machine learning model according to the present invention;
Fig. 4 is a schematic flowchart of a third embodiment of the feature analysis method based on a machine learning model according to the present invention;
Fig. 5 is a schematic diagram of the refined steps of step S310 in Fig. 4.
The realization of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it.
Fig. 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the present invention.
It should be noted that Fig. 1 may be a schematic structural diagram of the hardware operating environment of the sample feature analysis equipment. The sample feature analysis equipment in embodiments of the present invention may be a terminal device such as a PC or a portable computer.
As shown in Fig. 1, the sample feature analysis equipment may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the structure shown in Fig. 1 does not limit the sample feature analysis equipment, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and the feature analysis program based on a machine learning model. The operating system is a program that manages and controls the hardware and software resources of the sample feature analysis equipment and supports the running of the feature analysis program based on a machine learning model and of other software or programs.
In the sample feature analysis equipment shown in Fig. 1, the user interface 1003 is mainly used for data communication with each terminal; the network interface 1004 is mainly used to connect to a background server and communicate data with it; and the processor 1001 can be used to call the feature analysis program based on a machine learning model stored in the memory 1005 and perform the following operations:
Determining a second training sample set based on an acquired target sample and a first training sample set, where the target sample has a preset sample class determined by a classification model, and the classification model is trained on the first training sample set;
Training a feature analysis model according to preset training rules and the second training sample set;
Inputting a prediction sample into the classification model to obtain the sample class of the prediction sample;
When the sample class of the prediction sample is detected to be the same as the preset sample class, inputting the prediction sample into the feature analysis model to obtain the feature analysis result of the prediction sample.
Further, the processor 1001 may also be used to call the feature analysis program based on a machine learning model stored in the memory 1005 and perform the following steps:
Acquiring the target sample, the first training sample set and a plurality of initial training samples;
Multiplying each initial training sample by the standard deviation of the first training sample set, adding the target sample, and taking the sum as a second training sample;
Determining the second training sample set from the plurality of second training samples thus obtained.
Further, the processor 1001 may also be used to call the feature analysis program based on a machine learning model stored in the memory 1005 and perform the following steps:
Calculating the Euclidean distance between the target sample and each second training sample;
Calculating an input weight coefficient for each second training sample according to a preset formula and the Euclidean distance corresponding to that sample;
Obtaining the predicted value output by the classification model for each second training sample;
Performing ridge regression training with the plurality of second training samples, their corresponding input weight coefficients and their predicted values as inputs, and taking the trained result as the feature analysis model.
Further, the processor 1001 may also be used to call the feature analysis program based on a machine learning model stored in the memory 1005 and perform the following steps:
Performing accuracy verification on the feature analysis model based on the second training sample set;
Judging whether the feature analysis model passes the accuracy verification, and if so, proceeding to the step of inputting the prediction sample into the feature analysis model to obtain its feature analysis result when the sample class of the prediction sample is detected to be the same as the preset sample class.
Further, the processor 1001 may also be used to call the feature analysis program based on a machine learning model stored in the memory 1005 and perform the following steps:
Obtaining, from the feature analysis model, several model features that meet a preset condition;
Training the feature analysis model on those model features to obtain a plurality of first predicted values;
Inputting each of the second training samples in the second training sample set into the classification model to obtain a plurality of second predicted values;
Performing accuracy verification on the feature analysis model according to the plurality of first predicted values and the plurality of second predicted values.
Based on the above structure, embodiments of the feature analysis method based on a machine learning model are proposed.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the first embodiment of the feature analysis method based on a machine learning model according to the present invention.
Embodiments of the present invention provide embodiments of the feature analysis method based on a machine learning model. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.
The feature analysis method based on a machine learning model in embodiments of the present invention is applied to feature analysis equipment, which may be a terminal device such as a PC or a portable computer; no specific limitation is imposed here.
The feature analysis method based on a machine learning model in this embodiment includes:
Step S100: determining a second training sample set based on an acquired target sample and a first training sample set, where the target sample has a preset sample class determined by a classification model, and the classification model is trained on the first training sample set.
With the rapid development of big data and machine learning, machine learning models are increasingly widely used for business classification and prediction. When business samples are classified with a machine learning model, each business sample has multiple features, and each feature contributes differently to the sample's classification result. The feature importance of a sample characterizes how much each of its features matters to the current decision when the classification model assigns the sample to a particular class: the more a feature matters to the decision, the more important that feature is.
Common classification models currently include decision trees, logistic regression, SVM, random forests and neural networks. Many of the more complex machine learning classification models are black boxes: the input is a sample's feature values and the output is its classification result, but how the result is derived from those values is unknown. This application allows a complex classification algorithm to be used while realizing interpretability analysis of an individual sample without changing the source code of the classification model. Here, interpretability means explaining how the classification result is derived from the feature values and which features have the greatest influence on it.
Interpretability of current machine learning algorithms falls into two categories. Relatively simple algorithms such as decision trees and logistic regression provide feature importance that can be used directly, whereas for relatively complex algorithms such as SVM and neural networks, the feature importance of an individual sample cannot be obtained. When a financial system uses machine learning models, a relatively simple algorithm such as a decision tree provides the feature importance of an individual sample, but its classification performance is poor, so it is rarely used. An algorithm such as logistic regression provides model-level feature importance (given multiple different samples, the model outputs the probability that each belongs to a certain class) but no feature importance for an individual sample. For a model with a complex algorithm and better classification performance, explaining an individual sample requires fully opening and modifying the algorithm's source code, which demands very deep algorithmic knowledge.
Financial systems place great value on model interpretability when using machine learning. In anti-money laundering, for example, regulators strictly require every money laundering case to be justified at the customer, account and transaction levels, explaining why the customer is suspected of money laundering. A machine learning model such as a neural network, applied in anti-money laundering practice, can only determine whether a customer is suspected of money laundering, not explain why; such a result does little to help business staff analyze and judge whether the customer is actually laundering money.
In this embodiment, the target sample and the first training samples are acquired first, where the target sample is a sample containing specific features (when the method is applied to anti-money laundering identification, the specific features are preferably features of the money laundering class). A second training sample set is determined based on the acquired target sample and the first training sample set. The target sample has a preset sample class determined by a classification model, and the classification model is trained on the first training sample set. Specifically, the classification model includes but is not limited to a decision tree, logistic regression, SVM or a neural network, and the target sample is a sample that the classification model, after classifying it, assigns to a particular sample class of that model. For example, the target sample contains a customer's feature information, such as behavioral features; the classification model is built to classify whether a customer is suspected of money laundering, and after classification the target sample is determined to be suspected of money laundering. Correspondingly, since the classification model classifies whether a customer is suspected of money laundering, the first training sample set contains the feature information of multiple customer samples, including both bad samples suspected of money laundering and good samples not suspected of it. Both the target sample and the first training sample set are organized per customer and may have multiple features describing customer behavior, such as the amount transferred in on the day, the amount transferred out on the day, and the number of transactions occurring in high-risk regions.
In this embodiment, this is equivalent to selecting a customer sample that contains customer feature information (i.e. the target sample above), taking several randomly distributed customer samples and processing them to change their spatial distribution, and finally obtaining a feature set of customer samples that are distributed around the target sample and suspected of money laundering as the second training sample set. Analyzing these multiple money-laundering-suspect customer samples yields the feature importance of customers suspected of money laundering.
Step S200: training a feature analysis model according to preset training rules and the second training sample set.
Importance analysis is performed on the multiple customer features contained in the second training sample set distributed around the target sample. Specifically, a regression algorithm model is selected and the data obtained from the second training sample set is fed into it, producing a linear output expressed in terms of the customer features. The features are then ranked by importance according to each feature's coefficient in that expression, yielding the important features in the feature analysis model, i.e. the importance ranking of the multiple customer features under the same sample class. Further, the regression algorithm model may be a ridge regression model or another regression model.
Step S300: inputting a prediction sample into the classification model to obtain the sample class of the prediction sample.
The prediction sample is input into the classification model built from the first training sample set, which classifies it and yields its classification result (i.e., its sample class).
Step S400: when the sample class of the prediction sample is detected to be the same as the preset sample class, input the prediction sample into the feature analysis model to obtain the feature analysis result of the prediction sample.
It is judged whether the classification result of the prediction sample matches the class of the target sample, i.e., whether the prediction sample also carries money-laundering suspicion. If the classification model judges the prediction sample to be suspicious, the sample is input into the feature analysis model built for users with money-laundering suspicion, and the feature importance of the prediction sample is obtained.
Further, when the classification model determines that both the target sample and the prediction sample carry money-laundering suspicion, the predicted values it outputs for the two samples may be the same or may differ. In this embodiment, when the two predicted values differ, one optional implementation first judges, before inputting the prediction sample into the feature analysis model, whether the difference between the two predicted values is less than a preset threshold. If it is, then the selected target sample and the prediction sample both carry money-laundering suspicion and the difference between them lies within a preset range, so the prediction sample is input into the feature analysis model to obtain its feature analysis result. This improves the accuracy of the feature analysis for the prediction sample. Besides learning from the machine learning model's result that a business sample is suspicious, business personnel can also see the importance of each feature in that sample and can use the per-feature importance of the individual sample to judge whether it truly involves money laundering. This reduces the workload of business personnel, and applying the invention to anti-money laundering also helps meet regulatory requirements.
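The gating in steps S300 and S400, together with the optional threshold check on the two predicted values, can be sketched as below. This is a minimal, model-agnostic sketch that assumes the classification model is reachable only through two black-box callables, `classify` (returns a class label) and `score` (returns the predicted value); the function name, parameter names and the default threshold of 0.1 are hypothetical. Returning `None` when the gate rejects a sample is also an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, Optional, Sequence


def analyze_if_same_class(
    classify: Callable[[Sequence[float]], int],
    score: Callable[[Sequence[float]], float],
    explain: Callable[[Sequence[float]], dict],
    target: Sequence[float],
    sample: Sequence[float],
    preset_class: int,
    max_score_gap: float = 0.1,  # hypothetical preset threshold
) -> Optional[dict]:
    """Return the feature analysis of `sample`, or None if the gate rejects it."""
    # Step S300: classify the prediction sample with the unmodified classifier.
    if classify(sample) != preset_class:
        return None
    # Optional check: both samples share the preset class, but their
    # predicted values must also lie within the preset range of each other.
    if abs(score(sample) - score(target)) >= max_score_gap:
        return None
    # Step S400: hand the sample to the feature analysis model.
    return explain(sample)
```

Because the classifier is only called and never modified, the same gate works whether the underlying model is a decision tree, an SVM or a neural network.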
In the present invention, a second training sample set is determined based on the obtained target sample and the first training sample set; the target sample has the preset sample class determined by the classification model, which is trained on the first training sample set. A feature analysis model is trained according to a preset training rule and the second training sample set. A prediction sample is input into the classification model to obtain its sample class, and when that class is detected to be the same as the preset sample class, the prediction sample is input into the feature analysis model to obtain its feature analysis result. Because the feature analysis model is trained on a second training sample set derived from the target sample and the first training sample set, the important features of a single prediction sample can be analyzed without modifying the source code of the classification model. This addresses a problem in the prior art: when samples are classified with a model that classifies well but uses a complex algorithm, the user learns from the model's output only which class a single business sample belongs to, not which features of the sample led the model to that class, i.e., not the importance of the individual sample's features behind the decision. Without modifying, and therefore without intruding on, the classification algorithm, the invention increases the reference value of the model's classification results and helps the business make better judgments and carry out its work based on them.
Further, a second embodiment of the feature analysis method based on a machine learning model according to the present invention is proposed.
Referring to Fig. 3, Fig. 3 is a schematic flowchart of the second embodiment of the feature analysis method based on a machine learning model according to the present invention. Based on the first embodiment above, in this embodiment, step S100 of determining the second training sample set based on the obtained target sample and the first training sample set includes:
Step S101: obtain the target sample, the first training sample set and multiple initial training samples, where the target sample has the preset sample class determined by the classification model, and the classification model is trained on the first training sample set.
In this embodiment, multiple different initial training samples are generated randomly such that, after standardization, they have a mean of 0 and a standard deviation of 1.
Step S102: multiply the initial training sample by the standard deviation of the first training sample set, add the target sample, and take the sum as a second training sample.
Each initial training sample is multiplied by the standard deviation of the first training sample set, the product is added to the target sample, and the sum is taken as a second training sample. Multiplying by the standard deviation of the first training sample set and then adding the target sample produces the second training sample set around the target sample. In this way, the generated second training sample set better reflects the actual situation of the target sample, which improves the accuracy of the subsequent feature analysis.
Step S103: determine the second training sample set based on the multiple second training samples obtained.
Multiple different second training samples are generated around the target sample, which determines the second training sample set.
As one implementation, the target sample uses the customer as the dimension and contains multiple features describing customer behavior, for example the amount transferred in on the day, the amount transferred out, and the number of transactions in high-risk areas. After the classification model judges the customer of the target sample to carry money-laundering suspicion, the multiple second training samples in the second training sample set are customer samples that lie around the target sample and also carry money-laundering suspicion; each contains customer feature information such as the behavioral features of the corresponding customer.
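Steps S101 to S103 amount to sampling Gaussian perturbations around the target sample. The NumPy sketch below assumes that "the standard deviation of the first training sample set" means a per-feature (column-wise) standard deviation; the function name, the default of 500 samples and the `seed` parameter are hypothetical.

```python
import numpy as np


def generate_second_training_set(target, first_training_set, n_samples=500, seed=None):
    """Sample perturbed points around `target` (steps S101 to S103).

    Assumption: sigma is the per-feature standard deviation of the
    first training set; the text does not pin this down.
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    first = np.asarray(first_training_set, dtype=float)
    sigma = first.std(axis=0)  # one standard deviation per feature
    # Step S101: initial training samples with mean 0 and std 1.
    initial = rng.standard_normal((n_samples, target.shape[0]))
    # Step S102: scale by sigma, then shift onto the target sample.
    # Step S103: the stacked rows form the second training sample set.
    return initial * sigma + target
```

Scaling by the spread of the first training set keeps each perturbation on the scale of the real feature, so a currency amount and a transaction count are not perturbed by the same absolute step.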
Further, step S200 of training the feature analysis model according to the preset training rule and the second training sample set includes:
Step S201: calculate the Euclidean distance between the target sample and each second training sample.
The Euclidean distance between each second training sample in the second training sample set and the target sample is calculated with the Euclidean distance formula; the distance between the i-th second training sample and the target sample is denoted $D_i$.
Step S202: calculate the weight coefficient of each second training sample according to a preset calculation formula and the Euclidean distance corresponding to that sample.
In this embodiment, the weight coefficient of a second training sample is calculated from the standard deviation in the preset calculation formula and the Euclidean distance corresponding to that sample. That is, the preset calculation formula, which expresses the weight in terms of $D_i$, $\sigma$ and $e$, gives the weight coefficient (i.e., the weight) $W_i$ of the i-th second training sample, where $D_i$ is the Euclidean distance between the i-th second training sample and the target sample, $\sigma$ is the standard deviation of the first training sample set, and $e$ is the base of the natural logarithm, approximately 2.718. Here i is a positive integer ranging from 1 to the number of second training samples in the second training sample set.
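Steps S201 and S202 can be sketched as follows. The text names the inputs of the preset weighting formula ($D_i$, $\sigma$ and $e$) but not its exact form, so this sketch assumes an exponential kernel $W_i = e^{-D_i^2/\sigma^2}$, similar to the kernels used by local surrogate methods such as LIME; treat it as a stand-in, not as the patent's formula. It also assumes a scalar `sigma`, and the function name is hypothetical.

```python
import numpy as np


def sample_weights(target, second_training_set, sigma):
    """Return (D, W): distances to the target and kernel weights.

    Assumed kernel: W_i = exp(-D_i**2 / sigma**2), so samples closer
    to the target get weights nearer 1.
    """
    target = np.asarray(target, dtype=float)
    samples = np.asarray(second_training_set, dtype=float)
    # Step S201: Euclidean distance from every row to the target sample.
    distances = np.linalg.norm(samples - target, axis=1)
    # Step S202: distance-decaying weight for each second training sample.
    weights = np.exp(-(distances ** 2) / sigma ** 2)
    return distances, weights
```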
Step S203: obtain the predicted value of the classification model for each second training sample.
The classification model trained on the first training sample set predicts each second training sample in the second training sample set, yielding a predicted value for each second training sample. The classification model includes, but is not limited to, a decision tree, logistic regression, an SVM or a neural network.
Step S204: use the multiple second training samples and their corresponding weight coefficients and predicted values as input parameters for ridge regression model training, and take the training result as the feature analysis model.
The multiple second training samples and their corresponding weight coefficients and predicted values are substituted as variables into the preset ridge regression formula for training, where n is the number of second training samples in the second training sample set, i ranges from 1 to n, $\alpha$ is a value set according to the actual situation, $\beta_i$ is the weight coefficient $W_i$ of the i-th second training sample calculated in the steps above, y is the obtained predicted value of the classification model for the second training sample, and $x_i$ is the i-th second training sample.
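Steps S203 and S204 fit a weighted ridge regression whose targets are the classifier's predicted values. Because the exact formula is not reproduced above, this sketch assumes the standard weighted ridge objective $\sum_{i=1}^{n} W_i (y_i - b - x_i \cdot w)^2 + \alpha \lVert w \rVert^2$ with an unpenalized intercept $b$, solved in closed form with NumPy. `fit_weighted_ridge` is a hypothetical name; in practice scikit-learn's `Ridge(alpha=...).fit(X, y, sample_weight=W)` fits the same kind of model.

```python
import numpy as np


def fit_weighted_ridge(X, y, weights, alpha=1.0):
    """Weighted ridge regression with an unpenalized intercept.

    Minimizes sum_i w_i * (y_i - b - x_i . coef)**2 + alpha * ||coef||**2
    and returns (coef, b).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Center X and y on their weighted means so b is not penalized.
    x_mean = w @ X / w.sum()
    y_mean = w @ y / w.sum()
    Xc = X - x_mean
    yc = y - y_mean
    # Normal equations: (Xc^T W Xc + alpha I) coef = Xc^T W yc.
    A = Xc.T @ (w[:, None] * Xc) + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ (w * yc))
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```

Passing the kernel weights from step S202 as `weights` makes the fit local: the surrogate tracks the classifier near the target sample rather than across the whole customer population.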
The second training sample set contains multiple features of multiple customers with money-laundering suspicion, denoted feature 1, feature 2, feature 3, ..., feature 10, feature 11, and so on. The output of the ridge regression calculation is coef1 * feature 1 + coef2 * feature 2 + coef3 * feature 3 + ... + coef10 * feature 10 + ... + b, where each coef value is the coefficient of the corresponding feature and b is the bias. Sorting feature 1, feature 2, feature 3, ..., feature 10, feature 11, etc. by their coef values yields the importance ranking of the model's important features, i.e., the relative importance of each of these features.
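The ranking step itself is short. This sketch sorts features by the absolute value of their coefficients, which is an assumption: the text says only that features are ordered by their coef values, and a large negative coefficient is arguably as influential as a large positive one. The function name is hypothetical.

```python
def rank_features(names, coefs):
    """Return (name, coef) pairs ordered from most to least important.

    Assumption: importance is |coef|; the sign is kept so the reader can
    still see whether a feature pushes the prediction up or down.
    """
    pairs = list(zip(names, coefs))
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


ranking = rank_features(["feature 1", "feature 2", "feature 3"], [0.5, -2.0, 1.0])
# ranking == [("feature 2", -2.0), ("feature 3", 1.0), ("feature 1", 0.5)]
```

Comparing raw coefficients is only meaningful when the features are on comparable scales, so standardizing the second training samples before the ridge fit is a common precaution.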
The prediction sample is input into the classification model to obtain its sample class. When the sample class of the prediction sample is detected to be the same as the preset sample class, i.e., when both samples carry money-laundering suspicion, the prediction sample is input into the feature analysis model to obtain its feature analysis result, namely the importance of the multiple features in the prediction sample.
It can be understood that, in the anti-money laundering field, the features of the classification model's training samples and of the prediction samples are all extracted according to rules, for example whether the customer's transaction amount reaches the large-value threshold, whether transactions occur mostly in drug-trafficking areas, or whether the customer's occupation is unemployed. With the method of this embodiment, the business is told not only that a single prediction sample carries money-laundering suspicion but also the feature importance of that single sample. The feature analysis method based on a machine learning model is applicable to many different prediction models: business personnel can judge whether a customer is truly suspicious from the prediction model's result together with the feature importance of the single sample, which reduces their workload and makes business judgments more reliable. When financial systems use machine learning models for business classification, model interpretability matters a great deal; in anti-money laundering, for example, every money-laundering case is subject to strict regulatory requirements and must explain, at the customer, account and transaction levels, why the customer is suspected of money laundering. The method of the invention enables feature importance analysis of a single sample and thus meets these regulatory requirements.
Further, a third embodiment of the feature analysis method based on a machine learning model according to the present invention is proposed.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of the third embodiment of the feature analysis method based on a machine learning model according to the present invention. Based on the second embodiment above, in this embodiment, before step S400 (when the sample class of the prediction sample is detected to be the same as the preset sample class, inputting the prediction sample into the feature analysis model to obtain its feature analysis result), the method further includes:
Step S310: perform accuracy verification on the feature analysis model based on the second training sample set;
Step S320: judge whether the feature analysis model passes the accuracy verification; if it passes, proceed to step S400: when the sample class of the prediction sample is detected to be the same as the preset sample class, input the prediction sample into the feature analysis model to obtain the feature analysis result of the prediction sample.
As one implementation, referring to Fig. 5, Fig. 5 is a schematic diagram of the refined steps of step S310 in this embodiment. Specifically, step S310 of performing accuracy verification on the feature analysis model based on the second training sample set may include the following refined steps:
Step S311: obtain, from the feature analysis model, several model features that meet a preset condition;
Step S312: train the feature analysis model on these several model features to obtain multiple first predicted values;
Step S313: input each of the multiple second training samples in the second training sample set into the classification model to obtain multiple second predicted values;
Step S314: perform accuracy verification on the feature analysis model according to the multiple first predicted values and the multiple second predicted values.
After the features of the feature analysis model are obtained from the second training sample set, the several features with the highest priority are extracted in order of feature importance, and the feature analysis model is trained on these features to obtain multiple first predicted values ypred. The multiple second training samples in the second training sample set are input into the classification model trained on the first training sample set, giving multiple second predicted values ytrue corresponding to the second training samples, and their mean ytrue.mean() is calculated. From ypred, the multiple ytrue values and ytrue.mean(), the fit value R is calculated as

$R = 1 - u/v$, where $u = \sum_{k=1}^{K} (\mathrm{ytrue}_k - \mathrm{ypred}_k)^2$ and $v = \sum_{k=1}^{K} (\mathrm{ytrue}_k - \overline{\mathrm{ytrue}})^2$, with $\overline{\mathrm{ytrue}}$ = ytrue.mean(),

and K is the number of second training samples in the second training sample set. When R is higher than a set threshold, the feature analysis model is judged to pass the accuracy verification. The threshold can be set by the user; the higher the threshold, the more accurate the results of the feature analysis model. Once the feature analysis model passes the accuracy verification, feature analysis can be performed on a prediction sample with the preset sample class to obtain its feature analysis result, which improves the accuracy of feature analysis for a single sample. With the important features in hand, the user can analyze and judge suspiciousness on that basis, which reduces the manual workload of customer analysis in anti-money laundering and improves the accuracy of the analysis.
In addition, an embodiment of the present invention further provides a feature analysis apparatus based on a machine learning model, which includes:
Extraction module, configured to determine a second training sample set based on the obtained target sample and the first training sample set, where the target sample has the preset sample class determined by the classification model, and the classification model is trained on the first training sample set;
Training module, configured to train a feature analysis model according to a preset training rule and the second training sample set;
Determination module, configured to input a prediction sample into the classification model to obtain the sample class of the prediction sample;
Analysis module, configured to input the prediction sample into the feature analysis model when the sample class of the prediction sample is detected to be the same as the preset sample class, to obtain the feature analysis result of the prediction sample.
Preferably, the second training sample set includes multiple second training samples, and the extraction module includes:
First acquisition unit, configured to obtain the target sample, the first training sample set and multiple initial training samples;
Processing unit, configured to multiply the initial training sample by the standard deviation of the first training sample set, add the target sample, and take the sum as a second training sample;
Determination unit, configured to determine the second training sample set based on the multiple second training samples obtained.
Preferably, the training module includes:
First computing unit, configured to calculate the Euclidean distance between the target sample and the second training sample;
Second computing unit, configured to calculate the weight coefficient of the second training sample according to the preset calculation formula and the Euclidean distance corresponding to the second training sample;
Second acquisition unit, configured to obtain the predicted value of the classification model for the second training sample;
Training unit, configured to use the multiple second training samples and their corresponding weight coefficients and predicted values as input parameters for ridge regression model training, and to take the training result as the feature analysis model.
Preferably, the apparatus further includes:
Verification module, configured to perform accuracy verification on the feature analysis model based on the second training sample set;
Judgment module, configured to judge whether the feature analysis model passes the accuracy verification and, once it is judged to pass, to send the judgment result "pass" to the analysis module;
The analysis module is further configured to, after receiving the judgment result "pass" from the judgment module, input the prediction sample into the feature analysis model when the sample class of the prediction sample is detected to be the same as the preset sample class, to obtain the feature analysis result of the prediction sample.
Preferably, the verification module includes:
Extraction unit, configured to obtain, from the feature analysis model, several model features that meet a preset condition;
Third computing unit, configured to train the feature analysis model on these several model features to obtain multiple first predicted values;
Fourth computing unit, configured to input each of the multiple second training samples in the second training sample set into the classification model to obtain multiple second predicted values;
Verification unit, configured to perform accuracy verification on the feature analysis model according to the multiple first predicted values and the multiple second predicted values.
When the modules of the feature analysis apparatus based on a machine learning model proposed in this embodiment run, they implement the steps of the feature analysis method based on a machine learning model described above, which are not repeated here.
In addition, an embodiment of the present invention further provides a medium applied to a computer, i.e., a computer-readable storage medium. A feature analysis program based on a machine learning model is stored on the medium, and when executed by a processor, the program implements the steps of the feature analysis method based on a machine learning model described above.
For the method implemented when the feature analysis program based on a machine learning model runs on the processor, refer to the embodiments of the feature analysis method based on a machine learning model of the present invention; details are not repeated here.
It should be noted that, herein, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that includes that element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the embodiments above, those skilled in the art can clearly understand that the methods of the embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and including instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of the present invention.
Claims (12)
1. A feature analysis method based on a machine learning model, characterized in that the feature analysis method based on a machine learning model comprises the following steps:
Determining a second training sample set based on an obtained target sample and a first training sample set, wherein the target sample has a preset sample class determined by a classification model, and the classification model is trained on the first training sample set;
Training a feature analysis model according to a preset training rule and the second training sample set;
Inputting a prediction sample into the classification model to obtain a sample class of the prediction sample;
When the sample class of the prediction sample is detected to be the same as the preset sample class, inputting the prediction sample into the feature analysis model to obtain a feature analysis result of the prediction sample.
2. The feature analysis method based on a machine learning model according to claim 1, characterized in that the second training sample set comprises multiple second training samples, and the step of determining the second training sample set based on the obtained target sample and the first training sample set comprises:
Obtaining the target sample, the first training sample set and multiple initial training samples;
Multiplying the initial training sample by the standard deviation of the first training sample set, adding the target sample, and taking the sum as a second training sample;
Determining the second training sample set based on the multiple second training samples obtained.
3. The feature analysis method based on a machine learning model according to claim 2, characterized in that the step of training the feature analysis model according to the preset training rule and the second training sample set comprises:
Calculating the Euclidean distance between the target sample and the second training sample;
Calculating a weight coefficient of the second training sample according to a preset calculation formula and the Euclidean distance corresponding to the second training sample;
Obtaining a predicted value of the classification model for the second training sample;
Using the multiple second training samples and their corresponding weight coefficients and predicted values as input parameters for ridge regression model training, and taking the training result as the feature analysis model.
4. The feature analysis method based on a machine learning model according to any one of claims 1 to 3, characterized in that, before the step of inputting the prediction sample into the feature analysis model to obtain the feature analysis result of the prediction sample when the sample class of the prediction sample is detected to be the same as the preset sample class, the method further comprises:
Performing accuracy verification on the feature analysis model based on the second training sample set;
Judging whether the feature analysis model passes the accuracy verification and, if it passes, proceeding to the step of: when the sample class of the prediction sample is detected to be the same as the preset sample class, inputting the prediction sample into the feature analysis model to obtain the feature analysis result of the prediction sample.
5. The feature analysis method based on a machine learning model according to claim 4, characterized in that the step of performing accuracy verification on the feature analysis model based on the second training sample set comprises:
Obtaining, from the feature analysis model, several model features that meet a preset condition;
Training the feature analysis model on the several model features to obtain multiple first predicted values;
Inputting each of the multiple second training samples in the second training sample set into the classification model to obtain multiple second predicted values;
Performing accuracy verification on the feature analysis model according to the multiple first predicted values and the multiple second predicted values.
6. A feature analysis apparatus based on a machine learning model, characterized in that the feature analysis apparatus based on a machine learning model comprises:
An extraction module, configured to determine a second training sample set based on an obtained target sample and a first training sample set, wherein the target sample has a preset sample class determined by a classification model, and the classification model is trained on the first training sample set;
A training module, configured to train a feature analysis model according to a preset training rule and the second training sample set;
A determination module, configured to input a prediction sample into the classification model to obtain a sample class of the prediction sample;
An analysis module, configured to input the prediction sample into the feature analysis model when the sample class of the prediction sample is detected to be the same as the preset sample class, to obtain a feature analysis result of the prediction sample.
7. The feature analysis apparatus based on a machine learning model according to claim 6, characterized in that the second training sample set comprises multiple second training samples, and the extraction module comprises:
A first acquisition unit, configured to obtain the target sample, the first training sample set and multiple initial training samples;
A processing unit, configured to multiply the initial training sample by the standard deviation of the first training sample set, add the target sample, and take the sum as a second training sample;
A determination unit, configured to determine the second training sample set based on the multiple second training samples obtained.
8. The feature analysis apparatus based on a machine learning model according to claim 7, characterized in that the training module comprises:
A first computing unit, configured to calculate the Euclidean distance between the target sample and the second training sample;
A second computing unit, configured to calculate a weight coefficient of the second training sample according to a preset calculation formula and the Euclidean distance corresponding to the second training sample;
A second acquisition unit, configured to obtain a predicted value of the classification model for the second training sample;
A training unit, configured to use the multiple second training samples and their corresponding weight coefficients and predicted values as input parameters for ridge regression model training, and to take the training result as the feature analysis model.
9. The feature analysis apparatus based on a machine learning model according to any one of claims 6 to 8, wherein the apparatus further includes:
Verification module, configured to perform accuracy verification on the feature analysis model based on the second training sample set;
Judgment module, configured to judge whether the feature analysis model passes the accuracy verification and, upon judging that it passes, to send the judgment result "pass" to the analysis module;
The analysis module is further configured to, after receiving the judgment result "pass" from the judgment module and upon detecting that the sample class of the prediction sample is the same as the preset sample class, input the prediction sample into the feature analysis model to obtain the feature analysis result of the prediction sample.
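Claim 9 gates the analysis step on two conditions: the verification result must be "pass", and the prediction sample must belong to the preset sample class. A short sketch of that control flow follows, assuming (the patent does not specify this) that the feature analysis result is the per-feature contribution `coef * x` of the fitted ridge model; `analyze_prediction_sample`, its parameters, and the class labels in the usage lines are hypothetical.

```python
import numpy as np


def analyze_prediction_sample(x, sample_class, preset_class,
                              verification_passed, coef):
    """Return per-feature contributions for x, or None when gating fails.

    Assumption: the "feature analysis result" is each feature's contribution
    coef_j * x_j under the ridge model fitted in claim 8.
    """
    if not verification_passed:       # judgment module did not send "pass"
        return None
    if sample_class != preset_class:  # only the preset sample class is analyzed
        return None
    return np.asarray(coef, dtype=float) * np.asarray(x, dtype=float)


# Hypothetical usage with a two-feature model and integer class labels.
coef = np.array([0.44, -0.22])
result = analyze_prediction_sample([1.0, 2.0], 1, 1, True, coef)
```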
10. The feature analysis apparatus based on a machine learning model according to claim 9, wherein the verification module includes:
Extraction unit, configured to obtain, from the feature analysis model, several model features that satisfy a preset condition;
Third computing unit, configured to train the feature analysis model according to the several model features to obtain multiple first predicted values;
Fourth computing unit, configured to input each of the multiple second training samples in the second training sample set into the classification model to obtain multiple second predicted values;
Verification unit, configured to perform accuracy verification on the feature analysis model according to the multiple first predicted values and the multiple second predicted values.
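Claim 10 verifies the surrogate by refitting it on a subset of its own features and comparing the refit's predictions (first predicted values) with the classification model's outputs on the same second training samples (second predicted values). The claim leaves both the "preset condition" and the accuracy criterion open, so this sketch assumes that the top-k features by absolute coefficient are kept and that a weighted R^2 at or above a threshold counts as passing. All names and default values are hypothetical.

```python
import numpy as np


def _weighted_ridge(X, y, weights, alpha):
    # Closed-form weighted ridge with an unpenalized intercept (as in claim 8).
    n, d = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])
    penalty = alpha * np.eye(d + 1)
    penalty[0, 0] = 0.0
    lhs = Xa.T @ (Xa * weights[:, None]) + penalty
    return np.linalg.solve(lhs, Xa.T @ (weights * y))


def verify_feature_analysis_model(coef, X, y_classifier, weights,
                                  top_k=2, alpha=1.0, threshold=0.8):
    """Refit on the top-k features and compare against the classifier.

    Assumptions (not specified in the patent): the "preset condition" keeps
    the top_k features by |coefficient|, and accuracy is a weighted R^2
    between first and second predicted values compared with `threshold`.
    """
    weights = np.asarray(weights, dtype=float)
    selected = np.argsort(-np.abs(coef))[:top_k]   # extraction unit
    Xs = X[:, selected]
    theta = _weighted_ridge(Xs, y_classifier, weights, alpha)
    first_pred = theta[0] + Xs @ theta[1:]          # third computing unit
    second_pred = y_classifier                      # fourth computing unit output
    mean = np.average(second_pred, weights=weights)
    ss_res = np.sum(weights * (second_pred - first_pred) ** 2)
    ss_tot = np.sum(weights * (second_pred - mean) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return bool(r2 >= threshold), r2, selected      # verification unit


# Hypothetical usage: a purely linear "classifier" output should pass.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]
passed, r2, selected = verify_feature_analysis_model(
    np.array([2.0, -0.5, 0.01]), X, y, np.ones(200), top_k=2, alpha=1e-6)
```

If the refit on the retained features cannot reproduce the classifier's outputs, the surrogate's explanation is likely relying on features the check discarded, and the judgment module would withhold "pass".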
11. Feature analysis equipment based on a machine learning model, wherein the equipment includes a memory, a processor, and a feature analysis program based on a machine learning model that is stored on the memory and executable on the processor, and the feature analysis program, when executed by the processor, implements the steps of the feature analysis method based on a machine learning model according to any one of claims 1 to 5.
12. A medium, applied to a computer, wherein a feature analysis program based on a machine learning model is stored on the medium, and the feature analysis program, when executed by a processor, implements the steps of the feature analysis method based on a machine learning model according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811588694.XA CN109615020A (en) | 2018-12-25 | 2018-12-25 | Characteristic analysis method, device, equipment and medium based on machine learning model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811588694.XA CN109615020A (en) | 2018-12-25 | 2018-12-25 | Characteristic analysis method, device, equipment and medium based on machine learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109615020A (en) | 2019-04-12 |
Family ID: 66012010
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811588694.XA Pending CN109615020A (en) | 2018-12-25 | 2018-12-25 | Characteristic analysis method, device, equipment and medium based on machine learning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109615020A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2013100982A4 (en) * | 2013-07-19 | 2013-08-15 | Huaiyin Institute Of Technology, China | Feature Selection Method in a Learning Machine |
CN107832830A (en) * | 2017-11-17 | 2018-03-23 | 湖北工业大学 | Feature selection method for intrusion detection systems based on an improved grey wolf optimization algorithm |
CN108629687A (en) * | 2018-02-13 | 2018-10-09 | 阿里巴巴集团控股有限公司 | Anti-money-laundering method, apparatus and equipment |
CN108665293A (en) * | 2017-03-29 | 2018-10-16 | 华为技术有限公司 | Feature importance acquisition method and device |
- 2018-12-25: CN application CN201811588694.XA (CN109615020A), active, pending
Non-Patent Citations (1)
Title |
---|
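董亚楠 (Dong Yanan): "A click fraud detection method based on user behavior feature selection" (一种基于用户行为特征选择的点击欺诈检测方法), 《计算机科学》 (Computer Science) * |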
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110162995A (en) * | 2019-04-22 | 2019-08-23 | 阿里巴巴集团控股有限公司 | Method and device for assessing data contribution degree |
CN110162995B (en) * | 2019-04-22 | 2023-01-10 | 创新先进技术有限公司 | Method and device for evaluating data contribution degree |
CN110161322A (en) * | 2019-06-10 | 2019-08-23 | 深圳市安特保电子商务集团有限公司 | Detection device and processing method |
CN110363302A (en) * | 2019-06-13 | 2019-10-22 | 阿里巴巴集团控股有限公司 | Classification model training method, prediction method and device |
CN110363302B (en) * | 2019-06-13 | 2023-09-12 | 创新先进技术有限公司 | Classification model training method, prediction method and device |
CN110264274B (en) * | 2019-06-21 | 2023-12-29 | 深圳前海微众银行股份有限公司 | Customer group division method, model generation method, device, equipment and storage medium |
CN110264274A (en) * | 2019-06-21 | 2019-09-20 | 深圳前海微众银行股份有限公司 | Customer group division method, model generation method, device, equipment and storage medium |
CN110363534A (en) * | 2019-06-28 | 2019-10-22 | 阿里巴巴集团控股有限公司 | Method and device for identifying abnormal transactions |
CN112149833B (en) * | 2019-06-28 | 2023-12-12 | 北京百度网讯科技有限公司 | Prediction method, device, equipment and storage medium based on machine learning |
CN112149833A (en) * | 2019-06-28 | 2020-12-29 | 北京百度网讯科技有限公司 | Prediction method, device, equipment and storage medium based on machine learning |
WO2021012220A1 (en) * | 2019-07-24 | 2021-01-28 | 东莞理工学院 | Evasion attack method and device for integrated tree classifier |
CN112711643B (en) * | 2019-10-25 | 2023-10-10 | 北京达佳互联信息技术有限公司 | Training sample set acquisition method and device, electronic equipment and storage medium |
CN112711643A (en) * | 2019-10-25 | 2021-04-27 | 北京达佳互联信息技术有限公司 | Training sample set obtaining method and device, electronic equipment and storage medium |
CN112749235A (en) * | 2019-10-31 | 2021-05-04 | 北京金山云网络技术有限公司 | Method and device for analyzing classification result and electronic equipment |
CN111126448A (en) * | 2019-11-29 | 2020-05-08 | 无线生活(北京)信息技术有限公司 | Method and device for intelligently identifying fraud users |
CN111126622B (en) * | 2019-12-19 | 2023-11-03 | ***股份有限公司 | Data anomaly detection method and device |
CN111126622A (en) * | 2019-12-19 | 2020-05-08 | ***股份有限公司 | Data anomaly detection method and device |
CN113408558A (en) * | 2020-03-17 | 2021-09-17 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device and medium for model verification |
CN113408558B (en) * | 2020-03-17 | 2024-03-08 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device and medium for model verification |
CN113466474A (en) * | 2020-03-31 | 2021-10-01 | 深圳市帝迈生物技术有限公司 | Sample analyzer and sample analysis method and system |
CN111539315B (en) * | 2020-04-21 | 2023-04-18 | 招商局金融科技有限公司 | Model training method and device based on black box model, electronic equipment and medium |
CN111539315A (en) * | 2020-04-21 | 2020-08-14 | 招商局金融科技有限公司 | Model training method and device based on black box model, electronic equipment and medium |
CN111738522A (en) * | 2020-06-28 | 2020-10-02 | 电子科技大学中山学院 | Photovoltaic power generation power prediction method, storage medium and terminal equipment |
CN111797995A (en) * | 2020-06-29 | 2020-10-20 | 第四范式(北京)技术有限公司 | Method and device for generating interpretation report of model prediction sample |
CN111797995B (en) * | 2020-06-29 | 2024-01-26 | 第四范式(北京)技术有限公司 | Method and device for generating interpretation report of model prediction sample |
CN112417852B (en) * | 2020-12-07 | 2022-01-25 | 中山大学 | Method and device for judging importance of code segment |
CN112417852A (en) * | 2020-12-07 | 2021-02-26 | 中山大学 | Method and device for judging importance of code segment |
Similar Documents
Publication | Title |
---|---|
CN109615020A (en) | Characteristic analysis method, device, equipment and medium based on machine learning model |
US11250368B1 (en) | Business prediction method and apparatus |
CN111078880B (en) | Sub-application risk identification method and device |
CN110288459A (en) | Loan prediction method, device, equipment and storage medium |
US20220036178A1 (en) | Dynamic gradient aggregation for training neural networks |
CN111629010B (en) | Malicious user identification method and device |
CN110069545B (en) | Behavior data evaluation method and device |
CN107633030A (en) | Credit estimation method and device based on data model |
CN110930038A (en) | Loan demand identification method, loan demand identification device, loan demand identification terminal and loan demand identification storage medium |
CN110264274A (en) | Customer group division method, model generation method, device, equipment and storage medium |
CN109214178A (en) | App malicious behavior detection method and device |
CN108596765A (en) | Electronic financial resource recommendation method and device |
CN110019996A (en) | Family relationship recognition method and system |
CN106778851A (en) | Social network prediction system and method based on mobile phone forensics data |
CN109960753A (en) | Detection method, device, storage medium and server for users of Internet access equipment |
CN110348471A (en) | Abnormal object recognition method, device, medium and electronic equipment |
CN112950218A (en) | Business risk assessment method and device, computer equipment and storage medium |
CN108629355A (en) | Method and apparatus for generating workload information |
CN114139931A (en) | Enterprise data evaluation method and device, computer equipment and storage medium |
CN113850669A (en) | User grouping method and device, computer equipment and computer readable storage medium |
CN113570260A (en) | Task allocation method, computer-readable storage medium and electronic device |
CN117235633A (en) | Mechanism classification method, mechanism classification device, computer equipment and storage medium |
CN112085517A (en) | Coupon issuing method and device, electronic equipment and readable storage medium |
CN110134862A (en) | Product information display method, device, computer equipment and storage medium |
CN107645412B (en) | Web service combination multi-target verification method in open environment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190412 |