CN109146667B

CN109146667B - Method for constructing external interface comprehensive application model based on quantitative statistics

Info

Publication number: CN109146667B
Application number: CN201810950828.1A
Authority: CN
Inventors: 朱虹; 葛晓艳; 陈卓尔
Original assignee: Zhongan Online P&c Insurance Co ltd
Current assignee: Zhongan Online P&c Insurance Co ltd
Priority date: 2018-08-20
Filing date: 2018-08-20
Publication date: 2021-06-08
Anticipated expiration: 2038-08-20
Also published as: CN109146667A

Abstract

The invention discloses a method for constructing an external interface comprehensive application model based on quantitative statistics, and belongs to the field of data mining. The method comprises the following steps: interface data index quantification, namely uniformly standardizing results of different data sources, abstracting the results into rules, quantifying the rules into 0/1 indexes, and establishing a sample set; extracting a plurality of samples from the sample set, analyzing the samples, and obtaining the 0/1 labels corresponding to each sample; and creating an analysis model and screening the final online rules. Because the results of different data sources are unified and standardized before being judged and screened, the method realizes optimal calling of different data interfaces and of the different types of data they provide, and the established model can predict the default risk of the user.

Description

Method for constructing external interface comprehensive application model based on quantitative statistics

Technical Field

The invention relates to the field of data mining, in particular to a method for constructing an external interface comprehensive application model based on quantitative statistics.

Background

In recent years, with the development of internet technology, the internet finance industry has grown rapidly, providing convenient and efficient inclusive financial services to all social groups that need them, especially small and micro enterprises, farmers, and low-income urban residents. Compared with traditional finance, internet credit services feature a low threshold, fast loan disbursement, and wide coverage, and both the supply and demand sides of funds carry out information screening, matching, pricing, and transactions through online platforms. As a result, internet credit services face higher credit risk, which places higher demands on the credit risk control capability of internet finance platforms.

On the one hand, an internet finance platform that was established early has accumulated a certain amount of customer credit data in its own system and can judge a user's repayment capacity and repayment willingness from the customer's credit history. However, this hot start mode is rather limited: it applies only to the platform's existing users and cannot assess new users.

On the other hand, besides relying on its own accumulated credit data, an internet financial institution can, once authorized by the user, query the user's credit report at the central bank. The central bank credit report is more authoritative, but its coverage is limited: users who have never applied for a bank credit card are not covered.

Thanks to the development of technologies such as big data and cloud computing, many third-party data companies have appeared on the market. When hot start and central bank credit reporting cannot meet the needs of credit risk control, an internet finance platform can choose to access third-party data interfaces and obtain credit-related data from public channels to solve the 'cold start' problem of credit risk control. However, the market has more and more data suppliers and increasingly rich data types, while data quality is uneven. How to screen, distinguish and combine the various types of data and apply them to a platform's own credit risk control system, so that credit risk control reaches a certain accuracy and coverage at a given cost, has become a common problem in the industry.

Disclosure of Invention

To solve the problems in the prior art, the embodiment of the invention provides a method for constructing an external interface comprehensive application model based on quantitative statistics. By uniformly standardizing the results of different data sources, the method realizes unified calling of different data interfaces and of the different types of data they provide. It thereby addresses the prior-art problem that, with more and more data suppliers on the market, increasingly rich data types and uneven data quality, it is unclear how to screen, distinguish and combine the various types of data and apply them to one's own credit risk control system so that credit risk control reaches a certain accuracy and coverage at a given cost.

The embodiment of the invention provides the following specific technical scheme:

a method for constructing an external interface comprehensive application model based on quantitative statistics comprises the following steps:

S1: interface data index quantification, namely uniformly standardizing results of different data sources, abstracting the results into rules, quantifying the rules into 0/1 indexes, and establishing a sample set;

S2: extracting a plurality of samples from the sample set, analyzing the samples, and obtaining the 0/1 labels corresponding to each sample;

S3: establishing an analysis model, and screening the final online rules;

wherein the samples in the sample set comprise a plurality of types of products, each product comprises a plurality of scenes, and a set of product types and a set of scenes are established in the sample set.

Further, the step S1 specifically includes:

S1.1: sorting and summarizing all data suppliers and all interfaces provided by all suppliers, and establishing a supplier set and an interface set;

S1.2: sorting and summarizing the data input and data output formats and contents required by all interfaces, and dividing the interface set into at least a label class, a numerical value class and/or a 0/1 class;

S1.3: abstracting each interface result into a rule and quantizing the rule into 0/1 indexes;

S1.4: establishing the rules into a rule set;

S1.5: establishing a classification label for each rule.

Further, the abstracting the interface result into a rule and quantizing the interface result into 0/1 indexes at least includes:

for interface results of the interface set of the label class, abstracting each label into a rule, wherein a miss/hit corresponds to the result 0/1 respectively; and/or

for interface results of the interface set of the numerical class, finding the specific meaning of the score in the corresponding interface document, dividing the score into different score intervals, and abstracting each interval into a rule, wherein the result is quantized to 1 if the interface return falls in the score interval and to 0 otherwise; and/or

for interface results of the interface set of the 0/1 class, where the interface return itself is 0 or 1, only the corresponding rule needs to be listed, wherein a hit is 1 and otherwise 0.

Further, the step S2 specifically includes:

S2.1: drawing a plurality of samples from the sample set by sampling without replacement, wherein the ratio of good samples to bad samples is 1:1;

S2.2: determining the sampling proportion of each scene, and sampling across the different products and scenes according to that proportion;

S2.3: querying each interface corresponding to the samples, and obtaining the returned results of the samples;

S2.4: parsing the returned results of the samples into rules according to step S1, and obtaining the 0/1 label of each sample for each rule.

Further, the step S2 further includes:

establishing a sample supplementing sequence table; if the samples of a certain scene are insufficient, they are supplemented with samples of the next scene in the sequence table, and so on; when the samples of the last scene in the sequence table are insufficient, they are supplemented with samples of the first scene in the list.

Further, the creating an analysis model includes at least one of creating a rule matrix or creating a logistic regression analysis model.

Further, the creating a rule matrix and the screening of the final online rule specifically include:

S3.1a: calculating quantization indexes of the rules, wherein the quantization indexes at least comprise the adjusted accuracy and the coverage rate;

S3.2a: selecting reference indexes from the quantization indexes to construct a rule matrix;

S3.3a: calculating rule matrix indexes and selecting an online threshold according to the rule matrix indexes, wherein a rule within the range of the online threshold is a rule that can go online.

Further, the constructing the rule matrix specifically includes:

selecting a reference index from the quantitative indexes, wherein the reference index comprises the adjusted accuracy and coverage rate;

grading the rule according to the reference index, wherein the grading threshold is an adjustable parameter;

and placing the rules at the corresponding positions of the matrix to obtain rule groups.

Further, the creating a logistic regression analysis model and the screening of the final online rule specifically include:

S3.1b: determining model variables for the logistic regression model;

S3.2b: training the model, namely randomly dividing the derived variables into a training set and a validation set at a ratio of 7:3, and fitting a logistic regression model on the training set, wherein the fitting uses L1 regularization;

S3.3b: validating the model, namely validating the logistic regression model on the validation set, and adjusting the parameters of the logistic regression model according to the output.

Further, the step s3.1b specifically includes:

S3.1.1: taking the rule set as the basic variable set of the logistic regression model;

S3.1.2: classifying the basic variable set under four types of labels: multi-head loan, malicious overdue, risk record and high-risk behavior;

S3.1.3: combining the rules with accuracy levels A and B in the rule matrix according to their classification labels to generate derived variables, wherein the value of a derived variable is 1 if any rule in the pool of that derived variable is hit, and 0 otherwise.
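The derived-variable step can be sketched as follows. This is a minimal illustration, assuming each rule carries exactly one classification label and one accuracy level; the function name `derive_variables` and the dictionary layout are hypothetical and not part of the described method.

```python
from typing import Dict


def derive_variables(
    rule_hits: Dict[str, int],
    rule_labels: Dict[str, str],
    rule_levels: Dict[str, str],
) -> Dict[str, int]:
    """Build one 0/1 derived variable per classification label.

    Only rules graded A or B by accuracy enter a label's pool; the derived
    variable is 1 if any rule in the pool is hit, otherwise 0.
    """
    derived: Dict[str, int] = {}
    for rule, label in rule_labels.items():
        if rule_levels.get(rule) in ("A", "B"):
            derived[label] = max(derived.get(label, 0), rule_hits.get(rule, 0))
    return derived


# Hypothetical rules, labels, levels and hits for a single sample.
labels = {"R1": "multi_head_loan", "R2": "multi_head_loan",
          "R3": "risk_record", "R4": "malicious_overdue"}
levels = {"R1": "A", "R2": "B", "R3": "C", "R4": "A"}
hits = {"R1": 0, "R2": 1, "R3": 1, "R4": 0}
print(derive_variables(hits, labels, levels))
```

In this example R3 is graded C, so the risk-record label produces no derived variable even though R3 is hit.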

Further, the method further comprises:

S4: screening the interfaces according to the interface cost and the gain of each interface.

Further, the step S4 specifically includes:

sorting the online interfaces in descending order of query cost to generate an interface cost sequence list, calculating the gain of each interface in turn following the order of the interface cost sequence list, and deleting the interfaces whose gain is negative or zero.
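A rough sketch of this screening loop is given below. The text does not define how gain is measured, so the sketch assumes a caller-supplied function `gain_of` (hypothetical) that scores a set of interfaces, and treats the gain of one interface as the score with it minus the score without it.

```python
from typing import Callable, Dict, FrozenSet, List


def screen_interfaces(
    costs: Dict[str, float],
    gain_of: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Visit interfaces from highest to lowest query cost and drop those
    whose marginal gain (an assumed definition) is zero or negative."""
    kept = set(costs)
    for name in sorted(costs, key=costs.get, reverse=True):
        marginal = gain_of(frozenset(kept)) - gain_of(frozenset(kept - {name}))
        if marginal <= 0:
            kept.discard(name)
    return sorted(kept)


# Hypothetical example: gain is the number of bad users the interfaces cover.
covered = {"P1": {"u1", "u2"}, "P2": {"u2"}, "P3": {"u3"}}
costs = {"P1": 0.5, "P2": 1.0, "P3": 0.2}


def coverage_gain(names: FrozenSet[str]) -> float:
    return float(len(set().union(*(covered[n] for n in names))))


print(screen_interfaces(costs, coverage_gain))
```

Here P2 is the most expensive interface and adds no user that P1 does not already cover, so it is the one removed.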

Compared with the prior art, the technical scheme provided by the embodiment of the invention has the following beneficial effects:

1. the embodiment of the invention provides a method for constructing an external interface comprehensive application model based on quantitative statistics, which can realize optimal calling of different data interfaces and different types of data provided by the data interfaces by uniformly standardizing results of different data sources and then judging and screening;

2. the embodiment of the invention provides a method for constructing an external interface comprehensive application model based on quantitative statistics, which can improve the accuracy and the coverage of credit risk control by constructing a rule matrix and a logistic regression analysis model;

3. the embodiment of the invention provides a method for constructing an external interface comprehensive application model based on quantitative statistics, in which the cost of credit risk control can be controlled by screening interfaces according to the interface cost and the gain of each interface.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a flow chart of one embodiment of a method for constructing an external interface comprehensive application model based on quantitative statistics according to the present invention;

FIG. 2 is a flowchart illustrating the implementation procedure of step S1 in the method for constructing the external interface comprehensive application model based on quantitative statistics according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating the implementation procedure of step S2 in the method for constructing the external interface comprehensive application model based on quantitative statistics according to an embodiment of the present invention;

FIG. 4 is a flowchart of creating a rule matrix and screening a final online rule in the method for constructing an external interface comprehensive application model based on quantitative statistics according to the embodiment of the present invention;

FIG. 5 is a flowchart of creating a logistic regression analysis model and screening a final online rule in the method for constructing an external interface comprehensive application model based on quantitative statistics according to the embodiment of the present invention;

FIG. 6 is a flowchart of another embodiment of a method for constructing an external interface integrated application model based on quantitative statistics according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

According to the method for constructing the external interface comprehensive application model based on the quantitative statistics, the constructed model can call different data interfaces and different types of data provided by the data interfaces, is used for comprehensively predicting the default risk and the credit default probability of a user, and can reduce the cost of data query.

As shown in fig. 1, fig. 1 is a flowchart of a method for constructing an external interface comprehensive application model based on quantitative statistics, which is provided by an embodiment of the present invention, and the method includes the following steps:

S1: Interface data index quantification: uniformly standardizing results of different data sources, abstracting the results into rules, quantifying the rules into 0/1 indexes, and establishing a sample set.

Specifically, there are many data suppliers on the market, and the types of data are rich and diverse, so that the results of different data sources provided by different data suppliers are unified and standardized, abstracted into rules and quantized into 0/1 indexes, and a sample set is established, thereby facilitating the subsequent call of data interfaces provided by each data supplier. Wherein, the sample needs to contain three elements of the user (namely identity card, telephone, name), product type and good and bad label.

It should be noted that samples in the sample set include multiple types of products and each product includes multiple scenes. The method for constructing the external interface comprehensive application model based on quantitative statistics further includes establishing a product type set Product_0 = {pro1, pro2, ...} and a scene set pro1 = {item1, item2, ...}.

S2: Extracting a plurality of samples from the sample set, analyzing the samples, and obtaining the 0/1 labels corresponding to each sample.

Specifically, a user can set a sampling proportion by himself, then a plurality of samples are extracted from the sample set according to the set proportion, then the extracted samples are analyzed, 0/1 labels of each sample corresponding to each rule are obtained, and the data are recorded as data input.

S3: Creating an analysis model, and screening the final online rules.

Specifically, the user may construct a rule matrix or create a logistic regression analysis model for screening the final online rules. And the default risk of the user can be comprehensively predicted through the online rule. The benchmark index for constructing the rule matrix can be obtained by calculation. The user may also determine a set of variables for training the logistic regression analysis model, train the logistic regression analysis model based on the set of variables, and then predict the user's risk of default based on the data input and the logistic regression analysis model.

Referring to fig. 2, fig. 2 is a flowchart of an implementation process of step S1 in the method for constructing the external interface comprehensive application model based on the quantitative statistics, where the process may include the following steps:

S1.1: Sorting and summarizing all data suppliers and all interfaces provided by all suppliers, and establishing a supplier set and an interface set.

Specifically, all tradeable data suppliers on the current market and all available interfaces provided by each supplier are collated and summarized, and a supplier set supply_0 = {S1, S2, S3, ..., Sn} and an interface set Port_0 = {P1, P2, P3, ..., Pn} are respectively established.

S1.2: Sorting and summarizing the data input and data output formats and contents required by all interfaces, and dividing the interface set into at least a label class, a numerical value class and/or a 0/1 class, and the like.

S1.3: the interface results of each interface set are abstracted into rules and quantized into 0/1 metrics.

Specifically, the process at least comprises the following steps:

For interface results of the interface set of the label class, each label is abstracted into a rule, and a miss/hit corresponds to the result 0/1 respectively. For example, if a certain label is "court-listed dishonest debtor", it is abstracted into the rule "whether the person is a court-listed dishonest debtor"; if so, the result is quantized to 1, and if not, to 0.

For interface results of the interface set of the numerical class, the specific meaning of the score is found in the corresponding interface document, the score is divided into different score intervals, and each interval is abstracted into a rule; the result is quantized to 1 if the interface return falls in the score interval and to 0 otherwise. For example, suppose a certain interface returns 30 points and the score range of 30 to 50 points is abstracted into the rule "the time is more than 90 days and the amount is in the range 0-1w" (1w being 10,000); if the interface return is in the interval (30, 50), the result is quantized to 1, otherwise to 0.

For interface results of the interface set of the 0/1 class, where the interface return itself is 0 or 1, only the corresponding rule needs to be listed, with a hit being 1 and otherwise 0. For example, for the rule "hit on XX blacklist", a hit is 1 and otherwise 0.
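The three quantization cases above can be sketched as small functions. This is an illustrative sketch only: the function names are hypothetical, and the score interval is treated as closed at both ends, which is an assumption (the interface document would define the real bounds).

```python
from typing import Iterable, Tuple


def quantize_label(returned_labels: Iterable[str], rule_label: str) -> int:
    """Label class: 1 if the rule's label is among the returned labels, else 0."""
    return 1 if rule_label in set(returned_labels) else 0


def quantize_score(score: float, interval: Tuple[float, float]) -> int:
    """Numerical class: 1 if the score falls in the rule's score interval.

    Both ends are treated as inclusive here; that is an assumption.
    """
    low, high = interval
    return 1 if low <= score <= high else 0


def quantize_flag(flag: int) -> int:
    """0/1 class: the interface already returns 0 or 1, so a hit stays 1."""
    return 1 if flag == 1 else 0


# Hypothetical interface returns for one sample.
print(quantize_label(["court_dishonest_debtor"], "court_dishonest_debtor"))
print(quantize_score(30, (30, 50)))
print(quantize_flag(0))
```

Each rule therefore yields a single 0/1 value per sample, whatever the original return format of its interface.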

S1.4: Establishing the rules into a rule set.

Specifically, based on the above results, a rule set Rule_0 = {R1, R2, R3, ..., Rn} is established.

S1.5: a classification label is established for each rule.

Specifically, the following table is an example of rule classification:

Here, the supplier of rule R1 is S1, the specific meaning of R1 is "the applicant is a highly active person", and the labels corresponding to rule R1 are: personal, multi-head loan, and active.

Referring to fig. 3, fig. 3 is a flowchart of an implementation process of step S2 in the method for constructing the external interface comprehensive application model based on the quantitative statistics, where the process may include the following steps:

S2.1: Drawing a plurality of samples from the sample set by sampling without replacement, wherein the ratio of good samples to bad samples is 1:1.

S2.2: Determining the sampling proportion of each scene, and sampling across the different products and scenes according to that proportion.

Specifically, to ensure sample diversity, samples are drawn proportionally across different products and different scenes. The products in the sample comprise installments, cash loans and virtual cards. The installment scenes comprise car insurance, house rental, medical treatment and travel; the cash loan comprises its own scenes; and the virtual card scenes comprise physical and virtual. The user can determine the sampling proportions of the different products and scenes according to actual requirements, provided that the finally determined proportions keep the ratio of good samples to bad samples at 1:1, as shown in the following table:

S2.3: Querying each interface corresponding to the samples, and obtaining the returned results of the samples.

Specifically, each interface corresponding to the extracted sample is queried, and a return result of the sample is obtained from each interface.

Specifically, if the returned result of the sample is of the label class, the rule corresponding to the label is obtained, and a miss/hit corresponds to the result 0/1 respectively; if the returned result is of the numerical class, the corresponding rule is obtained, and the result is 1 if the interface return falls in the score interval of the rule and 0 otherwise; if the returned result is of the 0/1 class, only the corresponding rule needs to be listed, with a hit being 1 and otherwise 0. The 0/1 label finally obtained for each sample under each rule is recorded as data input; the data input further includes the actual good/bad label of each sample and the good/bad result predicted by each rule.

In practical applications, the samples of a certain scene may be insufficient, so a sample supplementing sequence table needs to be determined, for example [pro1_item1, pro1_item2, pro2_item1, ...].
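The sampling of step S2 with a supplementing sequence table might look like the sketch below. It assumes one pool of sample identifiers per scene and an equal quota per scene; the function name, the data layout and the choice to credit supplemented samples to the donor scene are all illustrative assumptions. To keep the good-to-bad ratio at 1:1, the same call would be made once on the good pools and once on the bad pools with identical quotas.

```python
import random
from typing import Dict, List


def sample_with_supplement(
    pools: Dict[str, List[str]],
    quotas: Dict[str, int],
    order: List[str],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Draw quotas[scene] ids per scene without replacement.

    A shortfall in one scene is made up from the next scene in `order`,
    wrapping from the last scene back to the first.
    """
    rng = random.Random(seed)
    remaining = {scene: list(pools.get(scene, [])) for scene in order}
    drawn: Dict[str, List[str]] = {scene: [] for scene in order}

    def draw(scene: str, count: int) -> int:
        """Draw up to `count` ids from `scene`; return the shortfall."""
        take = min(count, len(remaining[scene]))
        picked = rng.sample(remaining[scene], take)
        remaining[scene] = [i for i in remaining[scene] if i not in picked]
        drawn[scene].extend(picked)
        return count - take

    shortfalls = {scene: draw(scene, quotas.get(scene, 0)) for scene in order}
    for index, scene in enumerate(order):
        missing = shortfalls[scene]
        # Walk the sequence table circularly, starting after the short scene.
        for step in range(1, len(order)):
            if missing == 0:
                break
            missing = draw(order[(index + step) % len(order)], missing)
    return drawn


order = ["pro1_item1", "pro1_item2", "pro2_item1"]
bad_pool = {"pro1_item1": ["b1"], "pro1_item2": ["b2", "b3", "b4"],
            "pro2_item1": ["b5", "b6"]}
print(sample_with_supplement(bad_pool, {s: 2 for s in order}, order))
```

In the example, the first scene has only one bad sample against a quota of two, so the missing one is taken from the second scene.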

Referring to fig. 4, fig. 4 is a flowchart of creating a rule matrix and screening a final online rule in a method for constructing an external interface comprehensive application model based on quantitative statistics according to an embodiment of the present invention, where the process may include the following steps:

S3.1a: Calculating quantization indexes of the rules, wherein the quantization indexes at least comprise the adjusted accuracy and the coverage rate.

Specifically, assuming that Q samples are extracted in total, calculating the quantization index of each rule at least includes:

Interface return rate: return_rate = number of interface returns / number of submitted queries;

Interface matching rate: number of returns whose output is manual rejection or a blacklist hit / number of interface returns;

Interface lookup rate: interface return rate * interface matching rate;

Number of rule hits: rule_bad_amount = the number of people who hit the rule;

Bad hits (predicted bad, actually bad): the number of people who (hit the rule) and (are actually overdue or fraudulent);

Bad misses (predicted good, actually bad): the number of people who (do not hit the rule) and (are actually overdue or fraudulent);

Good hits (predicted bad, actually good): the number of people who (hit the rule) and (are actually not overdue or fraudulent);

Good misses (predicted good, actually good): the number of people who (do not hit the rule) and (are actually not overdue or fraudulent);

Accuracy: accuracy_rate = 1.0 * rule_bad_actual_bad_amount / rule_bad_amount (i.e., 1.0 * number of bad hits / number of rule hits);

Adjusted accuracy: adjusted_accuracy_rate = 1.0 * accuracy_rate / (a - b * accuracy_rate), where a - b = 1; the parameters a and b can be adjusted according to actual conditions, and the larger the values of a and b, the more pronounced the adjustment to the accuracy.

Specifically, it should be noted that, to prevent sampling bias, good samples and bad samples are drawn at a ratio of 1:1. This sampling method inflates the accuracy, because in practice the proportion of bad samples is far smaller than that of good samples, so the accuracy is adjusted to bring it close to the real level. In the implementation of the present invention, a in the formula is set to 9 and b to 8, i.e., adjusted_accuracy_rate = 1.0 * accuracy_rate / (9 - 8 * accuracy_rate).
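A short numeric illustration of the adjustment with a = 9 and b = 8 follows. As an inference not stated in the text, these values coincide with rescaling a precision measured on a 1:1 sample to a population with nine good samples for every bad one, since p / (k - (k - 1) * p) with k = 9 gives the same expression.

```python
def adjusted_accuracy(accuracy_rate: float, a: float = 9.0, b: float = 8.0) -> float:
    """adjusted_accuracy_rate = 1.0 * accuracy_rate / (a - b * accuracy_rate)."""
    return 1.0 * accuracy_rate / (a - b * accuracy_rate)


# An accuracy of 1.0 stays at 1.0, while lower accuracies shrink sharply.
for rate in (0.5, 0.9, 1.0):
    print(rate, round(adjusted_accuracy(rate), 4))
```

A rule that looks 90% accurate on the balanced sample is thus credited with roughly 50% after adjustment.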

Coverage rate: coverage_rate = 1.0 * rule_bad_actual_bad_amount / (0.5 * Q)

Disturbance rate: 1.0 * rule_bad_amount / Q

F value: f_score = 2.0 * adjusted_accuracy_rate * call_rate / (adjusted_accuracy_rate + call_rate)

Specifically, the adjusted accuracy and the coverage rate are generally used to measure the effect of the model, but the two measures sometimes conflict: for example, the adjusted accuracy is very high but the coverage rate is low, or the coverage rate is high but the adjusted accuracy is only average. In that case the F value can be used; the F value is a weighted harmonic mean of the adjusted accuracy rate and the adjusted disturbance rate and measures the combined effect of the two indexes. It should be noted that the accuracy, the adjusted accuracy, the disturbance rate and the F value mentioned in the embodiment of the present invention can all serve as indexes for measuring the model effect. The embodiment of the present invention selects the adjusted accuracy and the coverage rate as the measurement indexes of the rule matrix model, and other indexes can be substituted in practical application.
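The per-rule indexes can be computed from two aligned 0/1 lists, as in the sketch below. Several points are assumptions: the sample holds Q entries drawn 1:1 (so 0.5 * Q are bad), the disturbance rate divides by Q, and the second term of the F value is taken to be the coverage rate, since the formula's `call_rate` is not defined in the text.

```python
from typing import Dict, Sequence


def rule_metrics(
    hits: Sequence[int],
    actual_bad: Sequence[int],
    a: float = 9.0,
    b: float = 8.0,
) -> Dict[str, float]:
    """Compute per-rule indexes from 0/1 rule hits and 0/1 bad labels."""
    q = len(hits)
    rule_bad_amount = sum(hits)
    rule_bad_actual_bad_amount = sum(h * y for h, y in zip(hits, actual_bad))
    accuracy_rate = (
        1.0 * rule_bad_actual_bad_amount / rule_bad_amount if rule_bad_amount else 0.0
    )
    adjusted = 1.0 * accuracy_rate / (a - b * accuracy_rate)
    coverage_rate = 1.0 * rule_bad_actual_bad_amount / (0.5 * q)
    total = adjusted + coverage_rate
    return {
        "accuracy_rate": accuracy_rate,
        "adjusted_accuracy_rate": adjusted,
        "coverage_rate": coverage_rate,
        "disturbance_rate": 1.0 * rule_bad_amount / q,
        # Assumption: coverage stands in for the undefined second F-value term.
        "f_score": 2.0 * adjusted * coverage_rate / total if total else 0.0,
    }


# Ten samples, five bad then five good; the rule hits three bad and one good.
hits = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
actual_bad = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(rule_metrics(hits, actual_bad))
```

With these inputs the raw accuracy of 0.75 adjusts to 0.25, and the coverage rate is 0.6.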

S3.2a: Selecting reference indexes from the quantization indexes to construct a rule matrix.

Specifically, the embodiment of the invention selects the adjusted accuracy and the coverage rate as the reference indexes of the rule matrix. The rules are graded according to the reference indexes: into four levels A, B, C and D by adjusted accuracy, and into four levels 1, 2, 3 and 4 by coverage rate. The grading thresholds are adjustable parameters that can be tuned to actual conditions in practical application. Each rule is placed at the corresponding position of the matrix according to its grades, yielding rule groups: rules with data fall into 16 groups (A1 to D4), and rules without data form one further group X0 (the rules whose interface return rate is 0), for 17 groups in total. For example, the following table shows a grading of rules in which the accuracy grading thresholds are 15%, 30% and 60% and the coverage grading thresholds are 1%, 3% and 6%:

S3.3a: Calculating rule matrix indexes and selecting an online threshold according to the rule matrix indexes, wherein a rule within the range of the online threshold is a rule that can go online. For example, if rules with accuracy levels A and B are selected to go online, the online threshold is an accuracy of 30%: a rule whose accuracy is greater than or equal to 30% is graded A or B and can go online in this round.
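The grading and the online threshold can be sketched with the example thresholds (15%, 30%, 60% for accuracy; 1%, 3%, 6% for coverage). The sketch assumes that A and 1 are the best grades and that lower bounds are inclusive; only the "greater than or equal to 30%" boundary for level B is stated in the text, and the function names are hypothetical.

```python
from typing import Sequence

ACCURACY_BOUNDS = (0.60, 0.30, 0.15)  # lower bounds of levels A, B, C
COVERAGE_BOUNDS = (0.06, 0.03, 0.01)  # lower bounds of levels 1, 2, 3


def grade(value: float, bounds: Sequence[float], names: str) -> str:
    """Return the first level whose lower bound is met, else the last level."""
    for bound, name in zip(bounds, names):
        if value >= bound:
            return name
    return names[-1]


def rule_group(adjusted_accuracy: float, coverage: float, return_rate: float) -> str:
    """Place a rule in one of the 17 groups: A1 to D4, or X0 when nothing returns."""
    if return_rate == 0:
        return "X0"
    return grade(adjusted_accuracy, ACCURACY_BOUNDS, "ABCD") + grade(
        coverage, COVERAGE_BOUNDS, "1234"
    )


def can_go_online(group: str, online_levels: str = "AB") -> bool:
    """A rule goes online when its accuracy level is among the selected levels."""
    return group[0] in online_levels


print(rule_group(0.30, 0.02, 0.9), can_go_online(rule_group(0.30, 0.02, 0.9)))
```

Changing `online_levels` or the bound tuples is how the adjustable thresholds mentioned above would be tuned in this sketch.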

Specifically, the rule matrix index to be calculated includes:

Cell accuracy = number of people hit by the rules in the cell who are actually overdue or fraudulent / number of people hit by the rules in the cell;

Cell coverage = number of people hit by the rules in the cell who are actually overdue or fraudulent / total number of people who are actually overdue or fraudulent;

Row cumulative accuracy = number of people hit by the rules in the current cell and the cells to its left who are actually overdue or fraudulent / number of people hit by the rules in those cells;

Row cumulative coverage = number of people hit by the rules in the current cell and the cells to its left who are actually overdue or fraudulent / total number of people who are actually overdue or fraudulent;

Column cumulative accuracy = number of people hit by the rules in the current cell and the cells above it who are actually overdue or fraudulent / number of people hit by the rules in those cells;

Column cumulative coverage = number of people hit by the rules in the current cell and the cells above it who are actually overdue or fraudulent / total number of people who are actually overdue or fraudulent;

Row-and-column cumulative accuracy = number of people hit by the rules in the current cell and the cells above and to its left who are actually overdue or fraudulent / number of people hit by the rules in those cells;

Row-and-column cumulative coverage = number of people hit by the rules in the current cell and the cells above and to its left who are actually overdue or fraudulent / total number of people who are actually overdue or fraudulent.
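These matrix indexes can be sketched over sets of user identifiers. The layout is an assumption: cells are keyed by (row, column), "left" means a smaller column index, "above" means a smaller row index, and people are counted once even if several rules in the selected cells hit them.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, column)


def accuracy_and_coverage(hit: Set[str], bad: Set[str]) -> Tuple[float, float]:
    """Accuracy = bad people hit / people hit; coverage = bad people hit / all bad."""
    bad_hit = len(hit & bad)
    accuracy = bad_hit / len(hit) if hit else 0.0
    coverage = bad_hit / len(bad) if bad else 0.0
    return accuracy, coverage


def cumulative_hits(cells: Dict[Cell, Set[str]], cell: Cell, mode: str) -> Set[str]:
    """Union of people hit in the cells selected by mode:
    'cell', 'row' (current and left), 'column' (current and above),
    or 'block' (current, above and left)."""
    row, col = cell
    selectors = {
        "cell": lambda r, c: (r, c) == (row, col),
        "row": lambda r, c: r == row and c <= col,
        "column": lambda r, c: c == col and r <= row,
        "block": lambda r, c: r <= row and c <= col,
    }
    selected: Set[str] = set()
    for (r, c), people in cells.items():
        if selectors[mode](r, c):
            selected |= people
    return selected


# Hypothetical 2 x 2 matrix of hit users and the set of actually bad users.
cells = {(0, 0): {"u1", "u2"}, (0, 1): {"u2", "u3"},
         (1, 0): {"u4"}, (1, 1): {"u5", "u6"}}
bad = {"u1", "u2", "u4", "u5"}
print(accuracy_and_coverage(cumulative_hits(cells, (0, 1), "row"), bad))
```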

In practical application, the method further comprises the steps of periodically updating the rule model and determining the online rule, and the method is specifically shown in the following table:

Current-stage result	Previous-stage result	Final decision
Online	Online	Online
Online	Offline	Calculate the combined result of the two stages; the combined result is final
Offline	Online	Calculate the combined result of the two stages; the combined result is final
Offline	Offline	Offline
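The decision table can be written as a small function. This is a sketch only: the string values and the "combined" marker are illustrative, and how the combined result of two stages is computed is left open because the text does not specify it.

```python
def final_decision(current: str, previous: str) -> str:
    """Return 'online' or 'offline' when both stages agree, else 'combined',
    meaning the two-stage combined result has to be calculated."""
    allowed = ("online", "offline")
    if current not in allowed or previous not in allowed:
        raise ValueError("results must be 'online' or 'offline'")
    return current if current == previous else "combined"


for pair in (("online", "online"), ("online", "offline"),
             ("offline", "online"), ("offline", "offline")):
    print(pair, "->", final_decision(*pair))
```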

Referring to fig. 5, fig. 5 is a flowchart of creating a logistic regression analysis model and screening a final online rule (a rule corresponding to a variable with a significant final parameter is an online rule) in the method for constructing the external interface comprehensive application model based on the quantitative statistics, where the process may include the following steps:

S3.1b: Determining the model variables of the logistic regression model.

Specifically, the process of determining the model variables of the logistic regression model includes the following steps:

S3.1.1: Taking the rule set Rule_0 = {R1, R2, R3, ..., Rn} as the basic variable set of the logistic regression model;

S3.2b: Training the model: the derived variables are randomly divided into a training set and a validation set at a ratio of 7:3, and a logistic regression model is fitted on the training set. The fitting uses L1 regularization, which helps prevent the overfitting that a large number of variables can cause;
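A dependency-free sketch of this training step is shown below. The solver is a swapped-in choice: plain proximal gradient descent with soft-thresholding, one simple way to apply an L1 penalty, not necessarily what the embodiment uses. The function names, learning rate, penalty strength and toy data are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_7_3(rows: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle rows and split them 7:3 into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def soft_threshold(value: float, amount: float) -> float:
    """Shrink value toward zero by amount; this is the L1 proximal step."""
    return math.copysign(max(abs(value) - amount, 0.0), value)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability of the bad class for one 0/1 variable vector."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logistic(xs, ys, lam: float = 0.01, lr: float = 0.5, epochs: int = 500):
    """Fit logistic regression with an L1 penalty on the weights (not the bias)."""
    n, d = len(xs), len(xs[0])
    weights: List[float] = [0.0] * d
    bias = 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, x) - y for x, y in zip(xs, ys)]
        bias -= lr * sum(errors) / n
        for j in range(d):
            grad = sum(e * x[j] for e, x in zip(errors, xs)) / n
            weights[j] = soft_threshold(weights[j] - lr * grad, lr * lam)
    return weights, bias


# Toy data: variable 0 is informative, variable 1 is never hit.
rows = [([1.0, 0.0], 1)] * 8 + [([1.0, 0.0], 0)] * 2 \
     + [([0.0, 0.0], 1)] * 2 + [([0.0, 0.0], 0)] * 8
train, valid = split_7_3(rows)
weights, bias = fit_l1_logistic([x for x, _ in train], [y for _, y in train])
print(weights, bias, len(train), len(valid))
```

A variable whose weight is shrunk to exactly zero drops out of the model, which is the property that makes L1 regularization useful for screening rules.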

Specifically, the calculation results are grouped by probability range; the group target count, group non-target count, cumulative total count, cumulative non-target count, group target proportion, cumulative group non-target proportion, KS value, random level and lift are calculated; and the parameters of the logistic regression model are adjusted according to these results to determine the optimal model parameters.

It should be noted that the output of the logistic regression model is the probability that each sample belongs to the good or the bad class, and the samples are grouped by quantiles of this probability. The group target count is the number of bad samples falling in the group interval; the group count is the number of samples falling in the group interval; the group non-target count is the number of good samples falling in the group interval; the cumulative target count is the number of bad samples whose probability is less than or equal to the probability value corresponding to the group interval; the cumulative total count is the number of all samples whose probability is less than or equal to that value; the cumulative group non-target count is the number of good samples whose probability is less than or equal to that value; each proportion is the corresponding count divided by the total number of samples; the KS value is the absolute value of the difference between the cumulative group target proportion and the cumulative group non-target proportion, and measures the model's ability to distinguish good samples from bad ones; the lift is the multiple of the random level that the group target proportion represents, and measures how much better the model's predictions are than random selection.
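The grouped evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions: groups are equal-sized slices of the samples sorted by predicted probability, the random level is taken to be the overall share of bad samples, the lift is the group's bad rate divided by that random level, and the cumulative proportions used for KS are normalized by the class totals (the usual KS convention, which may differ from the normalization intended in the text). The function name `ks_lift_table` and the sample data are hypothetical.

```python
def ks_lift_table(scores, labels, n_groups=5):
    """Group samples by predicted probability and compute KS and lift.

    scores: predicted probability of the bad (target) class.
    labels: 1 for a bad (target) sample, 0 for a good (non-target) sample.
    Assumes both classes are present and len(scores) >= n_groups.
    """
    pairs = sorted(zip(scores, labels))            # ascending probability
    total = len(pairs)
    total_bad = sum(label for _, label in pairs)
    total_good = total - total_bad
    random_level = total_bad / total               # bad rate without a model
    table, cum_bad, cum_good = [], 0, 0
    for g in range(n_groups):
        group = pairs[g * total // n_groups:(g + 1) * total // n_groups]
        bad = sum(label for _, label in group)
        good = len(group) - bad
        cum_bad += bad
        cum_good += good
        table.append({
            "group": g + 1,
            "target": bad,
            "non_target": good,
            "cum_target": cum_bad,
            "cum_non_target": cum_good,
            "ks": abs(cum_bad / total_bad - cum_good / total_good),
            "lift": (bad / len(group)) / random_level,
        })
    return table


# Illustrative scores and labels only.
scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
table = ks_lift_table(scores, labels)
for row in table:
    print(row)
print("model KS:", max(row["ks"] for row in table))
```

Comparing the maximum KS and the lift of the highest-probability groups across candidate parameter settings is one way to pick the setting that separates good and bad samples best.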

The parameters of the logistic regression model are adjusted according to the calculation results to obtain different models and different model evaluation metrics, and the parameters that give the best evaluation metrics are selected as the optimal model parameters.

Referring to fig. 6, fig. 6 is a flowchart of another embodiment of the method for constructing an external interface comprehensive application model based on quantitative statistics. In addition to the above steps S1 to S3, the method includes:

Specifically, the online interfaces are sorted by query cost from low to high to generate an interface cost ranking list Port_rank = [Pr1, Pr2, Pr3, ..., Prn]. The gain of each interface is then calculated in list order, that is, the gain in combined hit count, accuracy, adjusted accuracy and coverage each time an interface is added; interfaces with negative or no gain are deleted, yielding the final online interfaces and their corresponding rules. The specific steps of calculating the gain of each interface are as follows:

The first step: according to the interface cost ranking list Port_rank = [Pr1, Pr2, Pr3, ..., Prn], starting from Pr1 and Pr2, which have the lowest cost, indexes such as the combined hit count, the hit failure count and the accuracy of Pr1 and Pr2 in series are calculated; if the combined indexes after adding Pr2 improve on (have gain over) those of the Pr1 interface alone, Pr2 is retained; otherwise it is not retained;

The second step: if Pr2 was retained in the first step, the combined indexes of Pr1, Pr2 and Pr3 are calculated; if there is gain, Pr3 is retained; otherwise it is not retained;

The third step: and so on, until the final online interfaces are obtained.
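The interface screening described above amounts to a greedy pass over the interfaces in cost order. The sketch below is one possible reading, not a definitive implementation: interfaces are sorted by ascending cost so that Pr1 is the cheapest, calling interfaces in series is modeled as the union of the users they hit, and the several indexes named in the text are collapsed into a single caller-supplied `metric` function. The dictionary layout, the names `select_interfaces` and `coverage`, and the sample data are hypothetical.

```python
def select_interfaces(interfaces, metric):
    """Greedily keep interfaces that improve the combined metric.

    interfaces: list of dicts with hypothetical keys "name", "cost" and
                "hits" (set of user IDs hit by the interface's rules).
    metric:     callable mapping a set of hit user IDs to a score;
                higher is better.
    """
    ranked = sorted(interfaces, key=lambda itf: itf["cost"])  # cheapest first
    kept = [ranked[0]["name"]]            # Pr1 is the starting point
    hits = set(ranked[0]["hits"])
    best = metric(hits)
    for itf in ranked[1:]:
        candidate = hits | itf["hits"]    # serial call modeled as a union
        score = metric(candidate)
        if score > best:                  # keep only interfaces with gain
            kept.append(itf["name"])
            hits, best = candidate, score
    return kept


# Illustrative data: users that are actually overdue or fraudulent.
bad_users = {"u1", "u2", "u3"}


def coverage(hit_users):
    """Share of bad users that the combined interfaces hit."""
    return len(hit_users & bad_users) / len(bad_users)


interfaces = [
    {"name": "C", "cost": 3, "hits": {"u2", "u8"}},
    {"name": "A", "cost": 1, "hits": {"u1", "u9"}},
    {"name": "B", "cost": 2, "hits": {"u1"}},
]
print(select_interfaces(interfaces, coverage))  # ['A', 'C']
```

Interface B is dropped because it adds no bad user beyond those already hit by the cheaper interface A, while C is kept because it raises coverage.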

1. The embodiment of the invention provides a method for constructing an external interface comprehensive application model based on quantitative statistics which, by uniformly standardizing the results of different data sources, enables unified calling of different data interfaces and of the different types of data they provide;

3. The embodiment of the invention provides a method for constructing an external interface comprehensive application model based on quantitative statistics, which can screen interfaces according to interface cost and interface gain, taking the cost of querying data into account.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for constructing an external interface comprehensive application model based on quantitative statistics is characterized by comprising the following steps:

S3: establishing an analysis model, and screening the final online rules;

wherein, the samples in the sample set comprise a plurality of types of products, each product comprises a plurality of scenes, and a product type set and a scene set are established in the sample set;

creating an analytical model includes at least one of creating a rule matrix or creating a logistic regression analysis model;

the creating a rule matrix and the screening of the final online rule specifically comprise:

2. The method for constructing an external interface comprehensive application model based on quantitative statistics as claimed in claim 1, wherein said step S1 specifically comprises:

S1.4: establishing the rules into a rule set;

S1.5: establishing a classification label for each rule.

3. The method for constructing an external interface comprehensive application model based on quantitative statistics as claimed in claim 2, wherein the abstracting each interface result into a rule and quantizing each interface result into 0/1 indexes at least comprises:

4. The method for constructing an external interface comprehensive application model based on quantitative statistics as claimed in any one of claims 1 to 3, wherein the step S2 specifically comprises:

5. The method for constructing an external interface comprehensive application model based on quantitative statistics as claimed in claim 4, wherein said step S2 further comprises:

6. The method for constructing an external interface comprehensive application model based on quantitative statistics as claimed in claim 1, wherein said constructing a rule matrix specifically comprises:

7. The method for constructing an external interface comprehensive application model based on quantitative statistics as claimed in claim 1, wherein the creating a logistic regression analysis model and the screening of the final online rules specifically comprise:

S3.1b: determining the model variables of the logistic regression analysis model, the model variables comprising derived variables:

S3.2b: training the model: randomly dividing the derived variables into a training set and a validation set in a 7:3 ratio, and fitting the logistic regression model on the training set, wherein the fitting uses L1 regularization;

8. The method for constructing an external interface comprehensive application model based on quantitative statistics as claimed in claim 7, wherein the step S3.1b specifically comprises:

9. The method for constructing the external interface comprehensive application model based on the quantitative statistics as claimed in any one of claims 1 to 3, wherein the method further comprises:

10. The method for constructing an external interface comprehensive application model based on quantitative statistics as claimed in claim 9, wherein said step S4 specifically includes: