CN109272340B - Parameter threshold determining method and device and computer storage medium - Google Patents

Info

Publication number
CN109272340B
Authority
CN
China
Prior art keywords
parameter
parameter threshold
sample
candidate
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810797054.3A
Other languages
Chinese (zh)
Other versions
CN109272340A (en)
Inventor
金晓辉
阮晓雯
徐亮
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810797054.3A priority Critical patent/CN109272340B/en
Priority to PCT/CN2018/111123 priority patent/WO2020015216A1/en
Publication of CN109272340A publication Critical patent/CN109272340A/en
Application granted granted Critical
Publication of CN109272340B publication Critical patent/CN109272340B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • General Business, Economics & Management (AREA)
  • Pure & Applied Mathematics (AREA)
  • Marketing (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a method and a device for determining a parameter threshold, and a computer storage medium, and relates to the field of information technology. Its main purpose is to tune parameter thresholds given according to business experience based on the actual data distribution, improving the accuracy of the parameter thresholds used for grade evaluation and, in turn, the accuracy of data grade evaluation. The method comprises: determining candidate parameter thresholds of each parameter according to the data distribution of sample data on each parameter; tuning the parameter thresholds given according to business experience by using a preset tuning hypothesis and the candidate parameter thresholds; and determining the parameter thresholds for grade evaluation according to the tuned parameter thresholds. The invention is suitable for determining parameter thresholds.

Description

Parameter threshold determination method and device and computer storage medium
Technical Field
The present invention relates to the field of information technologies, and in particular, to a method and an apparatus for determining a parameter threshold, and a computer storage medium.
Background
In some scenarios, a rating model must be built in order to rate data. In general, a rule engine is built before the rating model. The rule engine supplies a series of parameter specification conditions to the rating model, and the rating model evaluates each step according to these conditions until a rating conclusion is reached. A parameter specification condition is usually constructed from parameters and parameter thresholds. For example, when the risk level of spending is assessed in a consumption scenario, the relevant parameters may be parameter a, the consumption amount, and parameter b, the number of transactions. When the risk level is high, the corresponding parameter specification condition may be $a > x_1$ and $b > y_1$; when the risk level is low, it may be $a < x_2$ and $b < y_2$; when the risk level is medium, it may be $a > x_1$ and $y_1 > b > y_2$, or $x_1 > a > x_2$ and $b > y_1$, or $x_1 > a$ and $y_1 > b > y_2$. Here $x_1$ and $x_2$ are parameter thresholds of parameter a, and $y_1$ and $y_2$ are parameter thresholds of parameter b.
Currently, the parameter thresholds for rating are usually determined from the business experience of a business party or an expert; that is, the thresholds they give based on experience are used directly as the rating thresholds. However, when that business experience is insufficient or the scenario is new, ratings based on such thresholds may not match the actual data, so both the rating thresholds and the resulting data ratings are less accurate.
Disclosure of Invention
The invention provides a method, a device and a computer storage medium for determining a parameter threshold. Its main purpose is to tune the parameter thresholds given according to business experience based on the actual data distribution, improving the accuracy of the parameter thresholds for grade evaluation and thereby the accuracy of data grade evaluation.
According to a first aspect of the present invention, there is provided a parameter threshold determination method, comprising:
determining candidate parameter thresholds of all parameters according to data distribution of sample data on all parameters;
adjusting the parameter threshold value given according to the business experience by using a preset adjusting hypothesis and the candidate parameter threshold value;
and determining the parameter threshold value of the grade according to the adjusted parameter threshold value.
According to a second aspect of the present invention, there is provided a parameter threshold determining apparatus comprising:
the determining unit is used for determining candidate parameter threshold values of all parameters according to data distribution of sample data on all parameters;
the tuning unit is used for tuning a parameter threshold value given according to business experience by using a preset tuning hypothesis and the candidate parameter threshold value;
and the determining unit is used for determining the parameter threshold value of the grade according to the adjusted parameter threshold value.
According to a third aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
determining candidate parameter threshold values of all parameters according to data distribution of sample data on all parameters;
adjusting the parameter threshold value given according to the business experience by using a preset adjusting hypothesis and the candidate parameter threshold value;
and determining the parameter threshold value of the grade according to the adjusted parameter threshold value.
According to a fourth aspect of the present invention, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program:
determining candidate parameter threshold values of all parameters according to data distribution of sample data on all parameters;
adjusting the parameter threshold value given according to the business experience by using a preset adjusting hypothesis and the candidate parameter threshold value;
and determining the parameter threshold value of the grade according to the adjusted parameter threshold value.
Whereas the parameter thresholds for grade evaluation are currently determined from the business experience of a business party or an expert, the parameter threshold determining method, device and computer storage medium provided by the invention determine candidate parameter thresholds of each parameter according to the data distribution of the sample data on each parameter, tune the parameter thresholds given according to business experience by using the preset tuning hypothesis and the candidate parameter thresholds, and determine the parameter thresholds for grade evaluation according to the tuned thresholds. The experience-based thresholds are thus tuned against the actual data distribution, which improves the accuracy of the parameter thresholds for grade evaluation and, in turn, the accuracy of data grade evaluation.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a parameter threshold determination method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another parameter threshold determination method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a parameter threshold determining apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another parameter threshold determining apparatus according to an embodiment of the present invention;
FIG. 5 is a physical structure diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
As described in the Background, the parameter thresholds for rating are currently determined from the business experience of a business party or an expert; that is, the thresholds they give based on experience are used directly as the rating thresholds. However, when that business experience is insufficient or the scenario is new, ratings based on such thresholds may not match the actual data, so both the rating thresholds and the resulting data ratings are less accurate.
In order to solve the above problem, an embodiment of the present invention provides a method for determining a parameter threshold, as shown in fig. 1, where the method includes:
101. Determining candidate parameter thresholds of each parameter according to the data distribution of the sample data on each parameter.
For example, the sample data may be spending data from a bank card fraud scenario, with parameter a being the consumption amount and parameter b being the number of transactions. The spending data follow some distribution over these two parameters. For example, of 100 spending records, 10 may have more than 10 transactions and 90 may have fewer than 10; 20 may have a consumption amount above 50,000 yuan and 80 below 50,000 yuan; and so on. As another example, the sample data may be human body attribute data, with parameter a being body weight and parameter b being height. The data follow some distribution over weight and height: of 100 records, 80 may have a weight above 100 jin (50 kg) and 20 below 100 jin, 70 may have a height above 1.6 m and 20 below 1.6 m, and so on.
In the embodiment of the present invention, a statistic line graph of each parameter may be drawn according to the data distribution on that parameter, and each inflection point in the statistic line graph is then taken as a candidate parameter threshold. The candidate parameter thresholds determined for each parameter may form a set; for example, those determined for parameter a may include $\{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n\}$, and those determined for parameter b may include $\{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_m\}$.
102. Tuning the parameter thresholds given according to business experience by using a preset tuning hypothesis and the candidate parameter thresholds.
The preset tuning hypothesis may be set according to the specific application scenario or actual requirements, and the embodiment of the present invention is not limited in this respect. Specifically, the tuning hypothesis may be set according to the number of grades and the number of parameters of the data. For example, the preset tuning hypotheses may include:
Hypothesis 1: if there are p grades to be evaluated, then $N_p : N_{p-1} \approx N_{p-1} : N_{p-2} \approx \ldots \approx N_2 : N_1$, where $N_p$ denotes the amount of sample data contained in the p-th grade;
Hypothesis 2: if there are h parameters, the tuned parameter thresholds of each parameter at every grade satisfy $R_{1p} \approx R_{2p} \approx \ldots \approx R_{hp}$, where $R_{hp}$ denotes the percentile of the parameter threshold of the p-th grade of the h-th parameter;
Hypothesis 3: the sample proportion corresponding to the tuned parameter threshold combination is approximately equal to the sample proportion before tuning.
103. Determining the parameter thresholds for grade evaluation according to the tuned parameter thresholds.
For example, in the bank card fraud scenario, if the tuned consumption amount threshold for the high risk level is 48,525 yuan and the tuned transaction-count threshold for the high risk level is 10, then 48,525 yuan and 10 transactions are determined as the parameter thresholds for evaluating the high risk level; that is, when the consumption amount is greater than 48,525 yuan and the number of transactions is greater than 10, the record is rated as high risk. Compared with the thresholds currently given for the high risk level according to business experience (a consumption amount of 50,000 yuan and 10 transactions), the thresholds in the embodiment of the invention better match how the spending data are actually distributed over consumption amount and number of transactions.
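The high-risk rule in this example can be sketched as a simple predicate. This is only an illustration: the threshold values come from the example above, while the function and constant names are hypothetical.

```python
# Thresholds from the bank card example in the text; the names are illustrative.
HIGH_RISK_AMOUNT_YUAN = 48525
HIGH_RISK_TXN_COUNT = 10


def is_high_risk(amount_yuan: float, txn_count: int) -> bool:
    """Return True when both parameters exceed their high-risk thresholds."""
    return amount_yuan > HIGH_RISK_AMOUNT_YUAN and txn_count > HIGH_RISK_TXN_COUNT
```

Both comparisons are strict, matching the "greater than" wording of the example, so a record with exactly 10 transactions is not rated high risk by this sketch.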
Whereas the parameter thresholds for grade evaluation are currently determined from the business experience of a business party or an expert, the parameter threshold determination method provided by the embodiment of the invention determines candidate parameter thresholds of each parameter according to the data distribution of the sample data on each parameter, tunes the parameter thresholds given according to business experience by using the preset tuning hypothesis and the candidate parameter thresholds, and determines the parameter thresholds for grade evaluation according to the tuned thresholds. The experience-based thresholds are thus tuned against the actual data distribution, which improves the accuracy of the parameter thresholds for grade evaluation and, in turn, the accuracy of data grade evaluation.
Further, in order to better describe the process of determining the parameter threshold, as a refinement and an extension of the foregoing embodiment, an embodiment of the present invention provides another method for determining the parameter threshold, as shown in fig. 2, where the method includes:
201. Calculating statistics of the sample data on each parameter according to the data distribution of the sample data on each parameter.
The statistics may be one or more of quantiles, intra-group dispersion, or inter-group distance. For example, if the statistic is the quantile, the 0th to 100th quantiles of each parameter can be calculated directly. If the statistic is the intra-group dispersion or the inter-group distance, the sample data are first divided into two groups at a threshold, and the intra-group dispersion of the two groups (using an intra-group dispersion formula) or the distance between the two groups is calculated; the threshold is then moved by a fixed step, and the intra-group dispersion or inter-group distance is recalculated after each move, yielding the intra-group dispersion or inter-group distance at every threshold position.
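A minimal sketch of step 201 follows, using only the Python standard library. The text does not give the dispersion or distance formulas, so the sketch assumes summed within-group population variance for intra-group dispersion and the absolute difference of group means for inter-group distance; the function names and the linear-interpolation percentile are likewise assumptions.

```python
from statistics import mean, pvariance


def percentile(values, q):
    """q-th percentile (0..100) by linear interpolation between sorted values."""
    s = sorted(values)
    if not s:
        raise ValueError("values must be non-empty")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def quantile_curve(values):
    """Percentiles 0..100 of one parameter as (parameter value, percentile) points."""
    return [(percentile(values, q), q) for q in range(101)]


def split_statistics(values, start, stop, step):
    """Slide a threshold from start to stop in increments of step.

    At each position the data are split into two groups and two statistics
    are computed. Both definitions are assumptions, not taken from the text:
    dispersion = sum of the two population variances,
    distance   = absolute difference of the two group means.
    """
    rows = []
    t = start
    while t <= stop:
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if low and high:  # skip positions where one group is empty
            dispersion = pvariance(low) + pvariance(high)
            distance = abs(mean(high) - mean(low))
            rows.append((t, dispersion, distance))
        t += step
    return rows
```

Each returned list can be plotted directly as the statistic line graph of the next step, with the parameter value or threshold position on the abscissa.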
202. Determining a statistic line graph corresponding to each parameter according to the statistics.
In the embodiment of the present invention, step 202 may specifically be: drawing a statistic line graph for each parameter according to the statistics, where the abscissa may be the parameter value and the ordinate may be the statistic. For example, if the statistic line graph is a quantile line graph, it can be drawn from the 0th to 100th quantiles calculated in step 201, with the value of parameter a on the abscissa and the quantile of parameter a on the ordinate. If the statistic line graph is an intra-group dispersion line graph, it can be drawn from the intra-group dispersion calculated in step 201; if it is an inter-group distance line graph, it can be drawn from the inter-group distance calculated in step 201.
203. Determining candidate parameter thresholds of each parameter according to the thresholds corresponding to the inflection points in the statistic line graph.
In the embodiment of the present invention, an inflection point in the statistic line graph may be a point at which the slopes before and after the point differ greatly, and a point at which the slope before is positive and the slope after is negative may be determined as a required inflection point. Specifically, when the statistic line graph is a quantile line graph, the inflection points can be found with a formula in $x_i$ and $y_i$ (formula image GDA0004094978430000061, not reproduced in this text), where $x_i$ denotes an abscissa value and $y_i$ an ordinate value of the quantile line graph. The candidate parameter thresholds of parameter a determined in step 203 may include $\{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n\}$, and those of parameter b may include $\{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_m\}$.
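Because the inflection-point formula itself is not available in this text, the following is only a sketch of the verbal criterion (a large difference between the slope before and the slope after a point). The cutoff `min_slope_change` and the function name are hypothetical and are not taken from the patent.

```python
def find_inflection_points(points, min_slope_change):
    """Return x-values where the slope of a polyline changes sharply.

    points: (x, y) pairs sorted by strictly increasing x.
    min_slope_change: hypothetical cutoff on |slope_after - slope_before|;
    the patent's own formula is not reproduced here.
    """
    candidates = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        slope_before = (y1 - y0) / (x1 - x0)
        slope_after = (y2 - y1) / (x2 - x1)
        if abs(slope_after - slope_before) >= min_slope_change:
            candidates.append(x1)
    return candidates
```

The x-values returned play the role of candidate parameter thresholds. Repeated x-values, which a quantile curve over discrete data can contain, would need to be deduplicated first to avoid division by zero.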
204. Tuning the parameter thresholds given according to business experience by using a preset tuning hypothesis and the candidate parameter thresholds.
The preset tuning hypotheses include:
Hypothesis 1: if there are p grades to be evaluated, then $N_p : N_{p-1} \approx N_{p-1} : N_{p-2} \approx \ldots \approx N_2 : N_1$, where $N_p$ denotes the amount of sample data contained in the p-th grade;
Hypothesis 2: if there are h parameters, the tuned parameter thresholds of each parameter at every grade satisfy $R_{1p} \approx R_{2p} \approx \ldots \approx R_{hp}$, where $R_{hp}$ denotes the percentile of the parameter threshold of the p-th grade of the h-th parameter;
Hypothesis 3: the sample proportion corresponding to the tuned parameter threshold combination is approximately equal to the sample proportion before tuning.
For example, in the bank card fraud scenario there are 3 risk levels to be evaluated (high, medium and low), and the sample data have 2 parameters (consumption amount and number of transactions). The tuning hypotheses may then be set as:
Hypothesis 1: sample data amount of the low risk level : sample data amount of the medium risk level ≈ sample data amount of the medium risk level : sample data amount of the high risk level; or sample proportion of the low risk level : sample proportion of the medium risk level ≈ sample proportion of the medium risk level : sample proportion of the high risk level;
Hypothesis 2: the percentile of the consumption amount threshold is approximately equal to the percentile of the transaction-count threshold;
Hypothesis 3: the sample proportion corresponding to the combination of the tuned consumption amount threshold and the tuned transaction-count threshold is approximately equal to the sample proportion before tuning.
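The first two hypotheses can be read as approximate-equality checks. The sketch below is one possible reading; the tolerances `rel_tol` and `abs_tol` and the function names are assumptions, since the text does not say how close "approximately equal" must be.

```python
def ratios_roughly_equal(level_sizes, rel_tol=0.2):
    """Hypothesis 1 check: consecutive ratios of level sizes are about equal.

    level_sizes: positive sample counts of consecutive levels (at least two).
    rel_tol: assumed relative tolerance on the spread of the ratios.
    """
    ratios = [level_sizes[k] / level_sizes[k - 1]
              for k in range(1, len(level_sizes))]
    return max(ratios) - min(ratios) <= rel_tol * min(ratios)


def percentiles_roughly_equal(percentiles, abs_tol=5.0):
    """Hypothesis 2 check for one level: the thresholds of all parameters
    sit at about the same percentile. abs_tol is an assumed tolerance."""
    return max(percentiles) - min(percentiles) <= abs_tol
```

For instance, level sizes of 64, 16 and 4 give consecutive ratios of 0.25 and 0.25, which satisfies the first check, while 50, 40 and 10 give 0.8 and 0.25, which does not.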
For the embodiment of the present invention, the step 204 may specifically include:
1. Calculating the ratio of the amount of sample data satisfying the combination of the given parameter thresholds to the total amount of sample data, as the sample proportion before tuning.
The sample proportion H is expressed as follows:
$$H = \frac{\text{amount of sample data corresponding to the parameter threshold combination}}{\text{total amount of sample data}}$$
For example, suppose the sample data are 100 spending records, the given high-risk consumption amount threshold is $x_1$ = 50,000 yuan and the given high-risk transaction-count threshold is $y_1$ = 10. If 20 records have a consumption amount greater than 50,000 yuan and more than 10 transactions, the sample proportion before tuning is 20/100 = 0.2.
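The sample proportion can be computed with a few lines of Python. This sketch assumes each record is an (amount, transaction count) pair and that a record counts toward the combination only when both values are strictly greater than their thresholds; the function name is hypothetical.

```python
def sample_proportion(records, amount_threshold, count_threshold):
    """Share of records whose amount and transaction count both exceed
    the given thresholds. records: iterable of (amount, count) pairs."""
    records = list(records)
    if not records:
        raise ValueError("records must be non-empty")
    hits = sum(1 for amount, count in records
               if amount > amount_threshold and count > count_threshold)
    return hits / len(records)


# Mirrors the example above with made-up records: 20 of 100 exceed both thresholds.
example_records = [(60000, 12)] * 20 + [(1000, 2)] * 80
proportion_before_tuning = sample_proportion(example_records, 50000, 10)
```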
2. According to hypothesis 3, hypothesis 2 and the combined threshold partition table corresponding to the candidate parameter thresholds, selecting, as the first group of tuned parameter thresholds, a candidate parameter threshold combination whose sample proportion is approximately equal to the sample proportion before tuning and whose candidate parameter thresholds lie at approximately equal percentiles.
In the embodiment of the present invention, step 2 specifically includes: determining the combined threshold partition table corresponding to the candidate parameter thresholds, where the table stores the correspondence between each candidate parameter threshold and its percentile, the sample proportion corresponding to each candidate parameter threshold combination of the parameters, and the mapping between the correspondences and the sample proportions; searching the table, according to hypothesis 3, for sample proportions approximately equal to the sample proportion before tuning and for the first correspondences mapped to those sample proportions; and, according to hypothesis 2, searching the first correspondences for candidate parameter thresholds of the parameters whose percentiles are approximately equal, and determining the first group of tuned parameter thresholds from the candidate parameter thresholds found.
In addition, determining the combined threshold partition table corresponding to the candidate parameter thresholds may specifically include: determining the candidate parameter threshold combinations of the parameters according to the candidate parameter thresholds; calculating, for each candidate parameter threshold combination, the ratio of the corresponding amount of sample data to the total amount of sample data as the sample proportion of that combination; establishing the correspondence between each candidate parameter threshold and its percentile, and the mapping between the correspondences and the sample proportions; and constructing the combined threshold partition table according to the correspondences and the mapping.
For example, suppose the candidate parameter thresholds of parameter a are $\{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n\}$ and those of parameter b are $\{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_m\}$. They can be combined into an n × m candidate parameter threshold combination matrix:
$$\begin{pmatrix} (\hat{x}_1, \hat{y}_1) & \cdots & (\hat{x}_1, \hat{y}_m) \\ \vdots & \ddots & \vdots \\ (\hat{x}_n, \hat{y}_1) & \cdots & (\hat{x}_n, \hat{y}_m) \end{pmatrix}$$
For each element (candidate parameter threshold combination) of the combination matrix, a corresponding sample proportion can be calculated, namely the ratio of the amount of sample data for which both parameters are greater than the corresponding candidate parameter thresholds to the total amount of sample data:
$$H_{ij} = \frac{N(a > \hat{x}_i \text{ and } b > \hat{y}_j)}{N_{\text{total}}}, \quad i = 1, \ldots, n, \; j = 1, \ldots, m$$
where $N(\cdot)$ counts the sample records satisfying the condition and $N_{\text{total}}$ is the total amount of sample data.
The correspondence between a candidate parameter threshold of parameter a and its percentile may be $\hat{x}_i \leftrightarrow r_i$ ($i = 1, \ldots, n$); the correspondence between a candidate parameter threshold of parameter b and its percentile may be $\hat{y}_j \leftrightarrow r_j$ ($j = 1, \ldots, m$).
The mapping between the correspondences and the sample proportions may be: $H_{11}$ maps to the correspondences of $\hat{x}_1$ and $\hat{y}_1$; …; $H_{ij}$ maps to the correspondences of $\hat{x}_i$ and $\hat{y}_j$; …; $H_{nm}$ maps to the correspondences of $\hat{x}_n$ and $\hat{y}_m$. The resulting combined threshold partition table may be as shown in Table 1:
TABLE 1
(Table image GDA0004094978430000099 is not reproduced in this text. As described above, it records each candidate parameter threshold with its percentile and the sample proportion $H_{ij}$ of each candidate parameter threshold combination.)
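One way to build such a table is sketched below. The row layout (threshold of a, its percentile, threshold of b, its percentile, sample proportion) and the percentile-rank definition (share of values less than or equal to the threshold) are assumptions made for illustration; the layout of Table 1 itself is not available in this text.

```python
from bisect import bisect_right


def percentile_rank(sorted_values, x):
    """Percentage of values that are less than or equal to x."""
    return 100.0 * bisect_right(sorted_values, x) / len(sorted_values)


def build_partition_table(records, cand_a, cand_b):
    """Rows of (a_thr, a_pct, b_thr, b_pct, proportion) for every candidate pair.

    records: list of (a, b) pairs; cand_a, cand_b: candidate thresholds.
    proportion is the share of records with a > a_thr and b > b_thr.
    """
    a_sorted = sorted(a for a, _ in records)
    b_sorted = sorted(b for _, b in records)
    total = len(records)
    table = []
    for xa in cand_a:
        for yb in cand_b:
            hits = sum(1 for a, b in records if a > xa and b > yb)
            table.append((xa, percentile_rank(a_sorted, xa),
                          yb, percentile_rank(b_sorted, yb),
                          hits / total))
    return table
```

The table has one row per element of the n × m combination matrix, so its size grows with the product of the two candidate counts.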
Thus, in step 2, sample proportions approximately equal to the sample proportion H before tuning can be found from Table 1 according to hypothesis 3, for example $H_{i-1,j}$, $H_{ij}$ and $H_{i,j-1}$. From $H_{i-1,j}$, the correspondences $\hat{x}_{i-1} \leftrightarrow r_{i-1}$ and $\hat{y}_j \leftrightarrow r_j$ can be found; from $H_{ij}$, the correspondences $\hat{x}_i \leftrightarrow r_i$ and $\hat{y}_j \leftrightarrow r_j$; and from $H_{i,j-1}$, the correspondences $\hat{x}_i \leftrightarrow r_i$ and $\hat{y}_{j-1} \leftrightarrow r_{j-1}$. If, according to hypothesis 2, $r_i$ and $r_j$ are found to be relatively close among these correspondences, then $\hat{x}_i$ is the candidate parameter threshold of parameter a being sought and $\hat{y}_j$ is the candidate parameter threshold of parameter b being sought. Finally, $(\hat{x}_i, \hat{y}_j)$ is the first group of candidate thresholds after tuning, that is, one of the results of tuning the given parameter thresholds $(x_1, y_1)$. In the bank card fraud scenario, $(\hat{x}_i, \hat{y}_j)$ may be the consumption amount threshold and the transaction-count threshold corresponding to the medium risk level.
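The two-stage lookup in this step can be sketched as a filter followed by a minimum. The row layout (threshold of a, its percentile, threshold of b, its percentile, sample proportion), the tolerance `proportion_tol`, and the numbers in the example table are all hypothetical.

```python
def select_first_group(table, target_proportion, proportion_tol=0.05):
    """Pick the first tuned threshold pair from a partition table.

    table rows: (a_thr, a_pct, b_thr, b_pct, proportion).
    Hypothesis 3: keep rows whose proportion is close to the pre-tuning one.
    Hypothesis 2: among those, take the row whose two percentiles are closest.
    Returns None when no row passes the proportion filter.
    """
    close = [row for row in table
             if abs(row[4] - target_proportion) <= proportion_tol]
    if not close:
        return None
    return min(close, key=lambda row: abs(row[1] - row[3]))


# Made-up table for illustration only.
example_table = [
    (48000, 78.0, 9, 88.0, 0.22),
    (48525, 80.0, 10, 81.0, 0.19),
    (52000, 85.0, 12, 95.0, 0.10),
]
first_group = select_first_group(example_table, 0.2)
```

With the made-up table, the first two rows pass the proportion filter and the second row wins because its percentiles differ by 1 rather than 10.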
3. And determining other groups of adjusted parameter thresholds according to the adjusted first group of parameter thresholds, the combined threshold partition table and the hypothesis 1.
For the embodiment of the present invention, the step 3 specifically includes: calculating sample ratios corresponding to other groups of parameter thresholds according to the sample ratio corresponding to the adjusted first group of parameter thresholds and the hypothesis 1; and searching a second corresponding relation corresponding to the sample proportion corresponding to the other group of parameter thresholds from the combined threshold division table, and determining the other group of parameter thresholds according to the second corresponding relation.
For example, let the sample proportion corresponding to the tuned first group of candidate thresholds be $P_{i-1}$ (here $P_{i-1}$ has the same meaning as $H_{ij}$ in step 2 above; to avoid confusion, $P_{i-1}$ is used to denote the sample proportion $H_{ij}$ corresponding to the tuned first group of candidate thresholds). In the embodiment of the present invention, a tuning inequality can be set according to hypothesis 1. Assuming that the sample proportion of the level corresponding to the next group of parameter thresholds is $P_i$, the tuning inequality can satisfy:
Figure GDA0004094978430000101
$P_i$ can be calculated from this inequality, and the combined threshold partition table is then consulted again with $P_i$ to find the corresponding sample proportion. The candidate parameter thresholds of the two parameters can be selected according to the sample proportion found, and the next group of parameter thresholds and the other groups of parameter thresholds can be determined according to the selected candidate parameter thresholds. In a bank embezzlement application scenario, the consumption amount threshold and the consumption count threshold corresponding to the high risk level can be determined according to
Figure GDA0004094978430000102
for example
Figure GDA0004094978430000103
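The table lookup in step 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuning inequality itself is given as a formula image and is not reproduced here, so the sketch assumes that the caller supplies a ratio `level_ratio` between adjacent levels (in the spirit of hypothesis 1), and that each row of the combined threshold partition table is a dictionary with the hypothetical keys `thresholds` and `proportion`.

```python
def next_level_row(table, previous_proportion, level_ratio):
    """Return the table row whose sample proportion is closest to the target P_i.

    Assumption: P_i is approximated as previous_proportion * level_ratio.
    The patent derives P_i from its own tuning inequality instead.
    """
    target = previous_proportion * level_ratio
    return min(table, key=lambda row: abs(row["proportion"] - target))


# Hypothetical partition table: (amount threshold, count threshold) and sample proportion.
table = [
    {"thresholds": (500, 3), "proportion": 0.30},
    {"thresholds": (900, 5), "proportion": 0.12},
    {"thresholds": (1500, 8), "proportion": 0.03},
]

# Starting from the tuned first group (proportion 0.12), look up the next level.
high_risk = next_level_row(table, previous_proportion=0.12, level_ratio=0.25)
print(high_risk["thresholds"])
```

Choosing the closest row rather than an exact match reflects the point that sample proportions are rarely exactly equal in practice.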
It should be noted that the embodiment of the present invention is illustrated with only 2 parameters; if there are more parameters, the above steps are iterated until every group of thresholds is obtained, which is not described again here. In addition, in parameter threshold tuning for practical application scenarios, the sample data size, the percentile, and the sample proportion are usually not integers but may be values with several decimal places, so the sample data sizes, the percentiles of the parameter thresholds, and the sample proportions before and after tuning at different levels can hardly be exactly equal to each other.
205. Determine the parameter threshold of the grade according to the tuned parameter threshold.
Compared with the current practice of determining the parameter thresholds of grades according to the business experience of a business party or an expert, this further parameter threshold determination method provided by the embodiment of the invention can determine the candidate parameter thresholds of each parameter according to the data distribution of the sample data on each parameter. It can tune the parameter thresholds given according to business experience by using the preset tuning hypotheses and the candidate parameter thresholds, and determine the parameter thresholds for grade evaluation according to the tuned parameter thresholds. The thresholds given according to business experience are thus tuned on the basis of the actual data distribution, which improves the accuracy of the parameter thresholds used for grade evaluation and, in turn, the accuracy of data grade evaluation.
Further, as a specific implementation of fig. 1, an embodiment of the present invention provides a parameter threshold determining apparatus, as shown in fig. 3, where the apparatus includes: a determination unit 31 and a tuning unit 32.
The determining unit 31 may be configured to determine the candidate parameter threshold of each parameter according to the data distribution of sample data on each parameter. The determining unit 31 is the main functional module of the apparatus for determining the candidate parameter threshold of each parameter from this data distribution.
The tuning unit 32 may be configured to tune a parameter threshold given according to business experience by using a preset tuning hypothesis and the candidate parameter threshold. The tuning unit 32 is the main functional module, and also the core module, of the apparatus for tuning the parameter threshold given according to business experience.
The determining unit 31 may further be configured to determine the parameter threshold of the grade according to the tuned parameter threshold. The determining unit 31 is thus also the main functional module of the apparatus for determining the parameter threshold of the grade.
For the embodiment of the present invention, the determining unit 31 may be specifically configured to calculate statistics of the sample data on each parameter according to the data distribution of the sample data on each parameter, determine a statistic line graph corresponding to each parameter according to the statistics, and determine the candidate parameter thresholds of each parameter according to the threshold corresponding to each inflection point in the statistic line graph. The statistics may be one or more of quantiles, intra-group dispersion, or inter-group distance.
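One possible reading of the inflection-point step is sketched below in Python. It is an illustration only, under these assumptions: quantiles are used as the statistic, the "line graph" is the list of (percentile, value) points, and an inflection point is taken to be a point where the discrete second difference of the curve changes sign. The function names `quantile_curve` and `inflection_thresholds` are hypothetical and do not come from the patent.

```python
def quantile_curve(values, num_points=20):
    """Return (percentile, value) pairs at evenly spaced percentiles (nearest rank).

    Expects a non-empty list of values.
    """
    ordered = sorted(values)
    n = len(ordered)
    curve = []
    for k in range(1, num_points + 1):
        # Integer ceiling of k * n / num_points avoids floating-point rounding.
        rank = -(-k * n // num_points)
        curve.append((k / num_points, ordered[max(0, rank - 1)]))
    return curve


def inflection_thresholds(curve):
    """Return curve values where the discrete second difference changes sign."""
    ys = [y for _, y in curve]
    # second[k] is the second difference centred on ys[k + 1].
    second = [ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(1, len(ys) - 1)]
    found = []
    for k in range(1, len(second)):
        if second[k - 1] * second[k] < 0:  # curvature changes sign here
            found.append(ys[k + 1])
    return found


# Hypothetical consumption amounts for one parameter.
amounts = [20, 35, 50, 80, 120, 300, 320, 340, 900, 2500]
candidates = inflection_thresholds(quantile_curve(amounts, num_points=10))
print(candidates)
```

Real data is noisy, so a practical version would probably smooth the curve or ignore very small sign changes before treating a point as a candidate threshold.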
For the embodiment of the present invention, the preset tuning hypothesis includes:
Hypothesis 1: if the grade to be evaluated has p levels, then $N_p : N_{p-1} \approx N_{p-1} : N_{p-2} \approx \dots \approx N_2 : N_1$, where $N_p$ denotes the sample data size contained in the p-th level;
Hypothesis 2: if the number of parameters is h, the tuned parameter thresholds of all levels of each parameter satisfy
Figure GDA0004094978430000111
where $R_{hp}$ denotes the percentile of the parameter threshold of the p-th level of the h-th parameter;
Hypothesis 3: the sample proportion corresponding to the tuned parameter threshold combination is approximately equal to the sample proportion before tuning.
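The three hypotheses can be expressed as simple numeric checks. The Python sketch below is a hedged illustration: the tolerances `rel_tol` and `abs_tol` are assumptions, since the text only says "approximately equal", and the formula of hypothesis 2 is given only as an image, so it is read here as "the percentiles of the thresholds of one level are approximately equal across parameters", which is how the description applies it.

```python
def ratios_roughly_equal(level_counts, rel_tol=0.2):
    """Hypothesis 1: ratios between adjacent level sizes are approximately equal.

    level_counts is [N_1, N_2, ..., N_p]; rel_tol is an assumed tolerance.
    """
    ratios = [level_counts[i + 1] / level_counts[i] for i in range(len(level_counts) - 1)]
    return max(ratios) - min(ratios) <= rel_tol * min(ratios)


def roughly_equal(values, abs_tol=0.05):
    """Hypotheses 2 and 3: a set of percentiles or proportions is approximately equal."""
    return max(values) - min(values) <= abs_tol


print(ratios_roughly_equal([1000, 100, 10]))   # sizes of levels 1..3
print(roughly_equal([0.90, 0.92]))             # percentiles of two parameters at one level
print(roughly_equal([0.11, 0.10]))             # proportion after vs. before tuning
```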
For the embodiment of the present invention, the tuning unit 32 may include: a calculation module 321, a selection module 322 and a determination module 323, as shown in fig. 4.
The calculating module 321 may be configured to calculate a ratio of the sample data size of the parameter threshold combination corresponding to the given parameter threshold to the total sample data size, as a sample proportion before tuning.
The selecting module 322 may be configured to select, according to hypothesis 3, hypothesis 2, and the combined threshold partition table corresponding to the candidate parameter thresholds, a candidate parameter threshold combination whose sample proportion is approximately equal to the sample proportion before tuning and whose candidate parameter thresholds lie at approximately equal percentiles, as the tuned first group of parameter thresholds.
The determining module 323 may be configured to determine other adjusted sets of parameter thresholds according to the adjusted first set of parameter thresholds, the combined threshold partition table, and the hypothesis 1.
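The ratio computed by the calculating module 321 can be sketched in Python as follows. The sketch assumes that a sample "belongs to" a parameter threshold combination when each of its values is at or above the corresponding threshold; this membership rule and the function name `sample_proportion` are assumptions made for illustration.

```python
def sample_proportion(samples, thresholds):
    """Share of samples whose every parameter value reaches its threshold."""
    hits = sum(
        1 for row in samples
        if all(value >= limit for value, limit in zip(row, thresholds))
    )
    return hits / len(samples)


# Hypothetical samples: (consumption amount, consumption count).
samples = [(100, 1), (500, 3), (900, 5), (1200, 8)]

# Sample proportion before tuning for the experience-based thresholds (500, 3).
before = sample_proportion(samples, (500, 3))
print(before)
```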
In a specific application scenario, the selecting module 322 may include: a determination sub-module 3221 and a lookup sub-module 3222.
The determining sub-module 3221 may be configured to determine a combined threshold partition table corresponding to the candidate parameter threshold, where the combined threshold partition table stores a correspondence between the candidate parameter threshold and a percentile where the candidate parameter threshold is located, a sample proportion corresponding to a candidate parameter threshold combination of each parameter, and a mapping relationship between the correspondence and the sample proportion.
The searching sub-module 3222 may be configured to look up, in the combined threshold partition table and according to hypothesis 3, the sample proportions approximately equal to the sample proportion before tuning, together with the first correspondence associated with the sample proportions found.
The determining sub-module 3221 may further be configured to look up, in the first correspondence and according to hypothesis 2, the candidate parameter thresholds of the parameters whose percentiles are approximately equal to each other, and to determine the tuned first group of parameter thresholds according to the candidate parameter thresholds found.
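The two-stage lookup performed by these sub-modules can be sketched in Python as follows. The row layout (keys `thresholds`, `percentiles`, `proportion`), the tolerance `tol`, and the rule of picking the row with the smallest percentile spread are assumptions for illustration; the text only requires approximate equality.

```python
def select_first_group(table, proportion_before, tol=0.05):
    """Pick the tuned first group of thresholds from a partition table.

    Stage 1 (hypothesis 3): keep rows whose proportion is close to proportion_before.
    Stage 2 (hypothesis 2): among them, prefer the row whose percentiles are closest
    to each other. Returns None when no row passes stage 1.
    """
    near = [row for row in table if abs(row["proportion"] - proportion_before) <= tol]
    if not near:
        return None
    best = min(near, key=lambda row: max(row["percentiles"]) - min(row["percentiles"]))
    return best["thresholds"]


# Hypothetical partition table rows for two parameters.
table = [
    {"thresholds": (500, 3), "percentiles": (0.50, 0.50), "proportion": 0.30},
    {"thresholds": (900, 3), "percentiles": (0.90, 0.50), "proportion": 0.12},
    {"thresholds": (800, 5), "percentiles": (0.85, 0.86), "proportion": 0.11},
]
print(select_first_group(table, proportion_before=0.10))
```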
For the embodiment of the present invention, the determining unit 31 may be specifically configured to calculate sample ratios corresponding to other sets of parameter thresholds according to the sample ratio corresponding to the tuned first set of parameter thresholds and the assumption 1; and searching a second corresponding relation corresponding to the sample proportion corresponding to the other groups of parameter thresholds from the combined threshold division table, and determining the other groups of parameter thresholds according to the second corresponding relation.
In addition, in order to determine the combination threshold partition table corresponding to the candidate parameter threshold, the determining sub-module 3221 may be specifically configured to determine a candidate parameter threshold combination of each parameter according to the candidate parameter threshold; calculating the ratio of the sample data size corresponding to the candidate parameter threshold combination of each parameter to the total sample data size as the sample proportion corresponding to the candidate parameter threshold combination of each parameter; establishing a corresponding relation between the candidate parameter threshold value and the percentile where the candidate parameter threshold value is located, and a mapping relation between the corresponding relation and the sample proportion; and constructing a combined threshold dividing table corresponding to the candidate parameter threshold according to the corresponding relation and the mapping relation.
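The construction of the combined threshold partition table described above can be sketched in Python. This is a minimal sketch under stated assumptions: the percentile of a threshold is taken as the share of samples at or below it, a sample counts toward a combination when every value reaches its threshold, and the dictionary layout of each row is hypothetical.

```python
from itertools import product


def percentile_of(values, threshold):
    """Share of values at or below the threshold (assumed percentile definition)."""
    return sum(1 for v in values if v <= threshold) / len(values)


def build_partition_table(samples, candidates):
    """Build one row per combination of candidate thresholds.

    samples: list of tuples, one value per parameter.
    candidates: one list of candidate thresholds per parameter.
    """
    columns = list(zip(*samples))  # values grouped by parameter
    table = []
    for combo in product(*candidates):
        percentiles = tuple(percentile_of(col, t) for col, t in zip(columns, combo))
        hits = sum(1 for row in samples if all(v >= t for v, t in zip(row, combo)))
        table.append({
            "thresholds": combo,                 # candidate threshold combination
            "percentiles": percentiles,          # threshold -> percentile correspondence
            "proportion": hits / len(samples),   # sample proportion of the combination
        })
    return table


samples = [(100, 1), (500, 3), (900, 5), (1200, 8)]
for row in build_partition_table(samples, [[500, 900], [3, 5]]):
    print(row)
```

Keeping the percentiles and the proportion in the same row plays the role of the mapping relation between the correspondence and the sample proportion.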
It should be noted that other corresponding descriptions of the functional modules related to the parameter threshold determining apparatus provided in the embodiment of the present invention may refer to the corresponding description of the method shown in fig. 1, and are not described herein again.
Based on the method shown in fig. 1, correspondingly, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps: determining candidate parameter thresholds of all parameters according to data distribution of sample data on all parameters; adjusting the parameter threshold value given according to the business experience by using a preset adjusting hypothesis and the candidate parameter threshold value; and determining the parameter threshold value of the grade according to the adjusted parameter threshold value.
Based on the above embodiments of the method shown in fig. 1 and the parameter threshold determining apparatus shown in fig. 3, an embodiment of the present invention further provides a schematic diagram of the physical structure of a computer device. As shown in fig. 5, the device includes: a processor 41, a memory 42, and a computer program stored on the memory 42 and executable on the processor 41, wherein the memory 42 and the processor 41 are both arranged on a bus 43, such that when the processor 41 executes the program, the following steps are performed: determining candidate parameter thresholds of all parameters according to data distribution of sample data on all parameters; tuning the parameter threshold given according to business experience by using a preset tuning hypothesis and the candidate parameter threshold; and determining the parameter threshold of the grade according to the tuned parameter threshold.
According to the technical scheme, the candidate parameter threshold of each parameter can be determined according to the data distribution of the sample data on each parameter; the parameter threshold given according to the business experience can be tuned by utilizing the preset tuning hypothesis and the candidate parameter threshold, and the parameter threshold for grade evaluation is determined according to the tuned parameter threshold, so that the parameter threshold given according to the business experience can be tuned based on the actual condition of data distribution, the accuracy of the parameter threshold for grade evaluation is improved, and the accuracy of the data grade evaluation can be improved.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method for determining a parameter threshold, comprising:
determining candidate parameter threshold values of all parameters according to data distribution of sample data on all parameters;
and utilizing a preset tuning hypothesis and the candidate parameter threshold value to tune a parameter threshold value given according to business experience, wherein the preset tuning hypothesis comprises:
hypothesis 1: if the grade to be evaluated has p levels, $N_p : N_{p-1} \approx N_{p-1} : N_{p-2} \approx \dots \approx N_2 : N_1$, where $N_p$ denotes the sample data size contained in the p-th level;
hypothesis 2: if the number of parameters is h, the tuned parameter thresholds of all levels of each parameter satisfy
Figure FDA0004094978420000011
where $R_{hp}$ denotes the percentile of the parameter threshold of the p-th level of the h-th parameter;
hypothesis 3: the sample proportion corresponding to the tuned parameter threshold combination is approximately equal to the sample proportion before tuning;
wherein, the tuning the parameter threshold given according to the business experience by using the preset tuning hypothesis and the candidate parameter threshold comprises:
calculating the ratio of the sample data size of the parameter threshold combination corresponding to the given parameter threshold to the total sample data size as the sample proportion before tuning;
selecting, according to hypothesis 3, hypothesis 2, and the combined threshold partition table corresponding to the candidate parameter thresholds, a candidate parameter threshold combination whose sample proportion is approximately equal to the sample proportion before tuning and whose candidate parameter thresholds lie at approximately equal percentiles, as a first group of parameter thresholds after tuning;
determining other groups of parameter thresholds after tuning according to the first group of parameter thresholds after tuning, the combined threshold partition table and the hypothesis 1;
and determining the parameter threshold value of the grade according to the adjusted parameter threshold value.
2. The method according to claim 1, wherein the determining of the candidate parameter threshold of each parameter according to the data distribution of the sample data on each parameter comprises:
calculating statistics of the sample data on each parameter according to data distribution of the sample data on each parameter;
determining a statistic line graph corresponding to each parameter according to the statistic;
and determining a candidate parameter threshold of each parameter according to the threshold corresponding to each inflection point in the statistic line graph.
3. The method according to claim 1, wherein the selecting, according to hypothesis 3, hypothesis 2, and the combined threshold partition table corresponding to the candidate parameter thresholds, of a candidate parameter threshold combination whose sample proportion is approximately equal to the sample proportion before tuning and whose candidate parameter thresholds lie at approximately equal percentiles, as the first group of parameter thresholds after tuning, comprises:
determining a combined threshold dividing table corresponding to the candidate parameter threshold, wherein the combined threshold dividing table stores a corresponding relation between the candidate parameter threshold and the percentile where the candidate parameter threshold is located, a sample proportion corresponding to the candidate parameter threshold combination of each parameter, and a mapping relation between the corresponding relation and the sample proportion;
looking up, in the combined threshold partition table and according to hypothesis 3, the sample proportions approximately equal to the sample proportion before tuning, together with a first correspondence associated with the sample proportions found;
and looking up, in the first correspondence and according to hypothesis 2, the candidate parameter thresholds of the parameters whose percentiles are approximately equal to each other, and determining the tuned first group of parameter thresholds according to the candidate parameter thresholds found.
4. The method of claim 3, wherein determining tuned other sets of parameter thresholds according to the tuned first set of parameter thresholds, the combined threshold partition table, and the hypothesis 1 comprises:
calculating sample ratios corresponding to other groups of parameter thresholds according to the sample ratios corresponding to the adjusted first group of parameter thresholds and the hypothesis 1;
and searching a second corresponding relation corresponding to the sample proportion corresponding to the other group of parameter thresholds from the combined threshold division table, and determining the other group of parameter thresholds according to the second corresponding relation.
5. The method of claim 3, wherein determining the combined threshold partition table corresponding to the candidate parameter threshold comprises:
determining a candidate parameter threshold combination of each parameter according to the candidate parameter threshold;
calculating the ratio of the sample data size corresponding to the candidate parameter threshold combination of each parameter to the total sample data size as the sample proportion corresponding to the candidate parameter threshold combination of each parameter;
establishing a corresponding relation between the candidate parameter threshold value and the percentile where the candidate parameter threshold value is located, and a mapping relation between the corresponding relation and the sample proportion;
and constructing a combined threshold dividing table corresponding to the candidate parameter threshold according to the corresponding relation and the mapping relation.
6. An apparatus for parameter threshold determination, comprising:
the determining unit is used for determining candidate parameter threshold values of all parameters according to data distribution of sample data on all parameters;
the tuning unit is used for tuning a parameter threshold value given according to business experience by using a preset tuning hypothesis and the candidate parameter threshold value, wherein the preset tuning hypothesis comprises:
hypothesis 1: if the grade to be evaluated has p levels, $N_p : N_{p-1} \approx N_{p-1} : N_{p-2} \approx \dots \approx N_2 : N_1$, where $N_p$ denotes the sample data size contained in the p-th level;
hypothesis 2: if the number of parameters is h, the tuned parameter thresholds of all levels of each parameter satisfy
Figure FDA0004094978420000031
where $R_{hp}$ denotes the percentile of the parameter threshold of the p-th level of the h-th parameter;
hypothesis 3: the sample proportion corresponding to the tuned parameter threshold combination is approximately equal to the sample proportion before tuning;
the tuning of the parameter threshold value given according to the business experience by using the preset tuning hypothesis and the candidate parameter threshold value includes:
calculating the ratio of the sample data size of the parameter threshold combination corresponding to the given parameter threshold to the total sample data size as the sample proportion before tuning;
selecting, according to hypothesis 3, hypothesis 2, and the combined threshold partition table corresponding to the candidate parameter thresholds, a candidate parameter threshold combination whose sample proportion is approximately equal to the sample proportion before tuning and whose candidate parameter thresholds lie at approximately equal percentiles, as a first group of parameter thresholds after tuning;
determining other groups of adjusted and optimized parameter thresholds according to the adjusted and optimized first group of parameter thresholds, the combined threshold partition table and the hypothesis 1;
and the determining unit is used for determining the parameter threshold value of the grade according to the adjusted parameter threshold value.
7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
8. A computer arrangement comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program realizes the steps of the method of any of claims 1 to 5 when executed by the processor.
CN201810797054.3A 2018-07-19 2018-07-19 Parameter threshold determining method and device and computer storage medium Active CN109272340B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810797054.3A CN109272340B (en) 2018-07-19 2018-07-19 Parameter threshold determining method and device and computer storage medium
PCT/CN2018/111123 WO2020015216A1 (en) 2018-07-19 2018-10-21 Method and device for determining parameter threshold, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810797054.3A CN109272340B (en) 2018-07-19 2018-07-19 Parameter threshold determining method and device and computer storage medium

Publications (2)

Publication Number Publication Date
CN109272340A CN109272340A (en) 2019-01-25
CN109272340B true CN109272340B (en) 2023-04-18

Family

ID=65152893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810797054.3A Active CN109272340B (en) 2018-07-19 2018-07-19 Parameter threshold determining method and device and computer storage medium

Country Status (2)

Country Link
CN (1) CN109272340B (en)
WO (1) WO2020015216A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178595B (en) * 2019-12-11 2023-03-24 深圳平安医疗健康科技服务有限公司 Project control parameter generation method and device, computer equipment and storage medium
CN112258093B (en) * 2020-11-25 2024-06-21 京东城市(北京)数字科技有限公司 Data processing method and device for risk level, storage medium and electronic equipment
CN114565231B (en) * 2022-02-07 2024-07-12 三一汽车制造有限公司 Method, device, equipment, storage medium and working machine for determining working party quantity

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316204A (en) * 2017-05-27 2017-11-03 银联智惠信息服务(上海)有限公司 Recognize humanized method, device, computer-readable medium and the system of holding
CN107944156A (en) * 2017-11-29 2018-04-20 中国海洋大学 The choosing method of wave height threshold value

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096226B (en) * 2016-05-27 2018-12-11 腾讯科技(深圳)有限公司 A kind of data assessment method, apparatus and server
JP6703264B2 (en) * 2016-06-22 2020-06-03 富士通株式会社 Machine learning management program, machine learning management method, and machine learning management device
CN107316205A (en) * 2017-05-27 2017-11-03 银联智惠信息服务(上海)有限公司 Recognize humanized method, device, computer-readable medium and the system of holding

Also Published As

Publication number Publication date
WO2020015216A1 (en) 2020-01-23
CN109272340A (en) 2019-01-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant