CN108346098B - Method and device for mining wind control rule - Google Patents


Publication number
CN108346098B
Authority
CN
China
Prior art keywords
type
feature
sample
learning
variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810053792.7A
Other languages
Chinese (zh)
Other versions
CN108346098A (en)
Inventor
孙清清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201810053792.7A priority Critical patent/CN108346098B/en
Publication of CN108346098A publication Critical patent/CN108346098A/en
Application granted granted Critical
Publication of CN108346098B publication Critical patent/CN108346098B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Physiology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for mining risk control rules. For each preset feature type, the feature values of the learning samples corresponding to that feature type are determined and used as the variables of the feature type. The feature types and their variables are then screened by a genetic algorithm to determine the specified feature types and the specified variables of each specified feature type. Finally, a first-order rule learning algorithm is applied to the learning samples, the specified feature types, and their specified variables to generate the risk control rules.

Description

Method and device for mining risk control rules
Technical Field
The application relates to the field of information technology, and in particular to a method and a device for mining risk control rules.
Background
Money laundering is the act of legitimizing illegal gains. It mainly refers to using various means, typically through financial institutions, to disguise and conceal illegal gains and the income they generate so that they appear legitimate in form. It is a criminal activity.
At present, money laundering is usually carried out through financial institutions. As the first line of defense against money laundering, a financial institution therefore typically screens received transaction requests with configured identification rules, so that it can refuse to execute transactions suspected of money laundering, or, when a transaction is found to be suspect, retain the transaction data for subsequent investigation.
However, existing identification rules are usually set manually, based on experience or on historical data of confirmed money laundering transactions. Manually set rules are often inaccurate, so anti-money laundering based on them is inefficient and its identification accuracy is low.
A method for mining risk control rules is therefore needed, so that risk control rules for identifying money laundering can be mined as required.
Disclosure of Invention
The embodiments of this specification provide a method and a device for mining risk control rules, to solve the problem that existing manually set identification rules are often inaccurate, which makes anti-money laundering based on them inefficient and lowers identification accuracy.
The embodiments of this specification adopt the following technical solutions:
A method for mining risk control rules, comprising:
for each preset feature type, determining the feature value of each learning sample corresponding to that feature type as a variable of the feature type;
selecting, by a genetic algorithm, at least some of the feature types as specified feature types and, for each feature type, selecting at least some of its variables as specified variables of the feature type;
and generating risk control rules by a first-order rule learning algorithm according to the learning samples, the selected specified feature types, and the specified variables of each specified feature type.
A device for mining risk control rules, comprising:
a determining module, configured to determine, for each preset feature type, the feature value of each learning sample corresponding to that feature type as a variable of the feature type;
a selection module, configured to select, by a genetic algorithm, at least some of the feature types as specified feature types and, for each feature type, to select at least some of its variables as specified variables of the feature type;
and a generation module, configured to generate risk control rules by a first-order rule learning algorithm according to the learning samples, the selected specified feature types, and the specified variables of each specified feature type.
A server, comprising one or more processors and a memory, the memory storing a program configured to be executed by the one or more processors to perform the following steps:
for each preset feature type, determining the feature value of each learning sample corresponding to that feature type as a variable of the feature type;
selecting, by a genetic algorithm, at least some of the feature types as specified feature types and, for each feature type, selecting at least some of its variables as specified variables of the feature type;
and generating risk control rules by a first-order rule learning algorithm according to the learning samples, the selected specified feature types, and the specified variables of each specified feature type.
At least one of the technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects:
With the method and device provided in this specification, the variables of each preset feature type are first determined from the learning samples. The feature types and their variables are then screened by a genetic algorithm to determine the specified feature types and the specified variables of each specified feature type. Finally, risk control rules are generated by a first-order rule learning algorithm according to the learning samples, the specified feature types, and their specified variables. Because the specified feature types and specified variables that make up the risk control rules are obtained through screening and optimization by the feature selection algorithm, the rules generated from them identify money laundering more effectively. This avoids the shortcomings of manually set rules in the prior art and improves the efficiency and identification accuracy of anti-money laundering based on the risk control rules.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 shows a process for mining risk control rules provided by an embodiment of this specification;
FIG. 2 is a flowchart of the genetic algorithm optimization process provided by an embodiment of this specification;
FIGS. 3a to 3d are schematic diagrams of crossover operations provided by this specification;
FIG. 4 is a schematic diagram of generating risk control rules with a first-order rule learning algorithm according to an embodiment of this specification;
FIG. 5 is a schematic structural diagram of a device for mining risk control rules provided by an embodiment of this specification;
FIG. 6 is a schematic structural diagram of a server provided by an embodiment of this specification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person skilled in the art without making any inventive step based on the embodiments in the description belong to the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 1 shows a process for mining risk control rules provided by an embodiment of this specification, which may include the following steps:
S100: For each preset feature type, determine the feature value of each learning sample corresponding to that feature type as a variable of the feature type.
In one or more embodiments of this specification, risk control rule mining may be used to mine rules for controlling various types of risk, such as rules for preventing fraudulent transactions or rules for controlling loan transactions. For convenience, the following description takes as its example a mining process that generates risk control rules for identifying money laundering transactions. The mining process may be performed by any institution or department involved in anti-money laundering, such as a financial institution, a regulatory agency, or a law enforcement agency; for example, by a server of a bank or a terminal of an economic crime investigation agency.
For convenience, the following description takes a bank server performing the risk control rule mining as an example.
In addition, since money laundering is mainly carried out through services (e.g., transactions) executed at financial institutions, the risk control rules generated by the mining process in this specification may be rules that identify money laundering transactions from business data, where business data is the data a server needs to execute a received service request.
To make the generated risk control rules more accurate, the server may first determine the learning samples from historical data. Then, for each preset feature type, it determines the feature value of each learning sample corresponding to that feature type as a variable of the feature type, for use in the subsequent steps. Specifically, the server may take the business data of each of a number of historically executed transactions as a separate learning sample. The business data of a transaction may include: personal information of both parties (e.g., name, age, sex, address, contact details), the transaction amount, the Internet Protocol (IP) addresses of both parties when the transaction was executed, the country to which each IP address belongs, the administrative division to which each IP address belongs, and so on.
Further, for anti-money laundering purposes, a financial institution usually performs an anti-money laundering review of each transaction request (e.g., using preset risk control rules to identify whether it is money laundering) and adds the review result to the transaction data. Transactions identified as money laundering must also be reported to the international anti-money laundering organization, so that information about the persons involved, taken from the business data of the transaction, is added to the sanction list. Therefore, when determining learning samples from historical data, the server may use the review result recorded in the business data of each transaction: business data reviewed as a money laundering transaction is taken as a positive sample, and business data reviewed as not a money laundering transaction is taken as a negative sample, as shown in Table 1.
Table 1 is a schematic illustration of the learning samples determined by the server in an embodiment of this specification:
Sample ID | Business data                                             | Review result
001       | Initiator: user a; Recipient: user f; IP country: US; …… | P
002       | Initiator: user b; Recipient: user e; IP country: RU; …… | P
003       | Initiator: user c; Recipient: user h; IP country: UA; …… | N
004       | Initiator: user a; Recipient: user i; IP country: CN; …… | P
005       | Initiator: user d; Recipient: user j; IP country: UK; …… | N
006       | Initiator: user e; Recipient: user k; IP country: DE; …… | N
TABLE 1
As Table 1 shows, each learning sample corresponds to the business data of one transaction and is classed as a positive or negative sample according to its review result. A result of P means the transaction is a money laundering transaction, and N means it is not; that is, samples labeled P are positive samples and samples labeled N are negative samples. The P and N notation is used in the rest of the description.
Since the business data of the positive samples contains features that can identify money laundering transactions, the server may, for each preset feature type, determine from the learning samples the feature value of each learning sample corresponding to that feature type as a variable of the feature type. The subsequent steps then determine the specified feature types, and the specified variables of each specified feature type, that make up the risk control rules.
In this specification, each transaction has several kinds of business data, and different kinds of business data contribute differently to identifying money laundering transactions. Each kind of business data can therefore be treated as a feature type, and its possible feature values as the variables of that feature type, so that the subsequent steps can determine the specified feature types and specified variables used to generate the risk control rules (for example, the feature types and variables that contribute most to identifying money laundering transactions).
For example, suppose money laundering transactions are typically transfers initiated by a user somewhere in the United States to a user somewhere in the United Kingdom. Then the country to which the IP address belongs in the transaction data, and the values United States and United Kingdom of that feature type, contribute strongly to identifying whether a transaction is a money laundering transaction. Alternatively, suppose money laundering transactions are typically initiated between 5 and 8 pm in the bank's local time. Then the transaction time in the transaction data, and a transaction time between 5 and 8 pm, contribute strongly to identifying whether a transaction is a money laundering transaction.
The server may then determine, for each preset feature type, the feature value of each learning sample corresponding to that feature type as a variable of the feature type, and thereby obtain the variables of every feature type.
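As a minimal sketch of step S100, the snippet below collects, for each feature type, the distinct feature values observed across the learning samples. The field names (`initiator`, `recipient`, `ip_country`, `label`) and the helper `collect_variables` are hypothetical and chosen only to mirror Table 1; the specification does not prescribe a data layout.

```python
from collections import defaultdict

# Hypothetical learning samples modeled on Table 1 (field names are assumptions).
samples = [
    {"initiator": "user a", "recipient": "user f", "ip_country": "US", "label": "P"},
    {"initiator": "user b", "recipient": "user e", "ip_country": "RU", "label": "P"},
    {"initiator": "user c", "recipient": "user h", "ip_country": "UA", "label": "N"},
]
feature_types = ["initiator", "recipient", "ip_country"]


def collect_variables(samples, feature_types):
    """Map each feature type to the sorted distinct feature values seen in the samples."""
    variables = defaultdict(set)
    for sample in samples:
        for ftype in feature_types:
            if ftype in sample:
                variables[ftype].add(sample[ftype])
    return {ftype: sorted(values) for ftype, values in variables.items()}


variables = collect_variables(samples, feature_types)
print(variables["ip_country"])
```

Each feature value becomes a candidate variable of its feature type; which of them are kept is decided later by the genetic algorithm.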
Because the business data of transaction services is private to each financial institution, the historical data of a financial institution generally contains only the business data of the transactions it executed itself, and the business data of other financial institutions cannot be obtained. If learning samples are determined only from the institution's own historical data, they are neither rich enough nor cover enough money laundering transactions, which reduces the accuracy of the risk control rules generated later.
For example, suppose a money laundering transaction is carried out at bank a from a device with a certain IP address, while a normal transaction is carried out at bank b from a device with the same IP address. When determining learning samples, bank a may label its transaction as a positive sample, so that the IP address is a feature of a positive sample, whereas bank b, which has no record of that IP address initiating a money laundering transaction, may label its transaction as a negative sample.
Therefore, in this specification, when determining the preset feature types, the server may also use a pre-configured money laundering sanction list. In addition to taking each kind of business data as a feature type, the server may determine learning samples (specifically, positive samples) from the sanction list, and may also determine further feature types from the result of matching the information in the sanction list against the business data, which enriches the preset feature types.
Specifically, the money laundering sanction list may be information published by the international anti-money laundering organization about confirmed money laundering transactions, and may include the business data of each such transaction (e.g., personal information of both parties, IP addresses, transaction times). On the one hand, the server can determine positive samples from the business data of the transactions in the sanction list. On the other hand, whether each item of business data matches the data in the sanction list can be used as a feature type, with each matching result as a variable.
Specifically, for each learning sample, the server may match the corresponding business data against the information in the money laundering sanction list and use the matching result for each item of business data as a variable of the corresponding feature type.
For example, for each learning sample, the name of the initiator is matched against the names in the sanction list, the IP address of the initiator against the IP addresses in the list, the date of birth of the initiator against the dates of birth in the list, and so on. The variables of these feature types for the learning sample are then set according to the matching results, e.g., 0 for no match and 1 for a match.
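The matching step can be sketched as follows. This is an illustration only: the sanction list layout, the field names, and the helper `match_features` are assumptions, and the IP addresses are documentation placeholders. Each field of a learning sample is compared with the corresponding entries of the list, and the result is stored as a binary variable (1 for a match, 0 for no match).

```python
# Hypothetical sanction list; the specification does not define its structure.
sanction_list = {
    "name": {"user a", "user x"},
    "ip": {"203.0.113.7"},
    "birth_date": {"1980-01-01"},
}


def match_features(sample, sanction_list):
    """Return one binary variable per field: 1 if the sample value is on the list, else 0."""
    return {
        f"{field}_on_list": int(sample.get(field) in listed)
        for field, listed in sanction_list.items()
    }


sample = {"name": "user a", "ip": "198.51.100.2", "birth_date": "1975-06-30"}
matches = match_features(sample, sanction_list)
print(matches)
```

A field that is missing from the sample is treated as a non-match in this sketch, which is a design choice rather than something the text specifies.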
S102: selecting at least part of feature types from the feature types as specified feature types through a genetic algorithm, and selecting at least part of variables from variables of the feature types as specified variables of the feature types for the feature types.
In this specification, as in step S100, the different feature types have different effects on determining whether the transaction is a money laundering transaction, so that in order to improve the accuracy of the subsequent generation of the wind control rule and avoid the influence of the unstable feature types on the generation of the wind control rule, the server may further determine, through a genetic algorithm, each of the designated feature types and each of the variables of each of the designated feature types that are used to constitute the wind control rule. The unstable feature types are feature types with similar probability of appearing in the positive example sample and the negative example sample. For example, if a certain feature type is a feature possessed by most positive examples and most negative examples at the same time, the feature has a small effect on determining whether the transaction is a money laundering transaction, and therefore, the feature can be filtered out through genetic algorithm optimization. Similarly, for each variable of a feature type, the optimization filtering can be performed by a genetic algorithm.
Specifically, in the present specification, each learning sample may be regarded as a population, each learning sample may be regarded as an individual, each feature type in each learning sample may be regarded as each gene included in the individual, and a variable of a feature type may be regarded as a variable of a gene. The server can determine each feature type and the assigned variable of each feature type through the optimization process of the genetic algorithm as shown in fig. 2, and comprises the following steps:
s1020: and carrying out feature coding on each learning sample.
Firstly, for convenience, the optimization server can perform feature coding on the feature type of each learning sample aiming at each learning sample, so as to determine the specified feature type for forming the wind control rule according to a feature selection algorithm.
In particular, the server may target feature type FiThe feature type FiThe variables of (2) are divided into three domains: operator field OiValue range ViAnd scope Ei. Wherein, FiDenotes the ith feature type, OiAn operator representing the ith syndrome type, the operator may include: "in", "═ and" not in "mean respectively inclusive, equal and exclusive, ViCharacteristic value representing the ith type of feature, EiIndicating whether the ith special type exists in the learning sample, e.g., 0 indicates absence and 1 indicates presence.
Assume that, taking learning sample 001 in table 1 as an example, after feature coding, a feature structure shown in table 2 is obtained:
[Table 2 is provided as an image in the original publication: Figure BDA0001553127110000081]
TABLE 2
Table 2 shows the result of encoding some of the feature types in learning sample 001, namely the four feature encodings $F_1$ to $F_4$; the cell contents are the variables of the encoding for each feature type. A series of feature encodings can be determined in this way for every learning sample. After the feature types have been encoded, a population as shown in Table 3 is obtained.
Learning sample 1 | F1 | F2 | …… | Fm | P
Learning sample 2 | F1 | F2 | …… | Fm | P
Learning sample 3 | F1 | F2 | …… | Fm | N
……                | …… | …… | …… | …… | ……
Learning sample X | F1 | F2 | …… | Fm | P
TABLE 3
Table 3 contains X learning samples in total. Each row is one learning sample, i.e., one individual in the genetic algorithm, and P and N are the review results of the learning samples as described above.
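One possible in-memory form of the three-field encoding from step S1020 is sketched below. The class `FeatureCode`, the function `encode_sample`, the field names, and the choice of "=" as the initial operator are all assumptions made for illustration; the specification only states that each feature type is split into an operator field, a value range, and a scope.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FeatureCode:
    """Three-field encoding of one feature type (names are illustrative)."""
    operator: str            # O_i: one of "in", "=", "not in"
    values: Tuple[str, ...]  # V_i: the feature value(s)
    present: int             # E_i: 1 if the feature type exists in the sample, else 0


def encode_sample(sample: Dict[str, str], feature_types: List[str]) -> List[FeatureCode]:
    """Encode one learning sample as a list of FeatureCode, one per feature type."""
    codes = []
    for ftype in feature_types:
        if ftype in sample:
            # Assumption: a raw sample value starts out as an equality condition.
            codes.append(FeatureCode("=", (sample[ftype],), 1))
        else:
            codes.append(FeatureCode("=", (), 0))
    return codes


feature_types = ["initiator", "recipient", "ip_country", "tx_time"]
sample_001 = {"initiator": "user a", "recipient": "user f", "ip_country": "US"}
individual = encode_sample(sample_001, feature_types)
print(individual[2])
```

Encoding every learning sample this way gives one row of Table 3 per sample, with the review result (P or N) kept alongside the encoded genes.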
S1021: and carrying out competitive selection on each learning sample, and taking the selected learning sample as an individual in the population.
The server can perform competitive selection on the population and keep the individuals with higher fitness in the population. Specifically, the server may use each learning sample as a sample to be selected (i.e., an individual), calculate the fitness of the sample to be selected, and then screen out a preset number of samples to be selected as execution objects of subsequent steps according to the calculated fitness of each sample to be selected. For example, a preset number of samples to be selected are selected according to the calculated fitness of each sample to be selected from a high order to a low order, or a sample to be selected with a fitness higher than a fitness threshold is selected according to a preset fitness threshold, and so on. Of course, how to select the sample to be selected according to the fitness may be specifically set according to needs, which is not limited in this specification.
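The two selection strategies mentioned above (a preset number in descending order of fitness, or a fitness threshold) might look like this. The function names and the fitness values are made up for illustration.

```python
def select_top_k(fitness_by_id, k):
    """Keep the k candidate samples with the highest fitness."""
    ranked = sorted(fitness_by_id, key=fitness_by_id.get, reverse=True)
    return ranked[:k]


def select_above(fitness_by_id, threshold):
    """Keep every candidate sample whose fitness exceeds the threshold."""
    return [sid for sid, fit in fitness_by_id.items() if fit > threshold]


# Hypothetical fitness values per sample ID.
fitness_by_id = {"001": -1.2, "002": -0.4, "003": -3.0, "004": -0.9}
top_two = select_top_k(fitness_by_id, 2)
above = select_above(fitness_by_id, -1.0)
print(top_two, above)
```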
In addition, in this specification, the fitness of each candidate sample may be determined as the sum of the fitness of the variables of each feature type in the candidate sample, and the fitness of the variable of each feature type may be calculated with a fitness formula. Specifically, the fitness formula may be:
$$\mathrm{fitness}_{ji} = N_{ip} \log_2\left(\frac{N_{ip}}{N_{ip} + 10 N_{in}}\right)$$
where $\mathrm{fitness}_{ji}$ denotes the fitness of the variable of the $i$-th feature type in the $j$-th candidate sample, $N_{ip}$ denotes the number of times the variable of the $i$-th feature type appears in the positive samples, and $N_{in}$ denotes the number of times it appears in the negative samples. The server may then determine the fitness of each candidate sample according to the formula:
$$\mathrm{fitness}_{j} = \sum_{i=1}^{m} \mathrm{fitness}_{ji}$$
where $\mathrm{fitness}_{j}$ denotes the fitness of the $j$-th candidate sample and $m$ is the number of feature types.
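A small numerical sketch of the fitness calculation follows, using the per-variable formula $N_{ip}\log_2(N_{ip}/(N_{ip}+10N_{in}))$ and the sum over feature types described above. The function names are hypothetical, and the handling of $N_{ip}=0$ is an assumption, since the text does not address that case.

```python
import math


def variable_fitness(n_pos, n_neg):
    """Fitness of one variable: n_pos * log2(n_pos / (n_pos + 10 * n_neg))."""
    if n_pos == 0:
        # Assumption: use the limit value 0 when the variable never
        # appears in a positive sample; the source does not cover this case.
        return 0.0
    return n_pos * math.log2(n_pos / (n_pos + 10 * n_neg))


def sample_fitness(counts):
    """Fitness of a candidate sample: sum of its variables' fitness values.

    counts is a list of (n_pos, n_neg) pairs, one per feature type.
    """
    return sum(variable_fitness(n_pos, n_neg) for n_pos, n_neg in counts)


print(variable_fitness(4, 0))
print(variable_fitness(10, 1))
print(sample_fitness([(4, 0), (10, 1)]))
```

Because the logarithm of a ratio no greater than 1 is never positive, the values are at most 0; a variable that appears only in positive samples scores 0, and each appearance in negative samples pushes the score further down.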
S1022: and adjusting the population to determine at least one new population.
Then, the service can perform at least one of the operations of copying, crossing and mutation on the population screened in the previous step to obtain a new population. In the present specification, the symbol GkRepresents a low K generation population, then G0Indicates the population that has undergone fitness screening for the first time. For convenience of understanding, in the process of optimization through a genetic algorithm, the present specification substitutes an individual for the sample to be selected in step S1021, substitutes a gene for a feature type, and describes a set of the selected samples to be selected as a population, where a variable of the feature type is a variable of the gene.
Specifically, the following description is made for each operation:
Replication: the server randomly copies individuals of the population to obtain a new population $G'_0$. The replication probability can be set as needed, e.g., 50%; that is, each individual has a 50% probability of being copied into the new population $G'_0$.
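The replication operation can be sketched as an independent random draw per individual. The function name `replicate` and the optional `rng` parameter are illustrative choices, not taken from the specification.

```python
import random


def replicate(population, copy_probability=0.5, rng=None):
    """Copy each individual into the new population with the given probability."""
    rng = rng or random.Random()
    return [individual for individual in population if rng.random() < copy_probability]


population = ["001", "002", "004", "008"]
new_population = replicate(population, copy_probability=0.5, rng=random.Random(0))
print(new_population)
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient when comparing parameter settings.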
Crossover: the server pairs the individuals of population $G_0$ two by two and decides, according to the crossover probability, whether each pair performs a crossover operation. The crossover operation may be an exchange or a merge. That is, for each pair performing crossover, the homologous genes (i.e., the same feature types) of the two individuals are determined, and the variables of those homologous genes are exchanged or merged to obtain new individuals. The new individuals obtained after crossover form a new population $G''_0$. The crossover probability can also be set as needed, e.g., 90%; that is, the server performs crossover on each pair of individuals with 90% probability.
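The pairing step can be illustrated as below. Pairing by random shuffle is an assumption, since the text only says that individuals are paired two by two; the function name `pair_for_crossover` is likewise hypothetical.

```python
import random


def pair_for_crossover(population, crossover_probability=0.9, rng=None):
    """Pair individuals two by two and keep each pair with the crossover probability."""
    rng = rng or random.Random()
    shuffled = list(population)
    rng.shuffle(shuffled)  # assumption: pairs are formed at random
    pairs = [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled) - 1, 2)]
    return [pair for pair in pairs if rng.random() < crossover_probability]


population = ["001", "002", "004", "008", "010"]
pairs = pair_for_crossover(population, crossover_probability=0.9, rng=random.Random(1))
print(pairs)
```

With an odd number of individuals, the last one after shuffling is left unpaired in this sketch.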
Note that in this specification the types of variables may include a first type and a second type. The first type may be the binary type, i.e., a variable with only two possible values. The second type may be the enumerated type or the combined type. Specifically, a binary variable may be a matching result, an enumerated variable takes one of several possible contents (e.g., the initiator's country of residence, the country to which the IP address belongs), and a combined variable takes one or more of several possible contents. Variable types may also include a discrete type, where a discrete variable is discrete data (e.g., date of birth, age, amount). The server may also divide the features into more categories as needed, which is not limited in this specification.
For crossover, the server may exchange the variables of homologous genes whose variables are binary in both individuals of a pair, or merge the variables of homologous genes whose variables are of the enumerated or combined type in both individuals. Which homologous genes are selected for exchange or merging can be set as needed and is not limited here.
For example, assume that, for two individuals 002 and 008 in population G0 shown in FIG. 3a, the server selects these two individuals to perform the crossover operation and determines new individuals by exchanging homologous genes. The server may then determine the homologous genes whose variables are binary variables in the two individuals. For example, feature type F3 and feature type F5 shown in FIG. 3a are binary variables, so the value ranges V3 and V5 corresponding to individuals 002 and 008 are respectively interchanged, and the two new individuals are taken as individuals of the new population G''0, as shown in FIG. 3b. Of course, the value ranges of different feature types whose variables are binary variables in each individual can also be interchanged, as shown in FIG. 3c. However, when value ranges of different feature types are exchanged, the representation forms of the variables of different feature types may differ (for example, variables representing countries take forms such as "US", "RU", "DE", while variables representing time take forms such as "10:10 AM", "9:15 PM"). To prevent an exchange between variables of different representation forms from producing malformed genes in the newly generated individuals (for example, the country to which the IP belongs being "10:10 AM" and the transaction initiation time being "US"), the server can also restrict the exchange to variables with the same representation form, according to the representation form of the variables of each feature type. How this is set is not specifically limited in this specification.
Alternatively, if the server determines new individuals by merging homologous genes, the server can target the genes whose variables are combined variables in the two individuals 002 and 008, such as feature type F2 and feature type F4 shown in FIG. 3a. The value ranges V2 respectively corresponding to individuals 002 and 008 are merged, and the value ranges V5 respectively corresponding to individuals 002 and 008 are merged, so as to determine new individuals as individuals of the new population G''0, as shown in FIG. 3d.
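The exchange and merge variants of the crossover operation described above can be sketched as follows. This is a minimal illustration rather than the implementation of this specification: the representation of an individual as a dictionary from feature type to a set of variables, the names `crossover` and `crossover_population`, the `gene_types` table, and the random choice between exchanging and merging are all assumptions made for the example.

```python
import random

BINARY = "binary"  # first type: exchanged between the two parents
MULTI = "multi"    # second type (enumerated or combined): merged


def crossover(a, b, gene_types, mode):
    """Return two new individuals built from parents a and b.

    a, b: dict mapping feature type -> set of variables (hypothetical layout).
    mode: "exchange" swaps homologous binary genes,
          "merge" unions homologous enumerated/combined genes.
    """
    child_a = {k: set(v) for k, v in a.items()}
    child_b = {k: set(v) for k, v in b.items()}
    for gene in a.keys() & b.keys():  # homologous genes only
        if mode == "exchange" and gene_types[gene] == BINARY:
            child_a[gene], child_b[gene] = set(b[gene]), set(a[gene])
        elif mode == "merge" and gene_types[gene] == MULTI:
            merged = a[gene] | b[gene]
            child_a[gene], child_b[gene] = set(merged), set(merged)
    return child_a, child_b


def crossover_population(population, gene_types, p_cross=0.9, rng=None):
    """Pair individuals and cross each pair with probability p_cross."""
    rng = rng or random.Random()
    shuffled = population[:]
    rng.shuffle(shuffled)
    new_population = []
    for a, b in zip(shuffled[0::2], shuffled[1::2]):
        if rng.random() < p_cross:
            # Assumption: exchange or merge is picked at random for each pair.
            mode = rng.choice(["exchange", "merge"])
            new_population.extend(crossover(a, b, gene_types, mode))
    return new_population
```

The parents are copied rather than modified, so population G0 stays available for the later competitive selection.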
The mutation operation is as follows: the server selects, according to the mutation probability, the individuals to be mutated from the individuals of population G0, and adjusts the presence of at least one gene (i.e., feature type) in each selected individual, that is, whether the gene exists in the individual, to obtain new individuals as individuals of a new population G'''0. The mutation probability can be set as needed and is not limited in this specification; for example, it can be set to 10%. For each gene, when the server adjusts the variable of the gene from not existing in the individual to existing in the individual, one variable can be selected as the variable of that gene in the adjusted individual according to the frequency of occurrence of each variable in the gene. For example, the variable with the highest frequency of occurrence is selected as the variable of the gene.
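A minimal sketch of the mutation operation follows. It assumes that an individual is a dictionary from feature type to a set of variables, that a gene is absent when its key is missing, and that `gene_pool` lists the variables observed for each feature type; the names `mutate`, `mutate_population`, and `gene_pool` are hypothetical and used only for illustration.

```python
import random
from collections import Counter


def mutate(individual, gene_pool, rng):
    """Toggle the presence of one randomly chosen gene in a copy of the individual.

    gene_pool: dict feature type -> list of observed variables (with repeats),
    used to find the most frequent variable when a gene is switched on.
    """
    mutant = {k: set(v) for k, v in individual.items()}
    gene = rng.choice(sorted(gene_pool))
    if gene in mutant:
        del mutant[gene]  # the gene goes from present to absent
    else:
        # The gene goes from absent to present with its most frequent variable.
        most_frequent, _count = Counter(gene_pool[gene]).most_common(1)[0]
        mutant[gene] = {most_frequent}
    return mutant


def mutate_population(population, gene_pool, p_mut=0.1, rng=None):
    """Mutate each individual with probability p_mut and return only the mutants."""
    rng = rng or random.Random()
    return [mutate(ind, gene_pool, rng) for ind in population if rng.random() < p_mut]
```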
S1023: and determining a next generation population by competitive selection according to the population and a new population generated by the population.
Thereafter, the server may take the samples in at least one of the populations G'0, G''0, G'''0 and G0 as samples to be selected, and perform competitive selection again to obtain the next-generation population G1. The competitive selection may be performed as in step S1021, which is not described again here.
S1024: and judging whether the iteration times of the population determined in the step S1023 reach a preset value, if so, executing a step S1025, and if not, executing a step S1022.
S1025: and outputting the determined population as a result.
Finally, the server may repeat the above process through the genetic algorithm until the number of iterations reaches a preset value, where the preset value may be set as needed, and this specification does not limit this. For example, if the preset value is 5000, then once population G5000 is determined, that population is output as the result. At this point, the genes contained in each individual in the population are the genes retained through competitive selection, and the kinds of variables in each retained gene have been reduced through the replication, crossover and mutation operations described above. Each feature type contained in the population is a specified feature type used for generating the wind control rules, and each variable of each specified feature type is a specified variable.
In addition, in the present specification, the server may select at least one operation from the replication, crossover, and mutation operations to execute during each iteration of the population.
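The iteration described in steps S1022 to S1025 can be sketched as the loop below. It is an illustration under stated assumptions: `fitness` is a caller-supplied function standing in for the fitness formula of this specification, `operators` is a list of hypothetical callables that each return new individuals (for example replication, crossover, or mutation), and competitive selection is simplified to keeping the `size` fittest candidates.

```python
import random


def evolve(population, fitness, operators, size, generations, rng=None):
    """Iterate offspring generation and competitive selection.

    fitness: callable individual -> float (hypothetical stand-in).
    operators: list of callables (population, rng) -> list of new individuals.
    """
    rng = rng or random.Random()
    for _ in range(generations):
        # Run at least one of the available operations in this iteration.
        chosen = rng.sample(operators, k=rng.randint(1, len(operators)))
        candidates = list(population)
        for operator in chosen:
            candidates.extend(operator(population, rng))
        # Competitive selection over the population and its new populations.
        candidates.sort(key=fitness, reverse=True)
        population = candidates[:size]
    return population
```

Sorting by fitness and truncating is only one way to realize competitive selection; a tournament-style selection would fit the same loop.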
S104: and generating a wind control rule by adopting a first-order rule learning algorithm according to each learning sample, each selected specified characteristic type and each specified variable of the specified characteristic type.
In the embodiment of the description, after determining each specified feature type through a feature learning algorithm such as a genetic algorithm, the server may generate the wind control rule through a first-order rule learning algorithm according to the labels of the positive example sample and the negative example sample in each learning sample.
Specifically, the server may first re-determine each positive example and each negative example according to each specified feature type, that is, re-determine the feature type of each learning sample according to the business data and the money laundering sanctioning list corresponding to each learning sample.
The server may then generate a plurality of wind control rules using a first-order rule learning algorithm (Foil).
When the server generates the wind control rule through the Foil algorithm, the training set can be determined according to the positive sample and the negative sample. And the following steps are performed, as shown in fig. 4:
S1040: initializing a wind control rule set, taking the positive example samples as a positive example sample set, and taking the negative example samples as a negative example sample set;
S1041: judging whether the positive example sample set is empty, if so, executing step S1048, otherwise, executing step S1042;
S1042: according to the Foil algorithm, determining a specified variable from the specified variables as the specified variable contained in a newly built wind control rule;
S1043: judging whether the matching degree between the newly built wind control rule and the negative example sample set is lower than a preset threshold, if so, executing step S1046, and if not, executing step S1044;
S1044: according to the Foil algorithm, determining a specified variable from the specified variables, and adding the specified variable to the newly built wind control rule;
S1045: updating the negative example sample set according to the newly built wind control rule after the specified variable is added, and executing step S1043 again;
S1046: adding the newly built wind control rule to the wind control rule set;
S1047: according to the wind control rule set, deleting from the positive example sample set the positive example samples matched by the wind control rules in the wind control rule set, and executing step S1041 again;
S1048: once the positive example sample set is judged to be empty, taking the wind control rules in the wind control rule set as the generated plurality of wind control rules.
In step S1043, the server may use a formula:
Figure BDA0001553127110000141
to determine the gain value of each specified variable, and take the specified variable with the highest gain value as the specified variable to be added to the wind control rule. Here, P_new represents the number of positive example samples matched by the newly built wind control rule after the specified variable is added to it, N_new represents the number of negative example samples matched by the newly built wind control rule after the specified variable is added to it, P_old represents the number of positive example samples matched by the newly built wind control rule before the specified variable is added to it, and N_old represents the number of negative example samples matched by the newly built wind control rule before the specified variable is added to it.
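The formula image itself is not reproduced in this text. As a hedged illustration, the sketch below assumes the standard FOIL information gain, which is defined over the same four counts; whether it is term-for-term identical to the formula in the figure is an assumption. The function name `foil_gain` and the handling of a condition that matches no positive sample are choices made for this example.

```python
import math


def foil_gain(p_old, n_old, p_new, n_new):
    """Standard FOIL gain of adding one condition to a rule (assumed form).

    p_old, n_old: positives/negatives matched before the condition is added.
    p_new, n_new: positives/negatives matched after the condition is added.
    """
    if p_new == 0:
        # Assumption: a condition that matches no positive sample is never preferred.
        return float("-inf")
    old_info = math.log2(p_old / (p_old + n_old))
    new_info = math.log2(p_new / (p_new + n_new))
    return p_new * (new_info - old_info)
```

For instance, a rule that matches 100 positive and 100 negative samples before a condition is added, and 80 positive and 20 negative samples after, has a gain of about 54.25 under this assumed formula.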
It should be noted that, in this specification, adding a specified variable to a wind control rule may specifically mean adding the specified feature type corresponding to the specified variable, together with that variable, to the wind control rule. For example, if the specified feature type is the country to which the IP belongs, the specified variables may include "US", "CN", "UK", and the like. When the server selects the specified variable "US" and adds it to the wind control rule, what is added may specifically be the condition that the country to which the IP belongs is US. Therefore, the wind control rules generated according to this specification can be regarded as wind control rules composed of different conditions, and when business data matches the conditions of any one wind control rule, the transaction corresponding to the business data can be determined to be a money laundering transaction.
For example, assuming that some wind-controlled rule is generated such that the IP from which the transaction is initiated belongs to the US and the initiator name is user a, when user a initiates a transaction from the US, it may be determined that the transaction is a money laundering transaction.
In addition, in this specification, the matching degree between the newly built wind control rule and the negative example sample set is the ratio of the number of negative example samples matched by the newly built wind control rule to the number of positive example samples matched by it. For example, if a newly built wind control rule matches 1000 positive example samples and 1 negative example sample, the matching degree between the rule and the negative example sample set is 0.1%.
Of course, in this specification, the preset threshold may be set as needed. Alternatively, instead of judging whether the matching degree between the newly built wind control rule and the negative example sample set is lower than the preset threshold, the server may judge whether the newly built wind control rule matches no negative example sample at all. This is not limited in this specification.
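The rule-generation loop of steps S1040 to S1048, together with the matching degree defined above, can be sketched as follows. This is an illustration, not the implementation of this specification: a rule is modeled as a dictionary of feature type to required variable, the gain is assumed to be the standard FOIL gain, the stopping test uses "less than or equal to" so that a threshold of 0 means "matches no negative example sample", and all function names are hypothetical.

```python
import math


def matches(rule, sample):
    """A rule matches a sample when every one of its conditions holds."""
    return all(sample.get(f) == v for f, v in rule.items())


def foil_gain(p_old, n_old, p_new, n_new):
    """Standard FOIL gain (assumed form of the gain formula)."""
    if p_new == 0:
        return float("-inf")
    new_info = math.log2(p_new / (p_new + n_new))
    old_info = math.log2(p_old / (p_old + n_old))
    return p_new * (new_info - old_info)


def pick_condition(rule, pos, neg, candidates):
    """S1042 / S1044: choose the specified variable with the highest gain."""
    best, best_gain = None, float("-inf")
    for feature, variable in candidates:
        if feature in rule:
            continue
        p_new = sum(1 for s in pos if s.get(feature) == variable)
        n_new = sum(1 for s in neg if s.get(feature) == variable)
        gain = foil_gain(len(pos), len(neg), p_new, n_new)
        if gain > best_gain:
            best, best_gain = (feature, variable), gain
    return best


def learn_rules(positives, negatives, candidates, threshold=0.0):
    """candidates: list of (feature type, variable) pairs, i.e. specified variables."""
    rules, remaining = [], list(positives)                 # S1040
    while remaining:                                       # S1041
        rule, pos, neg = {}, list(remaining), list(negatives)
        while True:
            best = pick_condition(rule, pos, neg, candidates)
            if best is None:
                break                                      # no usable condition left
            rule[best[0]] = best[1]
            pos = [s for s in pos if matches(rule, s)]
            neg = [s for s in neg if matches(rule, s)]     # S1045
            if len(neg) / len(pos) <= threshold:           # S1043: matching degree
                break
        if not rule:
            break  # guard (assumption): never emit a rule without conditions
        rules.append(rule)                                 # S1046
        remaining = [s for s in remaining if not matches(rule, s)]  # S1047
    return rules                                           # S1048
```

Each accepted rule matches at least one remaining positive example sample, so the positive example sample set shrinks on every pass and the loop terminates.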
Based on the wind control rule mining process shown in FIG. 1, each specified feature type forming the wind control rules, and each specified variable of those feature types, is screened and optimized by the feature selection algorithm, so the wind control rules generated from the specified feature types have a better recognition effect. The shortcomings of manually set rules in the prior art can thus be avoided, and the efficiency and recognition accuracy of anti-money laundering based on the wind control rules are improved.
In addition, in this specification, after the server generates a plurality of wind control rules, the server may further prune the generated wind control rules to further improve the identification accuracy of the wind control rules.
Specifically, the server may re-determine the detection samples different from the learning samples, and each detection sample may be determined according to the historical data. And then, identifying each detection sample according to the generated plurality of wind control rules, and determining an identification result. And then, determining the accuracy of the identification results of the generated wind control rules according to the examination results of the existing detection samples.
Specifically, first, the accuracy of each wind control rule may be determined separately for the recognition result of each wind control rule. Then, for each wind control rule, the server may judge, for a specified variable of each specified feature type included in the wind control rule, whether the accuracy of the recognition result of the wind control rule after the specified variable is deleted is higher than the accuracy of the recognition result of the wind control rule before the specified variable is deleted, if so, delete the specified variable of the specified feature type, and otherwise, delete none. And finally, after the server finishes pruning all the wind control rules, recalculating the accuracy of the recognition result of each pruned wind control rule, and selecting a specified number of wind control rules as the wind control rules for recognizing money laundering transactions according to the accuracy of the pruned wind control rules.
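The pruning procedure can be sketched as below. The sketch assumes that each detection sample is a pair of business data and a reviewed label, that accuracy is the fraction of detection samples on which the rule's verdict agrees with that label, and that a rule keeps at least one condition; these choices and the names `accuracy`, `prune`, and `select_rules` are assumptions for illustration.

```python
def matches(rule, sample):
    """A rule (dict of feature type -> variable) matches when all conditions hold."""
    return all(sample.get(f) == v for f, v in rule.items())


def accuracy(rule, detection_samples):
    """detection_samples: list of (business data dict, is_risky bool) pairs."""
    correct = sum(1 for data, risky in detection_samples if matches(rule, data) == risky)
    return correct / len(detection_samples)


def prune(rule, detection_samples):
    """Delete a condition whenever doing so raises the rule's accuracy."""
    pruned = dict(rule)
    for feature in list(rule):
        if len(pruned) == 1:
            break  # assumption: keep at least one condition per rule
        trial = {f: v for f, v in pruned.items() if f != feature}
        if accuracy(trial, detection_samples) > accuracy(pruned, detection_samples):
            pruned = trial
    return pruned


def select_rules(rules, detection_samples, k):
    """Prune every rule, then keep the k rules with the highest accuracy."""
    pruned = [prune(rule, detection_samples) for rule in rules]
    pruned.sort(key=lambda rule: accuracy(rule, detection_samples), reverse=True)
    return pruned[:k]
```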
Further, the above embodiment only takes a generated wind control rule used for identifying money laundering transactions as an example. Similarly, the wind control rule mining method provided in this specification may also determine wind control rules for other business types, as described in step S100, which is not limited in this specification.
It should be noted that the execution subjects of all steps of the method provided in the embodiments of this specification may be the same device, or the method may be executed by different devices. For example, the execution subject of steps S100 and S102 may be device 1, and the execution subject of step S104 may be device 2; alternatively, the execution subject of step S100 may be device 1, and the execution subject of steps S102 and S104 may be device 2; and so on. The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the method for mining the wind control rule shown in fig. 1, an embodiment of the present specification further provides a device for mining the wind control rule, as shown in fig. 5.
Fig. 5 is a schematic structural diagram of a wind control rule mining device provided in an embodiment of this specification, where the device includes:
the determining module 200 is configured to determine, for each preset feature type, a feature value of each learning sample corresponding to the feature type as a variable of the feature type;
a selection module 202, which selects at least part of feature types from the feature types as designated feature types through a genetic algorithm, and selects at least part of variables from the variables of the feature types as designated variables of the feature types for each feature type;
and the generation module 204 is used for generating the wind control rule by adopting a first-order rule learning algorithm according to each learning sample, each selected specified characteristic type and each specified variable of the specified characteristic type.
The selection module 202 selects learning samples from the learning samples to copy according to the copy probability of the genetic algorithm to obtain copy samples, and selects a specific feature type and a specific variable according to the variable of each learning sample corresponding to each feature type and the variable of each copy sample corresponding to each feature type.
The selection module 202 determines a plurality of pairs of learning samples from each learning sample according to the cross probability of the genetic algorithm, exchanges the variables of the learning samples of the same feature type for each pair of learning samples, and/or combines the variables of the learning samples of the same feature type to obtain cross samples, and selects the designated feature type and the designated variables according to the variables of the learning samples corresponding to the feature types and the variables of the cross samples corresponding to the feature types.
The types of variables include: the selection module 202 exchanges the variables of the same feature type with the first type in the pair of learning samples, and combines the variables of the same feature type with the second type in the pair of learning samples.
The selection module 202 selects learning samples for mutation from the learning samples according to the mutation probability of the genetic algorithm, adjusts, for each learning sample selected for mutation, whether a variable of at least one feature type in the learning sample exists in the learning sample, obtains each mutation sample, and selects a specified feature type and a specified variable according to the variable of each learning sample corresponding to each feature type and the variable of each mutation sample corresponding to each feature type.
The selecting module 202, for each feature type, selects a variable according to the occurrence frequency of the variables in the feature type as the variable of the feature type of the adjusted learning sample when the variable of the feature type is adjusted to be present in the learning sample from being absent in the learning sample.
The selection module 202, according to the fitness formula of the genetic algorithm
[Formula image: Figure BDA0001553127110000171]
[Formula image: Figure BDA0001553127110000172]
determines the fitness of each sample to be selected, selects a specified number of samples to be selected in descending order of fitness, takes each feature type corresponding to the selected samples as a specified feature type, and takes the variable of each feature type corresponding to the selected samples as a specified variable;
wherein the samples to be selected comprise at least one of the learning samples, the replication samples, the crossover samples and the mutation samples; fitness_j denotes the fitness of the j-th sample to be selected; fitness_ji denotes the fitness of the variable of the i-th feature type in the j-th sample to be selected; N_ip denotes the number of times the variable of the i-th feature type appears in the positive example samples; N_in denotes the number of times the variable of the i-th feature type appears in the negative example samples; a positive example sample is a sample to be selected whose risk control result is that a risk exists, and a negative example sample is a sample to be selected whose risk control result is that no risk exists.
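The fitness formulas are images that are not reproduced in this text, so the following sketch only illustrates the selection step. It assumes, purely for illustration, that the fitness of a variable is N_ip / (N_ip + N_in) and that the fitness of a sample is the sum over its feature types; both forms are assumptions and may differ from the formulas in the figures. Samples are modeled as dictionaries of feature type to variable, and all function names are hypothetical.

```python
def gene_fitness(feature, variable, positives, negatives):
    """ASSUMED form: share of occurrences of the variable that are in positives."""
    n_ip = sum(1 for s in positives if s.get(feature) == variable)
    n_in = sum(1 for s in negatives if s.get(feature) == variable)
    total = n_ip + n_in
    return n_ip / total if total else 0.0


def sample_fitness(candidate, positives, negatives):
    """ASSUMED form: sum of the per-feature-type fitness values."""
    return sum(gene_fitness(f, v, positives, negatives) for f, v in candidate.items())


def select_top(candidates, positives, negatives, k):
    """Keep the k fittest candidates and collect their feature types and variables."""
    ranked = sorted(
        candidates,
        key=lambda c: sample_fitness(c, positives, negatives),
        reverse=True,
    )
    chosen = ranked[:k]
    specified = {}  # specified feature type -> set of specified variables
    for candidate in chosen:
        for feature, variable in candidate.items():
            specified.setdefault(feature, set()).add(variable)
    return chosen, specified
```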
The device further comprises:
the rule pruning module 206 determines the recognition accuracy of each wind control rule, adjusts the feature types included in each wind control rule according to the recognition accuracy of each wind control rule, re-determines the recognition accuracy of the adjusted wind control rule, and selects at least one wind control rule according to the recognition accuracy of each wind control rule after adjustment and the recognition accuracy of each wind control rule before adjustment.
The rule pruning module 206 determines each detection sample different from each learning sample according to the historical data, and determines the identification accuracy of each wind control rule according to the identification result of each wind control rule on each detection sample.
The rule pruning module 206 determines, for each wind control rule, a feature type that improves the recognition accuracy of the wind control rule after being deleted, from among the feature types that constitute the wind control rule, according to the recognition accuracy of the wind control rule, and deletes the determined feature type from among the feature types that constitute the wind control rule.
Based on the method for mining the wind control rule illustrated in fig. 1, the present specification correspondingly provides a server, as illustrated in fig. 6, where the server includes: one or more processors and memory, the memory storing a program and configured to perform, by the one or more processors:
determining a characteristic value of each learning sample corresponding to each preset characteristic type as a variable of the characteristic type;
selecting at least part of feature types from the feature types through a genetic algorithm to be used as specified feature types, and selecting at least part of variables from variables of the feature types aiming at the feature types to be used as specified variables of the feature types;
and generating a wind control rule by adopting a first-order rule learning algorithm according to each learning sample, each selected specified characteristic type and each specified variable of the specified characteristic type.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements in hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement in a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs a digital system onto a single PLD to "integrate" it, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, nowadays, instead of manually making integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the source code to be compiled must also be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained by merely slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing the various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules for performing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner: identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding parts of the description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (17)

1. A method for mining wind control (that is, risk control) rules, applied to the field of risk control of financial transactions, comprising the following steps:
for each preset feature type of a financial transaction, determining the feature value of each learning sample corresponding to the feature type as a variable of the feature type, wherein a server uses the business data of a plurality of historically executed transactions as the learning samples; the business data comprises personal information of both transaction parties, the transaction amount, the Internet Protocol (IP) addresses of both transaction parties when the business was executed, the countries to which the IP addresses belong, and the administrative divisions to which the IP addresses belong; the personal information comprises the names, ages, genders, addresses, and contact details of both transaction parties; each type of business data is used as a feature type, and the possible feature values of that business data are used as variables of the feature type;
when determining the preset feature types, each type of business data is determined as a feature type; or, learning samples are determined according to a pre-configured list, and the feature types are determined according to the result of matching the information in the list against each item of business data;
selecting at least part of feature types from the feature types through a genetic algorithm to be used as specified feature types, and selecting at least part of variables from variables of the feature types aiming at the feature types to be used as specified variables of the feature types;
generating a wind control rule of the financial transaction by adopting a first-order rule learning algorithm according to each learning sample, each selected designated feature type and each designated variable of the designated feature type;
selecting at least part of feature types from the feature types through a genetic algorithm to be used as specified feature types, and selecting at least part of variables from variables of the feature types for the feature types to be used as specified variables of the feature types, wherein the method specifically comprises the following steps:
selecting learning samples from the learning samples to copy according to the copy probability of the genetic algorithm to obtain copy samples;
selecting a specified feature type and a specified variable according to the variable of each learning sample corresponding to each feature type and the variable of each copying sample corresponding to each feature type;
selecting at least part of feature types from the feature types through a genetic algorithm to be used as specified feature types, and selecting at least part of variables from variables of the feature types for the feature types to be used as specified variables of the feature types, wherein the method specifically comprises the following steps:
determining a plurality of pairs of learning samples from each learning sample according to the cross probability of the genetic algorithm;
for each pair of learning samples, exchanging the variables of the learning samples of the same characteristic type, and/or combining the variables of the learning samples of the same characteristic type to obtain cross samples;
selecting a specified feature type and a specified variable according to the variables of the learning samples corresponding to the feature types and the variables of the cross samples corresponding to the feature types;
the financial transaction comprises a money laundering transaction; when the financial transaction is a money laundering transaction, the server, in determining the preset feature types, in addition to using each type of business data as a feature type, determines learning samples according to a pre-configured money laundering sanction list, and further determines feature types according to the result of matching the information in the money laundering sanction list against each item of business data, thereby enriching the preset feature types; the money laundering sanction list is information published by international anti-money laundering organizations on transactions determined to be money laundering transactions, and includes the business data of each transaction determined to be a money laundering transaction.
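The replication step recited in claim 1 can be pictured with a minimal Python sketch. The dict-based sample layout, the function name `replicate`, and the independent per-sample draw against `copy_prob` are illustrative assumptions; the claim only states that learning samples are selected for copying according to a replication probability.

```python
import random

def replicate(samples, copy_prob, rng=None):
    """Copy each learning sample independently with probability copy_prob.

    Each sample is assumed (hypothetically) to be a dict that maps a
    feature type, e.g. "ip_country", to its variable (the feature value).
    """
    rng = rng or random.Random()
    # dict(s) makes a shallow copy, so later crossover or mutation of the
    # copy does not alter the original learning sample.
    return [dict(s) for s in samples if rng.random() < copy_prob]

learning_samples = [
    {"ip_country": "A", "amount_band": "high"},
    {"ip_country": "B", "amount_band": "low"},
]
copies = replicate(learning_samples, copy_prob=0.5, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps a run reproducible, which is convenient when comparing rule sets mined under different probabilities.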
2. The method of claim 1, wherein the type of a variable comprises a first type and a second type;
exchanging the variables of the same feature type in the pair of learning samples specifically comprises:
exchanging, between the pair of learning samples, the variables of the same feature type whose type is the first type;
merging the variables of the same feature type in the pair of learning samples specifically comprises:
merging, between the pair of learning samples, the variables of the same feature type whose type is the second type.
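A minimal sketch of the exchange and merge operations of claim 2 follows. It assumes, purely for illustration, that a variable of the first type is a single value and a variable of the second type is a set of values; the claim itself does not say what distinguishes the two types. The names `crossover`, `exchange_types`, and `merge_types` are hypothetical.

```python
def crossover(a, b, exchange_types, merge_types):
    """Build two crossover samples from a pair of learning samples.

    a, b: dicts mapping feature type -> variable.
    exchange_types: feature types whose variables are swapped (assumed
        to correspond to the claim's "first type").
    merge_types: feature types whose variables are set-valued and are
        combined by union (assumed to correspond to the "second type").
    """
    child_a, child_b = dict(a), dict(b)
    for ft in exchange_types:
        if ft in a and ft in b:
            # Swap the single-valued variables between the two samples.
            child_a[ft], child_b[ft] = b[ft], a[ft]
    for ft in merge_types:
        if ft in a and ft in b:
            # Union the set-valued variables; both children get the union.
            merged = frozenset(a[ft]) | frozenset(b[ft])
            child_a[ft] = merged
            child_b[ft] = merged
    return child_a, child_b

parent_a = {"ip_country": "A", "contact": {"phone"}}
parent_b = {"ip_country": "B", "contact": {"email"}}
cross_a, cross_b = crossover(parent_a, parent_b, ["ip_country"], ["contact"])
```

The parents are left unmodified, so the original learning samples remain available alongside the crossover samples when candidates are later ranked.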
3. The method according to claim 1, wherein at least part of the feature types are selected from the feature types as the designated feature types through a genetic algorithm, and for each feature type, at least part of the variables are selected from the variables of the feature type as the designated variables of the feature type, and the method specifically comprises the following steps:
selecting learning samples for mutation from the learning samples according to the mutation probability of the genetic algorithm;
for each learning sample selected for mutation, adjusting whether the variable of at least one feature type is present in that learning sample, to obtain mutation samples;
and selecting the specified feature type and the specified variable according to the variables of the learning samples corresponding to the feature types and the variables of the mutation samples corresponding to the feature types.
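The mutation of claim 3 toggles whether a feature type's variable is present in a sample. The sketch below is one possible reading under stated assumptions: a sample is a dict of feature type to variable, exactly one feature type is toggled per mutation, and the caller supplies a hypothetical `pick_variable` callback to choose the value when a variable is added.

```python
import random

def select_for_mutation(samples, mutation_prob, rng=None):
    """Pick each learning sample for mutation with probability mutation_prob."""
    rng = rng or random.Random()
    return [s for s in samples if rng.random() < mutation_prob]

def mutate(sample, feature_types, pick_variable, rng=None):
    """Return a mutation sample with one feature type toggled.

    If the chosen feature type is present, its variable is removed;
    if it is absent, a variable chosen by pick_variable is added.
    """
    rng = rng or random.Random()
    mutated = dict(sample)  # leave the original learning sample untouched
    feature_type = rng.choice(list(feature_types))
    if feature_type in mutated:
        del mutated[feature_type]
    else:
        mutated[feature_type] = pick_variable(feature_type)
    return mutated

original = {"ip_country": "A"}
removed = mutate(original, ["ip_country"], lambda ft: "B")
added = mutate({}, ["ip_country"], lambda ft: "B")
```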
4. The method according to claim 3, wherein adjusting whether at least one feature type variable exists in the learning sample comprises:
and for each feature type, when the variable of the feature type is adjusted to be present in the learning sample from being absent in the learning sample, selecting one variable according to the appearance frequency of the variables in the feature type as the variable of the feature type of the adjusted learning sample.
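Claim 4 selects the added variable "according to the appearance frequency" of the variables of the feature type. One plausible interpretation, assumed here and not confirmed by the claim, is a random draw weighted by how often each variable occurs in the learning samples; choosing the single most frequent variable would be another reading. The function name `pick_by_frequency` is hypothetical.

```python
import random
from collections import Counter

def pick_by_frequency(samples, feature_type, rng=None):
    """Draw one variable of feature_type, weighted by its frequency.

    samples: dicts mapping feature type -> hashable variable.
    Raises ValueError if no sample carries the feature type.
    """
    rng = rng or random.Random()
    counts = Counter(s[feature_type] for s in samples if feature_type in s)
    if not counts:
        raise ValueError(f"no variables observed for {feature_type!r}")
    variables = list(counts)
    weights = [counts[v] for v in variables]
    # random.Random.choices performs a weighted draw with replacement.
    return rng.choices(variables, weights=weights, k=1)[0]

observed = [
    {"ip_country": "A"},
    {"ip_country": "A"},
    {"ip_country": "B"},
    {"amount_band": "high"},
]
picked = pick_by_frequency(observed, "ip_country", rng=random.Random(1))
```

With the data above, "A" is twice as likely to be drawn as "B", so frequently seen values are reintroduced more often than rare ones.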
5. The method according to any one of claims 1 to 4, wherein selecting the specified feature type and the specified variable specifically comprises:
according to the fitness formula of the genetic algorithm
[Formula image FDA0003552273550000031; not reproduced in this text]
determining the fitness of each sample to be selected;
selecting a specified number of samples to be selected according to the sequence of the fitness of the samples to be selected from high to low;
using each feature type corresponding to the selected sample to be selected as a designated feature type, and using the variable of each feature type corresponding to the selected sample to be selected as a designated variable;
wherein the samples to be selected comprise at least one of the learning samples, the replication samples, the crossover samples, and the mutation samples; fitness_j represents the fitness of the j-th sample to be selected; fitness_ji represents the fitness of the variable of the i-th feature type in the j-th sample to be selected; N_ip represents the number of times the variable of the i-th feature type appears in the positive samples; N_in represents the number of times the variable of the i-th feature type appears in the negative samples; the positive sample is the sample to be selected with the risk control result, and the negative sample is the sample to be selected with the risk control result.
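The ranking and selection steps of claim 5 can be sketched as follows. The fitness formula itself survives here only as an image reference, so `placeholder_fitness` below is explicitly NOT the claimed formula: it is a stand-in that merely uses the same counts (occurrences in positive and negative samples) so that the selection logic can be exercised. All function names are hypothetical.

```python
def occurrences(feature_type, variable, samples):
    """Count samples in which feature_type carries this variable."""
    return sum(1 for s in samples if s.get(feature_type) == variable)

def placeholder_fitness(sample, positives, negatives):
    # Stand-in score, NOT the patent's formula: for each variable, the
    # count in positive samples minus the count in negative samples.
    return sum(
        occurrences(ft, var, positives) - occurrences(ft, var, negatives)
        for ft, var in sample.items()
    )

def select_top(candidates, fitness, k):
    """Keep the k fittest candidates and collect what they designate."""
    ranked = sorted(candidates, key=fitness, reverse=True)
    chosen = ranked[:k]
    designated = {}  # feature type -> set of designated variables
    for sample in chosen:
        for ft, var in sample.items():
            designated.setdefault(ft, set()).add(var)
    return chosen, designated

positives = [{"ip_country": "X"}, {"ip_country": "X"}]
negatives = [{"ip_country": "Y"}]
candidates = [{"ip_country": "Y"}, {"ip_country": "X"}]
chosen, designated = select_top(
    candidates, lambda s: placeholder_fitness(s, positives, negatives), k=1
)
```

Because the fitness function is passed in as a parameter, the stand-in can be swapped for the actual formula without touching the selection code. Variables are stored in sets, so this sketch assumes they are hashable.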
6. The method of claim 1, after generating the wind control rules, the method further comprising:
determining the identification accuracy of each wind control rule;
adjusting the feature types contained in the wind control rules according to the identification accuracy of the wind control rules;
re-determining the recognition accuracy of the adjusted wind control rule;
and selecting at least one wind control rule according to the recognition accuracy of each adjusted wind control rule and the recognition accuracy of each wind control rule before adjustment.
7. The method of claim 6, wherein determining the recognition accuracy of each generated rule specifically comprises:
determining each detection sample different from each learning sample according to the historical data;
and determining the identification accuracy of each wind control rule according to the identification result of each wind control rule on each detection sample.
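As an illustration of claim 7, the sketch below scores a rule on detection samples held out from the learning samples. The rule representation (a conjunction of feature type and variable conditions stored in a dict), the labeled-pair layout of the detection samples, and the use of plain accuracy are assumptions made for the example.

```python
def rule_matches(rule, sample):
    """A rule fires when every (feature type, variable) condition holds."""
    return all(sample.get(ft) == var for ft, var in rule.items())

def accuracy(rule, detection_samples):
    """Fraction of detection samples the rule classifies correctly.

    detection_samples: list of (sample, is_risky) pairs, assumed to be
    drawn from historical data not used as learning samples.
    """
    if not detection_samples:
        raise ValueError("at least one detection sample is required")
    correct = sum(
        1
        for sample, is_risky in detection_samples
        if rule_matches(rule, sample) == is_risky
    )
    return correct / len(detection_samples)

rule = {"ip_country": "X"}
detection = [
    ({"ip_country": "X"}, True),   # fires, risky: correct
    ({"ip_country": "Y"}, False),  # silent, not risky: correct
    ({"ip_country": "X"}, False),  # fires, not risky: wrong
    ({"ip_country": "Y"}, True),   # silent, risky: wrong
]
score = accuracy(rule, detection)
```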
8. The method according to claim 7, wherein the adjusting of the feature types included in each of the wind control rules according to the recognition accuracy of each of the wind control rules specifically comprises:
for each wind control rule, determining a feature type which improves the identification accuracy of the wind control rule after being deleted from all feature types forming the wind control rule according to the identification accuracy of the wind control rule;
and deleting the determined feature types from the feature types forming the wind control rule.
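The pruning described in claims 6 to 8 can be sketched as a greedy loop that deletes a feature type whenever the deletion raises accuracy on the detection samples. The greedy order, the stopping condition, and the dict-based rule representation are assumptions; the claims only require deleting feature types whose removal improves recognition accuracy.

```python
def rule_matches(rule, sample):
    """A rule fires when every (feature type, variable) condition holds."""
    return all(sample.get(ft) == var for ft, var in rule.items())

def accuracy(rule, detection_samples):
    """Fraction of (sample, is_risky) pairs the rule classifies correctly."""
    correct = sum(
        1 for sample, is_risky in detection_samples
        if rule_matches(rule, sample) == is_risky
    )
    return correct / len(detection_samples)

def prune(rule, detection_samples):
    """Greedily drop feature types while doing so improves accuracy."""
    best = dict(rule)
    best_acc = accuracy(best, detection_samples)
    improved = True
    while improved and len(best) > 1:  # keep at least one condition
        improved = False
        for ft in list(best):
            trial = {k: v for k, v in best.items() if k != ft}
            acc = accuracy(trial, detection_samples)
            if acc > best_acc:
                best, best_acc, improved = trial, acc, True
                break  # restart the scan from the shortened rule
    return best, best_acc

rule = {"ip_country": "X", "age": "30"}
detection = [
    ({"ip_country": "X", "age": "30"}, True),
    ({"ip_country": "X", "age": "40"}, True),
    ({"ip_country": "Y", "age": "30"}, False),
]
pruned_rule, pruned_acc = prune(rule, detection)
```

Here the "age" condition is dropped because the shorter rule classifies all three detection samples correctly, whereas the original rule misses the second one. Comparing `pruned_acc` with the accuracy of the unpruned rule corresponds to the final selection step of claim 6.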
9. A device for mining wind control rules is applied to the field of risk control of financial transactions and comprises:
a determining module, configured to determine, for each preset feature type of a financial transaction, the feature value of each learning sample corresponding to the feature type as a variable of the feature type, wherein a server uses the business data of a plurality of historically executed transactions as the learning samples; the business data comprises personal information of both transaction parties, the transaction amount, the Internet Protocol (IP) addresses of both transaction parties when the business was executed, the countries to which the IP addresses belong, and the administrative divisions to which the IP addresses belong; the personal information comprises the names, ages, genders, addresses, and contact details of both transaction parties; each type of business data is used as a feature type, and the possible feature values of that business data are used as variables of the feature type; when determining the preset feature types, each type of business data is determined as a feature type; or, learning samples are determined according to a pre-configured list, and the feature types are determined according to the result of matching the information in the list against each item of business data;
the selection module selects at least part of feature types from the feature types through a genetic algorithm to serve as designated feature types, and selects at least part of variables from the variables of the feature types to serve as designated variables of the feature types aiming at the feature types;
the generating module is used for generating a wind control rule of the financial transaction by adopting a first-order rule learning algorithm according to each learning sample, each selected designated feature type and each designated variable of the designated feature type;
the selection module selects learning samples from the learning samples to copy according to the copying probability of the genetic algorithm to obtain copying samples, and selects specified feature types and specified variables according to the variables of the learning samples corresponding to the feature types and the variables of the copying samples corresponding to the feature types;
the selection module determines a plurality of pairs of learning samples from each learning sample according to the cross probability of the genetic algorithm, exchanges the variables of the learning samples with the same characteristic type for each pair of learning samples, and/or combines the variables of the learning samples with the same characteristic type to obtain cross samples, and selects the specified characteristic type and the specified variable according to the variable of each learning sample corresponding to each characteristic type and the variable of each cross sample corresponding to each characteristic type;
the financial transaction comprises a money laundering transaction; when the financial transaction is a money laundering transaction, the server, in determining the preset feature types, in addition to using each type of business data as a feature type, determines learning samples according to a pre-configured money laundering sanction list, and further determines feature types according to the result of matching the information in the money laundering sanction list against each item of business data, thereby enriching the preset feature types; the money laundering sanction list is information published by international anti-money laundering organizations on transactions determined to be money laundering transactions, and includes the business data of each transaction determined to be a money laundering transaction.
10. The apparatus of claim 9, wherein the type of a variable comprises a first type and a second type; the selection module exchanges, between the pair of learning samples, the variables of the same feature type whose type is the first type, and merges, between the pair of learning samples, the variables of the same feature type whose type is the second type.
11. The apparatus according to claim 9, wherein the selection module selects a learning sample to be mutated from the learning samples according to a mutation probability of the genetic algorithm, adjusts whether or not a variable of at least one feature type in the learning sample exists in the learning sample for each learning sample selected to be mutated, obtains each mutation sample, and selects the designated feature type and the designated variable according to a variable of each learning sample corresponding to each feature type and a variable of each mutation sample corresponding to each feature type.
12. The apparatus according to claim 11, wherein the selecting module selects, for each feature type, one variable according to the frequency of occurrence of the variables in the feature type as the variable of the feature type of the adjusted learning sample when the variable of the feature type is adjusted from being absent from the learning sample to being present in the learning sample.
13. The apparatus of any of claims 9 to 12, wherein the selection module, according to the fitness formula of the genetic algorithm
[Formula image FDA0003552273550000061; not reproduced in this text]
determines the fitness of each sample to be selected, selects a specified number of samples to be selected in descending order of fitness, takes each feature type corresponding to the selected samples as a specified feature type, and takes the variable of each selected sample corresponding to each feature type as a specified variable;
wherein the samples to be selected comprise at least one of the learning samples, the replication samples, the crossover samples, and the mutation samples; fitness_j represents the fitness of the j-th sample to be selected; fitness_ji represents the fitness of the variable of the i-th feature type in the j-th sample to be selected; N_ip represents the number of times the variable of the i-th feature type appears in the positive samples; N_in represents the number of times the variable of the i-th feature type appears in the negative samples; the positive sample is the sample to be selected with the risk control result, and the negative sample is the sample to be selected with the risk control result.
14. The apparatus of claim 9, the apparatus further comprising:
the rule pruning module is used for determining the identification accuracy of each wind control rule, adjusting the feature types contained in each wind control rule according to the identification accuracy of each wind control rule, re-determining the identification accuracy of the adjusted wind control rule, and selecting at least one wind control rule according to the identification accuracy of each wind control rule after adjustment and the identification accuracy of each wind control rule before adjustment.
15. The apparatus of claim 14, wherein the rule pruning module determines each test sample different from each learning sample according to the historical data, and determines the identification accuracy of each wind control rule according to the identification result of each wind control rule on each test sample.
16. The apparatus of claim 15, wherein the rule pruning module determines, for each of the wind control rules, a feature type that improves the recognition accuracy of the wind control rule after being deleted from the feature types constituting the wind control rule according to the recognition accuracy of the wind control rule, and deletes the determined feature type from the feature types constituting the wind control rule.
17. A server applied to the field of risk control of financial transactions, wherein the server comprises one or more processors and a memory, the memory storing a program configured to be executed by the one or more processors to perform the following steps:
for each preset feature type of a financial transaction, determining the feature value of each learning sample corresponding to the feature type as a variable of the feature type, wherein the server uses the business data of a plurality of historically executed transactions as the learning samples; the business data comprises personal information of both transaction parties, the transaction amount, the Internet Protocol (IP) addresses of both transaction parties when the business was executed, the countries to which the IP addresses belong, and the administrative divisions to which the IP addresses belong; the personal information comprises the names, ages, genders, addresses, and contact details of both transaction parties; each type of business data is used as a feature type, and the possible feature values of that business data are used as variables of the feature type;
when determining the preset feature types, each type of business data is determined as a feature type; or, learning samples are determined according to a pre-configured list, and the feature types are determined according to the result of matching the information in the list against each item of business data;
selecting at least part of feature types from the feature types through a genetic algorithm to be used as specified feature types, and selecting at least part of variables from variables of the feature types aiming at the feature types to be used as specified variables of the feature types;
generating a wind control rule of the financial transaction by adopting a first-order rule learning algorithm according to each learning sample, each selected designated feature type and each designated variable of the designated feature type;
selecting at least part of feature types from the feature types through a genetic algorithm to be used as specified feature types, and selecting at least part of variables from variables of the feature types for the feature types to be used as specified variables of the feature types, wherein the method specifically comprises the following steps:
selecting learning samples from the learning samples to copy according to the copy probability of the genetic algorithm to obtain copy samples;
selecting a specified feature type and a specified variable according to the variable of each learning sample corresponding to each feature type and the variable of each copying sample corresponding to each feature type;
selecting at least part of feature types from the feature types through a genetic algorithm to be used as specified feature types, and selecting at least part of variables from variables of the feature types for the feature types to be used as specified variables of the feature types, wherein the method specifically comprises the following steps:
determining a plurality of pairs of learning samples from each learning sample according to the cross probability of the genetic algorithm;
for each pair of learning samples, exchanging the variables of the learning samples of the same characteristic type, and/or combining the variables of the learning samples of the same characteristic type to obtain cross samples;
selecting a specified feature type and a specified variable according to the variables of the learning samples corresponding to the feature types and the variables of the cross samples corresponding to the feature types;
the financial transaction comprises a money laundering transaction; when the financial transaction is a money laundering transaction, the server, in determining the preset feature types, in addition to using each type of business data as a feature type, determines learning samples according to a pre-configured money laundering sanction list, and further determines feature types according to the result of matching the information in the money laundering sanction list against each item of business data, thereby enriching the preset feature types; the money laundering sanction list is information published by international anti-money laundering organizations on transactions determined to be money laundering transactions, and includes the business data of each transaction determined to be a money laundering transaction.
CN201810053792.7A 2018-01-19 2018-01-19 Method and device for mining wind control rule Active CN108346098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810053792.7A CN108346098B (en) 2018-01-19 2018-01-19 Method and device for mining wind control rule

Publications (2)

Publication Number Publication Date
CN108346098A CN108346098A (en) 2018-07-31
CN108346098B true CN108346098B (en) 2022-05-31

Family

ID=62960526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810053792.7A Active CN108346098B (en) 2018-01-19 2018-01-19 Method and device for mining wind control rule

Country Status (1)

Country Link
CN (1) CN108346098B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670835A (en) * 2018-09-25 2019-04-23 深圳壹账通智能科技有限公司 Air control method, apparatus, equipment and readable storage medium storing program for executing based on service node
CN109840838B (en) * 2018-12-26 2021-08-31 天翼数智科技(北京)有限公司 Wind control rule model dual-engine system, control method and server
CN110135701A (en) * 2019-04-23 2019-08-16 北京淇瑀信息科技有限公司 Control automatic generation method, device, electronic equipment and the readable medium of rule
CN111461892B (en) * 2020-03-31 2021-07-06 支付宝(杭州)信息技术有限公司 Method and device for selecting derived variables of risk identification model
CN111967600B (en) * 2020-08-18 2021-09-14 北京睿知图远科技有限公司 Feature derivation method based on genetic algorithm in wind control scene

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102054039A (en) * 2010-12-30 2011-05-11 长安大学 Fitness scaling method for improving overall search capability of genetic algorithm
CN106445821A (en) * 2016-09-23 2017-02-22 郑州云海信息技术有限公司 Method for automatically generating test case based on genetic algorithm
CN106952162A (en) * 2016-01-07 2017-07-14 平安科技(深圳)有限公司 Money laundering risks rating calculation method and system
CN107391569A (en) * 2017-06-16 2017-11-24 阿里巴巴集团控股有限公司 Identification, model training, Risk Identification Method, device and the equipment of data type
CN107424069A (en) * 2017-08-17 2017-12-01 阿里巴巴集团控股有限公司 A kind of generation method of air control feature, risk monitoring and control method and apparatus
CN107491988A (en) * 2017-08-09 2017-12-19 浙江工商大学 A kind of wisdom retail data method for digging based on genetic algorithm and improvement interest-degree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"周志华机器学习读后总结第14、15、16章";漠北墨杯;《CSDN》;20171023;第1-7页 *

Also Published As

Publication number Publication date
CN108346098A (en) 2018-07-31

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant
GR01 Patent grant