WO2021196843A1 - Method and device for selecting derived variables for a risk identification model - Google Patents

Method and device for selecting derived variables for a risk identification model

Info

Publication number
WO2021196843A1
Authority
WO
WIPO (PCT)
Prior art keywords
paternal
variable
variable set
derived
cumulative
Prior art date
Application number
PCT/CN2021/073963
Other languages
English (en)
French (fr)
Inventor
付大鹏
赵闻飙
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021196843A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing

Definitions

  • The embodiments of this specification relate to the technical field of risk identification, and in particular to a method and device for selecting derived variables for a risk identification model.
  • Identifying risk characteristics is a necessary function for protecting the interests of users in many current wealth management applications, electronic payment applications, and other scenarios that are highly sensitive to risk.
  • User transaction and account risk control is highly adversarial: each type of risk, such as embezzlement, fraud, cash-out, cheating, and money laundering, has corresponding groups and individuals behind it, such as black-industry gangs and "wool parties".
  • These groups and individuals bypass the various risk identifications of the risk control system in order to embezzle money or carry out illegal transactions. The reason is that the number and diversity of the risk features of the training samples in the sample database used to train the risk identification model are insufficient.
  • The way to increase the number and diversity of risk features is to derive risk features by brute force using exhaustive methods, and then screen the features against preset screening conditions (for example, feature importance greater than a preset threshold). This requires a lot of computing resources and time, and the quality of the resulting risk feature set is low.
  • The purpose of the embodiments of this specification is to provide a method and device for selecting derived variables for a risk identification model, so as to improve the efficiency of selecting the risk feature set and its quality.
  • The embodiments of this specification provide a method for selecting derived variables for a risk identification model, including: determining the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, where the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business,
  • and the updated seed pool includes the parent set of the N best-quality derived variables generated by the seed pool before the update;
  • determining, according to the first paternal cumulative variable set and the second paternal cumulative variable set, the target derived variable set in the variation direction that gives the best derived-variable quality, where the first paternal cumulative variable set and the second paternal cumulative variable set are derived-variable parents selected from the updated seed pool of the target genetic algorithm model; and, if the target derived variable set meets the quality convergence condition of the derived variables, outputting the derived variables in the target derived variable set as sample features of the risk identification model;
  • wherein determining the target derived variable set in the variation direction that gives the best derived-variable quality includes: according to the first paternal cumulative variable set, selecting from the second paternal cumulative variable set, through the derived-variable parent matching model, M second parents matched by the first parents in the first paternal cumulative variable set, to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.
  • The embodiments of this specification also provide a derived variable selection device for a risk identification model, including: a seed pool determining module, which determines the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, where the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated by the seed pool before the update;
  • a derived variable determining module, which determines, according to the first paternal cumulative variable set and the second paternal cumulative variable set, the target derived variable set in the variation direction that gives the best derived-variable quality, where the first paternal cumulative variable set and the second paternal cumulative variable set are derived-variable parents selected from the updated seed pool of the target genetic algorithm model;
  • and an information output module, which, if the target derived variable set meets the quality convergence condition of the derived variables, outputs the derived variables in the target derived variable set as sample features of the risk identification model;
  • wherein the derived variable determining module, according to the first paternal cumulative variable set, selects from the second paternal cumulative variable set, through the derived-variable parent matching model, M second parents matched by the first parents in the first paternal cumulative variable set, to generate a candidate derived variable set, and selects the N best-quality derived variables from the candidate derived variable set as the target derived variable set.
  • The embodiments of this specification also provide an electronic device, including: a memory, on which a computer program is stored; and a processor, used to execute the computer program in the memory to: determine the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, where the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated by the seed pool before the update;
  • determine, according to the first paternal cumulative variable set and the second paternal cumulative variable set, the target derived variable set in the variation direction that gives the best derived-variable quality, where the first paternal cumulative variable set and the second paternal cumulative variable set are derived-variable parents selected from the updated seed pool of the target genetic algorithm model; and, if the target derived variable set meets the quality convergence condition of the derived variables, output the derived variables in the target derived variable set as sample features of the risk identification model;
  • wherein determining the target derived variable set includes: according to the first paternal cumulative variable set, selecting from the second paternal cumulative variable set, through the derived-variable parent matching model, M second parents matched by the first parents in the first paternal cumulative variable set, to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.
  • The embodiments of this specification also provide a storage medium on which a computer program is stored.
  • When executed by a processor, the program: determines the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, where the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated by the seed pool before the update;
  • determines, according to the first paternal cumulative variable set and the second paternal cumulative variable set, the target derived variable set in the variation direction that gives the best derived-variable quality, where the first paternal cumulative variable set and the second paternal cumulative variable set are derived-variable parents selected from the updated seed pool of the target genetic algorithm model; and, if the target derived variable set meets the quality convergence condition of the derived variables, outputs the derived variables in the target derived variable set as sample features of the risk identification model;
  • wherein determining the target derived variable set includes: according to the first paternal cumulative variable set, selecting from the second paternal cumulative variable set, through the derived-variable parent matching model, M second parents matched by the first parents in the first paternal cumulative variable set, to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.
  • At least one of the technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects. The updated seed pool of the target genetic algorithm model is determined according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, and the updated seed pool includes the parent set of the N best-quality derived variables generated by the seed pool before the update.
  • Then, according to the first paternal cumulative variable set and the second paternal cumulative variable set, the target derived variable set is determined in the variation direction that gives the best derived-variable quality, where the first paternal cumulative variable set and the second paternal cumulative variable set are derived-variable parents selected from the updated seed pool of the target genetic algorithm model. If the target derived variable set meets the quality convergence condition of the derived variables, the derived variables in the target derived variable set are output as sample features of the risk identification model.
  • In this way, the sample features of the risk identification model can be generated directly by the model, saving a lot of computing resources and time. Furthermore, because the seed pool and the variation direction that gives the best derived-variable quality are both optimized continuously, the final risk feature set is of high quality.
  • FIG. 1 is a flowchart of a method for selecting derived variables for a risk identification model provided by an embodiment of this specification;
  • FIG. 2 is a schematic diagram of interaction between a service terminal and an electronic device according to an embodiment of this specification;
  • FIG. 3 is a flowchart of a method for selecting derived variables for a risk identification model provided by an embodiment of this specification;
  • FIG. 4 is a flowchart of a method for selecting derived variables for a risk identification model provided by an embodiment of this specification;
  • FIG. 5 is a block diagram of functional modules of a device for selecting derived variables for a risk identification model provided by an embodiment of this specification;
  • FIG. 6 is a block diagram of functional modules of a device for selecting derived variables for a risk identification model provided by an embodiment of this specification;
  • FIG. 7 is a circuit connection block diagram of an electronic device provided by an embodiment of this specification.
  • An embodiment of this specification provides a method for selecting derived variables for a risk identification model, which is applied to an electronic device 100.
  • The electronic device 100 can be, but is not limited to, a server.
  • The electronic device 100 is communicatively connected with the service terminal 200 for data interaction.
  • Risk-sensitive applications related to financial management, electronic payment, and the like are installed on the service terminal 200.
  • The specific operation content of each generated transaction can be sent to the electronic device 100 and added to the seed pool.
  • The method includes S11 to S17.
  • S11: Determine the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool.
  • The quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business.
  • The updated seed pool includes the parent set of the N best-quality derived variables generated by the seed pool before the update.
  • N is a positive integer.
  • The target business can be a business that is highly sensitive to risk, such as a payment business or a money transfer business.
  • Optionally, the number of cumulative variables in the seed pool after the target genetic algorithm model is updated is equal to the number of cumulative variables in the seed pool before the update.
  • For example, if the seed pool before the update has 1000 cumulative variables,
  • the seed pool after the update still has 1000 cumulative variables.
  • Optionally, the number of cumulative variables in the seed pool after the target genetic algorithm model is updated is smaller than the number of cumulative variables in the seed pool before the update.
  • For example, if the seed pool before the update has 1000 cumulative variables,
  • the seed pool after the update can have 500 cumulative variables, 200 cumulative variables, or any other integer number less than 1000.
  • The process of generating derived variables according to the target genetic algorithm model and its seed pool may include: taking the cumulative variable set of the target business as the initial seed pool of the target genetic algorithm model; taking the preset derivation strategy as the crossover operation of the target genetic algorithm model; taking the derived variables as the offspring of the target genetic algorithm model; taking the quality of a derived variable as the fitness of the offspring; selecting, from the generated derived variable set, the parent cumulative variables of the derived variables whose quality is greater than a preset threshold to construct the updated seed pool, as the mutation operation; generating a new derived variable set based on the updated seed pool, as the iteration operation of the target genetic algorithm model;
  • and taking, as the convergence condition of the target genetic algorithm model, the condition that the quality difference between the derived variable sets obtained by two adjacent iterations is less than a preset threshold.
  • In this way, the overall quality of the seed pool can be improved continuously. For example, seeds with a quality score greater than a preset threshold may account for 20% of the initial seed pool, 40% of the next seed pool, and 55% of the one after that, so that the quality of the seed pool improves step by step.
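The iteration described above can be sketched roughly as follows. This is a minimal illustration, not the specification's implementation: `evolve_seed_pool`, `quality`, `n_best`, `tol`, and `max_iters` are hypothetical names, every unordered pair of seeds stands in for the crossover (derivation) step, and the mean quality of the N best offspring stands in for the quality of a derived variable set.

```python
from itertools import combinations


def evolve_seed_pool(seed_pool, quality, n_best, tol=1e-3, max_iters=20):
    """Cross seeds, score the offspring, keep the parents of the N best, repeat."""
    pool = set(seed_pool)
    if len(pool) < 2 or n_best < 1:
        raise ValueError("need at least two seeds and n_best >= 1")
    prev_score = None
    best = []
    for _ in range(max_iters):
        # Crossover (assumed): each unordered pair of seeds is one candidate derived variable.
        offspring = list(combinations(sorted(pool), 2))
        # Fitness: rank the offspring by the caller-supplied quality function.
        offspring.sort(key=quality, reverse=True)
        best = offspring[:n_best]
        score = sum(quality(o) for o in best) / len(best)
        # Updated seed pool: the parents of the N best derived variables.
        pool = {parent for pair in best for parent in pair}
        # Convergence: quality change between two adjacent iterations is below tol.
        if prev_score is not None and abs(score - prev_score) < tol:
            break
        prev_score = score
    return pool, best


# Toy data: the quality of a pair is the sum of made-up per-seed weights.
weights = {"a": 1, "b": 2, "c": 3, "d": 4}
pool, best = evolve_seed_pool(weights, lambda pair: weights[pair[0]] + weights[pair[1]], n_best=1)
```

With these toy weights the pool shrinks to the parents of the single best pair after the first iteration and the loop stops on the second, because the score no longer changes.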
  • The structure of a cumulative variable can include, but is not limited to, five dimensions: subject + object + function + time window + condition.
  • For example, for the cumulative variable "the number of times the user performs operation X within T days":
  • the subject is the user ID;
  • the object is the operation event ID;
  • the function is count;
  • the time window is T days.
  • The cumulative variable of the target business may be the number of times the user performs the target business within a set time, for example, the number of times the user performs the transfer business within 3 days, or the number of times the user performs the transfer business within 1 month. Cumulative variables have good identification power and business interpretability for risk identification.
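To make the five-dimension structure concrete, the following sketch models a cumulative variable as a small record and evaluates a count-type variable over an event log. The class name, field names, `evaluate` function, and the `(user, event, day)` event format are all assumptions made for illustration; the specification does not define a data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CumulativeVariable:
    subject: str                     # what the variable is keyed on, e.g. "user_id"
    obj: str                         # operation event ID, e.g. "transfer"
    function: str                    # aggregation; only "count" is handled in this sketch
    window_days: int                 # time window T, in days
    condition: Optional[str] = None  # extra filter; not used in this sketch


def evaluate(var, events, subject_value, today):
    """Count events of type var.obj for subject_value within the last window_days days."""
    if var.function != "count":
        raise NotImplementedError(var.function)
    start = today - var.window_days
    return sum(
        1
        for who, what, day in events
        if who == subject_value and what == var.obj and start < day <= today
    )


# Hypothetical event log: (user, event, day number).
events = [("u1", "transfer", 10), ("u1", "transfer", 8), ("u1", "transfer", 1), ("u2", "transfer", 10)]
transfers_3d = CumulativeVariable("user_id", "transfer", "count", 3)
transfers_30d = CumulativeVariable("user_id", "transfer", "count", 30)
```

Here `transfers_3d` and `transfers_30d` differ only in the time-window dimension, which is the kind of pair the text later combines into a derived variable.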
  • Derived variables are derived from at least two cumulative variables. For example, an arithmetic operation is applied to two cumulative variables that differ in one dimension (such as the time dimension): dividing the number of times a user performs a transfer within a month by the number of times the user performs a transfer within 3 days yields a derived variable. Derived variables likewise have good identification power and business interpretability for risk identification.
  • The operation is not limited to division; it can also be multiplication, addition, subtraction, and so on, depending on actual needs.
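A small sketch of this derivation step, under stated assumptions: `CumVar`, `OPS`, and `derive` are hypothetical names, the parents are required to differ in exactly one dimension as in the example in the text, and returning `None` for a zero denominator is an arbitrary choice for this illustration.

```python
import operator
from typing import NamedTuple


class CumVar(NamedTuple):
    subject: str
    obj: str
    function: str
    window_days: int


# Supported derivation operations (division, multiplication, addition, subtraction).
OPS = {"div": operator.truediv, "mul": operator.mul, "add": operator.add, "sub": operator.sub}


def derive(numer, denom, values, op="div"):
    """Combine the values of two cumulative variables that differ in exactly one dimension."""
    differing = [f for f in CumVar._fields if getattr(numer, f) != getattr(denom, f)]
    if len(differing) != 1:
        raise ValueError("parents must differ in exactly one dimension")
    x, y = values[numer], values[denom]
    if op == "div" and y == 0:
        return None  # ratio undefined; the fallback is left to the caller
    return OPS[op](x, y)


month = CumVar("user_id", "transfer", "count", 30)
three_days = CumVar("user_id", "transfer", "count", 3)
values = {month: 12, three_days: 3}  # made-up counts for one user
ratio = derive(month, three_days, values)
```

For the made-up counts above, the ratio of monthly to 3-day transfers is 4.0.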
  • S13: According to the first paternal cumulative variable set and the second paternal cumulative variable set, determine the target derived variable set in the variation direction that gives the best derived-variable quality.
  • The first paternal cumulative variable set and the second paternal cumulative variable set are derived-variable parents selected from the updated seed pool of the target genetic algorithm model. That is, the updated seed pool is divided into the first paternal cumulative variable set and the second paternal cumulative variable set; assuming the derivation strategy is division, each cumulative variable in the first paternal cumulative variable set is taken as the denominator, and each cumulative variable in the second paternal cumulative variable set is taken as the numerator.
  • S13 includes: according to the first paternal cumulative variable set, selecting from the second paternal cumulative variable set, through the derived-variable parent matching model, M second parents matched by the first parents in the first paternal cumulative variable set, to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.
  • M is a positive integer.
  • For example, suppose the first paternal cumulative variable set includes 10 cumulative variables, A1 to A10,
  • and the second paternal cumulative variable set includes 10 cumulative variables, B1 to B10.
  • A1 to A10 are traversed in turn.
  • For each traversed first parent, one of the second parents in B1 to B10 is matched through the derived-variable parent matching model, until all the first parents have been matched with second parents.
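The traversal and selection in S13 might look like the sketch below. It assumes that M second parents are picked for each first parent (the text can also be read as M matches in total), and it replaces the derived-variable parent matching model with a plain scoring function; `select_target_set`, `match_score`, and `quality` are hypothetical names.

```python
import heapq


def select_target_set(first_set, second_set, match_score, quality, m, n):
    """Match M second parents to each first parent, then keep the N best candidate pairs."""
    candidates = []
    for a in first_set:
        # Stand-in for the matching model: the M highest-scoring second parents for a.
        matched = heapq.nlargest(m, second_set, key=lambda b: match_score(a, b))
        candidates.extend((a, b) for b in matched)
    # Target derived variable set: the N best-quality candidates.
    return heapq.nlargest(n, candidates, key=quality)


# Toy scores: the product of the numeric suffixes of the two parent names.
def toy_score(a, b):
    return int(a[1:]) * int(b[1:])


target = select_target_set(
    ["A1", "A2"], ["B1", "B2", "B3"],
    match_score=toy_score,
    quality=lambda pair: toy_score(*pair),
    m=1, n=1,
)
```

In this toy run each of A1 and A2 is matched with B3, and the pair (A2, B3) has the higher quality, so it forms the target set.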
  • S15: Determine whether the target derived variable set satisfies the quality convergence condition of the derived variables; if so, execute S17; optionally, if not, return to S11.
  • S17: Output the derived variables in the target derived variable set as sample features of the risk identification model.
  • If the target derived variable set meets the quality convergence condition of the derived variables, the output of the derived-variable parent matching model is stable, so iterative training is no longer required.
  • The method for selecting derived variables for a risk identification model determines the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, and the updated seed pool includes
  • the parent set of the N best-quality derived variables generated by the seed pool before the update. Then, according to the first paternal cumulative variable set and the second paternal cumulative variable set, the target derived variable set is determined in the variation direction that gives the best derived-variable quality,
  • where the first paternal cumulative variable set and the second paternal cumulative variable set are derived-variable parents selected from the updated seed pool of the target genetic algorithm model. If the target derived variable set meets the quality convergence condition of the derived variables, the derived variables in the target derived variable set are output as sample features of the risk identification model.
  • In this way, the sample features of the risk identification model can be generated directly by the model, saving a lot of computing resources and time. Furthermore, because the seed pool and the variation direction that gives the best derived-variable quality are both optimized continuously, the final risk feature set is of high quality.
  • Optionally, the derived-variable parent matching model is a reinforcement learning model. As shown in FIG. 4, S13 specifically includes:
  • S41: Use each first parent in the first paternal cumulative variable set as the state of the reinforcement learning model, use the probability distribution over the choice of the second parent matched to the first parent as the optimal policy of the reinforcement learning model,
  • use the choice of the second parent as the action of the reinforcement learning model, and use the quality of the derived variable determined by the first parent and the second parent as the reward of the reinforcement learning model; train the reinforcement learning model to obtain the second parent corresponding to each first parent in the first paternal cumulative variable set.
  • S43: Determine the candidate derived variable set based on each first parent in the first paternal cumulative variable set and its corresponding second parent.
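One way to picture S41 is a softmax policy per first parent trained with a gradient-bandit update, as sketched below. The specification does not name a particular reinforcement learning algorithm, so this choice, along with `train_matcher`, `reward`, `episodes`, and `lr`, is an assumption for illustration: the state is the first parent, the action is the choice of second parent, and the reward is the quality of the resulting derived variable.

```python
import math
import random


def softmax(prefs):
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_matcher(first_set, second_set, reward, episodes=300, lr=0.1, seed=0):
    """Learn, per first parent (state), a softmax policy over second parents (actions)."""
    rng = random.Random(seed)
    prefs = {a: [0.0] * len(second_set) for a in first_set}
    baseline = {a: 0.0 for a in first_set}
    for _ in range(episodes):
        for a in first_set:
            probs = softmax(prefs[a])
            k = rng.choices(range(len(second_set)), weights=probs)[0]  # sample an action
            r = reward(a, second_set[k])                               # quality as reward
            advantage = r - baseline[a]
            # Gradient-bandit update: move preferences along advantage * grad log pi.
            for j in range(len(second_set)):
                indicator = 1.0 if j == k else 0.0
                prefs[a][j] += lr * advantage * (indicator - probs[j])
            baseline[a] += 0.1 * advantage  # running estimate of the average reward
    # Greedy read-out: the most preferred second parent for each first parent.
    return {a: second_set[max(range(len(second_set)), key=prefs[a].__getitem__)] for a in first_set}


# Toy reward: exactly one good partner per first parent.
good = {("A1", "B2"), ("A2", "B1")}
matches = train_matcher(["A1", "A2"], ["B1", "B2", "B3"], lambda a, b: 1.0 if (a, b) in good else 0.0)
```

The learned mapping from each first parent to its preferred second parent then gives the pairs from which S43 builds the candidate derived variable set.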
  • S15 may specifically determine whether the target derived variable set obtained from the updated seed pool meets the quality convergence condition of the derived variables relative to the target derived variable set obtained from the seed pool before the update; if so, the derived variables in the target derived variable set are output as sample features of the risk identification model.
  • For example, the condition may be that the quality of the target derived variable set obtained from the updated seed pool is within a preset threshold range relative to the quality of the target derived variable set obtained from the seed pool before the update.
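A minimal sketch of such a convergence test follows. Summarizing a set's quality by the mean of its members' quality scores is an assumption made here, as are the names `has_converged` and `threshold`; the specification only requires that the change in quality stay within a preset range.

```python
def has_converged(prev_qualities, new_qualities, threshold):
    """True if the mean quality changed by at most `threshold` between two iterations."""
    prev_mean = sum(prev_qualities) / len(prev_qualities)
    new_mean = sum(new_qualities) / len(new_qualities)
    return abs(new_mean - prev_mean) <= threshold


# Made-up quality scores for the target sets before and after a seed pool update.
before = [0.5, 0.5]
after = [0.5, 0.75]
```

With these scores the mean moves from 0.5 to 0.625, so the check passes for a threshold of 0.25 and fails for a threshold of 0.0625.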
  • Optionally, the target genetic algorithm model uses a cumulative variable set randomly selected from the cumulative variable set of the target business as the initial seed pool.
  • Optionally, the updated seed pool does not include the cumulative variables of the seed pool before the update that are outside the parent set.
  • Optionally, the updated seed pool may include both the parent set of the N best-quality derived variables generated and cumulative variables randomly selected from the cumulative variable set of the target business.
  • In that case, the proportion of the parent set of the N best-quality derived variables is greater than or equal to the proportion of the cumulative variables randomly selected from the cumulative variable set of the target business.
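The mixed seed pool can be sketched as follows, assuming the pool is a list and that "proportion" is compared by element count; `build_updated_pool`, `pool_size`, and `seed` are hypothetical names introduced for this example.

```python
import random


def build_updated_pool(best_parents, all_cumulative, pool_size, seed=None):
    """Keep the best parents, then top up with random variables, never more than the parents."""
    rng = random.Random(seed)
    parents = list(dict.fromkeys(best_parents))[:pool_size]  # de-duplicate, keep order
    kept = set(parents)
    others = [v for v in all_cumulative if v not in kept]
    # Random additions are capped so that parents make up at least half of the pool.
    n_random = min(pool_size - len(parents), len(parents), len(others))
    return parents + rng.sample(others, n_random)


updated = build_updated_pool(["a", "b", "a"], ["a", "b", "c", "d", "e"], pool_size=4, seed=1)
```

The random top-up keeps some diversity in the pool, while the cap keeps the best parents in the majority.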
  • An embodiment of this specification also provides a derived variable selection device 500 for a risk identification model, which is applied to an electronic device 100.
  • The electronic device 100 may be, but is not limited to, a server.
  • The electronic device 100 is communicatively connected with the service terminal 200 for data interaction.
  • Risk-sensitive applications related to financial management, electronic payment, and the like are installed on the service terminal 200.
  • The specific operation content of each transaction can be sent to the electronic device 100 and added to the seed pool.
  • The device 500 includes a seed pool determining module 501, a derived variable determining module 502, and an information output module 503.
  • The seed pool determining module 501 determines the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, where the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business,
  • and the updated seed pool includes the parent set of the N best-quality derived variables generated by the seed pool before the update.
  • Optionally, the target genetic algorithm model uses a cumulative variable set randomly selected from the cumulative variable set of the target business as the initial seed pool.
  • Optionally, the updated seed pool does not include the cumulative variables of the seed pool before the update that are outside the parent set.
  • The derived variable determining module 502 determines, according to the first paternal cumulative variable set and the second paternal cumulative variable set, the target derived variable set in the variation direction that gives the best derived-variable quality,
  • where the first paternal cumulative variable set and the second paternal cumulative variable set are derived-variable parents selected from the updated seed pool of the target genetic algorithm model.
  • Both the first parent and the second parent include multiple dimensions, and the first parent and the second parent differ in the value of one dimension.
  • The information output module 503 outputs, if the target derived variable set satisfies the quality convergence condition of the derived variables, the derived variables in the target derived variable set as sample features of the risk identification model.
  • Specifically, the derived variable determining module 502, according to the first paternal cumulative variable set, selects from the second paternal cumulative variable set, through the derived-variable parent matching model, M second parents matched by the first parents in the first paternal cumulative variable set, to generate a candidate derived variable set, and selects the N best-quality derived variables from the candidate derived variable set as the target derived variable set.
  • When executed, the device 500 for selecting derived variables for a risk identification model can realize the following functions: the updated seed pool of the target genetic algorithm model is determined according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, and the updated seed pool includes the parent set of the N best-quality derived variables generated by the seed pool before the update;
  • then, according to the first paternal cumulative variable set and the second paternal cumulative variable set, the target derived variable set is determined in the variation direction that gives the best derived-variable quality, where the first paternal cumulative variable set and the second paternal cumulative variable set are derived-variable parents selected from the updated seed pool of the target genetic algorithm model;
  • and, if the target derived variable set satisfies the quality convergence condition of the derived variables, the derived variables in the target derived variable set are output as sample features of the risk identification model.
  • In this way, the sample features of the risk identification model can be generated directly by the model, saving a lot of computing resources and time. Furthermore, because the seed pool and the variation direction that gives the best derived-variable quality are both optimized continuously, the final risk feature set is of high quality.
  • Optionally, the derived-variable parent matching model is a reinforcement learning model.
  • The derived variable determining module 502 uses each first parent in the first paternal cumulative variable set as the state of the reinforcement learning model,
  • uses the probability distribution over the choice of the second parent matched to the first parent as the optimal policy of the reinforcement learning model,
  • uses the choice of the second parent as the action of the reinforcement learning model,
  • and uses the quality of the derived variable determined by the first parent and the second parent as the reward of the reinforcement learning model.
  • The reinforcement learning model is trained to obtain the second parent corresponding to each first parent in the first paternal cumulative variable set; the candidate derived variable set is then determined based on each first parent in the first paternal cumulative variable set and its corresponding second parent.
  • The information output module 503 outputs the derived variables in the target derived variable set as sample features of the risk identification model if the target derived variable set obtained from the updated seed pool satisfies the quality convergence condition of the derived variables relative to the target derived variable set obtained from the seed pool before the update.
  • The device 500 further includes a process returning module 504, which, if the target derived variable set does not meet the convergence condition, returns to the step of determining the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool.
  • The steps of the method provided in Embodiment 1 may all be executed by the same device, or the method may be executed by different devices.
  • For example, step 21 and step 22 can be executed by device 1, and step 23 by device 2.
  • Alternatively, step 21 can be executed by device 1, and step 22 and step 23 by device 2; and so on.
  • Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Please refer to FIG. 7.
  • the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory.
  • the storage may include internal memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage.
  • the electronic device may also include hardware required by other services.
  • the processor, network interface, and memory can be connected to each other through an internal bus.
  • the internal bus can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, etc.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one bidirectional arrow is used in FIG. 7, but it does not mean that there is only one bus or one type of bus.
  • the program may include program code, and the program code includes computer operation instructions.
  • the storage may include internal memory and non-volatile memory, and provides instructions and data to the processor.
  • the processor reads the corresponding computer program from the non-volatile memory to the memory and then runs it to form a derivative variable selection device for the risk identification model on a logical level.
  • the processor executes the program stored in the memory and is specifically configured to perform the following operations: determining an updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update; determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and, if the target derived variable set satisfies the derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein determining the target derived variable set includes: selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.
  • the method performed by the derived variable selection device for the risk identification model, as disclosed in the embodiment shown in FIG. 1 of this specification, may be applied to a processor or implemented by a processor.
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method can be completed by an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • the above-mentioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of this specification can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the electronic device can also execute the method in FIG. 1 and realize the functions of the derived variable selection device for the risk identification model in the embodiment shown in FIG. 1; the details are not repeated here.
  • in addition to a software implementation, the electronic device in the embodiments of this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and can also be hardware or logic devices.
  • the embodiment of the present specification also proposes a computer-readable storage medium that stores one or more programs, the one or more programs including instructions which, when executed by a portable electronic device that includes multiple application programs, cause the portable electronic device to execute the method of the embodiment shown in FIG. 1, and specifically to perform the following operations: determining an updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update; determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and, if the target derived variable set satisfies the derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein determining the target derived variable set includes: selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.
  • a typical implementation device is a computer.
  • the computer can be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
  • Computer-readable media includes permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media does not include transitory media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Biophysics (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Operations Research (AREA)
  • Biomedical Technology (AREA)
  • Computer Security & Cryptography (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

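A derived variable selection method and apparatus for a risk identification model, an electronic device, and a storage medium, relating to the field of risk identification. An updated seed pool of a target genetic algorithm model is determined according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool (S11), the updated seed pool including the parent set of the N best-quality derived variables generated from the seed pool before the update. A target derived variable set is then determined, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set (S13), and the derived variables in the target derived variable set are output as sample features of the risk identification model (S17).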

Description

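Derived variable selection method and apparatus for risk identification model
Technical Field
The embodiments of this specification relate to the technical field of risk identification, and in particular to a derived variable selection method and apparatus for a risk identification model.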
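Background
Identifying risk features is a function that many risk-sensitive scenarios, such as wealth management applications and electronic payment applications, must provide to protect users' interests. In these scenarios, risk control for users' transactions and accounts is highly adversarial and faces many risk types, including account theft, fraud, cash-out, cheating, and money laundering. Groups and individuals such as black-market gangs and bonus hunters (the so-called "羊毛党") target the existing risk control system and bypass its risk identification in order to steal funds or carry out prohibited transactions. The underlying cause is that the risk features used as training samples in the sample database for training the risk identification model are insufficient in number and diversity.
One way to increase the number and diversity of risk features is to derive risk features by brute force through exhaustive enumeration and then filter them against a preset screening condition (feature importance greater than a preset threshold). This consumes a large amount of computing resources and time, and the resulting risk feature set is of low quality.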
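Summary
The purpose of the embodiments of this specification is to provide a derived variable selection method and apparatus for a risk identification model, so as to improve the efficiency and quality of selecting the risk feature set.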
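In a first aspect, an embodiment of this specification provides a derived variable selection method for a risk identification model, including: determining an updated seed pool of a target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of a target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update; determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and, if the target derived variable set satisfies a derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein determining the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set includes: selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.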
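In a second aspect, an embodiment of this specification further provides a derived variable selection apparatus for a risk identification model, including: a seed pool determining module, which determines an updated seed pool of a target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of a target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update; a derived variable determining module, which determines a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and an information output module, which, if the target derived variable set satisfies a derived-variable quality convergence condition, outputs the derived variables in the target derived variable set as sample features of the risk identification model; wherein the derived variable determining module specifically selects, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set, and selects the N best-quality derived variables from the candidate derived variable set as the target derived variable set.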
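In a third aspect, an embodiment of this specification further provides an electronic device, including: a memory storing a computer program; and a processor configured to execute the computer program in the memory so as to implement: determining an updated seed pool of a target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of a target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update; determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and, if the target derived variable set satisfies a derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein determining the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set includes: selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.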
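In a fourth aspect, an embodiment of this specification further provides a storage medium storing a computer program which, when executed by a processor, implements: determining an updated seed pool of a target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of a target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update; determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and, if the target derived variable set satisfies a derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein determining the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set includes: selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.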
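At least one of the above technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects. An updated seed pool of the target genetic algorithm model is determined according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update. A target derived variable set is then determined, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, both of which are derived variable parents selected based on the updated seed pool of the target genetic algorithm model. If the target derived variable set satisfies the derived-variable quality convergence condition, the derived variables in the target derived variable set are output as sample features of the risk identification model. As a result, the sample features of the risk identification model can be generated directly by the model, which saves a large amount of computing resources and time. Moreover, because both the seed pool and the mutation direction that gives the best derived variable quality are optimized continuously, the resulting risk feature set is of high quality.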
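Brief Description of the Drawings
The drawings described here are provided for a further understanding of the embodiments of this specification and form a part of them. The exemplary embodiments and their descriptions are used to explain the embodiments of this specification and do not unduly limit them. In the drawings:
FIG. 1 is a flowchart of a derived variable selection method for a risk identification model according to an embodiment of this specification;
FIG. 2 is a schematic diagram of the interaction between a service terminal and an electronic device according to an embodiment of this specification;
FIG. 3 is a flowchart of a derived variable selection method for a risk identification model according to an embodiment of this specification;
FIG. 4 is a flowchart of a derived variable selection method for a risk identification model according to an embodiment of this specification;
FIG. 5 is a functional module block diagram of a derived variable selection apparatus for a risk identification model according to an embodiment of this specification;
FIG. 6 is a functional module block diagram of a derived variable selection apparatus for a risk identification model according to an embodiment of this specification;
FIG. 7 is a circuit connection block diagram of an electronic device according to an embodiment of this specification.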
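Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of this specification clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this specification without creative effort fall within the scope of protection of the embodiments of this specification.
The technical solutions provided by the embodiments of this specification are described in detail below with reference to the drawings.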
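Referring to FIG. 1, an embodiment of this specification provides a derived variable selection method for a risk identification model, applied to an electronic device 100, which may be, but is not limited to, a server. As shown in FIG. 2, the electronic device 100 is communicatively connected to a service terminal 200 for data exchange. The service terminal 200 has risk-sensitive applications installed, such as applications related to wealth management and electronic payment. When a user performs a transaction on the service terminal 200, the specific operations that make up the transaction can be sent to the electronic device 100 and added to the seed pool. The method includes S11 to S17.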
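S11: Determine the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool.
The quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update. It should be understood that N is a positive integer. For example, if 10000 derived variables are generated in total, the parent sets of the top N = 1000 derived variables are selected to build the updated seed pool. As another example, if 10000 derived variables are generated in total and the maximum quality score is 100, the parent sets of the N derived variables whose quality score exceeds 70 are selected to build the updated seed pool. In addition, the target business may be a highly risk-sensitive business such as a payment business or a transfer business.
Optionally, the number of cumulative variables in the updated seed pool of the target genetic algorithm model is equal to the number of cumulative variables in the seed pool before the update. For example, if the seed pool has 1000 cumulative variables before the update, it still has 1000 cumulative variables after the update.
Alternatively, the number of cumulative variables in the updated seed pool of the target genetic algorithm model is less than the number of cumulative variables in the seed pool before the update. For example, if the seed pool has 1000 cumulative variables before the update, the updated seed pool may have 500 cumulative variables, 200 cumulative variables, or any other integer number below 1000.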
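Specifically, the process of generating derived variables according to the target genetic algorithm model and its seed pool may include: taking the cumulative variable set of the target business as the initial seed pool of the target genetic algorithm model; taking a preset derivation strategy as the crossover operation; taking the derived variables as the offspring; taking the quality of a derived variable as the fitness of the offspring; taking, as the mutation operation, the construction of an updated seed pool from the parent cumulative variables of those derived variables in the generated set whose quality exceeds a preset threshold; taking the generation of a new derived variable set from the updated seed pool as the iteration operation; and taking, as the convergence condition, that the difference in quality between the derived variable sets obtained in two adjacent iterations is less than a preset threshold.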
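The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `evolve_seed_pool`, the `quality` callback, the use of the mean quality of the best offspring as the set-level quality, and the exhaustive pairing used as the crossover step are all hypothetical simplifications chosen to keep the toy example small.

```python
import itertools


def evolve_seed_pool(seed_pool, quality, top_n, eps=1e-3, max_iter=20):
    """Repeatedly derive offspring from the pool and keep the parents of the best ones."""
    prev_score = None
    best = []
    for _ in range(max_iter):
        # Crossover (assumed strategy): every ordered pair of pool members is one offspring.
        children = list(itertools.permutations(seed_pool, 2))
        # Fitness: the quality of each offspring, best first.
        children.sort(key=lambda child: quality(*child), reverse=True)
        best = children[:top_n]
        score = sum(quality(*child) for child in best) / len(best)
        # Seed pool update: keep only the parents of the best offspring.
        seed_pool = sorted({parent for child in best for parent in child})
        # Convergence: quality difference between adjacent iterations below eps.
        if prev_score is not None and abs(score - prev_score) < eps:
            break
        prev_score = score
    return seed_pool, best


# Toy run: variables are integers and quality is an arbitrary stand-in.
pool, best = evolve_seed_pool(list(range(6)), lambda a, b: a - b, top_n=3)
print(pool, best)
```

In a real setting the `quality` callback would be replaced by whatever measure of feature contribution the risk identification model uses, and the crossover step would follow the preset derivation strategy rather than enumerating every pair.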
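By iteratively updating the seed pool, its overall quality can be improved continuously. For example, the proportion of seeds whose quality score exceeds the preset threshold may be 20% in the initial seed pool, 40% in the next one, and 55% in the one after that, so the quality of the seed pool rises step by step.
A cumulative variable may be composed of, but is not limited to, five dimensions: subject + object + function + time window + condition. For example, for the cumulative variable "the number of times a user performs operation X within T days", the subject is the user ID, the object is the operation event ID, the function is count, the time window is T days, and the condition is operation type = X. Specifically, a cumulative variable of the target business may be the number of times a user performs the target business within a set period, for example, the number of transfers a user makes within 3 days or within 1 month. It can be understood that cumulative variables are effective for risk identification and easy to interpret in business terms.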
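The five-dimension structure of a cumulative variable can be illustrated with a small data class. This is only a sketch: the class name `CumulativeVariable`, the event fields (`user_id`, `event_id`, `op_type`, `time`), and the choice to count distinct event IDs are assumptions made for the example and do not come from the source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CumulativeVariable:
    subject: str      # event field holding the subject, e.g. "user_id"
    obj: str          # event field holding the object, e.g. "event_id"
    function: str     # aggregation function; only "count" is implemented here
    window_days: int  # time window T, in days
    condition: str    # required operation type X

    def evaluate(self, events, subject_id, now):
        """Count distinct objects for subject_id inside the window ending at now."""
        if self.function != "count":
            raise ValueError("this sketch only implements 'count'")
        start = now - timedelta(days=self.window_days)
        matching = {
            e[self.obj]
            for e in events
            if e[self.subject] == subject_id
            and e["op_type"] == self.condition
            and start < e["time"] <= now
        }
        return len(matching)


events = [
    {"user_id": "u1", "event_id": 1, "op_type": "transfer", "time": datetime(2021, 1, 9)},
    {"user_id": "u1", "event_id": 2, "op_type": "transfer", "time": datetime(2021, 1, 8)},
    {"user_id": "u1", "event_id": 3, "op_type": "transfer", "time": datetime(2020, 12, 20)},
    {"user_id": "u2", "event_id": 4, "op_type": "transfer", "time": datetime(2021, 1, 9)},
]
now = datetime(2021, 1, 10)
transfers_3d = CumulativeVariable("user_id", "event_id", "count", 3, "transfer")
transfers_30d = CumulativeVariable("user_id", "event_id", "count", 30, "transfer")
print(transfers_3d.evaluate(events, "u1", now))   # 2
print(transfers_30d.evaluate(events, "u1", now))  # 3
```

The two instances differ only in the time-window dimension, which is the kind of pair the text later combines into a derived variable.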
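A derived variable is generated from at least two cumulative variables. For example, an arithmetic operation is applied to two cumulative variables that differ in one dimension (such as the time dimension): the number of transfers a user makes within 1 month is divided by the number of transfers the user makes within 3 days, which yields one derived variable. It can be understood that derived variables are likewise effective for risk identification and easy to interpret in business terms. The operation is not limited to division; it may also be multiplication, addition, subtraction, and so on, depending on actual needs.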
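As a small worked example of the derivation step, the sketch below combines two cumulative-variable values with one of the four arithmetic operations mentioned above. The function name `derive`, the operation keys, and the fallback value returned when the denominator is zero are assumptions made for illustration.

```python
import operator

# Supported derivation operations (assumed names).
OPS = {
    "div": operator.truediv,
    "mul": operator.mul,
    "add": operator.add,
    "sub": operator.sub,
}


def derive(numerator, denominator, op="div", default=0.0):
    """Combine two cumulative-variable values into one derived-variable value."""
    if op == "div" and denominator == 0:
        # Assumed convention: fall back to a default instead of raising.
        return default
    return OPS[op](numerator, denominator)


# 12 transfers in the last month divided by 3 transfers in the last 3 days.
print(derive(12, 3))  # 4.0
```

Handling the zero denominator explicitly matters in practice, because a user with no activity in the shorter window is a common case rather than an exception.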
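S13: Determine the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set.
The first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model. That is, the updated seed pool is divided into the first parent cumulative variable set and the second parent cumulative variable set; assuming the derivation strategy is division, each cumulative variable in the first parent cumulative variable set is used as a denominator and each cumulative variable in the second parent cumulative variable set is used as a numerator. Specifically, as shown in FIG. 3, S13 includes:
S31: According to the first parent cumulative variable set, select, by means of the derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate the candidate derived variable set.
Here, M is a positive integer.
S33: Select the N best-quality derived variables from the candidate derived variable set as the target derived variable set.
For example, if the first parent cumulative variable set includes 10 cumulative variables A1 to A10 and the second parent cumulative variable set includes 10 cumulative variables B1 to B10, then A1 to A10 are traversed, and for the first parent currently visited, the derived variable parent matching model matches one of the second parents B1 to B10, until every first parent has been matched with a second parent.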
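Steps S31 and S33 can be sketched as two small functions. The sketch assumes that the parent matching model can be represented by a scoring callback `match_score(first, second)` and that derived-variable quality is available as a callback `quality(first, second)`; both callbacks and the function names are hypothetical stand-ins.

```python
import heapq


def candidate_set(first_parents, second_parents, match_score, m):
    """S31 (sketch): for each first parent keep the m best-matching second parents."""
    candidates = []
    for a in first_parents:
        matched = heapq.nlargest(m, second_parents, key=lambda b: match_score(a, b))
        candidates.extend((a, b) for b in matched)
    return candidates


def select_target_set(candidates, quality, n):
    """S33 (sketch): keep the n candidates with the highest quality."""
    return heapq.nlargest(n, candidates, key=lambda pair: quality(*pair))


# Toy data: parents are numbers, and both callbacks are arbitrary stand-ins.
firsts, seconds = [1, 2, 3], [10, 20, 30]
cands = candidate_set(firsts, seconds, lambda a, b: -abs(b - 10 * a), m=1)
target = select_target_set(cands, lambda a, b: a * b, n=2)
print(cands)   # [(1, 10), (2, 20), (3, 30)]
print(target)  # [(3, 30), (2, 20)]
```

With M much smaller than the size of the second parent set, the candidate set grows linearly in the number of first parents instead of quadratically, which is the saving over exhaustive pairing.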
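S15: Determine whether the target derived variable set satisfies the derived-variable quality convergence condition. If so, perform S17; optionally, if not, return to S11.
S17: Output the derived variables in the target derived variable set as sample features of the risk identification model.
When the target derived variable set satisfies the derived-variable quality convergence condition, the derived variable parent matching model already produces stable output, so iterative training stops.
In this derived variable selection method for a risk identification model, an updated seed pool of the target genetic algorithm model is determined according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update. A target derived variable set is then determined, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set, both of which are derived variable parents selected based on the updated seed pool of the target genetic algorithm model. If the target derived variable set satisfies the derived-variable quality convergence condition, the derived variables in the target derived variable set are output as sample features of the risk identification model. As a result, the sample features of the risk identification model can be generated directly by the model, which saves a large amount of computing resources and time. Moreover, because both the seed pool and the mutation direction that gives the best derived variable quality are optimized continuously, the resulting risk feature set is of high quality.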
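Optionally, the derived variable parent matching model is a reinforcement learning model. As shown in FIG. 4, S31 specifically includes:
S41: Take a first parent in the first parent cumulative variable set as the state of the reinforcement learning model, the probability distribution over the choice of the second parent matched to that first parent as the optimal policy, the choice of the second parent as the action, and the quality of the derived variable determined by the first parent and the second parent as the feedback reward, and train the reinforcement learning model to obtain the second parent corresponding to each first parent in the first parent cumulative variable set.
S43: Determine the candidate derived variable set based on each first parent in the first parent cumulative variable set and its corresponding second parent.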
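The state, action, policy, and reward mapping above can be illustrated with a deliberately simplified, single-step (contextual bandit) sketch. It is not the model used in the source, which does not specify an algorithm: the softmax policy over running-mean reward estimates, the `temperature` parameter, the function name, and the toy quality table are all assumptions.

```python
import math
import random


def train_matching_policy(first_parents, second_parents, quality,
                          rounds=300, temperature=0.5, seed=0):
    """Learn, for each first parent (state), a distribution over second parents (actions)."""
    rng = random.Random(seed)
    q = {a: {b: 0.0 for b in second_parents} for a in first_parents}
    visits = {a: {b: 0 for b in second_parents} for a in first_parents}

    def policy(a):
        # Softmax over the current reward estimates for state a.
        weights = [math.exp(q[a][b] / temperature) for b in second_parents]
        total = sum(weights)
        return [w / total for w in weights]

    for _ in range(rounds):
        for a in first_parents:                                    # state
            b = rng.choices(second_parents, weights=policy(a))[0]  # action
            reward = quality(a, b)                                 # feedback reward
            visits[a][b] += 1
            q[a][b] += (reward - q[a][b]) / visits[a][b]           # running mean
    return {a: dict(zip(second_parents, policy(a))) for a in first_parents}


# Toy quality table (assumed values): B1 suits A1, B2 suits A2.
table = {("A1", "B1"): 0.9, ("A1", "B2"): 0.2, ("A2", "B1"): 0.1, ("A2", "B2"): 0.8}
pol = train_matching_policy(["A1", "A2"], ["B1", "B2"], lambda a, b: table[(a, b)])
matches = {a: max(dist, key=dist.get) for a, dist in pol.items()}
print(matches)
```

Taking the most probable second parent per first parent, as in `matches`, corresponds to M = 1; keeping the M most probable ones would yield a larger candidate set.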
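Optionally, S15 may specifically be: determining whether the target derived variable set obtained from the updated seed pool satisfies the derived-variable quality convergence condition relative to the target derived variable set obtained from the seed pool before the update, and if so, outputting the derived variables in the target derived variable set as sample features of the risk identification model.
For example, it is determined whether the quality of the target derived variable set obtained from the updated seed pool, relative to the quality of the target derived variable set obtained from the seed pool before the update, is within a preset threshold range.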
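One possible form of this convergence test is sketched below. The source does not say how the quality of a set is aggregated, so the use of the mean quality score and the function name `has_converged` are assumptions.

```python
def has_converged(prev_qualities, curr_qualities, threshold):
    """Return True when the mean quality changed by less than threshold."""
    prev_mean = sum(prev_qualities) / len(prev_qualities)
    curr_mean = sum(curr_qualities) / len(curr_qualities)
    return abs(curr_mean - prev_mean) < threshold


# Mean quality moves from 0.71 to 0.715, a change below the 0.01 threshold.
print(has_converged([0.70, 0.72], [0.71, 0.72], threshold=0.01))  # True
# Mean quality moves from 0.5 to 0.7, so iteration should continue.
print(has_converged([0.5], [0.7], threshold=0.01))                # False
```

Other aggregates, such as the minimum or median quality of the set, would fit the same interface.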
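Optionally, the target genetic algorithm model takes a cumulative variable set randomly selected from the cumulative variable set of the target business as its initial seed pool.
Optionally, the updated seed pool does not include any cumulative variable from the seed pool before the update other than those in the parent set.
Specifically, the updated seed pool may include the parent set of the N best-quality derived variables that were generated, together with cumulative variables randomly selected from the cumulative variable set of the target business. The proportion of the parent set of the N best-quality derived variables in the pool is greater than or equal to the proportion of the randomly selected cumulative variables.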
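The seed pool composition described above can be sketched as follows. The function name `update_seed_pool`, the `pool_size` parameter, and the rule of never adding more random variables than retained parents (so that the parents make up at least half of the pool) are assumptions used to illustrate the stated proportion constraint.

```python
import random


def update_seed_pool(best_children, all_variables, pool_size, seed=0):
    """Build a new pool from the parents of the best offspring plus random top-ups."""
    rng = random.Random(seed)
    parents = []
    for child in best_children:
        for parent in child:
            if parent not in parents:
                parents.append(parent)
    parents = parents[:pool_size]
    outsiders = [v for v in all_variables if v not in parents]
    # Never add more random variables than parents, so parents stay the majority share.
    n_random = min(len(parents), pool_size - len(parents), len(outsiders))
    return parents + rng.sample(outsiders, n_random)


pool = update_seed_pool([("a", "b"), ("b", "c")], list("abcdefgh"), pool_size=5)
print(pool)
```

The random top-up keeps some diversity in the pool, which plays a role similar to exploration in the matching model.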
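Referring to FIG. 5, an embodiment of this specification further provides a derived variable selection apparatus 500 for a risk identification model, applied to an electronic device 100, which may be, but is not limited to, a server. As shown in FIG. 2, the electronic device 100 is communicatively connected to a service terminal 200 for data exchange. The service terminal 200 has risk-sensitive applications installed, such as applications related to wealth management and electronic payment. When a user performs a transaction on the service terminal 200, the specific operations that make up the transaction can be sent to the electronic device 100 and added to the seed pool. It should be noted that the basic principle and technical effects of the derived variable selection apparatus 500 are the same as those of the above embodiment; for brevity, for anything not mentioned in this part, refer to the corresponding content of the above embodiment. The apparatus 500 includes a seed pool determining module 501, a derived variable determining module 502, and an information output module 503, wherein: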
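The seed pool determining module 501 determines the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update.
Optionally, the target genetic algorithm model takes a cumulative variable set randomly selected from the cumulative variable set of the target business as its initial seed pool. In addition, the updated seed pool does not include any cumulative variable from the seed pool before the update other than those in the parent set.
The derived variable determining module 502 determines the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model.
Optionally, the first parent and the second parent each include multiple dimensions, and the first parent and the second parent differ in the value of one dimension.
The information output module 503 outputs the derived variables in the target derived variable set as sample features of the risk identification model if the target derived variable set satisfies the derived-variable quality convergence condition.
Specifically, the derived variable determining module 502 selects, according to the first parent cumulative variable set and by means of the derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate the candidate derived variable set, and selects the N best-quality derived variables from the candidate derived variable set as the target derived variable set.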
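When executed, the derived variable selection apparatus 500 for a risk identification model can realize the following functions. An updated seed pool of the target genetic algorithm model is determined according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update. A target derived variable set is then determined, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set, both of which are derived variable parents selected based on the updated seed pool of the target genetic algorithm model. If the target derived variable set satisfies the derived-variable quality convergence condition, the derived variables in the target derived variable set are output as sample features of the risk identification model. As a result, the sample features of the risk identification model can be generated directly by the model, which saves a large amount of computing resources and time. Moreover, because both the seed pool and the mutation direction that gives the best derived variable quality are optimized continuously, the resulting risk feature set is of high quality.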
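Optionally, the derived variable parent matching model is a reinforcement learning model. The derived variable determining module takes a first parent in the first parent cumulative variable set as the state of the reinforcement learning model, the probability distribution over the choice of the second parent matched to that first parent as the optimal policy, the choice of the second parent as the action, and the quality of the derived variable determined by the first parent and the second parent as the feedback reward; it trains the reinforcement learning model to obtain the second parent corresponding to each first parent in the first parent cumulative variable set, and determines the candidate derived variable set based on each first parent in the first parent cumulative variable set and its corresponding second parent.
Optionally, the information output module 503 outputs the derived variables in the target derived variable set as sample features of the risk identification model if the target derived variable set obtained from the updated seed pool satisfies the derived-variable quality convergence condition relative to the target derived variable set obtained from the seed pool before the update.
Optionally, as shown in FIG. 6, the apparatus 500 further includes a process returning module 504 which, if the target derived variable set does not satisfy the convergence condition, returns to the step of determining the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool.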
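It should be noted that the steps of the method provided in Embodiment 1 may all be executed by the same device, or the method may be executed by different devices. For example, steps 21 and 22 may be executed by device 1 and step 23 by device 2; or step 21 may be executed by device 1 and steps 22 and 23 by device 2; and so on.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.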
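FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of this specification. Referring to FIG. 7, at the hardware level the electronic device includes a processor and optionally an internal bus, a network interface, and storage. The storage may include internal memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage. The electronic device may of course also include hardware required by other services.
The processor, the network interface, and the storage can be connected to each other through the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one bidirectional arrow is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.
The storage is used to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The storage may include internal memory and non-volatile memory, and provides instructions and data to the processor.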
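The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming at the logical level the derived variable selection apparatus for the risk identification model. The processor executes the program stored in the storage and is specifically configured to perform the following operations: determining an updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update; determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and, if the target derived variable set satisfies the derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein determining the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set includes: selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.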
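The method performed by the derived variable selection apparatus for the risk identification model, as disclosed in the embodiment shown in FIG. 1 of this specification, may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and so on; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this specification. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of this specification may be embodied directly as being executed by a hardware decoding processor, or as being executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the storage; the processor reads the information in the storage and completes the steps of the above method in combination with its hardware.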
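The electronic device can also execute the method of FIG. 1 and realize the functions of the derived variable selection apparatus for the risk identification model in the embodiment shown in FIG. 1, which are not repeated here.
Of course, in addition to a software implementation, the electronic device in the embodiments of this specification does not exclude other implementations, such as logic devices or a combination of software and hardware. That is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices.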
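An embodiment of this specification further provides a computer-readable storage medium that stores one or more programs, the one or more programs including instructions which, when executed by a portable electronic device that includes multiple application programs, cause the portable electronic device to execute the method of the embodiment shown in FIG. 1, and specifically to perform the following operations: determining an updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of the target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update; determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and, if the target derived variable set satisfies the derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein determining the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set includes: selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.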
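In short, the above are only preferred embodiments of this specification and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of this specification shall fall within their scope of protection.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.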
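It should also be noted that the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, product, or device that includes the element.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described briefly because it is basically similar to the method embodiment; for related details, refer to the description of the method embodiment.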

Claims (11)

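  1. A derived variable selection method for a risk identification model, comprising:
    determining an updated seed pool of a target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of a target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update;
    determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and
    if the target derived variable set satisfies a derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein
    determining the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set comprises:
    selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and
    selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.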
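  2. The method according to claim 1, wherein the derived variable parent matching model is a reinforcement learning model, and selecting, according to the first parent cumulative variable set and by means of the derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate the candidate derived variable set, comprises:
    taking a first parent in the first parent cumulative variable set as the state of the reinforcement learning model, the probability distribution over the choice of the second parent matched to the first parent as the optimal policy of the reinforcement learning model, the choice of the second parent as the action of the reinforcement learning model, and the quality of the derived variable determined by the first parent and the second parent as the feedback reward of the reinforcement learning model, and training the reinforcement learning model to obtain the second parent corresponding to each first parent in the third cumulative variable set; and
    determining the candidate derived variable set based on each first parent in the first cumulative variable set and its corresponding second parent.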
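  3. The method according to claim 1 or 2, wherein, if the target derived variable set satisfies the derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model comprises:
    if the target derived variable set obtained from the updated seed pool satisfies the derived-variable quality convergence condition relative to the target derived variable set obtained from the seed pool before the update, outputting the derived variables in the target derived variable set as sample features of the risk identification model.
  4. The method according to claim 1 or 2, wherein the target genetic algorithm model takes a cumulative variable set randomly selected from the cumulative variable set of the target business as its initial seed pool.
  5. The method according to claim 1 or 2, wherein the updated seed pool does not include any cumulative variable from the seed pool before the update other than those in the parent set.
  6. The method according to claim 1 or 2, wherein the first parent and the second parent each include multiple dimensions, and the first parent and the second parent differ in the value of one dimension.
  7. The method according to claim 1 or 2, wherein, if the target derived variable set does not satisfy the convergence condition, the method returns to the step of determining the updated seed pool of the target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool.
  8. The method according to claim 1 or 2, wherein
    the number of cumulative variables in the updated seed pool of the target genetic algorithm model is equal to the number of cumulative variables in the seed pool of the target genetic algorithm model before the update; or
    the number of cumulative variables in the updated seed pool of the target genetic algorithm model is less than the number of cumulative variables in the seed pool of the target genetic algorithm model before the update.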
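  9. A derived variable selection apparatus for a risk identification model, comprising:
    a seed pool determining module, which determines an updated seed pool of a target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of a target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update;
    a derived variable determining module, which determines a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and
    an information output module, which, if the target derived variable set satisfies a derived-variable quality convergence condition, outputs the derived variables in the target derived variable set as sample features of the risk identification model; wherein
    the derived variable determining module specifically selects, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set, and selects the N best-quality derived variables from the candidate derived variable set as the target derived variable set.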
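  10. An electronic device, comprising:
    a memory storing a computer program; and
    a processor configured to execute the computer program in the memory so as to implement:
    determining an updated seed pool of a target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of a target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update;
    determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and
    if the target derived variable set satisfies a derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein
    determining the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set comprises:
    selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and
    selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.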
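  11. A storage medium storing a computer program which, when executed by a processor, implements:
    determining an updated seed pool of a target genetic algorithm model according to the quality of the derived variables generated by the target genetic algorithm model and its seed pool, wherein the quality of a derived variable is used to evaluate the contribution of the derived variable as a sample feature of the risk identification model of a target business, and the updated seed pool includes the parent set of the N best-quality derived variables generated from the seed pool before the update;
    determining a target derived variable set, in the mutation direction that gives the best derived variable quality, according to a first parent cumulative variable set and a second parent cumulative variable set, wherein the first parent cumulative variable set and the second parent cumulative variable set are derived variable parents selected based on the updated seed pool of the target genetic algorithm model; and
    if the target derived variable set satisfies a derived-variable quality convergence condition, outputting the derived variables in the target derived variable set as sample features of the risk identification model; wherein
    determining the target derived variable set, in the mutation direction that gives the best derived variable quality, according to the first parent cumulative variable set and the second parent cumulative variable set comprises:
    selecting, according to the first parent cumulative variable set and by means of a derived variable parent matching model, M second parents from the second parent cumulative variable set that match a first parent in the first parent cumulative variable set, so as to generate a candidate derived variable set; and
    selecting the N best-quality derived variables from the candidate derived variable set as the target derived variable set.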
PCT/CN2021/073963 2020-03-31 2021-01-27 用于风险识别模型的衍生变量选择方法和装置 WO2021196843A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010244612.0 2020-03-31
CN202010244612.0A CN111461892B (zh) 2020-03-31 2020-03-31 用于风险识别模型的衍生变量选择方法和装置

Publications (1)

Publication Number Publication Date
WO2021196843A1 true WO2021196843A1 (zh) 2021-10-07

Family

ID=71683398

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073963 WO2021196843A1 (zh) 2020-03-31 2021-01-27 用于风险识别模型的衍生变量选择方法和装置

Country Status (2)

Country Link
CN (1) CN111461892B (zh)
WO (1) WO2021196843A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461892B (zh) * 2020-03-31 2021-07-06 支付宝(杭州)信息技术有限公司 用于风险识别模型的衍生变量选择方法和装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080154814A1 (en) * 2006-12-22 2008-06-26 American Express Travel Related Services Company, Inc. Automated Predictive Modeling
CN108346098A (zh) * 2018-01-19 2018-07-31 阿里巴巴集团控股有限公司 一种风控规则挖掘的方法及装置
CN108875815A (zh) * 2018-06-04 2018-11-23 深圳市研信小额贷款有限公司 特征工程变量确定方法及装置
CN110046799A (zh) * 2019-03-08 2019-07-23 阿里巴巴集团控股有限公司 决策优化方法及装置
CN110472742A (zh) * 2019-07-11 2019-11-19 阿里巴巴集团控股有限公司 一种模型变量确定方法、装置及设备
CN110503566A (zh) * 2019-07-08 2019-11-26 中国平安人寿保险股份有限公司 风控模型建立方法、装置、计算机设备及存储介质
CN111461892A (zh) * 2020-03-31 2020-07-28 支付宝(杭州)信息技术有限公司 用于风险识别模型的衍生变量选择方法和装置

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040215551A1 (en) * 2001-11-28 2004-10-28 Eder Jeff S. Value and risk management system for multi-enterprise organization
CN101782976B (zh) * 2010-01-15 2013-04-10 南京邮电大学 一种云计算环境下机器学习自动选择方法
US10043591B1 (en) * 2015-02-06 2018-08-07 Brain Trust Innovations I, Llc System, server and method for preventing suicide
CN109492844B (zh) * 2017-09-12 2022-04-15 杭州蚂蚁聚慧网络技术有限公司 业务策略的生成方法和装置
CN107679985B (zh) * 2017-09-12 2021-01-05 创新先进技术有限公司 风险特征筛选、描述报文生成方法、装置以及电子设备
CN107862468A (zh) * 2017-11-23 2018-03-30 深圳市智物联网络有限公司 设备风险识别模型建立的方法及装置
CN108460523B (zh) * 2018-02-12 2020-08-21 阿里巴巴集团控股有限公司 一种风控规则生成方法和装置
US20190325528A1 (en) * 2018-04-24 2019-10-24 Brighterion, Inc. Increasing performance in anti-money laundering transaction monitoring using artificial intelligence
CN109191283A (zh) * 2018-08-30 2019-01-11 成都数联铭品科技有限公司 风险预警方法及系统
CN109523118A (zh) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 风险数据筛选方法、装置、计算机设备和存储介质
CN109711435A (zh) * 2018-12-03 2019-05-03 三峡大学 一种基于遗传算法的支持向量机在线电压稳定性监测方法
CN109816090A (zh) * 2019-02-15 2019-05-28 南京邮电大学 一种基于离散变量的改进型高光谱端元提取方法
CN110008991B (zh) * 2019-02-26 2023-05-02 创新先进技术有限公司 风险事件的识别、风险识别模型的生成方法、装置、设备及介质
CN110442712B (zh) * 2019-07-05 2023-08-22 创新先进技术有限公司 风险的确定方法、装置、服务器和文本审理系统
CN110458572B (zh) * 2019-07-08 2023-11-24 创新先进技术有限公司 用户风险的确定方法和目标风险识别模型的建立方法
CN110503296B (zh) * 2019-07-08 2022-05-06 招联消费金融有限公司 测试方法、装置、计算机设备和存储介质
CN110852444A (zh) * 2019-10-11 2020-02-28 支付宝(杭州)信息技术有限公司 用于确定机器学习模型的衍生变量的方法及装置

Also Published As

Publication number Publication date
CN111461892B (zh) 2021-07-06
CN111461892A (zh) 2020-07-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21781273

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21781273

Country of ref document: EP

Kind code of ref document: A1