CN116029766A - User transaction decision recognition method, incentive strategy optimization method, device and equipment

Info

Publication number: CN116029766A
Application number: CN202310119126.XA
Authority: CN
Inventors: 高兰天
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-02-06
Filing date: 2023-02-06
Publication date: 2023-04-28

Abstract

The present disclosure provides a user transaction decision recognition method, which may be applied to the technical field of artificial intelligence. The method comprises: performing primary clustering on user transaction sample data using a time-series clustering algorithm to obtain transaction pattern clustering data; performing secondary clustering on the transaction pattern clustering data corresponding to the i-th user transaction pattern to obtain user transaction behavior templates; calculating, based on a dynamic time warping algorithm, the similarity between user transaction behavior history data and each user transaction behavior template, and obtaining a user transaction behavior matching template and a user transaction state set; and identifying a user transaction behavior decision path based on a Markov decision process model using the user transaction state set. The present disclosure also provides an incentive strategy optimization method, apparatus, device, medium, and program product.

Description

User transaction decision recognition method, incentive strategy optimization method, device and equipment

Technical Field

The present disclosure relates to the field of artificial intelligence technology or finance, and in particular, to a user transaction decision recognition method, incentive policy optimization method, apparatus, device, medium, and program product.

Background

At present, marketing schemes are usually formulated in a broad-brush manner, relying on expert experience or on copying established marketing practices. Existing user data resources are not fully utilized and computing resources are wasted, so fine-grained marketing cannot be achieved. Making full use of big data resources to formulate fine-grained, personalized marketing strategies for users is of great significance for resource allocation, data utilization, and marketing management efficiency. The present disclosure finds that identifying user transaction decision paths helps improve marketing management efficiency. However, because a user's behavior pattern and current behavior state are difficult to grasp, customer group lists are defined only by screening cross-sectional user data against fixed indicators. Such screening can hardly capture the customer characteristics that are reflected in transaction behavior patterns and that stem from unobservable variables such as attitudes, values, lifestyle, or personality. As a result, user transaction data resources are wasted, the data utilization rate is low, user transaction patterns and user decision paths are difficult to identify accurately, and more efficient resource allocation and precise marketing strategies are hard to establish.

Disclosure of Invention

In view of the foregoing, embodiments of the present disclosure provide a user transaction decision recognition method, incentive strategy optimization method, apparatus, device, medium, and program product.

According to a first aspect of the present disclosure, there is provided a user transaction decision recognition method, comprising: performing primary clustering on user transaction sample data using a time-series clustering algorithm to obtain transaction pattern clustering data, wherein the primary clustering comprises forming a mapping relation between the user transaction sample data and user transaction patterns, and there are at least two user transaction patterns; performing secondary clustering on the transaction pattern clustering data corresponding to the i-th user transaction pattern to obtain user transaction behavior templates, wherein there are at least two user transaction behavior templates, each user transaction behavior template comprises a user time-series behavior set, and the user time-series behavior set comprises continuous behavior features based on a time series; calculating, based on a dynamic time warping algorithm, the similarity between user transaction behavior history data and each user transaction behavior template, and obtaining a user transaction behavior matching template and a user transaction state set, wherein the user transaction state set is obtained based on the user transaction behavior matching template; and identifying a user transaction behavior decision path based on a Markov decision process model using the user transaction state set, wherein the user transaction behavior decision path is determined based on user state transition probabilities, and the user state transition probabilities are calculated from the user's real-time transaction behaviors and the user transaction state set.

According to an embodiment of the disclosure, performing primary clustering on the user transaction sample data using a time-series clustering algorithm to obtain transaction pattern clustering data includes: acquiring user transaction sample data, wherein the user transaction sample data comprises raw transaction data corresponding to a plurality of user transaction behavior nodes, and the user transaction state nodes have time-series characteristics; preprocessing the raw transaction data to obtain a user continuous behavior signal, wherein the preprocessing comprises at least one of normalization, recoding, pre-weighting, and endpoint detection; reducing the dimension of the user continuous behavior signal with an autoencoder to obtain a low-dimensional user feature signal; and performing time-series clustering on the low-dimensional user feature signal to obtain the transaction pattern clustering data.
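The preprocessing step can be illustrated with a minimal Python sketch covering two of the listed operations, normalization and endpoint detection. The disclosure does not specify how either is computed, so min-max scaling, the magnitude threshold, and the function names `min_max_normalize`, `detect_endpoints`, and `preprocess` are all assumptions made for illustration; the autoencoder and clustering stages are not shown.

```python
from typing import List


def min_max_normalize(signal: List[float]) -> List[float]:
    """Scale a raw behavior signal into [0, 1] (assumed normalization scheme)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # A constant signal carries no variation; map it to zeros.
        return [0.0 for _ in signal]
    return [(x - lo) / (hi - lo) for x in signal]


def detect_endpoints(signal: List[float], threshold: float) -> List[float]:
    """Trim leading and trailing samples whose magnitude is below the threshold.

    This is an assumed, simplified form of endpoint detection: it keeps the
    span between the first and last "active" samples.
    """
    active = [i for i, x in enumerate(signal) if abs(x) >= threshold]
    if not active:
        return []
    return signal[active[0]: active[-1] + 1]


def preprocess(raw: List[float], threshold: float = 0.1) -> List[float]:
    """Normalize, then cut the signal down to its active segment."""
    return detect_endpoints(min_max_normalize(raw), threshold)


if __name__ == "__main__":
    # Hypothetical raw transaction amounts over six time steps.
    print(preprocess([0, 0, 5, 10, 5, 0]))
```

The resulting trimmed, normalized sequence would then be passed to the dimension-reduction and time-series clustering stages described above.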

According to an embodiment of the disclosure, performing secondary clustering on the transaction pattern clustering data corresponding to the i-th user transaction pattern to obtain the user transaction behavior template includes: segmenting the transaction pattern clustering data to obtain transaction state data, wherein the transaction state data is associated with a user state in a transaction pattern; clustering the transaction state data using a time-series clustering algorithm to obtain transaction state clustering data; and constructing a user transaction behavior template from the transaction state clustering data, which comprises labeling the time-series segments in the transaction state clustering data with business meanings based on user transaction behaviors, so as to form a correspondence between user transaction states and user behavior feature vectors, wherein the user transaction behavior template comprises n segments of user behavior feature vectors with time-series characteristics.

According to an embodiment of the present disclosure, calculating, based on the dynamic time warping algorithm, the similarity between the user transaction behavior history data and each user transaction behavior template and obtaining the user transaction behavior matching template includes: acquiring user transaction behavior history data, wherein the user transaction behavior history data comprises a feature time series corresponding to a single transaction behavior of a single user; calculating, using a dynamic time warping algorithm based on Manhattan distance, the shortest-distance path between the time series of the user transaction behavior history data and that of each user transaction behavior template, and thereby obtaining the similarity between the user transaction behavior history data and each user transaction behavior template; and taking the user transaction behavior template with the highest similarity as the user transaction behavior matching template.
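The template-matching step can be sketched in Python as a standard dynamic time warping recurrence with Manhattan distance as the local cost. The sketch assumes that "highest similarity" corresponds to the smallest accumulated warping distance; the disclosure does not state how distance is converted to similarity. The names `manhattan`, `dtw_distance`, and `match_template` and the toy templates are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    """Manhattan (L1) distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(s: Sequence[Vector], t: Sequence[Vector]) -> float:
    """Accumulated cost of the cheapest warping path between sequences s and t."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of s with the first j of t.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = manhattan(s[i - 1], t[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def match_template(
    history: Sequence[Vector], templates: Dict[str, Sequence[Vector]]
) -> Tuple[str, float]:
    """Return the template name with the smallest DTW distance, and that distance."""
    distances = {name: dtw_distance(history, seq) for name, seq in templates.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


if __name__ == "__main__":
    # Hypothetical one-dimensional feature series and two toy templates.
    history = [[0.0], [1.0], [1.0], [2.0]]
    templates = {"a": [[0.0], [1.0], [2.0]], "b": [[5.0], [5.0], [5.0]]}
    print(match_template(history, templates))
```

Because warping lets one template point align with several history points, the history series and template "a" match with zero cost even though their lengths differ.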

According to an embodiment of the present disclosure, obtaining the user transaction state set includes: acquiring the user transaction state set based on the user transaction pattern corresponding to the user transaction behavior matching template.

According to an embodiment of the present disclosure, identifying a user transaction behavior decision path based on a Markov decision process model using the user transaction state set includes: acquiring a user behavior set based on the user transaction behavior matching template, wherein the user behavior set comprises at least one behavior subset, and the behavior subset comprises a finite number of transaction behaviors of the user under the current user transaction behavior matching template; acquiring a user state transition probability matrix based on the user behavior set and the user transaction state set, wherein the user state transition probability matrix gives the probability that a user transitions from state a to state b after executing behavior k, state a and state b are elements of the user transaction state set, and behavior k is an element of the user behavior set; and acquiring a user initial state and user real-time transaction behaviors, and identifying a user transaction behavior decision path based on the user initial state, the user real-time transaction behaviors, and the user state transition probability matrix, wherein the user transaction behavior decision path comprises a plurality of user behaviors with time-series characteristics, and each user behavior is the behavior executed when the current state transitions to the next state with the maximum transition probability.
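One possible reading of this step is sketched below in Python. It assumes the transition probabilities are estimated as relative frequencies of observed (state a, behavior k, state b) triples, and that the path is built greedily by following the most probable transition from the current state. The frequency estimate, the cycle-avoidance rule, the `max_steps` limit, and the names `estimate_transition_probs` and `identify_decision_path` are illustrative assumptions, not details taken from the disclosure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Transition = Tuple[str, str, str]  # (state a, behavior k, state b)
ProbTable = Dict[Tuple[str, str], Dict[str, float]]


def estimate_transition_probs(observed: Iterable[Transition]) -> ProbTable:
    """Estimate P(b | a, k) as the relative frequency of observed transitions."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for a, k, b in observed:
        counts[(a, k)][b] += 1
    return {
        key: {b: c / sum(ctr.values()) for b, c in ctr.items()}
        for key, ctr in counts.items()
    }


def identify_decision_path(
    probs: ProbTable,
    initial_state: str,
    terminal_states: Set[str],
    max_steps: int = 20,
) -> List[Tuple[str, str]]:
    """Greedily follow the most probable (behavior, next state) pair at each step."""
    path: List[Tuple[str, str]] = []
    state = initial_state
    visited = {state}  # assumed rule: never revisit a state, to avoid cycles
    for _ in range(max_steps):
        if state in terminal_states:
            break
        candidates = [
            (p, k, b)
            for (a, k), nxt in probs.items()
            if a == state
            for b, p in nxt.items()
            if b not in visited
        ]
        if not candidates:
            break
        _, k, b = max(candidates)
        path.append((k, b))
        visited.add(b)
        state = b
    return path


if __name__ == "__main__":
    # Hypothetical observed transitions standing in for real-time behavior records.
    observed = [
        ("browse", "click", "interested"),
        ("browse", "click", "interested"),
        ("browse", "click", "left"),
        ("interested", "pay", "success"),
        ("interested", "pay", "success"),
        ("interested", "pay", "success"),
        ("interested", "pay", "left"),
    ]
    probs = estimate_transition_probs(observed)
    print(identify_decision_path(probs, "browse", {"success", "left"}))
```

Each element of the returned path pairs the behavior executed with the state it most probably leads to, which mirrors the "maximum transition probability" wording of the embodiment.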

A second aspect of the present disclosure provides an incentive strategy optimization method comprising: acquiring user transaction behavior decision paths of sample users; clustering the sample users based on their user transaction behavior decision paths to obtain an incentive strategy evaluation sample set, wherein the incentive strategy evaluation sample set comprises at least one incentive strategy evaluation sample subset, and the sample users in an incentive strategy evaluation sample subset share the same cluster center; evaluating the incentive strategies in an incentive strategy set based on the incentive strategy evaluation sample set, wherein the incentive strategy set comprises at least one incentive strategy, the incentive strategy comprises policy optimization parameters, the policy optimization parameters are used for adjusting the user state transition probabilities, and a mapping relation exists between each incentive strategy evaluation sample subset and an incentive strategy; and adjusting the policy optimization parameters of the incentive strategy based on the evaluation result until an optimal incentive strategy corresponding to the incentive strategy evaluation sample subset is obtained, wherein the user transaction behavior decision paths are obtained by the user transaction decision recognition method of the embodiments of the present disclosure.

According to an embodiment of the disclosure, the policy optimization parameters include an incentive occasion and an incentive resource, and evaluating the incentive strategies in the incentive strategy set based on the incentive strategy evaluation sample set includes: calculating a first success ratio, which is the proportion of users in the incentive strategy evaluation sample subset who reach the successful transaction state; for the incentive strategy corresponding to the incentive strategy evaluation sample subset, applying the incentive strategy as user state transition feedback at the nodes of the user transaction behavior decision path that satisfy the incentive occasion, and adjusting the user state transition probabilities; calculating a second success ratio, which is the proportion of users in the incentive strategy evaluation sample subset who reach the successful transaction state after the user state transition probabilities are adjusted; and calculating the ratio of the second success ratio to the first success ratio to obtain the user transaction success increment of the incentive strategy corresponding to the incentive strategy evaluation sample subset.
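The evaluation arithmetic can be sketched in Python as follows. The success increment is computed exactly as described, as the second success proportion divided by the first. How an incentive changes a transition probability is not specified in the disclosure, so the additive `boost` with proportional rescaling of the remaining probabilities in `apply_incentive` is purely an assumed model, and all function names are hypothetical.

```python
from typing import Dict, Sequence


def success_ratio(final_states: Sequence[str], success_state: str = "success") -> float:
    """Proportion of users in a sample subset whose final state is the success state."""
    if not final_states:
        raise ValueError("sample subset is empty")
    return sum(1 for s in final_states if s == success_state) / len(final_states)


def apply_incentive(row: Dict[str, float], target: str, boost: float) -> Dict[str, float]:
    """Adjust one row of transition probabilities at a node matching the incentive occasion.

    Assumed model: the probability of moving to `target` rises by `boost`
    (capped at 1), and the other probabilities shrink proportionally so the
    row still sums to 1.
    """
    new_target = min(1.0, row.get(target, 0.0) + boost)
    rest = sum(p for s, p in row.items() if s != target)
    scale = (1.0 - new_target) / rest if rest > 0 else 0.0
    adjusted = {s: p * scale for s, p in row.items() if s != target}
    adjusted[target] = new_target
    return adjusted


def success_increment(first_ratio: float, second_ratio: float) -> float:
    """User transaction success increment: second success ratio over the first."""
    if first_ratio == 0:
        raise ValueError("first success ratio is zero; the increment is undefined")
    return second_ratio / first_ratio


if __name__ == "__main__":
    before = success_ratio(["success", "left", "left", "success"])
    row = apply_incentive({"success": 0.5, "left": 0.5}, "success", 0.25)
    print(before, row, success_increment(before, row["success"]))
```

An increment above 1 indicates that, under the assumed adjustment, more users in the subset reach the successful transaction state with the incentive than without it.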

According to an embodiment of the present disclosure, adjusting the policy optimization parameters of the incentive strategy based on the evaluation result until the optimal incentive strategy corresponding to the incentive strategy evaluation sample subset is obtained includes: iteratively optimizing the incentive strategy until the incentive strategy at which the user transaction success increment reaches its maximum is obtained; and taking the preferred incentive strategy corresponding to the highest user transaction success increment as the optimal incentive strategy, wherein the p-th iterative optimization comprises: adjusting the incentive resource and/or the incentive occasion in the incentive strategy to obtain the p-th optimized incentive strategy; and evaluating the p-th optimized incentive strategy and calculating the user transaction success increment corresponding to the p-th optimized incentive strategy.
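The iterative optimization can be sketched in Python as a search over candidate (incentive occasion, incentive resource) pairs that keeps the pair with the highest user transaction success increment. The disclosure does not state how candidates are generated or when iteration stops; the exhaustive grid search below is a stand-in chosen for simplicity, and the `evaluate` callback, the candidate values, and the lookup table of increments are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def optimize_incentive(
    occasions: Iterable[str],
    resources: Iterable[float],
    evaluate: Callable[[str, float], float],
) -> Tuple[Tuple[str, float], float]:
    """Try every (occasion, resource) pair and keep the highest success increment.

    Each loop pass plays the role of one iteration p: adjust the parameters,
    evaluate the adjusted strategy, and record its success increment.
    """
    best_params = None
    best_increment = float("-inf")
    for occasion, resource in product(list(occasions), list(resources)):
        increment = evaluate(occasion, resource)
        if increment > best_increment:
            best_params, best_increment = (occasion, resource), increment
    if best_params is None:
        raise ValueError("no candidate incentive strategies were supplied")
    return best_params, best_increment


if __name__ == "__main__":
    # Hypothetical increments, as an evaluation step might have produced them.
    table = {
        ("browse", 1.0): 1.0,
        ("browse", 2.0): 1.1,
        ("interested", 1.0): 1.4,
        ("interested", 2.0): 1.3,
    }
    best = optimize_incentive(
        ["browse", "interested"], [1.0, 2.0], lambda o, r: table[(o, r)]
    )
    print(best)
```

For large parameter spaces, the grid could be replaced by any other search procedure without changing the interface, since only the evaluation callback touches the sample subset.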

A third aspect of the present disclosure provides a user transaction decision recognition device, comprising: a primary clustering module configured to perform primary clustering on user transaction sample data using a time-series clustering algorithm to obtain transaction pattern clustering data, wherein the primary clustering comprises forming a mapping relation between the user transaction sample data and user transaction patterns, and there are at least two user transaction patterns; a secondary clustering module configured to perform secondary clustering on the transaction pattern clustering data corresponding to the i-th user transaction pattern to obtain user transaction behavior templates, wherein there are at least two user transaction behavior templates, each user transaction behavior template comprises a user time-series behavior set, and the user time-series behavior set comprises continuous behavior features based on a time series; a matching module configured to calculate, based on a dynamic time warping algorithm, the similarity between user transaction behavior history data and each user transaction behavior template, and to obtain a user transaction behavior matching template and a user transaction state set, wherein the user transaction behavior matching template is the user transaction behavior template with the highest similarity to the user transaction behavior history data, and the user transaction state set is obtained based on the user transaction behavior matching template; and a path identification module configured to identify a user transaction behavior decision path based on a Markov decision process model using the user transaction state set, wherein the user transaction behavior decision path is determined based on user state transition probabilities, and the user state transition probabilities are calculated from the user's real-time transaction behaviors and the user transaction state set.

According to an embodiment of the present disclosure, the primary clustering module includes a sample acquisition sub-module, a preprocessing sub-module, a dimension reduction sub-module, and a pattern clustering sub-module. The sample acquisition sub-module is configured to acquire user transaction sample data including raw transaction data corresponding to a plurality of user transaction behavior nodes, the user transaction state nodes having time-series characteristics. The preprocessing sub-module is configured to preprocess the raw transaction data to obtain a user continuous behavior signal, wherein the preprocessing comprises at least one of normalization, recoding, pre-weighting, and endpoint detection. The dimension reduction sub-module is configured to reduce the dimension of the user continuous behavior signal with an autoencoder to obtain a low-dimensional user feature signal. The pattern clustering sub-module is configured to perform time-series clustering on the low-dimensional user feature signal to obtain the transaction pattern clustering data.

According to an embodiment of the present disclosure, the secondary clustering module includes a segmentation sub-module, a state clustering sub-module, and a template construction sub-module. The segmentation sub-module is configured to segment the transaction pattern clustering data to obtain transaction state data, wherein the transaction state data is associated with a user state in a transaction pattern. The state clustering sub-module is configured to cluster the transaction state data using a time-series clustering algorithm to obtain transaction state clustering data. The template construction sub-module is configured to construct a user transaction behavior template from the transaction state clustering data, which comprises labeling the time-series segments in the transaction state clustering data with business meanings based on user transaction behaviors, so as to form a correspondence between user transaction states and user behavior feature vectors, wherein the user transaction behavior template comprises n segments of user behavior feature vectors with time-series characteristics.

According to an embodiment of the present disclosure, the matching module includes a history data acquisition sub-module, a path computation sub-module, a template matching sub-module, and a state set acquisition sub-module. The historical data acquisition sub-module is configured to acquire user transaction behavior historical data including a characteristic time series corresponding to a single user's single transaction behavior. The path calculation sub-module is configured to calculate the shortest distance path between the user transaction behavior historical data and the time sequence of each user transaction behavior template by using a dynamic time warping algorithm based on Manhattan distance, and obtain the similarity between the user transaction behavior historical data and each user transaction behavior template. The template matching sub-module is configured to take the user transaction behavior template with the highest similarity as the user transaction behavior matching template. The state set acquisition sub-module is configured to acquire the user transaction state set based on a user transaction pattern corresponding to the user transaction behavior matching template.

According to an embodiment of the present disclosure, the path recognition module includes a behavior set acquisition sub-module, a probability matrix calculation sub-module, and a decision path recognition sub-module. The behavior set acquisition sub-module is configured to acquire a user behavior set based on the user transaction behavior matching template, the user behavior set including at least one behavior subset including a limited number of transaction behaviors of the user in the current user transaction behavior matching template. The probability matrix calculation sub-module is configured to obtain a user state transition probability matrix based on the user behavior set and the user transaction state set, wherein the user state transition probability matrix is used for identifying the probability of a user transitioning from an a state to a b state after performing a behavior k, the a state and the b state are elements in the user transaction state set, and the behavior k is an element in the user behavior set. The decision path identification sub-module is configured to acquire a user initial state and a user real-time transaction behavior, and identify a user transaction behavior decision path based on the user initial state, the user real-time transaction behavior and the user state transition probability matrix, wherein the user transaction behavior decision path comprises a plurality of user behaviors with time sequence characteristics, and each user behavior is a user behavior executed when the current state transitions to the next maximum transition probability state.

A fourth aspect of the present disclosure provides an incentive strategy optimization device, comprising an acquisition module, a user clustering module, an evaluation module, and an optimization module. The acquisition module is configured to acquire the user transaction behavior decision paths of sample users; the acquisition module may have the same functions as the user transaction decision recognition device of the embodiments of the present disclosure, which are not repeated here. The user clustering module is configured to cluster the sample users based on their user transaction behavior decision paths to obtain an incentive strategy evaluation sample set comprising at least one incentive strategy evaluation sample subset, the sample users in an incentive strategy evaluation sample subset sharing the same cluster center. The evaluation module is configured to evaluate the incentive strategies in an incentive strategy set based on the incentive strategy evaluation sample set, wherein the incentive strategy set comprises at least one incentive strategy, the incentive strategy comprises policy optimization parameters for adjusting the user state transition probabilities, and a mapping relation exists between each incentive strategy evaluation sample subset and an incentive strategy. The optimization module is configured to adjust the policy optimization parameters of the incentive strategy based on the evaluation result until an optimal incentive strategy corresponding to the incentive strategy evaluation sample subset is obtained.

According to an embodiment of the present disclosure, the evaluation module includes a first calculation sub-module, an incentive sub-module, a second calculation sub-module, and a third calculation sub-module. The first calculation sub-module is configured to calculate a first success ratio, which is the proportion of users in the incentive strategy evaluation sample subset who reach the successful transaction state. The incentive sub-module is configured to, for the incentive strategy corresponding to the incentive strategy evaluation sample subset, apply the incentive strategy as user state transition feedback at the nodes of the user transaction behavior decision path that satisfy the incentive occasion, and adjust the user state transition probabilities. The second calculation sub-module is configured to calculate a second success ratio, which is the proportion of users in the incentive strategy evaluation sample subset who reach the successful transaction state after the user state transition probabilities are adjusted. The third calculation sub-module is configured to calculate the ratio of the second success ratio to the first success ratio to obtain the user transaction success increment of the incentive strategy corresponding to the incentive strategy evaluation sample subset.

According to an embodiment of the disclosure, the optimization module includes an iteration sub-module and a screening sub-module. The iteration sub-module is configured to iteratively optimize the incentive strategy until the incentive strategy at which the user transaction success increment reaches its maximum is obtained, wherein the p-th iterative optimization comprises: adjusting the incentive resource and/or the incentive occasion in the incentive strategy to obtain the p-th optimized incentive strategy; and evaluating the p-th optimized incentive strategy and calculating the user transaction success increment corresponding to the p-th optimized incentive strategy. The screening sub-module is configured to take the preferred incentive strategy corresponding to the highest user transaction success increment as the optimal incentive strategy.

A fifth aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the user transaction decision recognition method or incentive policy optimization method described above.

A sixth aspect of the present disclosure also provides a computer readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the user transaction decision recognition method or incentive policy optimization method described above.

A seventh aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described user transaction decision recognition method or incentive policy optimization method.

The method provided by the embodiments of the present disclosure at least partially overcomes the unreasonable utilization of user transaction behavior data and the unreasonable resource allocation found in the prior art. It combines a time-series clustering algorithm, a dynamic time warping algorithm, and a Markov decision process, and uses user transaction data to explore in depth the recognition of user transaction behaviors and decision paths, thereby enabling dynamic analysis of user behavior. User transaction behavior patterns and decision paths are identified quantitatively, the utilization rate of user transaction data is improved, computing and incentive resources are fully allocated, and fine-grained, personalized marketing decisions are facilitated.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:

Fig. 1 schematically illustrates an application scenario diagram of a user transaction decision recognition method and an incentive strategy optimization method according to an embodiment of the present disclosure.

Fig. 2 schematically illustrates a flow chart of a user transaction decision recognition method according to an embodiment of the disclosure.

Fig. 3 schematically illustrates a flowchart of a method for clustering user transaction sample data once using a time series clustering algorithm to obtain transaction pattern cluster data, according to an embodiment of the present disclosure.

Fig. 4 schematically illustrates a flowchart of a method of secondarily clustering transaction pattern cluster data corresponding to an ith user transaction pattern to obtain a user transaction behavioral template according to an embodiment of the disclosure.

Fig. 5 schematically illustrates a flowchart of a method of constructing a user transaction behavioral template using the transaction state cluster data, according to an embodiment of the disclosure.

Fig. 6 schematically illustrates a schematic diagram of a method of data preprocessing and feature extraction according to an embodiment of the present disclosure.

Fig. 7 schematically illustrates a flowchart of a method for computing similarity of user transaction behavior history data and each user transaction behavior template based on a dynamic time warping algorithm to obtain a user transaction behavior matching template according to an embodiment of the disclosure.

Fig. 8A schematically illustrates a schematic diagram of a method for user behavior feature recognition based on a dynamic time warping algorithm according to an embodiment of the disclosure.

Fig. 8B schematically illustrates a schematic diagram of a method for user behavior feature recognition based on a dynamic time warping algorithm of manhattan distance according to an embodiment of the present disclosure.

Fig. 9 schematically illustrates a flow chart of a method of obtaining the set of user transaction states, according to an embodiment of the disclosure.

Fig. 10 schematically illustrates a flow chart of a method for identifying user transaction behavior decision paths based on a Markov decision process model utilizing the set of user transaction states, according to an embodiment of the disclosure.

Fig. 11 schematically illustrates a flow chart of an incentive strategy optimization method in accordance with an embodiment of the present disclosure.

Fig. 12 schematically illustrates a flowchart of a method of evaluating a set of incentive policies based on a set of incentive policy evaluation samples, in accordance with an embodiment of the disclosure.

Fig. 13 schematically illustrates a flowchart of a method of adjusting policy optimization parameters of an incentive policy based on evaluation results until an optimal incentive policy corresponding to a subset of incentive policy evaluation samples is obtained, in accordance with an embodiment of the present disclosure.

Fig. 14 schematically illustrates a flowchart of a method of optimization of the p-th iteration according to an embodiment of the present disclosure.

Fig. 15 schematically illustrates a block diagram of a user transaction decision recognition device according to an embodiment of the disclosure.

Fig. 16 schematically illustrates a block diagram of the primary clustering module according to an embodiment of the present disclosure.

Fig. 17 schematically illustrates a block diagram of a secondary clustering module according to an embodiment of the disclosure.

Fig. 18 schematically shows a block diagram of a matching module according to an embodiment of the present disclosure.

Fig. 19 schematically illustrates a block diagram of a path recognition module according to an embodiment of the present disclosure.

Fig. 20 schematically illustrates a block diagram of an incentive strategy optimization device according to an embodiment of the present disclosure.

Fig. 21 schematically illustrates a block diagram of an evaluation module according to an embodiment of the disclosure.

Fig. 22 schematically illustrates a block diagram of the optimization module according to an embodiment of the disclosure.

Fig. 23 schematically illustrates a block diagram of an electronic device adapted to implement a user transaction decision recognition method according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Where expressions like "at least one of A, B and C, etc." are used, the expressions should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).

In modern business marketing, making full use of existing data resources to formulate fine-grained, personalized marketing strategies for users is significant for data utilization, for the allocation of computing and marketing resources, and for improving marketing management efficiency. Traditional marketing approaches that rely on expert experience or on copying existing practices depend on subjective rules of thumb, which makes it difficult to formulate refined marketing strategies scientifically and objectively and leads to substantial waste of existing user data resources and to unreasonable or insufficient allocation of service resources. The present disclosure finds that identifying the user transaction decision path helps to solve these pain points. However, because a user's behavior pattern and current behavior state are difficult to grasp, customer group lists are typically defined by screening cross-sectional user data with fixed indicators. This makes it difficult to capture the customer characteristics that are reflected in transaction behavior patterns and driven by unobservable variables such as attitudes, values, lifestyle, or personality. As a result, user transaction data cannot be fully mined, the data utilization rate is low, data and business resources cannot be configured flexibly, and user transaction patterns and decision paths cannot be identified accurately and objectively.

Further, after identifying the decision paths of the user transaction, how to finely group the users by using the identified decision paths and formulate an optimized marketing incentive strategy is also a problem to be solved.

Based on the above-mentioned problems existing in the prior art, an embodiment of the present disclosure provides a user transaction decision recognition method, including clustering user transaction sample data once by using a time sequence clustering algorithm to obtain transaction pattern clustering data, where the clustering once includes forming a mapping relationship between the user transaction sample data and a user transaction pattern, and the user transaction pattern includes at least two types; carrying out secondary clustering on transaction mode clustering data corresponding to an ith user transaction mode to obtain user transaction behavior templates, wherein the user transaction behavior templates at least comprise two types, the user transaction behavior templates comprise user time sequence behavior sets, and the user time sequence behavior sets comprise continuous behavior features based on time sequences; calculating the similarity between the historical data of the user transaction behavior and each user transaction behavior template based on a dynamic time warping algorithm, and acquiring a user transaction behavior matching template and a user transaction state set, wherein the user transaction behavior matching template is the user transaction behavior template with the highest similarity with the historical data of the user transaction behavior, and the user transaction state set is acquired based on the user transaction behavior matching template; and identifying a user transaction behavior decision path based on a Markov decision process model by using the user transaction state set, wherein the user transaction behavior decision path is determined based on user state transition probability, and the user state transition probability is obtained through calculation of user real-time transaction behaviors and the user transaction state set.

Further, an embodiment of the present disclosure also provides an incentive strategy optimization method, including: acquiring a user transaction behavior decision path of a sample user; clustering the sample users based on the user transaction behavior decision paths of the sample users to obtain an incentive strategy evaluation sample set, wherein the incentive strategy evaluation sample set comprises at least one incentive strategy evaluation sample subset, and the sample users in an incentive strategy evaluation sample subset share the same class of cluster center; evaluating an incentive strategy in an incentive strategy set based on the incentive strategy evaluation sample set, wherein the incentive strategy set comprises at least one incentive strategy, the incentive strategy comprises strategy optimization parameters, the strategy optimization parameters are used for adjusting the user state transition probability, and a mapping relationship exists between an incentive strategy evaluation sample subset and the incentive strategy; and adjusting the strategy optimization parameters of the incentive strategy based on the evaluation result until an optimal incentive strategy corresponding to the incentive strategy evaluation sample subset is obtained, wherein the user transaction behavior decision path is obtained based on the user transaction decision recognition method of the embodiments of the present disclosure.

The method provided by the embodiment of the disclosure at least partially overcomes the defect of utilizing the user transaction behavior data in the prior art, comprehensively utilizes a time sequence clustering algorithm, a dynamic time warping algorithm and a Markov decision process technology, and combines the user transaction data to deeply explore the user transaction behavior and decision path recognition so as to realize dynamic analysis of the user behavior. The quantitative identification of the user transaction behavior mode and the decision path is realized, and the utilization rate of the user transaction data and the effectiveness of resource allocation are improved. Further, the embodiment of the disclosure also provides the establishment of the optimized marketing incentive decision by utilizing the user transaction decision identification method, so that the scientificity, the refinement and the individuation of the marketing decision are improved.

In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of users' personal information all comply with the relevant laws and regulations, necessary security measures are adopted, and public order and good customs are not violated.

In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.

It should be noted that, the method, the device, the equipment, the medium and the program product for identifying user transaction decisions provided by the embodiments of the present disclosure may be used in aspects related to dynamic behavior of a user and user transaction decision identification by an artificial intelligence technology, and may also be used in various fields other than the artificial intelligence technology, such as financial fields. The application fields of the user transaction decision recognition method, the incentive strategy optimization method, the device, the equipment, the medium and the program product provided by the embodiment of the disclosure are not limited.

The above-described operations for accomplishing at least one object of the present disclosure will be described below in conjunction with the accompanying drawings and their description.

As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages and the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only), may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (by way of example only) that supports websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process received data such as user requests, and feed back the processing result (e.g., a web page, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that the user transaction decision recognition method or incentive strategy optimization method provided by the embodiments of the present disclosure may generally be performed by the server 105. Accordingly, the user transaction decision recognition device or incentive strategy optimization device provided by embodiments of the present disclosure may generally be disposed in the server 105. The user transaction decision recognition method or incentive strategy optimization method provided by the embodiments of the present disclosure may also be performed by a server or server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the user transaction decision recognition device or incentive strategy optimization device provided by the embodiments of the present disclosure may also be disposed in a server or server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The user transaction decision recognition method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 14 based on the scenario described in fig. 1.

As shown in fig. 2, the user transaction decision recognition method of this embodiment includes operations S210 to S240, and the user transaction decision recognition method may be executed by a processor or may be executed by any electronic device including a processor.

In operation S210, the user transaction sample data is clustered once by using a time sequence clustering algorithm, and transaction pattern clustering data is obtained.

According to embodiments of the present disclosure, the user transaction patterns include at least two types. The primary clustering comprises the step of forming a mapping relation between the user transaction sample data and a user transaction mode. Wherein, the user transaction mode can be preset and determined based on expert experience. For example, in a network transaction, a plurality of states coexist in a user transaction behavior mode, and a typical behavior path mode comprises the following three types: attention-interest-desire-memory-action (transaction), attention-interest-search-action-sharing, resonance-confirmation-participation-sharing-diffusion. For these different transaction behaviors and modes, corresponding data needs to be collected at different stages thereof to fully explore the characteristics thereof and to formulate marketing strategies in a targeted manner. The user transaction sample data may be from a plurality of sample users, and by clustering the user transaction sample data, cluster data corresponding to the user transaction pattern may be obtained for further processing.

Taking financial product purchase as an example, since a user's transaction behavior pattern tends to be stable over a given period, the user's activity over time can be reduced to a minimum (purchase) behavior unit; that is, the user transaction sample data includes history data of the minimum (purchase) behavior unit corresponding to the user. A minimum (purchase) behavior unit may be divided into different transaction nodes according to the business architecture flow. From the perspective of user experience, user feature indicators can be built on a plurality of time-ordered transaction nodes. For links such as attention, interest, and resonance, click data from the advertisement or promotion entry points on each path is mainly collected, in order to learn the sources of attention, interest, and resonance. For links such as desire and memory, buried point (event-tracking) data such as favoriting behavior, shopping cart behavior, and login-and-view behavior is collected. For links such as search, confirmation, and participation, information acquisition and interaction features such as information browsing, search behavior, comparison of competing products, and customer service communication need to be collected. For the action (transaction) link, attributes such as product information and value are mainly collected. For the sharing and diffusion links, data such as reviews, sharing routes, and sharing conversion rates are collected.

In operation S220, secondary clustering is performed on the transaction pattern clustering data corresponding to the ith user transaction pattern to obtain user transaction behavior templates. It should be appreciated that, based on the big data of user transactions, the user transaction behavior templates in embodiments of the present disclosure include at least two types. That is, within the same user transaction pattern, users can exhibit at least two different transaction behaviors, so that the user group can be finely divided. Each user transaction behavior template comprises a user time-series behavior set, which includes continuous behavior features based on a time series. For example, one typical user transaction behavior template may include clicking on an advertisement, browsing a product, comparing competing products, adding to the shopping cart, paying for the product, and sharing the product, while another typical user transaction behavior template may be clicking on an advertisement, browsing a product, leaving the browsing interface, clicking on another advertisement, adding to the shopping cart, paying for the product, and sharing the product. The two templates include different user time-series behaviors, but both may correspond to the attention-interest-search-action-sharing user transaction pattern.

In operation S230, the similarity between the user transaction behavior history data and each user transaction behavior template is calculated based on the dynamic time warping algorithm, and the user transaction behavior matching template and the user transaction state set are obtained. The user transaction state set is obtained based on a user transaction behavior matching template.

After the user transaction behavior template is constructed, the most similar user transaction template, namely the user transaction behavior matching template, can be obtained based on the user transaction history behavior matching. Specifically, a user transaction behavior template with the highest similarity with the user transaction behavior history data can be obtained as a user transaction behavior matching template based on similarity calculation. The transaction behavior characteristics of the user can be defined by identifying the user transaction behavior matching template so as to facilitate subsequent analysis. And because the mapping relation exists between the user transaction behavior matching template and the user transaction mode, after the user transaction behavior matching template is obtained, the user transaction state set can be further obtained corresponding to the user transaction mode to which the user transaction behavior matching template belongs.

In operation S240, a user transaction behavior decision path is identified based on a Markov decision process model using the user transaction state set. The user transaction behavior decision path is determined based on the user state transition probability, and the user state transition probability is calculated from the user's real-time transaction behavior and the user transaction state set. A Markov decision process is a mathematical model of sequential decision-making that simulates the stochastic policies and returns achievable by an agent in an environment whose system states have the Markov property. At its core, actions trigger state transitions so as to maximize feedback. In embodiments of the present disclosure, the concept of a Markov decision process may be used to explore the user transaction decision path. For example, in the financial product transaction example described above, the maximum feedback is completing the transaction. The user obtains the maximum feedback by performing actions that move to the next state of the user transaction pattern to which the user belongs, eventually reaching the action state and the states after it. In this process, the user state transition probability can be calculated from the user's real-time transaction behavior and the user transaction state set, so that the probability of a state transition at each transaction behavior node and the probability of finally completing the transaction can be estimated.
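As an illustration of how a user state transition probability and a transaction completion probability might be computed, the following is a minimal sketch only. It assumes that transition probabilities are estimated by counting consecutive state pairs in observed state sequences, which the disclosure does not specify; the function names `estimate_transitions` and `completion_probability` and the state names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Transitions = Dict[str, Dict[str, float]]


def estimate_transitions(sequences: List[List[str]], states: List[str]) -> Transitions:
    """Estimate P(next state | current state) by counting consecutive pairs.

    Assumption: a simple frequency estimate; the source does not state how the
    user state transition probability is actually calculated.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs: Transitions = {}
    for s in states:
        total = sum(counts[s].values())
        # States with no observed outgoing transition get an empty row.
        probs[s] = {t: counts[s][t] / total for t in states} if total else {}
    return probs


def completion_probability(probs: Transitions, start: str, goal: str, horizon: int) -> float:
    """Probability of reaching `goal` from `start` within `horizon` steps.

    The goal state is treated as absorbing: probability mass that reaches it
    is accumulated and no longer propagated.
    """
    if start == goal:
        return 1.0
    dist = {start: 1.0}
    reached = 0.0
    for _ in range(horizon):
        nxt = defaultdict(float)
        for s, mass in dist.items():
            for t, p in probs.get(s, {}).items():
                if t == goal:
                    reached += mass * p
                else:
                    nxt[t] += mass * p
        dist = nxt
    return reached


# Hypothetical state set and observed state sequences for two transactions.
states = ["attention", "interest", "action"]
history = [
    ["attention", "interest", "action"],
    ["attention", "interest", "interest"],
]
P = estimate_transitions(history, states)
print(completion_probability(P, "attention", "action", horizon=3))
```

In this sketch the "action" state stands in for a completed transaction, so the returned value is the estimated chance that a user currently in the "attention" state completes the transaction within the given number of steps.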

It should be noted that, in the embodiments of the present disclosure, the consent or authorization of the user may be obtained before the information of the user is obtained. For example, before operation S210, a request to acquire user transaction sample data may be issued to the user. In case the user agrees or authorizes that the user transaction sample data can be acquired, the operation S210 is performed.

The method provided by the embodiment of the disclosure comprehensively utilizes a time sequence clustering algorithm, a dynamic time warping algorithm and a Markov decision process technology, combines user transaction data to deeply explore user transaction behaviors, realizes user transaction decision path identification, and realizes dynamic analysis of the user behaviors. The quantitative identification of the user transaction behavior mode and the decision path is realized. The method comprises the steps of utilizing a time sequence clustering algorithm to fully mine the mapping relation between user transaction sample data and user transaction modes, and further utilizing a template constructed by a dynamic time warping algorithm to intuitively reflect the continuous behavior characteristics and the transaction states of a user. Further, the decision path of the user transaction behavior is identified by using the Markov decision process model, the characteristic of periodic continuity of the user transaction behavior is fully considered, the decision behavior in the preset time range is predicted by the change of the user state transition probability, the utilization rate of the user transaction data and the scientificity and accuracy of the decision path identification and behavior prediction are improved, and the efficient configuration of subsequent service resources, the marketing management efficiency and the user satisfaction are improved.

As shown in fig. 3, the method for obtaining transaction pattern clustering data by clustering the user transaction sample data once using the time series clustering algorithm in this embodiment includes operations S310 to S340.

In operation S310, user transaction sample data including transaction raw data corresponding to a plurality of user transaction behavior nodes having a timing characteristic is acquired. The transaction original data can be acquired by using a buried point method. For example, in the transaction example of the financial product, the buried point data required to be collected for each action link may be measured from 6 dimensions: a user ID; a time stamp; page title; browsing duration; geographic coordinates; behavior attributes, etc. According to the minimum (purchase) behavior unit, the user behavior can be divided into a plurality of transaction nodes such as first browsing, jumping-out, nth browsing access page number, searching behavior, similar recommended product browsing, customer service consultation, payment purchase and the like by taking first browsing as a starting point and final clicking purchase as an ending point. Each node needs to collect corresponding time stamp data to form a continuous data set of buried data.

The data of each behavior node in the collected raw buried point data is taken as a sub-data set, and the sub-data sets are arranged according to the time sequence of the user transaction nodes to form a user full-life-cycle data set for dynamic analysis and prediction. The collected data sets are stored as logs in an unstructured database, forming a raw data pool. Specifically, the data set of a minimum (purchase) behavior unit can include the buried point data collected during the whole process of advertisement viewing, financial product introduction searching, competing product browsing, adding to the shopping cart, customer service consultation, exiting, searching again, competing product browsing, adding to the shopping cart, shopping cart browsing, product purchasing, and the like.

In operation S320, the user transaction raw data is preprocessed to obtain a user continuous behavior signal, where the preprocessing includes at least one of normalization, recoding, pre-weighting, and endpoint detection. Because the collected buried point data suffers from inconsistent formats and differing scales and standards, which degrades modeling and analysis, the raw data can be cleaned and integrated by applying preprocessing methods such as feature normalization, recoding, pre-weighting, and endpoint detection according to the scenario and dimension. Feature normalization mainly eliminates differences in variable influence caused by inconsistent feature scales, so that they do not distort the judgment of which links are important in the behavior path analysis. Categorical variables are recoded: using an encoding such as One-Hot or Embedding, n categories can be converted into dummy variables of n-1 or n-m dimensions, which unifies the signal input format. Meanwhile, to prevent key information from being drowned out by dimension explosion, key behavior information such as advertisement clicks, searches, and shopping cart browsing is given weighted compensation during preprocessing so that key signals stand out. Finally, to filter out the large amount of information unrelated to users' daily transactions, a threshold-based endpoint detection model identifies user behavior in transaction-related scenarios; a threshold range whitelist and blacklist are set, and information unrelated to the user's purchasing behavior, such as transfers, inquiries, and repayments, is eliminated.
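The preprocessing steps above can be sketched roughly as follows. This is an illustrative sketch under several assumptions that are not in the disclosure: events are dictionaries with hypothetical `behavior` and `duration` fields, the key-behavior weights are arbitrary, a full n-dimensional one-hot code is used instead of n-1 dummies for simplicity, and the threshold-based endpoint detection model is reduced to a plain blacklist filter.

```python
from typing import Dict, List

# Assumed weights for key behaviors (weighted compensation); values are illustrative.
KEY_BEHAVIOR_WEIGHTS = {"ad_click": 2.0, "search": 2.0, "cart_view": 2.0}
# Assumed blacklist of behaviors unrelated to purchasing.
BLACKLIST = {"transfer", "inquiry", "repayment"}


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value: str, categories: List[str]) -> List[float]:
    """Encode a categorical value as an n-dimensional indicator vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def preprocess(events: List[Dict], categories: List[str]) -> List[List[float]]:
    """Turn time-ordered raw events into a continuous behavior signal.

    Each output row is [normalized duration, weighted one-hot behavior...].
    """
    # Simplified endpoint detection: drop behaviors on the blacklist.
    kept = [e for e in events if e["behavior"] not in BLACKLIST]
    if not kept:
        return []
    durations = min_max([float(e["duration"]) for e in kept])
    signal = []
    for event, duration in zip(kept, durations):
        weight = KEY_BEHAVIOR_WEIGHTS.get(event["behavior"], 1.0)
        encoded = [weight * x for x in one_hot(event["behavior"], categories)]
        signal.append([duration] + encoded)
    return signal


raw_events = [
    {"behavior": "ad_click", "duration": 10},
    {"behavior": "transfer", "duration": 99},
    {"behavior": "browse", "duration": 30},
]
signal = preprocess(raw_events, categories=["ad_click", "browse"])
print(signal)
```

The blacklisted transfer event is removed before normalization, so it does not distort the duration scale of the remaining transaction-related events.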

In operation S330, the user continuous behavior signal is reduced in dimension based on the automatic encoder, and a low-dimensional user characteristic signal is obtained. Dynamic change of user transaction behavior is a complex and continuous process, and useful behavior signals are extracted from the dynamic change of user transaction behavior and are the basis for dynamic identification of user behavior characteristics and paths.

In operation S340, the low-dimensional user characteristic signals are clustered in time series, and the transaction pattern clustered data is obtained.

Embodiments of the present disclosure utilize a neural network automatic encoder to extract behavior feature states of a user based on a continuous behavior signal of the user with timing features. The use of an unsupervised auto encoder can reduce the input data dimension and take the reconstructed signal as output, resulting in a low-dimensional state feature code. And then, carrying out time sequence clustering on the continuous signals based on the behavior characteristic state, wherein a clustering result corresponds to the transaction mode.

As shown in fig. 4, the method of this embodiment for performing secondary clustering on the transaction pattern clustering data corresponding to the ith user transaction pattern to obtain the user transaction behavior template includes operations S410 to S430.

In operation S410, the transaction pattern cluster data is segmented to obtain transaction state data, the transaction state data being associated with a user state in a transaction pattern.

In operation S420, the transaction state data is clustered by using a time-series clustering algorithm, so as to obtain transaction state clustered data.

In operation S430, a user transaction behavioral template is constructed using the transaction state cluster data.

As shown in fig. 5, the method of constructing a user transaction behavioral template using the transaction state cluster data of this embodiment includes operation S510.

In operation S510, business meanings are labeled on the time-series segments in the transaction state cluster data based on the user transaction behaviors, forming a correspondence between user transaction states and user behavior feature vectors; the user transaction behavior template includes n segments of user behavior feature vectors with time-series features. According to the embodiment of the disclosure, the continuous signals with similar features obtained after clustering can be cut into small segments, which segments the continuous behavior signal and yields clusters of behavior data for the different states within a pattern. Based on the segmented behavior segment sequences, the basic behavior segments can be classified and identified through time-series clustering combined with labeling based on preset business rules. Specifically, the center vectors of the resulting clusters are used to identify the different behavior segment classes, and the clusters are annotated based on expert experience and given business meaning, thereby constructing a behavior feature reference template model. The constructed model takes the form of n representative feature vectors P.
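The segmentation, clustering, and annotation described above might look like the following sketch. The disclosure does not name a specific clustering algorithm, so plain k-means with squared Euclidean distance on fixed-width windows is used here as a stand-in; all function names, the window width, and the annotation labels are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def segment(signal: List[float], width: int) -> List[Vector]:
    """Cut a 1-D behavior signal into consecutive, non-overlapping windows."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, width)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centers: List[Vector]) -> int:
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centers) for p in points]
    return centers, labels


def build_template(centers: List[Vector], labels: List[int],
                   annotations: Dict[int, str]) -> List[Tuple[str, Vector]]:
    """Return time-ordered (state name, representative vector) pairs.

    `annotations` plays the role of expert labeling of each cluster.
    """
    return [(annotations[label], centers[label]) for label in labels]


behavior_signal = [0, 0, 0, 0, 5, 5, 5, 5]
segments = segment(behavior_signal, width=2)
centers, labels = kmeans(segments, k=2)
template = build_template(centers, labels, annotations={0: "browse", 1: "pay"})
print(template)
```

Each cluster center serves as a representative feature vector, and the expert-supplied annotation ties it to a user transaction state, which mirrors the state-to-vector correspondence described in operation S510.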

After the raw data is cleaned, a user continuous behavior signal for input to the model may be constructed through data preprocessing, as shown in fig. 6. Further, low-dimensional state features can be encoded by processing the user continuous behavior signal with an automatic encoder. Segmented behavior features can then be obtained through time-series clustering, and after these features are labeled with business meanings based on expert experience, a user transaction behavior template can be constructed. The constructed user transaction behavior template can be stored as a binary sequence, which reduces the size of the template, improves calculation efficiency, and prevents tampering.

As shown in fig. 7, the method of this embodiment for calculating, based on the dynamic time warping algorithm, the similarity between the user transaction behavior history data and each user transaction behavior template and obtaining the user transaction behavior matching template includes operations S710 to S730.

In operation S710, user transaction behavior history data including a characteristic time series corresponding to a single transaction behavior of a single user is acquired.

In operation S720, a shortest distance path between the user transaction behavior history data and the time series of each user transaction behavior template is calculated by using a dynamic time warping algorithm based on manhattan distance, so as to obtain the similarity between the user transaction behavior history data and each user transaction behavior template.

In operation S730, the user transaction behavior template with the highest similarity is used as the user transaction behavior matching template.

According to the embodiment of the disclosure, because users' personal habits differ, the feature time series of users in the same transaction pattern span time periods of unequal length, so that transaction behaviors with similar patterns may span different durations. To overcome this problem and recognize and match customer behavior patterns, embodiments of the present disclosure apply a DTW (Dynamic Time Warping) algorithm to the continuous features of the segmented basic behavior segments. The algorithm searches step by step for the shortest distance path in the matrix formed by the user transaction behavior feature time series and each template time series, obtains the best matching points between them, and thereby measures the similarity between the user behavior sequence and each template time series.

In the embodiment of the present disclosure, a dynamic time warping algorithm based on Manhattan distance is preferably adopted to calculate the shortest distance path between the user transaction behavior history data and the time series of each user transaction behavior template, so as to find the optimal matching points and obtain the similarity between the user transaction behavior history data and each user transaction behavior template.

As shown in fig. 8A, Q represents a user feature vector, with time on the abscissa and behavior on the ordinate. P represents a given user transaction behavior template, likewise with time on the abscissa and behavior on the ordinate. When the user feature sequence and the template sequence are of unequal length, the time axis can be adjusted by the dynamic time warping algorithm so that corresponding waveform points are matched one-to-many rather than strictly one-to-one in time order. The overall matching cost is denoted C(Q, Pi); after the cost has been computed for all templates, the template with the minimum matching cost represents the current user behavior feature class.

As shown in fig. 8B, when user behavior features are recognized with the dynamic time warping algorithm based on Manhattan distance, the shortest distance path between the user transaction behavior history data and the time series of each user transaction behavior template is calculated using a grid matrix formed by P and Q. This approach has high calculation efficiency and small calculation error.
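The template matching of operations S710 to S730 can be illustrated with a minimal dynamic time warping sketch that uses Manhattan distance as the local cost. The function names `dtw_cost` and `match_template`, the template names, and the toy sequences are hypothetical, and the classic unconstrained DTW recurrence is assumed because the disclosure does not give the exact recurrence or any windowing constraint.

```python
from typing import Dict, List, Sequence, Tuple


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_cost(q: List[Sequence[float]], p: List[Sequence[float]]) -> float:
    """Minimum accumulated cost of aligning sequence q with template p."""
    n, m = len(q), len(p)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning q[:i] with p[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = manhattan(q[i - 1], p[j - 1])
            # A point may match several points of the other sequence.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def match_template(q: List[Sequence[float]],
                   templates: Dict[str, List[Sequence[float]]]) -> Tuple[str, float]:
    """Return the template with the lowest matching cost, i.e. highest similarity."""
    costs = {name: dtw_cost(q, p) for name, p in templates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# A user sequence that lingers one step longer than template "A".
Q = [[0.0], [1.0], [1.0], [2.0]]
templates = {
    "A": [[0.0], [1.0], [2.0]],
    "B": [[2.0], [1.0], [0.0]],
}
print(match_template(Q, templates))
```

Here the user sequence is longer than either template, yet it aligns with template "A" at zero cost because the repeated middle point is matched one-to-many, which is the unequal-length case described above.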

As shown in fig. 9, the method of acquiring the user transaction state set of this embodiment includes operation S910.

In operation S910, the set of user transaction states is acquired based on a user transaction pattern corresponding to the user transaction behavior matching template.

According to an embodiment of the disclosure, once the user transaction behavior template with the minimum matching cost has been found, the corresponding transaction behavior pattern can be assigned to the user. The possible transaction states of the user are then obtained from the template to construct the user transaction state set in preparation for subsequent analysis.

As shown in fig. 10, the method for identifying a user transaction behavior decision path based on a Markov decision process model using the user transaction state set of this embodiment includes operations S1010 to S1030.

In operation S1010, a user behavior set is obtained based on the user transaction behavior matching template, the user behavior set including at least one behavior subset, and the behavior subset including a limited number of transaction behaviors of the user in the current user transaction behavior matching template. According to embodiments of the present disclosure, a user may match different transaction behavior templates in different transactions, and the behavior subset corresponding to each template can be obtained from the respective user transaction behavior matching template. It should be appreciated that a behavior subset is a collection of continuous behavior records of the user, each of which may contain a limited number of transaction behaviors with timing characteristics during different transaction processes. For example, in the financial product transaction example above, the user behavior set may include, within the minimum (purchase) behavior unit, a limited number of transaction behaviors such as first browsing, bouncing out, the number of pages visited on the n-th browse, searching, browsing similar recommended products, consulting customer service, and paying for a purchase.

In operation S1020, a user state transition probability matrix is obtained based on the user behavior set and the user transaction state set, where the user state transition probability matrix is used to identify a probability that a user transitions from an a state to a b state after performing a behavior k, where the a state and the b state are elements in the user transaction state set, and the behavior k is an element in the user behavior set.

In operation S1030, a user initial state and a user real-time transaction behavior are acquired, and a user transaction behavior decision path is identified based on the user initial state, the user real-time transaction behavior, and the user state transition probability matrix, wherein the user transaction behavior decision path includes a plurality of user behaviors with time sequence characteristics, each user behavior being the behavior executed when the current state transitions to the next state with the maximum transition probability.
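The disclosure does not state how the user state transition probability matrix of operation S1020 is computed. One plausible reading, shown here purely as an assumption, is to estimate it by relative frequency from historical (state, behavior, next state) records. The function name `estimate_transition_probs` and the record layout are hypothetical.

```python
from collections import defaultdict


def estimate_transition_probs(records):
    """Estimate P[(state, behavior)][next_state] by relative frequency.

    `records` is an iterable of (state, behavior, next_state) tuples taken
    from historical transaction data (assumed layout, not from the source).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for state, behavior, next_state in records:
        counts[(state, behavior)][next_state] += 1

    probs = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalize the counts so each (state, behavior) row sums to 1.
        probs[key] = {s: c / total for s, c in next_counts.items()}
    return probs
```

Each entry then answers the question posed in operation S1020: given that the user is in state a and performs behavior k, how likely is a transition to state b.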

Embodiments of the present disclosure use the Markov decision process concept to identify user transaction behavior decision paths. A typical Markov decision process (MDP) may be expressed as the tuple <S, A, P_sa, R, γ>, where S (states) is the user state set, representing the possible behavior states of the user in the current transaction behavior pattern; A (actions) is the user behavior set, for example, in the above example, the limited number of transaction behaviors within the minimum (purchase) behavior unit, such as first browsing, bouncing out, the number of pages visited on the n-th browse, searching, browsing similar recommended products, consulting customer service, and paying for a purchase; P_sa is the state transition matrix, giving the probabilities of transitioning to other states after the agent takes an action a ∈ A in the current state s ∈ S; R (reward) is the feedback, where r_t denotes the feedback obtained when the user, in the current state s_t, performs action a_t and transitions to state s_{t+1}; and γ ∈ [0, 1] is the damping coefficient (discount factor), a smaller γ indicating that the current return is more important and the long-term return less important.
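The role of γ can be made concrete with the standard discounted return of an MDP, in which the feedback received k steps in the future is weighted by γ^k. The sketch below is a generic illustration of that textbook formula rather than a computation taken from the disclosure, and the function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for a reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards so each step is one multiply-add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With γ = 0 only the immediate feedback counts, while values of γ close to 1 give later feedback almost the same weight as the current feedback.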

Let the initial state of the user be s_0. After action a_0 (a_0 ∈ A) is performed, the user state transitions to s_1 with state transition probability P_{s_0 a_0}; after action a_1 (a_1 ∈ A) is performed, the state transitions to s_2 with state transition probability P_{s_1 a_1}; and so on, until after action a_n (a_n ∈ A) is performed, the state transitions to s_{n+1} with state transition probability P_{s_n a_n}. After each state transition the user selects the next behavior according to the feedback received, so as to maximize the overall expected feedback, and the user's transaction behavior decision path is thereby identified.

In embodiments of the present disclosure, once the user behavior set and the user transaction state set have been obtained, they may be used to calculate the user state transition probability matrix. It will be appreciated that, because the user behavior set and the user transaction state set are derived from user transaction historical data, the user state transition probability matrix reflects the user's state transition probabilities as exhibited by historical transaction behavior over time. The user state transition probability is the probability that the user transitions from the current state to another state after executing a certain behavior. After the user state transition probability matrix is obtained, it can be combined with the current state and the real-time transaction behavior of the user to identify the probability that the user transitions to another state after executing a certain behavior in the current state. It should be appreciated that the probability of transitioning from the current state to other states differs when the user performs different behaviors. For example, if the current state of the user is the interest state and the user searches for or compares competing products, the probability of transitioning from the interest state to the search state should be greater than the probability of transitioning from the interest state to the sharing state. The specific probability values may be obtained from the user state transition probability matrix. When identifying the user transaction behavior decision path in embodiments of the present disclosure, it may be assumed that, where no incentive feedback is applied, each behavior the user performs, starting from the initial state, to transition to the state with the maximum state transition probability is the user's expected behavior. Connecting these expected behaviors yields the user transaction decision path.
At this point, the user may reach the maximum expectation in the embodiment, i.e., the transaction is successful. In embodiments of the present disclosure, because a user's transaction decision path may vary with the user's experience, attitude, and lifestyle, a smaller damping coefficient may be preset to emphasize the importance of the current return. For some special transaction products, a larger damping coefficient may be preset.
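The path identification described above, in which the user repeatedly moves to the state with the maximum transition probability, might be sketched as follows. The dictionary layout `probs[(state, behavior)][next_state]`, the function name `identify_decision_path`, the step limit, and the rule of skipping already visited states to avoid cycles are all assumptions made for this illustration.

```python
def identify_decision_path(initial_state, probs, terminal_states, max_steps=20):
    """Greedily follow the most probable transition from each state.

    Returns a list of (behavior, next_state) pairs forming the decision path.
    """
    path = []
    state = initial_state
    visited = {initial_state}
    for _ in range(max_steps):
        if state in terminal_states:
            break
        best = None  # (probability, behavior, next_state)
        for (s, behavior), next_probs in probs.items():
            if s != state:
                continue
            for next_state, p in next_probs.items():
                if next_state in visited:
                    continue  # assumption: do not revisit states
                if best is None or p > best[0]:
                    best = (p, behavior, next_state)
        if best is None:
            break  # no unvisited successor is reachable
        _, behavior, state = best
        visited.add(state)
        path.append((behavior, state))
    return path
```

For instance, if searching from the interest state leads to the search state with probability 0.7 and paying from the search state leads to the success state with probability 0.8, the identified path is search followed by pay.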

Embodiments of the disclosure also provide an incentive strategy optimization method.

As shown in fig. 11, the incentive strategy optimization method of this embodiment includes operations S1110 to S1140.

In operation S1110, a user transaction behavior decision path of a sample user is acquired.

In operation S1120, the sample users are clustered based on the user transaction behavior decision paths of the sample users, and an incentive policy evaluation sample set is obtained, wherein the incentive policy evaluation sample set comprises at least one incentive policy evaluation sample subset, and the sample users in the incentive policy evaluation sample subset have the same cluster center.

In embodiments of the present disclosure, when an incentive strategy is formulated and optimized, an incentive experiment may first be conducted on a small sample of users to evaluate the effectiveness of the incentive strategy, and the strategy is delivered to the full user base only after it passes evaluation. The sample users are the users on whom the incentive test is conducted. It should be understood that, to customize the transaction incentive strategy and achieve personalized, precise marketing, the sample users may first be clustered. Sample users in the same cluster share a cluster center, that is, their user transaction behavior decision paths are relatively similar. Targeted incentives can therefore be delivered to different groups of users so as to improve user feedback.

In operation S1130, an incentive policy in an incentive policy set is evaluated based on the incentive policy evaluation sample set, wherein the incentive policy set includes at least one incentive policy including policy optimization parameters for adjusting user state transition probabilities, and the incentive policy evaluation sample subset has a mapping relationship with the incentive policy. The incentive policy set may contain policies corresponding to different sample subsets, so that incentives can be delivered in a targeted manner. Adjusting the policy optimization parameters of an incentive policy adjusts the probability of the user's state transition after performing a certain behavior, making the user more likely to transition toward completing the transaction.

In operation S1140, policy optimization parameters of the incentive policy are adjusted based on the evaluation result until an optimal incentive policy corresponding to the subset of incentive policy evaluation samples is obtained. It can be appreciated that after each adjustment of the policy optimization parameters, the utility of the current incentive policy may be evaluated, and the policy optimization parameters may be adjusted again based on the evaluation result, until the optimized incentive policy may maximally promote the possibility of the user completing the transaction.

It should be noted that the user transaction behavior decision path is obtained based on the method of the embodiment of the present disclosure.

According to the incentive strategy optimization method provided by the disclosure, the user samples are clustered a second time using the identified user transaction behavior decision paths, and the effectiveness of the incentive strategy is evaluated on a small-batch user sample set. The incentive strategy is then adjusted based on the influence of incentive feedback on the user state transition probabilities in the Markov decision process, so as to guide user behavior. This addresses the optimization problem of a stochastic dynamic system, enables efficient allocation and precise delivery of resources, and improves the availability of data resources and business resources.
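The disclosure does not name the algorithm used in operation S1120 to cluster sample users by decision path. As one hypothetical illustration, each user could be assigned to the nearest of a set of given center paths, with the edit distance between behavior sequences as the dissimilarity measure. Both the distance measure and the function names below are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two behavior sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x with y
        prev = cur
    return prev[-1]


def assign_to_centers(user_paths, centers):
    """Group users by the index of the center path closest to their path."""
    subsets = {i: [] for i in range(len(centers))}
    for user, path in user_paths.items():
        nearest = min(range(len(centers)),
                      key=lambda i: edit_distance(path, centers[i]))
        subsets[nearest].append(user)
    return subsets
```

Each resulting group corresponds to one incentive policy evaluation sample subset, whose members have similar decision paths.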

As shown in fig. 12, the method of evaluating an incentive policy set based on an incentive policy evaluation sample set of the embodiment includes operations S1210 to S1240.

In operation S1210, a first success rate is calculated, the first success rate being the proportion of users in the corresponding incentive policy evaluation sample subset who reach the transaction success state. It should be appreciated that the first success rate may be the proportion of the original sample users who reach the transaction success state before the incentive strategy is applied.

In operation S1220, for the incentive strategy corresponding to the incentive policy evaluation sample subset, the incentive strategy is applied as user state transition feedback at the nodes of the user transaction behavior decision path that satisfy the incentive occasion, and the user state transition probability is thereby adjusted. In an embodiment of the present disclosure, the policy optimization parameters include incentive occasions and incentive resources. By choosing the incentive occasion and the incentive resource, it is intended to prompt the user to transition from the current state to the next state in accordance with the transaction pattern until the state in which the transaction is completed is reached. Specifically, after the user transaction behavior decision path is identified, it can be determined that the user is currently in a certain state of a certain behavior pattern, namely s_t, and the real-time transaction behavior tracking data collected for the user at this moment corresponds to action a_{t-1}. The incentive resource may serve as the feedback received when the user, after performing action a_t, transitions from state s_t to state s_{t+1}, where s_{t+1} is the transaction success state expected by business personnel or an earlier decision node that is conducive to reaching the transaction success state, so that the user is guided step by step toward decisions that promote completion of the transaction. It will be appreciated that, after performing action a_t, the user tends to transition from state s_t to state s_{t+1} in order to obtain the feedback. By adjusting the time at which the feedback is given and the specific feedback resource, the probability that the user transitions to the next state can be changed.
Under the expected condition, adjusting the incentive resources and the incentive occasion increases the probability that the user transitions along the expected state transition path, thereby improving the transaction success rate. The incentive occasion and incentive resources may be determined from the desired user transaction flow. For example, the initial incentive occasion and incentive resources may be set according to the user journey, i.e., the product design flow chart, and gradually optimized in subsequent adjustments. In particular, if the user transaction behavior decision path is highly similar to the user journey, incentive delivery can be reduced to save resources.
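The disclosure does not give a formula for how an incentive changes the transition probabilities. The sketch below therefore uses a purely hypothetical model in which the probability of the desired next state is scaled by a factor (1 + boost) and the distribution is renormalized. The function name `apply_incentive` and the `boost` parameter are illustrative only.

```python
def apply_incentive(next_state_probs, target_state, boost):
    """Raise the probability of `target_state` and renormalize.

    `next_state_probs` maps next states to probabilities for one
    (state, behavior) pair. `boost` >= 0 stands in for the strength of
    the incentive resource (a hypothetical modelling choice).
    """
    adjusted = dict(next_state_probs)
    adjusted[target_state] = adjusted.get(target_state, 0.0) * (1.0 + boost)
    total = sum(adjusted.values())
    if total == 0:
        return adjusted  # nothing to normalize
    return {s: p / total for s, p in adjusted.items()}
```

Under this model a larger boost shifts more probability mass toward the desired state, while a boost of zero leaves the distribution unchanged.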

In operation S1230, a second success rate is calculated, the second success rate being the proportion of users in the incentive policy evaluation sample subset who reach the transaction success state after the user state transition probability is adjusted.

In operation S1240, the ratio of the second success rate to the first success rate is calculated to obtain the user transaction success increment of the incentive strategy corresponding to the incentive policy evaluation sample subset.

It will be appreciated that, after the user state transition probability is adjusted, the second success rate should differ from the first success rate; in the desired case, the second success rate is higher than the first. The effectiveness of the incentive strategy can therefore be evaluated by calculating the user transaction success increment.
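The evaluation in operations S1210 to S1240 reduces to two proportions and their ratio. A minimal sketch follows, assuming that each sample user is represented by the final state reached and that the sample subset is non-empty. The function names and the state label "success" are hypothetical.

```python
def success_rate(final_states, success_state="success"):
    """Proportion of users whose final state is the transaction success state."""
    hits = sum(1 for s in final_states if s == success_state)
    return hits / len(final_states)


def success_increment(before_states, after_states, success_state="success"):
    """Ratio of the second success rate (after incentive) to the first (before)."""
    first = success_rate(before_states, success_state)
    second = success_rate(after_states, success_state)
    if first == 0:
        # The ratio is undefined when nobody succeeded before the incentive.
        raise ValueError("first success rate is zero")
    return second / first
```

A value greater than 1 indicates that more users of the subset reached the transaction success state after the incentive strategy was applied.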

As shown in fig. 13, the method of adjusting the policy optimization parameters of the incentive policy based on the evaluation result until the optimal incentive policy corresponding to the incentive policy evaluation sample subset is obtained includes operations S1310 to S1320.

In operation S1310, the incentive strategy is iteratively optimized until the incentive strategy for which the user transaction success increment reaches its maximum value is obtained.

In operation S1320, the preferred incentive strategy corresponding to the highest user transaction success increment is taken as the optimal incentive strategy.

It will be appreciated that when the user transaction success increment reaches its maximum, the current incentive strategy is the most effective at guiding the user to make the state transition.

As shown in fig. 14, the method of the p-th iterative optimization of this embodiment includes operations S1410 to S1420.

In operation S1410, the incentive resources and/or incentive occasions in the incentive strategy are adjusted to obtain the p-th optimized incentive strategy.

In operation S1420, the p-th optimized incentive strategy is evaluated, and a user transaction success delta corresponding to the p-th optimized incentive strategy is calculated.
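The iteration of operations S1410 and S1420 can be sketched as a search over candidate incentive occasions and incentive resources. The disclosure does not specify the search procedure, so the exhaustive search below is an assumption, and `evaluate` is a hypothetical callback that returns the user transaction success increment for one candidate strategy.

```python
from itertools import product


def optimize_incentive(timings, resources, evaluate):
    """Return the (timing, resource) pair with the largest success increment.

    Each candidate pair plays the role of one optimized incentive strategy,
    and `evaluate(timing, resource)` returns its success increment.
    """
    best_strategy = None
    best_increment = float("-inf")
    for timing, resource in product(timings, resources):
        increment = evaluate(timing, resource)
        if increment > best_increment:
            best_strategy, best_increment = (timing, resource), increment
    return best_strategy, best_increment
```

In practice the callback would run the small-sample incentive experiment for the given parameters. Exhaustive search is only practical when the candidate set is small.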

Based on the user transaction decision recognition method, the disclosure also provides a user transaction decision recognition device. The device will be described in detail below in connection with fig. 15.

As shown in fig. 15, the user transaction decision identifying apparatus 800 of this embodiment includes a primary clustering module 810, a secondary clustering module 820, a matching module 830, and a path identifying module 840.

The primary clustering module 810 is configured to perform primary clustering on user transaction sample data by using a time sequence clustering algorithm to obtain transaction pattern clustering data, wherein the primary clustering includes forming a mapping relationship between the user transaction sample data and a user transaction pattern, and the user transaction pattern includes at least two types.

The secondary clustering module 820 is configured to perform secondary clustering on transaction pattern clustering data corresponding to an ith user transaction pattern, and obtain user transaction behavior templates, where the user transaction behavior templates include at least two types, and the user transaction behavior templates include a user time sequence behavior set, and the user time sequence behavior set includes continuous behavior features based on time sequences.

The matching module 830 is configured to calculate a similarity between the user transaction behavior history data and each user transaction behavior template based on a dynamic time warping algorithm, and obtain a user transaction behavior matching template and a user transaction state set, where the user transaction state set is obtained based on the user transaction behavior matching template.

The path identification module 840 is configured to identify a user transaction behavior decision path based on a Markov decision process model using the set of user transaction states, wherein the user transaction behavior decision path is determined based on a user state transition probability obtained by calculation of a user real-time transaction behavior and the set of user transaction states.

According to an embodiment of the present disclosure, the primary clustering module includes a sample acquisition sub-module, a preprocessing sub-module, a dimension reduction sub-module, and a pattern clustering sub-module.

As shown in fig. 16, the primary clustering module 810 of this embodiment includes a sample acquisition sub-module 8101, a preprocessing sub-module 8102, a dimension reduction sub-module 8103, and a pattern clustering sub-module 8104.

The sample acquisition sub-module 8101 is configured to acquire user transaction sample data including transaction raw data corresponding to a plurality of user transaction behavior nodes, the user transaction state nodes having timing characteristics.

The preprocessing sub-module 8102 is configured to preprocess the user transaction original data to obtain a user continuous behavior signal, wherein the preprocessing comprises at least one of normalization processing, recoding, pre-weighting and endpoint detection.

The dimension reduction sub-module 8103 is configured to reduce the dimensions of the user continuous behavior signal based on an automatic encoder to obtain a low-dimensional user characteristic signal.

The pattern clustering sub-module 8104 is configured to perform time-series clustering on the low-dimensional user characteristic signals, and obtain the transaction pattern clustering data.

According to an embodiment of the disclosure, the secondary clustering module may include a segmentation sub-module, a state clustering sub-module, and a template construction sub-module.

As shown in fig. 17, the secondary clustering module 820 of this embodiment includes a segmentation sub-module 8201, a state clustering sub-module 8202, and a template construction sub-module 8203.

Wherein the segmentation sub-module 8201 is configured to segment the transaction pattern cluster data to obtain transaction state data, the transaction state data being associated with a user state in a transaction pattern.

The status clustering submodule 8202 is configured to cluster the transaction status data by using a time-series clustering algorithm to obtain transaction status cluster data.

The template construction submodule 8203 is configured to construct a user transaction behavior template using the transaction state cluster data, including: and marking service meanings of time sequence fragments in the transaction state clustering data based on user transaction behaviors to form a corresponding relation between the user transaction states and user behavior feature vectors, wherein the user transaction behavior template comprises n sections of user behavior feature vectors with time sequence features.

According to an embodiment of the disclosure, the matching module may include a historical data acquisition sub-module, a path computation sub-module, a template matching sub-module, and a state set acquisition sub-module.

As shown in fig. 18, the matching module 830 of this embodiment includes a history data acquisition submodule 8301, a path calculation submodule 8302, a template matching submodule 8303, and a state set acquisition submodule 8304.

The historical data acquisition sub-module 8301 is configured to acquire user transaction behavior historical data including a characteristic time series corresponding to a single user's single transaction behavior.

The path computation submodule 8302 is configured to compute a shortest distance path between the user transaction behavior historical data and the time sequence of each user transaction behavior template by using a dynamic time warping algorithm based on Manhattan distance, and obtain the similarity between the user transaction behavior historical data and each user transaction behavior template.

The template matching sub-module 8303 is configured to take the user transaction behavior template with the highest similarity as the user transaction behavior matching template.

The state set acquisition sub-module 8304 is configured to acquire the user transaction state set based on a user transaction pattern corresponding to the user transaction behavior matching template.

According to an embodiment of the disclosure, the path recognition module includes a behavior set acquisition sub-module, a probability matrix calculation sub-module, and a decision path recognition sub-module.

As shown in fig. 19, the path recognition module 840 of this embodiment includes a behavior set acquisition sub-module 8401, a probability matrix calculation sub-module 8402, and a decision path recognition sub-module 8403.

The behavior set acquisition sub-module 8401 is configured to acquire a user behavior set based on the user transaction behavior matching template, the user behavior set including at least one subset of behaviors including a limited number of transaction behaviors of the user in the current user transaction behavior matching template.

The probability matrix computation submodule 8402 is configured to obtain a user state transition probability matrix based on the set of user actions and the set of user transaction states, the user state transition probability matrix being used to identify a probability that a user transitions from an a-state to a b-state after performing an action k, the a-state and the b-state being elements of the set of user transaction states, the action k being elements of the set of user actions.

The decision path identification sub-module 8403 is configured to obtain a user initial state and a user real-time transaction behavior, and to identify a user transaction behavior decision path based on the user initial state, the user real-time transaction behavior, and the user state transition probability matrix, wherein the user transaction behavior decision path comprises a plurality of user behaviors with time sequence characteristics, each user behavior being the behavior performed when the current state transitions to the next state with the maximum transition probability.

According to an embodiment of the disclosure, the incentive strategy optimization device comprises an acquisition module, a user clustering module, an evaluation module and an optimization module.

As shown in fig. 20, the incentive strategy optimization device 900 of this embodiment includes an acquisition module 910, a user clustering module 920, an evaluation module 930, and an optimization module 940.

The acquisition module 910 is configured to obtain a user transaction behavior decision path of the sample user. The acquisition module 910 may have the same function as the user transaction decision identifying device 800 of the embodiment of the present disclosure, which is not repeated here.

The user clustering module 920 is configured to cluster the sample users based on their user transaction behavior decision paths, and obtain an incentive policy evaluation sample set, which includes at least one subset of incentive policy evaluation samples, wherein the sample users in the subset of incentive policy evaluation samples have the same class of cluster centers.

The evaluation module 930 is configured to evaluate an incentive strategy of an incentive strategy set based on the incentive strategy evaluation sample set, wherein the incentive strategy set comprises at least one incentive strategy comprising strategy optimization parameters for adjusting user state transition probabilities, the incentive strategy evaluation sample subset having a mapping relation with the incentive strategy.

The optimization module 940 is configured to adjust policy optimization parameters of the incentive policy based on the evaluation result until an optimal incentive policy corresponding to the subset of incentive policy evaluation samples is obtained.

According to an embodiment of the present disclosure, the evaluation module includes a first calculation sub-module, an incentive sub-module, a second calculation sub-module, and a third calculation sub-module.

As shown in fig. 21, the evaluation module 930 of this embodiment includes a first calculation sub-module 9301, an incentive sub-module 9302, a second calculation sub-module 9303, and a third calculation sub-module 9304.

The first calculation sub-module 9301 is configured to calculate a first success rate, the first success rate being the proportion of users in the corresponding incentive policy evaluation sample subset who reach the transaction success state.

The incentive sub-module 9302 is configured to, for the incentive strategy corresponding to the incentive policy evaluation sample subset, apply the incentive strategy as user state transition feedback at the user transaction behavior decision path nodes satisfying the incentive occasion, so as to adjust the user state transition probability.

The second calculation sub-module 9303 is configured to calculate a second success rate, the second success rate being the proportion of users in the incentive policy evaluation sample subset who reach the transaction success state after the user state transition probability is adjusted.

The third calculation sub-module 9304 is configured to calculate the ratio of the second success rate to the first success rate to obtain the user transaction success increment of the incentive strategy corresponding to the incentive policy evaluation sample subset.

According to an embodiment of the disclosure, the optimization module includes an iteration sub-module and a screening sub-module.

As shown in fig. 22, the optimization module 940 of this embodiment includes an iteration sub-module 9401 and a screening sub-module 9402.

The iteration sub-module 9401 is configured to iteratively optimize the incentive strategy until the incentive strategy for which the user transaction success increment reaches its maximum is obtained. The p-th iterative optimization comprises: adjusting the incentive resources and/or incentive occasions in the incentive strategy to obtain the p-th optimized incentive strategy; and evaluating the p-th optimized incentive strategy and calculating the user transaction success increment corresponding to it.

The screening sub-module 9402 is configured to take the preferred incentive strategy corresponding to the highest user transaction success delta as the optimal incentive strategy.

According to an embodiment of the present disclosure, any plurality of the primary clustering module 810, the secondary clustering module 820, the matching module 830, the path recognition module 840, the sample acquisition sub-module 8101, the preprocessing sub-module 8102, the dimension reduction sub-module 8103, the pattern clustering sub-module 8104, the segmentation sub-module 8201, the state clustering sub-module 8202, the template construction sub-module 8203, the historical data acquisition sub-module 8301, the path calculation sub-module 8302, the template matching sub-module 8303, the state set acquisition sub-module 8304, the behavior set acquisition sub-module 8401, the probability matrix calculation sub-module 8402, and the decision path recognition sub-module 8403, and/or any plurality of the acquisition module 910, the user clustering module 920, the evaluation module 930, the optimization module 940, the first calculation sub-module 9301, the incentive sub-module 9302, the second calculation sub-module 9303, the third calculation sub-module 9304, the iteration sub-module 9401, and the screening sub-module 9402, may be combined and implemented in one module, or any one of these modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module.
According to an embodiment of the present disclosure, at least one of the above modules and sub-modules may be implemented at least in part as hardware circuitry, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system in a package, or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or in any one of, or a suitable combination of any of, the three implementation forms of software, hardware, and firmware.
Alternatively, at least one of the above modules and sub-modules may be implemented at least in part as a computer program module which, when run, performs the corresponding function.

As shown in fig. 23, an electronic device 9000 according to an embodiment of the present disclosure includes a processor 901, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. The processor 901 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 901 may also include on-board memory for caching purposes. Processor 901 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

The RAM 903 stores various programs and data necessary for the operation of the electronic device 9000. The processor 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. The processor 901 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 9000 may further include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 9000 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication portion 909 including a network interface card such as a LAN card, a modem, or the like. The communication portion 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it can be installed into the storage portion 908 as needed.

The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above, and/or one or more memories other than the ROM 902 and the RAM 903.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.

The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed via the communication portion 909, and/or installed from the removable medium 911. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to wireless media, wired media, or any suitable combination of the foregoing.

According to embodiments of the present disclosure, the program code of the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, C, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the present disclosure and/or in the claims may be combined in a variety of ways, even if such combinations are not explicitly recited in the present disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or in the claims may be combined in various ways without departing from the spirit and teachings of the present disclosure. All such combinations fall within the scope of the present disclosure.

The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims

1. A method for identifying user transaction decisions, comprising:

performing primary clustering on user transaction sample data by using a time sequence clustering algorithm to obtain transaction pattern clustering data, wherein the primary clustering comprises forming a mapping relation between the user transaction sample data and user transaction patterns, and the user transaction patterns comprise at least two types;

performing secondary clustering on the transaction pattern clustering data corresponding to an i-th user transaction pattern to obtain user transaction behavior templates, wherein the user transaction behavior templates comprise at least two types, each user transaction behavior template comprises a user time sequence behavior set, and the user time sequence behavior set comprises continuous behavior features based on a time sequence;

calculating, based on a dynamic time warping algorithm, the similarity between user transaction behavior history data and each user transaction behavior template, and acquiring a user transaction behavior matching template and a user transaction state set, wherein the user transaction state set is acquired based on the user transaction behavior matching template; and

identifying, by using the user transaction state set, a user transaction behavior decision path based on a Markov decision process model, wherein the user transaction behavior decision path is determined based on a user state transition probability, and the user state transition probability is calculated from user real-time transaction behaviors and the user transaction state set.

2. The method of claim 1, wherein performing the primary clustering on the user transaction sample data by using the time sequence clustering algorithm to obtain the transaction pattern clustering data comprises:

acquiring the user transaction sample data, wherein the user transaction sample data comprises transaction original data corresponding to a plurality of user transaction behavior nodes, and the user transaction behavior nodes have time sequence characteristics;

preprocessing the transaction original data to obtain a user continuous behavior signal, wherein the preprocessing comprises at least one of normalization processing, recoding, pre-weighting, and endpoint detection;

performing dimension reduction on the user continuous behavior signal based on an autoencoder to obtain a low-dimensional user characteristic signal; and

performing time sequence clustering on the low-dimensional user characteristic signal to obtain the transaction pattern clustering data.

3. The method of claim 1, wherein performing the secondary clustering on the transaction pattern clustering data corresponding to the i-th user transaction pattern to obtain the user transaction behavior templates comprises:

dividing the transaction pattern clustering data to obtain transaction state data, wherein the transaction state data is associated with a user state in a transaction pattern;

clustering the transaction state data by using a time sequence clustering algorithm to obtain transaction state clustering data; and

constructing the user transaction behavior template by using the transaction state clustering data, which comprises:

labeling time sequence segments in the transaction state clustering data with business meanings based on user transaction behaviors, so as to form a correspondence between user transaction states and user behavior feature vectors, wherein the user transaction behavior template comprises n segments of user behavior feature vectors with time sequence features.

4. The method of claim 1, wherein calculating the similarity between the user transaction behavior history data and each user transaction behavior template based on the dynamic time warping algorithm and acquiring the user transaction behavior matching template comprises:

acquiring the user transaction behavior history data, wherein the user transaction behavior history data comprises a characteristic time sequence corresponding to a single transaction behavior of a single user;

calculating, by using a dynamic time warping algorithm based on the Manhattan distance, the shortest distance path between the time sequence of the user transaction behavior history data and the time sequence of each user transaction behavior template, to obtain the similarity between the user transaction behavior history data and each user transaction behavior template; and

taking the user transaction behavior template with the highest similarity as the user transaction behavior matching template.
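
By way of non-limiting illustration only, the template matching described in this claim can be sketched in Python as follows. The function names (`manhattan`, `dtw_distance`, `match_template`), the representation of each time sequence as a list of feature vectors, and the treatment of the smallest accumulated distance as the highest similarity are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import Dict, Sequence

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    """Manhattan (L1) distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(s: Sequence[Vector], t: Sequence[Vector]) -> float:
    """Accumulated cost of the shortest warping path between sequences s and t."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of the first i items of s
    # with the first j items of t.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = manhattan(s[i - 1], t[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # s advances alone
                                 cost[i][j - 1],      # t advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def match_template(history: Sequence[Vector],
                   templates: Dict[str, Sequence[Vector]]) -> str:
    """Return the name of the template with the smallest DTW distance.

    Assumption: the smallest distance is treated as the highest similarity.
    """
    return min(templates, key=lambda name: dtw_distance(history, templates[name]))
```

In this sketch a history such as `[[0.0], [1.0], [2.0]]` aligns at zero cost with a template that merely repeats one of its steps, which is the time-stretching tolerance that dynamic time warping is intended to provide.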

5. The method of claim 1 or 4, wherein acquiring the user transaction state set comprises:

acquiring the user transaction state set based on the user transaction pattern corresponding to the user transaction behavior matching template.

6. The method of claim 5, wherein identifying the user transaction behavior decision path based on the Markov decision process model by using the user transaction state set comprises:

acquiring a user behavior set based on the user transaction behavior matching template, wherein the user behavior set comprises at least one behavior subset, and the behavior subset comprises a finite number of transaction behaviors of the user in the current user transaction behavior matching template;

acquiring a user state transition probability matrix based on the user behavior set and the user transaction state set, wherein the user state transition probability matrix indicates the probability of the user transitioning from a state a to a state b after executing a behavior k, the state a and the state b are elements of the user transaction state set, and the behavior k is an element of the user behavior set; and

acquiring a user initial state and user real-time transaction behaviors, and identifying the user transaction behavior decision path based on the user initial state, the user real-time transaction behaviors, and the user state transition probability matrix, wherein the user transaction behavior decision path comprises a plurality of user behaviors with time sequence characteristics, and each user behavior is the behavior executed when the user transitions from the current state to the next state having the maximum transition probability.
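
By way of non-limiting illustration only, the decision path identification described in this claim can be sketched in Python as follows. The nested-dictionary layout `transition[behavior][state][next_state]`, the estimation of probabilities by frequency counting over observed `(state, behavior, next_state)` triples, the greedy step rule, the `max_steps` limit, and all state and behavior names are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def estimate_transitions(observations):
    """Estimate P(next_state | state, behavior) by frequency counting.

    observations: iterable of (state, behavior, next_state) triples.
    Returns transition[behavior][state][next_state] = probability.
    """
    counts = defaultdict(Counter)
    for state, behavior, nxt in observations:
        counts[(behavior, state)][nxt] += 1
    transition = defaultdict(dict)
    for (behavior, state), nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        transition[behavior][state] = {
            nxt: c / total for nxt, c in nxt_counts.items()
        }
    return transition


def decision_path(transition, initial_state, terminal_states, max_steps=10):
    """Greedily follow the most probable transition from each state.

    Returns a list of (behavior, next_state) pairs in time order.
    """
    path = []
    state = initial_state
    for _ in range(max_steps):
        if state in terminal_states:
            break
        best = None  # (probability, behavior, next_state)
        for behavior, by_state in transition.items():
            for nxt, p in by_state.get(state, {}).items():
                if best is None or p > best[0]:
                    best = (p, behavior, nxt)
        if best is None:  # no known transition out of this state
            break
        _, behavior, state = best
        path.append((behavior, state))
    return path
```

The `max_steps` limit is included only so that the sketch terminates when the most probable transition leads back to a state already visited.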

7. A method of incentive strategy optimization, comprising:

acquiring a user transaction behavior decision path of a sample user;

clustering the sample users based on the user transaction behavior decision paths of the sample users to obtain an incentive strategy evaluation sample set, wherein the incentive strategy evaluation sample set comprises at least one incentive strategy evaluation sample subset, and the sample users in each incentive strategy evaluation sample subset share the same cluster center;

evaluating incentive strategies in an incentive strategy set based on the incentive strategy evaluation sample set, wherein the incentive strategy set comprises at least one incentive strategy, the incentive strategy comprises strategy optimization parameters, the strategy optimization parameters are used for adjusting the user state transition probability, and a mapping relation exists between each incentive strategy evaluation sample subset and an incentive strategy; and

adjusting the strategy optimization parameters of the incentive strategy based on the evaluation result until an optimal incentive strategy corresponding to the incentive strategy evaluation sample subset is obtained,

wherein the user transaction behavior decision path is obtained based on the method of any of claims 1-6.

8. The method of claim 7, wherein the strategy optimization parameters include incentive timing and incentive resources, and evaluating the incentive strategies in the incentive strategy set based on the incentive strategy evaluation sample set comprises:

calculating a first success ratio, the first success ratio being the proportion of users in the incentive strategy evaluation sample subset who reach a successful transaction state;

for the incentive strategy corresponding to the incentive strategy evaluation sample subset, applying the incentive strategy as user state transition feedback at a node of the user transaction behavior decision path that satisfies the incentive timing, and adjusting the user state transition probability;

calculating a second success ratio, the second success ratio being the proportion of users in the incentive strategy evaluation sample subset who reach the successful transaction state after the user state transition probability is adjusted; and

calculating the ratio of the second success ratio to the first success ratio to obtain the user transaction success increment of the incentive strategy corresponding to the incentive strategy evaluation sample subset.
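
By way of non-limiting illustration only, the evaluation described in this claim can be sketched in Python as follows. The sketch computes the proportion of users reaching the successful transaction state before and after the incentive, and then their ratio. The representation of each user's decision path as a list of `(state, probability of advancing)` pairs, the use of the mean product of those probabilities as the expected success proportion, and the additive `boost` standing in for the incentive resources are all assumptions made for this sketch.

```python
from math import prod


def success_ratio(paths):
    """Expected proportion of users who reach the successful transaction state.

    paths: one list per user of (state, probability of advancing) pairs.
    Assumption: a user succeeds with the product of the step probabilities.
    """
    return sum(prod(p for _, p in path) for path in paths) / len(paths)


def apply_incentive(paths, timing_state, boost):
    """Raise the advance probability at nodes matching the incentive timing."""
    return [
        [(s, min(1.0, p + boost) if s == timing_state else p) for s, p in path]
        for path in paths
    ]


def success_increment(paths, timing_state, boost):
    """Ratio of the success proportion after the incentive to that before it."""
    first = success_ratio(paths)
    if first == 0:
        raise ValueError("baseline success ratio is zero; increment undefined")
    second = success_ratio(apply_incentive(paths, timing_state, boost))
    return second / first
```

An increment greater than 1 indicates that, under these assumptions, the incentive raises the expected proportion of successful transactions in the sample subset.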

9. The method of claim 8, wherein adjusting the strategy optimization parameters of the incentive strategy based on the evaluation result until the optimal incentive strategy corresponding to the incentive strategy evaluation sample subset is obtained comprises:

iteratively optimizing the incentive strategy until the incentive strategy for which the user transaction success increment reaches its maximum value is obtained; and

taking the incentive strategy corresponding to the highest user transaction success increment as the optimal incentive strategy,

wherein the p-th iterative optimization comprises:

adjusting the incentive resources and/or the incentive timing in the incentive strategy to obtain a p-th optimized incentive strategy; and

evaluating the p-th optimized incentive strategy, and calculating the user transaction success increment corresponding to the p-th optimized incentive strategy.
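
By way of non-limiting illustration only, the iterative optimization described in this claim can be sketched in Python as follows. The exhaustive search over a finite grid of candidate timings and resources is one possible iteration scheme chosen for this sketch and is not taken from the disclosure. The `evaluate` callback, which returns the user transaction success increment for a candidate strategy, and the function name `optimize_incentive` are likewise hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Strategy = Tuple[str, float]  # (incentive timing, incentive resource)


def optimize_incentive(
    evaluate: Callable[[str, float], float],
    timings: Iterable[str],
    resources: Iterable[float],
) -> Tuple[Optional[Strategy], float]:
    """Try every (timing, resource) pair and keep the one with the
    largest user transaction success increment."""
    best_strategy: Optional[Strategy] = None
    best_increment = float("-inf")
    for timing, resource in product(timings, resources):
        # Each pass corresponds to one iteration: adjust the parameters,
        # evaluate the adjusted strategy, and record its increment.
        increment = evaluate(timing, resource)
        if increment > best_increment:
            best_strategy, best_increment = (timing, resource), increment
    return best_strategy, best_increment
```

A gradient-free local search or any other optimizer could replace the grid in this sketch, provided each iteration still adjusts the parameters and re-evaluates the increment.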

10. A user transaction decision recognition device, comprising:

a primary clustering module configured to perform primary clustering on user transaction sample data by using a time sequence clustering algorithm to obtain transaction pattern clustering data, wherein the primary clustering comprises forming a mapping relation between the user transaction sample data and user transaction patterns, and the user transaction patterns comprise at least two types;

a secondary clustering module configured to perform secondary clustering on the transaction pattern clustering data corresponding to an i-th user transaction pattern to obtain user transaction behavior templates, wherein the user transaction behavior templates comprise at least two types, each user transaction behavior template comprises a user time sequence behavior set, and the user time sequence behavior set comprises continuous behavior features based on a time sequence;

a matching module configured to calculate, based on a dynamic time warping algorithm, the similarity between user transaction behavior history data and each user transaction behavior template, and to acquire a user transaction behavior matching template and a user transaction state set, wherein the user transaction state set is acquired based on the user transaction behavior matching template; and

a path recognition module configured to identify, by using the user transaction state set, a user transaction behavior decision path based on a Markov decision process model, wherein the user transaction behavior decision path is determined based on a user state transition probability, and the user state transition probability is calculated from user real-time transaction behaviors and the user transaction state set.

11. An incentive strategy optimization device comprising:

an acquisition module configured to acquire a user transaction behavior decision path of a sample user, wherein the user transaction behavior decision path is acquired based on the user transaction decision recognition device of claim 10;

a user clustering module configured to cluster the sample users based on the user transaction behavior decision paths of the sample users to obtain an incentive strategy evaluation sample set, wherein the incentive strategy evaluation sample set comprises at least one incentive strategy evaluation sample subset, and the sample users in each incentive strategy evaluation sample subset share the same cluster center;

an evaluation module configured to evaluate incentive strategies in an incentive strategy set based on the incentive strategy evaluation sample set, wherein the incentive strategy set comprises at least one incentive strategy, the incentive strategy comprises strategy optimization parameters, the strategy optimization parameters are used for adjusting the user state transition probability, and a mapping relation exists between each incentive strategy evaluation sample subset and an incentive strategy; and

an optimization module configured to adjust the strategy optimization parameters of the incentive strategy based on the evaluation result until an optimal incentive strategy corresponding to the incentive strategy evaluation sample subset is obtained.

12. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-9.

13. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1 to 9.

14. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.