CN111259922A - Order data processing method and device based on customer order-returning early warning - Google Patents

Order data processing method and device based on customer order-returning early warning

Info

Publication number
CN111259922A
Authority
CN
China
Prior art keywords
order
data set
information
training data
attributes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911357189.9A
Other languages
Chinese (zh)
Inventor
陈旋
王冲
张平
李广宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Aijia Household Products Co Ltd
Original Assignee
Jiangsu Aijia Household Products Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Aijia Household Products Co Ltd
Priority to CN201911357189.9A
Publication of CN111259922A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Finance (AREA)
  • Evolutionary Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Operations Research (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Evolutionary Computation (AREA)
  • Accounting & Taxation (AREA)
  • Educational Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an order data processing method, device, computer equipment, and storage medium based on customer order-returning early warning. The method calculates the empirical entropy of a training data set D, calculates the conditional entropy of each type of feature attribute, and from these calculates the information gain value of each type of feature attribute. The type of feature attribute with the largest information gain value is taken as the root node of a decision tree to form a tree structure, and the order in which the tree structures are formed is recorded. That feature attribute is then removed from the training data set D to form a new training data set D, and the process returns to the entropy calculation so that a plurality of tree structures are obtained in sequence, until the information gain value corresponding to the obtained tree structure is smaller than a gain threshold. The tree structures are combined into a decision tree according to their formation order to obtain a prediction model, and the prediction model is used to detect the order-return risk of the order information to be tested. This simplifies the detection of order-return risk and improves detection efficiency.

Description

Order data processing method and device based on customer order-returning early warning
Technical Field
The invention relates to the technical field of data structures, in particular to an order data processing method and device based on customer order-returning early warning, computer equipment and a storage medium.
Background
In recent years, data mining has attracted great attention in the information industry, mainly because large amounts of data are widely available and there is an urgent need to convert such data into useful information and knowledge. Predicting and giving early warning of customer order returns is very important for retaining customers and winning back high-value users, and has many applications in industries such as telecommunications, games, and internet retail. In the internet home decoration field, which has only recently grown out of the traditional industry, customer order returns have always been a pain point for home decoration companies; an important reason is the lack of effective prediction and early warning, which makes it impossible to take targeted win-back measures in advance. The invention provides a method that automatically acquires data, extracts feature attributes, analyzes them with the ID3 algorithm, and builds a decision tree model for training and learning. The model automatically analyzes existing data from the live system so that targeted win-back measures can be taken in advance, thereby reducing the customer order-return rate, reducing company costs, improving customer satisfaction, and improving company benefits.
When order management is carried out with a traditional order data processing scheme, the customer order-return risk has to be estimated by investigating and integrating information provided by the staff who serve the customer at each stage, such as the contact stage, the intention stage, and the design stage. The workload is heavy and the estimate is subjective and inaccurate, so the traditional order data processing scheme suffers from a complex process and low accuracy.
Disclosure of Invention
In order to solve the above problems, the invention provides an order data processing method and device based on customer order-returning early warning, computer equipment, and a storage medium.
In order to achieve the aim of the invention, the invention provides an order data processing method based on customer order-returning early warning, which comprises the following steps:
S10, calculating the empirical entropy of the training data set D, wherein the training data set D comprises a plurality of sample data, and each sample data includes a plurality of feature attributes;
S20, identifying each type of feature attribute in the training data set D, calculating the conditional entropy of each type of feature attribute, and calculating the information gain value of each type of feature attribute from the empirical entropy and the conditional entropy of that feature attribute;
S30, determining the type of feature attribute with the largest information gain value as the root node of the decision tree to form a tree structure, and recording the formation order of the tree structure;
S40, removing the type of feature attribute with the current largest information gain value from the training data set D to form a new training data set D, and returning to step S10 to obtain a plurality of tree structures in sequence, until the information gain value corresponding to the obtained tree structure is smaller than a gain threshold;
S50, combining the tree structures into a decision tree according to their formation order to obtain a prediction model, and, when the prediction error rate of the prediction model is smaller than an error rate threshold, detecting the order-return risk of the order information to be tested by using the prediction model.
In one embodiment, detecting the order-return risk of the order information to be tested by using the prediction model comprises:
inputting the order information to be tested into the prediction model and obtaining the order-return risk prediction result output by the prediction model; and, if the prediction result is an order return, acquiring the risk features that lead to the order return and generating a customer win-back strategy according to those risk features.
In one embodiment, after the tree structures are combined into a decision tree according to their formation order to obtain the prediction model and, when the prediction error rate of the prediction model is smaller than the error rate threshold, the prediction model is used to detect the order-return risk of the order information to be tested, the method further includes:
periodically recording each prediction result pushed by the early warning model, counting the number of times the prediction results are inconsistent with the actual situation, and outputting error flag information for the early warning model if the number of inconsistencies exceeds a count threshold within one detection period.
In one embodiment, the calculation of the empirical entropy of the information of the training data set D includes:
$$H(D) = -\sum_{i=1}^{n} \frac{x_i}{|D|} \log_2 \frac{x_i}{|D|}$$
where H(D) represents the empirical entropy of the training data set D, |D| represents the sample capacity of the training data set D, the training data set D comprises n classes of feature attributes, x_i represents the number of feature attributes of the i-th class in the training data set D, and i = 1, 2, 3, ..., n.
As one embodiment, the process of calculating the conditional entropy includes:
$$H(D \mid A) = \sum_{i=1}^{n} P_i \, H(D \mid A = x_i)$$
where H(D|A) represents the conditional entropy of the feature attribute A, P_i = P(A = x_i) is the probability that the feature attribute A takes the i-th class, and H(D|A = x_i) represents the conditional entropy when the feature attribute A takes the i-th class.
As an embodiment, the calculation process of the information gain value includes:
g(D,A)=H(D)-H(D|A),
where g(D, A) represents the information gain value of the feature attribute A.
In one embodiment, before calculating the information empirical entropy of the training data set D, the method further includes:
taking order data comprising a plurality of characteristic attributes as sample data, identifying the characteristic attributes with continuous characteristics in the sample data, and obtaining specific characteristic attributes;
and discretizing the specific characteristic attributes, and constructing a training data set D according to the discretized specific characteristic attributes and other characteristic attributes except the specific characteristic attributes.
As an embodiment, discretizing the specific feature attribute includes:
and discretizing the specific characteristic attribute by means of chi-square test.
An order data processing device based on customer order-returning early warning comprises:
the calculation module is used for calculating the information experience entropy of the training data set D; wherein the training data set D comprises a plurality of sample data; each sample data includes a plurality of characteristic attributes;
the identification module is used for identifying various characteristic attributes in the training data set D, calculating the conditional entropy of the various characteristic attributes, and calculating the information gain values of the various characteristic attributes according to the information experience entropy and the conditional entropy of the various characteristic attributes;
the determining module is used for determining the type of feature attribute with the largest information gain value as the root node of the decision tree to form a tree structure, and recording the formation order of the tree structure;
the return module is used for removing the type of feature attribute with the current largest information gain value from the training data set D to form a new training data set D, and returning the new training data set D to the calculation module to repeat the calculation of the empirical entropy of the training data set D, so as to obtain a plurality of tree structures in sequence until the information gain value corresponding to the obtained tree structure is smaller than the gain threshold;
and the detection module is used for combining the tree structures into a decision tree according to the formation sequence of the tree structures to obtain a prediction model, and when the prediction error rate of the prediction model is smaller than the error rate threshold value, the prediction model is adopted to detect the order return risk of the order information to be detected.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the order data processing method based on customer return warning according to any of the embodiments.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the order data processing method based on customer return warning of any one of the above embodiments.
With the above order data processing method, device, computer equipment, and storage medium based on customer order-returning early warning, the empirical entropy of the training data set D is calculated, each type of feature attribute in the training data set D is identified, the conditional entropy of each type of feature attribute is calculated, and the information gain value of each type of feature attribute is calculated from the empirical entropy and the conditional entropy. The type of feature attribute with the largest information gain value is determined as the root node of the decision tree to form a tree structure, and the formation order of the tree structure is recorded. That feature attribute is removed from the training data set D to form a new training data set D, and the entropy calculation is repeated to obtain a plurality of tree structures in sequence, until the information gain value corresponding to the obtained tree structure is smaller than the gain threshold. The tree structures are combined into a decision tree according to their formation order to obtain a prediction model, and when the prediction error rate of the prediction model is smaller than the error rate threshold, the prediction model is used to detect the order-return risk of the order information to be tested. This simplifies the detection of order-return risk and improves detection efficiency.
Drawings
FIG. 1 is a flow diagram of a method for order data processing based on customer return warning, according to an embodiment;
FIG. 2 is a flowchart of a method for processing order data based on customer return warning according to another embodiment;
FIG. 3 is a schematic diagram of a decision tree of an embodiment;
FIG. 4 is a block diagram of an exemplary order data processing apparatus based on customer return warning;
FIG. 5 is a schematic diagram of a computer device of an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In one embodiment, as shown in fig. 1, there is provided an order data processing method based on a customer return warning, including the following steps:
S10, calculating the empirical entropy of the training data set D, wherein the training data set D comprises a plurality of sample data, and each sample data includes a plurality of feature attributes.
Before this step, a certain amount of customer order data can be acquired as sample data, a training set (the training data set D) is constructed from the sample data, and the feature attributes associated with each customer order (sample data) are acquired along with it. Specifically, the feature attributes can include factors that influence whether a customer returns an order. If the customer order data are home decoration order data, the feature attributes can be analyzed and screened according to the characteristics of the home decoration industry: for example, home decoration orders have a long cycle and are paid in stages; services such as internet home decoration scheme design are provided free of charge; customers may take out loans when the contract amount of a home decoration order is large; and there are many touch points for customer problem feedback and evaluation. After analysis and refinement, the features (feature attributes) that affect order returns may include the following data: the equity level of the customer order, the actual payment proportion of the order, whether a decoration loan was applied for, whether the customer placed the order independently in the App, whether a design scheme has been selected for the order, whether the order participates in the company's cash-back and rebate activities, and the like. In some cases, the feature attributes may also include customer problems and complaints on the order. The order equity level, for example, includes general equity, gold VIP, platinum VIP, and diamond VIP. The training data set D is constructed from the selected feature attribute data and serves as the input variable of the prediction model.
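As a rough illustration of what one sample in the training data set D might look like, the sketch below models the feature attributes listed above as a record. All field names and the example values are hypothetical; the source does not define a concrete schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class OrderSample:
    # Hypothetical field names for the feature attributes named in the text.
    equity_level: str        # e.g. "general", "gold_vip", "platinum_vip", "diamond_vip"
    collection_ratio: float  # actual payment proportion (continuous, discretized later)
    applied_for_loan: bool   # whether a decoration loan was applied for
    placed_in_app: bool      # whether the customer placed the order in the App
    design_selected: bool    # whether a design scheme has been selected
    joined_cashback: bool    # whether the order takes part in cash-back activities
    returned: bool           # label: whether the order was eventually returned

def build_training_set(samples):
    """Turn sample records into a list of plain dicts (the training data set D)."""
    return [asdict(sample) for sample in samples]

training_set = build_training_set([
    OrderSample("gold_vip", 0.45, False, True, True, False, False),
    OrderSample("general", 0.10, True, False, False, False, True),
])
print(training_set[0])
```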
S20, identifying each type of feature attribute in the training data set D, calculating the conditional entropy of each type of feature attribute, and calculating the information gain value of each type of feature attribute from the empirical entropy and the conditional entropy of that feature attribute.
In one embodiment, the calculation of the empirical entropy of the information of the training data set D includes:
$$H(D) = -\sum_{i=1}^{n} \frac{x_i}{|D|} \log_2 \frac{x_i}{|D|}$$
where H(D) represents the empirical entropy of the training data set D, |D| represents the sample capacity (the number of samples) of the training data set D, the training data set D comprises n classes of feature attributes, x_i represents the number of feature attributes of the i-th class in the training data set D, and i = 1, 2, 3, ..., n.
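A minimal sketch of the empirical entropy calculation follows. It assumes that the classes are the order outcomes ("return" or "keep"), that x_i is the count of samples in class i, and that the logarithm is taken to base 2; the function name `empirical_entropy` is illustrative only.

```python
import math
from collections import Counter

def empirical_entropy(labels):
    """H(D) = -sum_i (x_i / |D|) * log2(x_i / |D|), with x_i the count of class i."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical outcomes of four orders: two returned, two kept.
outcomes = ["return", "keep", "keep", "return"]
print(empirical_entropy(outcomes))
```

An evenly split data set gives the maximum entropy for two classes, while a data set in which every order has the same outcome gives zero.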
As one embodiment, the process of calculating the conditional entropy includes:
$$H(D \mid A) = \sum_{i=1}^{n} P_i \, H(D \mid A = x_i)$$
where H(D|A) represents the conditional entropy of the feature attribute A, P_i = P(A = x_i) is the probability that the feature attribute A takes the i-th class, and H(D|A = x_i) represents the conditional entropy when the feature attribute A takes the i-th class.
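The conditional entropy can be sketched in the same style: group the outcomes by the value of feature A, then weight each group's entropy by the share of samples in that group. The helper names and sample values are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    """H(D|A) = sum_i P(A = x_i) * H(D | A = x_i)."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    return sum(len(group) / total * entropy(group) for group in groups.values())

# Hypothetical feature "design scheme selected" for four orders.
design_selected = ["yes", "yes", "no", "no"]
outcomes = ["keep", "keep", "return", "return"]
print(conditional_entropy(design_selected, outcomes))
```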
As an embodiment, the calculation process of the information gain value includes:
g(D,A)=H(D)-H(D|A),
where g(D, A) represents the information gain value of the feature attribute A.
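As a worked illustration with hypothetical numbers (not taken from the patent), suppose D holds 8 orders of which 4 were returned, and a feature A ("design scheme selected") splits them into 4 orders with 1 return and 4 orders with 3 returns. Assuming base-2 logarithms:

$$H(D) = -\left(\tfrac{4}{8}\log_2\tfrac{4}{8} + \tfrac{4}{8}\log_2\tfrac{4}{8}\right) = 1$$

$$H(D \mid A = \text{yes}) = H(D \mid A = \text{no}) = -\left(\tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{3}{4}\log_2\tfrac{3}{4}\right) \approx 0.811$$

$$H(D \mid A) = \tfrac{4}{8}(0.811) + \tfrac{4}{8}(0.811) \approx 0.811$$

$$g(D, A) = H(D) - H(D \mid A) \approx 1 - 0.811 = 0.189$$

A gain of about 0.189 is well above a gain threshold such as 0.005, so in this illustration the feature would be retained.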
S30, determining the type of feature attribute with the largest information gain value as the root node of the decision tree to form a tree structure, and recording the formation order of the tree structure.
The decision tree is the decision tree of the ID3 algorithm, and has the characteristics of clear theory, simple method, strong learning ability and the like.
S40, removing the type of feature attribute with the current largest information gain value from the training data set D to form a new training data set D, and returning to step S10 to obtain a plurality of tree structures in sequence, until the information gain value corresponding to the obtained tree structure is smaller than the gain threshold.
The gain threshold may be set to, for example, 0.005.
In one embodiment, the ID3 algorithm is used to construct the decision tree by comparing the information gain values of all feature attributes. The larger the information gain value g(D, A), the more strongly feature A influences the prediction result and the more suitable it is as the root node; the child nodes are then determined recursively in the same way. Suppose the information gain value of the "equity level" feature is calculated to be the largest: the "equity level" feature is then used as the root node of the decision tree to form the first tree structure. The remaining feature attributes are recalculated according to steps S4-S6, the one with the largest information gain among them is taken as the next root node, and the decision tree is constructed recursively in this way until the information gain of all remaining features is less than the threshold ε = 0.005. The threshold needs to be adjusted for the actual scenario, with the goal of ensuring that the final decision tree does not overfit.
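The selection loop described above can be sketched as follows. This is a simplified reading of the text: gains are computed over the whole data set, the best attribute is recorded and removed, and the loop stops once the best remaining gain falls below the threshold. Textbook ID3 would instead recompute gains on each partition of the data. Attribute names and sample rows are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, label_key):
    """g(D, A) = H(D) - H(D|A) for one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[label_key])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label_key] for row in rows]) - conditional

def rank_attributes(rows, label_key, gain_threshold=0.005):
    """Return attributes in formation order, stopping below the gain threshold."""
    remaining = [key for key in rows[0] if key != label_key]
    formation_order = []
    while remaining:
        gains = {a: information_gain(rows, a, label_key) for a in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < gain_threshold:
            break
        formation_order.append(best)
        remaining.remove(best)  # "remove the attribute to form a new data set D"
    return formation_order

orders = [
    {"collection_over_30": "yes", "design_selected": "yes", "loan": "no", "returned": "no"},
    {"collection_over_30": "yes", "design_selected": "yes", "loan": "yes", "returned": "no"},
    {"collection_over_30": "no", "design_selected": "yes", "loan": "no", "returned": "yes"},
    {"collection_over_30": "no", "design_selected": "no", "loan": "yes", "returned": "yes"},
]
print(rank_attributes(orders, "returned"))
```

In this toy data the "loan" attribute has zero gain, so it falls below the threshold and is left out of the formation order.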
S50, combining the tree structures into a decision tree according to their formation order to obtain a prediction model, and, when the prediction error rate of the prediction model is smaller than the error rate threshold, detecting the order-return risk of the order information to be tested by using the prediction model.
The tree structures are combined into a decision tree according to their formation order as follows: a tree structure formed earlier is placed above one formed later. That is, from top to bottom, the top layer is the first tree structure, the second layer is the second tree structure, the third layer is the third tree structure, the fourth layer is the fourth tree structure, and so on, until all the tree structures are combined into one decision tree.
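One possible way to realize this layering is sketched below: the k-th attribute in the formation order is used to split at depth k, and each leaf stores the majority outcome of the samples that reach it. The nested-dict representation, the majority-vote leaves, and the fallback for unseen values are assumptions, not details given in the source.

```python
from collections import Counter

def build_layered_tree(rows, formation_order, label_key):
    """Split on formation_order[0] at the top layer, formation_order[1] below it, and so on."""
    labels = [row[label_key] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if not formation_order or len(set(labels)) == 1:
        return majority  # leaf: predicted outcome
    attribute, rest = formation_order[0], formation_order[1:]
    branches = {}
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        branches[value] = build_layered_tree(subset, rest, label_key)
    return {"attribute": attribute, "branches": branches, "default": majority}

def predict(tree, order_info):
    """Walk from the top layer down until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        value = order_info.get(tree["attribute"])
        tree = tree["branches"].get(value, tree["default"])
    return tree

orders = [
    {"collection_over_30": "yes", "design_selected": "yes", "returned": "no"},
    {"collection_over_30": "yes", "design_selected": "no", "returned": "no"},
    {"collection_over_30": "no", "design_selected": "yes", "returned": "no"},
    {"collection_over_30": "no", "design_selected": "no", "returned": "yes"},
]
tree = build_layered_tree(orders, ["collection_over_30", "design_selected"], "returned")
print(predict(tree, {"collection_over_30": "no", "design_selected": "no"}))
```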
The order-return risk may take the form of a prediction result of "order return" or "no order return".
The above steps train and deploy the prediction model. When the prediction error rate falls within an acceptable range (the prediction error rate is smaller than the error rate threshold), the model is applied to live data, and incomplete customer orders in the live system are analyzed and predicted in real time.
Specifically, a test data set and the actual order-return results of the test data set may be prepared. The test data set is input to the prediction model to obtain prediction results, and if the number of inconsistencies between the prediction results and the actual order-return results of the test data set is less than a set number, the prediction error rate of the prediction model may be considered to be smaller than the error rate threshold.
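The acceptance check on a test data set can be sketched as below. The predictor here is a stand-in function, and the 0.1 error rate threshold is an assumed value; the source does not specify one.

```python
def prediction_error_rate(predict_fn, test_rows, label_key):
    """Share of test samples whose prediction differs from the actual outcome."""
    wrong = sum(1 for row in test_rows if predict_fn(row) != row[label_key])
    return wrong / len(test_rows)

def ready_for_deployment(error_rate, error_rate_threshold=0.1):
    """The model is applied to live data only when the error rate is below the threshold."""
    return error_rate < error_rate_threshold

# Stand-in predictor (hypothetical rule): predict a return when no design scheme is selected.
def toy_predict(row):
    return "yes" if row["design_selected"] == "no" else "no"

test_rows = [
    {"design_selected": "no", "returned": "yes"},
    {"design_selected": "yes", "returned": "no"},
    {"design_selected": "yes", "returned": "no"},
    {"design_selected": "no", "returned": "no"},
]
rate = prediction_error_rate(toy_predict, test_rows, "returned")
print(rate, ready_for_deployment(rate))
```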
With the above order data processing method based on customer order-returning early warning, the empirical entropy of the training data set D is calculated, each type of feature attribute in the training data set D is identified, the conditional entropy of each type of feature attribute is calculated, and the information gain value of each type of feature attribute is calculated from the empirical entropy and the conditional entropy. The type of feature attribute with the largest information gain value is determined as the root node of the decision tree to form a tree structure, and the formation order of the tree structure is recorded. That feature attribute is removed from the training data set D to form a new training data set D, and the entropy calculation is repeated to obtain a plurality of tree structures in sequence, until the information gain value corresponding to the obtained tree structure is smaller than the gain threshold. The tree structures are combined into a decision tree according to their formation order to obtain a prediction model, and when the prediction error rate of the prediction model is smaller than the error rate threshold, the prediction model is used to detect the order-return risk of the order information to be tested. This simplifies the detection of order-return risk and improves detection efficiency.
In one embodiment, the step of detecting the return risk of the order information to be detected by using the prediction model comprises the following steps:
inputting the order information to be tested into the prediction model and obtaining the order-return risk prediction result output by the prediction model; and, if the prediction result is an order return, acquiring the risk features that lead to the order return and generating a customer win-back strategy according to those risk features.
In this embodiment, the customers at risk of returning orders and the features leading to the order return are identified by the prediction model, the prediction results can be connected to the customer service system of the company concerned, and targeted win-back measures are taken. The win-back measures can be manual or automated. For example, when the feature leading to the order return is the collection proportion, promotions such as prepayment cash-back can be recommended to the customer to raise the collection proportion of the order; when the analysis shows that the cause is that no design scheme has been selected, a message notification such as intelligent voice or App push can be used to remind the customer to confirm a design scheme.
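A simple way to turn risk features into win-back measures is a lookup table, sketched below. The feature keys, the wording of the actions, and the manual fallback are hypothetical; only the two example measures come from the text.

```python
# Hypothetical mapping from a risk feature to a win-back measure.
WIN_BACK_ACTIONS = {
    "collection_ratio": "recommend a prepayment cash-back promotion",
    "design_selected": "remind the customer by voice call or App push to confirm a design scheme",
}
MANUAL_FALLBACK = "hand over to customer service for manual follow-up"

def win_back_strategy(prediction, risk_features):
    """Return the list of measures for an order predicted to be returned."""
    if prediction != "return":
        return []
    return [WIN_BACK_ACTIONS.get(feature, MANUAL_FALLBACK) for feature in risk_features]

print(win_back_strategy("return", ["collection_ratio", "design_selected"]))
```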
In one embodiment, after the tree structures are combined into a decision tree according to their formation order to obtain the prediction model and, when the prediction error rate of the prediction model is smaller than the error rate threshold, the prediction model is used to detect the order-return risk of the order information to be tested, the method further includes:
periodically recording each prediction result pushed by the early warning model, counting the number of times the prediction results are inconsistent with the actual situation, and outputting error flag information for the early warning model if the number of inconsistencies exceeds a count threshold within one detection period.
The detection period may be, for example, one month, and the count threshold may be set to, for example, 30.
In this embodiment, it is regularly recorded whether the prediction results pushed by the early warning model deviate noticeably from the actual situation. For example, when more than 30 orders are returned in a month without an advance prediction alarm, the customer order-return data that deviated need to be analyzed to check whether new factors influencing customer order returns have appeared. Any such factors are added to the feature attributes used to create the decision tree prediction model, and a new prediction model is generated iteratively from the new feature attributes, so that the accuracy of the prediction model in use is maintained.
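The periodic check can be sketched as a mismatch counter over one detection period. The default threshold of 30 follows the example in the text; the function names and sample data are illustrative.

```python
def count_mismatches(predicted, actual):
    """Number of orders whose predicted outcome differs from what actually happened."""
    return sum(1 for p, a in zip(predicted, actual) if p != a)

def error_flag(predicted, actual, count_threshold=30):
    """True when the mismatches within one detection period exceed the threshold."""
    return count_mismatches(predicted, actual) > count_threshold

# Hypothetical month: 40 orders predicted as kept, 31 of them were actually returned.
predicted = ["keep"] * 40
actual = ["return"] * 31 + ["keep"] * 9
print(error_flag(predicted, actual))
```

When the flag is raised, the deviating orders would be reviewed for new influencing factors before the model is retrained.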
In one embodiment, before calculating the information empirical entropy of the training data set D, the method further includes:
taking order data comprising a plurality of characteristic attributes as sample data, identifying the characteristic attributes with continuous characteristics in the sample data, and obtaining specific characteristic attributes;
and discretizing the specific characteristic attributes, and constructing a training data set D according to the discretized specific characteristic attributes and other characteristic attributes except the specific characteristic attributes.
As an embodiment, discretizing the specific feature attribute includes:
and discretizing the specific characteristic attribute by means of chi-square test.
This embodiment uses an ID3 decision tree, which is theoretically clear, simple to apply, and has strong learning ability. However, ID3 is a non-incremental algorithm and can only process discrete characteristic attributes. When the selected attributes contain continuous values, they must first be analyzed and transformed: the specific characteristic attributes with continuous characteristics are discretized by means of a chi-square test so that the decision tree can process them effectively.
In one example, the characteristic attributes include the actual collection proportion of an order. Because this proportion is a continuous variable, the feature must be discretized. The collection proportions of customer orders in the current system, together with whether each order was returned, are obtained, and a cut point is first estimated intuitively: for example, if the number of returned orders is found to drop markedly once the collection proportion exceeds 30%, it can be hypothesized that whether the collection proportion exceeds 30% affects whether the customer returns the order. A chi-square test is then used to check whether the observed results agree with the theoretical results; when the confidence reaches 99% or more, the hypothesis is accepted, and the continuous feature can be replaced by the discrete feature "whether the collection proportion exceeds 30%". The chi-square test uses the following formula:
$$\chi^2 = \sum_{i} \frac{(f_i - np_i)^2}{np_i}$$
wherein the chi-square value $\chi^2$ indicates the degree of deviation for the hypothesis on whether the collection proportion exceeds 30%; when $i$ denotes the case "collection proportion < 30% and the customer has returned the order", $f_i$ is the actual (observed) value for that case and $np_i$ is its expected value. The larger the calculated chi-square value, the more likely it is that the hypothesis "whether the collection proportion exceeds 30% is related to customer order returns" holds.
In an embodiment, referring to fig. 2, the order data processing method based on the customer return warning may also be implemented by the following processes:
1. First, a certain amount of customer order data (sample data) is acquired as the training set (training data set D), a plurality of characteristic attributes associated with each customer order are acquired, and the characteristic data associated with the customer order are used as the input variables of the prediction model;
2. The characteristic attributes are the factors that influence whether a customer returns an order, and they must be analyzed and screened with particular attention to the characteristics of the home decoration industry. For example: home decoration orders have a long cycle and payment is collected in stages, so the proportion and progress of collection are important factors; services such as online scheme design are provided up front and free of charge, so attention must be paid to whether the user has taken part in the decoration scheme design stage in time, which helps lead to signing; the payment amount of a decoration order is much higher than in general retail, and many internet home decoration companies provide loan services, which has an obvious influence on order returns; the customer's satisfaction with answers from customer service and complaints during the decoration process are important factors affecting whether the order is returned; and when a customer has taken part in promotions launched by the company, such as prepayment discounts and cashback activities, these benefits are recovered when the order is returned, which is also a key factor the customer considers when returning an order;
3. The order data features described above may include: the customer order equity grade, the actual collection proportion of the order, whether a decoration loan has been applied for, whether the customer placed the order independently in the App, whether the order has a design scheme, whether the decoration scope of the order includes both soft furnishing and hard decoration or only soft furnishing, whether the order participates in the company's cashback activity, and whether customer problems or complaints have occurred for the order; the order equity grades comprise ordinary, gold VIP, platinum VIP and diamond VIP; meanwhile, whether the customer has returned the order is obtained and used as the output variable of the prediction model;
4. This embodiment uses an ID3 decision tree, which is theoretically clear, simple to apply, and has strong learning ability, but which is a non-incremental algorithm that can only process discrete characteristic attributes, so that attributes containing continuous values must first be analyzed and transformed. Among the features screened above, the actual collection proportion of the order is a continuous variable. The collection proportions of customer orders in the current system and whether each order was returned are obtained (see Table 1). Intuitively, customer order returns drop markedly once the collection proportion is greater than 30%; this is then verified by a chi-square test as follows:
TABLE 1
Figure BDA0002336250380000091
4.1 The chi-square value is calculated first. The value in brackets in Table 1 is the expected value for each case, estimated by maximum likelihood:
Figure BDA0002336250380000092
Here 300/800 is the probability that the collection proportion is > 30%; multiplying this probability by the number of returned orders gives the expected value for the case "collection proportion > 30% and order returned". The other three expected values, 287, 22 and 478, are calculated in the same way. The chi-square goodness-of-fit formula is then applied:
Figure BDA0002336250380000093
the chi-squared value was calculated to be 36.98.
4.2 The degrees of freedom are calculated: (number of rows - 1) × (number of columns - 1) = (2 - 1) × (2 - 1) = 1.
4.3 When the degree of freedom is 1, the correspondence between the chi-square value and the probability that the hypothesis holds is as follows:
TABLE 2
Figure BDA0002336250380000094
Since the chi-square value 36.98 > 10.828, the probability that the hypothesis holds is inferred to be > 99.9%; the continuous feature "collection proportion" can therefore be converted into the discrete feature "whether the collection proportion is greater than 30%".
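As a rough illustration of steps 4.1 to 4.3, the sketch below computes the Pearson chi-square statistic of a 2×2 table and compares it with the critical value 10.828 quoted above for one degree of freedom. The observed counts are hypothetical placeholders (Table 1 is not reproduced in the text, so they are not its values), and the function names are assumptions introduced for the example.

```python
def chi_square_2x2(observed):
    """Pearson chi-square statistic of a 2x2 table, without continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / total  # expected count n * p_i
            expected_row.append(e)
            chi2 += (f - e) ** 2 / e
        expected.append(expected_row)
    return chi2, expected


def discretize(collection_proportion, cut_point=0.30):
    """Replace the continuous proportion by the discrete feature 'proportion > cut point'."""
    return collection_proportion > cut_point


# Hypothetical counts. Rows: proportion <= 30%, proportion > 30%.
# Columns: order returned, order not returned.
observed = [[40, 160], [20, 380]]
chi2, expected = chi_square_2x2(observed)

CRITICAL_VALUE_DF1 = 10.828  # threshold quoted in the text for > 99.9%
cut_point_accepted = chi2 > CRITICAL_VALUE_DF1
print(round(chi2, 2), cut_point_accepted)
```

With these placeholder counts the statistic is well above the critical value, so the cut point would be accepted and `discretize` could be applied to every order before the training data set D is built.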
5. Assume that 16 customer orders are currently obtained from the company's system, as shown in Table 3.
TABLE 3
Figure BDA0002336250380000095
Figure BDA0002336250380000101
The sample data are defined as the training data set D, and the information empirical entropy H(D) is calculated. With a sample capacity of 16, the probability of a customer returning the order is 4/16 and the probability of a customer not returning the order is 12/16, so the original entropy of the set D is:
$$H(D) = -\frac{4}{16}\log_2\frac{4}{16} - \frac{12}{16}\log_2\frac{12}{16} = 0.811$$
The entropy and information gain obtained when the equity grade is taken as the root node are calculated as follows:
When the equity grade is ordinary, there are 6 ordinary-equity orders, for which the probability of the customer returning the order is 3/6 and the probability of the customer not returning the order is 3/6:
$$H(\text{ordinary}) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1$$
When the equity grade is gold, there are 2 gold-equity orders, for which the probability of the customer returning the order is 0 and the probability of the customer not returning the order is 1, so H(gold) = 0.
When the equity grade is platinum, there are 4 platinum-equity orders, for which the probability of the customer returning the order is 1/4 and the probability of the customer not returning the order is 3/4:
$$H(\text{platinum}) = -\frac{1}{4}\log_2\frac{1}{4} - \frac{3}{4}\log_2\frac{3}{4} = 0.811$$
When the equity grade is diamond, there are 4 diamond-equity orders, for which the probability of the customer returning the order is 0 and the probability of the customer not returning the order is 1, so H(diamond) = 0.
According to the sample data, the probabilities that the equity grade is ordinary, gold, platinum and diamond are 6/16, 2/16, 4/16 and 4/16 respectively, so when the equity grade is the root node, the information entropy is:
$$H(\text{equity grade}) = \frac{6}{16}\times 1 + \frac{2}{16}\times 0 + \frac{4}{16}\times 0.811 + \frac{4}{16}\times 0 = 0.578$$
the information gain is:
g(equity grade) = H(D) - H(equity grade) = 0.811 - 0.578 = 0.233,
similarly, the information gains corresponding to the other features are respectively:
g(collection proportion > 30%) = H(D) - H(collection proportion > 30%) = 0.811 - 0.5 = 0.311
g(decoration loan applied) = H(D) - H(decoration loan applied) = 0.811 - 0.688 = 0.123
g(order placed independently in App) = H(D) - H(order placed independently in App) = 0.811 - 0.714 = 0.097
g(design scheme selected) = H(D) - H(design scheme selected) = 0.811 - 0.714 = 0.097
g(participates in cashback activity) = H(D) - H(participates in cashback activity) = 0.811 - 0.811 = 0.000
g(complaint event occurred) = H(D) - H(complaint event occurred) = 0.811 - 0.513 = 0.298
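The calculation for the equity grade can be reproduced with a few lines of Python. The sketch assumes 3 returned orders among the 6 ordinary-equity orders, which is the split consistent with 4 returned orders in total and with the value 0.578; the function and variable names are illustrative.

```python
from math import log2


def entropy(counts):
    """Empirical entropy (base 2) of a tuple of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


# (returned, not returned) counts per equity grade in the 16-order example.
equity_grade_counts = {
    "ordinary": (3, 3),
    "gold": (0, 2),
    "platinum": (1, 3),
    "diamond": (0, 4),
}


def information_gain(groups):
    """Return H(D), the conditional entropy, and their difference g."""
    class_totals = [sum(col) for col in zip(*groups.values())]
    n = sum(class_totals)
    h_d = entropy(class_totals)
    h_cond = sum(sum(g) / n * entropy(g) for g in groups.values())
    return h_d, h_cond, h_d - h_cond


h_d, h_cond, gain = information_gain(equity_grade_counts)
print(round(h_d, 3), round(h_cond, 3), round(gain, 3))  # 0.811 0.578 0.233
```

The same function applies to any other feature once its per-value (returned, not returned) counts are tabulated from the training data set.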
According to the information gain value of each feature, g(collection proportion > 30%) is the largest, so the feature "collection proportion > 30%" is taken as the root node of the decision tree and a first tree structure is generated; the above process is then applied again to each leaf node to generate the final decision tree shown in fig. 3.
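The root selection and the recursive expansion can be sketched as below. The values in `gains_from_example` are the gains listed above; the seven training rows, the feature keys, and the nested-dict tree layout are hypothetical, since Table 3 and fig. 3 are not reproduced in the text. This is a generic ID3 sketch under those assumptions, not necessarily the exact procedure of the embodiment.

```python
from collections import Counter
from math import log2

# Information gains listed in the worked example.
gains_from_example = {
    "equity grade": 0.233,
    "collection proportion > 30%": 0.311,
    "decoration loan applied": 0.123,
    "order placed independently in App": 0.097,
    "design scheme selected": 0.097,
    "participates in cashback activity": 0.000,
    "complaint event occurred": 0.298,
}
root_feature = max(gains_from_example, key=gains_from_example.get)


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    conditional = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        conditional += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - conditional


def build_id3(rows, labels, features, gain_threshold=1e-9):
    """Recursively split on the feature with the largest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    gains = {f: information_gain(rows, labels, f) for f in features}
    best = max(gains, key=gains.get)
    if gains[best] < gain_threshold:  # stop when the gain is too small
        return majority
    remaining = [f for f in features if f != best]
    children = {}
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_id3(
            [rows[i] for i in keep], [labels[i] for i in keep], remaining, gain_threshold
        )
    return {"feature": best, "children": children}


# Hypothetical training rows (not the data of Table 3).
rows = [
    {"collection_gt_30": "yes", "design_selected": "yes"},
    {"collection_gt_30": "yes", "design_selected": "no"},
    {"collection_gt_30": "no", "design_selected": "yes"},
    {"collection_gt_30": "no", "design_selected": "no"},
    {"collection_gt_30": "no", "design_selected": "no"},
    {"collection_gt_30": "yes", "design_selected": "yes"},
    {"collection_gt_30": "yes", "design_selected": "no"},
]
labels = ["not returned", "not returned", "not returned", "returned",
          "returned", "not returned", "not returned"]

tree = build_id3(rows, labels, ["collection_gt_30", "design_selected"])
print(root_feature)
print(tree)
```

On the toy rows the collection feature has the larger gain and becomes the root, and only the impure branch is split further on the design feature.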
The prediction model is trained and deployed, and is applied to live production data once its prediction error rate falls within an acceptable range. To keep the prediction model accurate, a scheduled task is started that automatically pulls system data as a new training set D', and the customer order-return prediction model is continuously trained and updated with the new training data.
Unfinished customer orders in the live system are analyzed in real time. For example, customer order Order1 is predicted with the decision tree; its features are: ['equity grade: platinum', 'collection proportion > 30%: no', 'decoration loan applied: yes', 'order placed independently in App: yes', 'design scheme selected: no', 'participates in cashback activity: yes', 'complaint event occurred: no']
The prediction result is: order return;
the order-return risk feature is: no design scheme selected;
According to the prediction model, Order1 is at risk of being returned, and the feature causing the risk is that no design scheme has been selected. The risk warning is recorded in the database with the status "to be processed", and targeted retrieval measures are taken: a message about decoration design schemes is pushed in the App, and customer service staff follow up to remind the customer and guide them in selecting a decoration design scheme.
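The prediction and warning flow for Order1 might look like the following sketch. The tree below is a hypothetical stand-in for the decision tree of fig. 3, which is not reproduced in the text. The feature keys, the `RETRIEVAL_ACTIONS` table, and the rule of reporting the last split on the decision path as the risk feature are all assumptions made for illustration.

```python
# Hypothetical decision tree in nested-dict form (not the tree of fig. 3).
TREE = {
    "feature": "collection_gt_30",
    "children": {
        "yes": "not returned",
        "no": {
            "feature": "design_selected",
            "children": {"yes": "not returned", "no": "returned"},
        },
    },
}

# Hypothetical mapping from a risk feature to a retrieval measure.
RETRIEVAL_ACTIONS = {
    "design_selected": "push a design-scheme message in the App; customer service follows up",
}


def predict_with_path(tree, order):
    """Walk the tree and return the predicted label and the decisions taken."""
    path = []
    node = tree
    while isinstance(node, dict):
        feature = node["feature"]
        value = order[feature]
        path.append((feature, value))
        node = node["children"][value]  # raises KeyError for unseen values
    return node, path


def early_warning(order_id, order):
    """Build a warning record when the order is predicted to be returned."""
    result, path = predict_with_path(TREE, order)
    if result != "returned":
        return None
    risk_feature, risk_value = path[-1]  # assumption: last split is the risk feature
    return {
        "order_id": order_id,
        "risk_feature": risk_feature,
        "risk_value": risk_value,
        "status": "to be processed",
        "action": RETRIEVAL_ACTIONS.get(risk_feature, "manual follow-up"),
    }


# Features of Order1 as listed in the text.
order1 = {
    "equity_grade": "platinum",
    "collection_gt_30": "no",
    "loan_applied": "yes",
    "app_self_order": "yes",
    "design_selected": "no",
    "cashback_joined": "yes",
    "complaint": "no",
}
warning = early_warning("Order1", order1)
print(warning)
```

Keeping the decision path alongside the label is what makes it possible to name a risk feature and to choose a targeted retrieval measure.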
The technical effects of the embodiment include:
1. The prediction process requires no manual participation, which reduces labor cost and gives higher accuracy. The prediction model is updated on a schedule, and order-return predictions trigger early warnings in real time.
2. Customers at risk of returning their orders can be predicted in advance so that retrieval measures can be taken early, reducing the company's losses from order returns.
3. The top reasons influencing customer order returns are analyzed so that targeted improvements can be made, increasing customer satisfaction and improving the company's reputation.
In one embodiment, referring to fig. 4, there is provided an order data processing apparatus based on a customer return warning, including:
a calculation module 10, configured to calculate an information experience entropy of the training data set D; wherein the training data set D comprises a plurality of sample data; each sample data includes a plurality of characteristic attributes;
the identification module 20 is configured to identify various feature attributes in the training data set D, calculate conditional entropies of the various feature attributes, and calculate information gain values of the various feature attributes according to the information experience entropy and the conditional entropies of the various feature attributes;
the determining module 30 is configured to determine the class of characteristic attributes with the largest information gain value as a root node of the decision tree, form a tree structure, and record the forming sequence of the tree structure;
the return module 40 is configured to remove the class of characteristic attributes with the current largest information gain value from the training data set D to form a new training data set D, and return the new training data set D to the calculation module to perform the process of calculating the information empirical entropy of the training data set D, so as to sequentially obtain a plurality of tree structures until the information gain value corresponding to the obtained tree structure is smaller than a gain threshold value;
and the detection module 50 is configured to combine the tree structures into a decision tree according to the formation sequence of the tree structures to obtain a prediction model, and detect the order return risk of the order information to be detected by using the prediction model when the prediction error rate of the prediction model is smaller than the error rate threshold.
The specific definition of the order data processing device based on the customer order-returning warning may refer to the above definition of the order data processing method based on the customer order-returning warning, and is not described herein again. All or part of the modules in the order data processing device based on the customer return warning can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method for order data processing based on customer order return warning. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Based on the above examples, in one embodiment, there is further provided a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement any one of the above order data processing methods based on the customer return warning in the embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium, and executed by at least one processor of a computer system according to the embodiments of the present invention, to implement the processes of the embodiments including the order data processing method based on the customer return warning described above. The storage medium may be a magnetic disk, an optical disk, a Read-only Memory (ROM), a Random Access Memory (RAM), or the like.
Accordingly, in an embodiment, a computer storage medium is also provided, which stores a computer program thereon, wherein the program when executed by a processor implements any one of the above-mentioned order data processing methods based on customer return warning.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
It should be noted that the terms "first/second/third" in the embodiments of the present application merely distinguish similar objects and do not represent a specific ordering of the objects. It should be understood that objects distinguished by "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
The terms "comprising" and "having" and any variations thereof in the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules but may optionally include other steps or modules that are not listed or that are inherent to such process, method, product, or device.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. An order data processing method based on customer order-returning early warning is characterized by comprising the following steps:
s10, calculating the information experience entropy of the training data set D; wherein the training data set D comprises a plurality of sample data; each sample data includes a plurality of characteristic attributes;
s20, identifying various feature attributes in the training data set D, calculating the conditional entropy of the various feature attributes, and calculating the information gain values of the various feature attributes according to the information experience entropy and the conditional entropy of the various feature attributes;
S30, determining the class of characteristic attributes with the largest information gain value as the root node of the decision tree to form a tree structure, and recording the forming sequence of the tree structure;
S40, removing the class of characteristic attributes with the current largest information gain value from the training data set D to form a new training data set D, and returning to step S10 so as to obtain a plurality of tree structures in sequence, until the information gain value corresponding to the obtained tree structure is smaller than a gain threshold value;
and S50, combining the tree structures into a decision tree according to the forming sequence of the tree structures to obtain a prediction model, and detecting the order return risk of the order information to be detected by using the prediction model when the prediction error rate of the prediction model is smaller than the error rate threshold value.
2. The order data processing method based on customer order-returning warning of claim 1, wherein in one embodiment, the detecting the order-returning risk of the order information to be detected by using a prediction model comprises:
inputting the order information to be tested into a prediction model, and obtaining the order return risk prediction result output by the prediction model; and if the bill refund risk prediction result is the bill refund, acquiring the risk characteristics causing the bill refund, and generating a client retrieval strategy according to the risk characteristics.
3. The order data processing method based on customer order-returning warning of claim 1, wherein in an embodiment, the tree structures are combined into a decision tree according to a forming sequence of the tree structures to obtain a prediction model, and when a prediction error rate of the prediction model is smaller than an error rate threshold, after detecting an order-returning risk of order information to be detected by using the prediction model, the method further comprises:
and recording each prediction result pushed by the early warning model periodically, detecting the times of inconsistency of each prediction result with the actual condition, and outputting error sign information of the early warning model if the times of inconsistency exceed a time threshold value in one detection period.
4. The method of claim 1, wherein the computing of the empirical entropy of the information of the training data set D comprises:
$$H(D) = -\sum_{i=1}^{n} \frac{x_i}{|D|}\log_2\frac{x_i}{|D|}$$
wherein H(D) represents the information empirical entropy of the training data set D, |D| represents the sample capacity of the training data set D, the training data set D comprises n classes of characteristic attributes, and $x_i$ represents the number of the i-th class of characteristic attributes in the training data set D, i = 1, 2, 3, ..., n.
5. The method of processing order data based on customer return alert as recited in claim 4, wherein in one embodiment, the calculation of conditional entropy comprises:
$$H(D \mid A) = \sum_{i=1}^{n} P_i \, H(D \mid A = x_i)$$
wherein H(D|A) represents the conditional entropy of the characteristic attribute A, $P_i = P(A = x_i)$ is the probability that the characteristic attribute A is the i-th class, and $H(D \mid A = x_i)$ represents the conditional entropy when the characteristic attribute A is the i-th class.
6. The method of claim 5, wherein the information gain value is calculated by:
g(D,A)=H(D)-H(D|A),
where g (D, a) represents an information gain value of the characteristic attribute a.
7. The method for processing order data based on customer return warning according to any one of claims 1 to 6, wherein before calculating the empirical entropy of the information of the training data set D, the method further comprises:
taking order data comprising a plurality of characteristic attributes as sample data, identifying the characteristic attributes with continuous characteristics in the sample data, and obtaining specific characteristic attributes;
and discretizing the specific characteristic attributes, and constructing a training data set D according to the discretized specific characteristic attributes and other characteristic attributes except the specific characteristic attributes.
8. The method of claim 7, wherein discretizing the specific characteristic attribute comprises, in one embodiment:
and discretizing the specific characteristic attribute by means of chi-square test.
9. An order data processing device based on customer's warning of returning an order, characterized by, includes:
the calculation module is used for calculating the information experience entropy of the training data set D; wherein the training data set D comprises a plurality of sample data; each sample data includes a plurality of characteristic attributes;
the identification module is used for identifying various characteristic attributes in the training data set D, calculating the conditional entropy of the various characteristic attributes, and calculating the information gain values of the various characteristic attributes according to the information experience entropy and the conditional entropy of the various characteristic attributes;
the determining module is used for determining the class of characteristic attributes with the largest information gain value as a root node of the decision tree to form a tree structure and recording the forming sequence of the tree structure;
the return module is used for removing the class of characteristic attributes with the current largest information gain value from the training data set D to form a new training data set D, and returning the new training data set D to the calculation module to execute the process of calculating the information empirical entropy of the training data set D so as to sequentially obtain a plurality of tree structures until the information gain value corresponding to the obtained tree structure is smaller than the gain threshold value;
and the detection module is used for combining the tree structures into a decision tree according to the formation sequence of the tree structures to obtain a prediction model, and when the prediction error rate of the prediction model is smaller than the error rate threshold value, the prediction model is adopted to detect the order return risk of the order information to be detected.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the order data processing method based on customer order-returning warning of any one of claims 1 to 8 when executing the computer program.
CN201911357189.9A 2019-12-25 2019-12-25 Order data processing method and device based on customer order-returning early warning Pending CN111259922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911357189.9A CN111259922A (en) 2019-12-25 2019-12-25 Order data processing method and device based on customer order-returning early warning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911357189.9A CN111259922A (en) 2019-12-25 2019-12-25 Order data processing method and device based on customer order-returning early warning

Publications (1)

Publication Number Publication Date
CN111259922A true CN111259922A (en) 2020-06-09

Family

ID=70943822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911357189.9A Pending CN111259922A (en) 2019-12-25 2019-12-25 Order data processing method and device based on customer order-returning early warning

Country Status (1)

Country Link
CN (1) CN111259922A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699934A (en) * 2020-12-28 2021-04-23 深圳前海微众银行股份有限公司 Alarm classification method and device and electronic equipment
CN112699944A (en) * 2020-12-31 2021-04-23 ***股份有限公司 Order-returning processing model training method, processing method, device, equipment and medium
CN113052689A (en) * 2021-04-30 2021-06-29 中国银行股份有限公司 Product recommendation method and device based on decision tree
CN113837865A (en) * 2021-09-29 2021-12-24 重庆富民银行股份有限公司 Method for extracting multi-dimensional risk feature strategy

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699934A (en) * 2020-12-28 2021-04-23 深圳前海微众银行股份有限公司 Alarm classification method and device and electronic equipment
CN112699944A (en) * 2020-12-31 2021-04-23 ***股份有限公司 Order-returning processing model training method, processing method, device, equipment and medium
CN112699944B (en) * 2020-12-31 2024-04-23 ***股份有限公司 Training method, processing method, device, equipment and medium for returning list processing model
CN113052689A (en) * 2021-04-30 2021-06-29 中国银行股份有限公司 Product recommendation method and device based on decision tree
CN113052689B (en) * 2021-04-30 2024-03-26 中国银行股份有限公司 Product recommendation method and device based on decision tree
CN113837865A (en) * 2021-09-29 2021-12-24 重庆富民银行股份有限公司 Method for extracting multi-dimensional risk feature strategy

Similar Documents

Publication Publication Date Title
CN110188198B (en) Anti-fraud method and device based on knowledge graph
US11250449B1 (en) Methods for self-adaptive time series forecasting, and related systems and apparatus
CN111259922A (en) Order data processing method and device based on customer order-returning early warning
CN110417721B (en) Security risk assessment method, device, equipment and computer readable storage medium
Van den Bulte et al. Bias and systematic change in the parameter estimates of macro-level diffusion models
CN112070615B (en) Financial product recommendation method and device based on knowledge graph
Negahban Simulation-based estimation of the real demand in bike-sharing systems in the presence of censoring
US20140358838A1 (en) Detecting electricity theft via meter tampering using statistical methods
Chen et al. An empirical study of demand forecasting of non-volatile memory for smart production of semiconductor manufacturing
Chen The gamma CUSUM chart method for online customer churn prediction
US7664671B2 (en) Methods and systems for profile-based forecasting with dynamic profile selection
CN110310163A (en) A kind of accurate method, equipment and readable medium for formulating marketing strategy
Li et al. Credit scoring by incorporating dynamic networked information
CN106408325A (en) User consumption behavior prediction analysis method based on user payment information and system
WO2021072128A1 (en) Systems and methods for big data analytics
CN109583729B (en) Data processing method and device for platform online model
JP2006216019A (en) Value chain and enterprise value analysis device and method
CN107644272A (en) Student's exception learning performance Forecasting Methodology of Behavior-based control pattern
Stødle et al. Data‐driven predictive modeling in risk assessment: Challenges and directions for proper uncertainty representation
CN110544052A (en) method and device for displaying relationship network diagram
CN111625720B (en) Method, device, equipment and medium for determining execution strategy of data decision item
CN113112347A (en) Determination method of hasty collection decision, related device and computer storage medium
CN109636627B (en) Insurance product management method, device, medium and electronic equipment based on block chain
JP6357435B2 (en) SELECTION BEHAVIOR MODELING DEVICE, SELECTION BEHAVIOR PREDICTION DEVICE, METHOD, AND PROGRAM
CN115689713A (en) Abnormal risk data processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200609
