CN113570469B - Intelligent vehicle change prediction method for vehicle insurance user - Google Patents
- Publication number: CN113570469B (application CN202110851738.9A)
- Authority: CN (China)
- Prior art keywords: vehicle, data, insurance, user, model
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Abstract
An intelligent vehicle change prediction system and method for vehicle insurance users. The system comprises a data processing module, an offline training module and an online prediction module. The data processing module performs data screening and data labeling on user insurance policy information and outputs, for each user, whether the user changed vehicles and the vehicle model changed to. The offline training module trains machine learning models on the user policy data and the labels and outputs a prediction model. The online prediction module uses new user policy information and the prediction model to predict whether a user will change vehicles and whether the user will change to a specified vehicle model. In the method, users are labeled with whether they changed vehicles, and with the vehicle model changed to, according to whether the insured vehicles in the policies of consecutive years in the historical policy data are the same; relevant user feature sets are then screened to train machine learning and deep learning models, which accurately predict whether a user will change vehicles and whether the user will change to a specified vehicle model.
Description
Technical Field
The invention relates to technology in the field of neural network applications, and in particular to a machine-learning-based vehicle change prediction method for vehicle insurance users.
Background
A review of the prior art shows that the industry has made some progress in precision marketing. User profiling is the most commonly used technique in this field: it uses modern computing technology to collect and analyze user information, classifies and screens user characteristics with techniques such as machine learning and deep learning, and builds user profiles to support functions such as mining users' potential value, segmenting users by value, and managing users. Based on these profiles, enterprises pursue their business objectives and profit growth through personalized marketing strategies.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an intelligent vehicle change prediction system and method for vehicle insurance users.
The invention is realized by the following technical scheme:
The invention relates to an intelligent vehicle change prediction method for vehicle insurance users. The method labels whether a user changed vehicles, and the vehicle model changed to, according to whether the insured vehicles in the policies of two consecutive years in the historical vehicle insurance policy data are the same. It then screens relevant user feature sets to train machine learning and deep learning models, which predict whether a user will change vehicles and whether the user will change to a specified vehicle model.
The invention also relates to a machine-learning-based vehicle change prediction system for vehicle insurance users that implements the above method. The system comprises a data processing module, an offline training module and an online prediction module, wherein: the data processing module performs data screening and data labeling on the user policy information and outputs whether the user changed vehicles and the vehicle model changed to; the offline training module trains machine learning models on the user policy data and the labels and outputs a prediction model; and the online prediction module uses new user policy information and the prediction model to predict, and output, whether the user will change vehicles and whether the user will change to a specified vehicle model.
The data processing module comprises a data screening unit and a data labeling unit, wherein: the data screening unit screens valid samples from the user policy data, cleans the data according to the policy's insurance date and the applicant's certificate number field, obtains the policy data of the same user in different years, and extracts relevant features such as insured user information, insured vehicle information and insurance information; the data labeling unit takes the screened policy data of the same user in different years and determines, from whether the VINs of the insured vehicles are the same, whether the vehicles insured in different years are the same, and thereby labels whether the user changed vehicles and the vehicle model changed to.
The offline training module comprises a feature engineering unit and a model training unit, wherein: the feature engineering unit cleans, organizes and normalizes the relevant features extracted by the data processing module, turns the policy information of a single user into a set of feature values by converting character data to numeric values, and screens important features with XGBoost; the model training unit divides the data into a training set and a test set, trains the machine learning models on the training set, evaluates them on the test set, and stores the best-performing model for use by the online prediction module.
The online prediction module comprises a feature extraction unit and a vehicle change prediction unit, wherein: the feature extraction unit takes the policy information of the user to be predicted as the initial input, extracts and screens the relevant fields, and standardizes them using the same method as the feature engineering unit in the offline training module; the vehicle change prediction unit feeds the processed features to the corresponding trained model, which outputs the vehicle change prediction result.
Technical effects
Overall, the invention solves the problem of predicting from a user's vehicle insurance policy whether the user will change vehicles. Compared with the prior art, it predicts both vehicle change and the vehicle model changed to from vehicle insurance policy data alone, without requiring a large amount of personal user information, which makes it more practical. The trained model can predict vehicle change for all vehicle insurance users, and the approach can be extended to predicting changes to various vehicle models, which makes it flexible.
Drawings
FIG. 1 is a block diagram of a system according to the present invention.
Detailed Description
As shown in FIG. 1, the intelligent vehicle change prediction system for vehicle insurance users of this embodiment comprises a data processing module, an offline training module and an online prediction module, wherein: the data processing module performs data screening and data labeling on the user policy information and outputs whether the user changed vehicles and the vehicle model changed to; the offline training module trains machine learning models on the user policy data and the labels and outputs a prediction model; and the online prediction module uses new user policy information and the prediction model to predict, and output, whether the user will change vehicles and whether the user will change to a specified vehicle model.
The data processing module comprises a data screening unit and a data labeling unit, wherein: the data screening unit collects and screens the data, and the data labeling unit looks up, in the screened data, the same user's policy data for the following year according to the applicant's certificate number field, and labels the data.
Data collection means: user policy data provided by an insurance company are collected, the data formats of all fields are standardized, 50 relevant fields such as user information, vehicle information and insurance information are extracted from the policy data as features, and a user policy database is established.
Data screening means: using the policy's insurance date and the applicant's certificate number field, the vehicles insured by the same user in different years are looked up in the user policy database. According to the policy's VIN field, policy data in which a user has more than one distinct VIN in a year are deleted, and data in which the user insured exactly one vehicle in each year are retained. The result is the set of records of vehicles insured by the same user in different years.
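The screening rule can be illustrated with a minimal Python sketch. The record layout and the key names `applicant_id`, `year` and `vin` are hypothetical, and two choices are assumptions rather than statements from the text: a user with several VINs in any one year is dropped entirely, and a user must appear in at least two years to be kept.

```python
from collections import defaultdict

def screen_policies(policies):
    """Return {applicant_id: {year: vin}} for users with exactly one VIN per year.

    `policies` is a list of dicts with the hypothetical keys
    'applicant_id', 'year' and 'vin'.
    """
    vins = defaultdict(lambda: defaultdict(set))
    for policy in policies:
        vins[policy["applicant_id"]][policy["year"]].add(policy["vin"])

    screened = {}
    for applicant, by_year in vins.items():
        # Assumption: drop the user if any single year lists several distinct VINs.
        if any(len(year_vins) > 1 for year_vins in by_year.values()):
            continue
        # Assumption: a user needs policies in at least two years to be comparable.
        if len(by_year) < 2:
            continue
        screened[applicant] = {year: next(iter(v)) for year, v in by_year.items()}
    return screened
```

Grouping by applicant first and by year second keeps the one-vehicle-per-year check to a single pass over each user's policies.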
Data labeling specifically means: when the VIN field value of the vehicle insured in the current year differs from that of the vehicle insured in the following year, the record is labeled as a vehicle change, and the vehicle model in the following year's policy data is used to label the vehicle model the user changed to; when the VIN field value of the vehicle insured in the current year is the same as that of the vehicle insured in the following year, the record is labeled as no vehicle change.
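A minimal sketch of this labeling rule for one user follows. The input layout (a mapping from year to a `(vin, model)` pair) and the output field names are hypothetical and chosen only for illustration.

```python
def label_vehicle_change(by_year):
    """Label each year that has a following-year policy.

    `by_year` maps a year to a (vin, vehicle_model) tuple for one user.
    Returns a list of dicts: changed is 1 for a vehicle change, else 0.
    """
    samples = []
    for year in sorted(by_year):
        following = by_year.get(year + 1)
        if following is None:
            continue  # no following-year policy, so no label can be derived
        vin, _ = by_year[year]
        next_vin, next_model = following
        changed = vin != next_vin
        samples.append({
            "year": year,
            "changed": int(changed),
            # The following year's vehicle model is recorded only on a change.
            "new_model": next_model if changed else None,
        })
    return samples
```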
The offline training module comprises a feature engineering unit and a model training unit, wherein: the feature engineering unit performs abnormal value processing, data standardization and feature screening on the data obtained by the data processing module, and the model training unit trains the MLP model and the GBDT model on the screened features.
Abnormal value processing means: missing values or outliers in the features obtained by the data processing module are handled. The features processed include: region, third-party liability insurance coverage amount, policy premium, traffic violation coefficient, expected loss ratio, vehicle series, agreed actual value, vehicle age, vehicle model classification, vehicle model risk level, vehicle type, engine displacement, number of claims returned by the platform, NCD coefficient returned by the platform, total number of vehicle claims, vehicle claim payout amount, gender of the insured, whether the applicant is a life insurance customer, whether the applicant is a customer with an in-force long-term life insurance policy, total number of life insurance policies purchased by the applicant, age, vehicle model, new vehicle purchase price, fuel type, and the like. Because the volume of no-change data is large, records of users who did not change vehicles are removed outright when they contain missing values, whereas records of users who changed vehicles are repaired by mean filling, hot-deck filling, manual filling and similar methods.
Manual filling is suitable for missing values that can be inferred from the rest of the data; for example, gender can be inferred from the ID card number.
Hot-deck filling means: for an object that contains a null value, the hot-deck method finds the most similar object in the complete data and fills the null with that object's value.
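Hot-deck filling can be sketched as a nearest-donor lookup. This is an illustration under assumptions the text does not state: similarity is measured by squared Euclidean distance over the other fields, those fields are numeric, complete and on comparable scales, and at least one complete record exists.

```python
def hot_deck_fill(records, target):
    """Fill missing `target` values (None) from the most similar complete record.

    `records` is a list of dicts that share the same numeric keys.
    Assumes at least one record has a non-missing `target` value.
    """
    donors = [r for r in records if r[target] is not None]
    filled = []
    for record in records:
        if record[target] is not None:
            filled.append(dict(record))
            continue
        others = [key for key in record if key != target]
        # Assumption: "most similar" means smallest squared distance on other fields.
        donor = min(donors, key=lambda d: sum((d[k] - record[k]) ** 2 for k in others))
        filled.append({**record, target: donor[target]})
    return filled
```

In practice the comparison fields would usually be scaled first so that a large-valued field such as price does not dominate the distance.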
Data standardization means: numeric features whose values follow a normal distribution, such as vehicle age, new vehicle purchase price and agreed actual value, are standardized and thereby converted directly to a standard normal distribution. The remaining numeric features are standardized by interval scaling, a dimensionless method, with the formula $x' = \frac{x - \min}{\max - \min}$, where x is the original value of the feature, min is the minimum of all values of the feature, max is the maximum of all values of the feature, and x' is the standardized value. Features whose values are character strings are converted to numeric values by one-hot encoding.
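The three transformations can be sketched with the Python standard library. The function names are illustrative; the z-score uses the population standard deviation, which is an assumption, and both scalers assume the feature is not constant (otherwise the denominator is zero).

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize to zero mean and unit variance (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Interval scaling: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """One-hot encode strings; columns follow the sorted distinct categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]
```

At prediction time the same mean, standard deviation, minimum, maximum and category list learned offline would be reused rather than recomputed on the new data.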
Feature screening means: XGBoost is used to screen the 50 standardized features and select those of higher importance to the classification model. The main XGBoost parameters are set as follows: the input data has length 50, the booster is the tree type (gbtree), the objective function is multi:softmax, the maximum tree depth is 6, and the gamma value is 0.1. Training runs for 100 rounds. Using ten-fold cross-validation, 5000 vehicle-change records and 5000 no-change records are selected from the data set and fed to the XGBoost model for learning, the feature importance of the 50 features is output, and the features that rank near the top in importance across the ten folds are counted. Based on these statistics, the features of high importance are selected from the 50 features, giving 28 features: region, third-party liability insurance coverage amount, policy premium, traffic violation coefficient, expected loss ratio, final loss ratio, vehicle series, agreed actual value, vehicle age, vehicle model classification, vehicle model risk level, vehicle type, engine displacement, number of claims returned by the platform, NCD coefficient returned by the platform, total number of vehicle claims, vehicle claim payout amount, gender of the insured, whether the applicant is a life insurance customer, whether the applicant is a customer with an in-force long-term life insurance policy, total number of life insurance policies purchased by the applicant, age, vehicle model, insurance type, new vehicle purchase price, fuel type, total premium paid by the applicant, and the like.
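The stated settings and the cross-fold counting step might be sketched as follows. The dictionary uses XGBoost's parameter names but the library is not imported or called here. The text does not say how "near the top" is decided, so `top_k` and `min_folds` are hypothetical thresholds, and each fold's importance scores are assumed to be available as a plain mapping from feature name to score.

```python
from collections import Counter

# Settings quoted from the text. Note that multi:softmax also requires a
# num_class value, which the text does not state.
XGB_PARAMS = {
    "booster": "gbtree",
    "objective": "multi:softmax",
    "max_depth": 6,
    "gamma": 0.1,
}
NUM_BOOST_ROUND = 100  # training rounds

def frequent_top_features(fold_importances, top_k, min_folds):
    """Return features ranked in the top `top_k` in at least `min_folds` folds.

    `fold_importances` is a list with one {feature_name: importance} dict per fold.
    """
    counts = Counter()
    for importances in fold_importances:
        ranked = sorted(importances, key=importances.get, reverse=True)
        counts.update(ranked[:top_k])
    return sorted(name for name, n in counts.items() if n >= min_folds)
```

Counting across folds favors features whose importance is stable rather than features that rank highly in only one split.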
The MLP model means: a multi-layer perceptron (MLP) is selected as the classification model. Its network structure consists of an input layer, three hidden layers and an output layer. The three hidden layers have 128, 256 and 64 nodes respectively and use the LeakyReLU activation function, with the corresponding dropout set to 0.2. The activation function of the output layer is Sigmoid.
Different models are obtained by training on the data from the feature engineering unit one at a time. They fall into two kinds of binary classification model: a model of whether the user changes vehicles, and models of whether, after changing, the user changes to a given target vehicle brand, such as BMW, Gekko Swinhonis, Lexus, Volkswagen or Mercedes-Benz. For the vehicle change model, the label is 1 when the user changes vehicles and 0 when the user does not. For a target vehicle model, records of a change to the target vehicle model are labeled 1 and records of a change to any other vehicle model are labeled 0. All data are split 4:1, with 75% used for training and 25% as the test set.
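The labeling and splitting steps can be sketched as follows. The helper names are hypothetical. Because the text gives both a 4:1 ratio and a 75%/25% division, the test fraction is left as a parameter: 0.25 reproduces 75%/25%, while 0.2 would give 4:1. The shuffling and the fixed seed are assumptions.

```python
import random

def target_model_labels(new_models, target):
    """Label 1 for a change to `target`, 0 for a change to any other model."""
    return [1 if model == target else 0 for model in new_models]

def split_train_test(samples, test_fraction, seed=0):
    """Shuffle a copy of `samples` and return (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

One binary label vector is built per target brand, so the same change records can be reused to train each target-model classifier.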
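The LeakyReLU formula is $f(x) = \max(\alpha x, x)$ with a small slope $0 < \alpha < 1$, and the Sigmoid formula is $\sigma(x) = \frac{1}{1 + e^{-x}}$.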
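The network shape and the two activation functions can be illustrated with a small forward pass in plain Python. This is only a sketch: the weights are random rather than trained, the input width of 28 and the slope 0.01 are assumptions, and dropout is omitted because it applies only during training.

```python
import math
import random

def leaky_relu(x, alpha=0.01):
    """LeakyReLU; the slope alpha is an assumed value, not given in the text."""
    return x if x > 0 else alpha * x

def sigmoid(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def forward(x, layers):
    """Hidden layers use LeakyReLU; the single output unit uses Sigmoid."""
    for layer in layers[:-1]:
        x = [leaky_relu(v) for v in dense(x, layer)]
    return sigmoid(dense(x, layers[-1])[0])

rng = random.Random(0)
sizes = [28, 128, 256, 64, 1]  # assumed input width, then 128/256/64 hidden, 1 output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probability = forward([0.5] * 28, layers)
```

The single Sigmoid output is read as the probability of label 1, which matches the binary formulation of both model kinds.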
The model effect of the MLP is shown in the following table.
The GBDT model means: the GradientBoostingClassifier model from the sklearn library is selected. In the experiment the number of trees is set to 500, the maximum tree depth to 4, the learning rate to 0.1, and the minimum number of samples required to split an internal node to 100. The loss function is the logarithmic loss $L(Y, P(Y|X)) = -\log P(Y|X)$.
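As an illustration, the stated settings map onto GradientBoostingClassifier keyword arguments as shown in the dictionary below, and the logarithmic loss can be evaluated by hand. The dictionary is only a sketch of the configuration and sklearn itself is not imported; averaging the loss over samples is an assumption, and predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math

# Keyword arguments for sklearn.ensemble.GradientBoostingClassifier,
# with the values quoted from the text.
GBDT_PARAMS = {
    "n_estimators": 500,       # number of trees
    "max_depth": 4,            # maximum depth of each tree
    "learning_rate": 0.1,
    "min_samples_split": 100,  # minimum samples to split an internal node
}

def log_loss(y_true, p_pred):
    """Mean of -log P(Y|X), where p_pred holds the predicted P(Y=1|X)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p_true_class = p if y == 1 else 1.0 - p
        total += -math.log(p_true_class)
    return total / len(y_true)
```

For example, a sample with label 1 and predicted probability 0.5 contributes -log(0.5), which is log 2.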
Different models are obtained by training on the data from the feature engineering unit one at a time. The training data are the same as for the MLP model and are split 4:1, with 75% used for training and 25% as the test set.
The GBDT model effects are shown in the following table.
Accuracy is the proportion of all samples whose prediction is correct.
Precision is the proportion of truly positive samples among the samples predicted as positive.
Recall is the proportion of samples predicted as positive among all truly positive samples.
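These three definitions can be written out directly for binary labels. The function name is illustrative, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```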
The best-performing trained model is stored locally for use by the online prediction module.
The online prediction module specifically comprises a feature extraction unit and a vehicle change prediction unit, wherein: the feature extraction unit extracts the multi-dimensional user feature information required for prediction using the method of the feature engineering unit in the offline training module, and the vehicle change prediction unit feeds the resulting multi-dimensional user features to the stored model in batches to obtain predictions of whether the user will change vehicles and whether the user will change to a target vehicle model.
The multi-dimensional feature information includes: region, third-party liability insurance coverage amount, policy premium, traffic violation coefficient, expected loss ratio, final loss ratio, vehicle series, agreed actual value, vehicle age, vehicle model classification, vehicle model risk level, vehicle type, engine displacement, number of claims returned by the platform, NCD coefficient returned by the platform, total number of vehicle claims, vehicle claim payout amount, gender of the insured, whether the applicant is a life insurance customer, whether the applicant is a customer with an in-force long-term life insurance policy, total number of life insurance policies purchased by the applicant, age, vehicle model, insurance type, new vehicle purchase price, fuel type, total premium paid by the applicant, and the like.
In practical experiments on a Linux operating system with a Python environment, the model was launched from a shell command. On the test set, the vehicle change model reached an accuracy of up to 70.2%, and the vehicle model change model reached an accuracy of up to 74.8%. The results show that the method is effective and practical for predicting vehicle change and the vehicle model changed to from policy data.
Those skilled in the art may modify the foregoing embodiments in various ways without departing from the principles and spirit of the invention. The scope of the invention is defined by the claims and not by the foregoing embodiments, and all implementations within that scope are covered by the invention.
Claims (1)
1. An intelligent vehicle change prediction system for vehicle insurance users, characterized by comprising a data processing module, an offline training module and an online prediction module, wherein: the data processing module performs data screening and data labeling on the user policy information and outputs whether the user changed vehicles and the vehicle model changed to; the offline training module trains machine learning models on the user policy data and the labels and outputs a prediction model; and the online prediction module uses new user policy information and the prediction model to predict, and output, whether the user will change vehicles and whether the user will change to a specified vehicle model;
the data processing module comprises a data screening unit and a data labeling unit, wherein: the data screening unit collects and screens the data, and the data labeling unit looks up, in the screened data, the same user's policy data for the following year according to the applicant's certificate number field, and labels the data;
data collection means: collecting user policy data provided by an insurance company, standardizing the data formats of all fields, extracting user information, vehicle information and insurance information from the policy data as features, and establishing a user policy database;
data screening means: using the policy's insurance date and the applicant's certificate number field, looking up in the user policy database the vehicles insured by the same user in different years; according to the policy's VIN field, deleting policy data in which a user has more than one distinct VIN in a year and retaining data in which the user insured exactly one vehicle in each year, thereby screening out the records of vehicles insured by the same user in different years;
data labeling specifically means: when the VIN field value of the vehicle insured in the current year differs from that of the vehicle insured in the following year, labeling the record as a vehicle change and using the vehicle model in the following year's policy data to label the vehicle model the user changed to; when the VIN field value of the vehicle insured in the current year is the same as that of the vehicle insured in the following year, labeling the record as no vehicle change;
the offline training module comprises a feature engineering unit and a model training unit, wherein: the feature engineering unit performs abnormal value processing, data standardization and feature screening on the data obtained by the data processing module, and the model training unit trains the MLP model and the GBDT model on the screened features;
abnormal value processing means: handling missing values or outliers in the features obtained by the data processing module, wherein the features processed include: region, third-party liability insurance coverage amount, policy premium, traffic violation coefficient, expected loss ratio, vehicle series, agreed actual value, vehicle age, vehicle model classification, vehicle model risk level, vehicle type, engine displacement, number of claims returned by the platform, NCD coefficient returned by the platform, total number of vehicle claims, vehicle claim payout amount, gender of the insured, whether the applicant is a life insurance customer, whether the applicant is a customer with an in-force long-term life insurance policy, total number of life insurance policies purchased by the applicant, age, vehicle model, new vehicle purchase price, and fuel type;
records of users who did not change vehicles are removed outright when they contain missing values, and records of users who changed vehicles that contain abnormalities are repaired by mean filling, hot-deck filling and manual filling;
manual filling is suitable for missing values that can be inferred from the rest of the data;
hot-deck filling means: for an object containing a null value, the hot-deck method finds the most similar object in the complete data and fills the null with that object's value;
data standardization means: numeric features whose values follow a normal distribution are standardized and thereby converted to a standard normal distribution; the remaining numeric features are standardized by interval scaling, a dimensionless method, with the formula $x' = \frac{x - \min}{\max - \min}$, where x is the original value of the feature, min is the minimum of all values of the feature, max is the maximum of all values of the feature, and x' is the standardized value; features whose values are character strings are converted to numeric values by one-hot encoding;
the feature screening means: performing feature screening on the 50 standardized features with XGBoost, and selecting the features with higher importance for the classification model;
The main parameters of XGBoost are set as follows: the length of input data is 50, the booster is of tree type, the activation function is multi, the maximum depth of the tree is 6 layers, the gamma value is 0.1, and training runs for 100 rounds; a ten-fold cross validation method is adopted, 5000 vehicle-changing records and 5000 non-vehicle-changing records are selected from the data set and input into the XGBoost model for learning, and the feature importance results of the 50 features are output; the feature sets ranking high in feature importance across the ten-fold experiments are counted, and according to the counted results the features with high importance are selected from the 50 features, comprising 28 features: region, three-responsibility insurance policy, ticket policy, traffic violation coefficient, expected odds, final odds, train, negotiated actual value, age, classification of models, risk class of models, type of vehicle, displacement of models, number of platform returns to insurance, number of platform returns to NCD coefficient, total number of vehicle cases, amount of vehicle odds, sex of insured, whether the applicant is a life insurance client, whether the customer is a life insurance long insurance effective policy client, the applicant has purchased the life insurance total policy, age, model, risk, new vehicle purchase price, fuel type, the applicant has paid total policy;
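The counting step of the screening, where features that rank high in importance across the cross-validation folds are kept, can be sketched as below. The per-fold importance scores would come from the trained XGBoost model in each fold; here they are made-up numbers for illustration, and the cut-offs `k` and `min_folds` are hypothetical parameters that the text does not specify.

```python
from collections import Counter

def top_k(importances, k):
    """Names of the k features with the highest importance in one fold."""
    return sorted(importances, key=importances.get, reverse=True)[:k]

def select_features(fold_importances, k, min_folds):
    """Keep features that appear in the top k of at least `min_folds` folds."""
    counts = Counter()
    for importances in fold_importances:
        counts.update(top_k(importances, k))
    return sorted(f for f, c in counts.items() if c >= min_folds)

# Three illustrative folds (the text uses ten folds and 50 features).
folds = [
    {"age": 0.40, "displacement": 0.35, "fuel type": 0.15, "region": 0.10},
    {"age": 0.38, "displacement": 0.12, "fuel type": 0.30, "region": 0.20},
    {"age": 0.45, "displacement": 0.30, "fuel type": 0.05, "region": 0.20},
]
selected = select_features(folds, k=2, min_folds=2)
```

Counting appearances across folds, rather than averaging raw scores, makes the selection less sensitive to a single fold in which one feature happens to dominate.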
The MLP model refers to: selecting a multi-layer perceptron (MLP) as the classification model, wherein the network structure and parameters of the MLP comprise: an input layer, three hidden layers and an output layer, wherein the three hidden layers have 128, 256 and 64 nodes respectively, the hidden layers adopt the activation function LeakyReLU with the corresponding dropout set to 0.2, and the activation function of the output layer is Sigmoid;
The data obtained by the feature engineering unit are trained one by one to obtain different models, the models being two binary classification models: whether the user changes vehicle, and whether the vehicle is changed to the target vehicle model after the change; for whether the user changes vehicle, the label is 1 when the user changes vehicle and 0 otherwise; for whether the vehicle is changed to the target vehicle model, data of changing to the target vehicle model are labeled 1 and data of changing to other vehicle models are labeled 0; all the data are divided so that 75% of the data are used for training and 25% of the data are used as the test set;
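The two labeling schemes and the 75%/25% division can be sketched as follows. The record layout (`features`, `new_model`), the convention that `new_model` is `None` for users who did not change vehicle, and the seeded shuffle before splitting are assumptions made for illustration.

```python
import random

def label_records(records, target_model):
    """Build the two binary-labeled data sets described above."""
    change_set, target_set = [], []
    for r in records:
        changed = r["new_model"] is not None
        # Model 1: did the user change vehicle at all?
        change_set.append((r["features"], 1 if changed else 0))
        # Model 2: among changers, was the new vehicle the target model?
        if changed:
            is_target = r["new_model"] == target_model
            target_set.append((r["features"], 1 if is_target else 0))
    return change_set, target_set

def split_75_25(samples, seed=0):
    """Shuffle and split samples into 75% training and 25% test data."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]

# Illustrative records; "A" and "B" are hypothetical vehicle models.
models = [None, "A", "B", None, "A", None, "B", "A"]
records = [{"features": [i], "new_model": m} for i, m in enumerate(models)]
change_set, target_set = label_records(records, target_model="A")
train, test = split_75_25(change_set)
```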
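the LeakyReLU formula is $f(x) = x$ for $x > 0$ and $f(x) = \alpha x$ for $x \le 0$, where $\alpha$ is a small positive slope coefficient; the Sigmoid formula is $\sigma(x) = \frac{1}{1 + e^{-x}}$;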
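A forward pass through a network with the layer sizes described above (hidden layers of 128, 256 and 64 nodes, LeakyReLU in the hidden layers, Sigmoid at the output) can be sketched with the standard definitions of the two activations. Several details are assumptions: the negative slope `alpha=0.01`, the input width of 28, the random weight initialization, and the omission of dropout, which is only active during training.

```python
import math
import random

def leaky_relu(x, alpha=0.01):
    """Standard LeakyReLU; the slope alpha is an assumed value."""
    return x if x > 0 else alpha * x

def sigmoid(x):
    """Standard logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer followed by an activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(features, layers):
    """Hidden layers use LeakyReLU; the single output node uses Sigmoid."""
    h = features
    for weights, biases in layers[:-1]:
        h = dense(h, weights, biases, leaky_relu)
    weights, biases = layers[-1]
    return dense(h, weights, biases, sigmoid)[0]

rng = random.Random(0)
sizes = [28, 128, 256, 64, 1]  # input width of 28 is an assumption
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probability = forward([0.5] * 28, layers)
```

The output is a value strictly between 0 and 1 that can be read as the predicted probability of the positive label. With untrained random weights it carries no meaning beyond showing the shape of the computation.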
The GBDT model refers to: selecting the GradientBoostingClassifier model in the sklearn library, setting the number of trees to 500, the maximum depth of a tree to 4, the learning rate to 0.1, and the minimum number of samples required to split an internal node of a tree to 100, and setting the loss function to the logarithmic loss function L(Y, P(Y|X)) = -log P(Y|X);
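The settings above map naturally onto constructor arguments of scikit-learn's `GradientBoostingClassifier`, and the logarithmic loss can be evaluated by hand for a single sample. This is a sketch under assumptions: reading "size of a tree" as the number of trees (`n_estimators`) is an interpretation, the loss is left at the library default (which is the logarithmic loss for this classifier), and `build_gbdt` is a hypothetical helper that imports scikit-learn only when called.

```python
import math

# Assumed mapping of the stated settings onto scikit-learn argument names.
GBDT_PARAMS = {
    "n_estimators": 500,       # number of boosted trees
    "max_depth": 4,            # maximum depth of each tree
    "learning_rate": 0.1,
    "min_samples_split": 100,  # minimum samples to split an internal node
}

def build_gbdt():
    """Create the classifier; requires scikit-learn to be installed."""
    from sklearn.ensemble import GradientBoostingClassifier
    return GradientBoostingClassifier(**GBDT_PARAMS)

def log_loss(y_true, p_positive):
    """L(Y, P(Y|X)) = -log P(Y|X) for one sample with a binary label."""
    p_of_true = p_positive if y_true == 1 else 1.0 - p_positive
    return -math.log(p_of_true)

loss_uncertain = log_loss(1, 0.5)   # model is undecided
loss_confident = log_loss(1, 0.9)   # model is confident and correct
```

The loss shrinks toward zero as the probability assigned to the true label approaches 1, and grows without bound as it approaches 0.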
The data obtained by the feature engineering unit are trained one by one to obtain different models, wherein the training data are the same as the MLP model data and are divided 4:1, wherein 75% of the data are used for training and 25% of the data are used as the test set;
The on-line prediction module specifically comprises a feature extraction unit and a vehicle change prediction unit, wherein: the feature extraction unit extracts the multi-dimensional feature information of the user required for prediction by using the method of the feature engineering unit in the offline training module, and the vehicle change prediction unit inputs the obtained multi-dimensional user features in batches into the stored models to obtain predicted values of whether the user changes vehicle and whether the user changes to the target vehicle model;
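The batch prediction step can be sketched as below. The stored models are represented as plain callables that take a list of feature vectors and return one probability per vector. This interface, the batch size, and the 0.5 decision threshold are all assumptions, and the stub models exist only to make the example runnable.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def predict_users(user_features, change_model, target_model,
                  batch_size=2, threshold=0.5):
    """Run both stored models over the users batch by batch."""
    results = []
    for batch in batches(user_features, batch_size):
        change_scores = change_model(batch)
        target_scores = target_model(batch)
        for c, t in zip(change_scores, target_scores):
            will_change = c >= threshold
            results.append({
                "will_change": will_change,
                # Only meaningful when a vehicle change is predicted.
                "to_target_model": will_change and t >= threshold,
            })
    return results

# Stub models standing in for the stored, trained models (hypothetical).
stub_change = lambda batch: [sum(x) / len(x) for x in batch]
stub_target = lambda batch: [x[1] for x in batch]

users = [[0.9, 0.8], [0.1, 0.2], [0.9, 0.3]]
predictions = predict_users(users, stub_change, stub_target)
```

Gating the target-model prediction on the vehicle-change prediction is one possible way to combine the two outputs; the text only states that both predicted values are produced.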
The multi-dimensional feature information comprises: region, three-responsibility insurance policy, ticket policy, traffic violation coefficients, expected odds, final odds, train, negotiated actual value, age, classification of models, risk class of models, type of vehicle, displacement of models, number of platform returns to insurance, number of platform returns to NCD coefficients, total number of vehicle cases, amount of vehicle odds, sex of insured, whether the applicant is a life insurance client, whether the customer is a life insurance long insurance effective policy client, the applicant has purchased the life insurance total policy, age, model, risk, new vehicle purchase price, fuel type, and applicant has paid total policy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110851738.9A CN113570469B (en) | 2021-07-27 | 2021-07-27 | Intelligent vehicle change prediction method for vehicle insurance user |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113570469A (en) | 2021-10-29 |
CN113570469B (en) | 2024-05-28 |
Family
ID=78168026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110851738.9A Active CN113570469B (en) | 2021-07-27 | 2021-07-27 | Intelligent vehicle change prediction method for vehicle insurance user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113570469B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108053075A (en) * | 2017-12-27 | 2018-05-18 | 北京中交兴路车联网科技有限公司 | A kind of scrap-car Forecasting Methodology and system |
WO2020077871A1 (en) * | 2018-10-15 | 2020-04-23 | 平安科技(深圳)有限公司 | Event prediction method and apparatus based on big data, computer device, and storage medium |
CN112579900A (en) * | 2020-12-22 | 2021-03-30 | 优必爱信息技术(北京)有限公司 | Method, system and equipment for recommending second-hand vehicle replacement information |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100332292A1 (en) * | 2009-06-30 | 2010-12-30 | Experian Information Solutions, Inc. | System and method for evaluating vehicle purchase loyalty |
US20150254719A1 (en) * | 2014-03-05 | 2015-09-10 | Hti, Ip, L.L.C. | Prediction of Vehicle Transactions and Targeted Advertising Using Vehicle Telematics |
Also Published As
Publication number | Publication date |
---|---|
CN113570469A (en) | 2021-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11373249B1 (en) | Automobile monitoring systems and methods for detecting damage and other conditions | |
CN103294592B (en) | User instrument is utilized to automatically analyze the method and system of the defect in its service offering alternately | |
CN109739844B (en) | Data classification method based on attenuation weight | |
CN110706039A (en) | Electric vehicle residual value rate evaluation system, method, equipment and medium | |
CN111079941B (en) | Credit information processing method, credit information processing system, terminal and storage medium | |
CN112990386B (en) | User value clustering method and device, computer equipment and storage medium | |
CN112434829A (en) | Vehicle maintenance project determination method, system, device and storage medium | |
CN115147155A (en) | Railway freight customer loss prediction method based on ensemble learning | |
CN114078050A (en) | Loan overdue prediction method and device, electronic equipment and computer readable medium | |
CN113570469B (en) | Intelligent vehicle change prediction method for vehicle insurance user | |
CN109766440B (en) | Method and system for determining default classification information for object text description | |
US20230058076A1 (en) | Method and system for auto generating automotive data quality marker | |
CN113421154B (en) | Credit risk assessment method and system based on control chart | |
CN114331728A (en) | Security analysis management system | |
JP2022082525A (en) | Method and apparatus for providing information based on machine learning | |
CN112818215A (en) | Product data processing method, device, equipment and storage medium | |
CN114443803A (en) | Text information mining method and device, electronic equipment and storage medium | |
CN112905713A (en) | Case-related news overlapping entity relation extraction method based on joint criminal name prediction | |
CN110119464A (en) | The intelligent recommendation method and device of numerical value in a kind of contract | |
CN116913460B (en) | Marketing business compliance judgment and analysis method for pharmaceutical instruments and inspection reagents | |
CN115953166B (en) | Customer information management method and system based on big data intelligent matching | |
CN113191595B (en) | Vehicle operation full life cycle cost associated data analysis method and system | |
CN117520994B (en) | Method and system for identifying abnormal air ticket searching user based on user portrait and clustering technology | |
CN117764692A (en) | Method for predicting credit risk default probability | |
CN118333738A (en) | Method for constructing retail credit risk prediction model and credit card service Scorealpha model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||