CN116562894A - Vehicle insurance claim fraud risk identification method, device, electronic equipment and storage medium

Publication number
CN116562894A
Authority
CN
China
Prior art keywords
data
historical
vehicle insurance
risk
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310498384.3A
Other languages
Chinese (zh)
Inventor
林灿
史光辉
王建明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202310498384.3A priority Critical patent/CN116562894A/en
Publication of CN116562894A publication Critical patent/CN116562894A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/018Certifying business or products
    • G06Q30/0185Product, service or business identity fraud
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and provides a vehicle insurance claim fraud risk identification method, device, electronic equipment and storage medium. The invention can be extended to vehicle insurance claim fraud risk identification scenarios for different vehicle types without repeatedly customizing and developing rule combinations for each new vehicle type, which reduces labor cost during development and maintenance and makes rapid online rollout easier.

Description

Vehicle insurance claim fraud risk identification method, device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a vehicle insurance claim fraud risk identification method, a device, electronic equipment and a storage medium.
Background
With socioeconomic development, the number of vehicles in use has grown rapidly, and fraud in vehicle insurance claims has become increasingly common; inflated and exaggerated loss claims account for a large proportion of vehicle insurance claim fraud.
In the process of realizing the invention, the inventors found that inflated and exaggerated losses are currently detected mainly through hundreds of thousands of rules. However, these rules are combinations built from the vehicle loss items of existing scenarios; they cannot cover the loss items of new scenarios and lack adaptability, so the accuracy of vehicle insurance claim fraud risk identification is low.
Disclosure of Invention
In view of the foregoing, there is a need for a method, apparatus, electronic device, and storage medium for identifying risk of fraud in a vehicle insurance claim, which can improve accuracy of identifying risk of fraud in a vehicle insurance claim.
A first aspect of the present invention provides a method of identifying risk of fraud in a vehicle insurance claim, the method comprising:
acquiring a plurality of historical vehicle insurance data corresponding to each vehicle type;
processing the historical vehicle insurance data to obtain historical standard vehicle insurance data;
calculating a historical similar attribute value of each historical standard vehicle insurance data;
determining a risk identification threshold according to the real risk mark of each historical standard vehicle risk data and the historical similar attribute value;
and carrying out risk identification on the vehicle risk data to be detected according to the risk identification threshold value to obtain a risk marking identification result.
According to an optional embodiment of the invention, the calculating the historical similarity attribute value of each of the historical standard vehicle risk data comprises:
extracting features of the historical standard vehicle insurance data to obtain vehicle insurance data features;
calculating the similarity between every two vehicle insurance data features;
calculating a similarity mean value according to a plurality of similarities corresponding to each vehicle insurance data feature;
and determining the similarity mean value as a history similarity attribute value of the corresponding history standard vehicle insurance data.
According to an optional embodiment of the invention, the determining the risk identification threshold according to the actual risk flag and the historical similarity attribute value of each of the historical standard vehicle risk data comprises:
randomly generating a plurality of random numbers, and taking each random number as a candidate risk identification threshold;
for each candidate risk identification threshold, when the historical similar attribute value is larger than the candidate risk identification threshold, the obtained comparison result is that the predicted risk corresponding to the historical similar attribute value is marked as no fraud risk; when the historical similar attribute value is smaller than the candidate risk identification threshold, the obtained comparison result is that the predicted risk corresponding to the historical similar attribute value is marked as a fraud risk;
acquiring a first number of historical similar attribute values, wherein the corresponding real risk is marked as no fraud risk, and the corresponding predicted risk is marked as no fraud risk, in the historical similar attribute values;
acquiring a second number of historical similar attribute values in which the corresponding real risk is marked as having a fraud risk and the corresponding predicted risk is marked as having a fraud risk;
and determining a target risk identification threshold value in the candidate risk identification threshold values according to the first quantity and the second quantity.
According to an optional embodiment of the present invention, performing risk identification on the vehicle risk data to be tested according to the risk identification threshold value, and obtaining a risk marking identification result includes:
determining the type of the vehicle to be detected in the vehicle risk data to be detected;
acquiring a plurality of target historical standard vehicle insurance data corresponding to the vehicle type of the vehicle to be tested;
generating historical standard vehicle insurance data to be tested for the vehicle insurance data to be tested;
calculating the similarity to be detected between the historical standard vehicle insurance data to be detected and each target historical standard vehicle insurance data;
obtaining a to-be-detected similarity attribute value of the to-be-detected vehicle insurance data according to the average value of the to-be-detected similarities;
comparing the similarity attribute value to be detected with the risk identification threshold value to obtain a comparison result;
and obtaining the risk mark recognition result according to the comparison result.
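The identification steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that feature vectors have already been extracted, and the names `cosine_similarity` and `identify_risk`, the toy vectors and the threshold 0.5 are all hypothetical. Values equal to the threshold are treated as fraud risk, which the text does not specify.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_risk(
    candidate: Sequence[float],
    target_history: List[Sequence[float]],
    threshold: float,
) -> str:
    """Average the similarity to every target historical record, then compare."""
    if not target_history:
        raise ValueError("no historical data for this vehicle type")
    similarities = [cosine_similarity(candidate, h) for h in target_history]
    attribute_value = sum(similarities) / len(similarities)
    # Assumption: larger mean similarity means the claim looks like past claims.
    return "no fraud risk" if attribute_value > threshold else "fraud risk"


# Hypothetical feature vectors for one vehicle type.
history = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
typical = identify_risk([1.0, 0.0], history, threshold=0.5)
unusual = identify_risk([0.0, 1.0], history, threshold=0.5)
```

Here the first claim has a mean similarity of about 0.67 and the second about 0.33, so only the second falls below the illustrative threshold.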
According to an optional embodiment of the present invention, the processing the historical vehicle insurance data to obtain historical standard vehicle insurance data includes:
acquiring a standard maintenance project mapping table corresponding to the historical vehicle insurance data according to the vehicle type;
extracting a plurality of maintenance items in the historical vehicle insurance data;
mapping the plurality of maintenance items according to the standard maintenance item mapping table to obtain a plurality of standard maintenance items;
and combining the plurality of standard maintenance projects to obtain historical standard vehicle insurance data.
According to an alternative embodiment of the invention, the method further comprises:
acquiring the number of data of the plurality of historical vehicle insurance data;
dividing the plurality of historical vehicle insurance data according to a preset data quantity coefficient and the number of data pieces to obtain a plurality of historical vehicle insurance data subsets;
and carrying out data cleaning on the historical vehicle insurance data in each historical vehicle insurance data subset to obtain the plurality of cleaning vehicle insurance data.
According to an optional embodiment of the present invention, the performing data cleansing on the historical vehicle insurance data in each of the historical vehicle insurance data subsets, and obtaining the plurality of cleansing vehicle insurance data includes:
performing format conversion on the historical vehicle insurance data in each historical vehicle insurance data subset according to a preset data format to obtain historical vehicle insurance quality data;
detecting the data type of the historical vehicle insurance quality data;
determining a data cleaning rule corresponding to the data type;
and carrying out data cleaning on the corresponding historical vehicle insurance quality data according to the data cleaning rule to obtain the plurality of cleaning vehicle insurance data.
A second aspect of the invention provides a vehicle insurance claim fraud risk identification apparatus, the apparatus comprising:
the acquisition module is used for acquiring a plurality of historical vehicle insurance data corresponding to each vehicle type;
the mapping module is used for processing the historical vehicle insurance data to obtain historical standard vehicle insurance data;
the calculation module is used for calculating the historical similar attribute value of each historical standard vehicle insurance data;
the determining module is used for determining a risk identification threshold according to the real risk mark of each historical standard vehicle risk data and the historical similar attribute value;
and the identification module is used for carrying out risk identification on the vehicle risk data to be detected according to the risk identification threshold value to obtain a risk mark identification result.
A third aspect of the invention provides an electronic device comprising a processor and a memory, the processor being operable to implement the vehicle insurance claim fraud risk identification method when executing a computer program stored in the memory.
A fourth aspect of the invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the vehicle insurance claim fraud risk identification method.
According to the vehicle insurance claim fraud risk identification method, the vehicle insurance claim fraud risk identification device, the electronic equipment and the storage medium, the historical standard vehicle insurance data are obtained by acquiring the historical vehicle insurance data corresponding to each vehicle type and processing the historical vehicle insurance data, then the historical similar attribute value of each historical vehicle insurance data is calculated, so that the risk identification threshold values of different vehicle types are adaptively learned according to the real risk marks and the historical similar attribute values of each historical vehicle insurance data to serve as the judgment standard of the vehicle insurance claim fraud risk, finally the risk identification can be carried out on the vehicle insurance data to be detected according to the risk identification threshold values, and the accuracy of the vehicle insurance claim fraud risk identification is improved. The invention can be popularized to the risk identification scenes of the vehicle insurance claims fraud of different vehicle types, does not need to repeatedly customize and develop rule combinations corresponding to new vehicle types, reduces labor cost in the development and maintenance process, and is convenient for quick online popularization.
Drawings
Fig. 1 is a flowchart of a method for identifying risk of fraud in a vehicle insurance claim according to an embodiment of the present invention.
Fig. 2 is a block diagram of a device for identifying risk of fraud in a vehicle insurance claim according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
The vehicle insurance claim fraud risk identification method provided by the embodiment of the invention is executed by the electronic equipment, and correspondingly, the vehicle insurance claim fraud risk identification device is operated in the electronic equipment.
Embodiments of the invention can standardize the data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Example 1
Fig. 1 is a flowchart of a method for identifying risk of fraud in a vehicle insurance claim according to an embodiment of the present invention. The method for identifying the risk of the fraud in the vehicle insurance claim specifically comprises the following steps, the sequence of the steps in the flow chart can be changed according to different requirements, and some steps can be omitted.
S11, acquiring a plurality of historical vehicle insurance data corresponding to each vehicle type.
The vehicle model, that is, the model of the vehicle, is a code composed of pinyin letters and Arabic numerals that is assigned to a class of vehicles in order to identify them. The vehicle model may include: small, mini, compact, medium, advanced, luxury, three-compartment, CDV, MPV, SUV, etc.
When a maintenance claim needs to be made for a vehicle, the user can report vehicle insurance data to the electronic equipment through a user terminal; the user may be the vehicle owner or any other person using the vehicle. The vehicle insurance data may include, but is not limited to: the user's contact information, the vehicle model, the license plate, and the maintenance items for which an insurance claim is requested. After receiving the vehicle insurance data reported by the user terminal, the electronic equipment stores it in a local vehicle insurance database so that a claims adjuster can perform a risk audit on the data and, once the audit is complete, attach a risk label to it. For example, when the vehicle insurance data passes the risk audit, it is labeled as having no fraud risk and the claim is paid for the corresponding vehicle. When the data fails the risk audit, it is labeled as having a fraud risk and no claim is paid for the corresponding vehicle.
For convenience of description, the vehicle insurance data that has undergone risk audit and is stored in the vehicle insurance database of the electronic equipment is referred to as historical vehicle insurance data. It should be understood that the vehicle insurance database stores the correspondence between the plurality of historical vehicle insurance data and their risk labels: each row of the database contains one piece of historical vehicle insurance data and its corresponding risk label.
The electronic equipment classifies the historical vehicle insurance data in the vehicle insurance database according to the vehicle types to obtain a plurality of historical vehicle insurance data corresponding to each vehicle type. After the plurality of historical vehicle insurance data corresponding to each vehicle type are obtained, the electronic equipment can conduct data cleaning on the plurality of historical vehicle insurance data corresponding to each vehicle type, and a plurality of cleaning vehicle insurance data are obtained.
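As a rough illustration of the classification by vehicle model described above, the following sketch groups audited rows by model. The field names (`vehicle_model`, `maintenance_items`, `risk_label`) and the sample rows are hypothetical; the source does not specify a schema.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def group_by_vehicle_model(rows: List[Record]) -> Dict[str, List[Record]]:
    """Group audited rows so each vehicle model gets its own list of records."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for row in rows:
        groups[row["vehicle_model"]].append(row)
    return dict(groups)


# Hypothetical rows: one piece of historical data plus its risk label per row.
rows = [
    {"vehicle_model": "SUV", "maintenance_items": ["front bumper"], "risk_label": "no fraud risk"},
    {"vehicle_model": "MPV", "maintenance_items": ["left door"], "risk_label": "fraud risk"},
    {"vehicle_model": "SUV", "maintenance_items": ["hood"], "risk_label": "no fraud risk"},
]
by_model = group_by_vehicle_model(rows)
```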
In an optional embodiment, the data cleaning the plurality of historical vehicle risk data corresponding to each vehicle model, and obtaining the plurality of cleaned vehicle risk data includes:
acquiring the number of data of the plurality of historical vehicle insurance data;
dividing the plurality of historical vehicle insurance data according to a preset data quantity coefficient and the number of data pieces to obtain a plurality of historical vehicle insurance data subsets;
and carrying out data cleaning on the historical vehicle insurance data in each historical vehicle insurance data subset to obtain the plurality of cleaning vehicle insurance data.
The preset data quantity coefficient is the number of data pieces, preset by the electronic equipment, that each historical vehicle insurance data subset holds. For example, assuming that the preset data quantity coefficient is X, the plurality of historical vehicle insurance data are divided into Y historical vehicle insurance data subsets according to the coefficient and the number of data pieces: each of the first Y-1 subsets stores X data pieces and the last subset stores R data pieces, so that the number of data pieces of the plurality of historical vehicle insurance data is Num = X × (Y-1) + R.
In the embodiment of the invention, the number of the data pieces of the historical vehicle insurance data can be calculated in a row unit.
For each vehicle model, acquiring the number of data pieces of a plurality of historical vehicle insurance data of the vehicle model, and dividing the plurality of historical vehicle insurance data according to a preset data quantity coefficient and the number of data pieces to obtain a plurality of historical vehicle insurance data subsets; and then, the historical vehicle insurance data in the historical vehicle insurance data subset is synchronously processed by taking the historical vehicle insurance data subset as a unit, so that the data cleaning efficiency is improved. And collecting data obtained after data cleaning is carried out on each historical vehicle insurance data subset, so that a plurality of cleaning vehicle insurance data of the vehicle type can be obtained.
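The division into subsets can be sketched as below. The function name `split_into_subsets` and the sample sizes are hypothetical; the sketch only shows the partitioning, not the cleaning that is then run on each subset.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_subsets(records: Sequence[T], coefficient: int) -> List[List[T]]:
    """Split records into consecutive subsets of `coefficient` rows each.

    The last subset holds the remainder, so it may be shorter than the others.
    """
    if coefficient <= 0:
        raise ValueError("coefficient must be a positive integer")
    return [
        list(records[start:start + coefficient])
        for start in range(0, len(records), coefficient)
    ]


# Illustrative numbers: Num = 10 rows and X = 4 give Y = 3 subsets with R = 2.
records = list(range(10))
subsets = split_into_subsets(records, 4)
sizes = [len(s) for s in subsets]  # [4, 4, 2]
```

With these numbers the relation Num = X × (Y-1) + R reads 10 = 4 × 2 + 2.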
In an optional embodiment, the performing data cleansing on the historical vehicle insurance data in each of the historical vehicle insurance data subsets, and obtaining the plurality of cleansing vehicle insurance data includes:
performing format conversion on the historical vehicle insurance data in each historical vehicle insurance data subset according to a preset data format to obtain historical vehicle insurance quality data;
detecting the data type of the historical vehicle insurance quality data;
determining a data cleaning rule corresponding to the data type;
and carrying out data cleaning on the corresponding historical vehicle insurance quality data according to the data cleaning rule to obtain the plurality of cleaning vehicle insurance data.
The historical vehicle insurance data has non-uniform data format due to different reporting modes of users, and the data needs to be converted into data with uniform format to realize the compatibility of the data, and meanwhile, the analysis requirement of vehicle insurance claim settlement application is met.
The data types of the historical risk quality data may be classified into a missing value, an abnormal value, and a repeated value. If the detected historical vehicle insurance quality data has missing data content, judging that the data type of the historical vehicle insurance quality data is a missing value; if the detected historical vehicle insurance quality data has abnormal data content, judging that the data type of the historical vehicle insurance quality data is an abnormal value; and if the detected historical vehicle insurance quality data have two or more identical data contents, judging that the data type of the historical vehicle insurance quality data is a repeated value.
If the data type of the historical vehicle insurance quality data is judged to be a missing value, determining a data cleaning rule corresponding to the historical vehicle insurance quality data to be filling or rejecting processing; if the data type of the historical vehicle insurance quality data is judged to be an abnormal value, determining a data cleaning rule corresponding to the historical vehicle insurance quality data to be filtering processing; and if the data type of the historical vehicle insurance quality data is judged to be a repeated value, determining a data cleaning rule corresponding to the historical vehicle insurance quality data as cleaning processing.
According to the optional implementation mode, the data format conversion is carried out on the historical vehicle insurance data, after the data type of the historical vehicle insurance quality data is detected, the data cleaning rule is determined according to the data type of the historical vehicle insurance quality data, and the data cleaning is carried out on the historical vehicle insurance quality data according to the data cleaning rule, so that the data cleaning of the historical vehicle insurance data is realized, chaotic data are converted into clean and tidy high-quality data, the usability of big data is enhanced, and the cleaning effect and the cleaning efficiency of the data cleaning are improved.
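The mapping from data type to cleaning rule can be sketched as follows. The source does not define the field names, which fields may be filled, or what counts as an abnormal value, so everything specific here is an assumption made for illustration: the `REQUIRED` and `FILLABLE` fields, the rule that an empty maintenance item is abnormal, and the function names.

```python
import json
from typing import Any, Dict, List, Optional, Set

Record = Dict[str, Any]

# Hypothetical schema: fields that must be present, and fields that may be filled.
REQUIRED = ("vehicle_model", "license_plate", "maintenance_items")
FILLABLE = {"contact": "unknown"}


def detect_data_type(record: Record, seen: Set[str]) -> Optional[str]:
    """Return 'duplicate', 'missing', 'abnormal', or None for a clean record."""
    fingerprint = json.dumps(record, sort_keys=True, ensure_ascii=False)
    if fingerprint in seen:
        return "duplicate"
    seen.add(fingerprint)
    if any(not record.get(field) for field in (*REQUIRED, *FILLABLE)):
        return "missing"
    # Assumed abnormality rule: every maintenance item must be a non-empty string.
    if not all(isinstance(i, str) and i.strip() for i in record["maintenance_items"]):
        return "abnormal"
    return None


def clean(records: List[Record]) -> List[Record]:
    """Apply one cleaning rule per detected data type."""
    seen: Set[str] = set()
    cleaned: List[Record] = []
    for record in records:
        kind = detect_data_type(record, seen)
        if kind in ("duplicate", "abnormal"):
            continue  # repeated values are removed, abnormal values are filtered
        if kind == "missing":
            if any(not record.get(field) for field in REQUIRED):
                continue  # reject: a required field cannot be filled
            fills = {f: d for f, d in FILLABLE.items() if not record.get(f)}
            record = {**record, **fills}  # fill: use the default value
        cleaned.append(record)
    return cleaned


base = {"contact": "c1", "vehicle_model": "SUV", "license_plate": "P1", "maintenance_items": ["front bumper"]}
sample = [
    base,
    dict(base),                                                     # repeated value
    {**base, "contact": None, "license_plate": "P2"},               # missing, fillable
    {**base, "license_plate": ""},                                  # missing, rejected
    {**base, "license_plate": "P3", "maintenance_items": ["", "hood"]},  # abnormal
]
cleaned = clean(sample)
```

Only the first record and the filled third record survive in this example.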
S12, processing the historical vehicle insurance data to obtain historical standard vehicle insurance data.
After the electronic equipment acquires a plurality of historical vehicle insurance data corresponding to each vehicle type, the historical vehicle insurance data are processed to obtain historical standard vehicle insurance data, so that a risk identification threshold value is conveniently and adaptively determined based on the marked vehicle insurance data.
It is understood that after data cleaning is performed on a plurality of historical vehicle insurance data corresponding to each vehicle type to obtain a plurality of cleaned vehicle insurance data, the electronic device can process each cleaned vehicle insurance data to obtain historical standard vehicle insurance data.
In an optional embodiment, the processing the historical vehicle insurance data to obtain historical standard vehicle insurance data includes:
acquiring a standard maintenance project mapping table corresponding to the historical vehicle insurance data according to the vehicle type;
extracting a plurality of maintenance items in the historical vehicle insurance data;
mapping the plurality of maintenance items according to the standard maintenance item mapping table to obtain a plurality of standard maintenance items;
and combining the plurality of standard maintenance projects to obtain historical standard vehicle insurance data.
Different vehicle types correspond to different standard maintenance project mapping tables, and standard names of parts of the corresponding vehicle types are recorded in the standard maintenance project mapping tables.
Because the inputs of different users to the same parts are different, each maintenance item needs to be mapped according to the standard maintenance item mapping table, so that the standard maintenance item is obtained, a plurality of standard maintenance items corresponding to each historical vehicle insurance data are combined, and the historical standard vehicle insurance data are obtained. And each historical vehicle insurance data corresponds to one historical standard vehicle insurance data.
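A minimal sketch of the mapping step, assuming the standard maintenance item mapping table is a per-model dictionary from free-text input to a standard name. The table contents, the name `STANDARD_ITEM_TABLES`, and the choice to keep unmapped items unchanged are all hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping tables, one per vehicle model.
STANDARD_ITEM_TABLES: Dict[str, Dict[str, str]] = {
    "SUV": {
        "front bumper": "front bumper cover",
        "frt bumper": "front bumper cover",
        "left headlamp": "headlamp assembly (left)",
    },
}


def to_standard_items(vehicle_model: str, items: List[str]) -> List[str]:
    """Map each reported maintenance item to its standard name."""
    table = STANDARD_ITEM_TABLES[vehicle_model]
    standard_items = []
    for item in items:
        key = item.strip().lower()
        # Assumption: items missing from the table are kept as normalized text.
        standard_items.append(table.get(key, key))
    return standard_items


standard = to_standard_items("SUV", ["Frt Bumper", "left headlamp", "hood"])
# ['front bumper cover', 'headlamp assembly (left)', 'hood']
```

The returned list plays the role of one piece of historical standard vehicle insurance data.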
S13, calculating the historical similar attribute value of each historical standard vehicle insurance data.
Each vehicle model corresponds to a plurality of historical vehicle insurance data, each historical vehicle insurance data corresponds to one historical standard vehicle insurance data, each vehicle model corresponds to a plurality of historical standard vehicle insurance data, and according to the plurality of historical standard vehicle insurance data, similar attribute values of each historical standard vehicle insurance data can be obtained, so that historical similar attribute values of each historical vehicle insurance data are obtained.
In an alternative embodiment, said calculating a historical similarity attribute value for each of said historical standard vehicle risk data includes:
extracting features of the historical standard vehicle insurance data to obtain vehicle insurance data features;
calculating the similarity between every two vehicle insurance data features;
calculating a similarity mean value according to a plurality of similarities corresponding to each vehicle insurance data feature;
and determining the similarity mean value as a history similarity attribute value of the corresponding history standard vehicle insurance data.
In this embodiment, the feature extraction of the historical standard vehicle insurance data may include: calculating the TF-IDF(w) value of each of the plurality of standard maintenance items in the historical standard vehicle insurance data, and combining the TF-IDF(w) values into a vector that serves as the vehicle insurance data feature. Here w denotes a standard maintenance item. TF(w) = n(w)/N, where n(w) is the number of times the standard maintenance item appears among all standard maintenance items corresponding to the vehicle model and N is the total number of maintenance items in the plurality of historical standard vehicle insurance data corresponding to the vehicle model. IDF(w) = log((Dn + 1)/(Dw + 1)) + 1, where Dn is the number of all vehicles and Dw is the number of times the maintenance item appears across all vehicles. Each vehicle insurance data feature is a vector of dimension 1 × Dn.
The similarity between the two vehicle insurance data features may be a cosine distance between the two vehicle insurance data features.
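The feature extraction and similarity mean can be sketched as below, under several assumptions that the text does not confirm: TF-IDF(w) is taken as the usual product TF(w) × IDF(w), the logarithm is natural, Dw is counted as the number of records containing the item, the vector is indexed by the vocabulary of standard items (with 0 for items absent from a record), and each record is compared with every other record but not with itself. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(records: List[List[str]]) -> Dict[str, float]:
    """TF(w) * IDF(w) per standard item, with TF(w) = n(w)/N over the whole model."""
    total_items = sum(len(r) for r in records)                           # N
    item_counts = Counter(item for r in records for item in r)           # n(w)
    record_counts = Counter(item for r in records for item in set(r))    # Dw
    dn = len(records)                                                    # Dn
    return {
        w: (item_counts[w] / total_items)
        * (math.log((dn + 1) / (record_counts[w] + 1)) + 1)
        for w in item_counts
    }


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_attribute_values(records: List[List[str]]) -> List[float]:
    """Mean cosine similarity of each record to all other records."""
    weights = tfidf_weights(records)
    vocabulary = sorted(weights)
    vectors = [
        [weights[w] if w in set(r) else 0.0 for w in vocabulary] for r in records
    ]
    values = []
    for i, vec in enumerate(vectors):
        sims = [cosine_similarity(vec, other) for j, other in enumerate(vectors) if j != i]
        values.append(sum(sims) / len(sims) if sims else 0.0)
    return values


# Hypothetical standard maintenance items for three claims of one vehicle model.
claims = [["bumper", "headlamp"], ["bumper", "headlamp"], ["roof"]]
values = similarity_attribute_values(claims)
```

In this toy input the first two claims share all items, so each gets a value of about 0.5, while the third shares nothing with the others and gets 0.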
S14, determining a risk identification threshold according to the real risk marks of each historical standard vehicle risk data and the historical similar attribute values.
According to the corresponding relation between the historical standard vehicle insurance data and the risk mark (real risk mark), the historical standard vehicle insurance data without fraud risk can be divided into a group, the historical standard vehicle insurance data with fraud risk is divided into a group, the first data distribution of the historical similar attribute value corresponding to the historical standard vehicle insurance data without fraud risk and the second data distribution of the historical similar attribute value corresponding to the historical standard vehicle insurance data with fraud risk are displayed on the electronic equipment, and therefore the risk identification threshold is determined according to the first data distribution and the second data distribution.
In an optional embodiment, the determining the risk identification threshold according to the real risk flag and the historical similarity attribute value of each of the historical standard vehicle risk data includes:
randomly generating a plurality of random numbers, and taking each random number as a candidate risk identification threshold;
for each candidate risk identification threshold, when the historical similar attribute value is larger than the candidate risk identification threshold, the obtained comparison result is that the predicted risk corresponding to the historical similar attribute value is marked as no fraud risk; when the historical similar attribute value is smaller than the candidate risk identification threshold value, the obtained comparison result is that the predicted risk corresponding to the historical similar attribute value is marked as a fraud risk;
Acquiring a first number of historical similar attribute values, wherein the corresponding real risk is marked as no fraud risk, and the corresponding predicted risk is marked as no fraud risk, in the historical similar attribute values;
acquiring a second number of historical similar attribute values in which the corresponding real risk is marked as having a fraud risk and the corresponding predicted risk is marked as having a fraud risk;
and determining a target risk identification threshold value in the candidate risk identification threshold values according to the first quantity and the second quantity.
Illustratively, assume that the historical similar attribute values include: D10, D11, D12, D13, D14, D15, D16, D17, D18, D19, wherein the real risk corresponding to the historical similar attribute values D10-D15 is marked as no fraud risk and the real risk corresponding to the historical similar attribute values D16-D19 is marked as fraud risk. The generated random numbers are R1 and R2, so R1 and R2 are used as candidate risk identification thresholds.
For the candidate risk identification threshold R1, each of the historical similar attribute values D10-D19 is compared with R1. If the historical similar attribute values D10-D16 are all greater than R1 and the historical similar attribute values D17-D19 are all less than R1, the predicted risk corresponding to D10-D16 is marked as no fraud risk and the predicted risk corresponding to D17-D19 is marked as fraud risk. Among D10-D16, both the predicted risk and the real risk corresponding to D10-D15 are marked as no fraud risk, so D10-D15 are correctly classified by the candidate risk identification threshold R1, and the first number is 6. Among D17-D19, both the predicted risk and the real risk of each value are marked as fraud risk, so D17-D19 are correctly classified by R1, and the second number is 3. The predicted risk corresponding to D16 is marked as no fraud risk while its real risk is marked as fraud risk, so the historical similar attribute value D16 is misclassified by the candidate risk identification threshold R1.
For the candidate risk identification threshold R2, each of the historical similar attribute values D10-D19 is compared with R2. If the historical similar attribute values D11-D18 are all greater than R2 and the historical similar attribute values D10 and D19 are both less than R2, the predicted risk corresponding to D11-D18 is marked as no fraud risk and the predicted risk corresponding to D10 and D19 is marked as fraud risk. Among D11-D18, both the predicted risk and the real risk corresponding to D11-D15 are marked as no fraud risk, so D11-D15 are correctly classified by the candidate risk identification threshold R2, and the first number is 5. Among D10 and D19, both the predicted risk and the real risk corresponding to D19 are marked as fraud risk, so D19 is correctly classified by R2, and the second number is 1. The predicted risk corresponding to D10 is marked as fraud risk while its real risk is marked as no fraud risk, and the predicted risk corresponding to D16-D18 is marked as no fraud risk while their real risk is marked as fraud risk, so the historical similar attribute values D10 and D16-D18 are misclassified by the candidate risk identification threshold R2.
It can be seen that, for the candidate risk identification threshold R1, the number of correctly classified values is the sum of the first number 6 and the second number 3, that is, 9; for the candidate risk identification threshold R2, it is the sum of the first number 5 and the second number 1, that is, 6. Since 9 is greater than 6, the candidate risk identification threshold R1 is determined as the target risk identification threshold.
When the historical similar attribute values are classified using the target risk identification threshold, the number of historical similar attribute values assigned to the correct category is the largest.
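The threshold selection described above can be sketched as follows. This is only a minimal illustration: the function names (`count_correct`, `select_threshold`, `random_candidates`), the numeric attribute values, and the two candidate thresholds are hypothetical and are not taken from the source. The values are chosen so that the first candidate reproduces the counts 6 and 3 from the R1 example.

```python
import random


def count_correct(values, is_fraud, threshold):
    """Return (first_number, second_number) for one candidate threshold.

    first_number:  values above the threshold whose real mark is "no fraud risk".
    second_number: values below the threshold whose real mark is "fraud risk".
    """
    first = sum(1 for v, f in zip(values, is_fraud) if v > threshold and not f)
    second = sum(1 for v, f in zip(values, is_fraud) if v < threshold and f)
    return first, second


def select_threshold(values, is_fraud, candidates):
    """Pick the candidate that classifies the most values correctly."""
    return max(candidates, key=lambda t: sum(count_correct(values, is_fraud, t)))


def random_candidates(k, seed=None):
    """Draw k random candidate thresholds in [0, 1) (range is an assumption)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(k)]


# Hypothetical similar attribute values standing in for D10..D19.
values = [0.90, 0.85, 0.80, 0.78, 0.75, 0.70, 0.65, 0.40, 0.35, 0.30]
is_fraud = [False] * 6 + [True] * 4  # D10-D15: no fraud risk, D16-D19: fraud risk

r1, r2 = 0.50, 0.33  # fixed here for reproducibility instead of random draws
print(count_correct(values, is_fraud, r1))           # (6, 3)
print(count_correct(values, is_fraud, r2))           # (6, 1)
print(select_threshold(values, is_fraud, [r1, r2]))  # 0.5
```

In practice the candidates would come from `random_candidates`; fixed candidates are used here only so that the counts can be checked by hand.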
S15, performing risk identification on the vehicle insurance data to be tested according to the risk identification threshold to obtain a risk mark identification result.
The vehicle insurance data to be tested are vehicle insurance data needing to be subjected to risk auditing.
The risk mark identification result is obtained by calculating the similarity between the vehicle insurance data to be tested and the historical standard vehicle insurance data and comparing the similarity with the risk identification threshold.
In an optional embodiment, the performing risk identification on the vehicle risk data to be detected according to the risk identification threshold value, and obtaining a risk marking identification result includes:
determining the type of the vehicle to be detected in the vehicle risk data to be detected;
acquiring a plurality of target historical standard vehicle insurance data corresponding to the vehicle type of the vehicle to be tested;
generating standard vehicle insurance data to be tested from the vehicle insurance data to be tested;
calculating the similarity to be tested between the standard vehicle insurance data to be tested and each target historical standard vehicle insurance data;
obtaining a to-be-detected similarity attribute value of the to-be-detected vehicle insurance data according to the average value of the to-be-detected similarities;
comparing the similarity attribute value to be detected with the risk identification threshold value to obtain a comparison result;
and obtaining the risk mark recognition result according to the comparison result.
Because different vehicle types correspond to different historical vehicle risk data, correspond to different historical standard vehicle risk data and correspond to different risk identification thresholds, when risk marking identification is carried out on the vehicle risk data to be tested, the vehicle types to be tested in the vehicle risk data to be tested are required to be acquired first, and therefore a plurality of target historical standard vehicle risk data are determined according to the vehicle types to be tested.
The electronic equipment acquires a standard maintenance item mapping table corresponding to the vehicle insurance data to be tested according to the vehicle type of the vehicle to be tested, so that after a plurality of maintenance items to be tested in the vehicle insurance data to be tested are extracted, the plurality of maintenance items to be tested are mapped according to the standard maintenance item mapping table to obtain a plurality of standard maintenance items to be tested, and finally the plurality of standard maintenance items to be tested are combined to obtain the standard vehicle insurance data to be tested.
According to the standard vehicle insurance data to be tested and each target historical standard vehicle insurance data in the plurality of target historical standard vehicle insurance data, one similarity to be tested can be calculated, the plurality of target historical standard vehicle insurance data correspond to the plurality of similarities to be tested, and the average value of the similarities to be tested is determined to be the similarity attribute value to be tested of the vehicle insurance data to be tested. The similarity between the standard vehicle insurance data to be tested and the target historical standard vehicle insurance data can be the cosine distance between the standard vehicle insurance data to be tested and the target historical standard vehicle insurance data.
The comparison result is either that the similarity attribute value to be tested is larger than the risk identification threshold or that it is smaller than the risk identification threshold.
And when the comparison result is that the similarity attribute value to be detected is larger than the risk identification threshold value, the obtained risk mark identification result is a preset first risk mark identification result. And when the comparison result is that the similarity attribute value to be detected is smaller than the risk identification threshold value, the obtained risk mark identification result is a preset second risk mark identification result. The preset first risk mark identification result may be no fraud risk, and the preset second risk mark identification result may be fraud risk.
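The identification step for one piece of vehicle insurance data to be tested can be sketched as below. This is a hedged illustration, not the patented implementation: it assumes the feature vectors have already been built, it uses cosine similarity (larger means more similar) for the similarity measure, and the function name `identify_risk` and the "undetermined" result for an exact tie are assumptions, since the source does not say what happens when the value equals the threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_risk(test_vec, target_vecs, threshold):
    """Average the similarities to all target vectors and compare with the threshold."""
    if not target_vecs:
        raise ValueError("at least one target historical vector is required")
    sims = [cosine_similarity(test_vec, t) for t in target_vecs]
    attr = sum(sims) / len(sims)  # similarity attribute value to be tested
    if attr > threshold:
        return attr, "no fraud risk"   # preset first result
    if attr < threshold:
        return attr, "fraud risk"      # preset second result
    return attr, "undetermined"        # tie handling is an assumption


targets = [[1.0, 0.0], [1.0, 0.0]]  # hypothetical target historical features
print(identify_risk([1.0, 0.0], targets, 0.5)[1])  # no fraud risk
print(identify_risk([0.0, 1.0], targets, 0.5)[1])  # fraud risk
```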
In vehicle insurance claim and maintenance applications, multiple parties (loss assessor, repair shop, claimant) are involved, so maintenance items are easily over-reported, which constitutes vehicle insurance claim fraud. In the prior art, vehicle insurance claim fraud is identified mainly through rules based on expert experience. However, this approach relies on the developers' understanding of and experience with the business, and it also incurs significant development and maintenance costs, because useful information must be extracted manually from massive data and combined into rules. The vehicle insurance claim fraud risk identification method of the invention adaptively learns the risk identification thresholds of different vehicle types from the plurality of historical vehicle insurance data corresponding to each vehicle type and uses them as the criterion for judging vehicle insurance claim fraud risk. The method can therefore be extended to fraud risk identification scenarios for different vehicle types without repeatedly custom-developing rule combinations for new vehicle types, which reduces the labor cost of development and maintenance and makes the method easy to deploy online.
Example two
Fig. 2 is a block diagram of a device for identifying risk of fraud in a vehicle insurance claim according to an embodiment of the present invention.
In some embodiments, the vehicle insurance claim fraud risk identification apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer program of the individual program segments in the vehicle insurance claim fraud risk identification means 20 may be stored in a memory of the electronic device and executed by at least one processor to perform (see fig. 1 for details) the functions of vehicle insurance claim fraud risk identification.
In this embodiment, the risk identification device 20 may be divided into a plurality of functional modules according to the functions performed by the risk identification device. The functional module may include: the system comprises an acquisition module 201, a cleaning module 202, a mapping module 203, a calculation module 204, a determination module 205 and an identification module 206. The module referred to in the present invention refers to a series of computer program segments capable of being executed by at least one processor and of performing a fixed function, stored in a memory. In the present embodiment, the functions of the respective modules will be described in detail in the following embodiments.
The acquiring module 201 is configured to acquire a plurality of historical vehicle risk data corresponding to each vehicle model.
The vehicle model, that is, the model of the vehicle, is a number composed of pinyin letters and arabic numerals, which is designated for one type of vehicle in order to identify the vehicle. The vehicle model may include: small, mini, compact, medium, advanced, luxury, three-compartment, CDV, MPV, SUV, etc.
When a maintenance claim needs to be made for a vehicle, a user can report vehicle insurance data to the electronic equipment through a user terminal; the user may be the vehicle owner or another person using the vehicle. The vehicle insurance data may include, but is not limited to: user contact information, vehicle model, vehicle license plate, and the maintenance items for which an insurance claim is requested. After receiving the vehicle insurance data reported by the user terminal, the electronic equipment stores it in a local vehicle insurance database, so that a claim settlement worker can perform a risk audit on the vehicle insurance data in the database and, after the audit is completed, mark the vehicle insurance data with a risk label. For example, when the vehicle insurance data passes the risk audit, it is marked with a no-fraud-risk label and the claim for the corresponding vehicle is settled. When the vehicle insurance data fails the risk audit, it is marked with a fraud-risk label and the claim for the corresponding vehicle is not settled.
For convenience of description, the vehicle insurance data that has undergone risk audit and is stored in the vehicle insurance database of the electronic equipment is referred to as historical vehicle insurance data. It should be understood that the vehicle insurance database of the electronic equipment stores the correspondence between the plurality of historical vehicle insurance data and the risk labels, and each row in the vehicle insurance database includes one piece of historical vehicle insurance data and its corresponding risk label.
The electronic equipment classifies the historical vehicle insurance data in the vehicle insurance database according to the vehicle types to obtain a plurality of historical vehicle insurance data corresponding to each vehicle type.
The cleaning module 202 is configured to perform data cleaning on a plurality of historical vehicle risk data corresponding to each vehicle model, so as to obtain a plurality of cleaning vehicle risk data.
In an optional embodiment, the data cleaning the plurality of historical vehicle risk data corresponding to each vehicle model, and obtaining the plurality of cleaned vehicle risk data includes:
acquiring the number of data of the plurality of historical vehicle insurance data;
dividing the plurality of historical vehicle insurance data according to a preset data quantity coefficient and the number of data pieces to obtain a plurality of historical vehicle insurance data subsets;
and carrying out data cleaning on the historical vehicle insurance data in each historical vehicle insurance data subset to obtain the plurality of cleaning vehicle insurance data.
The preset data quantity coefficient is the number of pieces of data, preset by the electronic equipment, in each historical vehicle insurance data subset. For example, assuming that the preset data quantity coefficient is X, the plurality of historical vehicle insurance data is divided into Y historical vehicle insurance data subsets according to the preset data quantity coefficient and the number of data pieces. Each of the first Y-1 historical vehicle insurance data subsets stores X pieces of data and the last historical vehicle insurance data subset stores R pieces of data, so the number of data pieces of the plurality of historical vehicle insurance data is Num = X × (Y-1) + R.
In the embodiment of the invention, the number of the data pieces of the historical vehicle insurance data can be calculated in a row unit.
For each vehicle model, acquiring the number of data pieces of a plurality of historical vehicle insurance data of the vehicle model, and dividing the plurality of historical vehicle insurance data according to a preset data quantity coefficient and the number of data pieces to obtain a plurality of historical vehicle insurance data subsets; and then, the historical vehicle insurance data in the historical vehicle insurance data subset is synchronously processed by taking the historical vehicle insurance data subset as a unit, so that the data cleaning efficiency is improved. And collecting data obtained after data cleaning is carried out on each historical vehicle insurance data subset, so that a plurality of cleaning vehicle insurance data of the vehicle type can be obtained.
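The division into subsets can be illustrated with a short sketch. The function name `split_into_subsets` and the sample record count are hypothetical; the only thing taken from the text is the relation Num = X × (Y-1) + R.

```python
def split_into_subsets(records, x):
    """Split records into consecutive subsets of at most x rows each."""
    if x <= 0:
        raise ValueError("the data quantity coefficient must be positive")
    return [records[i:i + x] for i in range(0, len(records), x)]


records = list(range(10))          # stand-in for 10 rows of historical data
subsets = split_into_subsets(records, 4)

x, y, r = 4, len(subsets), len(subsets[-1])
print([len(s) for s in subsets])   # [4, 4, 2]
print(x * (y - 1) + r)             # 10
```

Each subset could then be handed to a separate worker for cleaning; how the work is scheduled is left open here because the source does not specify it.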
In an optional embodiment, the performing data cleansing on the historical vehicle insurance data in each of the historical vehicle insurance data subsets, and obtaining the plurality of cleansing vehicle insurance data includes:
performing format conversion on the historical vehicle insurance data in each historical vehicle insurance data subset according to a preset data format to obtain historical vehicle insurance quality data;
detecting the data type of the historical vehicle insurance quality data;
determining a data cleaning rule corresponding to the data type;
And carrying out data cleaning on the corresponding historical vehicle insurance quality data according to the data cleaning rule to obtain the plurality of cleaning vehicle insurance data.
The historical vehicle insurance data has non-uniform data format due to different reporting modes of users, and the data needs to be converted into data with uniform format to realize the compatibility of the data, and meanwhile, the analysis requirement of vehicle insurance claim settlement application is met.
The data types of the historical risk quality data may be classified into a missing value, an abnormal value, and a repeated value. If the detected historical vehicle insurance quality data has missing data content, judging that the data type of the historical vehicle insurance quality data is a missing value; if the detected historical vehicle insurance quality data has abnormal data content, judging that the data type of the historical vehicle insurance quality data is an abnormal value; and if the detected historical vehicle insurance quality data have two or more identical data contents, judging that the data type of the historical vehicle insurance quality data is a repeated value.
If the data type of the historical vehicle insurance quality data is judged to be a missing value, determining a data cleaning rule corresponding to the historical vehicle insurance quality data to be filling or rejecting processing; if the data type of the historical vehicle insurance quality data is judged to be an abnormal value, determining a data cleaning rule corresponding to the historical vehicle insurance quality data to be filtering processing; and if the data type of the historical vehicle insurance quality data is judged to be a repeated value, determining a data cleaning rule corresponding to the historical vehicle insurance quality data as cleaning processing.
According to the optional implementation mode, the data format conversion is carried out on the historical vehicle insurance data, after the data type of the historical vehicle insurance quality data is detected, the data cleaning rule is determined according to the data type of the historical vehicle insurance quality data, and the data cleaning is carried out on the historical vehicle insurance quality data according to the data cleaning rule, so that the data cleaning of the historical vehicle insurance data is realized, chaotic data are converted into clean and tidy high-quality data, the usability of big data is enhanced, and the cleaning effect and the cleaning efficiency of the data cleaning are improved.
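One possible reading of the three cleaning rules is sketched below. The record layout (`model`, `items`), the choice to reject rather than fill missing values, and the `max_items` bound used to flag abnormal records are all assumptions made for illustration; the source names the rule categories but not their concrete form.

```python
def clean_records(records, required=("model", "items"), max_items=50):
    """Apply one rule per data type: missing, abnormal, repeated."""
    cleaned, seen = [], set()
    for rec in records:
        # Missing value: reject the record (the source also allows filling).
        if any(rec.get(key) in (None, "", []) for key in required):
            continue
        # Abnormal value: filter out records with an implausible item count
        # (this particular bound is a hypothetical rule).
        if len(rec["items"]) > max_items:
            continue
        # Repeated value: keep only the first occurrence.
        fingerprint = (rec["model"], tuple(rec["items"]))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(rec)
    return cleaned


raw = [
    {"model": "SUV", "items": ["bumper"]},
    {"model": "SUV", "items": ["bumper"]},   # repeated value
    {"model": "", "items": ["door"]},        # missing value
    {"model": "MPV", "items": ["x"] * 60},   # abnormal value
]
print(len(clean_records(raw)))  # 1
```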
The mapping module 203 is configured to process the historical vehicle risk data to obtain historical standard vehicle risk data.
After the electronic equipment acquires a plurality of historical vehicle insurance data corresponding to each vehicle type, the historical vehicle insurance data are processed to obtain historical standard vehicle insurance data, so that a risk identification threshold value is conveniently and adaptively determined based on the marked vehicle insurance data.
It is understood that after data cleaning is performed on a plurality of historical vehicle insurance data corresponding to each vehicle type to obtain a plurality of cleaned vehicle insurance data, the electronic device can process each cleaned vehicle insurance data to obtain historical standard vehicle insurance data.
In an optional embodiment, the processing the historical vehicle insurance data to obtain historical standard vehicle insurance data includes:
acquiring a standard maintenance project mapping table corresponding to the historical vehicle insurance data according to the vehicle type;
extracting a plurality of maintenance items in the historical vehicle insurance data;
mapping the plurality of maintenance items according to the standard maintenance item mapping table to obtain a plurality of standard maintenance items;
and combining the plurality of standard maintenance projects to obtain historical standard vehicle insurance data.
Different vehicle types correspond to different standard maintenance project mapping tables, and standard names of parts of the corresponding vehicle types are recorded in the standard maintenance project mapping tables.
Because the inputs of different users to the same parts are different, each maintenance item needs to be mapped according to the standard maintenance item mapping table, so that the standard maintenance item is obtained, a plurality of standard maintenance items corresponding to each historical vehicle insurance data are combined, and the historical standard vehicle insurance data are obtained. And each historical vehicle insurance data corresponds to one historical standard vehicle insurance data.
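The mapping step can be sketched as a dictionary lookup. The mapping table contents, the normalization by trimming and lower-casing, and the decision to keep an unmapped item unchanged are assumptions for illustration only.

```python
def standardize(items, mapping):
    """Map each reported maintenance item to its standard name.

    Items without an entry in the mapping table are kept as reported
    (an assumption; the source does not cover this case).
    """
    return [mapping.get(item.strip().lower(), item) for item in items]


# Hypothetical standard maintenance item mapping table for one vehicle type.
suv_mapping = {
    "front bumper": "front bumper assembly",
    "bumper (front)": "front bumper assembly",
    "left headlamp": "left headlight",
}

reported = ["Front Bumper ", "left headlamp"]
print(standardize(reported, suv_mapping))
# ['front bumper assembly', 'left headlight']
```

The resulting list of standard maintenance items plays the role of one piece of historical standard vehicle insurance data.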
The calculating module 204 is configured to calculate a historical similar attribute value of each of the historical standard vehicle insurance data.
Each vehicle model corresponds to a plurality of historical vehicle insurance data, each historical vehicle insurance data corresponds to one historical standard vehicle insurance data, each vehicle model corresponds to a plurality of historical standard vehicle insurance data, and according to the plurality of historical standard vehicle insurance data, similar attribute values of each historical standard vehicle insurance data can be obtained, so that historical similar attribute values of each historical vehicle insurance data are obtained.
In an alternative embodiment, said calculating a historical similarity attribute value for each of said historical standard vehicle risk data includes:
extracting features of the historical standard vehicle insurance data to obtain vehicle insurance data features;
calculating the similarity between every two vehicle insurance data features;
calculating a similarity mean value according to a plurality of similarities corresponding to each vehicle insurance data feature;
and determining the similarity mean value as a history similarity attribute value of the corresponding history standard vehicle insurance data.
In this embodiment, the feature extraction of the historical standard vehicle insurance data may include: calculating the TF-IDF value of each standard maintenance item among the plurality of standard maintenance items in the historical standard vehicle insurance data, and combining the TF-IDF values into a vector that serves as the vehicle insurance data feature. Here $w$ denotes a standard maintenance item and $\mathrm{TF}(w) = n(w)/N$, where $n(w)$ is the number of times the standard maintenance item appears among all standard maintenance items corresponding to the vehicle model and $N$ is the total number of maintenance items in the plurality of historical standard vehicle insurance data corresponding to the vehicle model. Further, $\mathrm{IDF}(w) = \log\big((D_n + 1)/(D_w + 1)\big) + 1$, where $D_n$ is the number of all vehicles and $D_w$ is the number of times the maintenance item appears in all vehicles. Each vehicle insurance data feature is a vector of dimension $1 \times D_n$.
The similarity between the two vehicle insurance data features may be a cosine distance between the two vehicle insurance data features.
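The feature extraction and the historical similar attribute value can be sketched together as follows. Several points are assumptions rather than statements from the source: each feature vector here has one entry per distinct standard maintenance item (the text instead states a dimension tied to the number of vehicles), the TF-IDF value is taken as the product TF(w) · IDF(w), each record is treated as one vehicle when counting Dn and Dw, and cosine similarity is used as the similarity measure.

```python
import math
from collections import Counter


def tfidf_features(docs):
    """Build one TF-IDF vector per record (a record is a list of standard items)."""
    vocab = sorted({w for d in docs for w in d})
    n = Counter(w for d in docs for w in d)        # n(w): occurrences over all items
    total = sum(n.values())                        # N: total number of items
    dn = len(docs)                                 # Dn: number of records
    dw = Counter(w for d in docs for w in set(d))  # Dw: records containing w
    weight = {
        w: (n[w] / total) * (math.log((dn + 1) / (dw[w] + 1)) + 1) for w in vocab
    }
    return [[weight[w] if w in d else 0.0 for w in vocab] for d in docs]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_attribute_values(features):
    """Mean similarity of each feature vector to all the other vectors."""
    result = []
    for i, a in enumerate(features):
        sims = [cosine_similarity(a, b) for j, b in enumerate(features) if j != i]
        result.append(sum(sims) / len(sims) if sims else 0.0)
    return result


docs = [["bumper", "headlight"], ["bumper", "headlight"], ["engine"]]
attrs = similar_attribute_values(tfidf_features(docs))
print([round(v, 3) for v in attrs])  # [0.5, 0.5, 0.0]
```

The two identical records are fully similar to each other and share no item with the third, so each averages to 0.5, while the third record averages to 0.0.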
The determining module 205 is configured to determine a risk identification threshold according to the real risk flag and the historical similar attribute value of each historical standard vehicle risk data.
According to the correspondence between the historical standard vehicle insurance data and the risk marks (real risk marks), the historical standard vehicle insurance data without fraud risk can be placed in one group and the historical standard vehicle insurance data with fraud risk in another. The electronic equipment then displays a first data distribution of the historical similar attribute values corresponding to the data without fraud risk and a second data distribution of the historical similar attribute values corresponding to the data with fraud risk, so that the risk identification threshold can be determined according to the first data distribution and the second data distribution.
In an optional embodiment, the determining the risk identification threshold according to the real risk flag and the historical similarity attribute value of each of the historical standard vehicle risk data includes:
randomly generating a plurality of random numbers, and taking each random number as a candidate risk identification threshold;
For each candidate risk identification threshold, when the historical similar attribute value is larger than the candidate risk identification threshold, the obtained comparison result is that the predicted risk corresponding to the historical similar attribute value is marked as no fraud risk; when the historical similar attribute value is smaller than the candidate risk identification threshold value, the obtained comparison result is that the predicted risk corresponding to the historical similar attribute value is marked as a fraud risk;
acquiring a first number of historical similar attribute values, wherein the corresponding real risk is marked as no fraud risk, and the corresponding predicted risk is marked as no fraud risk, in the historical similar attribute values;
acquiring a second number of historical similar attribute values in which the corresponding real risk is marked as having a fraud risk and the corresponding predicted risk is marked as having a fraud risk;
and determining a target risk identification threshold value in the candidate risk identification threshold values according to the first quantity and the second quantity.
Illustratively, assume that the historical similar attribute values include: D10, D11, D12, D13, D14, D15, D16, D17, D18, D19, wherein the real risk corresponding to the historical similar attribute values D10-D15 is marked as no fraud risk and the real risk corresponding to the historical similar attribute values D16-D19 is marked as fraud risk. The generated random numbers are R1 and R2, so R1 and R2 are used as candidate risk identification thresholds.
For the candidate risk identification threshold R1, each of the historical similar attribute values D10-D19 is compared with R1. If the historical similar attribute values D10-D16 are all greater than R1 and the historical similar attribute values D17-D19 are all less than R1, the predicted risk corresponding to D10-D16 is marked as no fraud risk and the predicted risk corresponding to D17-D19 is marked as fraud risk. Among D10-D16, both the predicted risk and the real risk corresponding to D10-D15 are marked as no fraud risk, so D10-D15 are correctly classified by the candidate risk identification threshold R1, and the first number is 6. Among D17-D19, both the predicted risk and the real risk of each value are marked as fraud risk, so D17-D19 are correctly classified by R1, and the second number is 3. The predicted risk corresponding to D16 is marked as no fraud risk while its real risk is marked as fraud risk, so the historical similar attribute value D16 is misclassified by the candidate risk identification threshold R1.
For the candidate risk identification threshold R2, each of the historical similar attribute values D10-D19 is compared with R2. If the historical similar attribute values D11-D18 are all greater than R2 and the historical similar attribute values D10 and D19 are both less than R2, the predicted risk corresponding to D11-D18 is marked as no fraud risk and the predicted risk corresponding to D10 and D19 is marked as fraud risk. Among D11-D18, both the predicted risk and the real risk corresponding to D11-D15 are marked as no fraud risk, so D11-D15 are correctly classified by the candidate risk identification threshold R2, and the first number is 5. Among D10 and D19, both the predicted risk and the real risk corresponding to D19 are marked as fraud risk, so D19 is correctly classified by R2, and the second number is 1. The predicted risk corresponding to D10 is marked as fraud risk while its real risk is marked as no fraud risk, and the predicted risk corresponding to D16-D18 is marked as no fraud risk while their real risk is marked as fraud risk, so the historical similar attribute values D10 and D16-D18 are misclassified by the candidate risk identification threshold R2.
It can be seen that, for the candidate risk identification threshold R1, the number of correctly classified values is the sum of the first number 6 and the second number 3, that is, 9; for the candidate risk identification threshold R2, it is the sum of the first number 5 and the second number 1, that is, 6. Since 9 is greater than 6, the candidate risk identification threshold R1 is determined as the target risk identification threshold.
When the historical similar attribute values are classified using the target risk identification threshold, the number of historical similar attribute values assigned to the correct category is the largest.
The identifying module 206 is configured to perform risk identification on the vehicle risk data to be tested according to the risk identification threshold value, so as to obtain a risk marking identification result.
The vehicle insurance data to be tested are the vehicle insurance data that require a risk audit.
The risk marking identification result is obtained by calculating the similarity between the vehicle insurance data to be tested and the historical standard vehicle insurance data, and comparing that similarity with the risk identification threshold.
In an optional embodiment, the performing risk identification on the vehicle risk data to be detected according to the risk identification threshold value, and obtaining a risk marking identification result includes:
determining the type of the vehicle to be detected in the vehicle risk data to be detected;
acquiring a plurality of target historical standard vehicle insurance data corresponding to the vehicle type of the vehicle to be tested;
generating historical standard vehicle insurance data to be tested for the vehicle insurance data to be tested;
calculating the similarity to be detected between the historical standard vehicle insurance data to be detected and each target historical standard vehicle insurance data;
obtaining a to-be-detected similarity attribute value of the to-be-detected vehicle insurance data according to the average value of the to-be-detected similarities;
comparing the similarity attribute value to be detected with the risk identification threshold value to obtain a comparison result;
and obtaining the risk mark recognition result according to the comparison result.
Because different vehicle types correspond to different historical vehicle insurance data, different historical standard vehicle insurance data, and different risk identification thresholds, the vehicle type to be tested must first be obtained from the vehicle insurance data to be tested when risk marking identification is performed, so that the plurality of target historical standard vehicle insurance data can be determined according to that vehicle type.
The electronic equipment acquires a standard maintenance item mapping table corresponding to the vehicle insurance data to be tested according to the vehicle type of the vehicle to be tested, so that after a plurality of maintenance items to be tested in the vehicle insurance data to be tested are extracted, the plurality of maintenance items to be tested are mapped according to the standard maintenance item mapping table to obtain a plurality of standard maintenance items to be tested, and finally the plurality of standard maintenance items to be tested are combined to obtain the standard vehicle insurance data to be tested.
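The mapping step described above can be illustrated with a short Python sketch. The mapping table contents, the function name `standardize_items`, the text normalization, and the choice to skip items that are missing from the table are all assumptions made for illustration; the source only states that maintenance items are mapped through a standard maintenance item mapping table and then combined.

```python
from typing import Dict, List


def standardize_items(raw_items: List[str], mapping: Dict[str, str]) -> List[str]:
    """Map raw maintenance items to standard items and combine them."""
    standard: List[str] = []
    for item in raw_items:
        key = item.strip().lower()  # normalize spacing and case before lookup
        # Items missing from the table are skipped here (an assumption).
        if key in mapping and mapping[key] not in standard:
            standard.append(mapping[key])
    # Sorting gives a stable combined representation of the record.
    return sorted(standard)


# Hypothetical standard maintenance item mapping table for one vehicle type.
MAPPING_TABLE = {
    "front bumper repl.": "front bumper replacement",
    "replace front bumper": "front bumper replacement",
    "left headlamp": "left headlight replacement",
}

record = ["Replace front bumper", "Left headlamp ", "front bumper repl."]
standard_record = standardize_items(record, MAPPING_TABLE)
```

In this sketch, two differently worded entries for the same repair collapse into one standard maintenance item, which is what makes records from different repair shops comparable.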
One similarity to be tested can be calculated between the standard vehicle insurance data to be tested and each of the plurality of target historical standard vehicle insurance data, so the plurality of target historical standard vehicle insurance data yield a plurality of similarities to be tested; the average of these similarities is determined as the similarity attribute value to be tested of the vehicle insurance data to be tested. The similarity between the standard vehicle insurance data to be tested and the target historical standard vehicle insurance data can be the cosine distance between the two.
The comparison result is either that the similarity attribute value to be tested is greater than the risk identification threshold, or that the similarity attribute value to be tested is less than the risk identification threshold.
When the comparison result is that the similarity attribute value to be tested is greater than the risk identification threshold, the obtained risk marking identification result is a preset first risk marking identification result. When the comparison result is that the similarity attribute value to be tested is less than the risk identification threshold, the obtained risk marking identification result is a preset second risk marking identification result. The preset first risk marking identification result may be no fraud risk, and the preset second risk marking identification result may be fraud risk.
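The identification step can be sketched in Python as follows. Several points are assumptions rather than details from the source: each record is represented as a numeric vector over standard maintenance items, the similarity is computed as cosine similarity (so that larger values mean more similar records, which is one reading of the "cosine distance" mentioned above), and a value exactly equal to the threshold is treated as fraud risk because the text does not cover that case.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_attribute(test_vec: Sequence[float],
                         history: List[Sequence[float]]) -> float:
    """Average similarity between the tested record and each target historical record."""
    if not history:
        raise ValueError("at least one target historical record is required")
    sims = [cosine_similarity(test_vec, h) for h in history]
    return sum(sims) / len(sims)


def identify(test_vec: Sequence[float], history: List[Sequence[float]],
             threshold: float) -> str:
    """Compare the similarity attribute value with the risk identification threshold."""
    value = similarity_attribute(test_vec, history)
    # Equality is treated as fraud risk here (an assumption).
    return "no fraud risk" if value > threshold else "fraud risk"


# Hypothetical vectors over three standard maintenance items.
history = [[1, 1, 0], [1, 0, 0]]
typical = identify([1, 1, 0], history, threshold=0.5)
unusual = identify([0, 0, 1], history, threshold=0.5)
```

A record whose maintenance items resemble those of historical records for the same vehicle type scores above the threshold, while a record with items that rarely appear in the history scores below it.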
In vehicle insurance claim and repair scenarios, because multiple parties are involved (loss assessor / repair shop / claimant), maintenance items are easily over-reported, which constitutes vehicle insurance claim fraud. In the prior art, vehicle insurance claim fraud is mainly identified by means of expert experience. However, this approach not only relies on the developer's understanding of and experience with the business, but also incurs significant development and maintenance costs, because useful information has to be extracted manually from massive data and combined. According to the vehicle insurance claim fraud risk identification method, the risk identification thresholds of different vehicle types are adaptively learned from the plurality of historical vehicle insurance data corresponding to each vehicle type and are used as the criterion for judging vehicle insurance claim fraud risk. The method can therefore be extended to vehicle insurance claim fraud risk identification scenarios for different vehicle types without repeatedly developing customized rule combinations for new vehicle types, which reduces the labor cost of development and maintenance and makes the method easy to roll out online.
Example III
An embodiment of the present invention provides a computer readable storage medium having a computer program stored thereon, where the computer program when executed by a processor implements the steps in the embodiment of the method for identifying risk of fraud for vehicle insurance claims, for example, S11-S15 shown in fig. 1:
S11, acquiring a plurality of historical vehicle insurance data corresponding to each vehicle type;
S12, processing the historical vehicle insurance data to obtain historical standard vehicle insurance data;
S13, calculating a historical similar attribute value of each historical standard vehicle insurance data;
S14, determining a risk identification threshold according to the real risk marks of each historical standard vehicle risk data and the historical similar attribute values;
S15, performing risk identification on the vehicle risk data to be detected according to the risk identification threshold value to obtain a risk marking identification result.
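Step S13 above, as detailed in claim 2, computes for each historical record the mean of its similarities to the other records. A minimal Python sketch of that computation is given below, assuming that each record has already been turned into a numeric feature vector, that cosine similarity is the similarity measure, and that a record is not compared with itself; none of these choices is fixed by the text, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def historical_similar_attribute_values(
        features: List[Sequence[float]]) -> List[float]:
    """For each feature vector, average its similarity to every other vector."""
    n = len(features)
    if n < 2:
        raise ValueError("at least two records are needed for pairwise similarity")
    values = []
    for i in range(n):
        # Self-comparison is excluded (an assumption).
        sims = [cosine_similarity(features[i], features[j])
                for j in range(n) if j != i]
        values.append(sum(sims) / len(sims))
    return values


# Hypothetical feature vectors: two alike records and one outlier.
features = [[1, 0], [1, 0], [0, 1]]
attribute_values = historical_similar_attribute_values(features)
```

In this toy input the third record shares nothing with the other two, so it receives the lowest attribute value, which is the kind of record a threshold on this value would flag.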
Alternatively, the computer program, when executed by a processor, performs the functions of the modules/units in the above-described apparatus embodiments, e.g., modules 201-206 in fig. 2:
the acquiring module 201 is configured to acquire a plurality of historical vehicle risk data corresponding to each vehicle model;
the cleaning module 202 is configured to perform data cleaning on a plurality of historical vehicle risk data corresponding to each vehicle model, so as to obtain a plurality of cleaning vehicle risk data;
the mapping module 203 is configured to process the historical vehicle risk data to obtain historical standard vehicle risk data;
the calculating module 204 is configured to calculate a historical similar attribute value of each of the historical standard vehicle insurance data;
the determining module 205 is configured to determine a risk identification threshold according to the real risk flag of each of the historical standard vehicle risk data and the historical similar attribute value;
the identifying module 206 is configured to perform risk identification on the vehicle risk data to be tested according to the risk identification threshold value, so as to obtain a risk marking identification result.
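The cleaning performed by module 202, which claims 6 and 7 describe as splitting the data into subsets and then applying a cleaning rule chosen by data type, might look roughly like the Python sketch below. The interpretation of the preset data quantity coefficient as a maximum subset size, the specific cleaning rules, and all names are assumptions introduced for illustration only.

```python
import math
from typing import Any, Callable, Dict, List


def split_records(records: List[Any], coefficient: int) -> List[List[Any]]:
    """Divide records into subsets of at most `coefficient` records each."""
    # Assumption: the "data quantity coefficient" is read as a subset size.
    if coefficient <= 0:
        raise ValueError("coefficient must be positive")
    count = math.ceil(len(records) / coefficient)
    return [records[i * coefficient:(i + 1) * coefficient] for i in range(count)]


# Hypothetical cleaning rules keyed by the detected data type.
CLEANING_RULES: Dict[type, Callable[[Any], Any]] = {
    str: lambda s: s.strip().lower(),   # normalize text fields
    float: lambda x: round(x, 2),       # normalize monetary amounts
}


def clean_subset(subset: List[Any]) -> List[Any]:
    """Apply the rule matching each value's type; drop missing values."""
    cleaned = []
    for value in subset:
        if value is None:
            continue
        rule = CLEANING_RULES.get(type(value), lambda v: v)
        cleaned.append(rule(value))
    return cleaned


records = [" Bumper ", 120.456, None, "DOOR", 80.0]
subsets = split_records(records, 2)
cleaned = [item for subset in subsets for item in clean_subset(subset)]
```

Splitting first keeps each cleaning pass small, so the subsets could be processed independently or in parallel; whether the invention does so is not stated in this section.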
Example IV
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. In the preferred embodiment of the invention, the electronic device 3 comprises a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the electronic device shown in fig. 3 is not limiting of the embodiments of the present invention, and that either a bus-type configuration or a star-type configuration is possible, and that the electronic device 3 may also include more or less other hardware or software than that shown, or a different arrangement of components.
In some embodiments, the electronic device 3 is a device capable of automatically performing numerical calculation and/or data processing according to a preset or stored instruction, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 3 may further include a client device, where the client device includes, but is not limited to, any electronic product that can interact with a client by way of a keyboard, a mouse, a remote control, a touch pad, or a voice control device, such as a personal computer, a tablet computer, a smart phone, a digital camera, etc.
The electronic device 3 is only an example; other existing or future electronic products that can be adapted to the present invention are also included in the scope of protection of the present invention and are incorporated herein by reference.
In some embodiments, the memory 31 has stored therein a computer program that, when executed by the at least one processor 32, performs all or part of the steps in the vehicle insurance claim fraud risk identification method as described. The memory 31 includes Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic tape memory, or any other computer-readable medium that can be used to carry or store data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of data for network transactions, for verifying the validity (anti-counterfeiting) of the data and for generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
In some embodiments, the at least one processor 32 is the control unit (Control Unit) of the electronic device 3. It connects the various components of the entire electronic device 3 using various interfaces and lines, and performs the various functions of the electronic device 3 and processes data by running or executing the programs or modules stored in the memory 31 and invoking the data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or part of the steps of the vehicle insurance claim fraud risk identification method described in the embodiments of the present invention, or implements all or part of the functions of the vehicle insurance claim fraud risk identification device. The at least one processor 32 may consist of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like.
In some embodiments, the at least one communication bus 33 is arranged to enable connected communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the electronic device 3 may further comprise a power source (such as a battery) for powering the various components, which may preferably be logically connected to the at least one processor 32 via a power management device, such that functions of managing charging, discharging, and power consumption are performed by the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 3 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described herein.
The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) or a processor (processor) to perform portions of the methods described in the various embodiments of the invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the term "comprising" does not exclude other elements and that the singular does not exclude a plurality. Several of the elements or devices recited in the specification may be embodied by one and the same item of software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A method of identifying risk of fraud in a vehicle insurance claim, the method comprising:
acquiring a plurality of historical vehicle insurance data corresponding to each vehicle type;
processing the historical vehicle insurance data to obtain historical standard vehicle insurance data;
calculating a historical similar attribute value of each historical standard vehicle insurance data;
determining a risk identification threshold according to the real risk mark of each historical standard vehicle risk data and the historical similar attribute value;
and carrying out risk identification on the vehicle risk data to be detected according to the risk identification threshold value to obtain a risk marking identification result.
2. The vehicle insurance claim fraud risk identification method of claim 1, wherein said calculating historical similarity attribute values for each of said historical standard vehicle insurance data includes:
extracting features of the historical standard vehicle insurance data to obtain vehicle insurance data features;
calculating the similarity between every two vehicle insurance data features;
calculating a similarity mean value according to a plurality of similarities corresponding to each vehicle insurance data feature;
and determining the similarity mean value as a history similarity attribute value of the corresponding history standard vehicle insurance data.
3. The vehicle insurance claim fraud risk identification method of claim 2, wherein said determining a risk identification threshold according to the real risk mark of each of said historical standard vehicle insurance data and said historical similar attribute value includes:
randomly generating a plurality of random numbers, and taking each random number as a candidate risk identification threshold;
for each candidate risk identification threshold, when the historical similar attribute value is larger than the candidate risk identification threshold, the obtained comparison result is that the predicted risk corresponding to the historical similar attribute value is marked as no fraud risk; when the historical similar attribute value is smaller than the candidate risk identification threshold value, the obtained comparison result is that the predicted risk corresponding to the historical similar attribute value is marked as a fraud risk;
acquiring a first number of historical similar attribute values whose corresponding real risk is marked as no fraud risk and whose corresponding predicted risk is marked as no fraud risk;
acquiring a second number of historical similar attribute values whose corresponding real risk is marked as having a fraud risk and whose corresponding predicted risk is marked as having a fraud risk;
and determining a target risk identification threshold value in the candidate risk identification threshold values according to the first quantity and the second quantity.
4. The vehicle insurance claim fraud risk identification method of claim 3, wherein the performing risk identification on the vehicle insurance data to be tested according to the risk identification threshold value to obtain a risk marking identification result includes:
determining the type of the vehicle to be detected in the vehicle risk data to be detected;
acquiring a plurality of target historical standard vehicle insurance data corresponding to the vehicle type of the vehicle to be tested;
generating historical standard vehicle insurance data to be tested for the vehicle insurance data to be tested;
calculating the similarity to be detected between the historical standard vehicle insurance data to be detected and each target historical standard vehicle insurance data;
obtaining a to-be-detected similarity attribute value of the to-be-detected vehicle insurance data according to the average value of the to-be-detected similarities;
comparing the similarity attribute value to be detected with the risk identification threshold value to obtain a comparison result;
and obtaining the risk mark recognition result according to the comparison result.
5. The vehicle insurance claim fraud risk identification method of any one of claims 1 to 4, wherein processing the historical vehicle insurance data to obtain historical standard vehicle insurance data comprises:
acquiring a standard maintenance project mapping table corresponding to the historical vehicle insurance data according to the vehicle type;
extracting a plurality of maintenance items in the historical vehicle insurance data;
mapping the plurality of maintenance items according to the standard maintenance item mapping table to obtain a plurality of standard maintenance items;
and combining the plurality of standard maintenance projects to obtain historical standard vehicle insurance data.
6. A vehicle insurance claim fraud risk identification method according to any of claims 1 to 4, characterised in that the method further comprises:
acquiring the number of data of the plurality of historical vehicle insurance data;
dividing the plurality of historical vehicle insurance data according to a preset data quantity coefficient and the number of data pieces to obtain a plurality of historical vehicle insurance data subsets;
and carrying out data cleaning on the historical vehicle insurance data in each historical vehicle insurance data subset to obtain the plurality of cleaning vehicle insurance data.
7. The method of claim 6, wherein the performing data cleansing on historical vehicle insurance data in each subset of historical vehicle insurance data to obtain the plurality of cleansing vehicle insurance data comprises:
performing format conversion on the historical vehicle insurance data in each historical vehicle insurance data subset according to a preset data format to obtain historical vehicle insurance quality data;
detecting the data type of the historical vehicle insurance quality data;
determining a data cleaning rule corresponding to the data type;
and carrying out data cleaning on the corresponding historical vehicle insurance quality data according to the data cleaning rule to obtain the plurality of cleaning vehicle insurance data.
8. A vehicle insurance claim fraud risk identification device, the device comprising:
the acquisition module is used for acquiring a plurality of historical vehicle insurance data corresponding to each vehicle type;
the mapping module is used for processing the historical vehicle insurance data to obtain historical standard vehicle insurance data;
the calculation module is used for calculating the historical similar attribute value of each historical standard vehicle insurance data;
the determining module is used for determining a risk identification threshold according to the real risk mark of each historical standard vehicle risk data and the historical similar attribute value;
and the identification module is used for carrying out risk identification on the vehicle risk data to be detected according to the risk identification threshold value to obtain a risk mark identification result.
9. An electronic device comprising a processor and a memory, wherein the processor is configured to implement the vehicle insurance claim fraud risk identification method according to any of claims 1 to 7 when executing a computer program stored in the memory.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a vehicle insurance claim fraud risk identification method according to any of claims 1 to 7.
CN202310498384.3A 2023-05-05 2023-05-05 Vehicle insurance claim fraud risk identification method, device, electronic equipment and storage medium Pending CN116562894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310498384.3A CN116562894A (en) 2023-05-05 2023-05-05 Vehicle insurance claim fraud risk identification method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310498384.3A CN116562894A (en) 2023-05-05 2023-05-05 Vehicle insurance claim fraud risk identification method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116562894A true CN116562894A (en) 2023-08-08

Family

ID=87489195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310498384.3A Pending CN116562894A (en) 2023-05-05 2023-05-05 Vehicle insurance claim fraud risk identification method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116562894A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117273963A (en) * 2023-11-21 2023-12-22 之江实验室 Risk identification method and device based on car insurance scene
CN117273963B (en) * 2023-11-21 2024-06-07 之江实验室 Risk identification method and device based on car insurance scene

Similar Documents

Publication Publication Date Title
CN113592019B (en) Fault detection method, device, equipment and medium based on multi-model fusion
CN112016905B (en) Information display method and device based on approval process, electronic equipment and medium
CN111652279A (en) Behavior evaluation method and device based on time sequence data and readable storage medium
CN115146865A (en) Task optimization method based on artificial intelligence and related equipment
CN111984898A (en) Label pushing method and device based on big data, electronic equipment and storage medium
CN112560465A (en) Method and device for monitoring batch abnormal events, electronic equipment and storage medium
CN116562894A (en) Vehicle insurance claim fraud risk identification method, device, electronic equipment and storage medium
CN114663223A (en) Credit risk assessment method, device and related equipment based on artificial intelligence
CN113486203A (en) Data processing method and device based on question-answering platform and related equipment
CN112541640A (en) Resource authority management method and device, electronic equipment and computer storage medium
CN112052310A (en) Information acquisition method, device, equipment and storage medium based on big data
CN114996386A (en) Business role identification method, device, equipment and storage medium
CN113657499B (en) Rights and interests distribution method and device based on feature selection, electronic equipment and medium
CN115222549A (en) Risk assessment processing method and device, computer equipment and storage medium
CN113487111A (en) Risk identification method and device based on artificial intelligence, electronic equipment and medium
CN115099339A (en) Fraud behavior identification method and device, electronic equipment and storage medium
CN114881313A (en) Behavior prediction method and device based on artificial intelligence and related equipment
CN113888265A (en) Product recommendation method, device, equipment and computer-readable storage medium
CN113326954A (en) Vehicle maintenance task scheduling method, device, equipment and storage medium
CN113221888A (en) License plate number management system testing method and device, electronic equipment and storage medium
CN113312409B (en) Task monitoring method and device, electronic equipment and computer readable storage medium
CN113688924B (en) Abnormal order detection method, device, equipment and medium
CN113961437A (en) Artificial intelligence based security monitoring method and device, electronic equipment and medium
CN113792681B (en) Information acquisition method and device based on point cloud matching, electronic equipment and medium
CN113723554B (en) Model scheduling method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination