CN113064976A

CN113064976A - Accident vehicle judgment method based on deep learning algorithm

Info

Publication number: CN113064976A
Application number: CN202110421420.7A
Authority: CN
Inventors: 孙涛; 张江波; 何嘉翔; 张果; 蔡鸿平; 张炳康
Original assignee: Shenzhen Mingrui Data Technology Co ltd
Current assignee: Shenzhen Mingrui Data Technology Co ltd
Priority date: 2021-04-19
Filing date: 2021-04-19
Publication date: 2021-07-02

Abstract

The invention provides an accident vehicle judgment method based on a deep learning algorithm, comprising a model training stage and a probability analysis stage. In the model training stage, sample data is imported, each sample comprising a vehicle diagnosis report, a vehicle fault maintenance case and a maintenance file; a model is constructed with a random forest algorithm and is trained and evaluated on the sample data. The operations performed include: extracting fault information from the vehicle diagnosis report; mapping the fault information, with reference to the vehicle fault maintenance cases, to the frequently replaced parts corresponding to it; and converting the frequently replaced parts, according to the relationship between parts and the major system classes of the vehicle, into a weight distribution over the major system classes. The result is the weight distribution of each sample classification with respect to the fault information. In the probability analysis stage, fault information is extracted from a vehicle diagnosis report and fed into the trained model, which outputs the probability of each sample classification and/or a decision result.

Description

Accident vehicle judgment method based on deep learning algorithm

Technical Field

The invention is used for estimating the probability that a vehicle on the second-hand vehicle market is an accident vehicle, and particularly relates to an accident vehicle judgment method based on a deep learning algorithm.

Background

With the support of national policy, the growth in domestic vehicle ownership and the rapid increase in the number of people trading up to another vehicle, the transaction volume of the used vehicle market is rising steadily, and vehicle quality and ride safety are the foremost concerns of used vehicle buyers. An accident vehicle is a pitfall that every buyer wants to avoid: at best it degrades the driving experience, and at worst it endangers life and property. At present, accurate accident vehicle judgment relies on the maintenance file, but rigorously kept maintenance records are difficult to obtain, and if the accident was never entered in the maintenance file, or the original owner does not agree to disclose the maintenance file, no judgment can be made. Other judgment approaches that do not rely on maintenance files exist, but their technical means lead to large judgment errors. None of this helps the purchaser assess the vehicle's condition, so an effective means of judging accident vehicles is very important and is a problem that the second-hand vehicle market must solve in order to develop.

Disclosure of Invention

The invention provides an accident vehicle judgment method based on a deep learning algorithm, which adopts the following specific technical means:

an accident vehicle judgment method based on a deep learning algorithm, characterized in that the method comprises a model training stage and a probability analysis stage;

the following steps are executed in the model training stage:

importing sample data, wherein each sample data comprises a vehicle diagnosis report, a vehicle fault maintenance case and a maintenance file;

constructing a model with a random forest algorithm and feeding in the sample data for model training and evaluation, the fault information and part information being corrected using the maintenance file of the training sample data before training, wherein the operations performed comprise:

extracting fault information from the vehicle diagnosis report, mapping the fault information, with reference to the vehicle fault maintenance cases, to the frequently replaced parts corresponding to it, and converting the frequently replaced parts, according to the relationship between parts and the major system classes of the vehicle, into a weight distribution over the major system classes;

setting the sample classification according to accident content and/or accident grade, and using the weight of the fault information in each major system class as the feature values;

finally, obtaining the weight distribution of each sample classification with respect to the fault information;

the following steps are performed in the probability analysis stage:

extracting fault information from the vehicle diagnosis report, feeding it into the trained model, and outputting, after analysis by the model, the probability of each sample classification and/or a decision result, wherein the decision result is the sample classification with the highest probability.

In one or more embodiments of the present invention, in the model training stage, a keyword group of a specific phenomenon or a specific combination is first extracted from a vehicle fault maintenance case, the keyword group is converted into a weight score corresponding to a system class, and the weight score of the system class under the sample class is converted according to the relationship between the sample class and the keyword group.

In one or more embodiments of the present invention, the sample classifications include non-accident, minor accident, moderate accident, major accident, flooding and burning.

In one or more embodiments of the present invention, the fault information includes a fault code, a fault description, or a freeze frame.

In one or more embodiments of the present invention, the sample data is divided into a training set and a test set at a ratio of 7:3.

In one or more embodiments of the present invention, noise data in the sample data is removed according to a preset rule in the model training stage. The preset rule is as follows: the vehicle diagnosis reports in all sample data are sorted and grouped according to a certain time interval, and the fault code that occurs most often and earliest within a group is the representative of the group.

The beneficial effects of the invention are as follows: the model is trained on vehicle diagnosis reports, vehicle fault maintenance cases and maintenance archives, and a random forest model is applied to accident vehicle probability prediction for the first time; in final use, the model outputs the probability of each accident classification directly from the diagnosis report, and the visual, comparable presentation of this data lets a buyer understand the vehicle's condition and decide whether to purchase; this helps standardize the management of the second-hand vehicle market and improves the transparency of vehicle condition information, so the method has good technical performance and practicability.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The scheme of the present application is further described below with reference to fig. 1:

an accident vehicle judgment method based on a deep learning algorithm comprises a model training stage and a probability analysis stage;

the following steps are executed in the model training stage:

removing noise data from the sample data according to a preset rule, the preset rule being: the vehicle diagnosis reports in all sample data are sorted by time and the reports falling within each 30-minute interval are placed in one group; within a group, the fault code that occurs most often and earliest is the representative of the group, the representative denoting the valid report, with the generation time of the earliest such diagnosis report taken as its time; in this way noise data caused by machine misdetection is removed and the influence of noise is reduced;
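The grouping rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `(timestamp, fault_code)` tuple layout and the fault codes are hypothetical, and the sketch assumes that each 30-minute window starts at the first report of its group, which the text does not specify.

```python
from collections import Counter
from datetime import datetime, timedelta


def group_reports(reports, window_minutes=30):
    """Sort (timestamp, fault_code) pairs and split them into time windows.

    Assumption: a window opens at the first report of a group and covers the
    following `window_minutes` minutes.
    """
    window = timedelta(minutes=window_minutes)
    groups = []
    for ts, code in sorted(reports):
        if groups and ts - groups[-1][0][0] < window:
            groups[-1].append((ts, code))
        else:
            groups.append([(ts, code)])
    return groups


def representative(group):
    """Pick the most frequent fault code; ties go to the code seen earliest."""
    counts = Counter(code for _, code in group)
    first_seen = {}
    for ts, code in group:  # the group is already in time order
        first_seen.setdefault(code, ts)
    best = min(counts, key=lambda c: (-counts[c], first_seen[c]))
    return first_seen[best], best


def denoise(reports, window_minutes=30):
    """Keep one representative (timestamp, fault_code) per time window."""
    return [representative(g) for g in group_reports(reports, window_minutes)]


# Hypothetical diagnosis reports: three within one window, one an hour later.
example_reports = [
    (datetime(2021, 4, 19, 10, 0), "P0300"),
    (datetime(2021, 4, 19, 10, 5), "U0100"),
    (datetime(2021, 4, 19, 10, 10), "U0100"),
    (datetime(2021, 4, 19, 11, 0), "B1000"),
]
cleaned = denoise(example_reports)
```

With the example data, the first window keeps the code that appears twice and the second window keeps its only report, so isolated readings inside a busy window are discarded as noise.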

constructing a model by using a random forest algorithm, extracting fault information such as fault codes, fault descriptions or freeze frames in a vehicle diagnosis report by using the model, mapping the fault information by combining a vehicle fault maintenance case to obtain frequently replaced parts corresponding to the fault information, and correspondingly converting the frequently replaced parts according to the relationship between the parts and the major classes of the systems in the vehicle to obtain the weight distribution of the major classes of the systems;
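A minimal sketch of the mapping chain described above (fault code, then frequently replaced parts, then major system class, then weight distribution). The lookup tables are hypothetical placeholders; in the method they would be derived from vehicle fault maintenance cases rather than written by hand. The five system names follow those mentioned elsewhere in this description.

```python
from collections import Counter

# Hypothetical lookup tables, for illustration only.
FAULT_TO_PARTS = {
    "B1000": ["audio unit", "ECU"],
    "U0100": ["ECU"],
    "C0035": ["wheel speed sensor"],
}
PART_TO_SYSTEM = {
    "audio unit": "electronic",
    "ECU": "electronic",
    "wheel speed sensor": "chassis",
}
SYSTEMS = ["power", "chassis", "body", "electrical", "electronic"]


def system_weights(fault_codes):
    """Return the share of frequently replaced parts per major system class.

    The result is a list aligned with SYSTEMS and can serve as the feature
    vector of one diagnosis report.
    """
    counts = Counter()
    for code in fault_codes:
        for part in FAULT_TO_PARTS.get(code, []):
            counts[PART_TO_SYSTEM[part]] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(SYSTEMS)
    return [counts[s] / total for s in SYSTEMS]


features = system_weights(["B1000", "U0100", "C0035"])
```

Unknown fault codes are simply skipped here; a real pipeline would need an explicit policy for them.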

setting the sample classification according to accident content and/or accident grade, and using the weight of the fault information in each major system class as the feature values; in this embodiment, the sample classification includes non-accident, minor accident, moderate accident, major accident, flooding, burning and the like, wherein the keywords for each classification are extracted from the maintenance archive: the maintenance archive classifies the maintenance types, the sets of parts that are replaced with high frequency under each maintenance type are aggregated across a large number of archives, and the keywords extracted from them are set as a keyword group; the sample data is then fed in for model training and evaluation, the fault information and part information being corrected using the maintenance file of the training sample data before training;

specifically, sample data with a 1:1 ratio of positive to negative samples is divided at a ratio of 7:3 into a training set and a test set, wherein the training set is used for model training and the test set is used for model testing and evaluation. The 1:1 ratio means that the classes are equally represented: for example, if there are 100 samples in total and four classes, each class has 25 samples. The 7:3 ratio of training set to test set means that the training set accounts for 70% of the sample data and the remaining 30% forms the test set;
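The arithmetic of the example (100 samples, four equally sized classes, 7:3 split) can be checked with a short sketch. The class names are picked from the classifications listed in this embodiment for illustration, and the shuffle with a fixed seed is an assumption; the text does not say how samples are assigned to the two sets.

```python
import random
from collections import Counter

CLASSES = ["non-accident", "minor accident", "moderate accident", "major accident"]


def split_7_3(samples, seed=0):
    """Shuffle a copy of the samples and cut it 70% / 30%."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]


# 25 placeholder samples per class, 100 in total.
samples = [(i, label) for label in CLASSES for i in range(25)]
train_set, test_set = split_7_3(samples)
class_counts = Counter(label for _, label in samples)
```

A plain shuffle does not guarantee that each class keeps its 1:1 share inside the training and test sets; splitting each class separately would preserve it.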

a key phrase describing a specific phenomenon or a specific combination is first extracted from the vehicle fault maintenance cases and converted into a weight score for the corresponding major system class, and the weight score of each major system class under a sample classification is derived from the relationship between that sample classification and the key phrase; for example, when a large number of electronic devices such as the sound system, air conditioner, ECU and car lights short-circuit and must be replaced, the vehicle has most probably been flooded, so keywords such as sound system, air conditioner, ECU, car light, electronics and short circuit form the specific key phrase for a flooded vehicle; since 80% of the parts that need to be replaced for these fault codes belong to the electrical system or the electronic system, these two systems are weighted more heavily than the other three systems, namely the power system, the chassis system and the body system, when judging a flooded vehicle;
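As a rough illustration of matching a keyword group against a maintenance case, the sketch below scores a case description by the fraction of flooding keywords it contains. The keywords come from the example above; the scoring rule itself, the function name and the sample text are assumptions made for illustration, since the text does not define how a key phrase is converted into a score.

```python
# Keyword group for the flooding classification, taken from the example above.
FLOOD_KEYWORDS = (
    "sound system",
    "air conditioner",
    "ecu",
    "car light",
    "electronics",
    "short circuit",
)


def keyword_score(case_text, keywords=FLOOD_KEYWORDS):
    """Return (fraction of keywords found, list of matched keywords).

    Matching is a naive case-insensitive substring test, so short keywords
    such as "ecu" can also match inside longer words.
    """
    text = case_text.lower()
    hits = [k for k in keywords if k in text]
    return len(hits) / len(keywords), hits


score, hits = keyword_score(
    "Short circuit in ECU and air conditioner; replaced car light"
)
```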

the following steps are performed in the probability analysis stage:

extracting fault information from the vehicle diagnosis report, feeding it into the trained model, and outputting the probability of each sample classification after analysis by the model. This scheme preferably outputs the intermediate result of the decision, namely the probability of each sample classification after voting, so that the data is presented in a visual and comparable form.

The invention feeds the diagnosis report information into the model and uses the model directly to predict the probability of each classification. Because the random forest algorithm is a supervised learning algorithm, all weights mentioned in the method are computed by the algorithm from the training set and cannot be set or changed manually, so the result is free from human interference. The accident probability is computed by the internal mechanism of the model: every decision tree casts a vote for a class according to the feature attributes of the object, and the share of votes for each class over the whole forest gives the probability. The method can therefore work with limited maintenance files and, while still achieving accurate judgment, avoids the problem that no judgment can be made when the accident record was never entered in the maintenance file or the original owner does not agree to disclose the maintenance file.
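The voting mechanism described above reduces to counting votes. The sketch below assumes each trained tree has already returned one class label; the vote list is made-up example data, and the function names are hypothetical.

```python
from collections import Counter

CLASSES = [
    "non-accident",
    "minor accident",
    "moderate accident",
    "major accident",
    "flooding",
    "burning",
]


def vote_probabilities(tree_votes, classes=CLASSES):
    """Share of trees voting for each class, reported for every class."""
    counts = Counter(tree_votes)
    n = len(tree_votes)
    return {c: counts[c] / n for c in classes}


def decision(probabilities):
    """The decision result is the classification with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical votes from a forest of ten trees.
votes = ["flooding"] * 6 + ["non-accident"] * 3 + ["minor accident"]
probs = vote_probabilities(votes)
result = decision(probs)
```

Reporting the full `probs` dictionary rather than only `result` corresponds to the preferred output of the intermediate voting result.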

The invention uses a deep learning algorithm in which the random forest is trained on actual data to obtain the final model, and has the following advantages:

1) because an ensemble algorithm is used, its accuracy is better than that of most single algorithms; the method is simple to implement, highly accurate and resistant to overfitting;

2) it performs well on the test set; because two sources of randomness are introduced (random sampling of samples and random selection of features), the random forest does not easily overfit, and it has a certain tolerance to noise;

3) because it combines trees, the random forest can handle nonlinear data and is a nonlinear classification (fitting) model; it can handle high-dimensional data (many features) without feature selection, and it adapts well to different data sets: it can handle both discrete and continuous data, and the data set does not need to be normalized; because each tree can be generated independently and simultaneously, it is easy to parallelize;

4) training is fast, so the method can be applied to large-scale data sets; missing values can be handled (treated as a separate class) without additional processing;

5) thanks to the out-of-bag (OOB) data, an unbiased estimate of the true error can be obtained while the model is being built, without sacrificing any training data;

6) during training, interactions between features can be detected and the importance of each feature can be obtained, which provides a useful reference.
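The sample randomness in item 2) and the out-of-bag data in item 5) both come from bootstrap sampling: each tree is trained on a sample drawn with replacement, and the samples it never saw can be used to test it. A minimal sketch, with a hypothetical function name and an arbitrary sample count:

```python
import random


def bootstrap_with_oob(n_samples, rng):
    """Draw n_samples indices with replacement; return them and the unused ones."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


rng = random.Random(0)
in_bag, oob = bootstrap_with_oob(1000, rng)
oob_fraction = len(oob) / 1000
```

For large data sets the out-of-bag fraction approaches 1/e, roughly 37%, so every tree has a sizeable held-out set without any data being removed from training as a whole.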

The above preferred embodiments should be regarded as examples of the embodiments of the present application; technical deductions, substitutions, improvements and the like that are identical to, similar to or based on the embodiments of the present application should be regarded as falling within the protection scope of this patent.

Claims

1. An accident vehicle judgment method based on a deep learning algorithm, characterized in that the method comprises a model training stage and a probability analysis stage;

the following steps are executed in the model training stage:

the following steps are performed in the probability analysis stage:

2. The accident vehicle judgment method based on the deep learning algorithm according to claim 1, wherein: in the model training stage, a key phrase describing a specific phenomenon or a specific combination is first extracted from the vehicle fault maintenance cases, the key phrase is converted into a weight score for the corresponding major system class, and the weight score of each major system class under a sample classification is derived from the relationship between that sample classification and the key phrase.

3. The accident vehicle judgment method based on the deep learning algorithm according to claim 1, wherein: the sample classification comprises non-accident, minor accident, moderate accident, major accident, flooding and burning.

4. The accident vehicle judgment method based on the deep learning algorithm according to claim 1, wherein: the fault information includes a fault code, a fault description, or a freeze frame.

5. The accident vehicle judgment method based on the deep learning algorithm according to claim 1, wherein: the sample data is divided into a training set and a test set at a ratio of 7:3.

6. The accident vehicle judgment method based on the deep learning algorithm according to any one of claims 1 to 5, wherein: noise data in the sample data is removed according to a preset rule in the model training stage.

7. The accident vehicle judgment method based on the deep learning algorithm according to claim 6, wherein: the preset rule is that the vehicle diagnosis reports in all sample data are sorted and grouped according to a certain time interval, and the fault code that occurs most often and has the earliest generation time within a group is the representative of the group.

8. The accident vehicle judgment method based on the deep learning algorithm according to claim 7, wherein: the vehicle diagnosis reports falling within each N-minute interval are placed in one group.