CN111079940B

CN111079940B - Decision tree model establishing method and using method for real-time fake-licensed car analysis

Info

Publication number: CN111079940B
Application number: CN201911196978.9A
Authority: CN
Inventors: 杨光; 贺珊; 张龙涛
Original assignee: Wuhan Fiberhome Digtal Technology Co Ltd
Current assignee: Wuhan Fiberhome Digtal Technology Co Ltd
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2023-03-31
Anticipated expiration: 2039-11-29
Also published as: CN111079940A

Abstract

The invention provides a decision tree model establishing method for real-time fake-licensed vehicle analysis, applied in the technical field of building models for real-time fake-licensed vehicle analysis. The method comprises: S11, preparing a training data set and a verification data set; and S12, constructing a decision tree model. The invention also provides a method of using the decision tree model for real-time fake-licensed vehicle analysis. By applying the embodiments of the invention, real-time vehicle-passing data is analyzed with the established decision tree model, and data on fake-licensed vehicles that meet the conditions is pushed as an alarm, so that fake-licensed vehicles are analyzed in real time.

Description

Decision tree model establishing method and using method for real-time fake-licensed car analysis

Technical Field

The invention relates to the technical field of vehicle fake-licensed analysis models, in particular to a decision tree model establishing method and a use method for real-time fake-licensed vehicle analysis.

Background

A fake-licensed vehicle is one that carries a forged license plate bearing the same number as a genuine plate, so that an illegal vehicle outwardly appears legal. Fake-licensed vehicles are illegal, are difficult for traffic police to recognize while being driven, and can only be identified automatically by technical means.

Currently, most fake-licensed vehicle analysis compares vehicles bearing the same license plate number in terms of appearance time, appearance place, body color, license plate color, vehicle style and the like, and sometimes even depends on vehicle registration information from the vehicle management department. In practice, when a vehicle is running, its license plate number, running time and running place are collected by snapshot devices arranged at various points (the location of each snapshot device is fixed, so when a vehicle is captured within the monitoring range of a device, the corresponding longitude and latitude can be roughly obtained), whereas the vehicle information corresponding to the license plate number (such as vehicle brand and appearance parameters) cannot be obtained. Existing analysis is therefore often limited by the comparison data sources and cannot be performed in real time. In addition, the space-time point location model adopted for analyzing fake-licensed vehicles cannot eliminate false alarms caused by a vehicle turning around between closely spaced device points. These problems increase the difficulty of real-time fake-licensed vehicle analysis and indirectly reduce its accuracy.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a decision tree model establishing method and a using method for real-time fake-licensed vehicle analysis. Real-time vehicle-passing data is analyzed with the established decision tree model, and data on fake-licensed vehicles that meet the conditions is pushed as an alarm, so that fake-licensed vehicles are analyzed in real time. At the same time, by comparing real-time data across multiple times and places, the decision tree model filters out vehicle turn-around situations, reducing the probability of false fake-licensed vehicle alarms and improving analysis accuracy.

The invention is realized by the following steps:

An embodiment of the invention discloses a decision tree model building method for real-time fake-licensed vehicle analysis, which comprises the following steps:

S11, preparing a training data set and a verification data set;

obtaining fake-licensed vehicle data appearing in a historical database; obtaining, according to the appearance time of each fake-licensed vehicle, non-fake-licensed vehicle data within a time range corresponding to that appearance time; and obtaining, based on the fake-licensed vehicle data and the non-fake-licensed vehicle data, first five-dimensional vector data associating each fake-licensed vehicle with the corresponding real vehicle, wherein the first five-dimensional vector data for any license plate number comprises: license plate number, fake-licensed vehicle appearance time, fake-licensed vehicle appearance place, real vehicle appearance time, and real vehicle appearance place; and acquiring real vehicle data appearing in the historical database, and obtaining, according to the appearance time and place of each real vehicle, second five-dimensional vector data composed of a plurality of real vehicle data, wherein the second five-dimensional vector data comprises: license plate number, first appearance time of the real vehicle, first appearance place of the real vehicle, second appearance time of the real vehicle, and second appearance place of the real vehicle;

obtaining three-dimensional vector data corresponding to each license plate number based on the first five-dimensional vector data and the second five-dimensional vector data, wherein the three-dimensional vector data comprises: license plate number, time difference of vehicle appearance, and distance of vehicle appearance;

taking the three-dimensional vector data as a sample of a training data set and a sample of a testing data set;

S12, constructing a decision tree model;

according to each license plate number in the three-dimensional data of the training data set, respectively taking the time difference of the real vehicle and the fake-licensed vehicle and the distance of the real vehicle and the fake-licensed vehicle as characteristics, and calculating the corresponding information gain;

according to the information gain corresponding to each license plate number, a root node and a leaf node are constructed to form a preliminary decision tree model;

and verifying and pruning the preliminary decision tree model according to the training data set to obtain the decision tree model.

In one implementation, the step of obtaining three-dimensional vector data corresponding to each license plate number based on the first five-dimensional vector data and the second five-dimensional vector data includes:

obtaining first three-dimensional data for the license plate number based on the first five-dimensional vector data, wherein the first three-dimensional data comprises: license plate number, time difference between the real vehicle and the fake-licensed vehicle, and distance between the real vehicle and the fake-licensed vehicle; and obtaining second three-dimensional data for the license plate number based on the second five-dimensional vector data, wherein the second three-dimensional data comprises: license plate number, time difference of the real vehicle, and distance of the real vehicle;

combining the first three-dimensional data and the second three-dimensional data into three-dimensional vector data.

In one implementation, the formula used to calculate the information gain g(X, A) is expressed as:

g(X, A) = H(X) - H(X|A)

wherein H(X) is the entropy of the random variable, H(X|A) is the conditional entropy given feature A, n is the number of values of feature A, and p_i is the probability of the i-th sample in the set; D denotes the sample set of feature X, D_i denotes the sample set of feature X_i, that is, one of the subdivisions within the K partitions, and D_ik denotes the sample set of partition k in feature X_i.
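As a point of reference, the entropy and conditional entropy named above can be written in the standard textbook form below. This is a sketch consistent with the symbol definitions given here, not a transcription of the filed formulas; the base-2 logarithm and the intermediate term H(D_i) are assumptions.

```latex
\begin{aligned}
H(X) &= -\sum_{i=1}^{n} p_i \log_2 p_i \\
H(X \mid A) &= \sum_{i=1}^{n} \frac{|D_i|}{|D|}\, H(D_i)
            = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|} \\
g(X, A) &= H(X) - H(X \mid A)
\end{aligned}
```

Read this way, a feature whose partitions each contain samples of a single class drives H(X|A) to zero and therefore attains the largest possible gain.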

In one implementation, the step of constructing a root node and a leaf node according to an information gain corresponding to each license plate number to form a preliminary decision tree model includes:

selecting the characteristic with the maximum information gain as a root node and the other characteristics as leaf nodes according to the information gain corresponding to each license plate number;

acquiring a root node and a leaf node corresponding to each feature;

and constructing a preliminary decision tree model based on the acquired root nodes and leaf nodes.

In one implementation, the step of verifying and pruning the preliminary decision tree model according to the training data set to obtain the decision tree model includes:

verifying the preliminary decision tree model through a training data set;

and pruning according to the verification result and a preset formula to obtain the decision tree model.

In one implementation, the preset formula is specifically expressed as:

Model(A_p, S) / Model(A_q, S) ≥ 1

wherein A_p and A_q respectively denote the p-th partition and the q-th partition of feature A, S denotes the test data set, and Model denotes the decision tree model; if the ratio of the accuracy Model(A_p, S) of the model after pruning to the accuracy Model(A_q, S) of the model before pruning is greater than or equal to 1, the partition after pruning is valid.

In one implementation, the three-dimensional vector data is embodied as:

where Δt = |t_i - t_j|, i, j ∈ [1, n]

wherein p denotes the license plate number, m1 and m2 respectively denote the unique identifiers of the two sample data, t_i is the appearance time of license plate number p at m1, t_j is the appearance time of license plate number p at m2, Δt denotes the time difference between the two sample data, and Δd denotes the distance corresponding to the two appearance times of license plate number p,

where EARTH_RADIUS denotes the radius of the Earth, lat_i and lng_i are the latitude and longitude of the snapshot device corresponding to time t_i, and lat_j and lng_j are the latitude and longitude of the snapshot device corresponding to time t_j.
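One common way to compute a distance from an Earth radius and two latitude/longitude pairs is the haversine formula. The sketch below assumes that form; the function name `haversine_m` and the use of degrees as input are illustrative choices rather than details taken from the text, and the radius uses the 6371 km figure given in the detailed description.

```python
import math

# Mean Earth radius in metres (6371 km), an assumed constant name.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat_i, lng_i, lat_j, lng_j):
    """Great-circle distance in metres between two points given in degrees."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    d_phi = phi_j - phi_i
    d_lam = math.radians(lng_j - lng_i)
    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

Because the snapshot devices are fixed, such distances could also be precomputed once per device pair instead of being evaluated for every record.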

In addition, the invention also discloses a decision tree model using method for real-time fake-licensed car analysis, which comprises the following steps:

selecting the maximum division value of the time difference feature in the decision tree model as the length of the time window for acquiring real-time streaming data, acquiring the real-time streaming data by using Spark Streaming to consume Kafka, and dividing the real-time streaming data into RDD data sets whose length is the maximum division value;

aggregating each RDD data set by license plate number, filtering out records captured at the same place, calculating the time difference between the passing records of the same license plate, and importing the time differences and the snapshot device point location information of each record into the decision tree model as source data to obtain the analysis result of the decision tree model, wherein the analysis result comprises the license plate information meeting the fake-licensed vehicle conditions and the corresponding passing records.

When the decision tree model is used, the maximum division value of the time difference feature is first selected as the length of the time window for acquiring real-time streaming data, and the streaming data consumed in real time is divided into RDD data sets of that length. Each RDD data set is aggregated by license plate number, records captured at the same place are filtered out, and the time difference between the passing records of the same license plate is calculated. The time differences and the snapshot device point location information of each record are imported into the decision tree model as source data, the places corresponding to the samples and the distance corresponding to any two appearance times of a license plate number are obtained, and the analysis is performed on the basis of time and distance. In the embodiments of the invention, training data containing vehicle turn-around situations is used as non-fake-licensed vehicle data when the decision tree model is constructed, which reduces the cases in which a vehicle is falsely reported as fake-licensed because it turned around, and improves the accuracy of the analysis.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic flow chart of a decision tree model building method for real-time fake-licensed vehicle analysis according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an application flow of a method for using a decision tree model for real-time fake-licensed car analysis according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of another application of the decision tree model using method for real-time fake-licensed vehicle analysis according to the embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

Referring to fig. 1, an embodiment of the present invention provides a decision tree model building method for real-time fake-licensed vehicle analysis, where the method includes:

S11, preparing a training data set and a verification data set.

It can be understood that the training data set and the verification data set are samples obtained by integrating historical data. The specific implementation steps include: first, obtaining fake-licensed vehicle data appearing in a historical database; obtaining, according to the appearance time of each fake-licensed vehicle, non-fake-licensed vehicle data within a time range corresponding to that appearance time; and obtaining, based on the fake-licensed vehicle data and the non-fake-licensed vehicle data, first five-dimensional vector data associating each fake-licensed vehicle with the corresponding real vehicle, wherein the first five-dimensional vector data for any license plate number comprises: license plate number, fake-licensed vehicle appearance time, fake-licensed vehicle appearance place, real vehicle appearance time, and real vehicle appearance place; and acquiring real vehicle data appearing in the historical database, and obtaining, according to the appearance time and place of each real vehicle, second five-dimensional vector data composed of a plurality of real vehicle data, wherein the second five-dimensional vector data comprises: license plate number, first appearance time of the real vehicle, first appearance place of the real vehicle, second appearance time of the real vehicle, and second appearance place of the real vehicle.

In a specific implementation, historical fake-licensed vehicle records can be randomly selected as the fake-licensed vehicle data, and real license plate data within the same time range is selected as the non-fake-licensed vehicle data (for example, when a fake-licensed vehicle appears at time t1, a time range such as (t1 - t2, t1 + t3) is set).
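The time-range selection can be sketched as a simple filter. The record type `PassRecord`, the function `records_in_window`, and the choice of inclusive bounds are all hypothetical; the text only states that a range around t1 is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassRecord:
    plate: str       # license plate number
    time_s: float    # capture time in seconds
    device_id: str   # snapshot device that captured the vehicle


def records_in_window(records, plate, t1, t2, t3):
    """Return records of `plate` captured within [t1 - t2, t1 + t3].

    Inclusive bounds are an assumption made for this sketch.
    """
    lo, hi = t1 - t2, t1 + t3
    return [r for r in records if r.plate == plate and lo <= r.time_s <= hi]
```

For a fake-licensed record at t1 = 100 with t2 = 30 and t3 = 60, the filter keeps records of the same plate captured between 70 and 160 seconds.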

It should be noted that the positions of the snapshot devices are fixed. For example, each snapshot device is assigned a number, and each numbered device corresponds to a longitude and latitude, so its monitoring range is a fixed adjacent area, and the longitude and latitude of a vehicle appearing in this area can be approximated by those of the device. The appearance time and appearance place of each fake-licensed vehicle can therefore be obtained from the snapshot device that captured it. For a fake-licensed vehicle, if historical data of the real vehicle is obtained within the specified time range, the appearance time and appearance place of the real vehicle can be obtained correspondingly, so that a five-dimensional vector consisting of the license plate number, the fake-licensed vehicle appearance time, the fake-licensed vehicle appearance place, the real vehicle appearance time and the real vehicle appearance place can be formed for the license plate number.

Using the same approach to obtain the time and place of a vehicle, a historical real vehicle appears at different places within a time range, so the places of the real vehicle at two different times can be obtained. The license plate number of the real vehicle, its two appearance times, and the two corresponding appearance places thus form a five-dimensional vector.

Obtaining three-dimensional vector data corresponding to each license plate number based on the first five-dimensional vector data and the second five-dimensional vector data, wherein the three-dimensional vector data comprises: license plate number, time difference of vehicle appearance, distance of vehicle appearance.

The specific steps are as follows: obtaining first three-dimensional data for the license plate number based on the first five-dimensional vector data, wherein the first three-dimensional data comprises: license plate number, time difference between the real vehicle and the fake-licensed vehicle, and distance between the real vehicle and the fake-licensed vehicle; obtaining second three-dimensional data for the license plate number based on the second five-dimensional vector data, wherein the second three-dimensional data comprises: license plate number, time difference of the real vehicle, and distance of the real vehicle; and combining the first three-dimensional data and the second three-dimensional data into the three-dimensional vector data.
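A minimal sketch of this reduction from five dimensions to three is given below. The type names, the `distance_m` callable (any function returning the distance in meters between two places), and the 1/0 class labels attached to each sample are assumptions introduced for illustration.

```python
from typing import Callable, NamedTuple


class FiveDim(NamedTuple):
    plate: str
    time_a: float   # first appearance time, seconds
    place_a: str    # first appearance place (snapshot device id)
    time_b: float   # second appearance time, seconds
    place_b: str    # second appearance place (snapshot device id)


class ThreeDim(NamedTuple):
    plate: str
    dt_s: float     # time difference between the two appearances
    dd_m: float     # distance between the two appearance places


def to_three_dim(v: FiveDim, distance_m: Callable[[str, str], float]) -> ThreeDim:
    """Reduce one five-dimensional vector to (plate, time difference, distance)."""
    return ThreeDim(v.plate, abs(v.time_a - v.time_b), distance_m(v.place_a, v.place_b))


def build_samples(first, second, distance_m):
    """Combine both five-dimensional sets into labelled three-dimensional samples."""
    labelled = [(to_three_dim(v, distance_m), 1) for v in first]    # 1: fake-licensed pair
    labelled += [(to_three_dim(v, distance_m), 0) for v in second]  # 0: real-vehicle pair
    return labelled
```

Passing the distance function in as a parameter keeps the reduction independent of how places are mapped to coordinates.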

It should be noted that, for any five-dimensional vector in the first or second five-dimensional vector data, subtracting the two appearance times gives the time difference between the two appearances, and calculating the distance between the two appearance places gives the distance corresponding to the two appearance times. In this way, the distance between the two places where the vehicle appears within one time difference is obtained. It can be understood that the distance a vehicle can travel is its travel speed (which lies within a range) multiplied by the time. If the time difference is short but the two places are far apart, so that the distance between them far exceeds the distance the vehicle could have traveled, the records belong to two different vehicles, one of which is fake-licensed. These regularities and characteristic parameters of vehicle appearances are used as training data, that is, as training samples for decision tree learning. Similarly, for a real vehicle, the relationship between the distance between the two places and the appearance time difference is one of time and displacement that is achievable at a normal travel speed. By learning from a large number of samples, the decision tree can therefore learn the travel characteristics of real vehicles and of fake-licensed vehicles.
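The intuition behind these features can be illustrated with an implied-speed check. This is not the method itself, which learns its partitions from data with a decision tree; the fixed `max_speed_kmh` threshold and both function names are hypothetical.

```python
def implied_speed_kmh(dt_s: float, dd_m: float) -> float:
    """Speed in km/h needed to cover dd_m metres in dt_s seconds."""
    if dt_s <= 0:
        # Two captures at the same instant: any positive distance is unreachable.
        return float("inf") if dd_m > 0 else 0.0
    return (dd_m / dt_s) * 3.6


def physically_plausible(dt_s: float, dd_m: float, max_speed_kmh: float = 200.0) -> bool:
    """True if a single vehicle could have produced both captures (assumed threshold)."""
    return implied_speed_kmh(dt_s, dd_m) <= max_speed_kmh
```

For instance, two captures 30 s and 5000 m apart imply roughly 600 km/h, which a single vehicle could not achieve, whereas the same distance over 600 s implies 30 km/h.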

Illustratively, the three-dimensional vector data constitutes a training data set H, in which, for any one sample:

Δt = |t_i - t_j|, i, j ∈ [1, n]

wherein p denotes the license plate, m1 and m2 respectively denote the unique identifiers of the two pieces of original data, t_i and t_j denote the appearance times of the vehicle, Δt denotes the time difference between the two appearances (in seconds), and Δd denotes the spatial distance between the two appearances (in meters).

EARTH_RADIUS denotes the radius of the Earth, 6371 km, and lat and lng denote the latitude and longitude of the snapshot device (or the latitude and longitude corresponding to the two vehicles).

The three-dimensional vector data is taken as samples of the training data set and of the testing data set.

S12, constructing a decision tree model.

For each license plate number in the three-dimensional data of the training data set, the time difference between the real vehicle and the fake-licensed vehicle and the distance between the real vehicle and the fake-licensed vehicle are respectively taken as features, and the corresponding information gain is calculated.

According to the analysis and principle of fake-licensed vehicles, the time difference and the distance are extracted from the three-dimensional vector data in each training data set H as target features, and a decision tree is constructed with the C4.5 decision tree algorithm:

The probability distribution of the historical data in the training set is:

P(X = x_i) = p_i, i = 1, 2, …, n

wherein p denotes the probability distribution of the sample X taking the value x_i in the set, and p_i is the probability of the i-th sample in the set.

First, the entropy H(X) of the random variable is calculated, where X is a feature (time difference or distance difference) and x_i is the i-th sample; i here may denote any sample.

Then the conditional entropy H(X|A) of the partitioned feature A is calculated, wherein n is the number of values of feature A, D denotes the sample set of feature X, D_i denotes the sample set of feature X_i, that is, one of the subdivisions within the K partitions, and D_ik denotes the sample set of partition k in feature X_i.

The information gain of feature A is obtained as:

g(X, A) = H(X) - H(X|A)

The information gain ratio of feature A may also be calculated.

From these calculations, the feature with the largest information gain is selected as the root node, and the other features are taken as leaf nodes.

The calculation process is repeated to obtain the root nodes and leaf nodes corresponding to all the features, which are added to the decision tree model.
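The feature-selection step can be sketched as follows, assuming the time difference and distance have already been discretized into bins and each sample carries a class label. The function names, the base-2 logarithm, and the C4.5-style definition of the gain ratio as gain divided by the entropy of the feature's own value distribution are assumptions of this sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """g = H(labels) - H(labels | values) for one discretized feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the feature's own values."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def best_feature(features, labels):
    """Name of the feature (dict key) with the largest information gain."""
    return max(features, key=lambda name: information_gain(features[name], labels))
```

With `features = {"dt_bin": [...], "dd_bin": [...]}`, calling `best_feature(features, labels)` returns the name of the feature that would be placed at the root.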

The decision tree model is then pruned: it is verified against a reserved historical fake-licensed vehicle training data set, pruned according to the verification result, and its partitions are redefined.

A_p and A_q respectively denote the p-th partition and the q-th partition of feature A, S denotes the test data set, and Model denotes the decision tree model; if the ratio of the accuracy Model(A_p, S) of the model after pruning to the accuracy Model(A_q, S) of the model before pruning is greater than or equal to 1, the partition after pruning is valid.
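The acceptance test for a pruning step can be sketched as an accuracy ratio. Here a model is represented simply as a prediction function over a sample's features; `accuracy`, `pruning_is_valid`, and the handling of a zero baseline are hypothetical details not specified in the text.

```python
def accuracy(predict, samples):
    """Fraction of (features, label) samples that `predict` classifies correctly."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(predict(x) == y for x, y in samples) / len(samples)


def pruning_is_valid(pruned_predict, unpruned_predict, samples):
    """True if accuracy after pruning divided by accuracy before pruning is >= 1."""
    before = accuracy(unpruned_predict, samples)
    after = accuracy(pruned_predict, samples)
    if before == 0:
        # Ratio undefined; treat pruning as acceptable since it cannot be worse.
        return True
    return after / before >= 1
```

A pruning step that leaves accuracy on the test data unchanged is accepted under this rule, which favors the smaller tree.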

It should be noted that the C4.5 algorithm is a classical algorithm for generating a decision tree, and is an extension and optimization of the ID3 algorithm. The result of C4.5 training is a classification model, which can be understood as a decision tree in which the split attributes are internal nodes and the classification results are leaf nodes. Each internal node has a left sub-tree and a right sub-tree, whereas a leaf node has neither.

Each RDD data set is aggregated by license plate number, records captured at the same place are filtered out, the time difference between the passing records of the same license plate is calculated, and the time differences and the snapshot device point location information of each record are imported into the decision tree model as source data to obtain the analysis result of the decision tree model, wherein the analysis result comprises the license plate information meeting the fake-licensed vehicle conditions and the corresponding passing records.

As shown in fig. 2, after the decision tree model has been trained, the actual sample analysis first determines whether the license plate numbers are the same; if so, the time interval is analyzed, and if not, the process ends. The time interval may be divided into several segments; for example, with a maximum division value of 90 s, into 0-30, 30-60, 60-90 and 90-+∞. Within the 0-30 segment the distance of the vehicle is judged, and, with a maximum division value of 8 km, the distance is further divided into 0-2 km, 2-4 km, 4-8 km and 8 km-+∞. The time difference and distance of any sample can be partitioned in this way, so that a definite judgment result is finally obtained. For example, for license plate number AXXXXX, records with the same license plate number, a time interval of 55 s and a distance of 1.5 km correspond to the time segment 60-90 and the distance segment 0-2 km, for which the analysis result is yes, that is, a fake-licensed vehicle.
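Locating a sample in such a partition amounts to a sorted-edge lookup. The sketch below uses the segment edges from the example; the names `TIME_EDGES_S`, `DIST_EDGES_M`, `bin_index` and `bin_label` are hypothetical, and assigning a value that falls exactly on an edge to the higher segment is an assumption. The yes/no outcome attached to each leaf is not reproduced here.

```python
from bisect import bisect_right

TIME_EDGES_S = [30, 60, 90]        # segments: 0-30, 30-60, 60-90, 90-+inf
DIST_EDGES_M = [2000, 4000, 8000]  # segments: 0-2 km, 2-4 km, 4-8 km, 8 km-+inf


def bin_index(value, edges):
    """Index of the segment containing `value`; edge values go to the higher segment."""
    return bisect_right(edges, value)


def bin_label(value, edges, unit):
    """Human-readable label of the segment containing `value`."""
    i = bin_index(value, edges)
    lo = 0 if i == 0 else edges[i - 1]
    hi = "+inf" if i == len(edges) else edges[i]
    return f"{lo}-{hi} {unit}"
```

For a sample with a time difference of 20 s and a distance of 5000 m, the lookup yields the first time segment and the label `4000-8000 m`.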

As shown in fig. 3, in the model after pre-pruning, the leaf node "yes" indicates a fake-licensed vehicle and "no" indicates a non-fake-licensed vehicle; pruning reduces the number of feature partition nodes in the model and improves efficiency.

In a specific implementation, real-time vehicle-passing data is analyzed against the model with Spark Streaming. The maximum division value of the time difference feature in the decision tree model, T seconds, is selected as the length of the time window for acquiring real-time streaming data; the real-time streaming data is acquired by using Spark Streaming to consume Kafka and is divided into RDD data sets of T seconds each. Each RDD data set is aggregated by license plate number (in time order), records captured at the same place are filtered out, and the time difference between the passing records of the same license plate is calculated. The time differences and the snapshot device point location information of each record are imported into the model as source data for analysis. The license plate information that meets the fake-licensed vehicle conditions, together with its passing records, is pushed out in batches through Kafka and is also persisted to a database. Because training data containing vehicle turn-around situations is used as non-fake-licensed vehicle data when the decision tree model is constructed, cases in which a vehicle is falsely reported as fake-licensed because it turned around are reduced, and the accuracy of the analysis is improved. The platform consumes the fake-licensed vehicle alarm data from Kafka and displays it to users for screening.
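The per-window processing can be illustrated in plain Python as a stand-in for the Spark Streaming job. The function `candidate_pairs`, the tuple layout of the records, and the choice to compare every pair of records of a plate (rather than only consecutive ones) are assumptions of this sketch.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(window_records):
    """Pair up records of the same plate within one time window.

    window_records: iterable of (plate, time_s, device_id) tuples.
    Yields (plate, dt_s, device_a, device_b) for each pair of records of the
    same plate captured at different devices, in chronological order.
    """
    by_plate = defaultdict(list)
    for plate, time_s, device_id in window_records:
        by_plate[plate].append((time_s, device_id))
    for plate, recs in by_plate.items():
        recs.sort()  # chronological order within the plate
        for (t_a, dev_a), (t_b, dev_b) in combinations(recs, 2):
            if dev_a != dev_b:  # drop pairs captured at the same place
                yield plate, t_b - t_a, dev_a, dev_b
```

Each yielded tuple carries the time difference and the two device identifiers, from which the distance can be looked up before the sample is passed to the model.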

It should be noted that Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website. Such actions (web browsing, searching and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation. That is a viable solution for log data and offline analysis systems such as Hadoop, but it falls short where real-time processing is required. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time messages across a cluster.

Spark Streaming is a stream processing engine built on Spark. Its basic principle is to split the data arriving in real time into time slices (on the order of seconds) and then process each slice with the Spark engine in a batch-like manner.

Elasticsearch is a Lucene-based search server. It provides a distributed, multi-user-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and was released as open source under the terms of the Apache license; it is a popular enterprise-level search engine.
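The time-slicing principle can be illustrated without Spark by grouping timestamped records into consecutive fixed-length slices. The function `split_into_slices` and its record layout are hypothetical, and the sketch processes a finished list rather than a live stream.

```python
from collections import defaultdict


def split_into_slices(records, slice_s):
    """Group (time_s, payload) records into consecutive slices of `slice_s` seconds.

    Returns the non-empty slices in chronological order; within a slice,
    payloads keep their arrival order.
    """
    if slice_s <= 0:
        raise ValueError("slice_s must be positive")
    slices = defaultdict(list)
    for time_s, payload in records:
        slices[int(time_s // slice_s)].append(payload)
    return [slices[k] for k in sorted(slices)]
```

With a slice length of 90 s, records stamped 1 s, 45 s and 89 s fall into the first slice and a record stamped 95 s into the second, and each slice can then be handled as one batch.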

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for building a decision tree model for real-time fake-licensed vehicle analysis, the method comprising:

S11, preparing a training data set and a verification data set;

obtaining fake-licensed vehicle data appearing in a historical database; obtaining, according to the appearance time of each fake-licensed vehicle, non-fake-licensed vehicle data within a time range corresponding to that appearance time; and obtaining, based on the fake-licensed vehicle data and the non-fake-licensed vehicle data, first five-dimensional vector data associating each fake-licensed vehicle with the corresponding real vehicle, wherein the first five-dimensional vector data for any license plate number comprises: license plate number, fake-licensed vehicle appearance time, fake-licensed vehicle appearance place, real vehicle appearance time, and real vehicle appearance place; and acquiring real vehicle data appearing in the historical database, and obtaining, according to the appearance time and place of each real vehicle, second five-dimensional vector data composed of a plurality of real vehicle data, wherein the second five-dimensional vector data comprises: license plate number, first appearance time of the real vehicle, first appearance place of the real vehicle, second appearance time of the real vehicle, and second appearance place of the real vehicle;

obtaining three-dimensional vector data corresponding to each license plate number based on the first five-dimensional vector data and the second five-dimensional vector data, wherein the three-dimensional vector data comprises: license plate number, time difference of vehicle appearance, distance of vehicle appearance;

S12, constructing a decision tree model;

2. The method of claim 1, wherein the step of obtaining three-dimensional vector data corresponding to each license plate number based on the first and second five-dimensional vector data comprises:

obtaining first three-dimensional data for the license plate number based on the first five-dimensional vector data, wherein the first three-dimensional data comprises: license plate number, time difference between the real vehicle and the fake-licensed vehicle, and distance between the real vehicle and the fake-licensed vehicle; and obtaining second three-dimensional data for the license plate number based on the second five-dimensional vector data, wherein the second three-dimensional data comprises: license plate number, time difference of the real vehicle, and distance of the real vehicle;

3. The method for establishing a decision tree model for real-time fake-licensed vehicle analysis according to claim 1 or 2, wherein the formula for calculating the information gain g(X, A) is expressed as:

g(X, A) = H(X) - H(X|A)

wherein H(X) is the entropy of the random variable, H(X|A) is the conditional entropy given feature A, n is the number of values of feature A, and p_i is the probability of the i-th sample in the set; D denotes the sample set of feature X, D_i denotes the sample set of feature X_i, that is, one of the subdivisions within the K partitions, and D_ik denotes the sample set of partition k in feature X_i.

4. The method as claimed in claim 3, wherein the step of constructing a preliminary decision tree model by constructing a root node and a leaf node according to the information gain corresponding to each license plate number comprises:

acquiring a root node and a leaf node corresponding to each feature;

5. The method for building a decision tree model for real-time fake-licensed vehicle analysis according to any one of claims 1-2 and 4, wherein the step of verifying and pruning the preliminary decision tree model according to the training data set to obtain the decision tree model comprises:

verifying the preliminary decision tree model through a training data set;

and pruning according to the verification result and a preset formula to obtain a decision tree model.

6. The method for building a decision tree model for real-time fake-licensed vehicle analysis of claim 5, wherein the predetermined formula is specifically expressed as:

wherein A_p and A_q respectively denote the p-th partition and the q-th partition of feature A, S denotes the test data set, and Model denotes the decision tree model; if the ratio of the accuracy Model(A_p, S) of the model after pruning to the accuracy Model(A_q, S) of the model before pruning is greater than or equal to 1, the partition after pruning is valid.

7. The method for building a decision tree model for real-time fake-licensed vehicle analysis according to claim 1, wherein the three-dimensional vector data is specifically expressed as:

where Δt = |t_i - t_j|, i, j ∈ [1, n]

where EARTH_RADIUS denotes the radius of the Earth, lat_i and lng_i are the latitude and longitude of the snapshot device corresponding to time t_i, and lat_j and lng_j are the latitude and longitude of the snapshot device corresponding to time t_j.

8. A method of using a decision tree model for real-time fake-licensed vehicle analysis, the method comprising: