CN110992690A - False data detection method based on space-time outliers in Internet of vehicles

Info

Publication number: CN110992690A
Application number: CN201911200558.3A
Authority: CN
Inventors: 刘伎昭; 王佩雪; 张俊宝; 杨昌霖
Original assignee: Zhongyuan University of Technology
Current assignee: Zhongyuan University of Technology
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2020-04-10
Anticipated expiration: 2039-11-29
Also published as: CN110992690B

Abstract

The invention provides a method for detecting false data based on space-time outliers in the Internet of vehicles. Roadside units (RSUs) deployed in the Internet of vehicles collect the data reported by vehicles and group it by its time and space attributes; the degree of similarity of the data within each group is quantified, and abnormal data groups showing significant data divergence are detected with a Bayesian method. A region of road space in which traffic data is highly correlated is defined as a space-time neighborhood; the space-time neighborhood is built from historical data, and an outlier detection algorithm detects the false data that deviates markedly from the neighborhood as a whole. False data is first screened using the space-time correlation of traffic data in real traffic environments, and a space-time outlier detection algorithm then quantifies how far each data item deviates from its space-time neighborhood in order to identify the false data. Road traffic conditions can thus be evaluated correctly and effectively, traffic management is improved, and diverse upper-layer applications such as traffic safety and intelligent navigation can be supported.

Description

False data detection method based on space-time outliers in Internet of vehicles

Technical Field

The invention relates to the technical field of Internet of vehicles security, and in particular to a false data detection method based on space-time outliers in the Internet of vehicles.

Background

In the Internet of vehicles, various sensors (GPS, radar, acceleration sensors and the like) are installed on vehicles so that each vehicle can sense the motion state (position, speed, direction, instantaneous acceleration and the like) of itself and of surrounding vehicles, as well as road environment information (weather, road conditions and the like), while driving. Wireless communication between vehicles, between vehicles and roadside infrastructure, and between vehicles and the back-end management center is realized with Dedicated Short-Range Communications (DSRC) and 5G technology. The sensed information is shared with surrounding vehicles, or is aggregated by pre-deployed roadside units (RSUs) and then provided to the back-end traffic management center. Based on this information, road traffic conditions can be evaluated effectively, vehicle travel times can be calculated, and abnormal events such as traffic jams and traffic accidents can be sensed, enabling diverse upper-layer applications such as traffic safety and intelligent navigation.

The network architecture of the Internet of vehicles is distributed and self-organizing: all vehicles have equal status, and any vehicle can send traffic data. Vehicle nodes may provide inaccurate or even erroneous data because of malicious attacks, user selfishness, sensor failures and the like. For example, to reach a destination faster, a selfish driver may report a forged high vehicle density and low driving speed to create the illusion of a traffic jam, inducing surrounding vehicles to leave the current road and thereby clearing it. Vehicles misled by wrong data may react wrongly, for example by braking suddenly or changing lanes, which disturbs normal traffic order and in extreme cases may even cause traffic accidents. The back-end management center may likewise misjudge traffic conditions and take wrong traffic management actions, causing traffic disorder.

Disclosure of Invention

To address the defects described in the background, the invention provides a false data detection method based on space-time outliers in the Internet of vehicles, which solves the technical problem of traffic disorder caused by insider adversaries injecting false traffic data into the existing Internet of vehicles.

The technical scheme of the invention is realized as follows:

A false data detection method based on space-time outliers in the Internet of vehicles comprises the following steps:

S1, data collection and grouping: the RSUs deployed on the roadside in the Internet of vehicles collect the data reported by vehicle nodes in their coverage areas and group the data according to its time and space attributes to obtain data groups;

S2, data anomaly detection: the RSU calculates the data consistency factor C of each data group and judges, based on Bayesian decision and hypothesis testing, whether the value of C changes significantly along the time axis; if so, the data group is marked as an abnormal data group and step S3 is executed; otherwise, the method returns to step S1;

S3, establishing a space-time neighborhood: the space-time neighborhood of the abnormal data group is constructed using the associated entropy;

S4, space-time outlier detection: an outlier detection algorithm detects outliers in the space-time neighborhood of the abnormal data group and thereby determines the false data injected by the attacker.

The method for detecting data abnormality in step S2 includes:

S21, the data consistency factor C of the data group is calculated as follows:

where $G_s(t)=\{d_1,d_2,\ldots,d_N\}$ is the data group of traffic data collected by N different vehicles in road segment s during time slot t, with $t=1,2,\ldots,T$ and $i,j=1,2,\ldots,N$;

S22, a hypothesis model is established according to the hypothesis testing principle:

where $\Theta^T=\{C_1,\ldots,C_T\}$ is the vector of consistency factors of the current data group over time slots $1,2,\ldots,T$; $H_0$ denotes that the value of $C_t$ does not change significantly, and $H_1$ denotes that $C_t$ changes significantly at time slot $t_0$;

S23, according to the Bayes principle, the posterior probability is used to determine the probability that the value of the data consistency factor C changes significantly at time t:

where $q\in\{0,1\}$, $P(\Theta^T\mid H_q)$ is the likelihood function, and $P(H_q)$ is the prior probability;

S24, whether the data consistency factor C changes significantly at $t=t_0$ is judged according to the likelihood formula:

S25, the data group $G_s(t_0)$ is regarded as an abnormal data group; its space-time neighborhood is constructed and the false data injected by the attacker is determined.
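As an illustration only, the Bayesian decision in steps S22 to S24 could be sketched as follows. The formulas of the disclosure are not reproduced here; the sketch assumes a simple Gaussian model in which the consistency factor has a known baseline mean `mu0` under $H_0$ and shifts to a known mean `mu1` from time slot `t0` onward under $H_1$. The function names, the Gaussian likelihood and all parameter values are hypothetical.

```python
import math


def log_gauss(x, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def posterior_change(c, t0, mu0, mu1, sigma, prior_h1=0.5):
    """Posterior probability of H1 (change at slot t0) given C_1..C_T.

    c        : list of consistency factors, c[0] is C_1
    t0       : 1-based time slot of the hypothesised change
    mu0, mu1 : assumed means before and after the change
    sigma    : assumed standard deviation of C_t (same under both hypotheses)
    prior_h1 : prior probability P(H1), strictly between 0 and 1
    """
    # Log-likelihood under H0: every C_t is drawn around mu0.
    log_l0 = sum(log_gauss(x, mu0, sigma) for x in c)
    # Log-likelihood under H1: the mean switches to mu1 from slot t0 onward.
    log_l1 = sum(
        log_gauss(x, mu0 if t < t0 else mu1, sigma)
        for t, x in enumerate(c, start=1)
    )
    # Bayes' rule in log space, normalised over the two hypotheses.
    log_p0 = log_l0 + math.log(1.0 - prior_h1)
    log_p1 = log_l1 + math.log(prior_h1)
    m = max(log_p0, log_p1)
    z = math.exp(log_p0 - m) + math.exp(log_p1 - m)
    return math.exp(log_p1 - m) / z


# Hypothetical sequence: consistency drops sharply in slot 5.
factors = [0.90, 0.92, 0.88, 0.91, 0.40, 0.42]
p_change = posterior_change(factors, t0=5, mu0=0.9, mu1=0.4, sigma=0.05, prior_h1=0.1)
is_abnormal = p_change > 0.5
```

Under this assumed model, a data group would be marked abnormal when the posterior of $H_1$ exceeds that of $H_0$; a deployment would need a likelihood model fitted to real consistency-factor data.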

The space-time neighborhood comprises a direct temporal neighborhood, a direct spatial neighborhood and an indirect space-time neighborhood. The direct temporal neighborhood comprises the data groups $G_s(t-1)$ and $G_s(t+1)$ collected at the same position in adjacent time slots; the direct spatial neighborhood comprises the data groups $G_{s-1}(t)$ and $G_{s+1}(t)$ of adjacent road segments in the same time slot; the indirect space-time neighborhood comprises the data groups that are indirectly adjacent in both time and space.
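The three kinds of neighborhood can be illustrated with a short Python sketch that indexes data groups by a `(segment, slot)` pair. The reading of "indirectly adjacent in both time and space" as the four diagonal neighbors is an assumption made for this example, as are all function names.

```python
def direct_temporal_neighbors(s, t):
    """Same road segment, adjacent time slots: G_s(t-1), G_s(t+1)."""
    return [(s, t - 1), (s, t + 1)]


def direct_spatial_neighbors(s, t):
    """Adjacent road segments, same time slot: G_{s-1}(t), G_{s+1}(t)."""
    return [(s - 1, t), (s + 1, t)]


def indirect_neighbors(s, t):
    """Assumed reading: groups that differ in both segment and slot by one."""
    return [(s + ds, t + dt) for ds in (-1, 1) for dt in (-1, 1)]


def space_time_neighborhood(s, t, groups):
    """Return the neighbouring data groups of G_s(t) that actually exist.

    groups maps (segment, slot) to the list of data values in that group.
    """
    keys = (
        direct_temporal_neighbors(s, t)
        + direct_spatial_neighbors(s, t)
        + indirect_neighbors(s, t)
    )
    return {k: groups[k] for k in keys if k in groups}


# Hypothetical data groups keyed by (segment, slot).
example_groups = {
    (2, 3): [50.0],
    (2, 2): [49.0],
    (3, 3): [51.0],
    (3, 4): [48.0],
    (5, 5): [10.0],
}
neighbors = space_time_neighborhood(2, 3, example_groups)
```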

The method for constructing the space-time neighborhood of the abnormal data group using the associated entropy in step S3 includes:

S31, let X be a random variable representing the traffic data collected by vehicles, and discretize the values of X into Q subintervals;

S32, the data group $G_s(t)=(d_1,d_2,\ldots,d_N)$ consists of the data $d_1,d_2,\ldots,d_N$ reported by N vehicles, where $d_i\in[0,Q-1]$; among the N speed values, the number of values falling in the j-th subinterval is $s_j$, where $0\le j\le Q-1$, and the entropy of the random variable X is:

S33, let $X_{G_1}$ be the random variable of data group $G_1(t)$ and $X_{G_2}$ the random variable of data group $G_2(t)$; the associated entropy of $G_1(t)$ and $G_2(t)$ is then:

where $p(x,y)$ denotes the probability that $X_{G_1}=x$ and $X_{G_2}=y$; the mutual information of the random variables $X_{G_1}$ and $X_{G_2}$ is then:

S34, the redundant information $R(G_1,G_2)$ between data groups $G_1(t)$ and $G_2(t)$ is used to quantify the correlation between $G_1(t)$ and $G_2(t)$:

S35, if there exists $s-l^-$ or $s+l^+$ satisfying $R(G_s(t),G_{s-l^-}(t))\ge\varepsilon$ or $R(G_s(t),G_{s+l^+}(t))\ge\varepsilon$, but $R(G_s(t),G_{s-l^--1}(t))<\varepsilon$ or $R(G_s(t),G_{s+l^++1}(t))<\varepsilon$, then $l^-$ or $l^+$ is defined as the space-time neighborhood boundary of data group $G_s(t)$, and the spatial neighborhood of $G_s(t)$ is $\{G_{s-l^-}(t),\ldots,G_{s+l^+}(t)\}$, where $l^-$ and $l^+$ are both positive integers;

S36, the temporal neighborhood of a data group comprises the data groups collected in the same road segment but in different time slots: if there exists $t-l^-$ or $t+l^+$ satisfying $R(G_s(t),G_s(t-l^-))\ge\varepsilon$ or $R(G_s(t),G_s(t+l^+))\ge\varepsilon$, but $R(G_s(t),G_s(t-l^--1))<\varepsilon$ or $R(G_s(t),G_s(t+l^++1))<\varepsilon$, then $l^-$ or $l^+$ is called the space-time neighborhood boundary.
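The quantities used in steps S31 to S34 can be illustrated with the following Python sketch. It assumes the standard Shannon definitions (entropy from bin frequencies, the associated entropy taken as the joint entropy of paired observations, mutual information as $H(X)+H(Y)-H(X,Y)$) and one common normalization of redundancy, $I(X;Y)/(H(X)+H(Y))$; the disclosure may define these quantities differently. Pairing the two groups observation by observation, the base-2 logarithm, and all function names are assumptions of this example.

```python
import math
from collections import Counter


def discretize(values, v_max, q):
    """Map each value in [0, v_max] to one of q equal-width bins 0..q-1."""
    width = v_max / q
    return [min(int(v // width), q - 1) for v in values]


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def joint_entropy(xs, ys):
    """Entropy of the paired observations (x_i, y_i)."""
    return entropy(list(zip(xs, ys)))


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)


def redundancy(xs, ys):
    """Assumed normalisation: I(X;Y) / (H(X) + H(Y)); 0.0 if both are constant."""
    h = entropy(xs) + entropy(ys)
    return mutual_information(xs, ys) / h if h > 0 else 0.0


# Hypothetical historical speeds (km/h) for two road segments, paired by time slot.
speeds_a = [5, 55, 119.9, 120]
bins_a = discretize(speeds_a, v_max=120, q=4)
identical = redundancy([0, 0, 1, 1], [0, 0, 1, 1])    # perfectly correlated
unrelated = redundancy([0, 0, 1, 1], [0, 1, 0, 1])    # independent
```

With this normalization, two identical series give a redundancy of 0.5 and two independent series give 0, so the threshold $\varepsilon$ would have to be chosen on that scale.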

The method for detecting outliers in the spatio-temporal neighborhood of the abnormal data set by using the outlier detection algorithm in step S4 includes:

S41, the space-time neighborhood of an abnormal data group consists of m data items, denoted $M=\{o_1,o_2,\ldots,o_m\}$; calculate the mean of the data in M:

S42, calculate the standard deviation of the data in M:

S43, check whether each data item o lies in the interval $[A-k_0\sigma,\ A+k_0\sigma]$, where A is the mean, $\sigma$ is the standard deviation, and $k_0>1$ is a predefined constant parameter; data items falling outside this interval are regarded as outliers, that is, as false data injected by an adversary, and their providers are regarded as attackers.
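Steps S41 to S43 amount to a mean and standard deviation test, which can be sketched in Python as follows. The sketch assumes the population standard deviation (division by m); the disclosure's exact formula is not reproduced here. The function name, the `(vid, value)` tuple layout and the sample values are hypothetical.

```python
import math


def detect_outliers(items, k0=2.0):
    """Return the (vid, value) pairs lying outside [A - k0*sigma, A + k0*sigma].

    items : list of (vid, value) pairs from the space-time neighbourhood
    k0    : predefined constant, k0 > 1
    """
    values = [v for _, v in items]
    m = len(values)
    mean = sum(values) / m                                        # A in step S41
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / m)   # assumed population form
    low, high = mean - k0 * sigma, mean + k0 * sigma
    return [(vid, v) for vid, v in items if v < low or v > high]


# Hypothetical speeds (km/h): eight consistent reports and one forged low speed.
reports = [("a", 50), ("b", 52), ("c", 48), ("d", 51), ("e", 49),
           ("f", 50), ("g", 51), ("h", 49), ("x", 5)]
suspects = detect_outliers(reports, k0=2.0)
```

A larger `k0` widens the accepted interval and lowers the chance of flagging honest vehicles, at the cost of missing forged values that lie close to the mean.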

The beneficial effects of this technical scheme are as follows: false data is first screened using the space-time correlation of traffic data in real traffic environments, and a space-time outlier detection algorithm then quantifies how far each data item deviates from its space-time neighborhood as a whole in order to identify the false data. Road traffic conditions can thus be evaluated correctly and effectively, traffic management is improved, and diverse upper-layer applications such as traffic safety and intelligent navigation can be supported.

Drawings

To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of the present invention;

FIG. 2 is the map used in an embodiment of the present invention;

FIG. 3 shows the detection rates of the present invention and the comparison algorithm at different vehicle densities;

FIG. 4 shows the detection rates of the present invention and the comparison algorithm at different attacker proportions;

FIG. 5 shows the false detection rates of the present invention and the comparison algorithm at different attacker proportions.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the present invention.

The present invention uses discretized time and road space. Time is divided into time slots of equal length, denoted t = 1,2,…. The road is divided into road segments, denoted s = 1,2,…. While driving, each vehicle uses its on-board sensors to collect traffic data such as vehicle density, speed and flow, and shares the data over DSRC (Dedicated Short-Range Communications) to support upper-layer applications such as traffic safety and traffic management. The general format of a message is o = {t, (lon, lat), d, vid}, where t is the time of data acquisition, lon and lat are the longitude and latitude of the acquisition location, d is the traffic data acquired by the vehicle, and vid is the vehicle identifier. RSUs (roadside units) are assumed to be deployed in advance along the road; they collect the data reported by vehicles, execute the detection method provided by the invention, evaluate the authenticity of the reported data, and detect the false data injected by attackers.
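By way of illustration, the message format and the grouping of reports by road segment and time slot could be sketched in Python as follows. The `Report` class, the 60-second slot length and the caller-supplied `segment_of` function (which stands in for map matching of a position to a road segment) are assumptions of this example and are not specified in the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One message o = {t, (lon, lat), d, vid}."""
    t: float      # acquisition time in seconds since the start of observation
    lon: float    # longitude of the acquisition location
    lat: float    # latitude of the acquisition location
    d: float      # traffic datum, e.g. speed in km/h
    vid: str      # vehicle identifier


def group_reports(reports, segment_of, slot_length_s=60.0):
    """Group data values into G_s(t), keyed by (segment s, slot t).

    segment_of(lon, lat) must return the road-segment index of a position;
    slots are numbered from 1.
    """
    groups = defaultdict(list)
    for r in reports:
        s = segment_of(r.lon, r.lat)
        t = int(r.t // slot_length_s) + 1
        groups[(s, t)].append(r.d)
    return dict(groups)


# Hypothetical reports and a toy two-segment road split at longitude 113.7.
sample = [
    Report(5.0, 113.60, 34.7, 52.0, "a"),
    Report(30.0, 113.61, 34.7, 48.0, "b"),
    Report(65.0, 113.60, 34.7, 50.0, "a"),
    Report(70.0, 113.75, 34.7, 8.0, "c"),
]
grouped = group_reports(sample, lambda lon, lat: 1 if lon < 113.7 else 2)
```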

As shown in fig. 1, an embodiment of the present invention provides a method for detecting false data based on spatio-temporal outliers in an internet of vehicles, which includes the following specific steps:

S1, data collection and grouping: the RSUs deployed on the roadside in the Internet of vehicles collect the data reported by vehicle nodes in their coverage areas and group the data according to its time and space attributes to obtain data groups. Taking road segment s as an example, the RSU groups the received data into $G_s(1),G_s(2),\ldots,G_s(T)$, which represent the data collected in road segment s during the series of time slots $1,2,\ldots,T$.

The data anomaly detection method comprises the following steps:

S21, data consistency is defined as the degree of similarity of the data within a group and is quantified by the data consistency factor; the data consistency factor C of a data group is calculated as follows:

where $\Theta^T=\{C_1,\ldots,C_T\}$ is the vector of consistency factors of the current data group over time slots $1,2,\ldots,T$; $H_0$ is the null hypothesis, denoting that the value of $C_t$ does not change significantly, and $H_1$ is the alternative hypothesis, denoting that $C_t$ changes significantly at time slot $t_0$;

S25, the data group $G_s(t_0)$ is treated as an abnormal data group, and a further detection process is performed to determine the false data injected by the attacker.

S3, establishing a space-time neighborhood: the space-time neighborhood of the abnormal data group is constructed using the associated entropy. Step S2 finds the data groups that contain anomalous data, but it cannot determine which data within a group was injected by an attacker. The invention therefore uses historical traffic data to construct the space-time neighborhood of an abnormal data group, compares each data item with the data in that neighborhood, and uses a space-time outlier algorithm to find the data items that deviate significantly. In a road scene, traffic data at two adjacent positions is generally strongly correlated in space and time, but the degree of correlation is influenced by factors such as geographic location and traffic rules. For example, on an expressway two consecutive road segments are strongly correlated, whereas in an urban scene two consecutive road segments separated by an intersection are geographically close but, because of the traffic lights, only weakly correlated in vehicle motion. A space-time neighborhood represents a geographic area of road space in which traffic data is highly correlated. Specifically, data group $G_s(t)$ contains the data of road segment s and time slot t, and its space-time neighborhood comprises a direct temporal neighborhood, a direct spatial neighborhood and an indirect space-time neighborhood. The direct temporal neighborhood comprises the data groups $G_s(t-1)$ and $G_s(t+1)$ collected at the same position in adjacent time slots; the direct spatial neighborhood comprises the data groups $G_{s-1}(t)$ and $G_{s+1}(t)$ of adjacent road segments in the same time slot; the indirect space-time neighborhood comprises the data groups that are indirectly adjacent in both time and space.

The method for constructing the space-time neighborhood of the abnormal data group using the associated entropy comprises the following steps:

S32, the data group $G_s(t)=(d_1,d_2,\ldots,d_N)$ consists of the data $d_1,d_2,\ldots,d_N$ reported by N vehicles, where $d_i\in[0,Q-1]$; among the N speed values, the number of values falling in the j-th subinterval is $s_j$, where $0\le j\le Q-1$, and the entropy of the random variable X is:

where $p(x,y)$ denotes the probability that $X_{G_1}=x$ and $X_{G_2}=y$; the mutual information of the random variables $X_{G_1}$ and $X_{G_2}$ is then:

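S35, if there exists $s-l^-$ or $s+l^+$ satisfying $R(G_s(t),G_{s-l^-}(t))\ge\varepsilon$ or $R(G_s(t),G_{s+l^+}(t))\ge\varepsilon$, but $R(G_s(t),G_{s-l^--1}(t))<\varepsilon$ or $R(G_s(t),G_{s+l^++1}(t))<\varepsilon$, where $l^-$ and $l^+$ are both positive integers, then $l^-$ or $l^+$ is defined as the space-time neighborhood boundary of data group $G_s(t)$, and the spatial neighborhood of $G_s(t)$ is $\{G_{s-l^-}(t),\ldots,G_{s+l^+}(t)\}$.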
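The boundary condition of step S35 can be read as expanding outward from road segment s until the redundancy with the next segment falls below the threshold. The following Python sketch implements that contiguous-expansion reading, which is an interpretation made for this example; the function name, the `redundancy_to` callback and the sample values are hypothetical.

```python
def spatial_neighborhood(s, redundancy_to, eps, s_min, s_max):
    """Return (s - l_minus, s + l_plus), the segment range correlated with s.

    redundancy_to(s2) returns R(G_s(t), G_s2(t)) for another segment s2,
    eps is the redundancy threshold, and s_min..s_max are the valid segments.
    """
    low = s
    # Extend toward smaller segment indices while redundancy stays >= eps.
    while low - 1 >= s_min and redundancy_to(low - 1) >= eps:
        low -= 1
    high = s
    # Extend toward larger segment indices in the same way.
    while high + 1 <= s_max and redundancy_to(high + 1) >= eps:
        high += 1
    return low, high


# Hypothetical redundancy values between segment 4 and the other segments.
r_values = {1: 0.10, 2: 0.40, 3: 0.45, 5: 0.42, 6: 0.20, 7: 0.50}
bounds = spatial_neighborhood(4, lambda s2: r_values.get(s2, 0.0),
                              eps=0.3, s_min=1, s_max=7)
```

In this example the neighborhood of segment 4 spans segments 2 to 5, so $l^-=2$ and $l^+=1$; segment 7 is excluded even though its redundancy is high, because segment 6 breaks the contiguous run. The temporal boundary of step S36 could be found the same way by stepping over time slots instead of road segments.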

The method for detecting outliers in the spatio-temporal neighborhood of the abnormal data set by using the outlier detection algorithm comprises the following steps:

S41, the space-time neighborhood of an abnormal data group consists of m data items, denoted $M=\{o_1,o_2,\ldots,o_m\}$; calculate the mean of the data in M:

S42, calculate the standard deviation of the data in M:

The effectiveness of the method is verified by simulation, and its technical advantages are evaluated by comparison with an existing method.

(1) Simulation method

In the simulation experiment, highly realistic traffic scenes are generated with the traffic simulation software SUMO, using an electronic map provided by the OpenStreetMap project; the simulation area measures 2000 m × 2000 m, as shown in FIG. 2. All roads are bidirectional with 3 lanes in each direction, and the position and direction in which each vehicle enters the simulation area are selected randomly. The average vehicle density in the simulation area is 75 vehicles/km, and vehicles report their real-time driving speed to the RSU while driving. The simulation is run 100 times, and all reported results are averages over the 100 runs.

To verify the accuracy of the detection method, a collusion attack scenario is set up in which several colluding attackers send forged low vehicle speeds to create the illusion of a traffic jam. The attack position and time are selected randomly, the forged speed is selected randomly from [0,10] km/h, each vehicle transmits its speed once per time slot, and the proportion of attackers among all vehicles is denoted γ.

The performance of the detection method is evaluated using the following metrics. The detection rate (DR) is defined as the proportion of all false data that is correctly detected. The false positive rate (FPR) is defined as the probability that data is incorrectly classified, including true data detected as false data and false data detected as true data.
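The two metrics can be computed from ground-truth labels and detector output as in the following Python sketch. It follows the definitions given above, so the FPR counts misclassifications in both directions over all data items, which differs from the usual textbook false positive rate. The function name and the sample labels are hypothetical.

```python
def detection_metrics(is_false, flagged):
    """Compute (DR, FPR) as defined in the text.

    is_false : list of bools, True where the data item is actually false
    flagged  : list of bools, True where the detector marked the item as false
    """
    pairs = list(zip(is_false, flagged))
    tp = sum(1 for truth, flag in pairs if truth and flag)        # false data caught
    fn = sum(1 for truth, flag in pairs if truth and not flag)    # false data missed
    fp = sum(1 for truth, flag in pairs if not truth and flag)    # true data flagged
    # DR: share of all false data that was correctly detected.
    dr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # FPR per the text: share of all data items that were misclassified.
    fpr = (fp + fn) / len(pairs) if pairs else 0.0
    return dr, fpr


# Hypothetical labels: two false items (one caught), one honest item wrongly flagged.
truth = [True, True, False, False, False]
output = [True, False, True, False, False]
dr, fpr = detection_metrics(truth, output)
```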

(2) Results and analysis

The influence of vehicle density and attacker proportion on the performance of the algorithm is examined. The simulation results are given in FIG. 3 to FIG. 5. FIG. 3 compares the performance of the method of the present invention with that of the LEAVE (Local Eviction of Attackers by Voting Evaluators) protocol at different vehicle densities. The detection rate of the method is higher than that of LEAVE at all vehicle densities: at a density of 80 vehicles/km, the method achieves a detection rate of 98.3%, whereas the LEAVE protocol achieves 91.5%. The detection rate of the method is lower at low vehicle densities than at high ones: it is 92.0% at 20 vehicles/km and 99.9% at 200 vehicles/km. At low densities the distance between vehicles is large and vehicles constrain one another little, so each vehicle can travel at its ideal speed; because different types of vehicles have different ideal speeds, driving speeds differ widely, the collected data shows a large variance, and correct identification becomes more difficult.

FIG. 4 and FIG. 5 evaluate the resistance of the method of the invention and of the LEAVE protocol to collusion attacks. FIG. 4 shows the detection rate at different attacker proportions. At low attacker proportions, the detection rate of the method is similar to that of LEAVE. However, the detection rate of the LEAVE protocol drops rapidly as the attacker proportion increases, to 65.5% and 40% at attacker proportions of 30% and 40%, respectively. The method of the invention shows stronger resistance to collusion attacks: at an attacker proportion of 25%, it still achieves a detection rate of 96.5%. Its false detection rate is also significantly lower than that of LEAVE, at 5.9% for an attacker proportion of 25%.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A false data detection method based on space-time outliers in the Internet of vehicles is characterized by comprising the following steps:

2. The method for detecting false data based on space-time outliers in the Internet of vehicles according to claim 1, wherein the method for detecting data abnormality in step S2 is:

3. The method for detecting false data based on space-time outliers in the Internet of vehicles according to claim 1, wherein the space-time neighborhood comprises a direct temporal neighborhood, a direct spatial neighborhood and an indirect space-time neighborhood; the direct temporal neighborhood comprises the data groups $G_s(t-1)$ and $G_s(t+1)$ collected at the same position in adjacent time slots; the direct spatial neighborhood comprises the data groups $G_{s-1}(t)$ and $G_{s+1}(t)$ of adjacent road segments in the same time slot; the indirect space-time neighborhood comprises the data groups that are indirectly adjacent in both time and space.

4. The method for detecting false data based on space-time outliers in the Internet of vehicles according to claim 1, wherein the method for constructing the space-time neighborhood of the abnormal data group using the associated entropy in step S3 is as follows:

where $p(x,y)$ denotes the probability that $X_{G_1}=x$ and $X_{G_2}=y$; the mutual information of the random variables $X_{G_1}$ and $X_{G_2}$ is then:

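S35, if there exists $s-l^-$ or $s+l^+$ satisfying $R(G_s(t),G_{s-l^-}(t))\ge\varepsilon$ or $R(G_s(t),G_{s+l^+}(t))\ge\varepsilon$, but $R(G_s(t),G_{s-l^--1}(t))<\varepsilon$ or $R(G_s(t),G_{s+l^++1}(t))<\varepsilon$, where $l^-$ and $l^+$ are both positive integers;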

5. The method for detecting false data based on space-time outliers in the Internet of vehicles according to claim 1, wherein the method for detecting outliers in the space-time neighborhood of the abnormal data group using the outlier detection algorithm in step S4 comprises the following steps:

S42, calculate the standard deviation of the data in M: