CN115423048B - Traffic flow anomaly detection method and system based on pattern similarity - Google Patents


Info

Publication number: CN115423048B
Authority: CN (China)
Application number: CN202211365058.7A
Prior art keywords: traffic flow; similarity; time sequence; mode; flow data
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN115423048A
Inventors: 张彩明, 马翔, 袁晨迅, 李雪梅
Current assignee: Shandong University
Original assignee: Shandong University
Events: application CN202211365058.7A filed by Shandong University; published as CN115423048A; application granted and published as CN115423048B.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G08G1/0129 Traffic data processing for creating historical data or processing based on historical data

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a traffic flow anomaly detection method and system based on pattern similarity, in the technical field of traffic flow anomaly detection models. The method comprises: extracting temporal features from traffic flow data with an improved long short-term memory (LSTM) neural network; segmenting the traffic flow data with a sliding window, clustering the resulting short-term sequences, and taking the short-term sequence corresponding to each cluster center as a pattern feature; calculating the temporal similarity between the temporal features of different spatial positions; determining, for each pattern feature, its closest pattern feature and weighting the nearest-neighbor distances of these pattern feature pairs to obtain the pattern similarity between different spatial positions; determining a sequence similarity from the temporal similarity and the pattern similarity, and using it to construct a traffic flow dynamic relationship graph over different times and spatial positions; and detecting abnormal traffic flow states with the dynamic relationship graph and the temporal similarity, thereby improving the accuracy of traffic flow anomaly detection.

Description

Traffic flow anomaly detection method and system based on pattern similarity
Technical Field
The invention relates to the technical field of traffic flow anomaly detection models, in particular to a traffic flow anomaly detection method and system based on pattern similarity.
Background
With the development of big data technology, artificial intelligence has been widely applied to traffic flow anomaly detection and traffic flow prediction. Accurately detecting abnormal traffic flow not only gives traffic management departments a sound basis for decisions but also helps travelers choose better routes, which helps relieve traffic pressure.
Traffic flow at an intersection is affected by many factors, such as time, weather and traffic policy, and shows clear periodicity. Existing machine-learning-based traffic flow anomaly detection algorithms have at least the following three problems:
(1) A single recurrent neural network model cannot effectively extract the information in the historical traffic flow sequence.
(2) Existing traffic flow anomaly detection considers only the traffic condition of a single intersection and ignores the influence of associated intersections.
(3) There is no effective measure for the similarity of traffic flows between different roads.
Disclosure of Invention
To solve these problems, the invention provides a traffic flow anomaly detection method and system based on pattern similarity, which extract temporal features and pattern features from traffic flow data separately and construct a traffic flow dynamic relationship graph from which traffic flow anomalies are judged, improving the accuracy of traffic flow anomaly detection.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, the present invention provides a traffic flow anomaly detection method based on pattern similarity, comprising:
acquiring traffic flow data;
extracting temporal features from the traffic flow data with an improved long short-term memory neural network, which obtains the temporal features by a weighted sum of the hidden states at different moments;
segmenting the traffic flow data with a sliding window to obtain a set of short-term sequences, clustering this set, and taking the short-term sequence corresponding to the cluster center of each category as a pattern feature;
calculating the temporal similarity between the temporal features of different spatial positions;
determining, for each pattern feature, the closest pattern feature so as to form pattern feature pairs, and weighting the nearest-neighbor distances of the pattern feature pairs to obtain the pattern similarity between different spatial positions;
determining a sequence similarity from the temporal similarity and the pattern similarity, and constructing from it a traffic flow dynamic relationship graph over different times and spatial positions;
and detecting abnormal traffic flow states using the traffic flow dynamic relationship graph and the temporal similarity.
In the weighted summation of the hidden states obtained at different moments, the weight of each hidden state is determined by its correlation with the traffic flow data $x_t$:
[Weight formulas: images not reproduced]
where $x_t$ is the traffic flow data of day t, $h_i$ is a hidden state, $f(\cdot)$ is the correlation function, $W$ is a parameter to be learned, $T$ is the number of days of input traffic flow data, and $^{\top}$ denotes transposition.
Optionally, the temporal similarity $S^{\mathrm{tem}}_{ab}$ between the temporal features of different spatial positions is calculated as follows:
[Formulas: images not reproduced]
where $v^a_t$ and $v^b_t$ are the temporal features of spatial positions a and b on day t, $\phi(\cdot)$ is a network composed of a weight matrix $W_s$ to be learned and the activation function tanh, and $[v^a_t \,\Vert\, v^b_t]$ denotes the concatenation of $v^a_t$ and $v^b_t$.
In an alternative embodiment, when the nearest-neighbor distances of the pattern feature pairs are weighted, the weight of each pattern feature is the number of elements in its category.
Optionally, the sequence similarity is determined as a weighted sum of the temporal similarity and the pattern similarity.
As an alternative embodiment, constructing the traffic flow dynamic relationship graph comprises:
constructing, from the sequence similarity of the traffic flow data of different spatial positions, a relationship graph $A_t$ of the different spatial positions at the same time;
introducing a connectivity relation matrix between the traffic flow data of different spatial positions, and constructing the traffic flow dynamic relationship graph from the relationship graph and the connectivity relation matrix:
[Formulas: images not reproduced]
where $W_g$ is a parameter to be learned, $R$ is the connectivity relation matrix, tanh is the activation function, $t_c$ and $t_p$ are respectively the current time and the time indicated by the prior data, $\Delta t$ is the time difference between them, and $\gamma(\cdot)$ is a decreasing function.
Optionally, the connectivity relation matrix is:
[Formula: image not reproduced]
where $X_a$ and $X_b$ are the traffic flow data of spatial positions a and b, and $R_{ab}$ is the connectivity relation matrix between $X_a$ and $X_b$.
In a second aspect, the present invention provides a traffic flow anomaly detection system based on pattern similarity, comprising:
a data acquisition module configured to acquire traffic flow data;
a temporal feature extraction module configured to extract temporal features from the traffic flow data with the improved long short-term memory neural network, which obtains the temporal features by a weighted sum of the hidden states at different moments;
a pattern feature extraction module configured to segment the traffic flow data with a sliding window to obtain a set of short-term sequences and, after clustering this set, to take the short-term sequence corresponding to the cluster center of each category as a pattern feature;
a temporal similarity determination module configured to calculate the temporal similarity between the temporal features of different spatial positions;
a pattern similarity determination module configured to determine, for each pattern feature, the closest pattern feature so as to form a pattern feature pair, and to weight the nearest-neighbor distances of the pattern feature pairs to obtain the pattern similarity between different spatial positions;
a dynamic relationship graph construction module configured to determine a sequence similarity from the temporal similarity and the pattern similarity, and to construct from it a traffic flow dynamic relationship graph over different times and spatial positions;
an anomaly detection module configured to detect abnormal traffic flow states using the traffic flow dynamic relationship graph and the temporal similarity.
In a third aspect, the invention provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor; when executed by the processor, the instructions perform the method of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
The invention extracts temporal features with an improved long short-term memory neural network and, in addition, extracts pattern features to account for the periodicity of traffic flow data. After computing similarities for both kinds of features, it constructs a traffic flow dynamic relationship graph that captures both the associations between different spatial positions and the influence of other times on the current associations. Finally, a graph attention network judges traffic flow anomalies, improving the accuracy of traffic flow anomaly detection.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flow chart of the traffic flow anomaly detection method based on pattern similarity provided in Example 1 of the present invention;
Fig. 2 is a schematic diagram of the dynamic relationship graph provided in Example 1 of the present invention;
Fig. 3 is a flow chart of the anomaly determination provided in Example 1 of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
This embodiment proposes a traffic flow anomaly detection method based on pattern similarity which, as shown in Fig. 1, includes:
acquiring traffic flow data;
extracting temporal features from the traffic flow data with an improved long short-term memory neural network, which obtains the temporal features by a weighted sum of the hidden states at different moments;
segmenting the traffic flow data with a sliding window to obtain a set of short-term sequences, clustering this set, and taking the short-term sequence corresponding to the cluster center of each category as a pattern feature;
calculating the temporal similarity between the temporal features of different spatial positions;
determining, for each pattern feature, the closest pattern feature so as to form pattern feature pairs, and weighting the nearest-neighbor distances of the pattern feature pairs to obtain the pattern similarity between different spatial positions;
determining a sequence similarity from the temporal similarity and the pattern similarity, and constructing from it a traffic flow dynamic relationship graph over different times and spatial positions;
and detecting abnormal traffic flow states using the traffic flow dynamic relationship graph and the temporal similarity.
In the present embodiment, the traffic flow data over T days is defined as $X = \{x_1, x_2, \dots, x_T\}$, where the traffic flow data of day t is $x_t = \{x_{t,1}, x_{t,2}, \dots, x_{t,N}\}$, $x_{t,n}$ is the data of the n-th minute of day t, and N is the length of one day's traffic flow data.
Because traffic flow data is affected by various complex factors, this embodiment models the acquired traffic flow data with an improved long short-term memory (LSTM) neural network in order to represent its temporal characteristics as vectors, i.e., to extract temporal features.
LSTM is an improved variant of the recurrent neural network (RNN) and is widely used in time series modeling; its gating units alleviate the vanishing-gradient problem of RNNs to some extent. For each traffic flow data $x_t$, the LSTM is constructed as shown in formulas (1)-(6):
Input gate:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t]\right) \qquad (1)$$
Forget gate:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t]\right) \qquad (2)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t]\right) \qquad (3)$$
Output gate:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t]\right) \qquad (4)$$
Long-term memory:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (5)$$
Short-term memory:
$$h_t = o_t \odot \tanh(C_t) \qquad (6)$$
where $W_i$, $W_f$, $W_C$ and $W_o$ are parameters to be learned, $\sigma$ is the sigmoid function, $C_t$ is the cell state, $\tilde{C}_t$ is the intermediate (candidate) cell state, $\odot$ is the Hadamard product, and $h_t$ is the hidden state. For convenience in explaining the improved algorithm, the bias terms are omitted from the above formulas.
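The gate equations can be illustrated with a minimal pure-Python sketch of one bias-free LSTM step. It assumes, as in the common formulation, that every gate reads the concatenation $[h_{t-1}, x_t]$; the function names, weight shapes and the zero initial state are illustrative choices, not taken from the patent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W v
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, W_i, W_f, W_C, W_o):
    """One LSTM step without bias terms, following formulas (1)-(6)."""
    z = h_prev + x  # list concatenation: [h_{t-1}, x_t]
    i = [sigmoid(a) for a in matvec(W_i, z)]          # input gate (1)
    f = [sigmoid(a) for a in matvec(W_f, z)]          # forget gate (2)
    c_tilde = [math.tanh(a) for a in matvec(W_C, z)]  # candidate cell state (3)
    o = [sigmoid(a) for a in matvec(W_o, z)]          # output gate (4)
    c = [fg * cp + ig * ct for fg, cp, ig, ct in zip(f, c_prev, i, c_tilde)]  # (5)
    h = [og * math.tanh(cc) for og, cc in zip(o, c)]  # hidden state (6)
    return h, c


def run_lstm(xs, hidden_size, W_i, W_f, W_C, W_o):
    """Run the cell over a sequence and keep every hidden state h_1..h_T."""
    h = [0.0] * hidden_size  # assumed zero initial hidden state
    c = [0.0] * hidden_size  # assumed zero initial cell state
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, W_i, W_f, W_C, W_o)
        states.append(h)
    return states
```

Each weight matrix has `hidden_size` rows and `hidden_size + len(x)` columns. Keeping the whole `states` list, rather than only the last hidden state, is what a weighted fusion over all moments requires.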
In most existing algorithms, the last hidden state $h_t$ is typically taken as the LSTM output, which tends to ignore the features contained in the earlier hidden states.
Therefore, in this embodiment, the hidden states at different moments are fused by weighted summation to obtain the temporal feature. The weight of each hidden state is defined by its correlation with $x_t$, so that hidden states with high correlation receive large weights, which improves how well the output represents $x_t$. This is given by formulas (7)-(9):
[Formulas (7)-(9): images not reproduced]
where $\alpha_i$ is the weight of the hidden state $h_i$, determined by the correlation function $f(h_i, x_t)$, and $W$ is a parameter to be learned.
After all traffic flow data have been processed in this way, the vector representation of $x_t$, i.e., the temporal feature $v_t$, is obtained. The process is summarized by formula (10):
[Formula (10): image not reproduced]
where T is the number of days of input data.
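The weighted fusion of hidden states can be sketched as attention pooling. The exact form of formulas (7)-(9) is not recoverable from this text, so the sketch assumes a bilinear correlation $f(h_i, x_t) = x_t^{\top} W h_i$ (consistent with the learnable matrix and the transpose that the text mentions) followed by softmax normalization over the T hidden states; both choices are assumptions.

```python
import math


def bilinear_score(x_t, W, h):
    # Assumed correlation function f(h_i, x_t) = x_t^T W h_i
    Wh = [sum(w * hv for w, hv in zip(row, h)) for row in W]
    return sum(xv * whv for xv, whv in zip(x_t, Wh))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_hidden_states(hidden_states, x_t, W):
    """Return (v_t, weights): the weighted sum of all hidden states."""
    weights = softmax([bilinear_score(x_t, W, h) for h in hidden_states])
    dim = len(hidden_states[0])
    v_t = [sum(a * h[k] for a, h in zip(weights, hidden_states)) for k in range(dim)]
    return v_t, weights
```

`W` has `len(x_t)` rows and `len(h)` columns. With an all-zero `W` every weight equals 1/T and `v_t` reduces to the mean hidden state; a trained `W` lets the hidden states most correlated with $x_t$ dominate.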
Pattern features are short-term subsequences that recur, in approximately the same shape, in the historical data. This embodiment therefore proposes a pattern feature extraction method based on segmentation and clustering to capture periodic characteristics.
First, the traffic flow data is divided into short-term sequences with a sliding window. Specifically, the sliding window divides the traffic flow data $x_t$ of day t into M windows, giving the short-term sequence set of day t, $S_t = \{s_{t,1}, s_{t,2}, \dots, s_{t,M}\}$, where $s_{t,m} = \{x_{t,m}, x_{t,m+1}, \dots, x_{t,m+L-1}\}$ is a short-term sequence, L is the window length, and $M = N - L + 1$.
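The segmentation step is a stride-1 sliding window, which is what the window count $M = N - L + 1$ implies. A minimal sketch; the function name is illustrative.

```python
def sliding_windows(x_t, L):
    """Split one day's series (length N) into M = N - L + 1 windows of length L."""
    N = len(x_t)
    if not 1 <= L <= N:
        raise ValueError("window length L must satisfy 1 <= L <= N")
    # stride 1: window m starts at index m and covers L consecutive minutes
    return [list(x_t[m:m + L]) for m in range(N - L + 1)]
```

For example, a five-minute series with L = 3 yields the three windows [1, 2, 3], [2, 3, 4] and [3, 4, 5].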
Then, the short-term sequences are clustered according to the distances between them to capture recurring short-term sequences, i.e., pattern features. Specifically, all short-term sequences $s_{t,m}$ in the set $S_t$ are clustered; short-term sequences belonging to the same category are similar to one another, and the cluster centers of the categories, $P_t = \{c_1, c_2, \dots, c_g\}$, are taken as the pattern features of the traffic flow data of day t, where each element is the cluster center of one category and g is the number of categories.
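The text does not name the clustering algorithm. The sketch below assumes plain k-means with a deterministic initialization (g evenly spaced windows), reads "the short-term sequence corresponding to the cluster center" as the member window closest to the centroid, and records each cluster's size, since the method later uses the number of elements in each category as a weight. All of these are assumptions for illustration.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(seqs, g, iters=20):
    """Plain k-means over equal-length windows (assumed algorithm)."""
    if not 1 <= g <= len(seqs):
        raise ValueError("need 1 <= g <= number of windows")
    step = len(seqs) // g
    centers = [list(seqs[k * step]) for k in range(g)]  # evenly spaced initial centers
    labels = [0] * len(seqs)
    for _ in range(iters):
        labels = [min(range(g), key=lambda k: euclidean(s, centers[k])) for s in seqs]
        for k in range(g):
            members = [s for s, lab in zip(seqs, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def pattern_features(seqs, g):
    """Return (representative window, cluster size) for each non-empty cluster."""
    centers, labels = kmeans(seqs, g)
    patterns = []
    for k in range(g):
        members = [s for s, lab in zip(seqs, labels) if lab == k]
        if members:
            rep = min(members, key=lambda s: euclidean(s, centers[k]))
            patterns.append((rep, len(members)))
    return patterns
```

Choosing an actual member window instead of the raw centroid keeps every pattern feature a real observed short-term sequence; the cluster sizes travel with the patterns so the later weighting step can use them.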
In this embodiment, similarities are computed separately for the temporal features and the pattern features; the sequence similarity is then determined from the temporal similarity and the pattern similarity, with a parameter controlling the balance between the two.
In this embodiment, the temporal similarity $S^{\mathrm{tem}}_{ab}$ between the temporal features of the traffic flow data of different spatial positions (e.g., different traffic intersections) at the same time is calculated as:
[Formulas (11) and (12): images not reproduced]
where $v^a_t$ and $v^b_t$ are the temporal features of spatial positions a and b on day t, $\phi(\cdot)$ is a network composed of a weight matrix $W_s$ to be learned and the activation function tanh, and $[v^a_t \,\Vert\, v^b_t]$ denotes the concatenation of $v^a_t$ and $v^b_t$.
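Formulas (11) and (12) are not recoverable from this text. As an illustration only, the sketch assumes the simplest network consistent with the description: a learnable vector `w` applied to the concatenation of the two temporal features, followed by tanh, giving a score in (-1, 1).

```python
import math


def temporal_similarity(v_a, v_b, w):
    """Assumed form: tanh(w . [v_a || v_b]); w is a learnable vector of length 2d."""
    z = list(v_a) + list(v_b)  # concatenation of the two temporal features
    if len(w) != len(z):
        raise ValueError("w must have length len(v_a) + len(v_b)")
    return math.tanh(sum(wi * zi for wi, zi in zip(w, z)))
```

Unless the two halves of `w` are equal, this score is not symmetric in a and b; whether the original formulation enforces symmetry is not stated.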
In the present embodiment, after the pattern features of the traffic flow data of all spatial positions are obtained, the distance between the pattern features $P^a$ of the traffic flow data $X_a$ of spatial position a and the pattern features $P^b$ of the traffic flow data $X_b$ of spatial position b is calculated, and the pattern similarity of $X_a$ and $X_b$ is determined from this distance.
Because the set of pattern features has no inherent order, and the pattern features $P^a$ and $P^b$ of $X_a$ and $X_b$ may contain different numbers of elements, the correspondence between the elements of $P^a$ and $P^b$ is not easy to determine. To keep the algorithm simple and robust, this embodiment computes the nearest-neighbor distance of each pattern feature, which resolves the lack of a one-to-one correspondence between the trend pattern features of different traffic flow sequences.
The nearest-neighbor distance $D_{1NN}$ is the distance between a pattern feature and the closest pattern feature in the other set:
$$D_{1NN}(p^a_i, P^b) = \min_{j} \left\lVert p^a_i - p^b_j \right\rVert_2$$
where $p^a_i$ is the i-th pattern feature of $X_a$ and $p^b_j$ is the j-th pattern feature of $X_b$; that is, when $p^b_{j^*}$ is the first (nearest) neighbor of $p^a_i$, the Euclidean distance between $p^a_i$ and $p^b_{j^*}$ is $D_{1NN}(p^a_i, P^b)$.
The nearest-neighbor distances of all elements of $P^a$ with respect to $P^b$ form the array $D(P^a \to P^b)$. Note that when $p^b_{j^*}$ is the nearest neighbor of $p^a_i$, $p^a_i$ is not necessarily the nearest neighbor of $p^b_{j^*}$. Therefore, $D(P^a \to P^b)$ and $D(P^b \to P^a)$ denote, respectively, the nearest-neighbor distances of all elements of $P^a$ with respect to $P^b$ and of all elements of $P^b$ with respect to $P^a$. To make the distance measure between $P^a$ and $P^b$ symmetric, the two arrays are concatenated into $D_{ab} = [D(P^a \to P^b), D(P^b \to P^a)]$ of length $g_a + g_b$, where $g_a$ and $g_b$ are the numbers of pattern features of $X_a$ and $X_b$, respectively.
The combined nearest-neighbor distance array $D_{ab}$ contains the nearest-neighbor distances of all pattern features in $P^a$ and $P^b$ (the pattern features of $X_a$ and $X_b$). The question is which value best represents the distance between $X_a$ and $X_b$. If the maximum of $D_{ab}$ is chosen, noise peaks that occur randomly in the pattern features seriously distort the distance; if the minimum is chosen, most sets of pattern features become almost indistinguishable from one another.
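The nearest-neighbor distances and their symmetric combination can be sketched directly. Pattern features are represented here as equal-length lists of floats; the function names are illustrative.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nn_distances(P_from, P_to):
    """1-NN distance of every pattern in P_from to its closest pattern in P_to."""
    return [min(euclidean(p, q) for q in P_to) for p in P_from]


def symmetric_nn_distances(P_a, P_b):
    """Concatenate both directions so the measure is symmetric (length g_a + g_b)."""
    return nn_distances(P_a, P_b) + nn_distances(P_b, P_a)
```

With one-dimensional patterns $P^a = \{[0], [2]\}$ and $P^b = \{[3]\}$, the directed arrays are [3, 1] and [1]: the single pattern in $P^b$ has [2] as its nearest neighbor, while [0] in $P^a$ does not point back to anything closer than [3]. The concatenated array [3, 1, 1] has length $g_a + g_b = 3$.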
To take the influence of all patterns into account as far as possible, this embodiment weights all values in the combined nearest-neighbor distance array $D_{ab}$ to obtain the pattern similarity $S^{\mathrm{pat}}_{ab}$. When constructing the pattern features $P^a$ and $P^b$, the number of elements in the category of each pattern feature is recorded as $n^a_i$ and $n^b_j$; in the same way as $D_{ab}$, the count arrays $N^a$ and $N^b$ are concatenated into $N_{ab}$. The weighting function is shown in formula (13):
[Formula (13): image not reproduced]
where $\sum$ denotes summation.
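Formula (13) itself is not recoverable here. A natural reading of "weighting all values by the number of elements in each pattern's category" is a size-weighted mean, sketched below as an assumption. Note that this quantity grows as the patterns diverge, so it behaves like a distance; how the original turns it into a similarity, if at all, is not stated.

```python
def weighted_pattern_distance(d_ab, n_ab):
    """Assumed form of formula (13): sum(n * d) / sum(n).

    d_ab: concatenated nearest-neighbor distances (length g_a + g_b)
    n_ab: cluster sizes of the corresponding pattern features
    """
    if not d_ab or len(d_ab) != len(n_ab):
        raise ValueError("d_ab and n_ab must be non-empty and equally long")
    total = sum(n_ab)
    if total <= 0:
        raise ValueError("cluster sizes must sum to a positive number")
    return sum(n * d for n, d in zip(n_ab, d_ab)) / total
```

A large cluster (a frequently recurring pattern) pulls the result more than an isolated noise peak, which is the compromise between the maximum and the minimum discussed above.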
In this embodiment, the sequence similarity $S_{ab}$ is determined from the temporal similarity and the pattern similarity, as shown in formula (14):
[Formula (14): image not reproduced]
where $\lambda$ is a weight parameter to be learned that controls the balance between the two similarities.
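The exact form of formula (14) is not recoverable here. One common way to let a single parameter balance two scores is a convex combination, sketched below as an assumption; during training, `lam` would be learned rather than fixed.

```python
def sequence_similarity(s_tem, s_pat, lam):
    """Assumed form of formula (14): lam * s_tem + (1 - lam) * s_pat, 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * s_tem + (1.0 - lam) * s_pat
```

At `lam = 1` only the temporal similarity counts, at `lam = 0` only the pattern similarity.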
In this embodiment, a relationship graph of the different spatial positions (traffic intersections) at the same time is constructed from the sequence similarity, and the relationship graphs of different times are then processed by a dynamic-graph-based LSTM to construct a traffic flow dynamic relationship graph that contains temporal characteristics.
Specifically, the relationship graph $A_t$ of the different spatial positions at the same time (the same day) is constructed from the sequence similarity of their traffic flow data, as shown in formula (15):
$$(A_t)_{ab} = S_{ab} \qquad (15)$$
where $A_t$ is the relationship graph constructed for day t and $S_{ab}$ is the sequence similarity between the traffic flow data of spatial positions a and b on that day.
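Building the daily relationship graph amounts to filling an n-by-n matrix with pairwise sequence similarities. A minimal sketch, where `sim` is a hypothetical callable returning the sequence similarity of two positions; whether the diagonal (self-similarity) is kept is not specified, so it is simply computed here.

```python
def relationship_graph(positions, sim):
    """A_t[a][b] = sim(positions[a], positions[b]) for every ordered pair."""
    n = len(positions)
    return [[sim(positions[a], positions[b]) for b in range(n)] for a in range(n)]
```

In practice `sim` would wrap the temporal and pattern similarities of day t for the two intersections.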
The relationship graph $A_t$ reflects the associations on day t but ignores the influence of other times on the current associations. Therefore, drawing on the gating structure of LSTM, a dynamic relationship graph construction method named Dynamic Graph-based LSTM (DGLSTM) is designed, as shown in Fig. 2. This part in effect optimizes the input data, as given by formula (16):
[Formula (16): image not reproduced]
where $W_g$ is a parameter to be learned, and $R$ is the connectivity relation matrix between different spatial positions, which guides the construction of the dynamic relationship graph and is defined by formula (17):
[Formula (17): image not reproduced]
where $\Delta t$ is the time difference between $t_c$ and $t_p$, which are respectively the current time and the time indicated by the prior data, and $\gamma(\cdot)$ is a decreasing function that weights the prior data so that the information in the connectivity relation matrix $R$ is gradually forgotten as the time interval grows.
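The decreasing function is not specified. The sketch assumes an exponential decay $\gamma(\Delta t) = e^{-\Delta t / \tau}$ with a hypothetical time constant `tau` (in days), applied as a scalar weight to the prior connectivity matrix, so that older prior information counts for less.

```python
import math


def decay(delta_t, tau=7.0):
    """Hypothetical decreasing function gamma(delta_t) = exp(-delta_t / tau)."""
    return math.exp(-delta_t / tau)


def decayed_connectivity(R_prior, t_current, t_prior, tau=7.0):
    """Scale prior connectivity so older information is gradually forgotten."""
    delta_t = t_current - t_prior
    if delta_t < 0:
        raise ValueError("prior data cannot come from the future")
    w = decay(delta_t, tau)
    return [[w * r for r in row] for row in R_prior]
```

With `tau = 7`, prior information that is two weeks old is scaled by $e^{-2} \approx 0.135$; any other monotonically decreasing function would fit the description equally well.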
DGLSTM can be expressed by formulas (18)-(23):
Input gate: [Formula (18): image not reproduced]
Forget gate: [Formula (19): image not reproduced]
[Formula (20): image not reproduced]
Output gate: [Formula (21): image not reproduced]
Long-term memory: [Formula (22): image not reproduced]
Short-term memory: [Formula (23): image not reproduced]
After DGLSTM, the dynamic relationship graph $\tilde{A}_t$ is obtained. Compared with the single-day relationship graph $A_t$, $\tilde{A}_t$ not only reflects the associations among the current time series but is also influenced by the historical relationship graphs of other times. For convenience, the construction of $\tilde{A}_t$ is written as formula (24):
[Formula (24): image not reproduced]
in this embodiment, the graph attention network (Graph attention networks, GAT) is adopted to perform traffic flow anomaly judgment, and by aggregating the effects between approximate sequences, so as to capture implicit information in the dynamic relationship graph, compared with the traditional graph roll-up neural network (Graph Convolutional Networks, GCN) model, the GAT can selectively aggregate the effects of the approximate sequences, and the expression is shown in the formula (25):
[Equation image 098]
(25)
where [image 099] is the time sequence feature obtained through the LSTM from the traffic flow data of the k-th intersection; [image 100] is the sequence similarity between intersections k and p, i.e., the value in row k, column p of the dynamic relationship graph [image 096]; [image 101] is the weight matrix to be learned; and the output of intersection k from the GAT is the implicit information [image 102]. The process of obtaining the implicit information [image 103] of all intersections through the GAT can be expressed by formula (26):
[Equation image 104]
(26)
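A minimal sketch of the aggregation in formulas (25) and (26), assuming that each row of the dynamic relationship graph is softmax-normalised over its non-zero entries to serve as attention coefficients, and that tanh is the activation. A textbook GAT instead learns its coefficients from node features; which variant formula (25) uses is not recoverable here, so `graph_attention` and `row_softmax` are illustrative only.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def row_softmax(row):
    """Softmax over the non-zero entries of one graph row (zero means no edge)."""
    idx = [p for p, a in enumerate(row) if a != 0.0]
    if not idx:
        return {}
    m = max(row[p] for p in idx)  # subtract the max for numerical stability
    exps = {p: math.exp(row[p] - m) for p in idx}
    total = sum(exps.values())
    return {p: e / total for p, e in exps.items()}

def graph_attention(adj, features, weight):
    """z_k = tanh(sum_p alpha_kp * W h_p), with alpha_kp derived from row k of adj."""
    projected = [matvec(weight, h) for h in features]
    outputs = []
    for row in adj:
        agg = [0.0] * len(weight)
        for p, alpha in row_softmax(row).items():
            agg = [s + alpha * v for s, v in zip(agg, projected[p])]
        outputs.append([math.tanh(s) for s in agg])
    return outputs

adj = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.3], [0.0, 0.3, 1.0]]  # dynamic relationship graph
features = [[0.2, 0.1], [0.4, -0.3], [0.0, 0.5]]  # LSTM features h_p per intersection
weight = [[1.0, 0.0], [0.0, 1.0]]  # W (identity for readability)
Z = graph_attention(adj, features, weight)  # one implicit vector per intersection
```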
For anomaly determination, the label [image 105] is defined, and a single fully connected (FC) layer is used as the prediction function to make predictions on [image 103], as shown in Fig. 3. The prediction is given by formula (27):
[Equation image 106]
(27)
where [image 107] is the determined anomaly result.
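Formula (27) can be sketched as one fully connected layer followed by a softmax over two classes. The class coding (0 = normal, 1 = abnormal), the weight values and the function name `fc_predict` are assumptions for illustration.

```python
import math

def fc_predict(z, weight, bias):
    """Single fully connected layer plus softmax; returns (probabilities, label).

    Assumed coding: class 0 = normal, class 1 = abnormal.
    """
    logits = [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weight, bias)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = max(range(len(probs)), key=lambda i: probs[i])
    return probs, label

z_k = [0.28, -0.08]  # hypothetical GAT output for intersection k
weight = [[1.5, -0.5], [-1.0, 2.0]]  # one row per class
bias = [0.0, 0.0]
probs, label = fc_predict(z_k, weight, bias)
print(label)  # 0
```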
In this embodiment, the graph attention network is trained using cross entropy, as shown in formula (28):
[Equation image 108]
(28)
where [image 109] and [image 110] are the true category and the predicted value of intersection k at time t, respectively, and L is the loss function, which is minimized to reduce the gap between the predicted value and the true category.
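Formula (28) is described as a cross-entropy loss over intersections k and times t. A minimal sketch, assuming binary labels (1 = abnormal, 0 = normal), a predicted probability of the abnormal class, and averaging rather than summing over all (k, t) pairs; `cross_entropy` is a hypothetical name.

```python
import math

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over all (intersection k, time t) pairs.

    y_true[k][t] is the true class; y_prob[k][t] is the predicted probability of class 1.
    """
    total, count = 0.0, 0
    for labels, probs in zip(y_true, y_prob):
        for y, p in zip(labels, probs):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count

y_true = [[0, 1], [0, 0]]  # 2 intersections x 2 time steps
y_prob = [[0.1, 0.8], [0.2, 0.3]]
loss = cross_entropy(y_true, y_prob)
```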
Since anomaly detection is a typical classification task, this embodiment uses accuracy (ACC) and the Matthews correlation coefficient (MCC), both widely accepted for classification tasks, to evaluate the prediction performance of the graph attention network.
ACC directly expresses the prediction performance of the model; it is calculated by formula (29):
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
(29)
Here, a true positive (TP) is a result for which both the prediction and the true value are normal; a true negative (TN) is one for which both are abnormal; a false positive (FP) is predicted as normal but is actually abnormal; and a false negative (FN) is predicted as abnormal but is actually normal.
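The counting convention above treats "normal" as the positive class. A small sketch of formula (29) that follows that convention; the label strings and the function names `confusion_counts` and `accuracy` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="normal"):
    """Count (TP, TN, FP, FN) with 'normal' as the positive class, as defined above."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p != positive and t != positive:
            tn += 1
        elif p == positive:  # predicted normal, actually abnormal
            fp += 1
        else:  # predicted abnormal, actually normal
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

y_true = ["normal", "normal", "abnormal", "abnormal", "normal"]
y_pred = ["normal", "abnormal", "abnormal", "normal", "normal"]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (2, 1, 1, 1)
print(accuracy(*counts))  # 0.6
```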
MCC is an index for evaluating classification performance; it is a correlation coefficient describing the relationship between the actual and the predicted classifications. Its value lies between -1 and +1: +1 indicates perfect prediction, 0 indicates prediction no better than random, and -1 indicates complete disagreement between prediction and observation. MCC is calculated by formula (30):
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
(30)
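Formula (30) can be written directly in code. The sketch returns 0.0 when any marginal count is zero; that is a common convention, not one stated in the source.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator would be zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

print(mcc(2, 1, 1, 1))  # 0.16666666666666666
print(mcc(5, 5, 0, 0))  # 1.0
print(mcc(0, 0, 5, 5))  # -1.0
```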
The method of this embodiment was compared with three conventional methods: the graph attention network (GAT), the temporal convolutional network (TCN) and the gated recurrent unit (GRU) neural network. The method of this embodiment achieved the highest values on both indexes and ranked first among the compared methods, confirming its effectiveness.
Example 2
This embodiment provides a traffic flow anomaly detection system based on pattern similarity, comprising:
a data acquisition module configured to acquire traffic flow data;
a time sequence feature extraction module configured to extract time sequence features from the traffic flow data using an improved long short-term memory neural network, wherein the improved long short-term memory neural network obtains the time sequence features by weighting and summing the hidden states obtained at different moments;
a pattern feature extraction module configured to segment the traffic flow data with a sliding window to obtain a short-term sequence set, cluster the short-term sequence set, and take the short-term sequence corresponding to the cluster center of each category as a pattern feature;
a time sequence similarity determination module configured to calculate the time sequence similarity between the time sequence features of different spatial positions;
a pattern similarity determination module configured to determine, for each pattern feature, the closest pattern feature so as to form pattern feature pairs, and to weight the nearest-neighbor distances of the pattern feature pairs to obtain the pattern similarity of different spatial positions;
a dynamic relationship graph construction module configured to determine the sequence similarity from the time sequence similarity and the pattern similarity, and to construct traffic flow dynamic relationship graphs for different times and different spatial positions according to the sequence similarity; and
an anomaly detection module configured to detect abnormal traffic flow states using the traffic flow dynamic relationship graph and the time sequence similarity.
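The pattern feature extraction and pattern similarity modules can be sketched as follows. The clustering algorithm, the distance metric and the mapping from distance to similarity are not stated here, so k-means with evenly spaced initial centres, Euclidean distance and 1 / (1 + d) are assumptions; weighting each nearest-neighbour distance by cluster size follows claim 2. "The short-term sequence corresponding to the clustering center" is interpreted as the window nearest each centroid. All function names are hypothetical.

```python
import math

def sliding_windows(series, width, step=1):
    """Split a traffic-flow series into overlapping short-term sequences."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Plain k-means with evenly spaced initial centres (clustering method assumed)."""
    centers = [points[i * len(points) // k] for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: dist(p, centers[j]))].append(p)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers, groups

def pattern_features(series, width, k):
    """(pattern, cluster_size) pairs; each pattern is the window nearest its centre."""
    centers, groups = kmeans(sliding_windows(series, width), k)
    return [(min(g, key=lambda w: dist(w, c)), len(g)) for c, g in zip(centers, groups) if g]

def directed_distance(patterns_a, patterns_b):
    """Nearest-neighbour distances from a's patterns to b's, weighted by cluster size."""
    weighted = sum(size * min(dist(p, q) for q, _ in patterns_b) for p, size in patterns_a)
    return weighted / sum(size for _, size in patterns_a)

def pattern_similarity(patterns_a, patterns_b):
    """Symmetrised weighted distance mapped to (0, 1] by 1 / (1 + d) (assumed mapping)."""
    d = 0.5 * (directed_distance(patterns_a, patterns_b)
               + directed_distance(patterns_b, patterns_a))
    return 1.0 / (1.0 + d)

flow_a = [10, 12, 11, 30, 32, 31, 10, 12, 11, 30, 32, 31]  # hypothetical counts
flow_b = [11, 12, 12, 29, 33, 30, 10, 13, 11, 31, 31, 30]
pa = pattern_features(flow_a, width=3, k=2)
pb = pattern_features(flow_b, width=3, k=2)
print(pattern_similarity(pa, pa))  # 1.0
```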
It should be noted that the above modules correspond to the steps described in embodiment 1; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to the content disclosed in embodiment 1. It should also be noted that the above modules may be implemented, as part of a system, in a computer system, for example as a set of computer-executable instructions.
In further embodiments, there is also provided:
an electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when executed by the processor, perform the method described in embodiment 1. For brevity, details are omitted here.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.
The method in embodiment 1 may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory; the processor reads the information in the memory and performs the steps of the above method in combination with its hardware. To avoid repetition, details are not described here.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the present embodiments can be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present application.
Although the specific embodiments of the present invention have been described above with reference to the drawings, they are not intended to limit the scope of the invention; all modifications or variations within the scope of the invention as defined by the claims are intended to be covered.

Claims (8)

1. A traffic flow anomaly detection method based on pattern similarity, characterized by comprising the following steps:
acquiring traffic flow data;
extracting time sequence features from the traffic flow data using an improved long short-term memory neural network, wherein the improved long short-term memory neural network obtains the time sequence features by weighting and summing the hidden states obtained at different moments;
segmenting the traffic flow data with a sliding window to obtain a short-term sequence set, clustering the short-term sequence set, and taking the short-term sequence corresponding to the cluster center of each category as a pattern feature;
calculating the time sequence similarity between the time sequence features of different spatial positions;
determining, for each pattern feature, the closest pattern feature so as to form pattern feature pairs, and weighting the nearest-neighbor distances of the pattern feature pairs to obtain the pattern similarity of different spatial positions;
determining the sequence similarity from the time sequence similarity and the pattern similarity, and constructing traffic flow dynamic relationship graphs for different times and different spatial positions according to the sequence similarity;
detecting abnormal traffic flow states using the traffic flow dynamic relationship graph and the time sequence similarity;
wherein, in weighting and summing the hidden states obtained at different moments to obtain the time sequence features, the weight is determined according to the correlation between the hidden states at different moments and the traffic flow data; the weight [image 001] is:
[Equation image 002]
[Equation image 003]
where x_t is the traffic flow data of day t, [image 004] is the hidden state, [image 005] is the correlation function, [image 006] is the parameter to be learned, [image 007] is the number of days of input traffic flow data, and [image 008] denotes transposition;
the process of calculating the time sequence similarity [image 009] between the time sequence features of different spatial positions is as follows:
[Equation image 010]
[Equation image 011]
where [image 012] is the time sequence feature of spatial position a on day t, [image 013] is the time sequence feature of spatial position b on day t, [image 014] is a network composed of the weight matrix [image 015] to be learned and the activation function tanh, and [image 016] denotes the concatenation of [image 012] and [image 013].
2. The traffic flow anomaly detection method based on pattern similarity according to claim 1, wherein, in weighting the nearest-neighbor distances of the pattern feature pairs, the weight is the number of elements contained in the category of the pattern feature.
3. The traffic flow anomaly detection method based on pattern similarity according to claim 1, wherein the sequence similarity is determined as a weighted sum of the time sequence similarity and the pattern similarity.
4. The traffic flow anomaly detection method based on pattern similarity according to claim 1, wherein the process of constructing the traffic flow dynamic relationship graph comprises:
constructing a relationship graph [image 017] of different spatial positions at the same time according to the sequence similarity of the traffic flow data of the different spatial positions; and
introducing a connectivity relation matrix between the traffic flow data of different spatial positions, and constructing the traffic flow dynamic relationship graph [image 018] from the relationship graph and the connectivity relation matrix:
[Equation image 019]
[Equation image 020]
where [image 021] is a parameter to be learned, [image 022] is the connectivity relation matrix, tanh is the activation function, [image 023] and [image 024] are the current time and the time indicated by the prior data, respectively, [image 025] is the time difference, and [image 026] is a decreasing function.
5. The traffic flow anomaly detection method based on pattern similarity according to claim 4, wherein the connectivity relation matrix is:
[Equation image 027]
where X_a is the traffic flow data of spatial position a, X_b is the traffic flow data of spatial position b, and [image 028] is the connectivity relation matrix between X_a and X_b.
6. A traffic flow anomaly detection system based on pattern similarity, comprising:
a data acquisition module configured to acquire traffic flow data;
a time sequence feature extraction module configured to extract time sequence features from the traffic flow data using an improved long short-term memory neural network, wherein the improved long short-term memory neural network obtains the time sequence features by weighting and summing the hidden states obtained at different moments;
a pattern feature extraction module configured to segment the traffic flow data with a sliding window to obtain a short-term sequence set, cluster the short-term sequence set, and take the short-term sequence corresponding to the cluster center of each category as a pattern feature;
a time sequence similarity determination module configured to calculate the time sequence similarity between the time sequence features of different spatial positions;
a pattern similarity determination module configured to determine, for each pattern feature, the closest pattern feature so as to form pattern feature pairs, and to weight the nearest-neighbor distances of the pattern feature pairs to obtain the pattern similarity of different spatial positions;
a dynamic relationship graph construction module configured to determine the sequence similarity from the time sequence similarity and the pattern similarity, and to construct traffic flow dynamic relationship graphs for different times and different spatial positions according to the sequence similarity;
an anomaly detection module configured to detect abnormal traffic flow states using the traffic flow dynamic relationship graph and the time sequence similarity;
wherein, in weighting and summing the hidden states obtained at different moments to obtain the time sequence features, the weight is determined according to the correlation between the hidden states at different moments and the traffic flow data; the weight [image 001] is:
[Equation image 002]
[Equation image 003]
where x_t is the traffic flow data of day t, [image 004] is the hidden state, [image 005] is the correlation function, [image 006] is the parameter to be learned, [image 007] is the number of days of input traffic flow data, and [image 008] denotes transposition;
the process of calculating the time sequence similarity [image 009] between the time sequence features of different spatial positions is as follows:
[Equation image 010]
[Equation image 011]
where [image 012] is the time sequence feature of spatial position a on day t, [image 013] is the time sequence feature of spatial position b on day t, [image 014] is a network composed of the weight matrix [image 015] to be learned and the activation function tanh, and [image 016] denotes the concatenation of [image 012] and [image 013].
7. An electronic device, comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when executed by the processor, perform the method of any one of claims 1-5.
8. A computer-readable storage medium storing computer instructions that, when executed by a processor, perform the method of any one of claims 1-5.
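The attention weighting and time sequence similarity of claims 1 and 6, and the weighted fusion of claim 3, can be sketched as below. The formulas exist only as images, so the bilinear correlation f(h_t, x_t) = h_t^T W x_t with a softmax over the T input days, the similarity tanh(w^T [h_a || h_b]), and the fusion weight lam = 0.5 are assumptions; the function names and all numbers are hypothetical.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def attention_pool(hidden, inputs, w):
    """Return (feature, alphas): the alpha-weighted sum of daily hidden states.

    Assumed correlation f(h_t, x_t) = h_t^T W x_t, softmax-normalised over the T days.
    """
    scores = [sum(a * b for a, b in zip(h, matvec(w, x))) for h, x in zip(hidden, inputs)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    feature = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(len(hidden[0]))]
    return feature, alphas

def timing_similarity(h_a, h_b, w_sim):
    """Assumed form s_ab = tanh(w^T [h_a || h_b]), where || is concatenation."""
    return math.tanh(sum(w * v for w, v in zip(w_sim, h_a + h_b)))

def sequence_similarity(s_time, s_pattern, lam=0.5):
    """Claim 3: weighted sum of the two similarities; lam is a hypothetical weight."""
    return lam * s_time + (1.0 - lam) * s_pattern

W = [[1.0], [2.0]]  # H x D, with hidden size H = 2 and input size D = 1
hidden_a = [[0.1, 0.3], [0.5, -0.2], [0.4, 0.4]]  # T = 3 days at position a
hidden_b = [[0.2, 0.1], [0.3, 0.3], [0.1, 0.5]]  # T = 3 days at position b
flows_a = [[1.2], [0.8], [1.5]]  # normalised daily flow x_t at position a
flows_b = [[1.0], [0.9], [1.1]]  # normalised daily flow x_t at position b
feat_a, alphas_a = attention_pool(hidden_a, flows_a, W)
feat_b, _ = attention_pool(hidden_b, flows_b, W)
s_ab = timing_similarity(feat_a, feat_b, w_sim=[0.5, -0.5, 0.5, 0.5])
s_seq = sequence_similarity(s_ab, s_pattern=0.8)  # s_pattern: hypothetical value
```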
CN202211365058.7A 2022-11-03 2022-11-03 Traffic flow anomaly detection method and system based on pattern similarity Active CN115423048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211365058.7A CN115423048B (en) 2022-11-03 2022-11-03 Traffic flow anomaly detection method and system based on pattern similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211365058.7A CN115423048B (en) 2022-11-03 2022-11-03 Traffic flow anomaly detection method and system based on pattern similarity

Publications (2)

Publication Number Publication Date
CN115423048A CN115423048A (en) 2022-12-02
CN115423048B true CN115423048B (en) 2023-04-25

Family

ID=84207956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211365058.7A Active CN115423048B (en) 2022-11-03 2022-11-03 Traffic flow anomaly detection method and system based on pattern similarity

Country Status (1)

Country Link
CN (1) CN115423048B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361635B (en) * 2023-06-02 2023-10-10 Chengdu Library and Information Center, Chinese Academy of Sciences (中国科学院成都文献情报中心) Multidimensional time sequence data anomaly detection method

Citations (2)

Publication number Priority date Publication date Assignee Title
WO2022047658A1 (en) * 2020-09-02 2022-03-10 Dalian University (大连大学) Log anomaly detection system
WO2022160902A1 (en) * 2021-01-28 2022-08-04 Guangxi University (广西大学) Anomaly detection method for large-scale multivariate time series data in cloud environment

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CA2841422A1 (en) * 2011-07-20 2013-01-24 Elminda Ltd. Method and system for estimating brain concussion
US20200097808A1 (en) * 2018-09-21 2020-03-26 International Business Machines Corporation Pattern Identification in Reinforcement Learning
CN111145541B (en) * 2019-12-18 2021-10-22 深圳先进技术研究院 Traffic flow data prediction method, storage medium, and computer device
CN112801404B (en) * 2021-02-14 2024-03-22 北京工业大学 Traffic prediction method based on adaptive spatial self-attention graph convolution

Also Published As

Publication number Publication date
CN115423048A (en) 2022-12-02

Similar Documents

Publication Publication Date Title
CN110223517B (en) Short-term traffic flow prediction method based on space-time correlation
Chen et al. Learning graph structures with transformer for multivariate time-series anomaly detection in IoT
Hsieh et al. Unsupervised online anomaly detection on multivariate sensing time series data for smart manufacturing
Zhao et al. Maritime anomaly detection using density-based clustering and recurrent neural network
US9779361B2 (en) Method for learning exemplars for anomaly detection
CN111797122B (en) Method and device for predicting change trend of high-dimensional recurring concept drift stream data
Guo et al. Hidden Markov models based approaches to long-term prediction for granular time series
CN114220271A (en) Traffic flow prediction method, equipment and storage medium based on dynamic space-time graph convolution cycle network
Xie et al. Deep graph convolutional networks for incident-driven traffic speed prediction
CN115423048B (en) Traffic flow anomaly detection method and system based on pattern similarity
CN113570859B (en) Traffic flow prediction method based on asynchronous space-time expansion graph convolution network
CN106709588B (en) Prediction model construction method and device and real-time prediction method and device
CN114565124A (en) Ship traffic flow prediction method based on improved graph convolution neural network
CN113505536A (en) Optimized traffic flow prediction model based on spatio-temporal graph convolutional network
Hosseini et al. Short-term traffic flow forecasting by mutual information and artificial neural networks
CN115169430A (en) Cloud network end resource multidimensional time sequence anomaly detection method based on multi-scale decoding
Kovács et al. Optimistic search: Change point estimation for large-scale data via adaptive logarithmic queries
CN114596726B (en) Parking berth prediction method based on interpretable space-time attention mechanism
CN114861875A (en) Internet of things intrusion detection method based on self-supervision learning and self-knowledge distillation
Tambuwal et al. Deep quantile regression for unsupervised anomaly detection in time-series
Xie et al. "How do urban incidents affect traffic speed?" A deep graph convolutional network for incident-driven traffic speed prediction
CN117150882A (en) Engine oil consumption prediction method, system, electronic equipment and storage medium
CN117111464A (en) Self-adaptive fault diagnosis method under multiple working conditions
CN116992224A (en) Time sequence data reconstruction method based on multi-head attention mechanism
CN115953902A (en) Traffic flow prediction method based on multi-view spatio-temporal graph convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant