Disclosure of Invention
It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a method for training a trajectory prediction model and a trajectory prediction method.
The object of the invention is achieved by the following technical solutions:
according to a first aspect of the present invention, there is provided a method for training a trajectory prediction model, the method comprising:
A1, acquiring spatiotemporal trajectory data of trajectory segments of a plurality of targets represented by dense vectors, and cutting the trajectory data, from near to far in time, into recent trajectory data, short-term history data and long-term history data;
A2, training a coding model of a first multi-head attention mechanism network with the long-term history data to capture the long-term spatiotemporal relationship of each trajectory point in the long-term history data;
A3, training a recurrent neural network coding model with the short-term history data to capture the short-term spatiotemporal relationship of each trajectory point in the short-term history data;
A4, training a coding model of a second multi-head attention mechanism network with the long-term and short-term spatiotemporal relationships, so as to adjust the short-term spatiotemporal relationship according to the similarity between the two and obtain an adjusted short-term spatiotemporal relationship;
A5, training a decoding model of a third multi-head attention mechanism network with the recent trajectory data and the adjusted short-term spatiotemporal relationship to obtain the trajectory prediction model.
In some embodiments of the present invention, step A1 includes:
A11, acquiring spatiotemporal trajectory data of a plurality of targets represented by sparse vectors from a database and preprocessing it to obtain a plurality of trajectory segments;
A12, performing data mapping on the plurality of trajectory segments to map the sparse vectors to dense vectors, obtaining spatiotemporal trajectory data of trajectory segments of a plurality of targets represented by dense vectors;
and A13, cutting the spatiotemporal trajectory data of the trajectory segments of a plurality of targets represented by dense vectors, from near to far in time and according to preset cutting rules, into recent trajectory data, short-term history data and long-term history data.
In some embodiments of the present invention, step A11 includes:
A111, acquiring spatiotemporal trajectory data of a plurality of targets represented by sparse vectors from a database, wherein each target comprises one or more trajectory segments;
A112, preprocessing the acquired spatiotemporal trajectory data of the plurality of targets represented by sparse vectors, comprising:
A1121, for any two adjacent trajectory points whose time difference is greater than or equal to a preset time threshold, cutting the trajectory segment between the two points, so that the segment is divided into two or more segments;
A1122, deleting trajectory segments whose number of trajectory points is less than a first preset number of points;
and A1123, deleting targets whose number of trajectory segments is less than a preset number of segments.
Preferably, the preset time threshold is 72 hours, the first preset number of points is 5, and the preset number of segments is 5.
In some embodiments of the present invention, step A13 includes:
A131, presetting three time intervals counted back from the trajectory recording time, namely within a first time node, between the first time node and a second time node, and before the second time node, and taking the trajectory segments falling in the three intervals as the recent trajectory data, the short-term history data and the long-term history data respectively, wherein the values of the first time node and the second time node are set by the user as needed;
A132, cutting the plurality of trajectory segments represented by dense vectors, so that any segment spanning two time intervals is divided into two segments between the two adjacent points belonging to the two intervals;
and A133, for any segment in the short-term history data containing more trajectory points than a second preset number of points, cutting the segment into two between the trajectory point at the second preset number and the next trajectory point, and deleting any segment whose number of points after cutting is less than the first preset number of points.
Preferably, the second preset number of points is 20.
Preferably, in step A2, the long-term history data is input as the query, key and value into the coding model of the first multi-head attention mechanism network, so as to capture the long-term spatiotemporal relationship of each trajectory point from the context information of each trajectory point in the long-term history data;
in step A3, the short-term history data is input into the recurrent neural network to capture the short-term spatiotemporal relationship of each trajectory point from the context information of each trajectory point in the short-term history data;
in step A4, the long-term spatiotemporal relationship is input as the query, and the short-term spatiotemporal relationship as the key and value, into the coding model of the second multi-head attention mechanism network, so as to adjust the short-term spatiotemporal relationship according to the similarity between the long-term and short-term spatiotemporal relationships and obtain the adjusted short-term spatiotemporal relationship;
in step A5, the decoding model of the third multi-head attention mechanism network comprises a masked multi-head attention mechanism model and a normal multi-head attention mechanism model; the recent trajectory data is input as the query, key and value into the masked model to capture the recent spatiotemporal relationship of each trajectory point from the context information of each trajectory point in the recent trajectory data, and the recent spatiotemporal relationship is then input as the query, with the adjusted short-term spatiotemporal relationship as the key and value, into the normal model to train the decoding model of the third multi-head attention mechanism network.
According to a second aspect of the present invention, there is provided a trajectory prediction method including:
performing trajectory prediction on the real-time trajectory data of the user with the trajectory prediction model obtained by the method for training a trajectory prediction model according to the first aspect, to obtain a trajectory prediction result.
In some embodiments of the invention, the trajectory prediction method comprises:
B1, acquiring the real-time trajectory data of the user represented by sparse vectors and performing data mapping on it to map the sparse vectors to dense vectors, obtaining the real-time trajectory data of the user represented by dense vectors;
B2, inputting the real-time trajectory data of the user represented by dense vectors as the query, key and value into the masked multi-head attention mechanism model of the trajectory prediction model to obtain a decoding result;
B3, inputting the decoding result into the fully-connected layer of the trajectory prediction model to obtain predicted probability values of a plurality of candidate points of the user at the next moment;
and B4, taking the candidate point with the maximum probability value as the predicted trajectory point of the user at the next moment and outputting it as the trajectory prediction result.
According to a third aspect of the present invention, there is provided an electronic apparatus comprising:
one or more processors; and
a memory, wherein the memory is configured to store one or more executable instructions;
the one or more processors are configured to perform the steps of the method as described in the first aspect or the second aspect via execution of the one or more executable instructions.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having embodied thereon a computer program executable by a processor to perform the steps of the method according to the first or second aspect.
Compared with the prior art, the invention has the advantages that:
the method divides the spatiotemporal trajectory data into recent trajectory data, short-term history data and long-term history data; captures the long-term spatiotemporal relationship of each trajectory point in the long-term history data with the coding model of a first multi-head attention mechanism network; captures the short-term spatiotemporal relationship of each trajectory point in the short-term history data with a recurrent neural network coding model; adjusts the short-term spatiotemporal relationship according to the similarity between the long-term and short-term spatiotemporal relationships with the coding model of a second multi-head attention mechanism network, obtaining the adjusted short-term spatiotemporal relationship and realizing the global dependency of the historical trajectory; and trains the decoding model of a third multi-head attention mechanism network with the recent trajectory data and the adjusted short-term spatiotemporal relationship, using it as the trajectory prediction model, thereby improving the accuracy of trajectory prediction.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As mentioned in the background section, prior art methods usually process the trajectory data of all time periods with one and the same model, such as a recurrent neural network model. Fig. 1 shows such a prior art trajectory prediction method, comprising: K1, acquiring spatiotemporal trajectory data from a database and preprocessing it to obtain a plurality of trajectory segments; K2, performing data mapping on the plurality of trajectory segments to map the sparse vectors to dense vectors, obtaining a plurality of trajectory segments represented by dense vectors; K3, cutting the plurality of trajectory segments represented by dense vectors into recent trajectory data and long-term trajectory data; K4, training a recurrent neural network model with the recent trajectory data and the long-term trajectory data; K5, performing trajectory prediction with the trained recurrent neural network model on the real-time trajectory data of the user to obtain a trajectory prediction result. In such existing methods, the recurrent neural network model captures short-term context information well but captures long-term context information poorly, so that global dependency on the historical trajectory is difficult to realize and the accuracy of trajectory prediction is low. Therefore, the present invention divides the spatiotemporal trajectory data into recent trajectory data, short-term history data and long-term history data; captures the long-term spatiotemporal relationship of each trajectory point in the long-term history data with the coding model of a first multi-head attention mechanism network; captures the short-term spatiotemporal relationship of each trajectory point in the short-term history data with a recurrent neural network coding model; adjusts the short-term spatiotemporal relationship according to the similarity between the long-term and short-term spatiotemporal relationships with the coding model of a second multi-head attention mechanism network, obtaining the adjusted short-term spatiotemporal relationship and realizing the global dependency of the historical trajectory; and trains the decoding model of a third multi-head attention mechanism network with the recent trajectory data and the adjusted short-term spatiotemporal relationship, which then serves as the trajectory prediction model, thereby improving the accuracy of trajectory prediction.
According to an embodiment of the present invention, as shown in fig. 2, there is provided a method for training a trajectory prediction model, including:
A1, acquiring spatiotemporal trajectory data of trajectory segments of a plurality of targets represented by dense vectors, and cutting the trajectory data, from near to far in time, into recent trajectory data, short-term history data and long-term history data;
A2, training a coding model of a first multi-head attention mechanism network with the long-term history data to capture the long-term spatiotemporal relationship of each trajectory point in the long-term history data;
A3, training a recurrent neural network coding model with the short-term history data to capture the short-term spatiotemporal relationship of each trajectory point in the short-term history data;
A4, training a coding model of a second multi-head attention mechanism network with the long-term and short-term spatiotemporal relationships, so as to adjust the short-term spatiotemporal relationship according to the similarity between the two and obtain an adjusted short-term spatiotemporal relationship;
and A5, training a decoding model of a third multi-head attention mechanism network with the recent trajectory data and the adjusted short-term spatiotemporal relationship to obtain the trajectory prediction model.
For a better understanding of the present invention, each step is described in detail below with reference to specific examples.
In step A1, spatiotemporal trajectory data of trajectory segments of a plurality of targets represented by dense vectors is acquired and cut, from near to far in time, into recent trajectory data, short-term history data and long-term history data.
Preferably, step A1 includes:
A11, acquiring spatiotemporal trajectory data of a plurality of targets represented by sparse vectors from a database and preprocessing it to obtain a plurality of trajectory segments;
A12, performing data mapping on the plurality of trajectory segments to map the sparse vectors to dense vectors, obtaining spatiotemporal trajectory data of trajectory segments of a plurality of targets represented by dense vectors;
and A13, cutting the spatiotemporal trajectory data of the trajectory segments of a plurality of targets represented by dense vectors, from near to far in time and according to preset cutting rules, into recent trajectory data, short-term history data and long-term history data.
Preferably, step A11 includes:
A111, acquiring spatiotemporal trajectory data of a plurality of targets represented by sparse vectors from a database, wherein each target comprises one or more trajectory segments;
A112, preprocessing the acquired spatiotemporal trajectory data of the plurality of targets represented by sparse vectors, comprising:
A1121, for any two adjacent trajectory points whose time difference is greater than or equal to a preset time threshold, cutting the trajectory segment between the two points, so that the segment is divided into two or more segments;
A1122, deleting trajectory segments whose number of trajectory points is less than a first preset number of points;
and A1123, deleting targets whose number of trajectory segments is less than a preset number of segments.
Preferably, in step A12, a Word2Vec model, an ELMo model, a GPT model or a BERT model is used to perform the data mapping on the plurality of trajectory segments. A sparse vector here is the usual one-hot representation: a word is represented by a very long vector whose length is the dictionary size n, in which only one component is 1, at the position corresponding to the index of the word in the dictionary, and all remaining components are 0.
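Purely as an illustration of the sparse-to-dense mapping (not the claimed Word2Vec/ELMo/GPT/BERT training itself), a one-hot index can be mapped to a dense vector by an embedding lookup; the dictionary size and all identifiers below are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a dictionary of 10000 locations, 500-dimensional dense
# vectors (the vector dimension used elsewhere in this description).
n_locations, embed_dim = 10000, 500
embedding = nn.Embedding(n_locations, embed_dim)  # each row is one dense vector

location_ids = torch.tensor([[17, 42, 901]])      # one segment of 3 one-hot indices
dense_segment = embedding(location_ids)           # shape: (1, 3, 500)
```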
Preferably, the preset time threshold ranges from 24 to 80 hours, the first preset number of points ranges from 5 to 8, and the preset number of segments ranges from 3 to 7. For example, the preset time threshold is 72 hours, the first preset number of points is 5, and the preset number of segments is 5. If two adjacent trajectory points in a segment are 75 hours apart, which exceeds 72 hours, the segment is cut into two between those two points. After all adjacent point pairs with a time difference exceeding 72 hours have been found and the cutting is completed, the number of points in each segment is counted and segments with fewer than 5 points are deleted; then the number of segments of each target is counted and targets with fewer than 5 segments are deleted. The technical scheme of this preferred embodiment can at least realize the following beneficial technical effects: the interference of trajectory points with an excessively large time interval on subsequent analysis is avoided; and deleting segments with fewer points than the first preset number, and targets with fewer segments than the preset number, removes segments and targets that carry little information and from which effective features are hard to capture, improving the effectiveness of subsequent trajectory prediction.
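A minimal Python sketch of the preprocessing rules A1121 to A1123 with the example values above (72 hours, 5 points, 5 segments), assuming each trajectory point carries its timestamp in hours as its first component; the function and variable names are illustrative:

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # assumed layout: (timestamp_hours, lat, lon)

def preprocess(targets: List[List[List[Point]]],
               gap_hours: float = 72.0,
               min_points: int = 5,
               min_segments: int = 5) -> List[List[List[Point]]]:
    """Sketch of A1121-A1123: cut segments at large time gaps, drop short
    segments, then drop targets with too few segments."""
    kept_targets = []
    for segments in targets:
        cut = []
        for seg in segments:
            piece = [seg[0]]
            for prev, cur in zip(seg, seg[1:]):
                if cur[0] - prev[0] >= gap_hours:  # A1121: gap >= threshold, split
                    cut.append(piece)
                    piece = [cur]
                else:
                    piece.append(cur)
            cut.append(piece)
        kept = [s for s in cut if len(s) >= min_points]  # A1122: drop short segments
        if len(kept) >= min_segments:                    # A1123: drop sparse targets
            kept_targets.append(kept)
    return kept_targets
```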
Preferably, step A13 includes:
A131, presetting three time intervals counted back from the trajectory recording time, namely within a first time node, between the first time node and a second time node, and before the second time node, and taking the trajectory segments falling in the three intervals as the recent trajectory data, the short-term history data and the long-term history data respectively;
A132, cutting the plurality of trajectory segments represented by dense vectors, so that any segment spanning two time intervals is divided into two segments between the two adjacent points belonging to the two intervals;
and A133, for any segment in the short-term history data containing more trajectory points than a second preset number of points, cutting the segment into two between the trajectory point at the second preset number and the next trajectory point, and deleting any segment whose number of points after cutting is less than the first preset number of points.
Preferably, the values of the first time node and the second time node are set by the user as needed. For example, with the first time node set to 3 days and the second time node set to 9 days, the three preset time intervals are within 3 days, between 3 and 9 days, and before 9 days, and the trajectory segments within 3 days, between 3 and 9 days, and before 9 days are taken as the recent trajectory data, the short-term history data and the long-term history data respectively. Suppose the segments obtained after cutting one target's trajectory are $ST_1, ST_2, \ldots, ST_k$, i.e. k segments in total; after cutting and dividing by the time intervals, the long-term history data is $ST_1, ST_2, \ldots, ST_{k-4}$, the short-term history data is $ST_{k-3}, ST_{k-2}, ST_{k-1}$, and the recent trajectory data is $ST_k$. The technical scheme of this preferred embodiment can at least realize the following beneficial technical effects: dividing the spatiotemporal trajectory data into three time intervals allows the corresponding spatiotemporal relationships to be captured separately from the short-term and long-term history data, forming a global dependency on the historical trajectory and making the prediction more accurate.
Preferably, the second preset number of points is 20. The technical scheme of this preferred embodiment can at least realize the following beneficial technical effects: the short-term history data is subsequently processed by the recurrent neural network coding model to capture the spatiotemporal relationships between trajectory points, and since the recurrent neural network coding model is an autoregressive model, it cannot capture the spatiotemporal relationship between distant trajectory points well once a single segment contains more than 20 points; cutting short-term history segments with more than 20 trajectory points therefore lets the recurrent neural network coding model analyze the spatiotemporal relationships between trajectory points well, improving the accuracy of trajectory prediction.
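The cutting rules of steps A131 to A133 can be sketched as follows, with the example nodes of 3 days (72 h) and 9 days (216 h) and the 20-point recut; the point layout and names are the same illustrative assumptions as above:

```python
def split_by_time(segments, t_now, first_node_h=72.0, second_node_h=216.0,
                  max_short_points=20, min_points=5):
    """Sketch of A131-A133: bucket points by age into recent (0) / short-term (1)
    / long-term (2) windows, split segments that span two windows (A132), then
    recut over-long short-term segments and drop tiny leftovers (A133)."""
    def window(p):
        age = t_now - p[0]  # p[0] is the timestamp in hours (assumed layout)
        return 0 if age <= first_node_h else (1 if age <= second_node_h else 2)

    buckets = ([], [], [])  # recent, short-term, long-term
    for seg in segments:
        piece = [seg[0]]
        for prev, cur in zip(seg, seg[1:]):
            if window(cur) != window(prev):   # A132: cut between the two windows
                buckets[window(prev)].append(piece)
                piece = [cur]
            else:
                piece.append(cur)
        buckets[window(piece[0])].append(piece)

    recent, short_term, long_term = buckets
    recut = []                                # A133: recut segments over 20 points
    for seg in short_term:
        while len(seg) > max_short_points:
            recut.append(seg[:max_short_points])
            seg = seg[max_short_points:]
        if len(seg) >= min_points:            # drop leftovers below the first preset
            recut.append(seg)
    return recent, recut, long_term
```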
Preferably, the database is, for example, the in-memory database Redis, the distributed database Hive, or the relational database MySQL.
In step A2, the coding model of the first multi-head attention mechanism network is trained with the long-term history data to capture the long-term spatiotemporal relationship of each trajectory point in the long-term history data.
Preferably, the long-term history data is input as the query, key and value into the coding model of the first multi-head attention mechanism network, so as to capture the long-term spatiotemporal relationship of each trajectory point from the context information of each trajectory point in the long-term history data. The long-term spatiotemporal relationship is a first vector matrix formed by the weights of all trajectory points in the long-term history data with respect to each vector dimension. For ease of understanding, the form of the vector matrices of the present invention is described below by the schematic vector matrix of Table 1; the second, third and fourth vector matrices below take the same form and have the same vector dimension as the first, so they are not illustrated separately. The vector dimension can be set by the user as needed according to the amount of spatiotemporal trajectory data; in general it is set to 450 to 550, and preferably to 500.
TABLE 1 Exemplary vector matrix

                      dim 1   dim 2   dim 3   dim 4   ...   dim n
  trajectory point 1   X11     X12     X13     X14    ...    X1n
  trajectory point 2   X21     X22     X23     X24    ...    X2n

The vector dimension in Table 1 is n, meaning that the vector corresponding to each trajectory point comprises n components. The symbols X11, X12, X13, X14, ..., X1n for trajectory point 1 and X21, X22, X23, X24, ..., X2n for trajectory point 2 are used only for brevity; each component is actually a real (floating-point) number. For example, the components X11, X12, X13, X14 and X1n may be the floating-point numbers -2.0122, -0.5094, -0.5750, -2.6393 and -0.0634, respectively.
According to one example of the present invention, the working principle of the coding model of the first multi-head attention mechanism network is:

$O_{TE}^{(i)} = LN\big(H_{TE}^{(i)} + FFN(H_{TE}^{(i)})\big)$   (formula 1)

In formula 1:
i denotes the i-th layer of the model;
TE is short for the Transformer Encoder model, hereinafter the TE model;
$O_{TE}^{(i)}$ is the output of the TE model at the i-th layer;
LN is short for Layer Normalization, a normalization method;
$H_{TE}^{(i)}$ is the hidden state of the TE model at the i-th layer;
FFN is short for feed-forward neural network.

Formula 1 is illustrated as follows. The coding model of the multi-head attention mechanism network is a multi-layer model, i.e. i is greater than or equal to 1. The input of each layer is $H_{TE}^{(i)}$ (the hidden state of the i-th layer) and the output is $O_{TE}^{(i)}$ (the output result of the i-th layer). The specific data processing flow corresponding to the formula is:
1. the hidden state $H_{TE}^{(i)}$ of the i-th layer is passed into the feed-forward neural network FFN, so that the effective features are activated;
2. the result $FFN(H_{TE}^{(i)})$ of the feed-forward network is added to $H_{TE}^{(i)}$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum is normalized by LN in order to eliminate dimensional effects between data;
4. the output $O_{TE}^{(i)}$ equals $LN(H_{TE}^{(i)} + FFN(H_{TE}^{(i)}))$.

The hidden state of the i-th layer is obtained by:

$H_{TE}^{(i)} = LN\big(O_{TE}^{(i-1)} + MA_1(O_{TE}^{(i-1)}, O_{TE}^{(i-1)}, O_{TE}^{(i-1)})\big)$   (formula 2)

wherein $O_{TE}^{(i-1)}$ denotes the output of the TE model at layer i-1; MA is short for the multi-head attention mechanism network, and the subscript 1 indicates that it corresponds to the first multi-head attention mechanism network, hereinafter the MA$_1$ network.

Formula 2 is illustrated as follows. It describes the processing flow for obtaining the hidden state of the i-th layer; the input is $O_{TE}^{(i-1)}$ (the output result of layer i-1) and the output is $H_{TE}^{(i)}$ (the hidden state of the i-th layer):
1. the output result $O_{TE}^{(i-1)}$ of layer i-1 is passed into the MA$_1$ network in order to obtain the internal associations within $O_{TE}^{(i-1)}$;
2. the result of the MA$_1$ network is added to $O_{TE}^{(i-1)}$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum is normalized by LN in order to eliminate dimensional effects between data;
4. the output is $H_{TE}^{(i)}$.

In formula 2, when i = 1:

$H_{TE}^{(1)} = LN\big(x_1 + MA_1(x_1, x_1, x_1)\big)$   (formula 3)

wherein $H_{TE}^{(1)}$ denotes the hidden state of the TE model at layer 1; $MA_1(x_1, x_1, x_1)$ reflects that the MA$_1$ network requires three elements as inputs, namely the query Q, the key K and the value V, and $x_1$, the long-term history data, is input into the coding model of the first multi-head attention mechanism network as the query, key and value.

Formula 3 is illustrated as follows. It describes the processing flow for the layer-1 hidden state; the input is the long-term history data $x_1$ and the output is $H_{TE}^{(1)}$:
1. the long-term history data $x_1$ is passed into the MA$_1$ network in order to obtain the internal relations within the long-term history trajectory $x_1$;
2. the multi-head attention result $MA_1(x_1, x_1, x_1)$ is added to $x_1$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum $x_1 + MA_1(x_1, x_1, x_1)$ is normalized by LN in order to eliminate dimensional effects between data;
4. the output $H_{TE}^{(1)}$ equals $LN(x_1 + MA_1(x_1, x_1, x_1))$.
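For illustration only, formulas 1 to 3 can be sketched as a single encoder layer in PyTorch, under the assumption that MA$_1$ is standard multi-head attention; the class name and sizes are hypothetical, and 512 dimensions are used instead of the 500 mentioned above because the eight heads of formula 4 must divide the model dimension evenly:

```python
import torch
import torch.nn as nn

class TELayer(nn.Module):
    """Sketch of one TE layer: H = LN(x + MA1(x, x, x)) (formulas 2/3),
    O = LN(H + FFN(H)) (formula 1). Sizes are illustrative."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.ma1 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.ln1(x + self.ma1(x, x, x)[0])  # residual + LN around attention
        return self.ln2(h + self.ffn(h))        # residual + LN around the FFN

x1 = torch.randn(2, 30, 512)  # a batch of 2 long-term segments, 30 points each
out = TELayer()(x1)           # shape preserved: (2, 30, 512)
```

Stacking several such layers (the text below fixes the layer count at six) gives the full coding model.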
To better illustrate the MA network, it is further described below by its formulas. The MA network is a neural network model whose formula is:

$MA(Q, K, V) = \mathrm{Concat}(head_1, head_2, \ldots, head_8)\,W^O$   (formula 4)

In formula 4:
Q denotes the query, one of the inputs of the multi-head network;
K denotes the key, one of the inputs of the multi-head network;
V denotes the value, one of the inputs of the multi-head network;
$head_1, head_2, \ldots, head_8$ denote the results of the respective attention heads;
the superscript O denotes the output, and $W^O$ denotes the weight matrix of the output.

The idea corresponding to formula 4 is that the eight attention results are connected by a splicing operation (Concat) and then multiplied by the output weight matrix $W^O$; the obtained result is the result of the MA network. The specific processing flow of the MA network is:
1. the eight attention results $head_1, head_2, \ldots, head_8$ are spliced (Concat) and multiplied by the output weight matrix $W^O$, the aim being to attend to the information of different subspaces simultaneously;
2. the output is $MA(Q, K, V)$, i.e. $\mathrm{Concat}(head_1, head_2, \ldots, head_8)W^O$.

Each head in formula 4 is computed as:

$head_j = \mathrm{Attention}(Q W_j^Q,\ K W_j^K,\ V W_j^V)$   (formula 5)

wherein j is a positive integer less than or equal to 8, and $head_j$ denotes one of $head_1, head_2, \ldots, head_8$; $W_j^Q$ denotes the weight matrix for the input Q in the j-th head; $W_j^K$ denotes the weight matrix for the input K in the j-th head; $W_j^V$ denotes the weight matrix for the input V in the j-th head.

The attention processing itself is described by formula 6; the inputs are abbreviated as Q, K and V (actually the projected inputs of formula 5) and the output is Attention(Q, K, V):

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\dfrac{Q K^T}{\sqrt{d_k}}\right) V$   (formula 6)

wherein Attention(Q, K, V) denotes the result of the attention processing, with Q, K and V as the respective inputs; softmax denotes the usual method for calculating scores and classifying in a neural network; $K^T$ denotes the transpose matrix of K; $d_k$ denotes the dimension of the key vectors, by which the scores are scaled in order to reduce errors caused by data imbalance.

The specific steps are as follows:
1. Q is multiplied by the transpose of K to obtain $QK^T$, in order to calculate the relationship between Q and K;
2. $QK^T$ is divided by $\sqrt{d_k}$, the purpose being to reduce the errors caused by data imbalance;
3. softmax is applied to $QK^T/\sqrt{d_k}$ to obtain scores between 0 and 1, the purpose being to obtain probabilities;
4. the softmax result is multiplied by V, in order to calculate the relationship between Q, K and V;
5. the output is Attention(Q, K, V), i.e. $\mathrm{softmax}(QK^T/\sqrt{d_k})V$.
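The following sketch spells formulas 4 to 6 out directly in PyTorch tensor operations, with randomly initialized matrices standing in for the trained weights $W_j^Q$, $W_j^K$, $W_j^V$ and $W^O$; all names are illustrative:

```python
import math
import torch

def attention(Q, K, V):
    """Formula 6: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # relate Q to K, scaled
    return torch.softmax(scores, dim=-1) @ V           # probabilities weight V

def multi_head(Q, K, V, WQ, WK, WV, WO):
    """Formulas 4-5: project into 8 subspaces, attend, splice (Concat), project."""
    heads = [attention(Q @ WQ[j], K @ WK[j], V @ WV[j]) for j in range(8)]
    return torch.cat(heads, dim=-1) @ WO

T, d_model, d_head = 30, 512, 64
Q = K = V = torch.randn(T, d_model)
WQ = [torch.randn(d_model, d_head) for _ in range(8)]
WK = [torch.randn(d_model, d_head) for _ in range(8)]
WV = [torch.randn(d_model, d_head) for _ in range(8)]
WO = torch.randn(8 * d_head, d_model)
out = multi_head(Q, K, V, WQ, WK, WV, WO)  # shape: (30, 512)
```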
In step A3, the recurrent neural network coding model is trained with the short-term history data to capture the short-term spatiotemporal relationship of each trajectory point in the short-term history data.
Preferably, the short-term history data is input into the recurrent neural network to capture the short-term spatiotemporal relationship of each trajectory point from the context information of each trajectory point in the short-term history data. The short-term spatiotemporal relationship is a second vector matrix formed by the weights of the trajectory points in the short-term history data with respect to each vector dimension.
According to one example of the present invention, the working principle of the recurrent neural network coding model is:

$O_{LE}^{(i)} = LN\big(H_{LE}^{(i)} + FFN(H_{LE}^{(i)})\big)$   (formula 7)

In formula 7:
LE denotes the recurrent neural network coding model (LSTM Encoder), where LSTM denotes the Long Short-Term Memory network model, a commonly used recurrent neural network model;
$O_{LE}^{(i)}$ denotes the output of the LE module at the i-th layer;
$H_{LE}^{(i)}$ denotes the hidden-layer state of the LE module at the i-th layer;
FFN denotes a feed-forward neural network.

Formula 7 is illustrated as follows. The recurrent neural network coding model is a multi-layer model, i.e. i is greater than or equal to 1. It should be noted that the number of layers of the recurrent neural network coding model is the same as that of the multi-head attention mechanism network coding model. The input of each layer is $H_{LE}^{(i)}$ (the hidden state of the i-th layer) and the output is $O_{LE}^{(i)}$ (the output result of the i-th layer). The specific data processing flow corresponding to formula 7 is:
1. the hidden state $H_{LE}^{(i)}$ of the i-th layer is passed into the feed-forward neural network FFN, so that the effective features are activated;
2. the result of the feed-forward network is added to $H_{LE}^{(i)}$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum is normalized by LN in order to eliminate dimensional effects between data;
4. the output $O_{LE}^{(i)}$ equals $LN(H_{LE}^{(i)} + FFN(H_{LE}^{(i)}))$.

The hidden state of the i-th layer is obtained by:

$H_{LE}^{(i)} = LN\big(O_{LE}^{(i-1)} + LSTM(O_{LE}^{(i-1)})\big)$   (formula 8)

wherein $O_{LE}^{(i-1)}$ denotes the output of the LE module at layer i-1, which serves as the input of the LSTM.

Formula 8 is illustrated as follows. It describes the processing flow for obtaining the hidden state of the i-th layer; the input is $O_{LE}^{(i-1)}$ (the output result of layer i-1) and the output is $H_{LE}^{(i)}$ (the hidden state of the i-th layer):
1. the output result $O_{LE}^{(i-1)}$ of layer i-1 is passed into the recurrent neural network LSTM in order to obtain the temporal relationships within $O_{LE}^{(i-1)}$;
2. the result of the recurrent network is added to $O_{LE}^{(i-1)}$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum is normalized by LN in order to eliminate dimensional effects between data;
4. the output is $H_{LE}^{(i)}$.

When i = 1:

$H_{LE}^{(1)} = LN\big(x_2 + LSTM(x_2)\big)$   (formula 9)

wherein $H_{LE}^{(1)}$ denotes the hidden-layer state of the layer-1 LE module, and $x_2$ denotes the input trajectory data, i.e. the short-term history data.

Formula 9 is illustrated as follows. It describes the processing flow for the layer-1 hidden state; the input is the short-term history trajectory $x_2$ and the output is $H_{LE}^{(1)}$:
1. the short-term history trajectory $x_2$ is passed into the recurrent neural network LSTM in order to obtain the temporal relationships of $x_2$;
2. the result $LSTM(x_2)$ of the recurrent network is added to $x_2$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum $x_2 + LSTM(x_2)$ is normalized by LN in order to eliminate dimensional effects between data;
4. the output $H_{LE}^{(1)}$ equals $LN(x_2 + LSTM(x_2))$.

Preferably, the recurrent neural network is a conventional neural network model. The traditional recurrent neural network formula is:

$h_t = f_w\big(\mathrm{concat}(h_{t-1}, x_t)\big)$   (formula 10)

wherein $h_t$ denotes the hidden-layer state at time t; $x_t$ denotes the input at time t; and $f_w$ denotes a function with parameter w, expanded specifically as $f_w(x) = \tanh(wx)$.
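Formulas 7 to 9 can likewise be sketched as one LSTM-encoder layer in PyTorch; the hidden size is assumed equal to the input size so that the residual addition of formula 9 is well defined, and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class LELayer(nn.Module):
    """Sketch of one LE layer: H = LN(x + LSTM(x)) (formulas 8/9),
    O = LN(H + FFN(H)) (formula 7). Hidden size equals input size."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x2):
        h = self.ln1(x2 + self.lstm(x2)[0])  # residual + LN around the LSTM
        return self.ln2(h + self.ffn(h))     # residual + LN around the FFN

x2 = torch.randn(2, 20, 512)  # short-term history: at most 20 points per segment
out = LELayer()(x2)           # shape preserved: (2, 20, 512)
```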
In step A4, the coding model of the second multi-head attention mechanism network is trained with the long-term and short-term spatiotemporal relationships, so as to adjust the short-term spatiotemporal relationship according to their similarity and obtain the adjusted short-term spatiotemporal relationship.
Preferably, the long-term spatiotemporal relationship is input as the query, and the short-term spatiotemporal relationship as the key and value, into the coding model of the second multi-head attention mechanism network, so as to adjust the short-term spatiotemporal relationship according to the similarity between the long-term and short-term spatiotemporal relationships and obtain the adjusted short-term spatiotemporal relationship. The adjusted short-term spatiotemporal relationship is a third vector matrix formed by the weights of the trajectory points in the short-term history data with respect to each vector dimension, adjusted according to that similarity. The technical scheme of this embodiment can at least realize the following beneficial technical effects: the short-term spatiotemporal relationship can be adjusted according to its similarity to the long-term spatiotemporal relationship, realizing the global dependency of the historical trajectory. After this step, in the scores of the final candidate points, the probabilities of candidate points far from the "true value" are reduced and the probabilities of candidate points near the "true value" are raised, compared with the candidate scores output by other models, where the "true value" refers to the position point where the target is actually located at the next moment; the accuracy of trajectory prediction is thereby improved.
Preferably, the short-term spatiotemporal relationship is adjusted according to the similarity between the long-term and short-term spatiotemporal relationships, and the adjusted short-term spatiotemporal relationship is obtained, by a multi-head attention mechanism network whose working principle is:

$O_{CA} = MA_2\big(O_{TE}^{(N)}, O_{LE}^{(N)}, O_{LE}^{(N)}\big)$   (formula 11)

wherein the subscript 2 of $MA_2$ indicates that the network corresponds to the second multi-head attention mechanism network, hereinafter the MA$_2$ network; $O_{CA}$ denotes the adjusted short-term spatiotemporal relationship; and N denotes the N-th layer of the model, the model having N layers in total (6 layers).

Formula 11 is illustrated as follows: the long-term spatiotemporal relationship $O_{TE}^{(N)}$, the short-term spatiotemporal relationship $O_{LE}^{(N)}$ and the short-term spatiotemporal relationship $O_{LE}^{(N)}$ are input into the MA$_2$ network as the query Q, the key K and the value V respectively, so that the relationships between the query, key and value can be calculated, thereby achieving the adjustment.
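A minimal sketch of formula 11, assuming MA$_2$ is standard multi-head cross-attention; shapes and sizes are illustrative:

```python
import torch
import torch.nn as nn

# Formula 11: the long-term relation O_TE queries the short-term relation O_LE
# (key and value); the attention weights encode their similarity, and the output
# O_CA is the short-term relation re-weighted by that similarity.
ma2 = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
O_TE = torch.randn(2, 30, 512)            # long-term spatiotemporal relationship
O_LE = torch.randn(2, 20, 512)            # short-term spatiotemporal relationship
O_CA, similarity = ma2(O_TE, O_LE, O_LE)  # Q = O_TE, K = V = O_LE
```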
In step A5, the decoding model of the third multi-head attention mechanism network is trained with the recent trajectory data and the adjusted short-term spatiotemporal relationship to obtain the trajectory prediction model; that is, the decoding model of the third multi-head attention mechanism network is trained with the recent trajectory data and the adjusted short-term spatiotemporal relationship and then used as the trajectory prediction model.
Preferably, the decoding model of the third multi-head attention mechanism network comprises a masked multi-head attention mechanism model and a normal multi-head attention mechanism model. The recent trajectory data is input as the query, key and value into the masked model to capture the recent spatiotemporal relationship of each trajectory point from the context information of each trajectory point in the recent trajectory data, and the recent spatiotemporal relationship is then input as the query, with the adjusted short-term spatiotemporal relationship as the key and value, into the normal model to train the decoding model of the third multi-head attention mechanism network. The recent spatiotemporal relationship is a fourth vector matrix formed by the weights of the trajectory points in the recent trajectory data with respect to each vector dimension.
According to an example of the present invention, the working principle of training the decoding model of the third multi-head attention mechanism network is:

$O_{D}^{(i)} = LN\big(H_{D}^{(i)} + FFN(H_{D}^{(i)})\big)$   (formula 12)

wherein $O_{D}^{(i)}$ denotes the output of the i-th layer Decoder; $H_{D}^{(i)}$ denotes the hidden-layer state of the i-th layer Decoder; and FFN denotes a feed-forward neural network.

Formula 12 is illustrated as follows. The decoding model of the third multi-head attention mechanism network is a multi-layer model, i.e. i is greater than or equal to 1. It should be noted that, for a multi-head attention mechanism network, the number of coding model layers is the same as the number of decoding model layers, namely 6. In formula 12, the input of each layer is $H_{D}^{(i)}$ (the hidden state of the i-th layer) and the output is $O_{D}^{(i)}$ (the output result of the i-th layer). The specific data processing flow corresponding to formula 12 is:
1. the hidden state $H_{D}^{(i)}$ of the i-th layer is passed into the feed-forward neural network FFN, so that the effective features are activated;
2. the result of the feed-forward network is added to $H_{D}^{(i)}$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum is normalized by LN in order to eliminate dimensional effects between data;
4. the output $O_{D}^{(i)}$ equals $LN(H_{D}^{(i)} + FFN(H_{D}^{(i)}))$.

The hidden state of the i-th layer is obtained by:

$H_{D}^{(i)} = LN\big(\tilde{H}_{D}^{(i)} + MA_3(\tilde{H}_{D}^{(i)}, O_{CA}, O_{CA})\big)$   (formula 13)

wherein $\tilde{H}_{D}^{(i)}$ denotes the secondary (input-layer) hidden state of the i-th layer Decoder; $O_{CA}$ denotes the adjusted short-term spatiotemporal relationship; and $MA_3(\tilde{H}_{D}^{(i)}, O_{CA}, O_{CA})$ denotes the result produced by the normal multi-head attention mechanism model of the decoding model of the third multi-head attention mechanism network, with $\tilde{H}_{D}^{(i)}$, $O_{CA}$ and $O_{CA}$ as the inputs Q, K and V.

Formula 13 is illustrated as follows. It describes the processing flow for obtaining the hidden state of the i-th layer; the inputs are $\tilde{H}_{D}^{(i)}$ and the adjusted short-term spatiotemporal relationship $O_{CA}$, and the output is $H_{D}^{(i)}$:
1. the state $\tilde{H}_{D}^{(i)}$ of the i-th layer and the adjusted short-term spatiotemporal relationship $O_{CA}$ are passed into the MA$_3$ network in order to obtain the internal relations between the two;
2. the multi-head attention result is added to $\tilde{H}_{D}^{(i)}$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum is normalized by LN in order to eliminate dimensional effects between data;
4. the output is $H_{D}^{(i)}$.

The secondary hidden state is obtained by:

$\tilde{H}_{D}^{(i)} = LN\big(O_{D}^{(i-1)} + MMA_3(O_{D}^{(i-1)}, O_{D}^{(i-1)}, O_{D}^{(i-1)})\big)$   (formula 14)

wherein $O_{D}^{(i-1)}$ denotes the output of the layer i-1 Decoder, and $MMA_3$ denotes the masked multi-head attention mechanism model of the decoding model of the third multi-head attention mechanism network.

Formula 14 is illustrated as follows. It describes the processing flow for obtaining the secondary hidden state of the i-th layer; the input is $O_{D}^{(i-1)}$ (the output result of layer i-1) and the output is $\tilde{H}_{D}^{(i)}$ (the secondary hidden state of the i-th layer):
1. the output result $O_{D}^{(i-1)}$ of layer i-1 is passed into the masked multi-head attention mechanism MMA$_3$ in order to obtain the internal associations within $O_{D}^{(i-1)}$;
2. the masked multi-head attention result is added to $O_{D}^{(i-1)}$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum is normalized by LN in order to eliminate dimensional effects between data;
4. the output is $\tilde{H}_{D}^{(i)}$.

When i = 1:

$\tilde{H}_{D}^{(1)} = LN\big(x_3 + MMA_3(x_3, x_3, x_3)\big)$   (formula 15)

wherein $x_3$ denotes the recent trajectory data.

Formula 15 describes the processing flow for the layer-1 hidden state; the input is the recent trajectory data $x_3$ and the output is $\tilde{H}_{D}^{(1)}$:
1. the recent trajectory data $x_3$ is passed as Q, K and V into the masked multi-head attention mechanism MMA$_3$ in order to obtain the interrelations within $x_3$;
2. the masked multi-head attention result $MMA_3(x_3, x_3, x_3)$ is added to $x_3$, so that information is not lost due to the increase of the trajectory length while it is transmitted through the network;
3. the sum $x_3 + MMA_3(x_3, x_3, x_3)$ is normalized by LN in order to eliminate dimensional effects between data;
4. the output $\tilde{H}_{D}^{(1)}$ equals $LN(x_3 + MMA_3(x_3, x_3, x_3))$.
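Formulas 12 to 15 can be sketched as one decoder layer in PyTorch, under the assumption that MMA$_3$ is multi-head attention with a causal (upper-triangular) mask; all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Sketch of formulas 12-15: masked self-attention over the recent trajectory
    (MMA3), cross-attention to the adjusted short-term relation O_CA (MA3), FFN."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.mma3 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ma3 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.ln3 = nn.LayerNorm(d_model)

    def forward(self, x3, O_CA):
        T = x3.size(1)
        # True above the diagonal = future positions may not be attended to
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h_tilde = self.ln1(x3 + self.mma3(x3, x3, x3, attn_mask=causal)[0])  # 14/15
        h = self.ln2(h_tilde + self.ma3(h_tilde, O_CA, O_CA)[0])             # 13
        return self.ln3(h + self.ffn(h))                                     # 12

x3 = torch.randn(2, 10, 512)    # recent trajectory data
O_CA = torch.randn(2, 20, 512)  # adjusted short-term spatiotemporal relationship
out = DecoderLayer()(x3, O_CA)  # shape: (2, 10, 512)
```

Per the text above, six such layers are stacked to form the full decoding model.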
According to an embodiment of the present invention, there is provided a trajectory prediction method including: according to the real-time trajectory data of the user, the trajectory prediction model obtained by the method for training the trajectory prediction model in the previous embodiment is used for performing trajectory prediction, and a trajectory prediction result is obtained.
Preferably, the trajectory prediction method includes:
B1, acquiring the real-time trajectory data of the user represented by sparse vectors and performing data mapping on it to map the sparse vectors to dense vectors, obtaining the real-time trajectory data of the user represented by dense vectors;
B2, inputting the real-time trajectory data of the user represented by dense vectors as the query, key and value into the masked multi-head attention mechanism model of the trajectory prediction model to obtain a decoding result;
B3, inputting the decoding result into the fully-connected layer of the trajectory prediction model to obtain the predicted probability values of a plurality of candidate points of the user at the next moment;
and B4, taking the candidate point with the maximum probability value as the predicted trajectory point of the user at the next moment and outputting it as the trajectory prediction result.
Preferably, the candidate points are the collective set of all position points at which the next position point of the target may appear. Suppose there are m candidate position points; before the processing of the method of the present invention, the m candidate points have identical scores and cannot be distinguished, whereas after the processing their scores (probability values) differ, and the higher the probability value of a candidate, the more likely the target's next position point is that candidate.
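For illustration, steps B3 and B4 reduce to a fully-connected scoring layer followed by softmax and argmax; the sizes, and the choice to score from the last decoded position, are assumptions of this sketch:

```python
import torch
import torch.nn as nn

d_model, n_candidates = 512, 10000     # illustrative sizes
fc = nn.Linear(d_model, n_candidates)  # the fully-connected layer of step B3

decoded = torch.randn(1, 10, d_model)  # decoding result for one user trajectory
logits = fc(decoded[:, -1, :])         # score every candidate from the last point
probs = torch.softmax(logits, dim=-1)  # B3: predicted probability values
next_point = probs.argmax(dim=-1)      # B4: candidate with the maximum probability
```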
According to an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; and a memory, wherein the memory is to store one or more executable instructions; the one or more processors are configured to perform the steps of the methods described in the foregoing embodiments via execution of one or more executable instructions.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.