CN116049887A - Privacy track release method and device based on track prediction - Google Patents


Info

Publication number
CN116049887A
Authority
CN
China
Prior art keywords
track
user
grid
sequence
privacy
Legal status
Pending
Application number
CN202310080714.7A
Other languages
Chinese (zh)
Inventor
雷涵哲
陈永录
廖琦
姜润浩
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310080714.7A
Publication of CN116049887A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Remote Sensing (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a privacy track release method and device based on track prediction. The method comprises: acquiring a track data set uploaded by a user; dividing the geographical area in which the user tracks in the data set are located into grid areas; mapping the track data onto the grid area and encoding them as a grid cell sequence; constructing a first-order Markov model of the user track based on the grid area and the initial grid cell in the track data; and using the Markov model, with Pufferfish privacy protection noise added, to generate and release a user confusion track sequence. The invention encodes the track data with a spatial grid division method, which improves the discretization of the user track. It further models the track data with a Markov model and, combined with Pufferfish privacy, protects the state transition probabilities between track grid cells, so that it synthesizes a continuous position sequence consistent with the characteristics of the original track while preserving good data utility.

Description

Privacy track release method and device based on track prediction
Technical Field
The invention relates to the technical field of privacy-preserving data processing and can be applied in the financial field; in particular, it relates to a privacy track release method and device based on track prediction.
Background
Location services on smart devices can collect large numbers of highly accurate location points describing a user's movement. These widely covered data contain not only geographic positions but also contextual features of each location point, such as timestamps and time intervals. By analysing a continuous location sequence, an attacker can infer the user's daily habits, activity sequence and movement patterns, which causes serious privacy leakage. Many studies have shown that releasing track data sets carries a high risk of privacy leakage: as few as 4 points of a track are enough to uniquely identify 95% of the individuals in a data set.
Mobile smart banking services are becoming increasingly popular, and banks are accelerating their digital transformation by relying on artificial intelligence and abundant location data. Intelligent positioning and personalized track services make it possible to offer nearby branch queries, route planning, personalized activity recommendations and other services, and analysing user track data improves the understanding of user behaviour and the profiling of movement tracks, enabling precision marketing, recommendation and bank management. Most current track protection methods are based on spatial transformation techniques such as location generalization or location perturbation, which replace the real position with an area or a noisy position. These methods consider only static scenarios and the perturbation of a position at a single timestamp; they suit scenarios in which an MCS (mobile crowdsensing) platform mines statistical information from a track data set or extracts track features. For the continuous release of a user's track sequence, however, perturbing individual positions can hardly account for the temporal correlation between a mobile user's positions, and an attacker can launch inference attacks based on consecutive query results, so merely adding noise to positions is no longer suitable for continuous release scenarios.
In addition, perturbed tracks synthesized from the track motion state with privacy protection techniques can carry a large amount of noise. This noise is random and complex: it increases the deviation between the perturbed track and the original track, destroys the inherent nonlinear relationships between the user and the location context, and greatly reduces the accuracy of the perturbed data. How to control the error between the perturbed track and the original track, and prevent noise from destroying the spatio-temporal correlation between track positions, is therefore another problem to be solved.
Disclosure of Invention
In view of the above, the present invention provides a privacy track release method and device based on track prediction to solve at least one of the above problems.
In order to achieve the above purpose, the present invention adopts the following scheme:
According to a first aspect of the present invention, a privacy track release method based on track prediction is provided, comprising: acquiring a user track data set uploaded by a user; uniformly dividing the geographical area in which the user tracks in the user track data set are located into grid areas; mapping the user track data in the user track data set onto the grid area and encoding the user track data as a grid cell sequence; constructing a first-order Markov model for the user track based on the grid area and the initial grid cell in the user track data, the first-order Markov model comprising a state transition matrix M, each element M_ij of which represents the probability of transitioning from the current grid cell V_i to the next grid cell V_j; and fixing a start grid cell and an end grid cell in the user track data, sequentially generating intermediate points between the start grid cell and the end grid cell using the Markov model, and linking the selected intermediate points with the start and end grid cells to generate and release a user confusion track sequence, wherein Pufferfish privacy protection noise is added when the user confusion track sequence is generated.
According to a second aspect of the present invention, a privacy track release device based on track prediction is provided, comprising: a data acquisition unit, configured to acquire a user track data set uploaded by a user; a grid division unit, configured to uniformly divide the geographical area in which the user tracks in the user track data set are located into grid areas; a mapping and encoding unit, configured to map the user track data in the user track data set onto the grid area and encode the user track data as a grid cell sequence; a model construction unit, configured to construct a first-order Markov model for the user track based on the grid area and the initial grid cell in the user track data, the first-order Markov model comprising a state transition matrix M, each element M_ij of which represents the probability of transitioning from the current grid cell V_i to the next grid cell V_j; and a track generation and release unit, configured to fix a start grid cell and an end grid cell in the user track data, sequentially generate intermediate points between them using the Markov model, link the selected intermediate points with the start and end grid cells to generate and release a user confusion track sequence, and add Pufferfish privacy protection noise when the user confusion track sequence is generated.
According to a third aspect of the present invention there is provided an electronic device comprising a memory, a processor and a computer program stored on said memory and executable on said processor, the processor implementing the steps of the above method when executing said computer program.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.
According to a fifth aspect of the present invention there is provided a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the above method.
According to the above technical solution, encoding the user track data with a spatial grid division method improves the discretization of the user track. To address the problem that random noise in location protection destroys the spatio-temporal correlation of track data, the track data are modeled with a Markov model and, combined with Pufferfish privacy, the state transition probabilities between track grid cells are privacy-protected, so that a continuous position sequence consistent with the characteristics of the original track is synthesized and good data utility is preserved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a schematic flow chart of a privacy track release method based on track prediction according to an embodiment of the present application;
FIG. 2 is a schematic diagram of user track mapping within a grid area according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of generating a state transition matrix according to an embodiment of the present application;
FIG. 4 is a flow chart of generating and releasing a user confusion track sequence according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a Markov-based location transition probability model according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a privacy track release device based on track prediction according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a track generation and release unit according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of the system configuration of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
It should be noted that the method and the device for publishing the privacy track based on track prediction disclosed in the present application can be used in the technical field of privacy protection data processing, and also can be used in any field except the technical field of privacy protection data processing, and the application field of the method and the device for publishing the privacy track based on track prediction disclosed in the present application is not limited.
FIG. 1 is a schematic flow chart of a privacy track release method based on track prediction according to an embodiment of the present application. This embodiment is described from the server side: the method is executed on the server side by an entity that provides privacy protection for users' smart devices, such as a financial institution, and comprises the following steps:
Step S101: Acquire the user track data set uploaded by the user.
The user uploads the user track data set through a smart terminal device, which may be a mobile phone, tablet computer, laptop, smart wearable or any other device with a positioning function. For example, when a bank application offers the user nearby branch queries, online reservations, interaction between a smart self-service terminal and the mobile app, or personalized branch activity recommendations, it needs the user's location; to use such location-based services, the user must grant location permission and upload the corresponding location and track information. A user track can be represented as Tr:
$$Tr = \{(p_1, t_1), \dots, (p_i, t_i), \dots, (p_n, t_n)\}, \quad 1 < i < n$$
where p_i denotes longitude and latitude coordinates and t_i denotes a timestamp.
The user trajectory data set acquired in the present embodiment may include a plurality of user trajectories Tr of a plurality of users.
Step S102: Uniformly divide the geographical area in which the user tracks in the user track data set are located into grid areas. Specifically, in this embodiment, the geographical area covered by the user tracks is determined from the longitude and latitude coordinates in the user track data set and is then uniformly divided into |N|×|N| grid areas. The geographical area is chosen so that it covers the user tracks, and it may be enlarged appropriately beyond them; this embodiment imposes no limitation on this.
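A minimal Python sketch of step S102 follows. It assumes coordinates are given as (longitude, latitude) pairs in degrees and that cells are indexed as (row, column) from the south-west corner of the bounding box; the helper names `bounding_box` and `cell_of` and the `margin` parameter (for enlarging the area) are hypothetical, not taken from the patent.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]              # (longitude, latitude), assumed ordering
Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bounding_box(points: Iterable[Point], margin: float = 0.0) -> Box:
    """Smallest box covering all points, optionally enlarged by `margin` degrees."""
    pts = list(points)
    lons = [p[0] for p in pts]
    lats = [p[1] for p in pts]
    return (min(lons) - margin, min(lats) - margin,
            max(lons) + margin, max(lats) + margin)


def cell_of(point: Point, box: Box, n: int) -> Tuple[int, int]:
    """(row, col) of the uniform n x n grid cell that contains `point`."""
    min_lon, min_lat, max_lon, max_lat = box
    width = (max_lon - min_lon) / n
    height = (max_lat - min_lat) / n
    if width <= 0 or height <= 0:
        raise ValueError("degenerate bounding box; use a positive margin")
    col = int((point[0] - min_lon) / width)
    row = int((point[1] - min_lat) / height)
    # Points lying exactly on the north or east edge belong to the last cell.
    return (min(row, n - 1), min(col, n - 1))
```

For example, with the box (0, 0, 10, 10) and n = 10, the point (3.2, 7.9) falls in cell (7, 3).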
Step S103: Map the user track data in the user track data set onto the grid area and encode the user track data as a grid cell sequence.
Preferably, in this embodiment, the user track data may be modeled as a directed graph G(V, E) in order to map them onto the grid area, where V and E are the sets of nodes and edges of the directed graph, respectively; the user track data are then encoded as a grid cell sequence. FIG. 2 is a schematic diagram of mapping a user track in the grid area according to an embodiment of the present application: the left diagram shows the user's original track in the grid area, and the right diagram shows the track after conversion into a directed graph. The user track data encoded as a grid cell sequence can be expressed as Tr = {(v_1, t_1), ..., (v_j, t_j)}, v ∈ V, 1 < j ≤ n, where v_j is a node of the directed graph and t_j is a timestamp. Through this mapping and encoding step, the user track data in FIG. 2 are encoded as the grid cell sequence {(2,3), (2,4), (2,5), (3,5), (3,6), (4,6), (5,6), (6,6), (7,6)}.
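The mapping and encoding of step S103 can be sketched as follows. The sketch assumes the bounding box and grid size from step S102, stores each raw point as (longitude, latitude, timestamp), and merges consecutive points that fall in the same cell, keeping the first timestamp; that merging rule and the names `encode_track` and `to_graph` are illustrative assumptions, since the text does not say how repeated cells are handled.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]                   # (row, col)
Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def _cell(lon: float, lat: float, box: Box, n: int) -> Cell:
    """(row, col) of the uniform n x n grid cell containing (lon, lat)."""
    min_lon, min_lat, max_lon, max_lat = box
    col = int((lon - min_lon) / ((max_lon - min_lon) / n))
    row = int((lat - min_lat) / ((max_lat - min_lat) / n))
    return (min(row, n - 1), min(col, n - 1))


def encode_track(track: List[Tuple[float, float, float]],
                 box: Box, n: int) -> List[Tuple[Cell, float]]:
    """Encode raw (lon, lat, t) points as a grid cell sequence [(v_j, t_j), ...]."""
    seq: List[Tuple[Cell, float]] = []
    for lon, lat, t in track:
        cell = _cell(lon, lat, box, n)
        if not seq or seq[-1][0] != cell:   # collapse consecutive repeats
            seq.append((cell, t))
    return seq


def to_graph(seq: List[Tuple[Cell, float]]) -> Tuple[Set[Cell], Set[Tuple[Cell, Cell]]]:
    """Directed graph G(V, E): visited cells as nodes, consecutive moves as edges."""
    nodes = {cell for cell, _ in seq}
    edges = {(a[0], b[0]) for a, b in zip(seq, seq[1:])}
    return nodes, edges
```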
Step S104: Construct a first-order Markov model for the user track based on the grid area and the initial grid cell in the user track data; the first-order Markov model comprises a state transition matrix M, each element M_ij of which represents the probability of transitioning from the current grid cell V_i to the next grid cell V_j.
First, the Markov model is briefly described. Consider a random sequence {X(n), n = 0, 1, 2, ...} with state space E. If, for any m non-negative integers n_1 < n_2 < ... < n_m and any natural number k, the following condition holds:
$$P\{X(n_m+k) \mid X(n_1), X(n_2), \dots, X(n_m)\} = P\{X(n_m+k) \mid X(n_m)\}$$
then the sequence is called a first-order Markov model. The equation states that the future state depends only on the current state and is independent of past states; a model built on this principle is a Markov model.
Thus, the first-order Markov model of this embodiment comprises two parts: the initial state, which corresponds to the initial grid cell v_start of the user track, and the state transition probabilities, i.e., the probabilities of moving from one state to another. Because the geographical area was uniformly divided into |N|×|N| grid areas in the preceding step, the first-order Markov model of this embodiment also includes an |N|×|N| state transition matrix M.
FIG. 3 is a schematic flow chart of generating the state transition matrix according to an embodiment of the present application. Preferably, as shown in FIG. 3, generating the state transition matrix M may include the following steps:
Step S301: Calculate the transition count value of grid cells V_i, V_j in the user track Tr using equation (1):
$$\phi(Tr, v_i, v_j) = \frac{|v_i v_j \in Tr|}{|Tr| - 1} \tag{1}$$
where |V_iV_j inTr| denotes the number of occurrences of track segment V_iV_j in the user track Tr, and |Tr| - 1 is the number of track segments in Tr;
Step S302: Traverse the entire user track data set and, using equation (2), compute the count value O_ij of track segment V_iV_j over all tracks:
$$O_{ij} = \sum_{Tr \in D} \phi(Tr, v_i, v_j) \tag{2}$$
where D is the user track data set;
Step S303: Repeat steps S301 and S302 until the transition counts of all track segments have been computed, producing a transition count matrix O, which is also of size |N|×|N|;
Step S304: Calculate each element M_ij of the state transition matrix M with equation (3), producing the |N|×|N| state transition matrix M:
$$M_{ij} = \frac{O_{ij}}{\sum_j O_{ij}} \tag{3}$$
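Equations (1) to (3) can be sketched in Python as below, assuming each encoded track has already been converted to a list of integer cell ids in 0..K-1 (for example row * n + col). The function names are hypothetical. Rows of cells with no observed outgoing transition are left as all zeros, a case the text does not address.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def phi(track: Sequence[int], i: int, j: int) -> float:
    """Eq. (1): occurrences of segment v_i -> v_j in the track divided by |Tr| - 1."""
    segments = len(track) - 1
    if segments <= 0:
        return 0.0
    hits = sum(1 for a, b in zip(track, track[1:]) if a == i and b == j)
    return hits / segments


def transition_matrix(dataset: Sequence[Sequence[int]], k: int) -> Matrix:
    """Eqs. (2)-(3): O_ij = sum over tracks of phi, then M_ij = O_ij / sum_j O_ij."""
    counts = [[0.0] * k for _ in range(k)]
    for track in dataset:
        segments = len(track) - 1
        if segments <= 0:
            continue
        # One pass per track adds phi(track, a, b) for every segment it contains.
        for a, b in zip(track, track[1:]):
            counts[a][b] += 1.0 / segments
    matrix: Matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row] if total > 0 else [0.0] * k)
    return matrix
```

Accumulating 1/(|Tr| - 1) per segment occurrence gives the same O_ij as summing phi over the data set, but in a single pass instead of one pass per cell pair.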
Step S105: Fix the start grid cell and the end grid cell in the user track data, sequentially generate intermediate points between them using the Markov model, and link the selected intermediate points with the start and end grid cells to generate and release a user confusion track sequence, adding Pufferfish privacy protection noise when the user confusion track sequence is generated.
In this embodiment, once the start and end grid cells of the user track data are fixed, the Markov model generates multiple candidate intermediate points in sequence, each with a different probability. Intermediate points are therefore selected from these candidates and, together with the start and end grid cells, form the user confusion track sequence that is released. In addition, in this embodiment, Pufferfish privacy protection noise is added when the user confusion track sequence is generated, in order to perturb the user's original positions.
The Pufferfish privacy framework is briefly described below. It is defined by three components (S, Q, Θ): S is a set of secrets containing sensitive information that must not be revealed to an attacker; Q is a set of secret pairs that must remain indistinguishable; and Θ is a set of data distributions describing the correlations between data. When the data follow a distribution in Θ, an attacker in the Pufferfish framework cannot distinguish the two secrets of any pair in Q. The formal definition is equation (4):
$$P\big(M(D)=\omega \mid s_i, \theta\big) \le e^{\varepsilon}\, P\big(M(D)=\omega \mid s_j, \theta\big), \qquad P\big(M(D)=\omega \mid s_j, \theta\big) \le e^{\varepsilon}\, P\big(M(D)=\omega \mid s_i, \theta\big) \tag{4}$$
where D is the dataset, ε is the privacy budget, M is an algorithm satisfying Pufferfish privacy, ω ∈ Range(M), θ ∈ Θ is a distribution describing the correlations in dataset D, and (s_i, s_j) ∈ Q is a secret pair with P(s_i|θ) ≠ 0 and P(s_j|θ) ≠ 0.
Preferably, in this embodiment, when the user confusion track sequence is generated and Pufferfish privacy protection is provided for the state transition probabilities, the Laplace mechanism may be used to perturb the data with noise. To measure the distance between probability distributions accurately, the sensitivity can be calculated with the Wasserstein distance. Adding Pufferfish privacy protection noise when generating the user confusion track sequence in this embodiment therefore includes:
First, the sensitivity W is calculated using the Wasserstein distance, which measures the distance between two probability distributions and is defined as equation (5):
$$W(P_\mu, P_\lambda) = \inf_{\gamma \in \Pi(P_\mu, P_\lambda)} \mathbb{E}_{(x,y)\sim\gamma}\big[\lVert x - y \rVert\big] \tag{5}$$
where Π(P_μ, P_λ) is the set of all possible joint probability distributions whose marginals are P_μ and P_λ. For each possible joint distribution γ, a pair (x, y) ~ γ is sampled, giving a real sample x and a generated sample y; the distance ||x - y|| of the pair is computed; and the expected distance E_{(x,y)~γ}[||x - y||] under γ is taken. The infimum of this expectation over all possible joint distributions is the Wasserstein distance.
Based on the above definitions of the Wasserstein distance and Pufferfish privacy, to calculate the sensitivity W, this embodiment defines the current position v_p and another, adjacent position v_q as a secret pair, whose next-position distributions are P(v_j|v_p) and P(v_j|v_q); the sensitivity W is given by equation (6):
$$W = \sup_{(v_p, v_q) \in Q} W\big(P(v_j \mid v_p),\, P(v_j \mid v_q)\big) \tag{6}$$
where sup denotes the supremum (least upper bound) of the Wasserstein distance;
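The sensitivity computation of equations (5) and (6) can be sketched as below under a simplifying assumption: the candidate next cells are placed on a one-dimensional ordered support (for example their index along one axis), where the Wasserstein-1 distance reduces to the area between the two cumulative distribution functions. A two-dimensional grid with a Euclidean ground distance would instead require an optimal-transport solver. `wasserstein_1d`, `sensitivity` and the list of secret pairs passed to it are hypothetical names, not the patent's.

```python
from itertools import accumulate
from typing import Sequence, Tuple


def wasserstein_1d(p: Sequence[float], q: Sequence[float],
                   positions: Sequence[float]) -> float:
    """W1 between two discrete distributions on the same sorted 1-D support."""
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # W1 = integral of |F_p - F_q|; the CDFs are constant between support points.
    return sum(abs(cdf_p[k] - cdf_q[k]) * (positions[k + 1] - positions[k])
               for k in range(len(positions) - 1))


def sensitivity(matrix: Sequence[Sequence[float]],
                secret_pairs: Sequence[Tuple[int, int]],
                positions: Sequence[float]) -> float:
    """Eq. (6) sketch: sup over secret pairs (v_p, v_q) of W1(P(.|v_p), P(.|v_q))."""
    return max(wasserstein_1d(matrix[p], matrix[q], positions)
               for p, q in secret_pairs)
```

Moving all mass from position 0 to position 2, for instance, costs a Wasserstein-1 distance of 2.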
After the sensitivity W is obtained, the Laplace mechanism is applied to obtain the state transition matrix M' with Pufferfish privacy protection noise added; M' is also of size |N|×|N|:
$$M'_{ij} = M_{ij} + Z, \quad Z \sim \mathrm{Lap}\left(\frac{W}{\varepsilon}\right) \tag{7}$$
where Z is the noise added to each transition probability, drawn from a Laplace distribution, and ε is the privacy budget.
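Equation (7) can be sketched with the Python standard library. The `random` module has no Laplace sampler, so the sketch relies on the fact that the difference of two independent exponential variables with mean b is Laplace(0, b) distributed. The optional `renormalise` step (clipping negative entries and rescaling each row to sum to 1) is a hypothetical post-processing choice that the text does not specify; all function names are illustrative.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two Exp(mean=scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(matrix: Matrix, w: float, epsilon: float,
            seed: Optional[int] = None) -> Matrix:
    """Eq. (7) sketch: M'_ij = M_ij + Z with Z ~ Lap(W / epsilon)."""
    if epsilon <= 0:
        raise ValueError("privacy budget must be positive")
    scale = w / epsilon
    if scale == 0:
        return [list(row) for row in matrix]   # zero sensitivity: no noise needed
    rng = random.Random(seed)
    return [[m + laplace_noise(scale, rng) for m in row] for row in matrix]


def renormalise(matrix: Matrix) -> Matrix:
    """Hypothetical post-processing: clip negatives and rescale each row to sum to 1."""
    result: Matrix = []
    for row in matrix:
        clipped = [max(x, 0.0) for x in row]
        total = sum(clipped)
        result.append([x / total for x in clipped] if total > 0
                      else [1.0 / len(row)] * len(row))
    return result
```

A smaller ε gives a larger noise scale W/ε, i.e. stronger protection at the cost of accuracy.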
Further preferably, as shown in FIG. 4, generating and releasing the user confusion track sequence in this step may include the following steps:
Step S401: Use the Markov model to generate, in turn, the intermediate points between the start grid cell v_start and the end grid cell v_end.
Step S402: According to the transition probabilities of the different intermediate points, generate multiple grid track sequences from the start grid cell v_start to the end grid cell v_end.
The Markov-based location transition probability model is described below; FIG. 5 is a schematic diagram of this model according to an embodiment of the present application. Assume the current track position node is grid cell v_5. Because of the structure of the grid area, the next track position node can only be one of the 8 surrounding grid cells or the grid cell itself. In the differential privacy process, the global sensitivity must be calculated over all state transition cases. Therefore, assuming the current position node is v_i and the inferred next position node is v_j, the state transitions to the nine adjacent grid cells can be obtained from the Markov model, which gives the joint probability distribution of the next-position state transition. For the model in FIG. 5, the joint probability distribution of the position nodes is shown in Table 1 below (probabilities not shown in FIG. 5 are likewise omitted from Table 1):
TABLE 1: Joint probability distribution of the next-position state transitions (table image not reproduced)
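The restriction of the next position to the 3×3 block around the current cell, described before Table 1, can be sketched as a small helper. It assumes (row, col) cell indices on an n×n grid and simply drops neighbours that fall outside the grid, a boundary rule the text does not state; the name `candidate_cells` is hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def candidate_cells(cell: Cell, n: int) -> List[Cell]:
    """The current cell plus its up to 8 neighbours, clipped to the n x n grid."""
    row, col = cell
    return [(row + dr, col + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if 0 <= row + dr < n and 0 <= col + dc < n]
```

An interior cell yields 9 candidates, a cell on an edge 6, and a corner cell 4; transition rows can be masked to these candidates before sampling the next position.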
In this embodiment, given {v_start, v_end} and M', a user confusion track sequence Tr' = {v'_1, v'_2, ..., v'_s} is generated, where v'_1 = v_start and v'_s = v_end. To determine v'_2 to v'_{s-1}, note that the area A around the previous position node contains 9 grid cells, and every grid cell v_k ∈ A has a state transition probability; Markov prediction then yields the current position node v_j, 2 ≤ j ≤ s-1. Let v_prev denote the predecessor node v_{j-1}; the probability of selecting v_k as the next point of the perturbed track is given by equation (8):
[Equation (8): image not reproduced]
where M'_{start,k} is the transition probability from the start node v_1 to the current node v_k, computed from the (k-1)-step state transition matrix by the Markov Chapman-Kolmogorov (C-K) equation:
$$P(v_k \mid v_1) = P(v_2 \mid v_1) \cdots P(v_k \mid v_{k-1}) = P^{k-1} \tag{9}$$
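Equation (9) expresses multi-step transition probabilities through repeated one-step transitions; in matrix form, the (k-1)-step transition matrix is the (k-1)-th power of the one-step matrix. A dependency-free sketch follows (the function names `matmul` and `k_step` are hypothetical):

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (ab)_ij = sum_l a_il * b_lj."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def k_step(matrix: Matrix, steps: int) -> Matrix:
    """Chapman-Kolmogorov: the `steps`-step transition matrix equals matrix ** steps."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(steps):
        result = matmul(result, matrix)
    return result
```

For the one-step matrix [[0, 1], [0.5, 0.5]], the two-step matrix is [[0.5, 0.5], [0.25, 0.75]]; each row still sums to 1.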
Step S403: Multiply the transition probabilities of the grid cells in each grid track sequence to obtain the total probability of that sequence.
Based on the probabilities of the different grid cells, the Markov model can generate multiple grid track sequences from v_start to v_end; for each one, the transition probabilities of its grid cells given by equation (8) are multiplied to obtain its total probability.
Step S404: Select the grid track sequence with the highest total probability and release it as the user confusion track sequence for the user's original track.
Preferably, in this embodiment, the grid track sequence with the highest total probability is selected first and then converted into a position sequence, which is released as the user confusion track sequence for the user's original track.
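Steps S401 to S404 enumerate candidate sequences and keep the one with the largest product of transition probabilities. The sketch below reaches the same maximum for a fixed track length s without enumerating every sequence, by using a Viterbi-style max-product dynamic program over the noisy matrix M' (a swapped-in search technique, not the patent's stated procedure). It scores sequences with one-step probabilities M'[a][b] only, because the exact form of equation (8) is not reproduced here, and it assumes non-negative entries (for example after clipping). Cell ids are integers and `most_likely_track` is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def most_likely_track(m: Sequence[Sequence[float]], start: int, end: int,
                      length: int) -> Optional[Tuple[float, List[int]]]:
    """Highest-probability cell sequence of exactly `length` cells from start to end."""
    if length < 2:
        raise ValueError("a track needs at least the start and end cells")
    # best[c] = (probability, path) of the best prefix ending in cell c.
    best: Dict[int, Tuple[float, List[int]]] = {start: (1.0, [start])}
    for _ in range(length - 1):
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for cell, (prob, path) in best.items():
            for cand, p in enumerate(m[cell]):
                score = prob * p
                if score > 0 and (cand not in nxt or score > nxt[cand][0]):
                    nxt[cand] = (score, path + [cand])
        best = nxt
    return best.get(end)   # None if end is unreachable in exactly `length` cells
```

Keeping only the best prefix per cell is valid because every product of non-negative probabilities extends monotonically, so the optimal sequence always passes through optimal prefixes.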
According to the above technical solution, encoding the user track data with a spatial grid division method improves the discretization of the user track. To address the problem that random noise in location protection destroys the spatio-temporal correlation of track data, the track data are modeled with a Markov model and, combined with Pufferfish privacy, the state transition probabilities between track grid cells are privacy-protected, so that a continuous position sequence consistent with the characteristics of the original track is synthesized and good data utility is preserved.
The beneficial effects of the present application are further illustrated below by an experimental evaluation. The experimental environment, experimental data and evaluation method of the simulation experiments are as follows:
Experimental environment: the experiments were run on an Intel(R) Core(TM) CPU with 16 GB of memory under 64-bit Windows 10, using Python 3.7.3.
Experimental data: the public Geolife and Taxi track data sets are used. The Geolife data set contains GPS tracking information of 182 users collected over more than 5 years, comprising 17621 tracks, most of which are located in city X. The experiments select 12304 of these tracks and divide the geographical area into 100×100 grid cells. The Taxi data set contains 1 year of GPS tracks of 442 taxis in city Y; the experiments select 30000 tracks from it and divide the geographical area into 100×100 grid cells.
Evaluation method: Trip Error.
In a given grid area, let the original user track data set be D and the released perturbed data set be D'. First, the trip distribution of the user tracks is calculated. For a trip v_i → v_j of a moving object, let D_{v_i→v_j} ⊆ D denote the subset of all user tracks in D that contain the trip v_i → v_j. The trip distribution of the user tracks is then given by equation (10):
$$R(v_i \to v_j) = \frac{|D_{v_i \to v_j}|}{|D|} \tag{10}$$
where |D| is the number of tracks; the trip distribution R' of the perturbed data set D' is obtained in the same way.
Deviation of the travel distribution is then measured, and the similarity of the two probability distributions is measured, typically using JS divergence, with the following formula (11):
Figure BDA0004067390240000101
wherein TE is the travel Error (Trip Error), KL (·) is KL divergence, and the closer the JS divergence value is 0, the smaller the difference between matrixes is indicated; the closer the JS divergence value is to 1, the larger the difference between the matrixes is, and therefore the travel error between the tracks can be obtained.
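Formulas (10) and (11) can be sketched in Python as follows. This is a minimal illustration, not the evaluation code used in the experiments. It assumes tracks are already encoded as lists of integer grid-cell ids and interprets "a track contains the trip v_i → v_j" as "the track starts in cell v_i and ends in cell v_j", so that R sums to 1; that interpretation and the base-2 logarithm (which keeps TE in [0, 1]) are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Trip = Tuple[int, int]

def trip_distribution(tracks: List[List[int]]) -> Dict[Trip, float]:
    """Formula (10): R(v_i -> v_j) = |D_{v_i -> v_j}| / |D|.

    Assumption: a track contains trip v_i -> v_j when it starts in cell v_i
    and ends in cell v_j (every track must be non-empty).
    """
    counts = Counter((tr[0], tr[-1]) for tr in tracks)
    return {trip: n / len(tracks) for trip, n in counts.items()}

def kl_divergence(p: Dict[Trip, float], q: Dict[Trip, float]) -> float:
    """Base-2 KL divergence; terms with p = 0 contribute nothing."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

def trip_error(original: List[List[int]], published: List[List[int]]) -> float:
    """Formula (11): TE = JS(R || R'), bounded in [0, 1] with base-2 logarithms."""
    r, r_prime = trip_distribution(original), trip_distribution(published)
    trips = set(r) | set(r_prime)
    p = {t: r.get(t, 0.0) for t in trips}
    q = {t: r_prime.get(t, 0.0) for t in trips}
    m = {t: (p[t] + q[t]) / 2 for t in trips}  # mixture distribution (R + R') / 2
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Identical datasets give TE = 0, and datasets with no trip in common give TE = 1, matching the interpretation of the JS divergence described above.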
To verify the effectiveness of the present invention, it is compared with the following methods.
ngram: a variable-length track sequence release method. It calculates transition probabilities from the frequencies of sub-tracks, predicts each next location in turn with an (n-1)-order Markov model, and generates a noisy track sequence.
DPT: a differentially private track release method based on hierarchical reference systems. It measures correlation through the geographic location hierarchy of the tracks, uses it to weight the direction of the random walk in Markov location prediction, and applies Laplace perturbation to the transition probabilities.
Simulation experiment results: the privacy budget ε is set to 0.1, 0.5, 1.0 and 2.0, corresponding to privacy protection levels from strongest to weakest. Each experiment was repeated 15 times and the results were averaged.
The trip errors of each method on the Geolife dataset are shown in Table 2:
Table 2
Privacy budget ε    ngram    DPT      Present invention
0.1                 0.455    0.421    0.247
0.5                 0.453    0.397    0.228
1.0                 0.441    0.376    0.199
2.0                 0.41     0.359    0.152
The trip errors of each method on the Taxi dataset are shown in Table 3:
Table 3
Privacy budget ε    ngram    DPT      Present invention
0.1                 0.266    0.459    0.273
0.5                 0.231    0.457    0.25
1.0                 0.225    0.433    0.218
2.0                 0.218    0.429    0.168
On the Geolife dataset, the trip error of the invention is lower than that of both ngram and DPT under every privacy budget. On the Taxi dataset, the trip error of the invention is lower than that of DPT under every privacy budget; compared with ngram, it is slightly higher at ε = 0.1 and ε = 0.5 but lower at ε = 1.0 and ε = 2.0. This shows that the invention offers better data utility when the privacy budget is larger.
As Tables 2 and 3 show, the trip errors of all three privacy protection methods decrease as the privacy budget increases, which is consistent with differential privacy: the larger the privacy budget, the smaller the scale of the generated Laplace noise, and the smaller the deviation of the perturbed data from the original data. Moreover, the trip error of the invention remains low under all privacy budgets, because in the privacy protection stage the method measures the probability distribution of location transitions and constrains the noise scale accordingly; the advantage is most pronounced at larger privacy budgets. The tracks generated by the invention follow the trip distribution of the original tracks and, compared with ngram and DPT, show good data utility.
Fig. 6 is a schematic structural diagram of a privacy track publishing device based on track prediction according to an embodiment of the present application. The device includes a data acquisition unit 610, a grid division unit 620, a mapping encoding unit 630, a model construction unit 640 and a track generation and publishing unit 650, which are connected in sequence.
The data acquisition unit 610 is configured to acquire a user trajectory data set uploaded by a user.
The grid dividing unit 620 is configured to divide the geographic area where the user track is located in the user track data set into grid areas.
The mapping encoding unit 630 is configured to map the user trajectory data in the user trajectory data set to the grid region, and encode the user trajectory data into a grid cell sequence.
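The uniform grid division and mapping/encoding steps can be sketched as follows. This is a minimal illustration under stated assumptions: the geographic area is a rectangular latitude/longitude bounding box given by the hypothetical parameters `min_lat`, `min_lon`, `max_lat` and `max_lon`, cells are numbered row-major, and the 100 × 100 default simply mirrors the experimental setup described earlier. The function names are illustrative and do not come from the source.

```python
from typing import Callable, List, Tuple

def make_grid_encoder(min_lat: float, min_lon: float, max_lat: float, max_lon: float,
                      n_rows: int = 100, n_cols: int = 100) -> Callable[[float, float], int]:
    """Return a function mapping a (lat, lon) point to a row-major grid cell id."""
    cell_h = (max_lat - min_lat) / n_rows
    cell_w = (max_lon - min_lon) / n_cols

    def encode_point(lat: float, lon: float) -> int:
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            raise ValueError(f"point ({lat}, {lon}) lies outside the grid area")
        # Points on the upper/right border are assigned to the last row/column.
        row = min(int((lat - min_lat) / cell_h), n_rows - 1)
        col = min(int((lon - min_lon) / cell_w), n_cols - 1)
        return row * n_cols + col

    return encode_point

def encode_track(points: List[Tuple[float, float]],
                 encode_point: Callable[[float, float], int]) -> List[int]:
    """Encode a GPS track (a list of (lat, lon) points) as a grid cell sequence."""
    return [encode_point(lat, lon) for lat, lon in points]
```

For example, on a 10 × 10 grid over the unit square, `encode_track([(0.05, 0.05), (0.15, 0.25), (1.0, 1.0)], make_grid_encoder(0.0, 0.0, 1.0, 1.0, 10, 10))` returns `[0, 12, 99]`.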
The model construction unit 640 is configured to construct a first-order Markov model for the user tracks based on the grid region and the initial grid cells in the user track data, the first-order Markov model comprising a state transition matrix M, where each element M_ij of the state transition matrix represents the probability of transitioning from the current grid cell V_i to the next grid cell V_j.
The track generation and publishing unit 650 is configured to fix the start grid cell and the end grid cell in the user track data, sequentially generate intermediate points between the start grid cell and the end grid cell using the Markov model, link the selected intermediate points with the start and end grid cells to generate and publish a user confusion track sequence, and add Pufferfish privacy protection noise when generating the user confusion track sequence.
Preferably, the mapping encoding unit 630 may specifically be configured to model the user track data as a directed graph G(V, E), where V and E denote the sets of nodes and edges of the directed graph respectively, so as to map the user track data onto the grid region, and then encode the user track data as a sequence of grid cells.
Preferably, the state transition matrix M of the model construction unit 640 is generated as follows:
step S1: calculate the transition count value of grid cells V_i and V_j in the user track Tr as follows:

$$\phi(Tr, v_i, v_j) = \frac{|v_i v_j|_{\mathrm{in}\,Tr}}{|Tr| - 1}$$

where |v_i v_j|_{in Tr} denotes the number of occurrences of the track segment V_iV_j in the user track Tr, and |Tr| - 1 denotes the number of track segments in Tr;
step S2: traverse the entire user track dataset and accumulate, over all tracks containing the track segment V_iV_j, the count value O_ij:

$$O_{ij} = \sum_{Tr \in D} \phi(Tr, v_i, v_j)$$

where D is the user track dataset;
step S3: repeat steps S1-S2 until the transition counts of all track segments have been computed, yielding the transition count matrix O;
step S4: compute each element M_ij of the state transition matrix M as follows, thereby generating M:

$$M_{ij} = \frac{O_{ij}}{\sum_j O_{ij}}$$
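Steps S1-S4 can be sketched in Python with NumPy as follows, assuming tracks are already encoded as lists of integer cell ids in `range(n_cells)`. Adding `1 / (|Tr| - 1)` for every occurrence of a segment in a track reproduces φ(Tr, v_i, v_j), and accumulating over all tracks yields O_ij. Skipping single-point tracks (for which |Tr| - 1 = 0) and leaving rows with no outgoing transitions at zero are assumptions not stated in the source; the function name is hypothetical.

```python
import numpy as np
from typing import List, Tuple

def transition_matrix(dataset: List[List[int]], n_cells: int) -> Tuple[np.ndarray, np.ndarray]:
    """Return the transition count matrix O and the state transition matrix M."""
    O = np.zeros((n_cells, n_cells))
    for tr in dataset:                      # step S2: traverse the whole dataset D
        n_segments = len(tr) - 1            # |Tr| - 1
        if n_segments < 1:
            continue                        # single-point track has no segments (assumption)
        for v_i, v_j in zip(tr[:-1], tr[1:]):
            # step S1: each occurrence of segment v_i v_j adds 1 / (|Tr| - 1),
            # so the total added by this track equals phi(Tr, v_i, v_j)
            O[v_i, v_j] += 1.0 / n_segments
    # step S3 is covered by the loops above; step S4: row-normalise
    # M_ij = O_ij / sum_j O_ij, leaving rows without transitions at zero
    row_sums = O.sum(axis=1, keepdims=True)
    M = np.divide(O, row_sums, out=np.zeros_like(O), where=row_sums > 0)
    return O, M
```

Each non-empty row of M sums to 1, so M can be used directly as the transition matrix of the first-order Markov model.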
Preferably, the track generation and publishing unit 650 adds Pufferfish privacy protection noise when generating the user confusion track sequence as follows:
the sensitivity W is calculated using the Wasserstein distance. The current location v_p and an adjacent location v_q are defined as a secret pair, whose probability distributions are P(v_j | v_p) and P(v_j | v_q) respectively. The sensitivity W is given by:

$$W = \sup_{(v_p, v_q)} \mathcal{W}\big(P(v_j \mid v_p),\, P(v_j \mid v_q)\big)$$

where \mathcal{W}(·,·) denotes the Wasserstein distance and sup denotes its supremum (least upper bound) over the secret pairs;
the state transition matrix M' obtained after adding Pufferfish privacy protection noise based on the Laplace mechanism is:

$$M'_{ij} = M_{ij} + Z_{ij}, \qquad Z_{ij} \sim \mathrm{Lap}\left(\frac{W}{\varepsilon}\right)$$

where Z is the noise added to each transition probability, which follows a Laplace distribution, and ε is the privacy budget.
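The noise-addition step can be sketched as follows. This minimal illustration takes the sensitivity W as a given input (computing it requires a ground metric between grid cells, which is not specified here) and assumes the standard Laplace calibration, scale = W / ε. Clipping negative entries and renormalising each row afterwards is an assumed post-processing step, not stated in the source, that turns M' back into a stochastic matrix; post-processing of this kind does not weaken a privacy guarantee, but the guarantee itself is not verified by this sketch. The function name is hypothetical.

```python
import numpy as np
from typing import Optional

def perturb_transition_matrix(M: np.ndarray, sensitivity: float, epsilon: float,
                              seed: Optional[int] = None) -> np.ndarray:
    """Add Lap(W / epsilon) noise to every transition probability: M' = M + Z."""
    if epsilon <= 0 or sensitivity < 0:
        raise ValueError("epsilon must be positive and sensitivity non-negative")
    rng = np.random.default_rng(seed)
    scale = sensitivity / epsilon           # larger budget -> smaller noise scale
    noisy = M + rng.laplace(loc=0.0, scale=scale, size=M.shape)
    # Post-processing (assumption, not stated in the source): clip negative
    # entries and renormalise each row so that M' is again a stochastic matrix.
    noisy = np.clip(noisy, 0.0, None)
    row_sums = noisy.sum(axis=1, keepdims=True)
    return np.divide(noisy, row_sums, out=np.zeros_like(noisy), where=row_sums > 0)
```

Because the noise scale is W / ε, raising the privacy budget from 0.1 to 2.0 shrinks the noise by a factor of 20, which is the trend reflected in Tables 2 and 3.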
Preferably, as shown in fig. 7, the track generation and publishing unit 650 may further include:
an intermediate point generating module 651, configured to sequentially generate intermediate points between the start grid cell and the end grid cell using the markov model;
a track sequence generating module 652, configured to generate a plurality of grid track sequences from the start grid cell to the end grid cell according to the transition probabilities of different intermediate points;
the total probability obtaining module 653 is configured to multiply the transition probabilities of the grid units in each track sequence to obtain the total probability of each grid track sequence;
the track publishing module 654 is configured to select the grid track sequence with the highest total probability as the user confusion track sequence of the user original track for publishing.
Further preferably, the track publishing module 654 may specifically be configured to select the grid track sequence with the highest total probability, convert it into a location sequence, and publish the location sequence as the user confusion track sequence of the user's original track.
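The selection performed by modules 651-654 can be sketched as a Viterbi-style dynamic programme, under assumptions not stated in the source: the synthetic track has a fixed number of transitions `n_steps` (for example, the length of the original track minus one), and every cell sequence of that length from the start cell to the end cell is a candidate. Maximising the sum of log-probabilities is equivalent to maximising the product of transition probabilities, so this returns the same sequence as enumerating every candidate and keeping the one with the highest total probability, without the exponential cost. Converting cells to their centre points is likewise an assumption, and all names here are hypothetical.

```python
import numpy as np
from typing import List, Optional, Tuple

def most_probable_track(M: np.ndarray, start: int, end: int,
                        n_steps: int) -> Tuple[Optional[List[int]], float]:
    """Highest-total-probability cell sequence with exactly n_steps transitions."""
    with np.errstate(divide="ignore"):
        log_m = np.log(M)                   # log(0) = -inf marks impossible moves
    score = np.full(M.shape[0], -np.inf)    # best log-probability of reaching each cell
    score[start] = 0.0
    back_pointers = []
    for _ in range(n_steps):
        cand = score[:, None] + log_m       # cand[i, j]: best path to i, then i -> j
        back_pointers.append(cand.argmax(axis=0))
        score = cand.max(axis=0)
    if not np.isfinite(score[end]):
        return None, 0.0                    # end cell unreachable in n_steps moves
    track = [end]
    for bp in reversed(back_pointers):      # walk back from the end cell
        track.append(int(bp[track[-1]]))
    track.reverse()
    return track, float(np.exp(score[end]))

def cells_to_positions(track: List[int], min_lat: float, min_lon: float,
                       cell_h: float, cell_w: float, n_cols: int) -> List[Tuple[float, float]]:
    """Convert row-major cell ids to (lat, lon) cell centres (assumption)."""
    return [(min_lat + (c // n_cols + 0.5) * cell_h,
             min_lon + (c % n_cols + 0.5) * cell_w) for c in track]
```

With transition probabilities 0→1 = 0.7, 0→2 = 0.3, 1→3 = 0.5 and 2→3 = 0.9, the two-step candidates are 0→1→3 (0.35) and 0→2→3 (0.27), so `most_probable_track` returns `[0, 1, 3]` with total probability 0.35. In practice M would be the noisy matrix M', so the released track depends only on perturbed probabilities.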
The detailed description of each unit in the above device may refer to the corresponding description in the foregoing method embodiment, and will not be repeated here.
In summary, the privacy track release device based on track prediction provided by the invention encodes user track data with a spatial grid division method, which increases the degree of discretization of the user tracks. To address the problem that random noise added for location protection destroys the spatio-temporal correlation of track data, it models the track data with a Markov model, uses Pufferfish privacy to protect the state transition probabilities between track grid cells, and synthesizes a continuous location sequence consistent with the characteristics of the original tracks, thereby preserving good data utility.
The embodiment of the invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the method when executing the program.
Embodiments of the present invention also provide a computer program product comprising a computer program/instruction which, when executed by a processor, performs the steps of the above method.
The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program for executing the method.
As shown in fig. 8, the electronic device 600 may further include a communication module 110, an input unit 120, an audio processor 130, a display 160 and a power supply 170. Note that the electronic device 600 need not include all of the components shown in fig. 8; in addition, it may include components not shown in fig. 8, for which reference may be made to the related art.
As shown in fig. 8, the central processor 100, sometimes also referred to as a controller or operation control unit, may include a microprocessor or other processor device and/or logic device; the central processor 100 receives inputs and controls the operation of the components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, removable media, volatile memory, non-volatile memory or another suitable device. It may store information related to failures as well as programs for processing that information, and the central processor 100 can execute the programs stored in the memory 140 to store or process information.
The input unit 120 provides an input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.
The memory 140 may be a solid-state memory such as read-only memory (ROM), random access memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off and that can be selectively erased and provided with further data, an example of which is sometimes referred to as EPROM. The memory 140 may also be another type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage 142 for storing application programs and function programs, or a flow by which the central processor 100 executes operations of the electronic device 600.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).
The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. A communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and to receive audio input from the microphone 132 to implement usual telecommunication functions. The audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 100 so that sound can be recorded locally through the microphone 132 and so that sound stored locally can be played through the speaker 131.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specific examples have been used herein to explain the principles and embodiments of the present invention; the above description of the embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, those skilled in the art may vary the specific embodiments and the scope of application in accordance with the idea of the invention. Accordingly, the contents of this description should not be construed as limiting the present invention.

Claims (10)

1. A privacy track publishing method based on track prediction, the method comprising:
acquiring a user track data set uploaded by a user;
uniformly dividing the geographical area where the user track is located in the user track data set into grid areas;
mapping the user track data in the user track data set to the grid area, and encoding the user track data into a grid cell sequence;
constructing a first-order Markov model for the user trajectory based on the grid region and the initial grid cells in the user trajectory data, the first-order Markov model comprising a state transition matrix M, each element M_ij of the state transition matrix representing the probability of transitioning from the current grid cell V_i to the next grid cell V_j;
and fixing a start grid cell and a stop grid cell in the user track data, sequentially generating intermediate points between the start grid cell and the stop grid cell by using the Markov model, linking the selected intermediate points with the start grid cell and the stop grid cell to generate and issue a user confusion track sequence, and adding Pufferfish privacy protection noise when the user confusion track sequence is generated.
2. The privacy track publishing method based on track prediction according to claim 1, wherein mapping the user track data in the user track data set to the grid region and encoding the user track data into a grid cell sequence comprises:
modeling the user trajectory data as a directed graph G(V, E), where V and E denote the sets of nodes and edges of the directed graph respectively, so as to map the user trajectory data onto the grid region, and then encoding the user trajectory data as a sequence of grid cells.
3. The privacy track publishing method based on track prediction according to claim 1, wherein the state transition matrix M is generated as follows:
step S1: calculate the transition count value of grid cells V_i and V_j in the user track Tr as follows:

$$\phi(Tr, v_i, v_j) = \frac{|v_i v_j|_{\mathrm{in}\,Tr}}{|Tr| - 1}$$

where |v_i v_j|_{in Tr} denotes the number of occurrences of the track segment V_iV_j in the user track Tr, and |Tr| - 1 denotes the number of track segments in Tr;
step S2: traverse the entire user track dataset and accumulate, over all tracks containing the track segment V_iV_j, the count value O_ij:

$$O_{ij} = \sum_{Tr \in D} \phi(Tr, v_i, v_j)$$

where D is the user track dataset;
step S3: repeat steps S1-S2 until the transition counts of all track segments have been computed, yielding the transition count matrix O;
step S4: compute each element M_ij of the state transition matrix M as follows, thereby generating M:

$$M_{ij} = \frac{O_{ij}}{\sum_j O_{ij}}$$
4. The privacy track publishing method based on track prediction according to claim 1, wherein adding Pufferfish privacy protection noise when generating the user confusion track sequence comprises:
calculating the sensitivity W using the Wasserstein distance, where the current location v_p and an adjacent location v_q are defined as a secret pair whose probability distributions are P(v_j | v_p) and P(v_j | v_q) respectively, and the sensitivity W is given by:

$$W = \sup_{(v_p, v_q)} \mathcal{W}\big(P(v_j \mid v_p),\, P(v_j \mid v_q)\big)$$

where \mathcal{W}(·,·) denotes the Wasserstein distance and sup denotes its supremum (least upper bound) over the secret pairs;
the state transition matrix M' obtained after adding Pufferfish privacy protection noise based on the Laplace mechanism is:

$$M'_{ij} = M_{ij} + Z_{ij}, \qquad Z_{ij} \sim \mathrm{Lap}\left(\frac{W}{\varepsilon}\right)$$

where Z is the noise added to each transition probability, which follows a Laplace distribution, and ε is the privacy budget.
5. The privacy track publishing method based on track prediction according to claim 4, wherein sequentially generating intermediate points between the start grid cell and the end grid cell using the Markov model, and linking the selected intermediate points with the start grid cell and the end grid cell to generate and publish a user confusion track sequence, comprises:
sequentially generating intermediate points between the start grid cell and the end grid cell using the Markov model;
generating a plurality of grid track sequences from the start grid cell to the end grid cell according to the transition probabilities of different intermediate points;
multiplying the transition probabilities of the grid cells in each track sequence to obtain the total probability of each grid track sequence;
and selecting the grid track sequence with the highest total probability as the user confusion track sequence of the user's original track for publishing.
6. The privacy track publishing method based on track prediction according to claim 5, wherein selecting the grid track sequence with the highest total probability as the user confusion track sequence of the user's original track comprises:
selecting the grid track sequence with the highest total probability, converting it into a location sequence, and publishing the location sequence as the user confusion track sequence of the user's original track.
7. A privacy track publishing device based on track prediction, the device comprising:
the data acquisition unit is used for acquiring a user track data set uploaded by a user;
the grid dividing unit is used for uniformly dividing the geographic area where the user track is in the user track data set into grid areas;
a mapping encoding unit, configured to map user track data in the user track data set to the grid region, and encode the user track data into a grid cell sequence;
a model construction unit, configured to construct a first-order Markov model for the user trajectory based on the grid region and the initial grid cells in the user trajectory data, the first-order Markov model comprising a state transition matrix M, each element M_ij of the state transition matrix representing the probability of transitioning from the current grid cell V_i to the next grid cell V_j;
a track generation and publishing unit, configured to fix a start grid cell and an end grid cell in the user trajectory data, sequentially generate intermediate points between the start grid cell and the end grid cell using the Markov model, link the selected intermediate points with the start grid cell and the end grid cell to generate and publish a user confusion track sequence, and add Pufferfish privacy protection noise when generating the user confusion track sequence.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed by the processor.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
10. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 6.
CN202310080714.7A 2023-01-19 2023-01-19 Privacy track release method and device based on track prediction Pending CN116049887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310080714.7A CN116049887A (en) 2023-01-19 2023-01-19 Privacy track release method and device based on track prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310080714.7A CN116049887A (en) 2023-01-19 2023-01-19 Privacy track release method and device based on track prediction

Publications (1)

Publication Number Publication Date
CN116049887A true CN116049887A (en) 2023-05-02

Family

ID=86120273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310080714.7A Pending CN116049887A (en) 2023-01-19 2023-01-19 Privacy track release method and device based on track prediction

Country Status (1)

Country Link
CN (1) CN116049887A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117094614A (en) * 2023-10-20 2023-11-21 深圳依时货拉拉科技有限公司 Loading and unloading point position recommendation method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
Xu et al. Real‐time regional seismic damage assessment framework based on long short‐term memory neural network
Feng et al. Deepmove: Predicting human mobility with attentional recurrent networks
CN112116155B (en) Population flow prediction method and device based on intelligent decision and computer equipment
US9049549B2 (en) Method and apparatus for probabilistic user location
Ghaemi et al. LaSVM-based big data learning system for dynamic prediction of air pollution in Tehran
CN110119475B (en) POI recommendation method and system
CN113139140B (en) Tourist attraction recommendation method based on space-time perception GRU and combined with user relationship preference
Nguyen et al. PM2. 5 prediction using genetic algorithm-based feature selection and encoder-decoder model
Xu et al. Urban noise mapping with a crowd sensing system
Hu et al. Nonnegative matrix tri-factorization with user similarity for clustering in point-of-interest
EP3192061B1 (en) Measuring and diagnosing noise in urban environment
CN116049887A (en) Privacy track release method and device based on track prediction
Chen et al. STLP-GSM: a method to predict future locations of individuals based on geotagged social media data
Ding et al. Spatial-temporal distance metric embedding for time-specific POI recommendation
CN108600340A (en) It is a kind of that total method and device is pushed away based on the history crowd size for moving big-sample data
Jiang et al. Supercharging crowd dynamics estimation in disasters via spatio-temporal deep neural network
Addae et al. Integrating multi-criteria analysis and spherical cellular automata approach for modelling global urban land-use change
CN113269379B (en) Method and device for determining attributes of resource objects, storage medium and computer equipment
CN110411450A (en) It is a kind of for compressing the map-matching method of track
CN112883292B (en) User behavior recommendation model establishment and position recommendation method based on spatio-temporal information
Wu et al. Spatio‐temporal neural network for taxi demand prediction using multisource urban data
Xue et al. Urban population density estimation based on spatio‐temporal trajectories
CN113326877A (en) Model training method, data processing method, device, apparatus, storage medium, and program
Zhao et al. Practical model with strong interpretability and predictability: An explanatory model for individuals' destination prediction considering personal and crowd travel behavior
CN115455276A (en) Method and device for recommending object, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination