CN110312206A - Mobile phone signaling data trip recognition method improved with a dynamic spatial threshold
- Publication number
- CN110312206A; CN201910529806.2A
- Authority
- CN
- China
- Prior art keywords
- data
- base station
- mobile phone
- signaling data
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/02—Services making use of location information
- H04W4/025—Services making use of location information using location based information parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/20—Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W8/00—Network data management
- H04W8/18—Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The present invention provides a mobile phone signaling data trip recognition method improved with a dynamic spatial threshold, which identifies each user's trips from the mobile phone signaling data collected by the mobile phone base stations so as to obtain the user's individual trip chain. The method comprises the following steps: step S1, obtaining mobile phone signaling data; step S2, cleaning the mobile phone signaling data; step S3, determining a dynamic threshold D for each base station from the base station locations by a preset dynamic threshold calculation method; step S4, classifying the data to obtain per-user data groups to be analyzed; step S5, selecting the per-user data groups one at a time in order; step S6, reading the data to be analyzed in the current user data group in sequence and determining the stay points of the current user according to the corresponding dynamic threshold D; step S7, generating and storing the individual trip chain; and step S8, repeating steps S5 to S7 until the data to be analyzed of all users have been processed.
Description
Technical field
The invention belongs to the field of urban planning and relates to a trip recognition method based on mobile phone signaling data, in particular to a mobile phone signaling data trip recognition method improved with a dynamic spatial threshold.
Background technique
In recent years, mobile phone signaling data has been applied on a large scale across urban research and urban planning, from city-level topics such as urban space and spatial structure, urban transport planning, the distribution of population and jobs, and the actual resident population, down to the mining of individual behavior patterns. Analysis based on mobile phone signaling data usually starts by deriving the individual trip chain of each mobile phone user from the data, so that the behavior patterns of users in the city can be analyzed more reliably from these trip chains.
In the prior art, individual trip chains are generally identified with the spatio-temporal clustering method. This method identifies a user's stays by means of a time threshold and a spatial threshold, and identifies one trip between two consecutive stays at different places.
However, the spatio-temporal clustering method uses a fixed spatial threshold. In city-scale studies a fixed threshold inevitably leads to uneven recognition accuracy: "under-identification" where base stations are densely distributed, such as the city center, and "over-identification" where base stations are sparsely distributed, such as the suburbs. Lowering the threshold avoids under-identification in some places but inevitably causes over-identification in more places, and vice versa.
Therefore, to improve recognition accuracy, some researchers adjust the spatial threshold separately where base stations are dense in the city center and where they are sparse at the edge of the city. Other researchers divide the city into three zones according to base station density and use a different spatial threshold in each, i.e. a "zoned spatial threshold". Although both approaches mitigate uneven recognition accuracy to some extent, over-identification and under-identification still occur inside each zone.
Summary of the invention
To solve the above problems, a mobile phone signaling data trip recognition method improved with a dynamic spatial threshold is provided. The present invention adopts the following technical solution:
The present invention provides a mobile phone signaling data trip recognition method improved with a dynamic spatial threshold, which identifies each user's trips from the mobile phone signaling data collected by the mobile phone base stations so as to obtain the user's individual trip chain, characterized by comprising the following steps: step S1, obtaining mobile phone signaling data, the data containing at least a user identifier, a timestamp and the base station location of the corresponding base station; step S2, cleaning the mobile phone signaling data to obtain the data to be analyzed; step S3, determining in turn, from the base station locations and by a preset dynamic threshold calculation method, the dynamic threshold D corresponding to each base station; step S4, classifying the data to be analyzed by user identifier to obtain a data group to be analyzed for each user; step S5, selecting, in order of user identifier, one data group to be analyzed as the current user data group; step S6, reading the data to be analyzed in the current user data group in chronological order and determining the stay points of the current user according to the corresponding dynamic threshold D and a preset time threshold T; step S7, generating and storing the individual trip chain of the current user from the stay points and the current user data; and step S8, repeating steps S5 to S7 until the data to be analyzed of all users have been processed.
The mobile phone signaling data trip recognition method improved with a dynamic spatial threshold provided by the invention may also have the following technical feature: the dynamic threshold calculation method is the k-sector method, in which the base station location of the base station currently being calculated is taken as the current location, k sectors are divided around the current location, the base station nearest to the current location is selected in each sector to obtain k neighboring base stations, and finally the distance from each neighboring base station to the current location is calculated and the maximum of these distances is taken as the spatial threshold of the base station currently being calculated.
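The k-sector method described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name, the use of planar coordinates in metres, the orientation of the sectors (starting at angle 0) and the handling of empty sectors (they are skipped) are all assumptions that the text does not specify.

```python
import math

def k_sector_threshold(current, others, k=6):
    """Return the largest of the per-sector nearest-neighbour distances.

    current: (x, y) of the base station being calculated, in metres
             (planar coordinates are an assumption of this sketch).
    others:  iterable of (x, y) for the other base stations.
    Sectors are k equal wedges starting at angle 0 (assumed orientation);
    sectors containing no base station are skipped (assumed behaviour).
    """
    width = 2 * math.pi / k
    nearest = [None] * k  # nearest distance found so far in each sector
    cx, cy = current
    for x, y in others:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist == 0:
            continue  # co-located station has no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        sector = min(int(angle // width), k - 1)
        if nearest[sector] is None or dist < nearest[sector]:
            nearest[sector] = dist
    found = [d for d in nearest if d is not None]
    return max(found) if found else None
```

Because one neighbor is taken from every direction, a single distant sector raises the threshold even when the other neighbors are close.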
The mobile phone signaling data trip recognition method improved with a dynamic spatial threshold provided by the invention may also have the following technical feature: the value of k is 6.
The mobile phone signaling data trip recognition method improved with a dynamic spatial threshold provided by the invention may also have the following technical feature: step S2 comprises the following sub-steps: step S2-1, classifying all mobile phone signaling data by user identifier to obtain multiple user signaling data groups; step S2-2, reading each user signaling data group in turn and removing the repeatedly recorded mobile phone signaling data in each group as duplicate record data; step S2-3, reading each user signaling data group in turn and judging whether the timestamps of any two consecutive records in each group are less than 1 s apart, and if so, removing the later record as ping-pong handover data; step S2-4, reading each user signaling data group in turn and judging by a drift determination method whether the records in each group contain drift data, and if so, removing the drift data, the drift determination method being: sorting the mobile phone signaling data in chronological order, calculating the speed from the distance between the base station locations of two adjacent records and the difference between their timestamps, and judging a record to be drift data if the speed is higher than 60 km/h; step S2-5, taking the user signaling data groups that have been read and cleaned in steps S2-2 to S2-4 and merging them into the data group to be analyzed.
The mobile phone signaling data trip recognition method improved with a dynamic spatial threshold provided by the invention may also have the following technical feature: step S6 comprises the following sub-steps: step S6-1, sorting the data to be analyzed in the current user data group in chronological order; step S6-2, reading the next item of data to be analyzed in the current user data group and adding it to a temporary data set; step S6-3, judging whether the base station locations of the data in the temporary data set go beyond a circular judgment range whose diameter is the dynamic threshold D, returning to step S6-2 if they do not and proceeding to step S6-4 if they do; step S6-4, judging whether the time difference between the earliest and the latest data in the temporary data set is greater than the time threshold T, emptying the temporary data set and returning to step S6-2 if it is not and proceeding to step S6-5 if it is; step S6-5, storing the temporary data set as a stay point of the current user and emptying the temporary data set; step S6-6, repeating steps S6-2 to S6-5 until all data to be analyzed in the current user data group have been read.
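A minimal sketch of the stay-point loop in steps S6-1 to S6-6 follows, under several assumptions that the text leaves open: the threshold D is taken from the base station of the first record in the temporary set; "within the circular range of diameter D" is approximated as "no two buffered locations are more than D apart"; the record that breaks the range starts the next temporary set instead of being stored with the old one; and a final temporary set is also checked when the data run out. The record layout and function name are hypothetical.

```python
import math
from datetime import datetime, timedelta

def detect_stay_points(records, threshold_of, t_min=timedelta(minutes=20)):
    """Return a list of stay points, each a list of records.

    records:      list of (timestamp, station_id, (x, y)) for ONE user;
                  coordinates in metres (assumed layout).
    threshold_of: dict mapping station_id -> dynamic threshold D in metres.
    """
    stays, buffer = [], []

    def flush():
        # S6-4 / S6-5: keep the buffer only if it spans more than T.
        if buffer and buffer[-1][0] - buffer[0][0] > t_min:
            stays.append(list(buffer))
        buffer.clear()

    for rec in sorted(records, key=lambda r: r[0]):          # S6-1
        if buffer:
            d = threshold_of[buffer[0][1]]                   # assumed choice of D
            # S6-3: does the new record leave the range of the buffer?
            if any(math.dist(rec[2], b[2]) > d for b in buffer):
                flush()
        buffer.append(rec)                                   # S6-2
    flush()                                                  # end of data
    return stays
```

Each returned stay point keeps its records, so arrival and departure times can be read from its first and last timestamps.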
The mobile phone signaling data trip recognition method improved with a dynamic spatial threshold provided by the invention may also have the following technical feature: the dynamic threshold calculation method is the k-nearest-neighbor algorithm, in which the base station location of the base station currently being calculated is taken as the current location, the k base stations nearest to the current location are selected as neighboring base stations, and finally the distance from each neighboring base station to the current location is calculated and the maximum of these distances is taken as the spatial threshold of the base station currently being calculated.
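As an illustration of the k-nearest-neighbor variant, the maximum distance among the k nearest base stations is simply the k-th smallest distance. The sketch below assumes planar coordinates in metres and a hypothetical function name; when fewer than k other stations exist it falls back to the farthest one, which is also an assumption.

```python
import math

def k_nearest_threshold(current, others, k=15):
    """Distance from `current` to its k-th nearest base station.

    current: (x, y) of the base station being calculated, in metres.
    others:  iterable of (x, y) for the other base stations.
    """
    cx, cy = current
    dists = sorted(math.hypot(x - cx, y - cy) for x, y in others)
    if not dists:
        return None
    # The largest of the k smallest distances is the k-th smallest.
    return dists[min(k, len(dists)) - 1]
```

For a large number of base stations a spatial index would avoid sorting all distances for every station, but the brute-force form keeps the definition visible.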
The mobile phone signaling data trip recognition method improved with a dynamic spatial threshold provided by the invention may also have the following technical feature: the value of k is 15.
The mobile phone signaling data trip recognition method improved with a dynamic spatial threshold provided by the invention may also have the following technical feature: the time threshold T is 20 minutes.
Action and effect of the invention
In the mobile phone signaling data trip recognition method improved with a dynamic spatial threshold according to the present invention, the dynamic threshold of each base station is calculated by the dynamic threshold calculation method, and the mobile phone signaling data is identified according to this dynamic threshold and the preset time threshold. The individual trip chain of each user can therefore be obtained more accurately, and the error caused by the uneven distribution density of base stations in a real city is reduced, so that subsequent analysis of residents' behavior and urban planning based on the individual trip chains is better supported and the analysis results agree more closely with reality. The dynamic threshold removes the identification error introduced by a fixed spatial threshold: more trips are identified in the city center, which remedies "under-identification in the center", and fewer trips are identified in the suburbs, new towns and similar places, which remedies "over-identification in the suburbs".
Brief description of the drawings
Fig. 1 is a flow chart of the mobile phone signaling data trip recognition method in the embodiment of the present invention;
Fig. 2 is a schematic diagram of drift data in the embodiment of the present invention;
Fig. 3 is a schematic diagram of the relationship between the values of the time threshold and the spatial threshold and the number of trips identified in the embodiment of the present invention;
Fig. 4 is a schematic diagram of the "base station hopping" phenomenon in the embodiment of the present invention;
Fig. 5 is a schematic diagram of under-identification with a fixed spatial threshold in the embodiment of the present invention;
Fig. 6 is a schematic diagram of the K-nearest-neighbor algorithm and the K-sector method in the embodiment of the present invention;
Fig. 7 is a schematic diagram of the root-mean-square error of the results obtained by the various methods with different parameters in the embodiment of the present invention; and
Fig. 8 is a schematic diagram of trip identification by the spatio-temporal clustering method in the embodiment of the present invention.
Specific embodiment
To make the technical means, creative features, aims and effects of the present invention easy to understand, the mobile phone signaling data trip recognition method improved with a dynamic spatial threshold is described in detail below with reference to the embodiment and the drawings.
<embodiment>
In the present embodiment, computer software that runs the mobile phone signaling data trip recognition method improved with a dynamic spatial threshold can optimize the identification of users' individual trip chains during mobile phone signaling analysis. The software operates as follows:
Fig. 1 is a flow chart of the mobile phone signaling data trip recognition method in the embodiment of the present invention.
As shown in Fig. 1, the mobile phone signaling data trip recognition method comprises the following steps:
Step S1, obtaining the mobile phone signaling data collected by each base station in the city, then proceeding to step S2.
In the present embodiment, each base station can be understood as a "point" within the spatial extent of the city. Whenever a mobile phone is switched on or off, makes or receives a call, or sends or receives a short message, it exchanges information with a base station, and the user (i.e. the phone) produces a record that is stored by the corresponding base station. In other words, a record containing spatio-temporal information (mobile phone signaling data) is generated at the corresponding point in time and at a specific position (the base station location). The format of the record is shown in Table 1:
Table 1: Field information of mobile phone signaling data
In Table 1, MSID is the unique identifier of each mobile phone (i.e. the user identifier). Timestamp is the time at which the signaling exchange between the phone and the base station took place; for example, 20190109210923 means 21:09:23 on January 9, 2019. LAC and CellID together form the location code of the base station, which corresponds to the specific geographic position of the base station (i.e. the base station location). EventID is the specific type of signaling exchange between the phone and the base station, such as switching on or off, sending or receiving a short message, or making or receiving a call. Flag indicates whether the phone can be tracked: 1 means it can, 0 means it cannot. The value is generally 1, and records with the value 0 are rejected directly from the raw data.
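To make the record layout concrete, the following sketch parses one record with the six fields named above and applies the Flag filter. The class name, the field order and the assumption that every field arrives as a string are illustrative only, since Table 1 itself is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class SignalingRecord:
    msid: str            # user identifier
    timestamp: datetime  # time of the signaling exchange
    lac: str             # LAC and CellID together identify the base station
    cell_id: str
    event_id: str        # type of signaling event
    flag: int            # 1 = trackable, 0 = not trackable

def parse_record(fields):
    """Build a record from six string fields (assumed order)."""
    msid, ts, lac, cell_id, event_id, flag = fields
    when = datetime.strptime(ts, "%Y%m%d%H%M%S")  # e.g. 20190109210923
    return SignalingRecord(msid, when, lac, cell_id, event_id, int(flag))

def trackable(records):
    """Drop records whose Flag is 0, as described in the text."""
    return [r for r in records if r.flag == 1]
```

A frozen dataclass is hashable, which makes exact-duplicate removal in the cleaning step straightforward.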
Step S2, cleaning the mobile phone signaling data obtained in step S1 to obtain the data to be analyzed, which specifically comprises steps S2-1 to S2-5.
Mobile phone signaling data is not generated for planning or academic research; it is a by-product of the operator's data operations. Some noise or abnormal data is unavoidably produced while the data is generated, and if it is left unprocessed it will inevitably affect the subsequent recognition results. Step S2 therefore removes the noise data from the mobile phone signaling data to ensure that the results are credible. In the present embodiment, noise data is divided into duplicate record data, ping-pong handover data and drift data; their shares of the mobile phone signaling data are shown in Table 2.
Table 2: Noise data types and their shares
Noise data type | Share of data volume |
Ping-pong handover data | 42% |
Duplicate data | 15.5% |
Drift data | 0.4% |
Step S2-1, classifying all mobile phone signaling data by user identifier to obtain multiple user signaling data groups, then proceeding to step S2-2.
Step S2-2, reading each user signaling data group in turn and removing the repeatedly recorded mobile phone signaling data in each group as duplicate record data, then proceeding to step S2-3.
Strictly speaking, duplicate record data is not data that was recorded twice. It consists of several records generated within a very short time, sometimes within the same second, so that the records look like duplicates in the data. This kind of data is not abnormal, but the information it carries is redundant, and leaving it in place reduces the efficiency of identification. It accounts for 15.5% of the total data volume. It is handled by keeping only one of any set of completely identical records.
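Keeping one of several completely identical records can be sketched as an order-preserving de-duplication. The function name is hypothetical, and the sketch assumes records are hashable values such as tuples.

```python
def drop_duplicates(records):
    """Keep the first occurrence of each completely identical record.

    records: iterable of hashable records, e.g. (timestamp, lac, cell_id).
    The original order of the surviving records is preserved.
    """
    seen = set()
    kept = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            kept.append(rec)
    return kept
```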
Step S2-3, reading each user signaling data group in turn and judging whether the timestamps of any two consecutive records in each group are less than 1 s apart; if so, the later record is removed as ping-pong handover data. Then proceeding to step S2-4.
Ping-pong handover data is very common noise in mobile phone signaling data, with a share as high as 42% (as shown in Table 2). It takes the form of high-frequency records that a user generates at two or more base stations within a short time. A user is recorded by the nearest base station, so when the user is at the edge of the signal coverage of two or more base stations, frequent switching between those base stations occurs and produces ping-pong handover data.
All the information that ping-pong handover data expresses is that "the user is at some position on the boundary between the signals of several base stations", and the excess data volume also reduces recognition efficiency, so this data is processed. The processing method is as follows: judge whether the interval between two records is less than 1 second; if it is, merge the base station of the later record into the earlier record.
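The 1-second rule can be sketched as below. The text does not say whether the interval is measured against the previous raw record or the previous surviving record; this sketch assumes the latter, and the record layout and function name are hypothetical.

```python
from datetime import datetime, timedelta

def drop_ping_pong(records, min_gap=timedelta(seconds=1)):
    """Drop a record that follows the last kept record by less than min_gap.

    records: list of (timestamp, station_id) for ONE user.
    Comparing against the last KEPT record is an assumption of this sketch.
    """
    kept = []
    for rec in sorted(records, key=lambda r: r[0]):
        if kept and rec[0] - kept[-1][0] < min_gap:
            continue  # merged into the earlier record
        kept.append(rec)
    return kept
```

Because timestamps in the format described earlier have one-second resolution, "less than 1 s apart" in practice means records that share the same second.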
Some other researchers identify and process ping-pong handover in a more complicated way. However, since this kind of data arises between neighboring base stations, the spatial threshold in the trip chain recognition method already provides "fault tolerance" for neighboring base stations. A simple method is therefore used to process this data.
Step S2-4, reading each user signaling data group in turn and judging by the drift determination method whether the records in each group contain drift data; if so, the drift data is removed. Then proceeding to step S2-5.
Drift data occurs when, among three consecutive records of a user (sorted by time), the position of the middle record is far from the other two. As shown in Fig. 2, the point marked with a triangle is the base station location of the drift data; a normal user's activity in the city should not produce such travel behavior. In general, drift data can be considered to have occurred when the user's movement speed (the spatial distance between two consecutive records in the mobile phone signaling data divided by the time interval between them) is greater than 60 km/h.
Therefore, in the present embodiment, the drift determination method used in step S2-4 is: sort the mobile phone signaling data in chronological order, calculate the speed from the distance between the base station locations of two adjacent records and the difference between their timestamps, and judge the record to be drift data if the speed is higher than 60 km/h.
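A minimal sketch of the speed test follows. The text does not state which of the two adjacent records is removed; this sketch assumes the later one is dropped and that speed is measured from the last kept record. Planar coordinates in metres, the record layout and the function name are also assumptions.

```python
import math
from datetime import datetime

def drop_drift(records, max_kmh=60.0):
    """Drop records reached from the last kept record faster than max_kmh.

    records: list of (timestamp, (x, y)) for ONE user, coordinates in metres.
    Records with the same timestamp as the last kept one are left alone,
    on the assumption that the ping-pong step has already handled them.
    """
    kept = []
    for rec in sorted(records, key=lambda r: r[0]):
        if kept:
            prev = kept[-1]
            hours = (rec[0] - prev[0]).total_seconds() / 3600
            km = math.dist(rec[1], prev[1]) / 1000
            if hours > 0 and km / hours > max_kmh:
                continue  # implied speed is implausible: treat as drift
        kept.append(rec)
    return kept
```

For example, a jump of 10 km in one minute implies 600 km/h and is dropped, while the following record 500 m from the last kept position two minutes later implies 15 km/h and is kept.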
Step S2-5, taking the user signaling data groups that have been read and cleaned in steps S2-2 to S2-4 and merging them into the data group to be analyzed, then proceeding to step S3.
In the present embodiment, the mobile phone signaling data is divided into multiple user signaling data groups in step S2-1. When the program actually runs, however, all mobile phone signaling data can be sorted directly by user identifier and the removal steps applied to each user identifier in turn; the mobile phone signaling data that has been through the removal steps is the data to be analyzed.
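Sorting by user identifier and then processing one user at a time can be sketched with the standard library. The record layout (user identifier first, timestamp second) and the function name are assumptions of this sketch.

```python
from itertools import groupby
from operator import itemgetter

def per_user_groups(records):
    """Group records by user identifier, each group sorted by timestamp.

    records: iterable of tuples whose first item is the user identifier
             and whose second item is the timestamp (assumed layout).
    """
    ordered = sorted(records, key=itemgetter(0, 1))
    return {msid: list(group) for msid, group in groupby(ordered, key=itemgetter(0))}
```

`groupby` only merges adjacent items, which is why the records are sorted by user identifier first.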
Generally, individual trip chains are identified after the data has been preprocessed (i.e. after step S2 is complete). Although some researchers identify trips with machine learning and similar methods, the more mainstream trip chain recognition method is the spatio-temporal clustering method, which some researchers call the DBSCAN algorithm. The idea of the algorithm is to identify a user's stays by means of a time threshold and a spatial threshold, and to identify one trip between two consecutive stays at different places.
However, the key to this method is the setting of the time threshold T and the spatial threshold D. Different threshold settings give different recognition results (as shown in Fig. 3): the smaller the spatial threshold and the time threshold, the more trips are identified. Many researchers have discussed and chosen these two thresholds before identifying trips in this way (as shown in Table 3); the values are generally derived from the definition of a trip in travel surveys or taken from the empirical values of existing research.
Table 3: Threshold choices and their rationale in existing studies using the spatio-temporal clustering method
Table 3 shows, however, that in early applications the choice of the time and spatial thresholds was essentially a trial. Apart from Zhang Wei, who tested the effect of different time and spatial thresholds on the number of trips, other researchers did not discuss whether their values were appropriate. Although the thresholds are generally set with reference to the definition of a trip in traditional surveys, mobile phone signaling data is not perfect data: records are sparse in time, and spatial accuracy depends on the coordinates and density of the base stations. Relying entirely on the definitions of traditional surveys to constrain the identification of mobile phone data is therefore unreasonable, and the thresholds also need to be adjusted to the characteristics of the mobile phone data itself.
The spatial threshold is one of the core parameters of the recognition algorithm. It arises partly because traditional surveys require a definition of a trip, and more importantly because the spatial accuracy of mobile phone signaling data depends on base station density: when a user is recorded by a base station, the user's physical position lies within the signal service range of that base station. Some researchers delimit base station service ranges with Thiessen polygons, but service ranges are not strictly separated; they overlap in places. When a user is in an area where service ranges overlap, the user may be recorded by several neighboring base stations even though the physical position has not changed. This is also known as the "base station hopping" phenomenon.
Taking Fig. 4 as an example, the points on the track show one user's track over one day. The user has clear commuting trips, but base station hopping to a neighboring base station occurs at night (the part in dashed box a in Fig. 4). Since the recording intervals of base station hopping do not differ from those of normal data, it cannot be removed by cleaning in the way ping-pong handover data is. The spatial threshold, however, can act as a kind of "spatial tolerance" that prevents hopping between neighboring base stations from being misidentified as trips. The smaller the spatial threshold, the more points meet the condition and the more trips are identified; the larger the spatial threshold, the more short-distance trips are treated as stays and the fewer trips are identified. Although a smaller threshold identifies more trips, the recognition effect should not be judged by the number of trips identified, because some of the additional trips may be base station hopping that has been misidentified as trips.
The spatial threshold used by other researchers is at present mostly 500 m, based mostly on the trip definition of traditional surveys and on the empirical values of earlier research. Some researchers, for example, take traditional surveys as their basis and adopt the 500 m trip definition of the Guangzhou survey from among several definitions, but values of 300 m (Hangzhou) and 400 m (Shanghai) also exist, and travel research in the United States does not even restrict trip distance or travel time. The 500 m value was tentative in early studies and has been used as an empirical value by later researchers to this day, without sufficient discussion.
In practice, base station density varies greatly across a city. In the city center, a 500 m × 500 m grid cell can contain as many as 20 to 40 base stations, so positioning accuracy is high; in the suburbs, two base stations can be 2 km or more apart, so positioning accuracy is low. A fixed spatial threshold therefore causes "under-identification in the center and over-identification in the suburbs" (as shown in Fig. 5). Taking a 500 m fixed threshold as an example, if a user makes a 300 m trip in the city center, the 500 m threshold cannot identify that trip, which causes under-identification (Fig. 5, left). In the suburbs the spacing between base stations is often greater than 500 m, so base station hopping over more than 500 m may be misidentified as a trip under the 500 m threshold, which causes over-identification (Fig. 5, right). Lowering the threshold avoids under-identification in some places but inevitably causes over-identification in more places, and vice versa. In city-scale studies, a single fixed spatial threshold therefore leads to uneven recognition accuracy.
Some researchers argue that, to improve recognition accuracy, the spatial threshold should be adjusted separately where base stations are dense in the city center and where they are sparse at the edge of the city. Other researchers divide the city into three zones according to base station density and use a different spatial threshold in each, i.e. a "zoned spatial threshold". A zoned spatial threshold mitigates uneven recognition accuracy to some extent, but over-identification and under-identification still occur inside each zone.
Therefore, to solve the problems of the above methods, the invention proposes that the spatial threshold of each base station be adjusted dynamically according to the density and distance of the surrounding base stations (the dynamic threshold D is calculated in step S3), so as to resolve over-identification and under-identification.
Step S3, determining in turn, from the base station locations and by a preset dynamic threshold calculation method, the dynamic threshold D corresponding to each base station.
In this embodiment, the inventors propose three dynamic threshold calculation methods:
1) Density algorithm: calculate the distribution density of the base stations and construct a function relating density to spatial threshold, so that density can be converted into a spatial threshold;
2) K-nearest-neighbor algorithm: as shown in Fig. 6(a), take the location of the base station currently being calculated as the current location, select the k base stations nearest to the current location as neighboring base stations, then calculate the distance from each neighboring base station to the current location and take the maximum of these distances as the spatial threshold of the current base station;
3) K-sector algorithm: as shown in Fig. 6(b), take the location of the base station currently being calculated as the current location, divide the plane around it into k sectors, select the nearest base station within each sector to obtain k neighboring base stations, then calculate the distance from each neighboring base station to the current location and take the maximum of these distances as the spatial threshold of the current base station.
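The K-nearest-neighbor and K-sector rules described above can be sketched in a few lines. This is only an illustrative sketch, not the inventors' implementation: the function names are hypothetical, base station locations are assumed to be planar (x, y) coordinates in metres, sector 0 is assumed to start at the positive x axis, and sectors that contain no base station are simply skipped (the text does not say how empty sectors are handled).

```python
import math

def knn_threshold(stations, i, k):
    """K-nearest-neighbor rule: largest distance from station i to its k nearest stations.

    stations: list of (x, y) tuples in metres (assumed planar coordinates).
    Raises ValueError if there are no other stations.
    """
    x0, y0 = stations[i]
    dists = sorted(
        math.hypot(x - x0, y - y0)
        for j, (x, y) in enumerate(stations)
        if j != i
    )
    return max(dists[:k])

def sector_threshold(stations, i, k):
    """K-sector rule: largest of the nearest-station distances found in each of k sectors.

    Sector 0 starts at the positive x axis (an assumption); empty sectors are skipped.
    Raises ValueError if there are no other stations.
    """
    x0, y0 = stations[i]
    width = 2 * math.pi / k
    nearest = {}  # sector index -> distance of the nearest station in that sector
    for j, (x, y) in enumerate(stations):
        if j == i:
            continue
        d = math.hypot(x - x0, y - y0)
        angle = math.atan2(y - y0, x - x0) % (2 * math.pi)
        sector = min(int(angle / width), k - 1)  # guard against rounding at 2*pi
        if sector not in nearest or d < nearest[sector]:
            nearest[sector] = d
    return max(nearest.values())

# Hypothetical layout: one station at the origin and five others around it.
stations = [(0, 0), (100, 10), (-10, 200), (-300, -10), (10, -400), (1000, 1000)]
d_knn = knn_threshold(stations, 0, 2)
d_sector = sector_threshold(stations, 0, 4)
```

In this made-up layout the two nearest stations lie on one side of the origin, so the K-nearest-neighbor value stays small, while the K-sector value is driven by the more distant station in the fourth sector. This is the kind of uneven surrounding distribution that the sector rule is meant to account for.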
Of these methods, the density algorithm is the simplest to compute, but it cannot reflect the relationship between neighboring base stations and it requires a conversion between density and spatial threshold, so the present invention does not use it for now.
Of the latter two dynamic threshold algorithms, the K-nearest-neighbor algorithm is more intuitive, while the K-sector algorithm avoids the situation in which the distances to the surrounding base stations differ greatly. In both algorithms K is a parameter, and different values of K lead to different results. Therefore, to compare the two algorithms and select suitable parameters, the inventors carried out the following evaluation:
Step A: randomly select one day of records for 100 users, visualize the trajectories, and manually identify the number of trips of each of the 100 users to form a validation set.
Step B: using as the spatial threshold, in turn, a fixed threshold of 300-2000 m, the K-nearest-neighbor algorithm with K from 6 to 19, and the K-sector algorithm with K from 3 to 9, apply the trip chain identification method to identify the number of trips of the 100 users.
Step C: check the identification results of the three threshold-setting methods against the manual identification results, calculate the root-mean-square error (MSE), and select the best-performing algorithm.
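The comparison in steps A to C amounts to scoring each parameter setting against the manually identified trip counts. A minimal sketch follows, assuming the error is the root-mean-square difference between identified and manual per-user trip counts; the function names and the sample numbers are hypothetical and are not taken from the patent's data.

```python
import math

def rmse(identified, manual):
    """Root-mean-square error between identified and manually counted trips per user."""
    if len(identified) != len(manual) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(identified, manual))
    return math.sqrt(squared / len(manual))

def best_parameter(results, manual):
    """Return the parameter value whose identified trip counts have the lowest RMSE.

    results: dict mapping a parameter value (e.g. K) to a list of trip counts.
    """
    return min(results, key=lambda p: rmse(results[p], manual))

# Made-up trip counts for two users under two hypothetical values of K.
manual_counts = [2, 4]
by_k = {3: [1, 1], 6: [2, 3]}
chosen_k = best_parameter(by_k, manual_counts)
```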
Through these steps, the results of the fixed threshold, the K-nearest-neighbor algorithm and the K-sector algorithm under different parameters are obtained, as shown in Fig. 7. With a fixed threshold, the error falls as the threshold increases, reaches its minimum at 900 m, and then rises as the threshold increases further. The K-nearest-neighbor algorithm behaves similarly: the error is smallest when K increases to 15 and then grows as K increases. The K-sector algorithm follows the same pattern, with the smallest error at K=6.
Among the best-performing parameters of the three methods, the K-sector method has the smallest error (1.69); compared with the optimal parameters of the fixed threshold and the K-nearest-neighbor method, its accuracy is 11.7% and 5.06% higher, respectively. Under this dynamic threshold calculation method, the spatial threshold is below 500 m in the core of the central city, 500-1000 m in the central city and new-town centres, 100-1500 m in the suburbs, and above 2000 m in the outer suburbs. This distribution matches the purpose of introducing a dynamic spatial threshold: the threshold is low where base-station density is high and high where it is low.
Therefore, in this embodiment the dynamic threshold D of each mobile phone base station is calculated by the K-sector method with K=6, and the method then proceeds to step S4.
Step S4: classify the data to be analyzed by user identification number to obtain a user data group to be analyzed for each user.
In this embodiment, each user data group to be analyzed contains all the data to be analyzed of the corresponding user.
Step S5: select, in order of user identification number, one user data group to be analyzed as the current user data group.
Step S6: read the data to be analyzed in the current user data group in chronological order, and determine the stay points of the current user according to the corresponding dynamic threshold D and a preset time threshold T. This specifically comprises steps S6-1 to S6-6:
Step S6-1: sort the data to be analyzed in the current user data group in chronological order;
Step S6-2: read the next item of data to be analyzed from the current user data group and add it to a temporary data set;
Step S6-3: judge whether the base station location of any item in the temporary data set falls outside a circular range whose diameter is the dynamic threshold D; if not, return to step S6-2; if so, go to step S6-4;
Step S6-4: judge whether the time difference between the earliest and latest items in the temporary data set is greater than the time threshold T; if not, empty the temporary data set and return to step S6-2; if so, go to step S6-5;
Step S6-5: store the temporary data set as a stay point of the current user and empty the temporary data set;
Step S6-6: repeat steps S6-2 to S6-5 until all the data to be analyzed in the current user data group have been read.
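Steps S6-1 to S6-6 can be read as a single pass over one user's time-ordered records. The sketch below is one possible interpretation, not the patented implementation, and rests on several assumptions that the text leaves open: the circle is centred on the first record of the temporary set with radius D/2, D is the threshold of that first record's base station, the record that leaves the circle starts the next temporary set, the duration is measured over the records inside the circle, and the final temporary set is tested in the same way. The function name and the record layout are hypothetical.

```python
import math

def detect_stay_points(records, thresholds, time_threshold):
    """Detect stays for one user.

    records: iterable of (timestamp_s, station_id, x_m, y_m) tuples.
    thresholds: dict mapping station_id to its dynamic threshold D in metres.
    time_threshold: T in seconds.
    Returns a list of stays, each a time-ordered list of records.
    """
    stays = []
    temp = []  # temporary data set
    for rec in sorted(records, key=lambda r: r[0]):            # S6-1: sort by time
        if temp:
            _, first_station, x0, y0 = temp[0]
            _, _, x, y = rec
            radius = thresholds[first_station] / 2
            if math.hypot(x - x0, y - y0) > radius:            # S6-3: left the circle
                if temp[-1][0] - temp[0][0] > time_threshold:  # S6-4: long enough?
                    stays.append(temp)                         # S6-5: store the stay
                temp = []                                      # empty the temporary set
        temp.append(rec)                                       # S6-2: add the record
    if temp and temp[-1][0] - temp[0][0] > time_threshold:     # assumption: test the tail
        stays.append(temp)
    return stays

# Hypothetical records: 25 minutes near station A, a short visit near B, then back at A.
recs = [
    (0, "A", 0, 0), (600, "A", 10, 0), (1500, "A", 0, 20),
    (1600, "B", 1000, 0), (1700, "B", 1010, 0), (1800, "A", 0, 0),
]
found = detect_stay_points(recs, {"A": 200, "B": 200}, time_threshold=1200)
```

With these made-up records, only the first cluster lasts longer than the 20-minute threshold, so a single stay is reported and the short visit near station B is discarded.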
Fig. 8 illustrates the identification of a user's stay points in step S6 of this embodiment. The lower panel shows the distribution of a user's trajectory (i.e. the locations of consecutive mobile phone signaling records) on the spatial plane, and the upper panel shows the same trajectory along the time axis (the vertical direction in the figure). The user's records are read one by one through steps S6-2 and S6-3 until they can no longer be enclosed by a circle of diameter D. The time span Δt1 of the records inside the circle is then found to be longer than the time threshold T, so this temporary data set is identified as one stay. Reading continues until the records again cannot be enclosed by a circle of diameter D, but this time the time span Δt2 of the records inside the circle is shorter than the time threshold T, so they are not identified as a stay. In this example, the user is identified as having one stay and one trip.
Step S7: generate and store the individual trip chain of the current user according to the stay points and the current user data.
In step S7 of this embodiment, the interval between each pair of consecutive stay points of a user is identified as one trip, and the individual trip chain is then generated from the user's stay points and trips.
In this embodiment, the individual trip chain includes the user identification number, the base station location of the origin base station, the start time, the base station location of the destination base station, the end time, and the trip status. In the trip status field, 0 indicates a stay and 1 indicates a trip.
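As an illustration of the trip chain record described above, the following sketch turns a user's ordered stays into alternating stay and trip entries. The field names, the function name and the record layout are assumptions made for this example; the text only lists which pieces of information the chain contains.

```python
def _entry(user_id, start, end, status):
    """Build one trip chain entry; start and end are (timestamp, location) pairs."""
    return {
        "user_id": user_id,
        "start_location": start[1],
        "start_time": start[0],
        "end_location": end[1],
        "end_time": end[0],
        "status": status,  # 0 = stay, 1 = trip
    }

def build_trip_chain(user_id, stays):
    """stays: time-ordered list of stays, each a time-ordered list of (timestamp, location)."""
    chain = []
    for idx, stay in enumerate(stays):
        if idx > 0:
            # The gap between the previous stay and this one counts as one trip.
            chain.append(_entry(user_id, stays[idx - 1][-1], stay[0], 1))
        chain.append(_entry(user_id, stay[0], stay[-1], 0))
    return chain

# Hypothetical user with two stays, at base stations "A" and "B".
chain = build_trip_chain("u1", [[(0, "A"), (1500, "A")], [(2400, "B"), (4000, "B")]])
```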
Step S8: repeat steps S5 to S7 until the data to be analyzed of all users have been processed.
In this embodiment, each time step S8 repeats steps S5 to S7, step S5 selects, by user identification number, the user data group to be analyzed that follows the current user data group as the new current user data group, so that all groups are traversed.
Functions and effects of the embodiment
According to the mobile phone signaling data trip identification method improved with a dynamic spatial threshold provided in this embodiment, the dynamic threshold of each mobile phone base station is calculated by the dynamic threshold calculation method, and the mobile phone signaling data are identified according to this dynamic threshold and the preset time threshold. The individual trip chain of each user can therefore be obtained more accurately, and the error caused by the uneven distribution density of base stations in real cities is reduced, so that subsequent travel behaviour analysis and urban planning based on the individual trip chains have stronger support and the analysis results agree better with reality. The present invention uses the dynamic threshold to address the identification error caused by a fixed spatial threshold: more trips are identified in the city centre, which remedies "under-identification in the centre", and fewer trips are identified in places such as the suburbs and new towns, which remedies "over-identification in the suburbs".
In the embodiment, the dynamic threshold is calculated by the K-sector method. With k=6, the accuracy of user trip identification using the resulting dynamic threshold is about 11.7% higher than with the existing fixed spatial threshold, the best result among the methods verified by the present invention.
In the embodiment, duplicate records, ping-pong handover data and drift data are removed from the mobile phone signaling data to obtain the data to be analyzed, which further reduces the error that arises when individual trip chains are identified from mobile phone signaling data and improves the final analysis result.
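The cleaning mentioned above (removing duplicate records, ping-pong handover records and drift records) could be sketched for a single user as follows. This is a hedged illustration only: the function name is hypothetical, locations are assumed to be planar coordinates in metres, each record is compared with the last record that was kept, the later record of a suspicious pair is the one dropped, and the speed limit is passed in by the caller rather than fixed in the code.

```python
import math

def clean_user_records(records, max_speed, min_gap=1.0):
    """Clean one user's signaling records.

    records: iterable of (timestamp_s, x_m, y_m) tuples.
    max_speed: speed limit in m/s above which a record is treated as drift.
    min_gap: minimum interval in seconds; closer records are treated as ping-pong handovers.
    Returns the kept records in chronological order.
    """
    ordered = sorted(set(records))  # drop exact duplicates, then sort by time
    kept = []
    for rec in ordered:
        if kept:
            t0, x0, y0 = kept[-1]
            t, x, y = rec
            dt = t - t0
            if dt < min_gap:                                   # ping-pong handover
                continue
            if math.hypot(x - x0, y - y0) / dt > max_speed:    # drift
                continue
        kept.append(rec)
    return kept

# Hypothetical records: one duplicate, one sub-second switch, one implausible jump.
raw = [(0, 0, 0), (0, 0, 0), (0.5, 500, 0), (10, 10, 0), (20, 100000, 0), (30, 20, 0)]
cleaned = clean_user_records(raw, max_speed=100.0)
```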
In the embodiment, the data to be analyzed that have been read are held in the temporary data set, and each item's base station location is checked against the circular range whose diameter is the dynamic threshold. This makes it easier to judge whether the user has stayed in one area for a long time, so the generated individual trip chain is more accurate.
In the embodiment, the dynamic threshold can also be calculated by the K-nearest-neighbor algorithm. Although this method is not optimal, with k=15 the accuracy of user trip identification is still 5.06% higher than with the conventional fixed threshold.
The above embodiment only illustrates a specific implementation of the present invention, and the present invention is not limited to the scope described in the above embodiment.
Claims (8)
1. A mobile phone signaling data trip identification method improved with a dynamic spatial threshold, for identifying the trip status of each user from mobile phone signaling data collected by mobile phone base stations so as to obtain the individual trip chain of the user, characterized by comprising the following steps:
Step S1: obtain the mobile phone signaling data, the mobile phone signaling data including at least a user identification number, a timestamp and the base station location of the corresponding mobile phone base station;
Step S2: clean the mobile phone signaling data to obtain data to be analyzed;
Step S3: according to the base station locations, determine in turn the dynamic threshold D corresponding to each mobile phone base station by a preset dynamic threshold calculation method;
Step S4: classify the data to be analyzed by the user identification number to obtain a user data group to be analyzed for each user;
Step S5: select, in order of the user identification number, one of the user data groups to be analyzed as the current user data group;
Step S6: read the data to be analyzed in the current user data group in chronological order, and determine the stay points of the current user according to the corresponding dynamic threshold D and a preset time threshold T;
Step S7: generate and store the individual trip chain of the current user according to the stay points and the current user data;
Step S8: repeat steps S5 to S7 until the data to be analyzed of all users have been processed.
2. The mobile phone signaling data trip identification method improved with a dynamic spatial threshold according to claim 1, characterized in that:
the dynamic threshold calculation method is the K-sector method:
take the base station location of the mobile phone base station currently being calculated as the current location, divide the plane into k sectors centred on the current location, select within each sector the mobile phone base station nearest to the current location to obtain k neighboring base stations, then calculate the distance from each neighboring base station to the current location and take the maximum of these distances as the spatial threshold of the mobile phone base station currently being calculated.
3. The mobile phone signaling data trip identification method improved with a dynamic spatial threshold according to claim 2, characterized in that:
the value of k is 6.
4. The mobile phone signaling data trip identification method improved with a dynamic spatial threshold according to claim 1, characterized in that:
step S2 comprises the following sub-steps:
Step S2-1: classify all the mobile phone signaling data by the user identification number to obtain a plurality of user signaling data groups;
Step S2-2: read each user signaling data group in turn, and remove the repeated mobile phone signaling records in each group as duplicate record data;
Step S2-3: read each user signaling data group in turn, and judge whether the interval between the timestamps of any two mobile phone signaling records in the group is less than 1 s; if so, remove the later record as ping-pong handover data;
Step S2-4: read each user signaling data group in turn, judge by a drift determination method whether the mobile phone signaling records in the group contain drift data, and remove the drift data if present,
the drift determination method being: sort the mobile phone signaling data in chronological order, calculate a speed from the distance between the base station locations of two adjacent records and the difference between their timestamps, and determine the record to be drift data if the speed is higher than 60 KM/S;
Step S2-5: merge the user signaling data groups that have been read and cleaned in steps S2-2 to S2-4 to form the data to be analyzed.
5. The mobile phone signaling data trip identification method improved with a dynamic spatial threshold according to claim 1, characterized in that:
step S6 comprises the following sub-steps:
Step S6-1: sort the data to be analyzed in the current user data group in chronological order;
Step S6-2: read the next item of data to be analyzed from the current user data group and add it to a temporary data set;
Step S6-3: judge whether the base station location of any item of data to be analyzed in the temporary data set falls outside a circular range whose diameter is the dynamic threshold D; if not, return to step S6-2; if so, go to step S6-4;
Step S6-4: judge whether the time difference between the earliest and latest items of data to be analyzed in the temporary data set is greater than the time threshold T; if not, empty the temporary data set and return to step S6-2; if so, go to step S6-5;
Step S6-5: store the temporary data set as a stay point of the current user and empty the temporary data set;
Step S6-6: repeat steps S6-2 to S6-5 until all the data to be analyzed in the current user data group have been read.
6. The mobile phone signaling data trip identification method improved with a dynamic spatial threshold according to claim 1, characterized in that:
the dynamic threshold calculation method is the K-nearest-neighbor algorithm:
take the base station location of the mobile phone base station currently being calculated as the current location, select the k mobile phone base stations nearest to the current location as neighboring base stations, then calculate the distance from each neighboring base station to the current location and take the maximum of these distances as the spatial threshold of the mobile phone base station currently being calculated.
7. The mobile phone signaling data trip identification method improved with a dynamic spatial threshold according to claim 6, characterized in that:
the value of k is 15.
8. The mobile phone signaling data trip identification method improved with a dynamic spatial threshold according to claim 1, characterized in that:
the time threshold T is 20 minutes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910529806.2A CN110312206A (en) | 2019-06-19 | 2019-06-19 | Based on the improved mobile phone signaling data trip recognition methods of dynamic space threshold value |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110312206A true CN110312206A (en) | 2019-10-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191008 |