CN110312206A - Mobile phone signaling data trip recognition method improved with a dynamic spatial threshold - Google Patents

Publication number
CN110312206A
Authority
CN
China
Prior art keywords
data
base station
mobile phone
signaling data
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910529806.2A
Other languages
Chinese (zh)
Inventor
王德
江贺韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201910529806.2A
Publication of CN110312206A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/025 Services making use of location information using location based information parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/20 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/18 Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present invention provides a mobile phone signaling data trip recognition method improved with a dynamic spatial threshold, which identifies each user's trips from the mobile phone signaling data collected by each cellular base station so as to obtain the user's individual trip chain. The method is characterized by comprising the following steps: step S1, obtaining mobile phone signaling data; step S2, cleaning the mobile phone signaling data; step S3, determining, by a preset dynamic threshold calculation method and according to the base station locations, the dynamic threshold D corresponding to each cellular base station; step S4, classifying the data to obtain a to-be-analyzed data group for each user; step S5, selecting the to-be-analyzed user data groups one at a time in sequence; step S6, reading the to-be-analyzed data in the current user data group in turn and determining the dwell points of the current user according to the corresponding dynamic threshold D; step S7, generating and storing the individual trip chain; and step S8, repeating steps S5 to S7 until the to-be-analyzed data of all users have been processed.

Description

Mobile phone signaling data trip recognition method improved with a dynamic spatial threshold
Technical field
The invention belongs to the field of urban planning and relates to a trip recognition method based on mobile phone signaling data, specifically to a mobile phone signaling data trip recognition method improved with a dynamic spatial threshold.
Background art
In recent years, mobile phone signaling data have been applied on a large scale in every field of urban research and urban planning, from large-scale topics such as urban space and spatial structure, urban transport planning, the distribution of urban population and jobs, and the actual population of a city, down to the mining of individual behavior patterns. When analyzing mobile phone signaling data, it is usually necessary first to derive the individual trip chain of each mobile phone user from the data, so that the behavior patterns of users in the city can be analyzed better on the basis of those trip chains.
In the prior art, the spatio-temporal clustering method is generally used to identify individual trip chains. This method identifies a user's stops mainly by means of a time threshold and a spatial threshold, and identifies one trip between two consecutive stops at different places.
However, the spatial threshold of the above spatio-temporal clustering method is a fixed threshold. In city-scale studies, a fixed threshold inevitably leads to uneven recognition accuracy: "under-recognition" occurs where base stations are densely distributed, such as downtown, and "over-recognition" occurs where base stations are sparsely distributed, such as the suburbs. Reducing the threshold can avoid "under-recognition" in some places but is bound to cause "over-recognition" in more places, and vice versa.
Therefore, to improve recognition accuracy, some researchers adjust the spatial threshold separately for the dense base stations downtown and the sparse base stations at the urban fringe; other researchers divide the city into three zones according to base station density and use a different spatial threshold in each, i.e. a "zoned spatial threshold". Although both approaches mitigate uneven recognition accuracy to some extent, "over-recognition" and "under-recognition" still occur within each zone.
Summary of the invention
To solve the above problems, a mobile phone signaling data trip recognition method improved with a dynamic spatial threshold is provided, and the present invention adopts the following technical solutions:
The present invention provides a mobile phone signaling data trip recognition method improved with a dynamic spatial threshold, which identifies each user's trips from the mobile phone signaling data collected by each cellular base station so as to obtain the user's individual trip chain, characterized by comprising the following steps: step S1, obtaining mobile phone signaling data that contain at least a user identification number, a timestamp and the base station location of the corresponding cellular base station; step S2, cleaning the mobile phone signaling data to obtain to-be-analyzed data; step S3, determining in turn, by a preset dynamic threshold calculation method and according to the base station locations, the dynamic threshold D corresponding to each cellular base station; step S4, classifying the to-be-analyzed data by user identification number to obtain a to-be-analyzed user data group for each user; step S5, selecting, in order of user identification number, one to-be-analyzed user data group as the current user data group; step S6, reading the to-be-analyzed data in the current user data group in chronological order and determining the dwell points of the current user according to the corresponding dynamic threshold D and a preset time threshold T; step S7, generating and storing the individual trip chain of the current user from the dwell points and the current user data; and step S8, repeating steps S5 to S7 until the to-be-analyzed data of all users have been processed.
The method provided by the present invention may also have the following technical feature: the dynamic threshold calculation method is the k-sector method, in which the base station location of the cellular base station currently being calculated is taken as the current location, k sectors are divided around the current location, the cellular base station nearest to the current location is selected in each sector to obtain k neighboring base stations, the distance between each neighboring base station and the current location is calculated, and the maximum of these distances is taken as the spatial threshold of the cellular base station currently being calculated.
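The k-sector method described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes base stations are given as planar (x, y) coordinates in metres, that the k sectors are equal in angle and start at the positive x axis, and that a sector containing no base station is simply skipped. The function name `k_sector_threshold` is hypothetical.

```python
import math

def k_sector_threshold(stations, idx, k=6):
    """Spatial threshold of stations[idx] under the k-sector rule.

    stations: list of (x, y) positions in metres (assumed planar coordinates).
    Returns the largest of the per-sector nearest-neighbour distances,
    or None when no other station exists.
    """
    x0, y0 = stations[idx]
    nearest = [math.inf] * k          # nearest distance found in each sector
    width = 2 * math.pi / k           # angular width of one sector
    for j, (x, y) in enumerate(stations):
        if j == idx:
            continue
        angle = math.atan2(y - y0, x - x0) % (2 * math.pi)
        sector = min(int(angle / width), k - 1)   # guard against rounding at 2*pi
        dist = math.hypot(x - x0, y - y0)
        nearest[sector] = min(nearest[sector], dist)
    found = [d for d in nearest if math.isfinite(d)]  # assumption: skip empty sectors
    return max(found) if found else None
```

Because one neighbour is taken per direction, a station with close neighbours on one side and distant ones on the other still receives a threshold that reflects the distant side.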
The method provided by the present invention may also have the following technical feature: the value of k is 6.
The method provided by the present invention may also have the following technical feature: step S2 comprises the following sub-steps: step S2-1, classifying all mobile phone signaling data by user identification number to obtain multiple user signaling data groups; step S2-2, reading each user signaling data group in turn and removing, as duplicate record data, the mobile phone signaling data that are recorded repeatedly in each group; step S2-3, reading each user signaling data group in turn and judging whether the timestamps of any two consecutive mobile phone signaling records in the group are less than 1 s apart, and if so removing the later record as ping-pong handover data; step S2-4, reading each user signaling data group in turn and judging by a drift determination method whether the mobile phone signaling data in the group contain drift data, and if so removing the drift data, the drift determination method being: sort the mobile phone signaling data in chronological order, compute a speed from the distance between the base station locations of two adjacent records and the difference between their timestamps, and classify the record as drift data if the speed exceeds 60 km/h; step S2-5, obtaining the user signaling data groups that have been read and cleaned by steps S2-2 to S2-4 and merging them to form the to-be-analyzed data.
The method provided by the present invention may also have the following technical feature: step S6 comprises the following sub-steps: step S6-1, sorting the to-be-analyzed data in the current user data group in chronological order; step S6-2, reading the next to-be-analyzed record in the current user data group and adding it to a temporary data set; step S6-3, judging whether the base station locations of the records in the temporary data set extend beyond a circular judgment range whose diameter is the dynamic threshold D, returning to step S6-2 if not and proceeding to step S6-4 if so; step S6-4, judging whether the time difference between the earliest and the latest record in the temporary data set is greater than the time threshold T, emptying the temporary data set and returning to step S6-2 if not and proceeding to step S6-5 if so; step S6-5, storing the temporary data set as a dwell point of the current user and emptying the temporary data set; step S6-6, repeating steps S6-2 to S6-5 until all to-be-analyzed data in the current user data group have been read.
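The loop of steps S6-1 to S6-6 can be sketched as below. Several details are assumptions made for this illustration and are not stated in the source: the threshold D is taken from the base station of the first record in the temporary set, "exceeding the circle of diameter D" is interpreted as the new record lying farther than D from any record already in the set, the record that breaks the range starts the next temporary set, and the set left over at the end is tested against T as well. The names `detect_dwell_points` and `threshold_of` are hypothetical.

```python
import math

def detect_dwell_points(records, threshold_of, time_threshold_s=20 * 60):
    """records: (t_seconds, cell_id, x_m, y_m) tuples for one user.
    threshold_of: mapping cell_id -> dynamic threshold D in metres.
    Returns a list of dwell points, each a list of records."""
    dwell_points, buffer = [], []

    def flush():
        # Steps S6-4 / S6-5: keep the set only if it spans more than T.
        if buffer and buffer[-1][0] - buffer[0][0] > time_threshold_s:
            dwell_points.append(list(buffer))
        buffer.clear()

    for rec in sorted(records, key=lambda r: r[0]):        # step S6-1
        if buffer:
            d_max = threshold_of[buffer[0][1]]             # assumption: D of first record
            _, _, x, y = rec
            # Step S6-3 (assumed interpretation of the circular range).
            if any(math.hypot(x - bx, y - by) > d_max for _, _, bx, by in buffer):
                flush()
        buffer.append(rec)                                 # step S6-2
    flush()                                                # assumption: test the tail too
    return dwell_points
```

A trip is then the movement between two consecutive dwell points at different places, which is how step S7 can assemble the trip chain.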
The method provided by the present invention may also have the following technical feature: the dynamic threshold calculation method is the k-nearest-neighbor algorithm, in which the base station location of the cellular base station currently being calculated is taken as the current location, the k cellular base stations nearest to the current location are selected as neighboring base stations, the distance between each neighboring base station and the current location is calculated, and the maximum of these distances is taken as the spatial threshold of the cellular base station currently being calculated.
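A minimal sketch of the k-nearest-neighbor rule follows, assuming planar (x, y) coordinates in metres and a brute-force search over all base stations; a spatial index would be preferable for a city-scale station list. The function name `k_nearest_threshold` is hypothetical.

```python
import math

def k_nearest_threshold(stations, idx, k=15):
    """Spatial threshold of stations[idx]: distance to its k-th nearest neighbour.

    stations: list of (x, y) positions in metres (assumed planar coordinates).
    Returns None when there is no other station.
    """
    x0, y0 = stations[idx]
    dists = sorted(
        math.hypot(x - x0, y - y0)
        for j, (x, y) in enumerate(stations)
        if j != idx
    )
    neighbours = dists[:k]            # the k nearest (fewer if not enough stations)
    return neighbours[-1] if neighbours else None
```

Taking the maximum over the k nearest distances is the same as taking the k-th smallest distance, which is why the sorted list can simply be sliced.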
The method provided by the present invention may also have the following technical feature: the value of k is 15.
The method provided by the present invention may also have the following technical feature: the time threshold T is 20 minutes.
Action and effect of the invention
In the mobile phone signaling data trip recognition method improved with a dynamic spatial threshold according to the present invention, the dynamic threshold of each cellular base station is calculated by the dynamic threshold calculation method, and the mobile phone signaling data are identified according to this dynamic threshold and the preset time threshold. The individual trip chain of each user can therefore be obtained more accurately, and the error caused by the uneven distribution density of cellular base stations in a real city is reduced, so that subsequent analysis of residents' behavior and urban planning based on individual trip chains are better supported and the analysis results agree more closely with reality. By using a dynamic threshold, the present invention removes the recognition error caused by a fixed spatial threshold: more trips are identified downtown, which remedies "under-recognition in the center", and fewer trips are identified in places such as suburbs and new towns, which remedies "over-recognition in the suburbs".
Brief description of the drawings
Fig. 1 is a flowchart of the mobile phone signaling data trip recognition method in the embodiment of the present invention;
Fig. 2 is a schematic diagram of drift data in the embodiment of the present invention;
Fig. 3 is a schematic diagram of the relationship between the values of the time threshold and the spatial threshold and the number of trips identified in the embodiment of the present invention;
Fig. 4 is a schematic diagram of the "base station jumping" phenomenon in the embodiment of the present invention;
Fig. 5 is a schematic diagram of under-recognition caused by a fixed spatial threshold in the embodiment of the present invention;
Fig. 6 is a schematic diagram of the k-nearest-neighbor algorithm and the k-sector method in the embodiment of the present invention;
Fig. 7 is a schematic diagram of the root-mean-square error of the results obtained by the various methods with different parameters in the embodiment of the present invention; and
Fig. 8 is a schematic diagram of trip identification by the spatio-temporal clustering method in the embodiment of the present invention.
Specific embodiment
To make the technical means, inventive features, objectives and effects of the present invention easy to understand, the mobile phone signaling data trip recognition method improved with a dynamic spatial threshold is described in detail below with reference to the embodiment and the drawings.
<embodiment>
In this embodiment, computer software that runs the mobile phone signaling data trip recognition method improved with a dynamic spatial threshold can optimize the identification of users' individual trip chains during mobile phone signaling analysis. The software operates as follows:
Fig. 1 is the flow chart of mobile phone signaling data trip recognition methods in the embodiment of the present invention.
As shown in Fig. 1, the mobile phone signaling data trip recognition method specifically comprises the following steps:
Step S1: obtain the mobile phone signaling data collected by each cellular base station in the city, then proceed to step S2.
In this embodiment, each cellular base station can be regarded as a "point" within the spatial extent of the city. Whenever a mobile phone is switched on or off, makes or receives a call, or sends or receives a text message, it exchanges information with a cellular base station, and the corresponding base station records one entry for that user (i.e. that phone). In other words, a record containing spatio-temporal information (mobile phone signaling data) is generated at the corresponding time and at a specific location (the base station location). The format of the record is shown in Table 1:
Table 1: Field information of mobile phone signaling data
In Table 1, MSID is the unique identifier of each mobile phone (i.e. the user identification number). Timestamp is the time at which the signaling exchange between the phone and the base station occurred; for example, 20190109210923 represents 21:09:23 on January 9, 2019. LAC and CellID together form the location number of the base station, which identifies the specific geographic location of the base station (i.e. the base station location). EventID is the specific type of signaling exchange between the phone and the base station, such as switching on or off, sending or receiving a text message, or making or receiving a call. Flag indicates whether the phone can be tracked: 1 means it can be tracked and 0 means it cannot. The value is generally 1; records with a value of 0 are removed directly from the raw data.
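The fields described above can be represented and parsed roughly as follows. The sketch assumes one comma-separated record per line with the fields in the order MSID, Timestamp, LAC, CellID, EventID, Flag; the source gives the field names and the timestamp format but not the file layout, so the separator, the field order and the names `SignalingRecord` and `parse_record` are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class SignalingRecord:
    msid: str            # unique phone / user identifier
    timestamp: datetime  # time of the signaling exchange
    lac: str             # location area code
    cell_id: str         # cell identifier; (lac, cell_id) identifies the base station
    event_id: str        # type of signaling event
    flag: int            # 1 = trackable, 0 = not trackable

def parse_record(line: str, sep: str = ",") -> Optional[SignalingRecord]:
    """Parse one record; return None for untrackable records (Flag == 0)."""
    msid, ts, lac, cell_id, event_id, flag = line.strip().split(sep)
    if int(flag) == 0:
        return None      # the source removes Flag == 0 records from the raw data
    return SignalingRecord(
        msid, datetime.strptime(ts, "%Y%m%d%H%M%S"), lac, cell_id, event_id, int(flag)
    )
```

Mapping a (LAC, CellID) pair to coordinates requires a separate base station table, which is not shown here.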
Step S2: clean the mobile phone signaling data obtained in step S1 to obtain the to-be-analyzed data. This step specifically comprises steps S2-1 to S2-5.
Mobile phone signaling data are not generated for planning or academic research; they are a by-product of the operator's data operations. Some noise or abnormal data inevitably arise while the data are generated, and if these are not handled they will affect the subsequent recognition results. Step S2 therefore removes the noise data from the mobile phone signaling data to ensure the credibility of the results. In this embodiment, the noise data are divided into duplicate record data, ping-pong handover data and drift data; their shares of the mobile phone signaling data are shown in Table 2.
Table 2: Noise data types and their shares
Noise data type            Share of data volume
Ping-pong handover data    42%
Duplicate data             15.5%
Drift data                 0.4%
Step S2-1: classify all mobile phone signaling data by user identification number to obtain multiple user signaling data groups, then proceed to step S2-2.
Step S2-2: read each user signaling data group in turn and remove, as duplicate record data, the mobile phone signaling data that are recorded repeatedly in each group, then proceed to step S2-3.
Strictly speaking, duplicate record data are not records that were written twice; rather, several records are generated within a very short time, even within the same second, so the data appear to be duplicated. Such data are not abnormal, but the information they carry is redundant, and leaving this noise in place reduces recognition efficiency. These data account for 15.5% of the total. They are handled by keeping only one of several completely identical records.
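Keeping one copy of several completely identical records might look like this. The sketch assumes each record is a hashable tuple such as (MSID, timestamp, cell); the function name `drop_duplicates` is hypothetical.

```python
def drop_duplicates(records):
    """Keep the first of any completely identical records, preserving order."""
    seen = set()
    kept = []
    for rec in records:
        if rec not in seen:   # identical in every field counts as a duplicate
            seen.add(rec)
            kept.append(rec)
    return kept
```

Records that share a timestamp but differ in base station are not touched by this step; the source handles them as ping-pong handover data.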
Step S2-3: read each user signaling data group in turn and judge whether the timestamps of any two consecutive mobile phone signaling records in the group are less than 1 s apart; if so, remove the later record as ping-pong handover data, then proceed to step S2-4.
Ping-pong handover data are a very common kind of noise in mobile phone signaling data, with a share as high as 42% (see Table 2). They take the form of high-frequency records generated by one user at two or more base stations within a short time. A user is recorded by the nearest base station, so when the user is at the edge of the signal coverage of two or more base stations, frequent handovers between base stations occur and ping-pong handover data are produced.
All ping-pong handover records express the same information, namely that the user is somewhere on the boundary between several base station signals, and the excessive data volume also reduces recognition efficiency, so these data are processed. The processing is as follows: judge whether the interval between two records is less than 1 second, and if so merge the later record into the earlier one, i.e. keep only the earlier record and its base station.
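The 1-second rule can be sketched as below. One detail is an assumption: each record is compared with the last record that was kept, not with the record immediately before it in the raw data, since the source does not say which is meant. Records are assumed to be (t_seconds, cell) tuples for a single user, and `remove_pingpong` is a hypothetical name.

```python
def remove_pingpong(records, min_gap_s=1.0):
    """Drop any record that follows the previously kept record by less than min_gap_s."""
    kept = []
    for t, cell in sorted(records, key=lambda r: r[0]):   # chronological order
        if kept and t - kept[-1][0] < min_gap_s:
            continue          # later record is treated as ping-pong handover noise
        kept.append((t, cell))
    return kept
```

With timestamps that have one-second resolution, "less than 1 s apart" reduces to "same second".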
In other studies, some researchers identify and process ping-pong handover with fairly complex procedures. However, because such data arise between neighboring base stations, and the spatial threshold in the trip chain recognition method already "tolerates" neighboring base stations, a simple method is used here.
Step S2-4: read each user signaling data group in turn and judge by the drift determination method whether the mobile phone signaling data in the group contain drift data; if so, remove the drift data, then proceed to step S2-5.
Drift data arise as follows: among three consecutive records of a user (ordered by time), the location of the middle record is far from the other two. As shown in Fig. 2, the point marked with a triangle is the base station location of the drift record; a normal user's activity in a city would not produce such travel behavior. In general, drift can be judged to have occurred when the user's moving speed (the spatial distance between two consecutive records in the mobile phone signaling data divided by their time interval) exceeds 60 km/h.
Therefore, in this embodiment, the drift determination method used in step S2-4 is: sort the mobile phone signaling data in chronological order, compute a speed from the distance between the base station locations of two adjacent records and the difference between their timestamps, and classify the record as drift data if the speed exceeds 60 km/h.
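A minimal sketch of the speed test follows. It assumes records are (t_seconds, x_m, y_m) tuples for one user in planar coordinates, and that the speed of each record is measured from the last record that was kept, so that removing a drifting middle record lets the following record be compared with the one before the drift. That comparison rule and the name `remove_drift` are assumptions, not details given in the source.

```python
import math

def remove_drift(records, max_speed_kmh=60.0):
    """Drop records whose implied speed from the last kept record exceeds the limit."""
    kept = []
    for t, x, y in sorted(records):
        if kept:
            t0, x0, y0 = kept[-1]
            dist_km = math.hypot(x - x0, y - y0) / 1000.0
            hours = (t - t0) / 3600.0
            if hours > 0:
                speed = dist_km / hours
            else:
                # Same timestamp: any displacement implies unbounded speed.
                speed = math.inf if dist_km > 0 else 0.0
            if speed > max_speed_kmh:
                continue      # treated as drift data
        kept.append((t, x, y))
    return kept
```

For example, a jump of 5 km within 60 s implies 300 km/h and is dropped, whereas 100 m in 120 s implies 3 km/h and is kept.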
Step S2-5: obtain the user signaling data groups that have been read and cleaned by steps S2-2 to S2-4 and merge them to form the to-be-analyzed data, then proceed to step S3.
In this embodiment, the mobile phone signaling data are divided into multiple user signaling data groups in step S2-1. In the actual running of the program, however, all mobile phone signaling data can instead be sorted directly by user identification number and the cleaning applied to each user identification number in turn; the mobile phone signaling data that have completed the cleaning are the to-be-analyzed data.
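The sort-then-process variant described above can be sketched with a single sort followed by a grouped pass. Records are assumed to be tuples whose first two fields are the user identification number and the timestamp; `iter_user_groups` is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

def iter_user_groups(records):
    """Yield (user_id, records_for_that_user) with each group in chronological order.

    records: iterable of tuples whose first two fields are (user_id, timestamp).
    """
    ordered = sorted(records, key=itemgetter(0, 1))        # sort by user, then time
    for user_id, group in groupby(ordered, key=itemgetter(0)):
        yield user_id, list(group)
```

Each yielded group can then be passed through the cleaning steps in turn without first materializing one container per user.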
Generally, individual trip chains are identified after the data have been pre-processed (i.e. after step S2 is complete). Although some researchers identify trips with machine learning and similar methods, the more mainstream trip chain recognition method is the spatio-temporal clustering method, which some researchers call the DBSCAN algorithm. The idea of the algorithm is to identify a user's stops by a time threshold and a spatial threshold, and to identify one trip between two consecutive stops at different places.
However, the key to the above method is the setting of the time threshold T and the spatial threshold D. Different threshold settings produce different recognition results (see Fig. 3): the smaller the spatial threshold and the time threshold, the more trips are identified. Many researchers have discussed and chosen these two thresholds before identifying trips with this method (see Table 3); the values are generally derived from the definition of a trip in travel surveys or from empirical values in existing research.
Table 3: Threshold selection and rationale in existing studies using the spatio-temporal clustering method
Table 3 shows, however, that in early applications the choice of the spatio-temporal thresholds was always tentative. Apart from one study that tested the influence of different time and spatial thresholds on the number of trips, researchers did not discuss whether the chosen values were appropriate. Although the definition of a trip in traditional surveys is generally taken as the reference, mobile phone signaling data are not perfect data: the recording interval is sparse, and the spatial accuracy depends on the coordinates and density of the base stations. It is therefore unreasonable to constrain the identification of mobile phone data entirely by the definitions of traditional surveys; the thresholds also need to be adjusted to the characteristics of the mobile phone data themselves.
The spatial threshold is one of the core parameters of the recognition algorithm. On the one hand it arises from the requirements on the definition of a trip in traditional surveys; on the other hand, and more importantly, the spatial accuracy of mobile phone signaling data depends on base station density. When a user is recorded by a base station, the user is physically within that station's signal service area. Some researchers delimit the service areas of base stations with Thiessen polygons, but service areas are not strictly partitioned; they overlap. When a user is in an area where service areas overlap, the user may be recorded by several neighboring base stations even though the physical location has not changed. This is known as the "base station jumping" phenomenon.
Take Fig. 4 as an example. The points on the track show one user's trajectory over a day. The user has obvious commuting trips, but at night "base station jumping" to a neighboring base station occurs (dashed box a in Fig. 4). Because the recording intervals of "base station jumping" do not differ from those of normal data, it cannot be removed by cleaning in the way ping-pong handover data are. The spatial threshold then acts as a kind of "spatial tolerance" that prevents "base station jumping" between neighboring base stations from being misidentified as a trip. The smaller the spatial threshold, the more points qualify and the more trips are identified; the larger the spatial threshold, the more short-distance trips are treated as stops and the fewer trips are identified. Although a smaller threshold yields more identified trips, the number of identified trips should not be used as the criterion of recognition quality, because some of the additional trips may be misidentified "base station jumping".
At present, most other researchers set the spatial threshold to 500 m, based mostly on the definition of a trip in traditional surveys and on empirical values from earlier research. For example, some researchers, taking traditional surveys as the basis, adopted the 500 m trip definition of the Guangzhou survey from among several definitions, but values of 300 m (Hangzhou) and 400 m (Shanghai) also exist; in trip research in the United States, neither trip distance nor trip time is limited at all. The 500 m value was adopted tentatively in early research and has been used by later researchers as an empirical value ever since, without sufficient discussion.
In practice, the density of cellular base stations varies greatly across a city. In the city center there can be 20 to 40 base stations within a 500 m × 500 m grid cell, and positioning accuracy is high; in the suburbs two base stations can be 2 km or more apart, and positioning accuracy is low. Using a fixed spatial threshold therefore causes "under-recognition in the center and over-recognition in the suburbs" (see Fig. 5). Take a fixed threshold of 500 m as an example. If a user makes a 300 m trip downtown, the 500 m threshold cannot identify that trip, which causes "under-recognition" (Fig. 5, left). In the suburbs the spacing between base stations is often greater than 500 m, so "base station jumping" of 500 m or more may be misidentified as a trip under the 500 m spatial threshold, which causes "over-recognition" (Fig. 5, right). Reducing the threshold can avoid "under-recognition" in some places but is bound to cause "over-recognition" in more places, and vice versa. In city-scale studies, a single fixed spatial threshold leads to uneven recognition accuracy.
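The arithmetic behind this contrast can be made concrete with a small sketch. It assumes base stations are spread evenly over the grid cell, which is only a rough approximation, and it uses an 800 m displacement as a hypothetical example of a suburban jump; both function names are illustrative.

```python
import math

def mean_spacing_m(grid_side_m, n_stations):
    """Approximate spacing if n stations were spread evenly over a square grid cell."""
    return math.sqrt(grid_side_m ** 2 / n_stations)

def counted_as_trip(displacement_m, threshold_m=500.0):
    """A displacement larger than the spatial threshold is counted as a trip."""
    return displacement_m > threshold_m

# Downtown: 20 to 40 stations per 500 m x 500 m cell, i.e. spacing of roughly 80 to 110 m.
downtown_spacing = (mean_spacing_m(500, 40), mean_spacing_m(500, 20))

real_downtown_trip = counted_as_trip(300)   # False: a real 300 m trip is missed
suburban_jump = counted_as_trip(800)        # True: a jump between stations is kept
```

Under this approximation the downtown spacing is about one fifth of the 500 m threshold, while the suburban spacing of 2 km or more is several times larger than it, which is why no single value suits both.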
Some researchers think to improve accuracy of identification, it should downtown base station intensively locate and edge base station coefficient at Carry out the adjustment of capacity-threshold.Also there is researcher according to base station distribution density, city is divided into three areas, takes different skies respectively Between threshold value, i.e. " partition space threshold value ".Partition space threshold value can avoid the non-uniform problem of accuracy of identification to a certain extent, But can still there are problems that " cross and identify " and " owing identification " inside subregion.
Therefore, in order to solve the problems, such as the above method, the invention proposes the capacity-thresholds of each base station can According to peripheral base station density and distance, threshold size (dynamic threshold D is calculated by step S3) is dynamically adjusted, to solve The problem of " cross and identify " and " owing identification ".
Step S3: according to the base station locations, determine the dynamic threshold D corresponding to each mobile phone base station in turn, using a preset dynamic threshold calculation method.
In this embodiment, the inventors propose three dynamic threshold calculation methods:
1) Density algorithm: compute the distribution density of the base stations and construct a function relating density to the spatial threshold, so that density can be converted into a spatial threshold;
2) K-nearest-neighbor algorithm: as shown in Fig. 6(a), take the location of the base station currently being computed as the current location, select the k base stations nearest to the current location as neighbor base stations, compute the distance from each neighbor base station to the current location, and take the maximum of these distances as the spatial threshold of the base station currently being computed;
3) K-sector algorithm: as shown in Fig. 6(b), take the location of the base station currently being computed as the current location, divide the plane into k sectors centered on the current location, select in each sector the base station nearest to the current location so as to obtain k neighbor base stations, compute the distance from each neighbor base station to the current location, and take the maximum of these distances as the spatial threshold of the base station currently being computed.
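As an illustration of methods 2) and 3), the sketch below computes both thresholds for a single base station. It assumes planar coordinates in metres, equal-angle sectors measured from the positive x axis, and that empty sectors are simply skipped. The function names and these choices are illustrative assumptions, not details given in the text.

```python
import math

def knn_threshold(stations, i, k):
    """Largest distance from station i to its k nearest stations."""
    x0, y0 = stations[i]
    dists = sorted(
        math.hypot(x - x0, y - y0)
        for j, (x, y) in enumerate(stations) if j != i
    )
    if len(dists) < k:
        raise ValueError("need at least k other stations")
    return dists[k - 1]

def sector_threshold(stations, i, k):
    """Largest of the per-sector nearest distances around station i."""
    x0, y0 = stations[i]
    width = 2 * math.pi / k              # angular width of one sector
    nearest = [None] * k                 # nearest distance found in each sector
    for j, (x, y) in enumerate(stations):
        dx, dy = x - x0, y - y0
        d = math.hypot(dx, dy)
        if j == i or d == 0:
            continue                     # skip the station itself and co-located ones
        angle = math.atan2(dy, dx) % (2 * math.pi)
        s = min(int(angle / width), k - 1)   # guard against angle rounding up to 2*pi
        if nearest[s] is None or d < nearest[s]:
            nearest[s] = d
    found = [d for d in nearest if d is not None]   # assumption: empty sectors are ignored
    if not found:
        raise ValueError("no neighbouring stations")
    return max(found)

# Hypothetical layout: three close stations to the east, one distant station to the west.
stations = [(0, 0), (100, 10), (120, 20), (150, -30), (-900, 5)]
print(knn_threshold(stations, 0, 3), sector_threshold(stations, 0, 4))
```

With neighbors clustered on one side, the nearest-neighbor rule returns a small threshold, while the sector rule is forced to look in every direction and returns the distance to the far station.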
Of these methods, the density algorithm is the simplest to compute, but it cannot reflect the relationship between neighboring base stations and it requires a conversion between density and spatial threshold, so the present invention does not use it for now.
Of the latter two dynamic threshold algorithms, the K-nearest-neighbor algorithm is more intuitive, while the K-sector algorithm avoids the case where the distances to the surrounding base stations differ greatly. In both algorithms K is a parameter, and different values of K lead to different results. To evaluate the difference between the two algorithms and select suitable parameters, the inventors carried out the following evaluation:
Step A: randomly select one day of records for 100 users, visualize the trajectories, and manually identify the number of trips of these 100 users to serve as the validation set.
Step B: use three ways of setting the spatial threshold, namely a fixed threshold of 300-2000 m, the K-nearest-neighbor algorithm with K from 6 to 19, and the K-sector algorithm with K from 3 to 9, and substitute each into the trip chain identification method to identify the number of trips of these 100 users.
Step C: check the identification results of the three methods against the manual identification results, compute the root-mean-square error (RMSE), and select the best-performing algorithm.
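The comparison in steps A to C can be sketched as follows. The trip counts used here are made-up illustrative values, not data from the evaluation, and the function names are hypothetical.

```python
import math

def rmse(identified, manual):
    """Root-mean-square error between identified and manually counted trips."""
    if len(identified) != len(manual) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(identified, manual))
    return math.sqrt(squared / len(manual))

def best_parameter(results, manual):
    """results maps a parameter value to the per-user trip counts it produced."""
    errors = {p: rmse(counts, manual) for p, counts in results.items()}
    best = min(errors, key=errors.get)   # parameter with the smallest error
    return best, errors

# Illustrative data for three users and three fixed thresholds (in metres).
manual = [2, 3, 4]
results = {500: [1, 3, 4], 900: [2, 3, 4], 1500: [4, 5, 6]}
best, errors = best_parameter(results, manual)
print(best, errors)
```

The same routine would be run once per method (fixed threshold, K-nearest-neighbor, K-sector), and the lowest errors of the three compared.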
These steps yield the results of the fixed threshold, the K-nearest-neighbor algorithm, and the K-sector algorithm under different parameters, as shown in Fig. 7. With a fixed threshold, the error decreases as the threshold increases, reaches its minimum at 900 m, and then rises as the threshold increases further. The K-nearest-neighbor algorithm behaves similarly: the error is smallest at K = 15 and then increases with K. The K-sector algorithm also behaves similarly, with the smallest error at K = 6.
Among the best-performing parameters of the three methods, the K-sector method has the smallest error (1.69); compared with the best parameters of the fixed threshold and K-nearest-neighbor methods, its accuracy is higher by 11.7% and 5.06% respectively. Under this dynamic threshold calculation method, the spatial threshold is below 500 m in the core of the central city, 500-1000 m in the central city and the new town centers, 100-1500 m in the suburbs, and 2000 m or more in the outer suburbs. This distribution matches the purpose of introducing a dynamic spatial threshold: the threshold is low where base station density is high and high where base station density is low.
Therefore, this embodiment computes the dynamic threshold D of each base station using the K-sector method with K = 6, and then proceeds to step S4.
Step S4: classify the data to be analyzed by user identifier to obtain a user data group to be analyzed for each user.
In this embodiment, each user data group to be analyzed contains all the data to be analyzed for the corresponding user.
Step S5: select the user data groups to be analyzed one at a time, in order of user identifier, as the current user data group.
Step S6: read the data to be analyzed in the current user data group in chronological order, and determine the stay points of the current user according to the corresponding dynamic threshold D and a preset time threshold T. This comprises steps S6-1 to S6-6:
Step S6-1: sort the data to be analyzed in the current user data group in chronological order;
Step S6-2: read the next data item to be analyzed from the current user data group and add it to a temporary data set;
Step S6-3: judge whether the base station location of any data item in the temporary data set falls outside a circular range whose diameter is the dynamic threshold D; if not, return to step S6-2; if so, go to step S6-4;
Step S6-4: judge whether the time difference between the earliest and the latest data items in the temporary data set is greater than the time threshold T; if not, empty the temporary data set and return to step S6-2; if so, go to step S6-5;
Step S6-5: store the temporary data set as a stay point of the current user, and empty the temporary data set;
Step S6-6: repeat steps S6-2 to S6-5 until all data to be analyzed in the current user data group has been read.
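A minimal sketch of steps S6-1 to S6-6 follows. Several details are assumptions made for illustration: the threshold D is taken from the base station of the first record in the temporary data set, "fits in a circle of diameter D" is approximated by requiring that no two positions are more than D apart, the record that breaks the circle starts the next temporary data set, and any records left over at the end are tested against T as well. The `Record` type and function names are hypothetical.

```python
import math
from typing import NamedTuple

class Record(NamedTuple):
    t: float   # timestamp in seconds
    x: float   # base station x coordinate in metres
    y: float   # base station y coordinate in metres
    d: float   # dynamic threshold D of that base station in metres

def fits_in_circle(records, diameter):
    """Approximate test: no two positions are further apart than the diameter."""
    return all(
        math.hypot(a.x - b.x, a.y - b.y) <= diameter
        for i, a in enumerate(records) for b in records[i + 1:]
    )

def find_stay_points(records, time_threshold):
    stays, buffer = [], []

    def flush():
        # S6-4 / S6-5: keep the buffer as a stay only if it spans more than T.
        if buffer and buffer[-1].t - buffer[0].t > time_threshold:
            stays.append(list(buffer))
        buffer.clear()

    for rec in sorted(records, key=lambda r: r.t):      # S6-1: chronological order
        # S6-3: does the new record push the set outside the circle?
        if buffer and not fits_in_circle(buffer + [rec], buffer[0].d):
            flush()
        buffer.append(rec)                              # S6-2: add to temporary set
    flush()                                             # assumption: test leftovers too
    return stays

# Hypothetical records: 25 minutes near one place, then two short stops elsewhere.
records = [
    Record(0, 0, 0, 500), Record(600, 100, 0, 500), Record(1500, 50, 50, 500),
    Record(1800, 2000, 0, 500), Record(2100, 2100, 0, 500), Record(2400, 5000, 0, 500),
]
print(len(find_stay_points(records, time_threshold=1200)))
```

The pairwise-distance test is cheaper than computing a true minimum enclosing circle and is slightly more permissive; a stricter implementation could substitute an exact enclosing-circle check.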
In step S6 of this embodiment, the identification of a user's stay points is illustrated in Fig. 8. The lower panel shows the distribution of a user's trajectory (the positions of consecutive mobile phone signaling records) on the spatial plane, and the upper panel shows the same trajectory along the time axis (the vertical direction in the figure). The user's records are read one at a time in steps S6-2 to S6-3 until they can no longer be enclosed by a circle of diameter D. The time span Δt1 of the records inside the circle is then found to be longer than the time threshold T, so this temporary data set is identified as one stay. Reading continues until the records again can no longer be enclosed by a circle of diameter D, but this time the time span Δt2 of the records inside the circle is shorter than the time threshold T, so it is not identified as a stay. In this example, the user is identified as having made 1 stay and 1 trip.
Step S7: generate the individual trip chain of the current user from the stay points and the current user data, and store it.
In step S7 of this embodiment, the movement between each pair of consecutive stay points of a user is identified as one trip, and the individual trip chain is then generated from the user's stay points and trips.
In this embodiment, the individual trip chain includes the user identifier, the base station location of the origin base station, the start time, the base station location of the destination base station, the end time, and the trip status. In the trip status field, 0 indicates a stay and 1 indicates a trip.
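The trip chain record described above can be sketched as follows. The `Stay` and `ChainEntry` types are hypothetical, and representing each stay by a single location with a start and end time is an assumption made for illustration.

```python
from typing import List, NamedTuple, Tuple

class Stay(NamedTuple):
    location: Tuple[float, float]   # representative base station location
    start: float                    # time of the first record in the stay
    end: float                      # time of the last record in the stay

class ChainEntry(NamedTuple):
    user_id: str
    origin: Tuple[float, float]
    start_time: float
    destination: Tuple[float, float]
    end_time: float
    status: int                     # 0 = stay, 1 = trip

def build_trip_chain(user_id: str, stays: List[Stay]) -> List[ChainEntry]:
    chain = []
    ordered = sorted(stays, key=lambda s: s.start)
    for idx, stay in enumerate(ordered):
        if idx > 0:
            prev = ordered[idx - 1]
            # The movement between two consecutive stays is one trip.
            chain.append(ChainEntry(user_id, prev.location, prev.end,
                                    stay.location, stay.start, 1))
        # A stay starts and ends at the same location.
        chain.append(ChainEntry(user_id, stay.location, stay.start,
                                stay.location, stay.end, 0))
    return chain

chain = build_trip_chain("u1", [Stay((0, 0), 0, 3600), Stay((2000, 0), 5400, 9000)])
print([entry.status for entry in chain])
```

Two stays therefore produce three entries: a stay, the trip between them, and the second stay.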
Step S8: repeat steps S5 to S7 until the data to be analyzed of all users has been processed.
In this embodiment, each time step S8 repeats steps S5 to S7, step S5 selects the user data group to be analyzed that follows the current user data group in user-identifier order as the new current user data group, so that all groups are traversed.
Operation and effects of the embodiment
According to the improved mobile phone signaling data trip identification method based on a dynamic spatial threshold provided by this embodiment, the dynamic threshold of each base station is computed by the dynamic threshold calculation method, and the mobile phone signaling data is identified according to this dynamic threshold and the preset time threshold. The individual trip chain of each user can therefore be obtained more accurately, and the error caused by the uneven distribution density of base stations in a real city is reduced, so that subsequent behavior analysis and urban planning based on individual trip chains are better supported and the analysis results agree more closely with reality. By using a dynamic threshold, the present invention resolves the identification error caused by a fixed spatial threshold: more trips are identified downtown, which remedies under-identification in the center, and fewer trips are identified in the suburbs and new towns, which remedies over-identification in the suburbs.
In the embodiment, the dynamic threshold is computed by the k-sector method. With k = 6, the accuracy of identifying user trips with the resulting dynamic threshold is about 11.7% higher than with the existing fixed spatial threshold, which is the best result among the methods verified in the present invention.
In the embodiment, duplicate records, ping-pong handover data, and drift data are removed from the mobile phone signaling data to obtain the data to be analyzed. This further reduces the errors that arise when identifying individual trip chains from mobile phone signaling data and improves the final analysis results.
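The cleaning of one user's records can be sketched as below. The minimum interval and the speed limit are left as parameters, and the values in the usage example are purely illustrative. The sketch also simplifies in two assumed ways: it works in a single pass that compares each record with the last record kept, and it drops the later record of a pair judged to be drift. The function name is hypothetical.

```python
import math

def clean_user_records(records, min_interval_s, max_speed_m_per_s):
    """records: iterable of (timestamp_s, x_m, y_m) tuples for one user."""
    ordered = sorted(set(records))        # drop exact duplicates, sort by time
    cleaned = []
    for t, x, y in ordered:
        if cleaned:
            pt, px, py = cleaned[-1]
            dt = t - pt
            if dt <= 0 or dt < min_interval_s:
                continue                  # ping-pong handover: drop the later record
            if math.hypot(x - px, y - py) / dt > max_speed_m_per_s:
                continue                  # implausible speed: treat as drift
        cleaned.append((t, x, y))
    return cleaned

# Illustrative input: one duplicate, one record 0.5 s after another, one far jump.
raw = [(0, 0, 0), (0, 0, 0), (0.5, 300, 0), (60, 100, 0), (120, 50000, 0), (180, 200, 0)]
print(clean_user_records(raw, min_interval_s=1.0, max_speed_m_per_s=50.0))
```

Running the three removals as separate passes, as the method describes, would give the same result on this input; the single pass is only a compact way to show the three rules together.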
In the embodiment, the data to be analyzed that has been read is held in a temporary data set, and the base station location of each data item is checked against a circle whose diameter is the dynamic threshold. This makes it easier to judge whether a user has stayed in one area for a long time, so the generated individual trip chain is more accurate.
In the embodiment, the dynamic threshold can also be computed with the k-nearest-neighbor algorithm. Although this method is not optimal, with k = 15 the accuracy of identifying user trips is still 5.06% higher than with the fixed threshold used previously.
The above embodiment merely illustrates a specific implementation of the present invention, and the present invention is not limited to the scope described in the above embodiment.

Claims (8)

1. An improved mobile phone signaling data trip identification method based on a dynamic spatial threshold, for identifying the trip behavior of each user from the mobile phone signaling data collected by each mobile phone base station so as to obtain the individual trip chain of the user, characterized by comprising the following steps:
Step S1: obtain the mobile phone signaling data, which includes at least a user identifier, a timestamp, and the base station location of the corresponding mobile phone base station;
Step S2: clean the mobile phone signaling data to obtain data to be analyzed;
Step S3: according to the base station locations, determine the dynamic threshold D corresponding to each mobile phone base station in turn, using a preset dynamic threshold calculation method;
Step S4: classify the data to be analyzed by the user identifier to obtain a user data group to be analyzed corresponding to each user;
Step S5: select one of the user data groups to be analyzed, in order of the user identifier, as the current user data group;
Step S6: read the data to be analyzed in the current user data group in chronological order, and determine the stay points of the current user according to the corresponding dynamic threshold D and a preset time threshold T;
Step S7: generate the individual trip chain of the current user from the stay points and the current user data, and store it;
Step S8: repeat steps S5 to S7 until the data to be analyzed of all users has been processed.
2. The improved mobile phone signaling data trip identification method based on a dynamic spatial threshold according to claim 1, characterized in that:
the dynamic threshold calculation method is the k-sector method:
take the base station location of the mobile phone base station currently being computed as the current location, divide the plane into k sectors centered on the current location, select in each sector the mobile phone base station nearest to the current location so as to obtain k neighbor base stations, compute the distance from each neighbor base station to the current location, and take the maximum of the distances as the spatial threshold of the mobile phone base station currently being computed.
3. The improved mobile phone signaling data trip identification method based on a dynamic spatial threshold according to claim 2, characterized in that:
the value of k is 6.
4. The improved mobile phone signaling data trip identification method based on a dynamic spatial threshold according to claim 1, characterized in that:
step S2 comprises the following sub-steps:
Step S2-1: classify all the mobile phone signaling data by the user identifier to obtain a plurality of user signaling data groups;
Step S2-2: read each user signaling data group in turn, and remove the repeatedly recorded mobile phone signaling data in each user signaling data group as duplicate record data;
Step S2-3: read each user signaling data group in turn, and judge whether the timestamps of any two mobile phone signaling data items in the group are less than 1 s apart; if so, remove the later mobile phone signaling data item as ping-pong handover data;
Step S2-4: read each user signaling data group in turn, and judge according to a drift determination method whether the mobile phone signaling data in the group contains drift data; if so, remove the drift data,
the drift determination method being: sort the mobile phone signaling data in chronological order, compute a speed from the distance between the base station locations of two adjacent mobile phone signaling data items and the difference between their timestamps, and judge whether the speed is higher than 60 km/s; if so, the data is determined to be drift data;
Step S2-5: merge the user signaling data groups obtained after the reading and removal in steps S2-2 to S2-4 to form the data to be analyzed.
5. The improved mobile phone signaling data trip identification method based on a dynamic spatial threshold according to claim 1, characterized in that:
step S6 comprises the following sub-steps:
Step S6-1: sort the data to be analyzed in the current user data group in chronological order;
Step S6-2: read the next data item to be analyzed from the current user data group and add it to a temporary data set;
Step S6-3: judge whether the base station location of any data item to be analyzed in the temporary data set falls outside a circular range whose diameter is the dynamic threshold D; if not, return to step S6-2; if so, go to step S6-4;
Step S6-4: judge whether the time difference between the earliest and the latest data items to be analyzed in the temporary data set is greater than the time threshold T; if not, empty the temporary data set and return to step S6-2; if so, go to step S6-5;
Step S6-5: store the temporary data set as a stay point of the current user, and empty the temporary data set;
Step S6-6: repeat steps S6-2 to S6-5 until all data to be analyzed in the current user data group has been read.
6. The improved mobile phone signaling data trip identification method based on a dynamic spatial threshold according to claim 1, characterized in that:
the dynamic threshold calculation method is the k-nearest-neighbor algorithm:
take the base station location of the mobile phone base station currently being computed as the current location, select the k mobile phone base stations nearest to the current location as neighbor base stations, compute the distance from each neighbor base station to the current location, and take the maximum of the distances as the spatial threshold of the mobile phone base station currently being computed.
7. The improved mobile phone signaling data trip identification method based on a dynamic spatial threshold according to claim 6, characterized in that:
the value of k is 15.
8. The improved mobile phone signaling data trip identification method based on a dynamic spatial threshold according to claim 1, characterized in that:
the time threshold T is 20 minutes.
CN201910529806.2A 2019-06-19 2019-06-19 Based on the improved mobile phone signaling data trip recognition methods of dynamic space threshold value Pending CN110312206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910529806.2A CN110312206A (en) 2019-06-19 2019-06-19 Based on the improved mobile phone signaling data trip recognition methods of dynamic space threshold value

Publications (1)

Publication Number Publication Date
CN110312206A true CN110312206A (en) 2019-10-08

Family

ID=68076947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910529806.2A Pending CN110312206A (en) 2019-06-19 2019-06-19 Based on the improved mobile phone signaling data trip recognition methods of dynamic space threshold value

Country Status (1)

Country Link
CN (1) CN110312206A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105142106A (en) * 2015-07-29 2015-12-09 西南交通大学 Traveler home-work location identification and trip chain depicting method based on mobile phone signaling data
CN107277765A (en) * 2017-05-12 2017-10-20 西南交通大学 A kind of mobile phone signaling track preprocess method based on cluster Outlier Analysis
CN107909098A (en) * 2017-11-09 2018-04-13 苏州大成电子科技有限公司 A kind of city dweller's anchor point computational methods based on big data
CN109104694A (en) * 2018-06-26 2018-12-28 重庆市交通规划研究院 A kind of user stop place discovery method and system based on mobile phone signaling
CN109492704A (en) * 2018-11-23 2019-03-19 济南浪潮高新科技投资发展有限公司 A kind of dynamic classifier chain method of adjustment for multiple labeling classification

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111132046A (en) * 2019-12-27 2020-05-08 江苏欣网视讯软件技术有限公司 Heuristic signaling data noise point filtering algorithm
CN113469600A (en) * 2020-03-31 2021-10-01 北京三快在线科技有限公司 Travel track segmentation method and device, storage medium and electronic equipment
CN113891378A (en) * 2020-07-02 2022-01-04 ***通信集团安徽有限公司 Method and device for calculating coverage area of base station signal and calculating equipment
CN113891378B (en) * 2020-07-02 2023-09-05 ***通信集团安徽有限公司 Method and device for calculating signal coverage of base station and calculating equipment
CN112351394A (en) * 2020-11-03 2021-02-09 崔毅 Traffic travel model construction method based on mobile phone signaling data
CN113347574A (en) * 2021-06-03 2021-09-03 中国联合网络通信集团有限公司 Method and device for determining permanent station
CN113347574B (en) * 2021-06-03 2023-04-07 中国联合网络通信集团有限公司 Method and device for determining ordinary station
CN114979995A (en) * 2022-05-23 2022-08-30 智慧足迹数据科技有限公司 Mobile phone signaling data simplifying method and device, electronic equipment and storage medium
CN117119387A (en) * 2023-10-25 2023-11-24 北京市智慧交通发展中心(北京市机动车调控管理事务中心) Method and device for constructing user travel chain based on mobile phone signaling data
CN117119387B (en) * 2023-10-25 2024-01-23 北京市智慧交通发展中心(北京市机动车调控管理事务中心) Method and device for constructing user travel chain based on mobile phone signaling data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191008
