A big-data-based method for extracting traffic travel characteristic data
Technical field
The present invention relates to a method for extracting traffic travel characteristic data, and in particular to a big-data-based method for extracting traffic travel characteristic data. It belongs to the field of traffic big-data analysis and application.
Background
In recent years, with the rapid development of China's economy, urban infrastructure construction has boomed and land use patterns have changed at an accelerating pace. With the adoption of advanced means of transport and information-based traffic management tools, both traffic infrastructure and traffic operating conditions are changing rapidly. Under these circumstances, obtaining resident travel characteristic data through traditional household travel surveys can no longer meet the needs of modern transportation planning and management, whether in terms of cost, accuracy, or timeliness. New techniques are therefore urgently needed that can obtain resident travel characteristic data frequently, at low cost, and automatically.
With the accumulation of traffic big data and the growing maturity of big-data technology, it has become possible to obtain resident travel data by fusing multi-source data while reasonably protecting privacy. Using big-data methods in place of traditional surveys to analyze travel characteristics has become a major direction of recent research.
Domestic theoretical research has already explored travel analysis based on mobile phone data, floating car data, and similar sources. Research results on obtaining origin-destination (OD) matrix data mainly include: matching mobile phones and their base station data to roads, automatically identifying the attributes of areas and base stations, and quickly delineating urban traffic zones; extracting individual trip trajectory chains from mobile phone data and, after sample expansion, obtaining the commuting OD matrix of people within a given coverage area; judging the displacement state of each sample from mobile phone data to obtain the origin and destination of each sampled trip, then aggregating by category to form OD matrix data; and identifying user dwell points from the time differences and event types of uploaded phone records, then computing the travel volumes between traffic zones and the trip production and attraction of each zone from the trips between consecutive dwell points.
For other resident travel characteristic data, existing approaches include: organizing and managing map data, mobile phone location data, and floating vehicle data to build a spatial analysis model of population travel characteristics based on mobile phone location data and floating vehicle data, yielding integrated population travel information; and matching mobile phone location data to traffic zones to determine dwell points, trip distance, trip speed, and the trip record tables of all users, then compiling residence and workplace result tables to obtain user travel characteristic parameters.
The above methods for obtaining OD matrix data and other traffic travel characteristic data have the following shortcomings:
1) They do not establish a basic unit of travel analysis that reflects the characteristics of each data type. Traditional traffic zone division is still used, and the zone division used for mobile phone data analysis gives insufficient consideration to the needs of traffic analysis. Compared with traditional surveys, these methods have clear advantages in frequency and cost but fall short in data usability.
2) They do not account for the limitations of single-source mobile phone data, such as low precision and large screening errors. The resulting trip chains are identified only at the traffic zone level, which cannot meet the needs of road-level traffic demand assignment in travel analysis.
3) Mobile phone data cannot be mapped directly to the actual population in travel analysis and requires model processing based on correlation analysis; simple sample expansion by mobile phone ownership introduces large errors.
4) Methods for identifying workplaces and residences give little consideration to real work and home-life behavior. For example, to identify a residence, a nighttime window is set, and if the share of a user's signaling records at a certain base station cell within that window exceeds 50%, that cell is taken as the residence. Although simple to implement, this approach does not relate the event types in mobile phone data to people's actual behavior, and therefore cannot further mine workplace and residence information.
Summary of the invention
To address the above defects of the prior art, the invention discloses a big-data-based method for extracting traffic travel characteristic data. The method takes mobile phone data as its core, supplemented by GPS data, loop detector data, video data, and the like, and obtains resident travel characteristic data after data fusion. The travel analysis comprises traffic zone division, trip route identification, OD analysis (origins and destinations, inter-zone travel volumes, and the total trip production and attraction of each zone), and jobs-housing analysis of traffic zones. The method mainly comprises the following steps:
Step 1, dividing the basic units of travel analysis: traffic zones are divided by combining administrative divisions, base station geographic attributes, and the like;
Step 2, data collection, processing, and fusion: this covers the collection of mobile phone signaling data (user ID, event type, base station ID, base station latitude/longitude, and upload time), GPS data, and video- and loop-based section flow data; the processing of mobile phone data; and the fusion of multi-source data;
Step 3, trip chain identification and analysis of pass-through and dwell points: with mobile phone data as the core, travel information is refined using the other data sources, trip chains are identified, and pass-through points and dwell points are analyzed on that basis;
Step 4, traffic OD analysis: OD results are produced from mobile phone data combined with population data and the like;
Step 5, jobs-housing analysis: all trip chains within a week are analyzed together with power-on/off data and call data to identify residences and workplaces and to carry out the jobs-housing analysis.
The specific technical scheme of the invention is as follows:
Step 1, traffic zone division
The traffic zone division method of the invention divides traffic zones by combining administrative divisions, base station geographic attributes, and the like. The specific procedure is as follows:
Step 1.1, match base station latitude/longitude information against basic traffic geographic information, mapping the base stations onto the road network;
Step 1.2, read from the database the polygon geographic information of the traffic zones divided according to administrative divisions;
Step 1.3, based on the geographic relationship between each traffic zone and each base station, i.e., point-in-polygon containment in the plane, assign each base station to the traffic zone that contains it, establishing the membership between base stations and traffic zones;
Step 1.4, for a base station lying at the edge of traffic zones, determine its membership by the area-proportion method: for each traffic zone, compute the ratio of the edge station's coverage area lying within that zone to its total coverage area, and assign the edge station entirely to the zone with the largest ratio.
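A minimal sketch of Steps 1.3 and 1.4 follows. It assumes that base station and zone polygon coordinates have already been projected to a planar system, and that the share of each edge station's coverage area falling in each zone has been precomputed (for example, in a GIS); the names `point_in_polygon`, `assign_station`, and `coverage_share` are hypothetical. Containment uses the standard ray-casting test.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # planar (x, y), e.g. projected metres


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: inside if a rightward ray crosses the boundary an odd number of times."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_station(
    station_xy: Point,
    zones: Dict[str, List[Point]],
    coverage_share: Optional[Dict[str, float]] = None,
) -> Optional[str]:
    """Return the zone ID a base station belongs to (hypothetical helper)."""
    # Step 1.4: an edge station whose coverage spans several zones is assigned
    # wholly to the zone holding the largest share of its coverage area.
    if coverage_share and sum(1 for s in coverage_share.values() if s > 0) > 1:
        return max(coverage_share, key=coverage_share.get)
    # Step 1.3: otherwise use point-in-polygon containment.
    for zone_id, polygon in zones.items():
        if point_in_polygon(station_xy, polygon):
            return zone_id
    return None


zones = {
    "Z1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Z2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(assign_station((5, 5), zones))                           # Z1
print(assign_station((10, 5), zones, {"Z1": 0.3, "Z2": 0.7}))  # Z2
```

Ray casting resolves points lying exactly on a shared boundary arbitrarily, which is one reason edge stations are better handled by the area rule.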
Step 2, data collection, processing, and fusion
The big data used by the invention include mobile phone signaling data (user ID, event type, base station ID, base station latitude/longitude, and upload time), GPS data, and video- and loop-based section flow data. The specific collection, processing, and fusion procedure is as follows:
Step 2.1, collect the event type data of mobile phone users from the database, and screen the users' event records user by user;
Step 2.2, collect existing dynamic and static traffic data such as GPS data, video data, and loop detector section flow data;
Step 2.3, filter the collected mobile phone data: remove abnormal users whose event information is anomalous, and correct the effects of base stations with overlapping coordinates and of "ping-pong handover" between neighboring base stations. Working per user, apply a consecutive-point distance check: merge base stations with overlapping coordinates, and use a set threshold to consolidate the multiple location points that jump back and forth between adjacent base stations because of signal drift. Specifically, for each user, starting from location point 0, compute the distance between each pair of consecutive location points; if the distance is below the threshold, merge the later point into the earlier one, replace the later point's base station ID with that of the earlier point, and add 1 to the earlier point's communication count;
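The consecutive-point merge in Step 2.3 can be sketched as follows. The 500 m threshold, the `LocPoint` fields, and the use of the haversine great-circle distance are assumptions for illustration; the source specifies only "a set threshold". Each later point within the threshold of the retained earlier point is absorbed into it, keeping the earlier base station ID and adding 1 to its communication count.

```python
import math
from dataclasses import dataclass, replace
from typing import List


@dataclass
class LocPoint:
    t: float        # upload time in seconds
    station: str    # base station ID
    lat: float      # base station latitude
    lon: float      # base station longitude
    event: int      # signaling event type
    count: int = 1  # communications merged into this point


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth (R = 6,371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def merge_close_points(points: List[LocPoint], threshold_m: float = 500.0) -> List[LocPoint]:
    """Absorb each later point lying within threshold_m of the retained earlier point."""
    if not points:
        return []
    merged = [replace(points[0])]  # copy so the caller's records are not mutated
    for p in points[1:]:
        prev = merged[-1]
        if haversine_m(prev.lat, prev.lon, p.lat, p.lon) < threshold_m:
            # Overlapping station or ping-pong jump: the later point takes the
            # earlier station ID, i.e. it is folded into prev, whose count grows.
            prev.count += 1
        else:
            merged.append(replace(p))
    return merged


pts = [
    LocPoint(0, "A", 31.0, 121.0, 2),
    LocPoint(60, "B", 31.0009, 121.0, 2),  # about 100 m from A
    LocPoint(120, "A", 31.0, 121.0, 2),    # ping-pong back to A
    LocPoint(600, "C", 31.05, 121.0, 2),   # about 5.6 km away
]
print([(p.station, p.count) for p in merge_close_points(pts)])  # [('A', 3), ('C', 1)]
```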
Step 2.4, for the filtered normal users, rearrange the event records of each user in chronological order and extract the geographic location of the base station corresponding to each communication;
Step 2.5, fuse and manage the cleaned mobile phone data together with the GPS data, video data, and loop detector section flow data.
Step 3, trip chain identification and analysis of pass-through and dwell points
Over a day, a traveler passes through or stays at various locations, and these positions are reflected in mobile phone location data. For any user, the location points passed through or stayed at during a day fall into two classes, pass-through points and dwell points, which describe the user's state at each point. A pass-through point is a point the traveler passes briefly while moving through space; a dwell point is a point where the traveler stays for a longer time (in the invention, a point with a stay of more than 1 hour is treated as a user dwell point and is regarded as the origin or destination of a trip). By processing mobile phone signaling data, dwell points and the traffic zones containing them can be determined; combined with upload times, each traveler's daily trip chain can then be obtained. Statistical analysis of all daily trips of all mobile phone users yields the number of trips between each pair of zones for the sample of phone-carrying travelers, i.e., the OD matrix, as well as the total trip production and attraction of each zone. The specific procedure is as follows:
Step 3.1, sort each normal user's mobile phone location data by time to obtain that user's daily trip chain;
Step 3.2, identify dwell points from time differences: for each normal user, process each pair of adjacent mobile phone location records in trip chain order and compute their time difference; if the difference exceeds 1 hour, that position is judged to be a dwell point;
Step 3.3, identify dwell points from event type: in standby mode, a mobile phone reports its position to the same base station once every 1-2 hours; this is a periodic location update and carries the corresponding event type. Since the invention treats a point with a stay of more than 1 hour as a user dwell point, any location point that has only one communication in the time-ordered sequence is also judged to be a dwell point if its uploaded event type is 1 (periodic location update);
Step 3.4, once the dwell points in a trip chain have been identified, all remaining points are pass-through points.
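Steps 3.1 to 3.4 can be sketched for one user's day as shown below. The `Record` fields and the event code `PERIODIC_UPDATE = 1` follow Step 3.3; treating the earlier record of a pair separated by more than 1 hour as the dwell point is an interpretation, since the source says only "that position". The names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

ONE_HOUR_S = 3600
PERIODIC_UPDATE = 1  # event type of a periodic location update (Step 3.3)


@dataclass
class Record:
    t: float        # upload time in seconds
    zone: str       # traffic zone of the serving base station
    event: int      # signaling event type
    count: int = 1  # communications merged into this point in Step 2.3


def classify_points(records: List[Record]) -> List[Tuple[Record, str]]:
    """Label each point of one user's daily trip chain as 'dwell' or 'pass'."""
    chain = sorted(records, key=lambda r: r.t)  # Step 3.1: order by time
    labelled = []
    for i, rec in enumerate(chain):
        # Step 3.2: more than 1 hour elapses before the next record.
        long_gap = i + 1 < len(chain) and chain[i + 1].t - rec.t > ONE_HOUR_S
        # Step 3.3: a single communication that is a periodic location update.
        periodic_only = rec.count == 1 and rec.event == PERIODIC_UPDATE
        # Step 3.4: everything else is a pass-through point.
        labelled.append((rec, "dwell" if long_gap or periodic_only else "pass"))
    return labelled


day = [Record(5000, "Z4", 1), Record(4600, "Z3", 2), Record(4000, "Z2", 2), Record(0, "Z1", 2)]
print([(r.zone, label) for r, label in classify_points(day)])
# [('Z1', 'dwell'), ('Z2', 'pass'), ('Z3', 'pass'), ('Z4', 'dwell')]
```

The last record of the day has no successor, so in this sketch it becomes a dwell point only through the event-type rule; the source does not say how that case is handled.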
Step 4, calculation of inter-zone OD travel volumes and zone trip production and attraction
Step 4.1, sort all identified dwell points in chronological order. For each pair of consecutive dwell points, count one trip between the origin and destination zones they belong to, count one trip production for the former zone, and count one trip attraction for the latter zone. Finally, sum the inter-zone OD trip counts, zone trip productions, and zone trip attractions over all users to obtain the inter-zone OD travel volumes and the total trip data of each zone for the sample of phone-carrying travelers.
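A sketch of the counting in Step 4.1, assuming each user's dwell points have already been mapped to zone IDs and sorted by time; `od_counts` is a hypothetical name. Pairs whose two dwell points fall in the same zone are counted as intra-zonal trips, since the source does not exclude them.

```python
from collections import Counter
from typing import Dict, List, Tuple


def od_counts(dwell_zones_by_user: Dict[str, List[str]]) -> Tuple[Counter, Counter, Counter]:
    """Aggregate sample OD trips, zone trip productions and zone trip attractions."""
    od: Counter = Counter()
    production: Counter = Counter()
    attraction: Counter = Counter()
    for zones in dwell_zones_by_user.values():
        # Each pair of consecutive dwell points is one trip from o to d.
        for o, d in zip(zones, zones[1:]):
            od[(o, d)] += 1
            production[o] += 1
            attraction[d] += 1
    return od, production, attraction


od, prod, attr = od_counts({"u1": ["A", "B", "A"], "u2": ["A", "B"]})
print(od[("A", "B")], prod["A"], attr["A"])  # 2 2 1
```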
Step 4.2, compute the expansion factor of each traffic zone from its population, mobile phone ownership, and mobile phone market share; then calibrate and optimize the expansion factor using existing static and dynamic urban traffic data, including GPS data and video- and loop-based section flow data, to obtain accurate inter-zone OD travel volumes and zone trip production and attraction. The expansion factor is computed as: traffic zone population / (mobile phone ownership / mobile phone market share).
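The expansion factor of Step 4.2 is transcribed literally below. The calibration against GPS, video, and loop detector data is not specified in the source and is therefore omitted; the function names and the input checks are hypothetical additions.

```python
from typing import Dict


def expansion_factor(population: float, phone_ownership: float, market_share: float) -> float:
    """Step 4.2 formula as stated: population / (ownership / market share)."""
    if phone_ownership <= 0 or not 0 < market_share <= 1:
        raise ValueError("ownership must be positive and market share in (0, 1]")
    return population / (phone_ownership / market_share)


def expand_zone_counts(sample: Dict[str, float], factors: Dict[str, float]) -> Dict[str, float]:
    """Scale each zone's sample count (e.g. trip production) by that zone's factor."""
    return {zone: n * factors[zone] for zone, n in sample.items()}


f = expansion_factor(50_000, 20_000, 0.5)      # 50000 / 40000
print(f)                                       # 1.25
print(expand_zone_counts({"A": 8}, {"A": f}))  # {'A': 10.0}
```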
Step 5, residence and workplace identification
Step 5.1, based on all matched trip chains, extract one user's data over a continuous month and count the number of times each traffic zone appears within the residence-identification time window; the zone with the most occurrences is the user's residence;
Step 5.2, based on all matched trip chains, extract one user's workday data over a continuous month and count the number of times each traffic zone appears within the workplace-identification time window; the zone with the most occurrences is the user's workplace;
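Steps 5.1 and 5.2 reduce to a mode count over a time window. The source does not give the windows, so the 22:00 to 06:00 nighttime window and the 09:00 to 17:00 weekday window below are placeholder assumptions, as are the function names.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Iterable, Optional, Tuple


def is_night(ts: datetime) -> bool:
    """Placeholder residence window: 22:00 to 06:00, any day."""
    return ts.hour >= 22 or ts.hour < 6


def is_workday_daytime(ts: datetime) -> bool:
    """Placeholder workplace window: 09:00 to 17:00, Monday to Friday."""
    return ts.weekday() < 5 and 9 <= ts.hour < 17


def most_frequent_zone(
    records: Iterable[Tuple[datetime, str]], window: Callable[[datetime], bool]
) -> Optional[str]:
    """Return the zone that appears most often inside the window, or None."""
    counts = Counter(zone for ts, zone in records if window(ts))
    return counts.most_common(1)[0][0] if counts else None


month = [
    (datetime(2024, 3, 4, 23), "H"),
    (datetime(2024, 3, 5, 2), "H"),
    (datetime(2024, 3, 5, 10), "W"),
    (datetime(2024, 3, 5, 23), "W"),
    (datetime(2024, 3, 9, 10), "X"),  # Saturday, outside the workday window
]
print(most_frequent_zone(month, is_night))            # H
print(most_frequent_zone(month, is_workday_daytime))  # W
```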
Step 5.3, take account of people's usual behavior in daily life: during nighttime rest, phones are more commonly switched off, while during daytime working hours calls are more frequent. Therefore, to identify the residence, power-off signaling records are weighted against other types of signaling records, and a place whose weighted share of one week's nighttime data exceeds 50% is the residence; to identify the workplace, call signaling records are weighted against other types of signaling records, and a place whose weighted share of one week's daytime data exceeds 50% is the workplace. This improves the system's accuracy in identifying workplaces and residences.
The weighted calculation formula is:
where b is the share of one week's records that fall in a given traffic zone, A is the weight of a given mobile phone signaling event type, and N indexes the event types of the mobile phone signaling data.
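Because the weighted formula itself is not reproduced in the source, the sketch below assumes one plausible form consistent with the definitions of b, A, and N: for zone z, b_z = Σ_N A_N · n_{N,z} / Σ_N A_N · n_N, where n_{N,z} is the number of records of event type N in zone z within the time window and n_N is the total over all zones. The event codes, the weights, and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

POWER_OFF = 3  # hypothetical event code for power-off signaling
CALL = 4       # hypothetical event code for call signaling


def weighted_zone_shares(
    records: Iterable[Tuple[str, int]], weights: Dict[int, float], default_weight: float = 1.0
) -> Dict[str, float]:
    """Assumed form of b: weighted record count per zone over the weighted total."""
    per_zone: Dict[str, float] = defaultdict(float)
    for zone, event in records:
        per_zone[zone] += weights.get(event, default_weight)  # A_N for this event type
    total = sum(per_zone.values())
    return {zone: w / total for zone, w in per_zone.items()} if total else {}


def identify_place(
    records: Iterable[Tuple[str, int]], weights: Dict[int, float], threshold: float = 0.5
) -> Optional[str]:
    """Return the zone whose weighted share b exceeds the threshold, if any."""
    shares = weighted_zone_shares(records, weights)
    if not shares:
        return None
    zone, b = max(shares.items(), key=lambda kv: kv[1])
    return zone if b > threshold else None


nights = [("H", POWER_OFF), ("H", 2), ("X", 2), ("X", 2)]
print(identify_place(nights, {}))                # None: unweighted share is exactly 0.5
print(identify_place(nights, {POWER_OFF: 3.0}))  # H: weighted share 4/6
```

For workplace identification the same function would be applied to one week's daytime records with a larger weight on `CALL` instead of `POWER_OFF`.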
Description of the drawings
Fig. 1 is a flowchart of the method of the invention for obtaining resident travel characteristic data
Fig. 2 is a flowchart of traffic zone division according to the invention
Fig. 3 is a flowchart of mobile phone data preprocessing according to the invention
Fig. 4 is a flowchart of trip chain identification according to the invention
Fig. 5 is a flowchart of calculating inter-zone travel volumes (OD matrix data) according to the invention
Fig. 6 is a flowchart of residence and workplace identification according to the invention
Specific embodiments
The specific embodiment of the invention follows Steps 1 to 5 of the technical scheme set out above; the corresponding flowcharts are shown in Figs. 1 to 6.