CN109035770B - Real-time analysis and prediction method for bus passenger capacity in big data environment - Google Patents

Real-time analysis and prediction method for bus passenger capacity in big data environment

Info

Publication number
CN109035770B
Authority
CN
China
Prior art keywords
bus
getting
card swiping
station
icid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810860244.5A
Other languages
Chinese (zh)
Other versions
CN109035770A (en
Inventor
张颖
顾高翔
刘杰
吴佳玲
郭鹏
朱万明
宫龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI SHIMAI INFORMATION TECHNOLOGY CO LTD
Original Assignee
SHANGHAI SHIMAI INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI SHIMAI INFORMATION TECHNOLOGY CO LTD
Priority to CN201810860244.5A
Publication of CN109035770A
Application granted
Publication of CN109035770B

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions

Landscapes

  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)
  • Devices For Checking Fares Or Tickets At Control Points (AREA)

Abstract

A method for real-time analysis and prediction of bus passenger load in a big data environment. Vehicle-mounted GPS data are used to extract the space-time trajectory of each bus, and the bus route is segmented by time period to obtain the road section where the bus is located at each moment. Card swiping records give the time at which a user boards a bus; matching that time against the GPS data of the bus yields the boarding station, from which the travel behavior characteristics of individual riders are extracted. The stops of the bus lines are spatially clustered and possible transfer points are merged, so that transfer information is not lost merely because stops carry different names. According to the card swiping mode of each bus line, the probability that an individual takes a given bus line in a given time period, and the probability distribution of alighting at each station along the line after boarding, are computed statistically. Finally, historical passenger-load data for the buses of each bus line are compiled by time period and road section and used for prediction.

Description

Real-time analysis and prediction method for bus passenger capacity in big data environment
Technical Field
The invention relates to a method for predicting bus passenger volume by time period and road section, based on massive encrypted records of bus code scanning or card swiping (hereinafter, all bus fare payment modes, such as IC card swiping, two-dimensional code scanning on a smart terminal, and NFC card swiping, are referred to as card swiping). The method obtains the real-time spatial position of each bus from its GPS data, extracts the basic characteristics of individual bus travel from the card swiping records, constructs a probability distribution matrix, by time period, of each individual taking a given bus line, and predicts the station at which the individual alights from that line. This yields the likely O-D distribution of individual bus trips and, from it, the passenger load and overload condition of buses by time period and road section, providing data support for the scheduling and optimization of bus lines.
Background
With accelerating urbanization, public transport, as an important means of transport connecting the nodes of the urban hub-network structure, is one of the most important choices for the daily trips of urban residents and an important guarantee for the normal daily operation of the city. As built-up areas continue to expand, public transport systems become increasingly complex, the passenger flow pressure they bear keeps rising, and that pressure varies markedly over time. Within a day, the morning and evening peaks are when passenger flow pressure on the bus system is highest, and pressure in the remaining time periods is relatively low; over the long term, as cities keep expanding, bus lines that originally carried few passengers come under greatly increased pressure as large numbers of people move in. Both short-term and long-term changes in passenger flow demand therefore call for reasonable configuration and optimization of bus routes and time-divided shifts, which in turn requires real-time monitoring of bus passenger flow demand and accurate prediction based on it.
In recent years, with the development of information technology, the volume of data has grown explosively and data sources have become ever more numerous. Data recorded by information sensors such as mobile phones, WIFI, the Internet of Things, GPS and IC cards have become the most important data source in big data analysis, and the relatively complete individual trip records they contain provide good data support for big data analysis, especially of traffic. Taking bus code scanning or IC cards as an example, the boarding and alighting records of passengers form a series of data sets describing each user's bus travel, providing an important data source for deriving bus passenger flow and extracting how it varies by time period and location.
Disclosure of Invention
The purpose of the invention is to obtain the passenger load and overload condition of buses by time period and road section from massive encrypted bus code scanning or card swiping data.
To achieve this purpose, the general technical scheme of the invention is as follows: vehicle-mounted GPS data are used to extract the space-time trajectory of each bus, and the bus route is segmented by time period to obtain the road section where the bus is located at each moment; card swiping records give the time at which a user boards a bus, and matching that time against the GPS data of the bus yields the boarding station, from which the travel behavior characteristics of individual riders are extracted; the stops of the bus lines are spatially clustered and possible transfer points are merged, so that transfer information is not lost merely because stops carry different names; according to the card swiping mode of each bus line, the probability that an individual takes a given bus line in a given time period, and the probability distribution of alighting at each station along the line after boarding, are computed statistically; and historical passenger-load data for the buses of each bus line are compiled by time period and road section and used for prediction.
Specifically, the invention provides a real-time analysis and prediction method for bus passenger capacity in a big data environment, which is characterized by comprising the following steps:
step 1, obtaining bus GPS data that are continuous in time and space, the bus GPS data comprising at least a bus line number LID, a bus number BID, a communication action occurrence time TIME1, a longitude position Long of the bus and a latitude position Lat of the bus, where different buses correspond to different combinations of bus line number LID and bus number BID; extracting the GPS data of the bus corresponding to each bus number BID within a specified time period to form the travel space-time sequence of each bus;
step 2, obtaining anonymous encrypted bus card swiping record data, one record being generated, in time order, each time a card is swiped; each record comprises at least an IC number ICID, a bus line number LID, a bus number BID, a communication action occurrence time TIME2, a boarding/alighting type TYPE and a bus fare COST; different code scanning terminals or IC cards correspond to different IC numbers ICID, and different IC numbers ICID correspond to different individuals; extracting the card swiping records of each IC number ICID within a specified time period to form the card swiping data set of the individual corresponding to each IC number ICID;
step 3, acquiring the bus stops of all bus lines within a specified spatial range, clustering all the bus stops with a spatial clustering algorithm, and merging bus stops that are spatially close to one another as possible transfer nodes; adjusting the bus stops of each bus line by replacing the pre-clustering stops with the clustered stops, so that the transfer behavior of individuals can be extracted subsequently;
step 4, sorting the card swiping data sets of all individuals obtained in step 2 to obtain the boarding and alighting station information of each individual and form the complete time-ordered card swiping data of each individual; for the current card swiping record in the data set of the current individual, obtaining the spatial position of the bus at the communication action occurrence time TIME2, thereby obtaining the boarding station of the current record, and adding that information to the record;
step 5, dividing the bus card swiping modes into three types: swiping on both boarding and alighting; swiping on boarding only with distance-based (segmented) fares; and swiping on boarding only with a flat fare; and analyzing and determining the time and place of each individual's boarding and alighting according to the card swiping mode:
step 5.1, traversing in turn the complete time-ordered card swiping data of the passenger corresponding to each IC number ICID, reading the card swiping records, and splitting the card swiping time sequence by bus line number LID;
step 5.2, determining the card swiping mode of each bus line appearing under each IC number ICID, compiling the boarding and alighting information of the individual on the current bus line by the method appropriate to that mode, and determining whether the alighting point can be deduced, comprising the following steps:
step 5.2.1, for the current IC number ICID, reading the boarding station, the bus line number LID and the boarding time (that is, the communication action occurrence time TIME2) of each bus trip;
step 5.2.2, estimating the alighting information of the current IC number ICID for each bus trip, using a different method for each card swiping mode:
if the card swiping mode of the current bus line requires swiping on both boarding and alighting, the records for this line from the code scanning terminal or IC card already contain the alighting information of every trip, so all known alighting point records of the current IC number ICID on this line are obtained as follows: if a card swiping record R1 exists on the current bus line and its boarding/alighting type TYPE shows that the corresponding bus stop S1 is a boarding stop, check whether the type TYPE of the next record R2 is alighting; if so, the bus stop S2 corresponding to R2 is the alighting stop, and R1 and R2 are merged and marked as a known alighting point record; if the type TYPE of R2 is boarding, the current IC number ICID did not swipe on alighting in the previous trip, so R1 is discarded and the next record is taken for the same check;
if the card swiping mode of the current bus line is swiping on boarding only with distance-based fares, all records of the current IC number ICID are divided into those for which the alighting point can be deduced and those for which it cannot, as follows:
if the current IC number ICID has a card swiping record R3 on the current bus line during time period T1, with corresponding boarding stop S3, then:
step 5.2.2.1, calculating, from the bus fare COST of record R3, the interval of stations at which the passenger may have alighted;
step 5.2.2.2, reading the record R4 that follows R3; if the boarding stop S4 of R4 lies within the alighting interval calculated in step 5.2.2.1, the current IC number ICID is considered to have alighted from the current bus line at S4, and R3 is marked as a record with a deducible alighting point; if S4 does not lie within that interval, R3 is marked as a record whose alighting point cannot be deduced;
if the card swiping mode of the current bus line is swiping on boarding only with a flat fare, all records of the current IC number ICID are divided into those for which the alighting point can be deduced and those for which it cannot, as follows:
if the current IC number ICID has a card swiping record R5 on the current bus line during time period T1, with corresponding boarding stop S5, the record R6 that follows R5 is read:
if the boarding stop S6 of record R6 is one of the stops of the current bus line downstream of the boarding stop of R5, and the difference between the communication action occurrence time TIME2 of R6 and the time at which the bus of the current line reaches S6 is within the threshold T_Thrh, the current IC number ICID is considered to have alighted at S6, and R5 is marked as a record with a deducible alighting point;
if the boarding stop S6 of record R6 is not one of the stops of the current bus line downstream of the boarding stop of R5, record R5 is marked as a record whose alighting point cannot be deduced;
counting how often the current IC number ICID boards at each stop of the current bus line, these frequencies serving as the basis for the statistics on trips whose alighting point cannot be deduced;
step 6, for the individual corresponding to each IC number ICID, computing the probability of taking each bus line in each time period of each day of the week and the probability of alighting at each station along the way after boarding, thereby obtaining the probability distribution over the O-D road sections of the individual's trips;
step 7, acquiring real-time card swiping data and the GPS data of every bus on every bus line from the data source, identifying the potential passengers riding each bus line, calculating the likely real-time passenger load and crowding degree of every bus on every line from the O-D probability distribution obtained in step 6, and predicting the passenger demand and crowding degree of the buses in a specified future time period.
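The alighting inference of step 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Swipe` record layout, the `"on"`/`"off"` encoding of TYPE, and the helper names `pair_on_off` and `infer_alighting` are assumptions made for illustration, and the fare-implied station interval of step 5.2.2.1 is passed in as a precomputed set rather than derived from COST.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple


@dataclass
class Swipe:
    """One card swiping record, already matched to a station (step 4)."""
    lid: str                    # bus line number LID
    station: str                # station where the swipe happened
    time: float                 # TIME2, in seconds
    type: Optional[str] = None  # "on" / "off": hypothetical encoding of TYPE


def pair_on_off(records: Sequence[Swipe]) -> List[Tuple[Swipe, Swipe]]:
    """Mode 1 (swipe on boarding and alighting): pair R1 with the next record R2."""
    trips, i = [], 0
    while i < len(records) - 1:
        r1, r2 = records[i], records[i + 1]
        if r1.type == "on" and r2.type == "off":
            trips.append((r1, r2))  # known alighting point record
            i += 2
        else:
            i += 1                  # no matching alighting swipe: discard R1
    return trips


def infer_alighting(nxt: Optional[Swipe],
                    downstream: Sequence[str],
                    fare_interval: Optional[Set[str]] = None,
                    bus_arrival: Optional[float] = None,
                    t_thrh: Optional[float] = None) -> Optional[str]:
    """Modes 2 and 3 (swipe on boarding only).

    The boarding stop of the next record is taken as the alighting stop when it
    lies downstream on the line and, if given, inside the fare-implied interval
    (distance-based fares) and within T_Thrh of the bus arrival time (flat fare).
    Returns None when the alighting point cannot be deduced.
    """
    if nxt is None or nxt.station not in downstream:
        return None
    if fare_interval is not None and nxt.station not in fare_interval:
        return None
    if bus_arrival is not None and t_thrh is not None:
        if abs(nxt.time - bus_arrival) > t_thrh:
            return None
    return nxt.station
```

Records for which `infer_alighting` returns `None` correspond to the "alighting point cannot be deduced" class, which step 6 handles with boarding frequencies as a proxy.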
Preferably, in step 1, each GPS record of a bus is one space-time trajectory record; all space-time trajectory records of each bus within the specified time period are queried by its bus line number LID and bus number BID, and the longitude and latitude in those records are converted into geographic coordinates to construct the travel space-time sequence of the bus.
Preferably, in step 2, the bus card swiping record data further comprises a bus fare COST.
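As a rough illustration of step 1, the following Python sketch groups GPS rows by (LID, BID), keeps those inside the specified time window, and sorts them by time. The tuple layout of the rows and the function names are hypothetical, and the equirectangular projection around a reference point (`lon0`, `lat0`) is an assumption standing in for whatever coordinate conversion is actually used.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371000.0


def to_xy(lon, lat, lon0, lat0):
    """Approximate planar coordinates in meters (equirectangular projection)."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def build_sequences(gps_rows, t_start, t_end, lon0, lat0):
    """gps_rows: iterable of (LID, BID, TIME1, Long, Lat).

    Returns {(LID, BID): [(TIME1, x, y), ...]} sorted by time, i.e. one
    travel space-time sequence per bus.
    """
    seqs = defaultdict(list)
    for lid, bid, t, lon, lat in gps_rows:
        if t_start <= t <= t_end:
            seqs[(lid, bid)].append((t, *to_xy(lon, lat, lon0, lat0)))
    for key in seqs:
        seqs[key].sort()  # tuples sort by TIME1 first
    return dict(seqs)
```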
Preferably, the step 3 comprises:
step 3.1, acquiring the bus stops of all bus lines within the specified spatial range together with the position of each stop, converting the positions into XY coordinates, and mapping them into a geographic space containing the traffic lines;
step 3.2, clustering the bus stops with a spatial clustering method, using the traffic distance between stops as the criterion, and merging stops that are very close to one another, comprising the following steps:
step 3.2.1, setting the clustering criterion: the distance between two bus stops is less than d meters;
step 3.2.2, taking each bus stop in turn as a cluster core to obtain spatial clusters: searching for surrounding bus stops centered on the spatial position of the current cluster core, and placing every stop whose traffic distance from the core is less than d meters into the cluster of the current core;
step 3.2.3, merging the spatial clusters obtained in step 3.2.2 into larger, spatially independent clusters of bus stops, the merging condition being that any two spatial clusters sharing a bus stop are merged;
step 3.2.4, extracting the spatial center of each bus stop cluster, whose XY coordinates are the mean of the XY coordinates of all stops in the cluster; mapping the center onto a map to obtain its spatial position and geographical name; and merging the stops of each cluster into a single stop named after the geographical name of the cluster center;
step 3.3, rearranging each bus line by replacing the pre-clustering stops with the clustered stops, so that the transfer behavior of individuals can be extracted subsequently.
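Steps 3.2.2 to 3.2.4 amount to linking stops closer than d meters and merging linked groups transitively. A minimal Python sketch follows, under two stated assumptions: straight-line distance (`math.dist`) stands in for the traffic distance named in the text, and a union-find structure is used as one convenient way to merge clusters that share a stop. The function name `cluster_stops` is hypothetical.

```python
import math


def cluster_stops(stops, d):
    """stops: {name: (x, y)} in meters. Returns [(sorted member names, centroid)]."""
    names = list(stops)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Steps 3.2.2 and 3.2.3: stops within d meters end up in the same cluster,
    # and clusters sharing a stop are merged transitively.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(stops[a], stops[b]) < d:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)

    # Step 3.2.4: the cluster center is the mean of the member coordinates.
    result = []
    for members in groups.values():
        cx = sum(stops[m][0] for m in members) / len(members)
        cy = sum(stops[m][1] for m in members) / len(members)
        result.append((sorted(members), (cx, cy)))
    return result
```

Because merging is transitive, two stops more than d meters apart can still land in one cluster when a third stop links them, which matches the merging condition of step 3.2.3.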
Preferably, in step 4, obtaining the boarding station of the current card swiping record in the data set of the current individual comprises the following steps:
step 4.1, generating Thiessen polygons, following the road traffic network, for the spatially clustered bus stations within the specified spatial range, thereby delimiting the spatial extent of each bus station;
step 4.2, using the communication action occurrence time TIME2, the bus line number LID and the bus number BID of the card swiping record, reading from the GPS data of the corresponding bus its position X-IC, Y-IC at time TIME2;
step 4.3, mapping the position X-IC, Y-IC obtained in step 4.2 onto the station extents generated in step 4.1 to obtain the bus station at which the bus was located at time TIME2, which is the boarding station of the current card swiping record.
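The station matching of steps 4.1 to 4.3 can be sketched in Python as follows. Two simplifications are assumed and are not part of the source: the bus position at TIME2 is taken from the GPS fix nearest in time (no interpolation), and the Thiessen polygon lookup is replaced by a nearest-station search, which is equivalent only for ordinary Euclidean Thiessen (Voronoi) polygons, not for polygons built along the road network. The function names are hypothetical.

```python
import bisect
import math


def position_at(seq, t):
    """seq: non-empty list of (time, x, y) sorted by time.

    Returns the (x, y) of the GPS fix closest in time to t.
    """
    times = [p[0] for p in seq]
    i = bisect.bisect_left(times, t)
    candidates = seq[max(i - 1, 0):i + 1]  # fixes just before and just after t
    best = min(candidates, key=lambda p: abs(p[0] - t))
    return best[1], best[2]


def match_station(xy, stations):
    """stations: {name: (x, y)}. Returns the name of the nearest station."""
    return min(stations, key=lambda name: math.dist(xy, stations[name]))
```

For a card swiping record, `match_station(position_at(seq, TIME2), stations)` then gives the boarding station, where `seq` is the space-time sequence of the bus identified by LID and BID.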
Preferably, the step 6 comprises:
step 6.1, for each IC number ICID, counting the number of times bus line L is taken in a particular time period T of the day over a specified span of days; for a code scanning terminal or IC card whose IC number ICID is IC1, the probability of boarding bus line L at bus station S in time period T is P_U(T, S, L, IC1) = N_U(T, S, L, IC1) / N_Day, where N_U(T, S, L, IC1) is the number of times IC1 boards bus line L at bus station S within time period T of each day over the specified span, and N_Day is the number of days in the specified span;
step 6.2, on the basis of the alighting stops estimated for each trip record in step 5, computing the probability of each individual alighting at each bus station, using a different method for each card swiping mode of the bus line;
if the card swiping mode of bus line L is swiping on both boarding and alighting, counting the number of times N_D(S1, S2, L, IC1) that IC1 alights at each station S2 along the way after boarding at bus station S1 within time period T1; the probability that IC1 alights at bus station S2 after boarding bus line L at bus station S1 within time period T1 is N_D(S1, S2, L, IC1) / N_U(T1, S1, L, IC1), where N_U(T1, S1, L, IC1) is the number of times IC1 boards bus line L at bus station S1 within time period T1;
if the card swiping mode of bus line L is swiping on boarding only with distance-based fares, records with a deducible alighting point and records without one are counted separately; for records whose alighting point cannot be deduced, it is assumed that the bus trips of an IC number ICID are generally continuous, that is, the alighting point of one trip is the boarding point of another trip, so the frequency with which the current IC number ICID boards at each stop of bus line L is used as the basis for these records:
for records with a deducible alighting point, counting the frequency N_D(S3, S4, D, L, IC1) with which IC1 alights at bus station S4 along the way after boarding bus line L at bus station S3 within time period T1; the probability that IC1 rides bus line L from bus station S3 to bus station S4 within time period T1 is N_D(S3, S4, D, L, IC1) / N_U(T1, S3, L, IC1), where N_U(T1, S3, L, IC1) is the number of times IC1 boards bus line L at bus station S3 within time period T1 over the specified span;
for records whose alighting point cannot be deduced, counting the boarding frequency N_U(S4, L, IC1) of IC1 at bus station S4 in its ride history on bus line L; the probability that IC1 alights at bus station S4 on the remaining route after boarding bus line L at bus station S3 within time period T1 is then N_U(S4, L, IC1) / sum(N_U(SN, L, IC1)), where SN is the set of stations on the remaining route of bus line L after boarding at S3, and sum(N_U(SN, L, IC1)) is the total boarding frequency of IC1 over those stations, used in place of the alighting frequency;
if the card swiping mode of bus line L is swiping on boarding only with a flat fare, records with a deducible alighting point and records without one are likewise counted separately:
for records with a deducible alighting point, counting the frequency N_D(S5, S6, L, IC1) with which IC1 alights at bus station S6 along the way after boarding bus line L at bus station S5 within time period T1; the probability that IC1 rides bus line L from bus station S5 to bus station S6 within time period T1 is N_D(S5, S6, L, IC1) / N_U(T1, S5, L, IC1), where N_U(T1, S5, L, IC1) is the number of times IC1 boards bus line L at bus station S5 within time period T1 over the specified span;
for records whose alighting point cannot be deduced, counting the boarding frequency N_U(S6, L, IC1) of IC1 at bus station S6 in its ride history on bus line L; the probability that IC1 alights at bus station S6 on the remaining route after boarding bus line L at bus station S5 within time period T1 is then N_U(S6, L, IC1) / sum(N_U(SN, L, IC1)), where SN is the set of stations on the remaining route of bus line L after boarding at S5, and sum(N_U(SN, L, IC1)) is the total boarding frequency of IC1 over those stations, used in place of the alighting frequency;
step 6.3, from the alighting probabilities computed in step 6.2 for records with and without a deducible alighting point under the three card swiping modes, obtaining, by weighting with the number of records, the probability that IC1 alights at bus station SS after boarding bus line L at bus station S within time period T1:
for bus lines where cards are swiped on both boarding and alighting, every valid record carries complete boarding and alighting information, so the final probability P_D(T1, S, SS, L, IC1) that IC1 alights at bus station SS after boarding at bus station S within time period T1 is the one obtained in step 6.2;
for bus lines where cards are swiped on boarding only, the records fall into two classes, with and without a deducible alighting point; the numbers of records in the two classes, N_C(T1, S, L, IC1) and N_NC(T1, S, L, IC1), are counted, and the total number of records is N(T1, S, L, IC1) = N_C(T1, S, L, IC1) + N_NC(T1, S, L, IC1); the probability that IC1 alights at bus station SS after boarding bus line L at bus station S within time period T1 is P_D(T1, S, SS, L, IC1) = N_D(T1, S, SS, L, IC1) / N_U(T1, S, L, IC1) × N_C(T1, S, L, IC1) / N(T1, S, L, IC1) + N_U(SS, L, IC1) / sum(N_U(SN, L, IC1)) × N_NC(T1, S, L, IC1) / N(T1, S, L, IC1);
step 6.4, traversing all IC numbers ICID to obtain, for each corresponding individual, the probability of boarding bus line L at bus station S within time period T1 and the probability of alighting at each station along the way; extending the calculation to all bus lines at bus station S within time period T1, then to all bus stations within time period T1, and finally to all time periods of the day, so that for each individual the boarding probability for every bus line at every bus station in every time period, and the corresponding alighting probability at each station along the way, are obtained.
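The probabilities of steps 6.1 to 6.3 reduce to ratios of counts. The Python sketch below transcribes the boarding probability P_U = N_U / N_Day and the record-count-weighted alighting probability for boarding-only lines, as those formulas read in the text. The function and parameter names are hypothetical, and returning 0.0 when a denominator is zero is an assumption the source does not address.

```python
def boarding_probability(n_u: int, n_day: int) -> float:
    """P_U(T, S, L, IC1) = N_U(T, S, L, IC1) / N_Day."""
    return n_u / n_day


def alighting_probability(n_d: int, n_u_board: int, n_c: int, n_nc: int,
                          n_u_at_ss: int, n_u_downstream: int) -> float:
    """Weighted P_D(T1, S, SS, L, IC1) for boarding-only card swiping lines.

    n_d            N_D(T1, S, SS, L, IC1): deduced alightings at SS
    n_u_board      N_U(T1, S, L, IC1): boardings at S in T1
    n_c, n_nc      records with / without a deducible alighting point
    n_u_at_ss      N_U(SS, L, IC1): historical boardings at SS (proxy)
    n_u_downstream sum(N_U(SN, L, IC1)) over the remaining stations SN
    """
    n = n_c + n_nc
    if n == 0:
        return 0.0  # assumption: no records means no estimate
    deducible = n_d / n_u_board if n_u_board else 0.0
    proxy = n_u_at_ss / n_u_downstream if n_u_downstream else 0.0
    return deducible * n_c / n + proxy * n_nc / n
```

With 10 boardings at S, of which 8 have a deducible alighting point and 6 of those end at SS, plus a proxy share of 3 out of 4 historical downstream boardings at SS, this gives 0.6 × 0.8 + 0.75 × 0.2.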
Preferably, the step 7 includes:
step 7.1, obtaining from the data source real-time card swiping data and bus GPS data at a time interval TM, and sorting the card swiping records by bus line number LID and IC number ICID;
step 7.2, reading the last card swiping record of each IC number ICID to obtain the bus line number LID, bus number BID and boarding/alighting type TYPE of the current IC number ICID, obtaining its boarding station from the bus GPS data by the method of step 4, and then determining the riding status of the current IC number ICID at the current time according to the card swiping mode of the bus line:
if the current bus line requires swiping on both boarding and alighting and the last card swiping record of the current IC number ICID is an alighting record, the individual is considered not to be on the bus, and the expectation of riding a bus of the current line in real time is 0;
if the current bus line requires swiping on both boarding and alighting and the last card swiping record of the current IC number ICID is a boarding record, the individual is considered to be riding a bus of the current line, and the expectation of riding a bus of the current line in real time is 1;
if the current bus line uses swiping on boarding only, take the time period TU containing the boarding time of the current IC number ICID (the communication action occurrence time TIME2) and the probabilities, obtained in step 6, of alighting at each station after the ICID boards a bus of the current line at bus station SL within that period; obtain from the GPS data the bus station SN at which the current bus is now located, and calculate the probability that the current IC number ICID has not yet alighted at SN, which is 1 minus the sum of the probabilities of alighting at each bus station SM between SL and SN: PT(TR, SN, L, ICID) = 1 - sum(P_D(TU, SL, SM, L, ICID)), where TR denotes real time; the expectation that the current IC number ICID is riding a bus of the current line in real time is E(TR, L, ICID) = PT(TR, SN, L, ICID);
step 7.3, for each bus, summing the real-time riding expectations of all IC numbers ICID whose last card swiping action occurred on that bus; the sum is the expected number of people riding the bus in real time;
step 7.4, predicting the expected passenger demand of each bus line between stops over a specified time period TPJ following the present, from the boarding probabilities of each IC number ICID by time period and stop obtained in step 6, comprising the following steps:
step 7.4.1, from the bus GPS data and the departure schedule, first predicting the departures of each bus line within time period TPJ and the time interval at which each bus stops at each bus station;
step 7.4.2, for each bus, traversing all IC numbers ICID and finding those that have boarding records at any station on the remaining route of the bus within the specified time period TPJ;
step 7.4.3, aiming at each IC number ICID obtained in the step 7.4.2, calculating the expectation of taking the bus at the bus station S in the time period T of each IC number ICID according to the probability P _ U (T, S, L, ICID) of getting on the bus at the bus station S in the time period T, and calculating the expectation of getting off the bus at the bus station S in the time period T of each passenger possibly taking the current bus according to the probability P _ D _ prj (TU, SL, S, L, ICID) of getting off the bus at the bus station S of the passenger already in the bus; in the prediction of the passenger capacity, whether the predicted IC number ICID gets on the bus is probability distribution, which is different from real-time loading statistics, so that the getting-on behavior is the predicted getting-off probability of the passenger at the bus station S, which is the product of the probability of the passenger getting on the bus station SL and the probability of getting off the bus at the bus station S: p _ D _ prj (TU, SL, S, L, ICID) ═ P _ U (TU, SL, L, ICID) × P _ D (TU, SL, S, L, ICID), and the desire for the current IC number ICID to get off at the bus station S is E _ D (T, S, L, ICID) ═ P _ D _ prj (TU, SL, S, L, ICID);
7.4.4, traversing each IC number ICID, summing expectations of the ICID for getting on the bus at the bus station S in the time period T, namely the expected getting-on amount of the current bus at the bus station S, summing expectations of each IC number ICID for getting off the bus at the bus station S, namely the expected getting-off amount of the current bus at the bus station S, and subtracting the two amounts to obtain the expected passenger carrying amount of the current bus in the predicted time period T;
and 7.5, acquiring the number data of the designed load of each bus, comparing the number data with the expected passenger capacity of each bus in the time period, and acquiring the real-time expected congestion degree statistics and prediction results of each bus in the time period.
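The real-time estimate of steps 7.2 to 7.5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the stop labels S1 to S3, the probability values and the design capacity of 50 are all assumptions chosen for illustration.

```python
from typing import Dict, List


def still_on_board_probability(alight_probs: Dict[str, float],
                               stops_passed: List[str]) -> float:
    """PT = 1 - sum of P_D over the stops SM between SL and the current stop SN."""
    p_alighted = sum(alight_probs.get(stop, 0.0) for stop in stops_passed)
    # Clamp so rounding error cannot push the result outside [0, 1].
    return min(1.0, max(0.0, 1.0 - p_alighted))


def projected_alight_probability(p_board: float, p_alight: float) -> float:
    """Step 7.4.3: P_D_prj = P_U * P_D for a rider who has not boarded yet."""
    return p_board * p_alight


def crowding_degree(expected_load: float, design_capacity: int) -> float:
    """Step 7.5: ratio of the expected load to the design capacity."""
    return expected_load / design_capacity


# Hypothetical alighting distribution of one rider on a boarding-only line.
p_d = {"S1": 0.25, "S2": 0.25, "S3": 0.5}
pt = still_on_board_probability(p_d, ["S1", "S2"])  # bus has passed S1 and S2

# Step 7.3: two riders known to be on board (expectation 1) plus the rider above.
expected_load = sum([1.0, 1.0, pt])
degree = crowding_degree(expected_load, design_capacity=50)
```

Summing expectations rather than rounding each rider to 0 or 1 keeps the estimate unbiased when many riders have uncertain alighting points.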
The method processes the GPS data of the buses to obtain the space-time trajectory of each bus's daily operation; processes and screens the big data of bus IC card-swiping records, and builds, from the card-swiping records between an individual's bus code-scanning terminal or IC card and the bus, a card-swiping data set of the buses taken by that individual; processes the spatial information of the bus stops, merges stops that are spatially close by spatial clustering, and determines the spatial service range of each bus stop by dividing Thiessen polygons; synchronizes the bus GPS data with the card-swiping behavior to obtain the geographic position of the bus code-scanning terminal or IC card at the moment of card swiping, and from it the bus stop at which the card swiping occurred; traverses the historical card-swiping records of each bus code-scanning terminal or IC card in turn, divides the bus lines into three modes (card swiping on both boarding and alighting, boarding-only card swiping with segmented fares, and boarding-only card swiping with a unified fare), calculates the O-D points of passengers on each bus line with a method suited to the type of line, and on that basis calculates the probability of boarding at each stop in each time period and the probability of alighting at each stop along the route after boarding; on the basis of real-time card-swiping records, calculates the real-time passenger load of each bus from the time-period probability distribution of the holder of each bus code-scanning terminal or IC card on each bus line; and predicts the passenger load and degree of crowding of each bus in a specified future time period from the probability that the holder of each bus code-scanning terminal or IC card boards each bus line at each stop in each time period and the probability that the holder alights at each stop after boarding.
The invention has the following advantages: by relying fully on bus card-swiping records and bus GPS data, the time and place of each card-swiping record can be obtained cheaply, automatically and conveniently; a complete time-series data set of the bus lines taken by each individual is constructed and processed to obtain the probability distribution of the individual boarding each bus line by time period and stop, and the probability distribution of alighting at each stop along the route after boarding; the passenger load and degree of crowding of each bus, both in real time and in the future, can thus be calculated and predicted conveniently and efficiently.
Drawings
Fig. 1A and 1B are flow charts of the present invention.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Step 1, the system reads the bus GPS data obtained from the bus operating company. In theory the bus GPS data are continuous in time and space, and different buses correspond to different bus line numbers LID and bus numbers BID. The GPS data of each BID within a specified time period are extracted to form the travel space-time sequence of the bus.
The GPS data of the buses is the recording data of the vehicle-mounted GPS of each bus to the spatial position of the bus, which is acquired from a bus operator in real time.
Step 1.1, the system reads the bus GPS data obtained from the bus operating company, which in theory should be continuous in time and space. The data comprise: the unique bus line number LID, the unique bus number BID, the communication action occurrence time TIME1, the longitude Long of the bus and the latitude Lat of the bus, wherein the bus line number LID and the bus number BID together identify the bus;
Step 1.2, each bus GPS record is one space-time trajectory record;
Step 1.3, according to the bus line number LID and the bus number BID of a bus, querying all of its GPS records within the specified time period, converting the longitude and latitude of the GPS records into geographic coordinates, and constructing the space-time movement trajectory of the bus.
In this example, part of the GPS data of the bus SB on the bus line SL is shown in table 1:
LID BID TIME Long Lat X Y
...... ...... ...... ...... ...... ......
SL SB 2017-06-20 08:05:57 121.6981 31.1289 22706.9313 -11185.0718
SL SB 2017-06-20 08:08:23 121.7038 31.1227 23253.0972 -11869.8001
SL SB 2017-06-20 08:11:43 121.6995 31.1151 22846.4016 -12714.8964
SL SB 2017-06-20 08:14:25 121.6933 31.1099 22257.4919 -13296.5478
SL SB 2017-06-20 08:18:19 121.6862 31.1042 21577.8252 -13931.1146
...... ...... ...... ...... ...... ......
Step 2, the system reads the anonymized, encrypted bus card-swiping record data obtained from the bus operating company. One card-swiping record is generated in time order each time a card is swiped; different code-scanning terminals or IC cards correspond to different IC numbers ICID, and each record carries the bus number BID. The card-swiping records of each IC number ICID within a specified time period are extracted to form the card-swiping data set of that ICID;
Step 2.1, the system reads the anonymized, encrypted bus card-swiping record data obtained from the bus operating company. Each record comprises: the unique number ICID of the code-scanning terminal or bus card, the unique bus number BID, the communication action occurrence time TIME2, the boarding or alighting type TYPE, and the fare COST;
the anonymized, encrypted bus card-swiping data are the desensitized and encrypted time-series card-swiping information of anonymous bus code-scanning terminal and IC card users, obtained from the bus operator in real time. The content comprises: ICID, LID, BID, TIME2, TYPE, COST. The fields are described as follows:
ICID is the result of one-way, irreversible encryption applied to each bus code-scanning terminal or IC card user, and uniquely identifies that user; the encrypted ICID of each bus code-scanning terminal or IC card user must remain unique.
LID, the number of the bus route.
BID, the number of each bus.
TIME2 is the TIME when the currently recorded card swiping behavior occurs, and is measured in milliseconds.
TYPE is the card-swiping type of the bus code-scanning terminal or IC card, either boarding or alighting.
COST is the bus fare. If the bus code-scanning terminal or IC card is swiped on both boarding and alighting, the fare is deducted in the alighting record; if it is swiped on boarding only, the fare is deducted in the boarding record.
Step 2.2, one piece of card swiping data is one signaling record, and each signaling record is decrypted;
Step 2.3, according to the IC number ICID, querying all card-swiping records of that ICID within the specified time period to form the individual's bus card-swiping data set;
in this example, part of the card-swiping records of the bus code-scanning terminal or IC card numbered IC1 is shown in Table 2:
TABLE 2 card swipe record data of IC1
RID ICID LID BID TIME TYPE COST
...... ...... ...... ...... ...... ...... ......
R1 IC1 RL1 B11 2017-06-20 08:30:31 1 -1
R2 IC1 RL1 B11 2017-06-20 09:04:65 2 3
R3 IC1 RL2 B13 2017-06-20 09:17:22 1 2
R4 IC1 RL3 B42 2017-06-20 14:15:56 1 2
R5 IC1 RL4 B5 2017-06-20 17:31:43 1 7
R6 IC1 RL1 B21 2017-06-21 08:28:12 1 -1
...... ...... ..... ..... ...... ...... ......
Step 3, acquiring all bus line stops within the specified spatial range, clustering them with a spatial clustering algorithm, merging stops that are spatially close as possible transfer nodes, and adjusting the stops of all bus lines accordingly, so that individual transfer behavior can be extracted and analyzed;
Step 3.1, acquiring all bus line stops and their position information, converting their longitude and latitude into XY coordinates, and mapping them into a geographic space containing the traffic lines;
in this example, the spatial position information of some bus stops is shown in Table 3:
TABLE 3 spatial information of bus stops
[Table 3 is reproduced as an image in the original publication.]
Step 3.2, clustering the bus stops by spatial clustering, using the traffic distance between stops as the criterion, and merging stops that are very close to each other:
Step 3.2.1, setting the clustering criterion that the distance between two stops is less than d meters;
Step 3.2.2, treating each stop as a cluster core; taking bus stop a as an example, searching for surrounding stops with the position of stop a as the center, and, if there is a bus stop b whose traffic distance from a is less than d meters, adding stop b to the cluster of stop a;
Step 3.2.3, merging the spatial clusters obtained in step 3.2.2: if clusters x and y share a stop, they are merged into a larger, spatially relatively independent cluster of bus stops;
Step 3.2.4, extracting the spatial center of each cluster, mapping it onto the map to obtain the position and place name of the center point, merging the bus stops in each cluster and naming the merged stop after the place name of the center point, wherein the XY coordinates of the cluster center are the mean of the XY coordinates of all bus stops in the cluster;
in this example, some of the bus stops after spatial clustering are shown in Table 4:
[Table 4 is reproduced as an image in the original publication.]
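The merging rule of steps 3.2.1 to 3.2.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses straight-line distance in the XY plane in place of the traffic distance named in the text, a union-find structure to merge clusters that share a stop, and hypothetical stop names and coordinates.

```python
import math
from typing import Dict, List, Tuple


def cluster_stops(stops: Dict[str, Tuple[float, float]], d: float) -> List[dict]:
    """Merge stops closer than d meters; clusters sharing a stop are merged."""
    names = list(stops)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Assumption: Euclidean distance stands in for traffic distance.
            if math.dist(stops[a], stops[b]) < d:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)

    clusters = []
    for members in groups.values():
        # Step 3.2.4: the cluster center is the mean of the member coordinates.
        cx = sum(stops[m][0] for m in members) / len(members)
        cy = sum(stops[m][1] for m in members) / len(members)
        clusters.append({"members": sorted(members), "center": (cx, cy)})
    return clusters


# Hypothetical stops: a-b and b-c are within 40 m, so all three merge.
demo_stops = {"a": (0.0, 0.0), "b": (30.0, 0.0), "c": (55.0, 0.0), "z": (500.0, 500.0)}
clusters = cluster_stops(demo_stops, d=40.0)
```

Note that stops a and c end up in one cluster even though they are 55 m apart, because both are linked through b; this is the transitive merging described in step 3.2.3.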
Step 3.3, rearranging each bus line by replacing its pre-clustering stops with the clustered bus stops, to support the subsequent extraction of individual transfer information;
Step 4, organizing the card-swiping records of the bus code-scanning terminals or IC cards, and obtaining the stops at which individuals board and alight according to the card-swiping time and the position of the bus at that time;
Step 4.1, extracting the card-swiping records of all bus passengers within a specified time period and, for each unique identification number ICID of a bus code-scanning terminal or IC card, sorting its records by card-swiping time to form the card-swiping time series of each passenger;
Step 4.2, according to the communication action occurrence time TIME2, the bus line number LID and the bus number BID in the card-swiping record, reading from the bus GPS data the position X-IC, Y-IC of the bus when the card swiping occurred;
in this example, the positions at which the card swiping of the bus code-scanning terminal or IC card numbered IC1 occurred are shown in Table 5:
TABLE 5 spatial location of IC1 card swiping behavior when it occurs
RID ICID LID BID TIME TYPE X Y COST
...... ...... ...... ...... ...... ...... ......
R1 IC1 RL1 B11 2017-06-20 08:30:31 1 -3706.4625 -18.1212 -1
R2 IC1 RL1 B11 2017-06-20 09:04:65 2 -2618.1421 -733.5432 3
R3 IC1 RL2 B13 2017-06-20 09:17:22 1 -2614.1322 -706.8132 2
R4 IC1 RL3 B42 2017-06-20 14:15:56 1 -2883.4211 -1063.6231 2
R5 IC1 RL4 B5 2017-06-20 17:31:43 1 -2313.4212 -1350.4321 7
R6 IC1 RL1 B21 2017-06-21 08:28:12 1 -3707.4324 -19.4321 -1
...... ...... ...... ...... ...... ...... ......
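One way to read step 4.2 is as a lookup of the GPS fix closest in time to the card swipe. The sketch below assumes that reading; the patent does not say whether the nearest fix or an interpolated position is used. The function name `position_at` is hypothetical, the track reuses three fixes from Table 1, and the swipe time is invented for illustration.

```python
from bisect import bisect_left
from datetime import datetime


def position_at(track, swipe_time):
    """Return the (x, y) of the GPS fix nearest in time to swipe_time.

    track is a non-empty list of (time, x, y) tuples sorted by time.
    """
    times = [t for t, _, _ in track]
    i = bisect_left(times, swipe_time)
    # Only the fix just before and the fix just after can be the nearest.
    candidates = track[max(0, i - 1):i + 1]
    _, x, y = min(candidates,
                  key=lambda fix: abs((fix[0] - swipe_time).total_seconds()))
    return x, y


FMT = "%Y-%m-%d %H:%M:%S"
track = [
    (datetime.strptime("2017-06-20 08:05:57", FMT), 22706.9313, -11185.0718),
    (datetime.strptime("2017-06-20 08:08:23", FMT), 23253.0972, -11869.8001),
    (datetime.strptime("2017-06-20 08:11:43", FMT), 22846.4016, -12714.8964),
]
# Hypothetical swipe time between the second and third fixes.
swipe = datetime.strptime("2017-06-20 08:09:00", FMT)
x_ic, y_ic = position_at(track, swipe)
```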
Step 4.3, since the bus may already be moving when the card swiping occurs, a spatial range needs to be assigned to each bus stop: for the spatially clustered bus stops within the specified spatial range, Thiessen polygons are generated according to the road traffic network to divide the spatial range of each bus stop;
Step 4.4, mapping the position X-IC, Y-IC of the bus at the moment of the individual's card swiping, obtained in step 4.2, into the Thiessen polygons generated in step 4.3 to obtain the stop at which the bus was located when the card swiping occurred;
Step 4.5, adding the stop information obtained to the passenger's bus code-scanning terminal or IC card-swiping records to form the passenger's complete bus card-swiping time-series data;
in this example, the bus stops at which the card swiping of the bus code-scanning terminal or IC card numbered IC1 occurred are shown in Table 6:
TABLE 6 bus stop where IC1 card swiping behavior occurs
RID ICID LID BID TIME TYPE X Y SID COST
...... ...... ...... ...... ...... ...... ...... ...... ...... ......
R1 IC1 RL1 B11 2017-06-20 08:30:31 1 -3706.4625 -18.1212 L market -1
R2 IC1 RL1 B11 2017-06-20 09:04:65 2 -2618.1421 -733.5432 Road junction B 3
R3 IC1 RL2 B13 2017-06-20 09:17:22 1 -2614.1322 -706.8132 Road junction B 2
R4 IC1 RL3 B42 2017-06-20 14:15:56 1 -2883.4211 -1063.6231 D way 2
R5 IC1 RL4 B5 2017-06-20 17:31:43 1 -2313.4212 -1350.4321 G way 7
R6 IC1 RL1 B21 2017-06-21 08:28:12 1 -3707.4324 -19.4321 L market -1
...... ...... ...... ...... ...... ...... ...... ...... ...... ......
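For plain Euclidean Thiessen (Voronoi) polygons, locating a point in a polygon is equivalent to finding the nearest stop, which gives a compact sketch of steps 4.3 and 4.4. This is a simplification: the text generates the polygons from the road traffic network, which this sketch does not model. The stop coordinates are hypothetical (Table 3 is only available as an image); the swipe positions are taken from Table 5.

```python
import math
from typing import Dict, Tuple


def nearest_stop(point: Tuple[float, float],
                 stops: Dict[str, Tuple[float, float]]) -> str:
    """Return the stop whose Euclidean Thiessen cell contains the point."""
    return min(stops, key=lambda name: math.dist(point, stops[name]))


# Hypothetical clustered stop centers in the same XY coordinate system.
stop_centers = {
    "L market": (-3706.0, -18.0),
    "Road junction B": (-2616.0, -720.0),
    "D way": (-2883.0, -1063.0),
}

sid_r1 = nearest_stop((-3706.4625, -18.1212), stop_centers)   # record R1
sid_r2 = nearest_stop((-2618.1421, -733.5432), stop_centers)  # record R2
sid_r3 = nearest_stop((-2614.1322, -706.8132), stop_centers)  # record R3
```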
Step 5, dividing the card-swiping modes of the buses into three types: card swiping on both boarding and alighting; boarding-only card swiping with segmented fares (the card is swiped once, but the price depends on the destination); and boarding-only card swiping with a unified fare. The time and place at which an individual boards and alights are analyzed and determined according to the card-swiping mode;
Step 5.1, traversing each ICID in turn, reading its card-swiping records, including the card-swiping time, bus line and bus number, and splitting the card-swiping time series by bus line number;
Step 5.2, determining the card-swiping mode of each bus line ridden by each ICID, counting the individual's boarding and alighting information on the bus line with a method suited to the card-swiping mode, and determining whether the alighting point can be inferred;
Step 5.2.1, for each ICID, reading the boarding stop, bus line number and boarding time of each ride;
Step 5.2.2, estimating the alighting information of each ride of the ICID with a method suited to the card-swiping mode of the bus line:
if the card-swiping mode of bus line L1 is card swiping on both boarding and alighting, the alighting information of each ride is contained in the card-swiping records of the bus code-scanning terminal or IC card on L1. Thus for IC1, if there is a boarding record R1 on L1 with boarding stop S1, the alighting stop S2 and the alighting time are recorded in the next card-swiping record R2, and R1 and R2 are merged and marked as a record with a known alighting point; if R2 is also a boarding record, IC1 did not swipe the card on alighting in the previous trip, and R1 is discarded;
if the card-swiping mode of bus line L2 is boarding-only card swiping with segmented fares, and IC1 has a card-swiping record R3 on L2 in time period T1 with boarding stop S3, the interval of possible alighting stops of the passenger is first calculated from the COST of the record, and the next card-swiping record R4 after IC1 swiped the card on L2 is then read; if the boarding stop S4 of R4 lies within the estimated interval of alighting stops, IC1 is considered to have alighted from L2 at S4, and R3 is marked as a record with an inferable alighting point; if the boarding stop of R4 is not within the interval of possible alighting stops of R3, R3 is marked as a record whose alighting point cannot be inferred;
if the card-swiping mode of bus line L3 is boarding-only card swiping with a unified fare, and IC1 has a card-swiping record R5 on L3 in time period T1 with boarding stop S5, the next card-swiping record R6 after IC1 swiped the card on L3 is read: if the boarding stop S6 of R6 is one of the stops along L3 after S5, and the difference between the card-swiping time of R6 and the time at which the bus on L3 reaches S6 is within the threshold T_Thrh, IC1 is considered to have alighted at S6, and R5 is marked as a record with an inferable alighting point; if the boarding stop of R6 is not among the stops along the route after R5, R5 is marked as a record whose alighting point cannot be inferred;
the frequency with which IC1 boards at each stop of L3 is counted as the basis for the statistics of the non-inferable alighting points.
In this example, assuming the time threshold T_Thrh is 1 hour, IC1 rode buses on lines RL1, RL2, RL3 and RL4 on 20 June 2017. RL1 uses card swiping on both boarding and alighting, so its alighting point is known to be Road junction B. RL2 and RL3 use boarding-only card swiping with a unified fare and RL4 uses boarding-only card swiping with segmented fares, so inferring their alighting points depends on the transfer information. The boarding time of R3 is 09:17 and the boarding time of R4 is 14:15; the interval exceeds T_Thrh, so the alighting point of RL2 cannot be inferred. The boarding time of R5 is 17:31, while the ride on RL3 from D way to G way takes 46 minutes, so the alighting point of RL3 cannot be inferred either. RL4 charges segmented fares; the next record R6 is a boarding at L market, which falls within the alighting range corresponding to the fare paid after boarding RL4 at G way, so the alighting point of R5 can be inferred and is L market.
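The unified-fare rule of step 5.2.2 can be sketched as a single check on the next boarding record. This is an illustrative reading, not the patented implementation: the function name and the `downstream_arrivals` mapping are hypothetical, and the 46 minutes from the example are interpreted here as the ride time on RL3 from D way to G way.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def infer_alighting_stop(next_stop: str,
                         next_time: datetime,
                         downstream_arrivals: Dict[str, datetime],
                         threshold: timedelta) -> Optional[str]:
    """Return the inferred alighting stop, or None if it cannot be inferred.

    downstream_arrivals maps each stop after the boarding stop to the time
    the bus reached it (assumed to be derived from the GPS trajectory).
    """
    arrival = downstream_arrivals.get(next_stop)
    if arrival is None:
        return None  # the next boarding is not on the remaining route
    if abs(next_time - arrival) <= threshold:
        return next_stop
    return None  # the gap exceeds T_Thrh, so the rider may have gone elsewhere


FMT = "%Y-%m-%d %H:%M:%S"
t_thrh = timedelta(hours=1)

# Record R4: IC1 boards RL3 at D way; the bus is assumed to reach G way 46 min later.
r4_time = datetime.strptime("2017-06-20 14:15:56", FMT)
arrivals = {"G way": r4_time + timedelta(minutes=46)}

# Record R5: the next boarding is at G way, about 2.5 hours after that arrival.
r5_time = datetime.strptime("2017-06-20 17:31:43", FMT)
rl3_result = infer_alighting_stop("G way", r5_time, arrivals, t_thrh)

# Hypothetical quick transfer ten minutes after the bus reaches G way.
quick = infer_alighting_stop("G way", arrivals["G way"] + timedelta(minutes=10),
                             arrivals, t_thrh)
```

With the Table 6 times, `rl3_result` is `None`, matching the conclusion in the example that the alighting point of RL3 cannot be inferred.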
Step 6, for each ICID, counting by time period within the day and the week the probability that the individual rides each bus line and the probability of alighting at each stop along the route after boarding, to obtain the probability distribution of the travel O-D sections, comprising the following steps:
Step 6.1, for each ICID, counting the number of times it rides bus line L in a specific time period T. Taking the bus code-scanning terminal or IC card numbered IC1 as an example, the probability that it boards bus line L at bus stop S in time period T is P_U(T, S, L, IC1) = N_U(T, S, L, IC1)/N_Day, that is, the number of times N_U(T, S, L, IC1) that IC1 boards line L at stop S in period T of each day within the specified period, divided by the number of days N_Day in the specified period (working days and non-working days are counted separately);
in this example, the time interval is set to 1 hour. According to the long-term card-swiping records of IC1 (30 days), IC1 boarded RL1 at L market between 08:00 and 09:00 on working days 22 times, and there were 22 working days in the period, so the probability that IC1 boards RL1 at L market between 08:00 and 09:00 on a working day is 1; IC1 boarded RL1 at L market between 08:00 and 09:00 on non-working days once, and there were 8 non-working days in the period, so the probability that IC1 boards RL1 at L market between 08:00 and 09:00 on a non-working day is 0.125;
IC1 boarded RL2 at Road junction B between 08:00 and 09:00 on working days 2 times, so the probability that it boards RL2 at Road junction B between 08:00 and 09:00 on a working day is 0.091; it boarded RL2 at Road junction B between 09:00 and 10:00 on working days 20 times, so the corresponding probability is 0.909; it boarded RL2 at Road junction B between 08:00 and 09:00 on non-working days 0 times, so the corresponding probability is 0; it boarded RL2 at Road junction B between 09:00 and 10:00 on non-working days 2 times, and the probability that it boards RL2 at Road junction B between 09:00 and 10:00 on a non-working day is 0.125;
IC1 boarded RL3 at D way between 14:00 and 15:00 on working days 3 times, so the probability that it boards RL3 at D way between 14:00 and 15:00 on a working day is 0.136; it boarded RL3 at D way between 14:00 and 15:00 on non-working days 0 times, so the corresponding probability on a non-working day is 0; IC1 boarded RL4 at G way between 17:00 and 18:00 on working days 22 times, so the probability that it boards RL4 at G way between 17:00 and 18:00 on a working day is 1; it boarded RL4 at G way between 17:00 and 18:00 on non-working days 0 times, so the corresponding probability on a non-working day is 0.
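The boarding probability of step 6.1 is a count divided by the number of days of the matching day type. The sketch below reproduces several of the working-day and non-working-day figures from the example; the function name and the record layout `(day_type, hour_slot, stop, line)` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Boarding = Tuple[str, int, str, str]  # (day_type, hour_slot, stop, line)


def boarding_probabilities(boardings: Iterable[Boarding],
                           n_days: Dict[str, int]) -> Dict[Boarding, float]:
    """P_U(T, S, L, ICID) = N_U(T, S, L, ICID) / N_Day, per day type."""
    counts = Counter(boardings)
    return {key: n / n_days[key[0]] for key, n in counts.items()}


# Boarding counts of IC1 taken from the example (30 days of records).
records = (
    [("workday", 8, "L market", "RL1")] * 22
    + [("non-workday", 8, "L market", "RL1")] * 1
    + [("workday", 8, "Road junction B", "RL2")] * 2
    + [("workday", 9, "Road junction B", "RL2")] * 20
    + [("workday", 14, "D way", "RL3")] * 3
)
n_days = {"workday": 22, "non-workday": 8}
p_u = boarding_probabilities(records, n_days)
```

Rounded to three decimals, `p_u` gives 1, 0.125, 0.091, 0.909 and 0.136 for the five cases listed.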
Step 6.2, on the basis of the alighting-point estimates of each ride record obtained in step 5, counting the probability that the individual alights at each stop, with a method suited to the card-swiping mode of each bus line, comprising the following steps:
Step 6.2.1, if the card-swiping mode of bus line L1 is card swiping on both boarding and alighting, counting the number of times N_D(S1, S2, L1, IC1) that IC1 alights at each stop along the route (such as S2) after boarding at stop S1 in time period T1; the probability that IC1 alights at stop S2 after boarding bus line L1 at bus stop S1 in time period T1 is N_D(S1, S2, L1, IC1)/N_U(T1, S1, L1, IC1), where N_U(T1, S1, L1, IC1) is the number of times IC1 boards bus line L1 at stop S1 in time period T1 within the specified period;
in this example, the number of alightings and the alighting probability at each stop along the route after IC1 boards RL1 at L market between 08:00 and 09:00 on working days are shown in Table 7:
TABLE 7 Number of alightings and alighting probability at each stop along the route after IC1 boards RL1 at L market between 08:00 and 09:00 on working days
SID Number of times Probability
M way 0 0
N crossing 0 0
O mechanism 0 0
...... ...... ......
Road junction B 22 1
P building 0 0
Q way 0 0
S road 0 0
...... ...... ......
Step 6.2.2, if the card-swiping mode of bus line L2 is boarding-only card swiping with segmented fares, the alighting frequencies are counted separately according to whether the alighting point of the card-swiping record can be inferred. For records whose alighting point cannot be inferred, it is assumed that the bus travel of an ICID is generally continuous, that is, a boarding point is also the alighting point of another trip, so the frequency with which the ICID boards at each stop of the line is counted as the basis for the statistics of the non-inferable alighting points:
for records with an inferable alighting point, counting the number of times N_D(S3, S4, L2, IC1) that IC1 alights at each stop along the route (such as S4) after boarding L2 at S3 in time period T1; the probability that IC1 alights at S4 after boarding L2 at S3 in time period T1 is N_D(S3, S4, L2, IC1)/N_U(T1, S3, L2, IC1), where N_U(T1, S3, L2, IC1) is the number of times IC1 boards bus line L2 at stop S3 in time period T1 within the specified period;
for records with a non-inferable alighting point, counting the number of boardings N_U(S4, L2, IC1) at each stop (such as S4) in the history of IC1's rides on L2; the probability that IC1 alights at each stop (such as S4) on the remaining route after boarding bus line L2 at S3 in time period T1 is then N_U(S4, L2, IC1)/sum(N_U(SN, L2, IC1)), where SN is the set of stops on the remaining route of L2 after IC1 boards at S3, and sum(N_U(SN, L2, IC1)) is the total number of boardings of IC1 at the stops on that remaining route, used in place of the number of alightings;
in this example, after IC1 boards RL4 at G way between 17:00 and 18:00 on working days, there are 19 records with inferable alighting points and 3 records with non-inferable alighting points; according to the records with inferable alighting points, the number of alightings and the alighting probability at each stop along the route are shown in Table 8:
TABLE 8 Number of alightings and alighting probability at each stop along the route after IC1 boards RL4 at G way between 17:00 and 18:00 on working days (records with inferable alighting points)
[Table 8 is reproduced as an image in the original publication.]
According to the records with non-inferable alighting points, the number of boardings at each stop along the route and the alighting probability derived from it, after IC1 boards RL4 at G way between 17:00 and 18:00 on working days, are shown in Table 9:
TABLE 9 Number of boardings and derived alighting probability at each stop along the route after IC1 boards RL4 at G way between 17:00 and 18:00 on working days (records with non-inferable alighting points)
SID Number of times Probability
R road 0 0
S mansion 5 0.161
...... ...... ......
T market 4 0.129
U road 22 0.710
L market 0 0
V-shaped bridge 0 0
...... ...... ......
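The substitute distribution used for records with non-inferable alighting points is a normalization of boarding counts over the remaining stops. The sketch below reproduces the probabilities of Table 9 from its counts; the function name is hypothetical, and only the stops listed in the table are included (the elided rows are left out).

```python
from typing import Dict


def surrogate_alighting_distribution(boarding_counts: Dict[str, int]) -> Dict[str, float]:
    """N_U(S, L, ICID) / sum(N_U(SN, L, ICID)) over the stops SN of the remaining route."""
    total = sum(boarding_counts.values())
    if total == 0:
        # No boarding history on the remaining route: no information to spread.
        return {stop: 0.0 for stop in boarding_counts}
    return {stop: n / total for stop, n in boarding_counts.items()}


# Boarding counts of IC1 on RL4 downstream of G way, as listed in Table 9.
table9_counts = {
    "R road": 0,
    "S mansion": 5,
    "T market": 4,
    "U road": 22,
    "L market": 0,
    "V-shaped bridge": 0,
}
table9_probs = surrogate_alighting_distribution(table9_counts)
```

Rounded to three decimals, the result matches Table 9: 0.161 for S mansion, 0.129 for T market and 0.710 for U road.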
Step 6.2.3, if the card-swiping mode of bus line L3 is boarding-only card swiping with a unified fare, the alighting frequencies likewise need to be counted separately according to whether the alighting point of the card-swiping record can be inferred;
Step 6.2.3.1, for records with an inferable alighting point, counting the number of times N_D(S5, S6, L3, IC1) that IC1 alights at each stop along the route (such as S6) after boarding L3 at S5 in time period T1; the probability that IC1 alights at S6 after boarding L3 at S5 in time period T1 is N_D(S5, S6, L3, IC1)/N_U(T1, S5, L3, IC1), where N_U(T1, S5, L3, IC1) is the number of times IC1 boards bus line L3 at stop S5 in time period T1;
Step 6.2.3.2, for records with a non-inferable alighting point, counting the number of boardings N_U(S6, L3, IC1) at each stop (such as S6) in the history of IC1's rides on L3; the probability that IC1 alights at each stop (such as S6) on the remaining route after boarding bus line L3 at S5 in time period T1 is then N_U(S6, L3, IC1)/sum(N_U(SN, L3, IC1)), where SN is the set of stops on the remaining route of L3 after IC1 boards at S5, and sum(N_U(SN, L3, IC1)) is the total number of boardings of IC1 at the stops on that remaining route, used in place of the number of alightings;
in this example, after IC1 boards RL2 at Road junction B between 09:00 and 10:00 on working days, there are 15 records with inferable alighting points and 5 records with non-inferable alighting points; according to the records with inferable alighting points, the number of alightings and the alighting probability at each stop along the route are shown in Table 10:
TABLE 10 Number of alightings and alighting probability at each stop along the route after IC1 boards RL2 at Road junction B between 09:00 and 10:00 on working days (records with inferable alighting points)
SID Number of times Probability
W way 0 0
...... ...... ......
X square 15 1
Y bridge 0 0
...... ...... ......
According to the records with non-inferable alighting points, the number of boardings at each stop along the route and the alighting probability derived from it, after IC1 boards RL2 at Road junction B between 09:00 and 10:00 on working days, are shown in Table 11:
TABLE 11 Number of boardings and derived alighting probability at each stop along the route after IC1 boards RL2 at Road junction B between 09:00 and 10:00 on working days (records with non-inferable alighting points)
SID Number of times Probability
W way 0 0
...... ...... ......
X square 17 1
Y bridge 0 0
...... ...... ......
step 6.3, according to the getting-off probabilities of each station counted in step 6.2 from the records with an inferable and a non-inferable getting-off point under the three card swiping modes, obtaining the probability that IC1 gets off at station SS after boarding at station S within the time period T1 by weighting with the number of records;
for bus lines (such as L1) on which cards are swiped on both getting on and getting off, every valid record contains complete getting-on and getting-off information, so the final probability P_D(T1, S, SS, L1, IC1) that IC1 gets off at station SS after boarding at station S within the time period T1 is the one obtained in step 6.2.1;
for bus lines (such as L2 and L3) on which cards are swiped only on getting on, the card swiping records are divided into two types according to whether the getting-off point can be inferred; the numbers of records with an inferable and a non-inferable getting-off point, N_C(T1, S, L2, IC1) and N_NC(T1, S, L2, IC1), are counted separately, with total N(T1, S, L2, IC1) = N_C(T1, S, L2, IC1) + N_NC(T1, S, L2, IC1); weighting by the number of records, the probability that IC1 gets off at station SS after boarding L2 at station S within the time period T1 is P_D(T1, S, SS, L2, IC1) = N_D(T1, S, SS, L2, IC1)/N_U(T1, S, L2, IC1) × N_C(T1, S, L2, IC1)/N(T1, S, L2, IC1) + N_U(SS, L2, IC1)/sum(N_U(SN, L2, IC1)) × N_NC(T1, S, L2, IC1)/N(T1, S, L2, IC1);
in this example, the composite probability of IC1 getting off at each station along the way after boarding RL2 at the B intersection station between 9 and 10 o'clock on working days is shown in Table 12:
TABLE 12 Composite probability of IC1 getting off at each station along the way after boarding RL2 at the B intersection station between 9 and 10 o'clock on working days
SID Probability
W way 0
...... ......
X square 1
Y bridge 0
...... ......
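The record-number weighting of step 6.3 can be illustrated with a short sketch. It takes the two per-class probabilities as given inputs (as listed in Tables 10 and 11) and only performs the weighting; the function and parameter names are assumptions introduced for illustration.

```python
def composite_alighting_probability(
    p_inferable: float,
    p_non_inferable: float,
    n_inferable: int,
    n_non_inferable: int,
) -> float:
    """Weight the two getting-off probabilities by their record counts.

    p_inferable:     probability from records with an inferable getting-off point
    p_non_inferable: substituted probability from the non-inferable records
    n_inferable:     N_C, number of inferable records
    n_non_inferable: N_NC, number of non-inferable records
    """
    n_total = n_inferable + n_non_inferable  # N = N_C + N_NC
    if n_total == 0:
        return 0.0
    return (p_inferable * n_inferable + p_non_inferable * n_non_inferable) / n_total


# X square in the example: 15 inferable and 5 non-inferable records,
# with probability 1 in both Table 10 and Table 11.
p_x_square = composite_alighting_probability(1.0, 1.0, 15, 5)
print(p_x_square)
```

For X square both per-class probabilities are 1, so the weighted result is also 1, which agrees with Table 12.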
The composite probability of IC1 getting off at each station along the way after boarding RL4 at the G way station between 17 and 18 o'clock on working days is shown in Table 13:
TABLE 13 Composite probability of IC1 getting off at each station along the way after boarding RL4 at the G way station between 17 and 18 o'clock on working days
SID Probability
R road 0
S mansion 0.158
...... ......
T market 0.108
U road 0.733
L market 0
V-shaped bridge 0
...... ......
Step 6.4, traversing all ICIDs, and acquiring the probability that each user gets on the bus line L at the S stop within the time period of T1 and the probability of getting off at each stop along the way; calculating the probability of taking all bus lines at the S stop within the T1 period of each user and then getting off at each stop along the way on the basis; calculating the probability of each user taking all bus lines at all bus stops in the T1 time period and then getting off at each stop along the way on the basis; finally, calculating the probability of each user taking all bus lines at all bus stops in all time periods of one day and then getting off at each stop along the way;
step 7, acquiring real-time bus card swiping data from a data source and GPS data of each bus on each bus line, mining the riding condition of potential passengers on the bus lines, calculating the possible real-time passenger capacity and the crowding degree of each bus in each bus line, and predicting the passenger carrying demand and the crowding degree of the bus in a specified time period in the future;
step 7.1, obtaining from a data source the real-time bus card swiping data and bus GPS data covering a time interval TM (TM must be at least 3 hours so that complete individual riding records can be constructed), and sorting the card swiping data by bus line and ICID number;
step 7.2, reading the last card swiping record of each ICID, obtaining the bus line number LID, the bus number BID and the getting-on and getting-off TYPE TYPE taken by the ICID, obtaining the getting-on station information of the ICID through the bus GPS data according to the method of the step 4, and then judging the taking condition of the ICID at the current time node according to different bus line card swiping modes:
step 7.2.1, if cards are swiped on both getting on and getting off and the last card swiping record of the ICID is a getting-off record, the ICID is considered not to be on a bus at present, and the expectation E(TR, L, ICID) that the ICID is riding bus BID of bus line LID in real time (TR) is 0;
step 7.2.2, if cards are swiped on both getting on and getting off and the last card swiping record of the ICID is a getting-on record, the ICID is considered to be riding bus BID of bus line LID at present, and the expectation E(TR, L, ICID) that it is riding bus BID of bus line LID in real time (TR) is 1;
step 7.2.3, if cards are swiped only on getting on, taking from step 6 the probability that the ICID gets off at each station along the way after boarding bus BID of bus line LID at station SL within the time period TU containing its boarding time (i.e. TIME2); obtaining the current station position SN of BID from the GPS data, and calculating the probability that the ICID has not yet got off at the current station SN, which is 1 minus the sum of the probabilities that the ICID gets off at the stations (e.g. SM) between SL and SN: PT(TR, SN, L, ICID) = 1 - sum(P_D(TU, SL, SM, L, ICID)); the expectation that the ICID is riding bus BID of bus line LID in real time is then E(TR, L, ICID) = PT(TR, SN, L, ICID);
step 7.3, for each bus BID, collecting the real-time riding expectations of all ICIDs whose last card swiping occurred on that bus; their sum, sum(E(TR, L, ICID)), is the expected number of passengers carried by bus BID in real time;
in this example, with TM set to 3 hours and a starting time of 7 am on a certain working day, the expected number of passengers on bus RB1 of bus line SL1 is calculated to be 32 when it passes the road segment near L market at 10 am; according to the "city construction system index interpretation" published by the construction department in 2001, the rated passenger capacity is calculated as: rated number of passengers = fixed number of seats in the carriage + effective standing area of the carriage (square meters) × number of people allowed to stand per square meter, where the number of people allowed to stand per square meter is 8; the maximum load of RB1 is therefore about 80, and the congestion degree of RB1 at 10 am that day is 32/80 = 0.4, so the bus is not crowded;
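Steps 7.2 and 7.3 and the congestion calculation of the example can be sketched as below. The function names are assumptions made for illustration, and the getting-off probabilities passed to `still_on_board_probability` are hypothetical values, not data from the example; only the figures 32 and 80 come from the text.

```python
from typing import List


def still_on_board_probability(p_alight_between: List[float]) -> float:
    """PT = 1 - sum(P_D) over the stations between boarding stop SL and current stop SN."""
    # Clamp at 0 in case rounded probabilities sum to slightly more than 1.
    return max(0.0, 1.0 - sum(p_alight_between))


def expected_load(expectations: List[float]) -> float:
    """Sum of E(TR, L, ICID) over all cards last swiped on this bus."""
    return sum(expectations)


def congestion_degree(expected_passengers: float, rated_capacity: float) -> float:
    """Expected number of passengers divided by the rated capacity of the bus."""
    return expected_passengers / rated_capacity


# Hypothetical bus: two riders known to be on board (expectation 1), one who has
# got off (expectation 0), and one boarding-only card with hypothetical
# getting-off probabilities 0.2 and 0.1 at the two stops already passed.
pt = still_on_board_probability([0.2, 0.1])
load = expected_load([1.0, 1.0, 0.0, pt])
print(round(pt, 3), round(load, 3))

# Figures from the example: 32 expected passengers, rated capacity about 80.
print(congestion_degree(32, 80))
```

The last line reproduces the congestion degree of 0.4 stated for RB1 at 10 am.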
step 7.4, predicting, for the specified time period TPJ following the current time, the expected passenger demand of each bus line between stops, according to the probability obtained in step 6 that each ICID takes a bus at each stop in each time period;
step 7.4.1, according to the GPS data of the buses and the departure arrangement thereof, firstly predicting the departure condition of each bus route in the TPJ time period and the time interval of each bus stopping at each platform;
step 7.4.2, for each bus (such as B1), traversing all the ICIDs and searching for the ICIDs that have boarding records at the stations on the subsequent route within the specified time period TPJ;
step 7.4.3, for each ICID, calculating the expectation that it boards bus B1 at stop S at time T from its probability P_U(T, S, L, ICID) of boarding at stop S within the time period T, and calculating the expectation that each passenger who may already be on bus B1 gets off at stop S at time T from the probability P_D_prj(TU, SL, S, L, ICID) that such a passenger gets off at stop S; because in passenger capacity prediction whether a predicted ICID boards at all is itself a probability, unlike in the real-time load statistics, the probability that a passenger whose boarding is only predicted gets off at stop S is the product of the probability of boarding at station SL and the probability of getting off at stop S after boarding at SL: P_D_prj(TU, SL, S, L, ICID) = P_U(TU, SL, L, ICID) × P_D(TU, SL, S, L, ICID), and the expectation that the ICID gets off at stop S is E_D(T, S, L, ICID) = P_D_prj(TU, SL, S, L, ICID);
step 7.4.4, traversing all ICIDs: the sum of their expectations of boarding at stop S at time T is the expected number of passengers boarding bus B1 at stop S, and the sum of their expectations of getting off at stop S is the expected number of passengers getting off bus B1 at stop S; subtracting the expected number getting off from the expected number boarding, accumulated along the route, gives the predicted expected passenger load of the bus at time T;
step 7.5, obtaining the designed passenger capacity of each bus and comparing it with the expected number of passengers of each bus in the time period, to obtain the real-time statistics and the predictions of the expected congestion degree of each bus in the time period.
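A minimal sketch of the prediction in steps 7.4.3 and 7.4.4 is given below, under simplifying assumptions: the data layout (`p_board`, `p_alight`), the function name `predict_load` and the sample card are hypothetical, time periods are ignored, and passengers already on board at the start are carried as a fixed `initial_load` whose later getting-off is not modelled.

```python
from typing import Dict, List, Tuple


def predict_load(
    stops: List[str],
    initial_load: float,
    p_board: Dict[str, Dict[str, float]],
    p_alight: Dict[str, Dict[Tuple[str, str], float]],
) -> List[Tuple[str, float, float, float]]:
    """Return (stop, expected boardings, expected alightings, expected load) per stop.

    p_board[icid][stop]        ~ P_U: probability that the card boards at `stop`
    p_alight[icid][(sl, stop)] ~ P_D: probability of getting off at `stop`
                                 given boarding at `sl`
    """
    load = initial_load
    rows = []
    for j, stop in enumerate(stops):
        # Expected boardings: sum of P_U over all cards.
        exp_on = sum(boards.get(stop, 0.0) for boards in p_board.values())
        # Expected alightings: sum of P_D_prj = P_U(SL) * P_D(SL -> stop)
        # over all cards and all earlier stops SL on the route.
        exp_off = 0.0
        for icid, boards in p_board.items():
            for sl in stops[:j]:
                p_d = p_alight.get(icid, {}).get((sl, stop), 0.0)
                exp_off += boards.get(sl, 0.0) * p_d
        load = load + exp_on - exp_off
        rows.append((stop, exp_on, exp_off, load))
    return rows


# Hypothetical card c1: boards at A with probability 0.5 and, if it boarded
# at A, gets off at C with probability 0.8.
rows = predict_load(
    stops=["A", "B", "C"],
    initial_load=0.0,
    p_board={"c1": {"A": 0.5}},
    p_alight={"c1": {("A", "C"): 0.8}},
)
for row in rows:
    print(row)
```

Dividing the last column by the designed capacity of the bus would give the predicted congestion degree described in step 7.5.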
In this example, the expected number of passengers and the congestion degree of bus RB1 are estimated from the probabilities of boarding L1 at the L mall station and the subsequent stations at and after 10 am on a certain working day, and from the probabilities of getting off at the L mall station and the subsequent stations, as shown in Table 14:
TABLE 14 Expected numbers of passengers getting on and off RB1 and its congestion degree at each station after 10 am on a certain working day
[Table 14 is reproduced as an image in the original publication (Figure BDA0001748720770000231, Figure BDA0001748720770000241).]
The invention obtains a data set of the spatial activity of individuals riding bus lines between stops from the records of bus code scanning terminals or IC cards swiped on buses together with the GPS data of the buses, and mines the bus riding behavior of a large number of individuals by time period and by place. The bus stops are spatially clustered, stops that are geographically adjacent are merged, and the merged stops replace the original stops in the card swiping records, which provides a geographic basis for identifying transfers during bus trips. The merged bus stops are divided into Thiessen polygons to determine the spatial range of each stop. A large number of bus card swiping records are used to obtain the probability that each individual takes each bus line at each bus stop in each time period on working days and non-working days. On this basis, the bus lines are divided into three card swiping modes: card swiping on both getting on and getting off, card swiping on getting on with sectional charging, and card swiping on getting on with unified charging; for each mode, the probability that an individual gets off at each remaining stop along the way after taking a given bus line at a given stop in a given time period is calculated. From these probabilities and the bus card swiping records obtained in real time, the current number of passengers on each bus is estimated, and the numbers of passengers getting on and off, the passenger load and the congestion degree of each bus at each stop over a coming period are predicted.
The invention utilizes the existing mass anonymous bus card swiping record and continuous bus GPS position information in the bus system, can obtain the bus line taking behaviors of a large number of individuals in a specified time range in a low-cost, automatic and convenient way, and deduces the probability that the individuals take different buses in different time periods and places and get off at other stations along the way after getting on the buses; therefore, the number of passengers and the crowdedness of the bus in real time and in the future can be calculated and predicted quickly and efficiently.

Claims (4)

1. A real-time analysis and prediction method for bus passenger capacity in a big data environment is characterized by comprising the following steps:
step 1, bus GPS data which are continuous in TIME and space are obtained, the bus GPS data comprise bus line numbers LID, bus numbers BID, communication action occurrence TIME TIME1, longitude positions Long of buses and latitude positions Lat of the buses, different buses correspond to different bus line numbers LID and bus numbers BID, GPS data of the buses corresponding to each bus number BID in a specified TIME period are extracted, and bus travel TIME-space sequences of each bus are formed;
step 2, obtaining anonymous encrypted bus card swiping record data, generating a piece of bus card swiping record data every TIME a card is swiped in a TIME sequence, wherein each piece of bus card swiping record data comprises an IC (integrated circuit) number ICID, a bus line number LID, a bus number BID, a communication action occurrence TIME TIME2, an on-off TYPE TYPE and a bus COST COST, different code scanning terminals or IC cards correspond to different IC number ICIDs, different IC number ICIDs correspond to different individuals, bus card swiping records of each IC number ICID in a specified TIME period are extracted, and card swiping data sets of the individuals corresponding to the different IC number ICIDs are formed;
step 3, acquiring bus stops of all bus lines in a specified spatial range, clustering all bus stops by adopting a spatial clustering algorithm, combining bus stops close to spatial positions as possible transfer nodes, adjusting bus stops in all bus lines, and replacing the bus stops before clustering with the clustered bus stops so as to extract the transfer behaviors of subsequent individuals;
step 4, sorting the card swiping data sets of all individuals obtained in the step 2 to obtain information of all getting-on and getting-off stations of all individuals to form complete bus card swiping TIME sequence data of all individuals, and for current bus card swiping record data in the card swiping data set corresponding to the current individual, obtaining the spatial position of the TIME2 bus at the current communication action occurrence moment, so as to obtain information of the getting-on stations of the current bus card swiping record data, and adding the information into the current bus card swiping record data;
step 5, dividing the bus card swiping modes into three types: card swiping on both getting on and getting off, card swiping on getting on with sectional charging, and card swiping on getting on with unified charging; and analyzing and judging the time and place at which individuals get on and off according to the different card swiping modes:
step 5.1, traversing all the complete bus card swiping time sequence data of the passengers corresponding to the IC serial number ICID in sequence, reading the bus card swiping record data, and splitting the card swiping time sequence according to the bus line serial number LID;
step 5.2, judging the card swiping mode of each bus line in each IC number ICID, adopting different methods to count the getting-on/off information of an individual on the current bus line according to different card swiping modes, and deducing whether the getting-off point can be calculated, wherein the method comprises the following steps:
step 5.2.1, reading the getting-on stop information, the bus line number LID and the getting-on TIME of each bus taking aiming at the current IC number ICID, namely the TIME2 of the occurrence moment of the communication action;
step 5.2.2, estimating getting-off information of the current IC serial number ICID in each bus taking by adopting different methods according to different card swiping modes of different bus lines;
if the card swiping mode of the current bus line is that the card swiping is needed to get on or off the bus, the card swiping records related to the current bus line in the bus code scanning terminal or the IC card all contain getting-off information of the bus taking the current bus line each time, so that all known getting-off point records of the bus taking the current bus line are obtained aiming at the current IC number ICID, and the method comprises the following steps: if card swiping record data R1 exists on the current bus route, and the corresponding bus stop S1 is judged to be a getting-on stop according to the getting-on and getting-off TYPE TYPE in the card swiping record data R1, whether the getting-on and getting-off TYPE TYPE in the next card swiping record data R2 of the card swiping record data R1 is a getting-off stop is judged, if yes, the bus stop S2 corresponding to the card swiping record data R2 is a getting-off stop, the card swiping record data R1 and the card swiping record data R2 are combined and marked as known getting-off point records, if the getting-on and getting-off TYPE TYPE in the card swiping record data R2 is a getting-on stop, it is indicated that the current IC number ICID does not have getting-off card swiping in the previous trip, the card swiping record data R1 is abandoned, and the next card swiping record data is obtained again for judgment;
if the current card swiping mode of the bus line is the card swiping mode of getting on the bus and the sectional charging is carried out, all records of IC number ICID which can calculate the getting-off point and records which can not calculate the getting-off point are obtained, and the method comprises the following steps:
if the current IC number ICID has card-swiping record data R3 on the current bus during the time period T1, and the corresponding bus stop is S3, then:
step 5.2.2.1, calculating the possible get-off station interval of the passenger according to the bus taking COST of the card swiping record data R3;
step 5.2.2.2, reading next card swiping record data R4 of the card swiping record data R3, if an initial bus stop S4 corresponding to the card swiping record data R4 is in the get-off stop interval calculated in step 5.2.2.1, considering that the get-off stop of the bus with the current IC number ICID in the current bus route is S4, and recording the card swiping record data R3 as a record capable of calculating the get-off stop; if the starting bus stop S4 corresponding to the card swiping record data R4 is not in the get-off stop section calculated in the step 5.2.2.1, recording the card swiping record data R3 as a record that the get-off stop cannot be calculated;
if the card swiping mode of the current bus line is the card swiping mode of getting on the bus and unified charging is carried out, all records of IC number ICID which can calculate the getting off point and records which can not calculate the getting off point are obtained, and the method comprises the following steps:
if the current IC number ICID has the card swiping record data R5 on the current bus during the time period T1, the bus station on the bus corresponding to the card swiping record data R5 is S5, and the next card swiping record data R6 of the card swiping record data R5 is read:
if the starting bus stop S6 in the card swiping record data R6 is one of the stops along the current bus line after the getting-on stop S5 of the card swiping record data R5, and the difference between the communication action occurrence TIME TIME2 of the card swiping record data R6 and the time at which the bus of the current bus line arrives at the starting bus stop S6 is within the threshold range T_Thrh, the current IC number ICID is considered to get off at the starting bus stop S6, and the card swiping record data R5 is recorded as a record with an inferable getting-off point;
if the starting bus stop S6 in the card swiping record data R6 is not one of the stops along the current bus line after the getting-on stop S5 of the card swiping record data R5, recording the card swiping record data R5 as a record whose getting-off point cannot be inferred;
counting the frequency of the current IC serial number ICID at all the getting-on points in the current bus line, and taking the frequency as the basis of the statistics of the non-calculable getting-off points;
step 6, counting the probability of taking each bus line in each time period of each day of a week and the probability of getting off at each station along the way after getting on the bus by the individual aiming at the individual corresponding to each IC number ICID, and obtaining the probability distribution of the travel O-D road section, wherein the method specifically comprises the following steps:
step 6.1, counting, for each IC number ICID, the number of times it takes the bus line L in a specific time period T of each day within a specified time period; for a bus code scanning terminal or IC card whose IC number ICID is IC1, the probability that it takes the bus line L at the bus station S in the time period T is P_U(T, S, L, IC1) = N_U(T, S, L, IC1)/N_Day, wherein: N_U(T, S, L, IC1) is the number of times IC1 gets on the bus line L at the bus station S within the time period T of each day during the specified time period, and N_Day is the number of days in the specified time period;
step 6.2, on the basis of the getting-off stop estimated for each bus taking record in step 5, counting the probability of the individual getting off at each bus stop by different methods according to the card swiping mode of each bus line;
if the card swiping mode of the bus line L is card swiping on both getting on and getting off, counting the number of times IC1 gets off at each station along the way after getting on at the bus station S1 within the time period T1; the probability of getting off at the bus station S2 after getting on at the bus station S1 within the time period T1 is N_D(S1, S2, L, IC1)/N_U(T1, S1, L, IC1), where N_U(T1, S1, L, IC1) is the number of times IC1 gets on the bus line L at the bus station S1 within the time period T1 during the specified time period, and N_D(S1, S2, L, IC1) is the number of times IC1 gets off at the bus station S2 after getting on the bus line L at the bus station S1 within the time period T1;
if the card swiping mode of the bus line L is card swiping on getting on with sectional charging, the frequencies are counted separately according to whether the getting-off point of each card swiping record can be inferred; for the records whose getting-off point cannot be inferred, it is assumed that the bus trips of an IC number ICID are generally continuous, that is, the getting-off point of one trip is the getting-on point of another trip, so the frequency with which the current IC number ICID appears at each getting-on point on the bus line L is counted and used as the basis for the non-inferable getting-off points:
for the record of the predictable get-off point, counting the frequency N _ D (S3, S4, D, L, IC1) of getting-off of the IC1 from the bus station S3 to the bus station S4 along the bus station S4 after getting-on the bus line L in the time period T1, the probability that the IC1 gets-off from the bus station S3 to the bus station S4 in the time period T1 is N _ D (S3, S4, D, L, IC1)/N _ U (T1, S3, L, IC1), where: n _ U (T1, S3, L, IC1) is the number of times that the IC1 gets on the bus line L at the bus stop S3 within the time period T1 within the specified time period;
for the record of the un-calculable departure point, counting the frequency of getting-on N _ U (S4, L, IC1) at the bus stop S4 in the history record of taking the IC1 by the bus line L, and then the probability that the IC1 gets off the bus stop S4 on the remaining line after the bus stop S3 gets on the bus line L in the time period T1 is N _ U (S4, L, IC1)/sum (N _ U (SN, L, IC1)), wherein SN is the station set of the IC1 on the remaining path of the bus line L after getting-on at the bus stop S3, and sum (N _ U (SN, L, IC1)) is the frequency of getting-on the station of the IC1 on the remaining path after getting-on at the bus stop S3 and is used for replacing the frequency of getting-off;
if the card swiping mode of the bus line L is the card swiping mode of the bus on the upper bus and the unified charging is carried out, whether the frequency of the bus leaving point is calculated according to the bus leaving point recorded by the card swiping mode can be calculated and needs to be separately counted:
for the record of the predictable getting-off point, counting the frequency N _ D (S5, S6, L, IC1) of getting-off at the bus stop S6 along the IC1 after getting-on from the bus stop S5 and taking the bus line L in the time period T1, the probability that the IC1 gets-off from the bus stop S5 and taking the bus line L to the bus stop S6 in the time period T1 is N _ D (S5, S6, L, IC1)/N _ U (T1, S5, L, IC1), wherein: n _ U (T1, S5, L, IC1) is the number of times that the IC1 gets on the bus line L at the bus stop S5 within the time period T1 within the specified time period;
for the records whose getting-off point cannot be inferred, counting the getting-on frequency N_U(S6, L, IC1) of IC1 at the bus stop S6 in its history of taking the bus line L; the probability that IC1 gets off at the bus stop S6 on the remaining route after getting on the bus line L at the bus stop S5 in the time period T1 is N_U(S6, L, IC1)/sum(N_U(SN, L, IC1)), where SN is the set of stations on the remaining route of the bus line L after IC1 gets on at the bus stop S5, and sum(N_U(SN, L, IC1)) is the total getting-on frequency of IC1 at the stations on that remaining route, used in place of the getting-off frequency;
step 6.3, according to statistics of the getting-off probability of each stop in the card swiping records of the deductible getting-off point and the non-deductible getting-off point in the three card swiping modes obtained in the step 6.2, the probability that the IC1 gets off the bus at the bus stop SS after getting on the bus at the bus stop S in the time period T1 is obtained in a record number weighting mode:
for the bus route where the card is swiped for getting on and off, all the effective records have complete information of getting on and off, so that the probability P _ D (T1, S, SS, L, IC1) that the final IC1 gets off at the bus stop SS after getting on the bus at the bus stop S in the time period T1 is obtained in step 6.2.1;
for a bus line on which cards are swiped only on getting on, the card swiping records are divided into two types according to whether the getting-off point can be inferred; the numbers of records with an inferable and a non-inferable getting-off point, N_C(T1, S, L, IC1) and N_NC(T1, S, L, IC1), are counted separately, and the total number of records is N(T1, S, L, IC1) = N_C(T1, S, L, IC1) + N_NC(T1, S, L, IC1); the probability that IC1 gets off at the bus station SS after getting on the bus line L at the bus station S in the time period T1 is P_D(T1, S, SS, L, IC1) = N_D(T1, S, SS, L, IC1)/N_U(T1, S, L, IC1) × N_C(T1, S, L, IC1)/N(T1, S, L, IC1) + N_U(SS, L, IC1)/sum(N_U(SN, L, IC1)) × N_NC(T1, S, L, IC1)/N(T1, S, L, IC1), wherein N_D(T1, S, SS, L, IC1) is the frequency with which IC1 gets off at the bus stop SS after getting on the bus line L at the bus stop S within the time period T1; N_U(T1, S, L, IC1) is the number of times IC1 takes the bus line L at the bus stop S within the time period T1 during the specified time period; N_U(SS, L, IC1) is the getting-on frequency of IC1 at the bus stop SS in its history of taking the bus line L; and sum(N_U(SN, L, IC1)) is the total getting-on frequency of IC1 at the stops on the remaining path of the bus line L after getting on at the bus stop S;
step 6.4, traversing all IC number ICIDs, and acquiring the probability that each corresponding individual gets on the bus line L at the station S of the bus station in the time period T1 and the probability of getting off at each bus station along the way; calculating the probability of each individual taking all bus lines at the bus station S in the time period T1 and the probability of getting off at each bus station along the way; calculating the probability of each individual taking all bus lines at all bus stops in the time period T1 and the probability of getting off at each bus stop along the way; finally, calculating to obtain the probability of taking all bus lines at all bus stations of each individual in all time periods of one day and the probability of getting off at each bus station along the way;
step 7, obtaining real-time bus card swiping data from a data source and GPS data of each bus on each bus line, mining the bus taking condition of potential passengers on each bus line, calculating the possible real-time passenger carrying capacity and crowding degree of each bus in each bus line according to the probability distribution of the travel O-D road section obtained in the step 6, and predicting the passenger carrying demand and crowding degree of the bus in a specified time period in the future, wherein the step 7 comprises the following steps:
step 7.1, obtaining real-time bus card swiping data with a time interval of TM and bus GPS data from a data source, sorting card swiping record data, and sorting according to a bus line number LID and an IC number ICID;
step 7.2, reading the last card swiping record data of each IC number ICID, obtaining the bus line number LID, the bus number BID and the getting-on and getting-off TYPE TYPE taken by the current IC number ICID, obtaining the getting-on stop information of the current IC number ICID through the bus GPS data according to the method of the step 4, and then judging the riding condition of the current IC number ICID at the current time node according to different bus line card swiping modes:
if cards are swiped on both getting on and getting off on the current bus line, and the last card swiping record data of the current IC number ICID is a getting-off record, the current IC number ICID is considered not to be taking a bus at present, and the expectation that it is taking a bus of the current bus line in real time is 0;
if the current bus line is that cards are all swiped for getting on and off the bus, and the last card swiping record data in the current IC number ICID is that the bus is on the bus, the current IC number ICID is considered to be that the bus in the current bus line is taken currently, and the expectation that the bus in the current bus line is taken in real time is 1;
if cards are swiped only on getting on on the current bus line, the probability, obtained in step 6, that the current IC number ICID gets off at each station along the way after getting on a bus of the current bus line at the bus station SL within the TIME period TU containing its getting-on TIME, namely the communication action occurrence TIME TIME2, is used; the current bus station position SN of the current bus is obtained from the GPS data, and the probability that the current IC number ICID has not yet got off at the current bus station SN is calculated as 1 minus the sum of the probabilities that it gets off at the bus stations SM between the bus station SL and the bus station SN: PT(TR, SN, L, ICID) = 1 - sum(P_D(TU, SL, SM, L, ICID)), wherein TR represents real TIME; the expectation that the current IC number ICID is taking the bus of the current bus line in real TIME is E(TR, L, ICID) = PT(TR, SN, L, ICID);
step 7.3, for each bus, collecting the real-time riding expectations of all IC number ICIDs whose last card swiping occurred on the current bus, the sum of these expectations being the expected number of people riding the current bus in real time;
step 7.4, predicting, according to the probabilities obtained in step 6 that each IC number ICID takes a bus in each time period at each station, the expected passenger-carrying demand of each bus line between stations within a specified time period TPJ following the current real time, comprising the following steps:
step 7.4.1, according to the GPS data of the buses and their departure schedule, first predicting the departures of each bus line within the time period TPJ and the time period in which each bus stops at each bus station;
step 7.4.2, for each bus, traversing all IC numbers ICID and finding the IC numbers ICID that have getting-on records at the stations on the remaining part of the bus's route within the specified time period TPJ;
step 7.4.3, for each IC number ICID obtained in step 7.4.2, calculating its expectation of getting on the bus at bus station S in time period T according to its probability P_U(T, S, L, ICID) of getting on at bus station S in time period T, and calculating, for each passenger who may be riding the current bus, the expectation of getting off at bus station S in time period T according to the probability P_D_prj(TU, SL, S, L, ICID) that a passenger already on the bus gets off at bus station S; in passenger capacity prediction, unlike real-time load statistics, whether a predicted IC number ICID gets on the bus is itself a probability distribution, so the predicted probability that the passenger gets off at bus station S is the product of the probability that the passenger gets on at bus station SL and the probability of getting off at bus station S: P_D_prj(TU, SL, S, L, ICID) = P_U(TU, SL, L, ICID) × P_D(TU, SL, S, L, ICID), and the expectation that the current IC number ICID gets off at bus station S is E_D(T, S, L, ICID) = P_D_prj(TU, SL, S, L, ICID);
step 7.4.4, traversing each IC number ICID, summing the expectations of getting on at bus station S in time period T to obtain the expected getting-on volume of the current bus at bus station S, summing the expectations of each IC number ICID getting off at bus station S to obtain the expected getting-off volume of the current bus at bus station S, and subtracting the latter from the former to obtain the expected passenger-carrying volume of the current bus in the predicted time period T;
step 7.5, obtaining the designed passenger capacity of each bus, comparing it with the expected passenger-carrying volume of each bus in each time period, and obtaining the real-time statistics and prediction results of the expected congestion degree of each bus in each time period.
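The arithmetic of steps 7.2 to 7.5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the mode labels "both" and "board_only", the record layout and the clamping of the probability at 0 are all assumptions made for illustration, and the probabilities P_U and P_D are taken as already computed in step 6.

```python
from collections import defaultdict


def riding_expectation(mode, last_type, p_off_passed):
    """Step 7.2: expectation that one IC number is on the bus right now.

    mode         -- "both" (swipe on and off) or "board_only" (hypothetical labels)
    last_type    -- "on" or "off": type of the card's last swipe ("both" mode only)
    p_off_passed -- P_D values for the stations SM between SL and SN
    """
    if mode == "both":
        return 1.0 if last_type == "on" else 0.0
    # Board-only line: PT = 1 - sum(P_D). The clamp at 0 is an added safeguard.
    return max(0.0, 1.0 - sum(p_off_passed))


def expected_load(records):
    """Step 7.3: sum the riding expectations per bus.

    records -- iterable of (bus_id, mode, last_type, p_off_passed)
    """
    load = defaultdict(float)
    for bus_id, mode, last_type, p_off_passed in records:
        load[bus_id] += riding_expectation(mode, last_type, p_off_passed)
    return dict(load)


def predicted_net_change(p_on_at_s, onboard_candidates):
    """Steps 7.4.3 and 7.4.4 at one station S.

    p_on_at_s          -- P_U(T, S, L, ICID) for each candidate card
    onboard_candidates -- pairs (P_U(TU, SL, L, ICID), P_D(TU, SL, S, L, ICID))
    """
    expected_on = sum(p_on_at_s)
    # P_D_prj = P_U * P_D for each passenger who may already be on the bus.
    expected_off = sum(p_u * p_d for p_u, p_d in onboard_candidates)
    return expected_on - expected_off


def congestion(load, capacity):
    """Step 7.5: expected load divided by designed capacity, per bus."""
    return {bus: load[bus] / capacity[bus] for bus in load}


records = [
    ("B1", "both", "on", []),
    ("B1", "both", "off", []),
    ("B1", "board_only", "on", [0.25, 0.25]),
]
load = expected_load(records)
ratio = congestion(load, {"B1": 60})
```

Which stations count as lying "between" SL and SN (in particular whether SN itself is included) is left to the caller here, since the claim does not settle it.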
2. The method for analyzing and predicting the passenger capacity of the bus in real time in the big data environment as claimed in claim 1, wherein in step 1, each item of bus GPS data is one space-time trajectory record; all space-time trajectory records of each bus within a specified time period are queried according to the bus line number LID and the bus number BID of the bus, and the longitude and latitude in the space-time trajectory records are converted into geographic coordinates, so as to construct a bus travel space-time sequence.
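As a rough illustration of claim 2, the following Python sketch filters GPS records by LID, BID and time window, converts longitude and latitude to planar coordinates, and orders the result by time. The claim does not name a projection, so the equirectangular approximation around a reference origin is an assumption, as are the record field names TIME, LAT and LON and the function names.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used by the approximation


def to_xy(lat, lon, lat0, lon0):
    """Approximate planar coordinates in metres relative to (lat0, lon0)."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def build_sequence(records, lid, bid, t_start, t_end, origin):
    """Return the time-ordered list of (TIME, x, y) for one bus.

    records -- list of dicts with keys LID, BID, TIME, LAT, LON (hypothetical layout)
    origin  -- (lat0, lon0) reference point for the projection
    """
    rows = [
        r for r in records
        if r["LID"] == lid and r["BID"] == bid and t_start <= r["TIME"] <= t_end
    ]
    rows.sort(key=lambda r: r["TIME"])
    return [(r["TIME"],) + to_xy(r["LAT"], r["LON"], *origin) for r in rows]


gps = [
    {"LID": "L1", "BID": "B1", "TIME": 20, "LAT": 31.001, "LON": 121.0},
    {"LID": "L1", "BID": "B1", "TIME": 10, "LAT": 31.0, "LON": 121.0},
    {"LID": "L2", "BID": "B9", "TIME": 15, "LAT": 31.5, "LON": 121.5},
]
sequence = build_sequence(gps, "L1", "B1", 0, 100, (31.0, 121.0))
```

A production system would more likely use a proper map projection; the approximation is only adequate over city-scale distances.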
3. The method for analyzing and predicting the bus passenger capacity in real time in the big data environment as claimed in claim 1, wherein the step 3 comprises:
step 3.1, acquiring the bus stops of all bus lines within a specified spatial range together with the position information of each bus stop, converting the position information into XY coordinates, and mapping the XY coordinates into a geographic space containing the traffic lines;
step 3.2, clustering the bus stops by a spatial clustering method that uses the traffic distance between bus stops as the criterion, and merging bus stops that are very close to each other in space, comprising the following steps:
step 3.2.1, setting the clustering criterion that the distance between two bus stops is less than d meters;
step 3.2.2, taking each bus stop in turn as a cluster core to obtain a spatial cluster: searching for surrounding bus stops around the spatial position of the current cluster core and, if any bus stop lies at a traffic distance of less than d meters, placing it into the spatial cluster of the current cluster core;
step 3.2.3, merging the spatial clusters obtained in step 3.2.2 to form larger bus stop spatial clusters that are relatively independent in space, the merging condition being: if any two spatial clusters contain the same bus stop, the two spatial clusters are merged;
step 3.2.4, extracting the spatial center of each bus stop spatial cluster, the XY coordinates of the spatial center being the average of the XY coordinates of all bus stops in the corresponding bus stop spatial cluster; mapping the spatial center onto a map to obtain its spatial position and geographical name; and merging the bus stops in each bus stop spatial cluster into one bus stop named after the geographical name of the spatial center of that cluster;
step 3.3, rearranging each bus line by replacing the bus stops before clustering with the clustered bus stops, so that the transfer behavior of individuals can be extracted subsequently.
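The merging rule of steps 3.2.2 to 3.2.4 can be sketched in Python as follows. Two simplifications are assumed: straight-line distance stands in for the traffic distance of the claim, and the merge of step 3.2.3 is read as repeating until no two clusters share a stop, which is equivalent to taking connected components and is done here with a union-find structure. The function name and the input layout are hypothetical.

```python
import math


def cluster_stops(stops, d):
    """Group stops closer than d metres and return (members, centroid) pairs.

    stops -- dict mapping stop name to (x, y) in metres
    d     -- clustering threshold from step 3.2.1
    """
    names = sorted(stops)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Steps 3.2.2 and 3.2.3: link every pair of stops closer than d.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(stops[a], stops[b]) < d:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)

    # Step 3.2.4: the cluster centre is the mean of the member coordinates.
    result = []
    for members in groups.values():
        cx = sum(stops[m][0] for m in members) / len(members)
        cy = sum(stops[m][1] for m in members) / len(members)
        result.append((members, (cx, cy)))
    return result


clusters = cluster_stops(
    {"A": (0, 0), "B": (30, 0), "C": (60, 0), "D": (500, 0)}, d=50
)
```

In this example A and C are 60 m apart, yet they end up in the same cluster because both are within 50 m of B, which is the chaining effect the merging condition of step 3.2.3 produces. Naming the merged stop after the geographical name of its center requires a map lookup and is left out.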
4. The method for analyzing and predicting the passenger capacity of the bus in real time in the big data environment as claimed in claim 1, wherein in step 4, obtaining the getting-on station information of the current bus card-swiping record in the card-swiping data set of the current individual comprises the following steps:
step 4.1, generating Thiessen polygons, according to the road traffic network, for the spatially clustered bus stations within the specified spatial range, thereby dividing the spatial range of each bus station;
step 4.2, according to the communication action occurrence time TIME2, the bus line number LID and the bus number BID in the bus card-swiping record, reading from the GPS data of the bus identified by the bus line number LID and the bus number BID the position information X-IC and Y-IC of the bus at the communication action occurrence time TIME2;
step 4.3, mapping the position information X-IC and Y-IC obtained in step 4.2 into the spatial ranges of the bus stations generated in step 4.1 to obtain the bus station at which the bus is located at the communication action occurrence time TIME2, thereby obtaining the getting-on station information of the current bus card-swiping record.
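Steps 4.2 and 4.3 can be approximated with the short Python sketch below. It assumes ordinary Euclidean Thiessen (Voronoi) cells, for which "the cell containing a point" is the same as "the nearest station"; the claim builds its polygons on the road traffic network, which this sketch does not model. Taking the GPS fix closest in time to TIME2 is also an assumption, and all function names are hypothetical.

```python
import bisect
import math


def position_at(track, t):
    """Step 4.2: return (x, y) of the GPS fix closest in time to t.

    track -- non-empty list of (timestamp, x, y), sorted by timestamp
    """
    times = [fix[0] for fix in track]
    i = bisect.bisect_left(times, t)
    # Only the fixes just before and just after t can be the closest one.
    candidates = track[max(0, i - 1):i + 1]
    _, x, y = min(candidates, key=lambda fix: abs(fix[0] - t))
    return x, y


def boarding_station(track, swipe_time, stations):
    """Step 4.3: station whose (Euclidean) Thiessen cell contains the bus.

    stations -- dict mapping station name to (x, y)
    """
    x_ic, y_ic = position_at(track, swipe_time)
    return min(stations, key=lambda s: math.dist(stations[s], (x_ic, y_ic)))


track = [(0, 0, 0), (10, 100, 0), (20, 200, 0)]
stations = {"S1": (0, 0), "S2": (110, 0), "S3": (210, 0)}
station = boarding_station(track, 11, stations)
```

Here the swipe at time 11 is matched to the fix at time 10, whose position (100, 0) lies closest to S2.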
CN201810860244.5A 2018-07-31 2018-07-31 Real-time analysis and prediction method for bus passenger capacity in big data environment Active CN109035770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810860244.5A CN109035770B (en) 2018-07-31 2018-07-31 Real-time analysis and prediction method for bus passenger capacity in big data environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810860244.5A CN109035770B (en) 2018-07-31 2018-07-31 Real-time analysis and prediction method for bus passenger capacity in big data environment

Publications (2)

Publication Number Publication Date
CN109035770A CN109035770A (en) 2018-12-18
CN109035770B true CN109035770B (en) 2022-01-04

Family

ID=64648252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810860244.5A Active CN109035770B (en) 2018-07-31 2018-07-31 Real-time analysis and prediction method for bus passenger capacity in big data environment

Country Status (1)

Country Link
CN (1) CN109035770B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009124B (en) * 2019-01-03 2023-05-02 创新先进技术有限公司 Data processing method, server and system for determining bus stop
CN111400419B (en) * 2019-01-03 2023-10-27 腾讯科技(深圳)有限公司 Method and equipment for fusing same-name bus stops in electronic map
CN110211379A (en) * 2019-05-27 2019-09-06 南京航空航天大学 A kind of public transport method for optimizing scheduling based on machine learning
CN110223116B (en) * 2019-06-06 2021-07-06 武汉元光科技有限公司 Public transport network information questionnaire investigation method and device
CN110322694A (en) * 2019-07-16 2019-10-11 青岛海信网络科技股份有限公司 A kind of method and device of urban traffic control piece Division
CN110647929B (en) * 2019-09-19 2021-05-04 北京京东智能城市大数据研究院 Method for predicting travel destination and method for training classifier
CN111814035B (en) * 2019-11-18 2024-08-16 北京嘀嘀无限科技发展有限公司 Information recommendation method, electronic equipment and storage medium
CN111339159B (en) * 2020-02-24 2023-08-18 交通运输部科学研究院 Analysis mining method for one-ticket public transport data
CN111402618A (en) * 2020-03-27 2020-07-10 北京嘀嘀无限科技发展有限公司 Method and device for determining boarding station, storage medium and electronic equipment
CN111476494B (en) * 2020-04-11 2023-05-23 重庆交通开投科技发展有限公司 Method for accurately analyzing public traffic population geographic distribution based on multi-source data
CN111741051B (en) * 2020-04-14 2021-08-27 腾讯科技(深圳)有限公司 Method and device for determining full load rate of vehicle, storage medium and electronic device
CN111491261B (en) * 2020-04-24 2022-03-01 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Individual movement track extraction method based on intelligent card swiping data
CN112288131B (en) * 2020-09-24 2021-06-11 和智信(山东)大数据科技有限公司 Bus stop optimization method, electronic device and computer-readable storage medium
CN112991722B (en) * 2021-02-03 2022-07-19 浙江中控信息产业股份有限公司 High-frequency gps (gps) point bus real-time intersection prediction method and system
CN112966218B (en) * 2021-02-26 2023-06-16 佳都科技集团股份有限公司 Real-time calculation method and device for passenger carrying number of carriage
CN112949939B (en) * 2021-03-30 2022-12-06 福州市电子信息集团有限公司 Taxi passenger carrying hotspot prediction method based on random forest model
CN113658433B (en) * 2021-08-18 2022-08-30 苏州工业园区测绘地理信息有限公司 Method for extracting passenger flow characteristics based on bus card swiping and code scanning data
CN113780851B (en) * 2021-09-16 2024-04-26 上海世脉信息科技有限公司 Bus parking lot address selection method based on expected driving cost
CN113781822A (en) * 2021-09-24 2021-12-10 湖北惠诚共创科技有限公司 Bus dispatching system based on big data
CN114241770B (en) * 2021-12-21 2022-11-18 杭州图软科技有限公司 Bus scheduling method and system based on accurate real-time information
CN115186049B (en) * 2022-09-06 2023-02-03 深圳市城市交通规划设计研究中心股份有限公司 Intelligent bus alternative station site selection method, electronic equipment and storage medium
CN117542181B (en) * 2024-01-10 2024-04-30 四川三思德科技有限公司 Real-time abnormality early warning method and system for multi-mode deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010061321A (en) * 2008-09-03 2010-03-18 Railway Technical Res Inst Passenger flow prediction system
CN103198565A (en) * 2013-04-12 2013-07-10 王铎源 Charge and passenger flow information acquisition method for bus IC (integrated circuit) cards
CN105550789A (en) * 2016-02-19 2016-05-04 上海果路交通科技有限公司 Method for predicting bus taking passenger flow
CN107563540A (en) * 2017-07-25 2018-01-09 中南大学 A kind of public transport in short-term based on random forest is got on the bus the Forecasting Methodology of the volume of the flow of passengers
CN108242149A (en) * 2018-03-16 2018-07-03 成都智达万应科技有限公司 A kind of big data analysis method based on traffic data

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100299177A1 (en) * 2009-05-22 2010-11-25 Disney Enterprises, Inc. Dynamic bus dispatching and labor assignment system
CN101615340A (en) * 2009-07-24 2009-12-30 北京工业大学 Real-time information processing method in the bus dynamic dispatching
US20140289003A1 (en) * 2013-03-25 2014-09-25 Amadeus S.A.S. Methods and systems for detecting anomaly in passenger flow
CN103778778B (en) * 2013-12-19 2015-10-28 银江股份有限公司 A kind of fast bus station service information system and quick public transport arrive at a station the measuring method of information
CN103730008A (en) * 2014-01-15 2014-04-16 汪涛 Bus congestion degree analysis method based on real-time data of bus GPS (Global Position System) and IC (Integrated Circuit) cards
CN104463364B (en) * 2014-12-04 2018-03-20 中国科学院深圳先进技术研究院 A kind of Metro Passenger real-time distribution and subway real-time density Forecasting Methodology and system
CN105427594B (en) * 2015-11-23 2018-10-30 青岛海信网络科技股份有限公司 A kind of public transport section volume of the flow of passengers acquisition methods and system based on two-way passenger flow of getting on the bus
CN105528457B (en) * 2015-12-28 2019-02-19 招商局重庆交通科研设计院有限公司 A kind of traffic information extraction and querying method based on big data technology
CN105868861A (en) * 2016-04-08 2016-08-17 青岛海信网络科技股份有限公司 Bus passenger flow evolution analysis method based on time-space data fusion
CN106297288B (en) * 2016-08-23 2019-06-11 同济大学 A kind of acquisition of bus passenger passenger flow data and analysis method
CN106448155B (en) * 2016-08-29 2019-12-13 北京工业大学 System for transmitting congestion degree in bus in real time based on bus IC card

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《基于Android平台的车载GPS公交车次与载客量查询***的设计与实现》;葛柯娜,马涛,王晶,陈聪,孙艺萍;《科技资讯》;20140623;第12卷(第18期);全文 *
《基于IC卡数据的短时公交客流预测》;朱翔希;《中国优秀硕士学位论文全文数据库 (工程科技Ⅱ辑)》;20180615;全文 *

Also Published As

Publication number Publication date
CN109035770A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109035770B (en) Real-time analysis and prediction method for bus passenger capacity in big data environment
Gurumurthy et al. Analyzing the dynamic ride-sharing potential for shared autonomous vehicle fleets using cellphone data from Orlando, Florida
Sánchez-Martínez Inference of public transportation trip destinations by using fare transaction and vehicle location data: Dynamic programming approach
CN108922178B (en) Public transport vehicle real-time full load rate calculation method based on public transport multi-source data
JP7428124B2 (en) Information processing device, information processing method, and program
Lee et al. Assessing transit competitiveness in Seoul considering actual transit travel times based on smart card data
Hamadneh et al. Impacts of shared autonomous vehicles on the travelers’ mobility
CN111915200B (en) Urban public transport supply and demand state division method based on fine spatial scale of bus sharing rate
CN111932925A (en) Method, device and system for determining travel passenger flow of public transport station
CN114358808A (en) Public transport OD estimation and distribution method based on multi-source data fusion
JP6307376B2 (en) Traffic analysis system, traffic analysis program, and traffic analysis method
Fourie et al. Using smartcard data for agent-based transport simulation
Kato et al. Latest urban rail demand forecast model system in the Tokyo Metropolitan Area
CN114971136A (en) Bus and tour bus scheduling method
CN111696376A (en) Method for determining arrival sequence of buses
CN111508220B (en) Method for accurately performing tail end connection based on public transport population distribution
CN111723871B (en) Estimation method for real-time carriage full load rate of bus
CN113468243A (en) Subway passenger flow analysis and prediction method and system
CN112990518B (en) Real-time prediction method and device for destination station of individual subway passenger
CN113160542A (en) Riding method and device based on information feedback
CN116090785B (en) Custom bus planning method for two stages of large-scale movable loose scene
WO2003098556A1 (en) A system for evaluating a vehicles usage within zones
CN111339159B (en) Analysis mining method for one-ticket public transport data
Mosallanejad et al. Origin-destination estimation of bus users by smart card data
YOO et al. Origin-destination estimation using cellular phone BS information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant