CN111339159A - Analysis and mining method for one-ticket public transportation data - Google Patents

Analysis and mining method for one-ticket public transportation data

Info

Publication number
CN111339159A
Authority
CN
China
Prior art keywords
data
station
card
bus
passenger
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010111713.0A
Other languages
Chinese (zh)
Other versions
CN111339159B (en
Inventor
赵海宾
郭忠
杨新征
王子甲
魏领红
尹怡晓
郝萌
吴洪洋
李振宇
尹志芳
廖凯
李超
张晚笛
朱经纬
崔占伟
彭虓
刘海旭
王吉生
林翊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Transportation Sciences
Original Assignee
China Academy of Transportation Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Transportation Sciences filed Critical China Academy of Transportation Sciences
Priority to CN202010111713.0A priority Critical patent/CN111339159B/en
Publication of CN111339159A publication Critical patent/CN111339159A/en
Application granted granted Critical
Publication of CN111339159B publication Critical patent/CN111339159B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Tourism & Hospitality (AREA)
  • Fuzzy Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of transportation, and in particular to a method for analyzing and mining one-ticket public transportation data. The method first cleans the bus IC card data, the bus GPS data, the vehicle-mounted machine data and the one-way stop information. For the bus IC card data, the transaction card number, transaction time, line and license plate number fields are extracted, all other fields are deleted as useless, and records with missing fields are deleted. For the bus-mounted GPS data, the useful fields (vehicle-mounted machine number, arrival and departure information, positioning time, positioning longitude and latitude, line number, sub-line number, station sequence number, bus speed and bus travel mileage) are extracted and the useless fields are deleted. The invention analyzes the distribution of stations with large transfer passenger volumes under the existing operation scheme and provides reliable reference data for subsequent line adjustments.

Description

Analysis and mining method for one-ticket public transportation data
Technical Field
The invention relates to the technical field of transportation, and in particular to a method for analyzing and mining one-ticket public transportation data.
Background
In recent years, intelligent public transportation systems have developed rapidly, and mining bus card-swiping big data has become a new means of guiding operation planning. However, one-ticket bus data lacks passengers' boarding and alighting stop information, which hinders the application of this new data source. With the continuous development of technology, traffic big data has become a research hotspot. Compared with traditional traffic surveys, traffic big data costs less to acquire yet contains richer information. On the one hand, this information makes it possible to study passenger travel behavior at the individual level, providing a new perspective for traditional traffic research; on the other hand, mining traffic big data also supports other research, such as urban structure detection and urban planning. Existing research shows that traffic big data has broad application prospects in many fields.
In recent years, intelligent public transport systems have developed rapidly nationwide. Automatic IC card fare collection systems and vehicle-mounted GPS automatic positioning systems are widely used and have accumulated rich traffic big data resources, providing a new technical means of acquiring real-time, comprehensive bus passenger flow data. However, in most domestic cities only subways and some rapid transit lines require card swiping on both entry and exit. Conventional buses, which have wider coverage and carry more passengers, generally use the one-ticket system: passengers swipe their cards only when boarding and not when alighting. As a result, the card-swiping records lack information such as boarding and alighting stations and transfer records, and the data cannot be used directly.
Disclosure of Invention
In view of the above, the present invention provides a method for analyzing and mining one-ticket bus data, so as to solve the problems described in the background.
The method cleans the bus IC card data, the bus GPS data, the vehicle-mounted machine data and the one-way stop information.
Further, the valid fields in the bus passenger IC card data are the transaction card number, transaction time, line and license plate number. The main fields and their descriptions are shown in Table 1.
TABLE 1 Main fields of the bus IC card data and their descriptions
Field         Description
CARDNO        Transaction card number
TRADEDATE     Transaction time
ROUTECODE     Line
VEHICLECODE   License plate number
Cleaning the bus IC card data mainly removes records that are obviously illogical or unreasonable, in the following steps:
S1: extract the fields required by the research and delete the useless fields;
S2: delete records with missing fields.
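Steps S1 and S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names come from Table 1, while the function name `clean_ic_records`, the extra `UNUSED` column and the sample rows are invented for the example.

```python
# Field names follow Table 1; everything else here is a hypothetical illustration.
REQUIRED_FIELDS = ("CARDNO", "TRADEDATE", "ROUTECODE", "VEHICLECODE")


def clean_ic_records(rows):
    """Keep only the Table 1 fields (S1) and drop rows missing any of them (S2)."""
    cleaned = []
    for row in rows:
        slim = {name: row.get(name) for name in REQUIRED_FIELDS}
        # Treat None and blank strings as missing values.
        if all(v is not None and str(v).strip() != "" for v in slim.values()):
            cleaned.append(slim)
    return cleaned


raw = [
    {"CARDNO": "1001", "TRADEDATE": "2019-05-06 07:31:10", "ROUTECODE": "42",
     "VEHICLECODE": "AA1271", "UNUSED": "x"},
    # The transaction time is blank, so S2 removes this row.
    {"CARDNO": "1002", "TRADEDATE": "", "ROUTECODE": "42", "VEHICLECODE": "AA1271"},
]
cleaned = clean_ic_records(raw)
print(cleaned)
```

Only the first row survives, and it no longer carries the unused column.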
Furthermore, the vehicle-mounted GPS data of the bus has 59 fields, but some of them are not yet enabled and contain null values. The useful fields include the vehicle-mounted machine number, arrival and departure information, positioning time, positioning longitude and latitude, line number, sub-line number, station sequence number, bus speed and bus driving mileage. The main fields and their descriptions are shown in Table 2.
TABLE 2 Main fields of the GPS data and their descriptions
Field         Description                         Field           Description
PRODUCTID     Vehicle-mounted machine number      LATITUDE        Latitude
ISARRLFT      Arrival and departure information   ROUTEID         Line number
ACTDATETIME   Positioning time                    SUBROUTEID      Sub-line number
LONGITUDE     Longitude                           STATIONSEQNUM   Station sequence number
A GPS positioning record is generated when a bus enters or leaves a station: arrival and departure records are generated within 5 meters of the station. The main cleaning process for the GPS data is:
S1: extract the fields required by the research and delete the useless fields;
S2: delete, using ArcGIS, records whose longitude and latitude fall outside the monitoring area;
S3: delete data that has only an arrival or only a departure.
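A rough sketch of steps S2 and S3 is given below. It rests on several assumptions that are not in the source: a rectangular bounding box stands in for the ArcGIS area check, the `ISARRLFT` codes 1 (arrival) and 2 (departure) are hypothetical, and an arrival is considered paired only when the same vehicle's next record is a departure at the same station sequence number.

```python
from collections import defaultdict

ARRIVE, DEPART = 1, 2  # hypothetical ISARRLFT codes; the real encoding is not given


def clean_gps(rows, bbox):
    """Drop fixes outside bbox (S2), then drop unpaired arrivals/departures (S3)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    inside = [r for r in rows
              if min_lon <= r["LONGITUDE"] <= max_lon
              and min_lat <= r["LATITUDE"] <= max_lat]
    by_vehicle = defaultdict(list)
    for r in inside:
        by_vehicle[r["PRODUCTID"]].append(r)
    kept = []
    for records in by_vehicle.values():
        records.sort(key=lambda r: r["ACTDATETIME"])  # ISO strings sort by time
        i = 0
        while i < len(records):
            cur = records[i]
            nxt = records[i + 1] if i + 1 < len(records) else None
            if (cur["ISARRLFT"] == ARRIVE and nxt is not None
                    and nxt["ISARRLFT"] == DEPART
                    and nxt["STATIONSEQNUM"] == cur["STATIONSEQNUM"]):
                kept.extend([cur, nxt])  # a complete arrival/departure pair
                i += 2
            else:
                i += 1  # arrival-only or departure-only record is dropped
    return kept


def fix(kind, time, seq, lon=106.25, lat=38.50):
    return {"PRODUCTID": "20111271", "ISARRLFT": kind, "ACTDATETIME": time,
            "STATIONSEQNUM": seq, "LONGITUDE": lon, "LATITUDE": lat}


rows = [
    fix(ARRIVE, "2019-05-06 07:30:00", 3),
    fix(DEPART, "2019-05-06 07:30:40", 3),
    fix(ARRIVE, "2019-05-06 07:35:00", 4),                    # no matching departure
    fix(DEPART, "2019-05-06 07:40:00", 5, lon=0.0, lat=0.0),  # outside the area
]
bbox = (105.8, 38.2, 106.6, 38.8)  # invented monitoring area
gps_clean = clean_gps(rows, bbox)
print(len(gps_clean))
```

A polygon test against the real monitoring boundary would replace the bounding box in practice.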
Further, the vehicle-mounted machine data (vehicle-mounted machine information) gives the license plate number and line name corresponding to each vehicle-mounted machine number. It is used to match GPS data to license plate numbers so that the GPS data and the IC card data can be associated and fused. A data sample is shown in Table 3.
TABLE 3 Vehicle-mounted machine information lookup table
Vehicle-mounted machine number   License plate number   Line name
20111271                         AA1271                 Route 42
20111601                         AA1601                 Route 306
Further, the one-way station relationship table gives the station sequence number, station name and station type corresponding to each line number and sub-line number. Because the GPS data contains only the station sequence number and not the station name, this table is used to match station names to the GPS data. Sample data is shown in Table 4. Once the line number and sub-line number are fixed, station sequence numbers and station names correspond one to one.
TABLE 4 One-way station relationship table
Line number   Sub-line number   Station type code   Station sequence number   Station name
1             1                 3                   43                        Museum (Single) (east)
1             1                 3                   44                        Archives (east)
1             1                 3                   45                        Folk princess party style building (east)
In the one-way station relationship table, many stations are split by direction (east, west, south, north), and the same station often has several adjacent longitude and latitude points on the GIS map. For convenience, the station information in the GIS data is merged: the longitudes and latitudes of the same station in different directions and different rows are averaged to obtain a single longitude and latitude for the station, as shown in FIG. 1.
The GPS data is then fused with the vehicle-mounted machine information and the merged one-way station relationship table to obtain arrival and departure GPS data containing the license plate number, station name and station longitude and latitude. A data sample is shown in Table 5.
TABLE 5 One-way station relationship table after matching longitude and latitude
Line number   Sub-line number   Station sequence number   Station name        Longitude     Latitude
2             2001              3                         People's hospital   106.2523192   38.5045019
2             2001              4                         People's hospital   106.2523192   38.5045019
2             2001              5                         Garden garden       106.251383    38.4981606
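The averaging and fusion described above might look like the following sketch. The helper names, the rule that a direction tag is a trailing "(east)"-style suffix, and the sample coordinates are all assumptions made for illustration; the vehicle-mounted machine number and plate come from Table 3 and the station key from Table 4.

```python
import re
from collections import defaultdict

# Assumed naming rule: the direction is a trailing tag such as " (east)".
DIRECTION_SUFFIX = re.compile(r"\s*\((east|west|south|north)\)$")


def base_name(station_name):
    """Strip a trailing direction tag so both sides of a road share one name."""
    return DIRECTION_SUFFIX.sub("", station_name)


def fuse_coordinates(points):
    """Average every (lon, lat) listed for the same base station name."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])
    for name, lon, lat in points:
        entry = acc[base_name(name)]
        entry[0] += lon
        entry[1] += lat
        entry[2] += 1
    return {name: (lon / n, lat / n) for name, (lon, lat, n) in acc.items()}


def attach_station(gps, device_to_plate, station_table, coords):
    """Add plate, station name and fused coordinates to one GPS record."""
    plate = device_to_plate[gps["PRODUCTID"]]
    key = (gps["ROUTEID"], gps["SUBROUTEID"], gps["STATIONSEQNUM"])
    station = base_name(station_table[key])
    lon, lat = coords[station]
    return {**gps, "PLATE": plate, "STATION": station,
            "STATION_LON": lon, "STATION_LAT": lat}


# Invented coordinates for the two sides of one station.
points = [("Archives (east)", 106.2700, 38.4800),
          ("Archives (west)", 106.2702, 38.4802)]
coords = fuse_coordinates(points)
device_to_plate = {"20111271": "AA1271"}          # from Table 3
station_table = {(1, 1, 44): "Archives (east)"}   # from Table 4
gps = {"PRODUCTID": "20111271", "ROUTEID": 1, "SUBROUTEID": 1, "STATIONSEQNUM": 44}
fused = attach_station(gps, device_to_plate, station_table, coords)
print(fused["PLATE"], fused["STATION"])
```

Keying the station table on (line number, sub-line number, station sequence number) reflects the one-to-one correspondence stated for Table 4.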
Further, the station inference method is as follows.
The card-swiping data of one-ticket buses lacks passengers' boarding stations, alighting stations and transfer stations. To fill in this information, the following algorithms are provided.
Passenger boarding station inference:
The bus GPS data and the passenger IC card data are fused, and a passenger's boarding station is determined by comparing the card-swiping time with the GPS update times at stations. The inference algorithm is as follows:
Input: the original passenger card-swiping data UserData; the bus GPS data VehiclesGPS; the list of license plate numbers Vehicles;
Output: the matched card-swiping data set Result;
Here the Selectdata(data, condition) function extracts from data the records that satisfy condition, and the ComputeInterval(A, B) function computes the time interval between A and B. Because the GPS positioning time and the card-swiping time both contain errors, records for which the difference between the two exceeds 180 seconds are removed to guarantee the accuracy of the matching result. The specific steps are:
Let i denote each license plate number in Vehicles, a the records in UserData with license plate number i, b the records in VehiclesGPS with license plate number i, j each card-swiping record in a, and k each record in b.
For each record j, the boarding station is initialized to empty and the time interval to infinity. For each record k in b, ComputeInterval(j, k) gives the time interval between j and k; whenever it is smaller than the current interval, the interval is updated and the station name of k becomes the boarding station of j. If the final interval exceeds 180 seconds, j is discarded; otherwise the boarding station name is added to j and j is written to Result. The next iteration is then executed.
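The boarding station matching can be sketched in Python as below. The names `select_data`, `compute_interval` and `infer_boarding` mirror the functions and data sets named in the text, but the record layout (`plate`, `time`, `station` keys) and the sample data are assumptions; the nearest-in-time rule is one reading of the "initialize the interval to infinity" step.

```python
from datetime import datetime

MAX_GAP_SECONDS = 180  # threshold stated in the text


def select_data(data, condition):
    """Return the records of data that satisfy condition."""
    return [record for record in data if condition(record)]


def compute_interval(a, b):
    """Absolute time difference between two datetimes, in seconds."""
    return abs((a - b).total_seconds())


def infer_boarding(user_data, vehicles_gps, vehicles):
    result = []
    for plate in vehicles:
        swipes = select_data(user_data, lambda r: r["plate"] == plate)
        fixes = select_data(vehicles_gps, lambda r: r["plate"] == plate)
        for swipe in swipes:
            station, best = None, float("inf")
            for gps_fix in fixes:
                gap = compute_interval(swipe["time"], gps_fix["time"])
                if gap < best:  # keep the station fix closest in time
                    station, best = gps_fix["station"], gap
            # Discard swipes whose closest fix is more than 180 s away.
            if station is not None and best <= MAX_GAP_SECONDS:
                result.append({**swipe, "boarding_station": station})
    return result


day = (2019, 5, 6)  # invented date
user_data = [
    {"card": "1001", "plate": "AA1271", "time": datetime(*day, 7, 31, 10)},
    {"card": "1002", "plate": "AA1271", "time": datetime(*day, 9, 0, 0)},
]
vehicles_gps = [
    {"plate": "AA1271", "time": datetime(*day, 7, 30, 50), "station": "People's hospital"},
    {"plate": "AA1271", "time": datetime(*day, 7, 35, 0), "station": "Garden garden"},
]
result = infer_boarding(user_data, vehicles_gps, ["AA1271"])
print(result)
```

The second swipe is dropped because its closest GPS fix is far more than 180 seconds away.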
Further, there are two methods for inferring passenger alighting stations.
The number of bus trips per day differs between passengers: some travel many times a day, while a large number make only one bus trip a day. For these two situations, the following two methods are used to infer the alighting station.
Further, the first method is alighting station inference based on the passenger trip chain.
For passengers who make several bus trips in one day, the data for that day contains several card-swiping records, which can form a closed or non-closed bus trip chain. The trip chain is used to infer the alighting stations as follows:
S1: extract all card-swiping records of the same card number in one day and sort them by card-swiping time;
S2: for a passenger, first obtain all stations of the route taken on this ride, according to the boarding station in the passenger's earlier record;
S3: among all stations of the route taken on the earlier ride, find the station spatially closest to the boarding station in the passenger's next ride record; this station is the alighting station of the earlier ride;
S4: when the record in S2 is the last card-swiping record of the card number, the passenger's first card-swiping record of the day is used as the next ride record to compute the alighting station of the last ride, which completes the calculation for this card number;
S5: repeat steps S1 to S4 for all card numbers until the inference is complete for every card number.
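Steps S1 to S4 for a single card number can be sketched as follows. The haversine great-circle distance is an assumed choice of "spatial distance"; the station names, coordinates and route layouts are invented, and `infer_alighting` is a hypothetical helper name.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def infer_alighting(rides, route_stations, coords):
    """rides: one card's rides for one day, each {"time", "route", "board"}."""
    ordered = sorted(rides, key=lambda r: r["time"])  # S1
    if len(ordered) < 2:
        return []  # the trip-chain method needs at least two rides
    out = []
    for idx, ride in enumerate(ordered):
        # S4: the last ride wraps around to the first boarding of the day.
        nxt = ordered[(idx + 1) % len(ordered)]
        target = coords[nxt["board"]]
        candidates = route_stations[ride["route"]]  # S2
        # S3: the route station closest to the next boarding station.
        alight = min(candidates, key=lambda s: haversine_m(*coords[s], *target))
        out.append({**ride, "alight": alight})
    return out


coords = {"A": (106.250, 38.500), "B": (106.260, 38.500), "C": (106.270, 38.500),
          "D": (106.271, 38.500), "E": (106.251, 38.500)}
route_stations = {"1": ["A", "B", "C"], "2": ["D", "E"]}
rides = [{"time": "18:00", "route": "2", "board": "D"},
         {"time": "08:00", "route": "1", "board": "A"}]
chain = infer_alighting(rides, route_stations, coords)
print([(r["board"], r["alight"]) for r in chain])
```

Restricting the candidates to stations downstream of the boarding station is a natural refinement that the text does not spell out, so the sketch leaves it out.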
Further, the second method is probability-based alighting station inference.
For passengers without consecutive bus trips in one day, an alighting station estimation model based on station alighting probability is applied. Existing research shows that the attraction strength and the generation strength of a bus station are basically balanced, so the attraction strength of a station can be replaced by its generation strength. From the boarding station inference results, the number of passengers boarding at each station of any line can be counted, and the attraction strength of a station is calculated as shown in formula 2:
[Formula 2: equation image BDA0002390253010000041, not reproduced in this text]
where $s_i$ is the number of passengers boarding at the i-th station.
The alighting probability $p_{ij}$ depends on the average number of stations travelled per bus trip and on the station attraction strength $p_i$. The number of stations travelled on residents' bus trips is concentrated within a certain range, and statistical experience shows that, in a fixed driving direction, it approximately follows a Poisson distribution, as shown in formula 3:
[Formula 3: equation image BDA0002390253010000042, not reproduced in this text]
where $Z_{ij}$ is the probability that a passenger who boards at the i-th station alights at the j-th station, and λ is the average number of stations travelled per bus trip. When the number of stations after station i is less than λ, λ is (n - λ), where n is the total number of stations on the single bus line. The probability that a passenger boards at station i and alights at station j can then be constructed as shown in formula 4:
[Formula 4: equation image BDA0002390253010000051, not reproduced in this text]
Therefore, the total number of passengers boarding at station i and alighting at station j is given by formula 5:
$M_{ij} = s_i \times p_{ij}$ (formula 5)
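Formulas 2 to 4 appear in this text only as image references, so their exact form cannot be confirmed here. The sketch below assumes one form that is consistent with the surrounding definitions: attraction strength as a station's share of boardings, a Poisson weight on the number of stations travelled, and normalization over downstream stations. Capping λ by the number of downstream stations is likewise an assumption of this sketch. Only formula 5 is taken directly from the text.

```python
import math


def alighting_matrix(boardings, lam):
    """Return (p, m): p[i][j] is the assumed probability of boarding at i and
    alighting at j; m[i][j] = boardings[i] * p[i][j] as in formula 5."""
    n = len(boardings)
    total = sum(boardings)
    if total == 0:
        raise ValueError("no boardings on this line")
    # Assumed formula 2: attraction strength is the station's share of boardings.
    attraction = [s / total for s in boardings]
    p = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        # Assumption: cap lambda by the number of downstream stations.
        lam_i = min(lam, n - 1 - i)
        weights = {}
        for j in range(i + 1, n):
            hops = j - i
            # Assumed formula 3: Poisson weight for travelling `hops` stations.
            poisson = math.exp(-lam_i) * lam_i ** hops / math.factorial(hops)
            weights[j] = poisson * attraction[j]
        norm = sum(weights.values())
        if norm > 0:
            # Assumed formula 4: normalize over the downstream stations.
            for j, w in weights.items():
                p[i][j] = w / norm
    m = [[boardings[i] * p[i][j] for j in range(n)] for i in range(n)]
    return p, m


boardings = [120, 80, 60, 40]  # invented boarding counts s_i in one direction
p, m = alighting_matrix(boardings, lam=2)
print([round(x, 3) for x in p[0]])
```

With this normalization each row of `p` for a non-terminal station sums to 1, so the alighting counts in `m` add back up to the boardings at that station.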
Further, transfer station inference is as follows.
Bus transfers can be identified from both a time perspective and a space perspective. As shown in FIG. 2, a bus passenger swipes a card and boards at stop P1 at time $t_1$; after an in-vehicle time T1 the bus arrives at stop P2 at time $t_2$; the passenger walks a distance L, taking time T2, to reach the transfer stop P3; after waiting for time T3, the passenger swipes the card and boards the transfer route at time $t_3$; and after a final in-vehicle time T4 the passenger reaches the destination stop P4 at time $t_4$, completing the trip.
The time consumed by the transfer, $T_s$, is given by formula 1:
$T_s = t_3 - t_2 = T_{walk} + T_{wait} = t_3 - t_1 - T_v$ (formula 1)
where $T_{walk}$ is the walking time from the alighting station to the transfer station, $T_{wait}$ is the waiting time at the transfer station, and $T_v$ is the in-vehicle time of the previous ride.
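As a small worked example of formula 1, the transfer time can be computed from the two card-swiping times and an estimate of the in-vehicle time, since the alighting time itself is not recorded. The function name and the sample times are invented for illustration.

```python
from datetime import datetime, timedelta


def transfer_time(t1, t3, in_vehicle):
    """Formula 1: T_s = t3 - t1 - T_v."""
    return t3 - t1 - in_vehicle


t1 = datetime(2019, 5, 6, 8, 0)    # first swipe (invented)
t3 = datetime(2019, 5, 6, 8, 32)   # swipe on the transfer route (invented)
ts = transfer_time(t1, t3, in_vehicle=timedelta(minutes=20))
print(ts)
```

Here the 32 minutes between swipes minus 20 minutes on the bus leaves 12 minutes for walking and waiting.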
Further, by analyzing the in-vehicle time $T_v$, the transfer walking time $T_{walk}$ and the waiting time at the transfer station $T_{wait}$, the maximum transfer time interval can be obtained. Combining the existing literature with traffic surveys, the maximum plausible transfer threshold is taken as 60 min.
The transfer identification steps are as follows:
S1: extract a bus IC card record and note its card-swiping time t1; obtain the adjacent card-swiping record of the same card number and note its card-swiping time t2;
S2: compute the card-swiping time interval Ti = t2 - t1. If Ti ≤ Tmax and the distance L between the transfer stations is less than 500 m, the passenger's next ride is considered a transfer; otherwise it is considered a new trip;
S3: evaluate all records of the same card number and record the identification results;
S4: repeat steps S1 to S3 until the last card number has been processed, completing the identification of passenger transfer behavior.
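The identification rule for one card number can be sketched as below, using the 60 min and 500 m thresholds from the text. The record layout, the haversine distance, the sample coordinates and the convention that the first ride of the day always starts a new trip are assumptions of this sketch.

```python
import math
from datetime import datetime, timedelta

T_MAX = timedelta(minutes=60)  # maximum transfer interval from the text
L_MAX_M = 500.0                # maximum distance between transfer stations


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def label_transfers(rides):
    """rides: one card's rides sorted by swipe time, each with
    "time", "board_xy" and "alight_xy" as (lon, lat) tuples."""
    labels = ["trip"]  # assumption: the first ride always starts a new trip
    for prev, cur in zip(rides, rides[1:]):
        gap = cur["time"] - prev["time"]                         # Ti = t2 - t1
        dist = haversine_m(*prev["alight_xy"], *cur["board_xy"])  # L
        labels.append("transfer" if gap <= T_MAX and dist < L_MAX_M else "trip")
    return labels


day = (2019, 5, 6)  # invented date
rides = [
    {"time": datetime(*day, 8, 0), "board_xy": (106.250, 38.500), "alight_xy": (106.270, 38.500)},
    {"time": datetime(*day, 8, 25), "board_xy": (106.271, 38.500), "alight_xy": (106.290, 38.500)},
    {"time": datetime(*day, 12, 0), "board_xy": (106.290, 38.500), "alight_xy": (106.250, 38.500)},
]
labels = label_transfers(rides)
print(labels)
```

The second ride starts 25 minutes after the first and roughly 90 m from where the first ended, so it is labeled a transfer; the third starts hours later and is a new trip.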
The method for analyzing and mining one-ticket public transportation data has the following beneficial effects. It provides a complete one-ticket bus data mining process, comprising raw data cleaning, inference of passenger boarding and alighting stations, and inference of passenger transfer stations. A large amount of passenger flow data is collected, which makes it convenient to provide reliable data support for bus operation and route planning. The algorithm identifies the main passenger flow distribution points; the daily passenger flow and spatial distribution of the lines are obtained, giving an intuitive macroscopic view of residents' travel demand; and the distribution of stations with large transfer volumes under the existing operation scheme is analyzed, providing reliable reference data for subsequent line adjustments.
Drawings
FIG. 1 is a schematic diagram of bus stops before and after fusion;
FIG. 2 is a schematic diagram of a transfer;
FIG. 3 is a chart of boarding passenger flow at bus stations;
FIG. 4 is a chart of alighting passenger flow at bus stations;
FIG. 5 shows the top 15 stations by total daily passenger flow;
FIG. 6 shows the lines whose daily passenger flow exceeds ten thousand;
FIG. 7 shows the line passenger flow;
FIG. 8 shows the top 15 stations by transfer passenger flow;
FIG. 9 shows the spatial distribution of station transfer passenger flow;
FIG. 10 is a flow chart of the overall algorithm.
Detailed Description
The present invention is described in detail below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention, and all other embodiments obtained by those skilled in the art on the basis of the embodiments in the present application without inventive work fall within the scope of protection of the present application.
In this embodiment, FIGS. 3 and 4 show the boarding and alighting passenger flows at bus stations, with the number of stations in each passenger flow range marked in the legend. In terms of spatial distribution, stations with large boarding and alighting volumes are concentrated in the east of the city, indicating that the east is the core area of the city. In terms of station passenger flow, 282 stations have more than 2000 boarding passengers and 174 stations have more than 2000 alighting passengers, while the number of stations with fewer than 300 boarding passengers and the number with fewer than 300 alighting passengers both exceed 1800. This reflects the imbalance in public transportation development caused by the single-pole development of the city.
FIG. 5 shows the 15 stations with the highest passenger flow. The daily passenger flow of each of these 15 stations exceeds 3600, making them important passenger flow distribution points. The station with the largest daily passenger flow is the north bus yard, at 6511 passengers. Passenger flows near these stations are all large, and targeted optimization of these stations would help improve the bus service level.
In this embodiment, the line operating conditions are as follows. Based on the boarding and alighting station identification results, the all-day passenger flow of each line in actual operation can be obtained. FIG. 6 shows the lines whose all-day passenger flow exceeds 10,000 trips; there are 15 such lines. Route 81 carries a daily average of 41650 trips, which is 1.6 times that of Route 191 and 4 times that of Route 316. On the one hand, this shows that the line plays an important role in the bus network; on the other hand, by analyzing the passenger flow OD of the line in detail and adjusting related lines to share part of its function, the service level of the bus network can be effectively improved.
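Once every card-swiping record carries a line, the per-line daily passenger flow is a simple count. The sketch below uses the line names mentioned above with invented ride counts and a hypothetical threshold; it is meant only to show the aggregation, not to reproduce the reported figures.

```python
from collections import Counter


def daily_flow_by_route(rides):
    """Count card-swiping records per line."""
    return Counter(ride["route"] for ride in rides)


def routes_over(flow, threshold):
    """Lines whose flow exceeds threshold, busiest first."""
    busy = [(route, n) for route, n in flow.items() if n > threshold]
    return sorted(busy, key=lambda item: item[1], reverse=True)


# Invented toy data: 5, 3 and 1 rides on three lines.
rides = [{"route": "81"}] * 5 + [{"route": "191"}] * 3 + [{"route": "316"}]
flow = daily_flow_by_route(rides)
busiest = routes_over(flow, threshold=2)
print(busiest)
```

The same pattern, keyed on station instead of line, gives the station rankings shown in FIG. 5 and FIG. 8.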
The spatial distribution of line passenger flow is shown in FIG. 7, in which red lines indicate a daily passenger flow exceeding 15000. These lines are the backbone of the bus network and, as the figure shows, mainly connect the core area of the city with the peripheral areas, chiefly the west of the city. Passenger flows differ considerably among the lines connecting the east and west of the city, and the flow is concentrated on one line, which is consistent with the large differences between lines shown in FIG. 6. Adjusting the bus lines so that passenger flow is distributed more evenly can effectively improve the bus service level in emergencies.
In this embodiment, for transfer stations, the above algorithm is used to identify the transfer passenger flow at each bus station in Yinchuan within one day. FIG. 8 shows the 15 stations with the largest transfer passenger flow; the bus station in north gate has the largest, with a daily transfer passenger flow of 1037. The spatial distribution is shown in FIG. 9: stations with large transfer volumes are concentrated in the eastern core area of the city. Further investigation of these stations, followed by adjustment of the relevant lines, can effectively reduce the number of transfers and improve passenger satisfaction.
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims. The techniques, shapes, and configurations not described in detail in the present invention are all known techniques.

Claims (3)

1. A method for analyzing and mining one-ticket public transportation data, carried out according to the following steps:
S1: clean the bus IC card data, bus GPS data, vehicle-mounted machine data and one-way stop information;
S1.1: the bus IC card data comprises the transaction card number, transaction time, line and license plate number; first extract these four fields, then delete all other fields as useless, and delete records with missing fields;
S1.2: the bus-mounted GPS data has a plurality of fields, but some of them are not yet enabled and are null; extract the useful fields, comprising the vehicle-mounted machine number, arrival and departure information, positioning time, positioning longitude and latitude, line number, sub-line number, station sequence number, bus speed and bus driving mileage, and delete the useless fields; then delete, using ArcGIS, records whose longitude and latitude fall outside the information acquisition area; delete data that has only an arrival or only a departure;
S1.3: the vehicle-mounted machine information gives the license plate number and line name corresponding to each vehicle-mounted machine number; it is used to match GPS data to license plate numbers so that the GPS data and the IC card data can be associated and fused; extract the vehicle-mounted machine number, license plate number and line name;
S1.4: the one-way station relation comprises the station sequence number, station name and station type corresponding to each line number and sub-line number; merge the station information in the ArcGIS data by averaging the longitudes and latitudes of the same station in different directions and different rows to obtain a single longitude and latitude for the station; fuse the GPS data with the vehicle-mounted machine information and the merged one-way station relation to obtain arrival and departure GPS data comprising the license plate number, station name and station longitude and latitude.
2. The method for analyzing and mining one-ticket public transportation data according to claim 1, characterized in that: the bus passenger card-swiping data lacks information on the passengers' boarding stations, getting-off stations and transfer stations, and the missing data are calculated by the following method;
S2.1 passenger boarding station inference;
the original passenger card-swiping data UserData, the bus GPS data VehiclesGPS and the license plate number list Vehicles are called; because the GPS positioning time and the card-swiping time both contain errors, data for which the difference between the GPS positioning time and the card-swiping time exceeds 180 seconds are removed, so as to guarantee the accuracy of the matching result;
S2.2 passenger getting-off station inference;
S2.2.1 extracting all card-swiping records of the same card number within one day, and sorting them by card-swiping time;
S2.2.2 for one passenger, first obtaining all stations of the route taken on the previous trip according to the boarding station in the passenger's previous riding record;
S2.2.3 finding, among all stations on the previously ridden route, the station spatially closest to the boarding station in the passenger's next riding record; this station is the passenger's getting-off station for the previous ride;
S2.2.4 when the card-swiping record in S2.2.2 is the last card-swiping record of the card number, the passenger's first card-swiping record is used as the next riding record to calculate the getting-off station of the passenger's last ride, whereupon the getting-off station calculation for the card number is finished;
S2.2.5 repeating steps S2.2.1-S2.2.4 for all card numbers until the inference process is completed for every card number.
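As a rough illustration of S2.1 and S2.2, the Python sketch below matches a card swipe to the GPS event nearest in time on the same vehicle (within the 180-second limit) and then infers getting-off stations by chaining one card's rides within a day. The data layout, the function names, the use of the haversine distance, and the choice to return `None` for a card with a single ride are assumptions, not details taken from the patent.

```python
import math

MAX_TIME_GAP_S = 180  # limit stated in S2.1


def match_boarding(swipe_time, plate, gps_events):
    """Return the station of the GPS event on the same vehicle that is
    nearest in time to the swipe, or None if none lies within 180 s."""
    best = None
    for ev in gps_events:  # ev: {"plate", "time" (seconds), "station"}
        if ev["plate"] != plate:
            continue
        gap = abs(ev["time"] - swipe_time)
        if gap <= MAX_TIME_GAP_S and (best is None or gap < best[0]):
            best = (gap, ev["station"])
    return None if best is None else best[1]


def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))


def infer_getting_off(trips, route_stations, coords):
    """Infer the getting-off station of every ride of one card in one day.

    trips: [{"time", "route", "board"}]; route_stations: route -> station
    names; coords: station name -> (lon, lat).
    """
    trips = sorted(trips, key=lambda t: t["time"])  # S2.2.1
    if len(trips) < 2:
        # Assumption: a lone ride has no next record to chain to.
        return [None] * len(trips)
    result = []
    for i, trip in enumerate(trips):
        # S2.2.4: the last ride wraps around to the first ride of the day.
        nxt = trips[(i + 1) % len(trips)]
        target = coords[nxt["board"]]
        # S2.2.2-S2.2.3: closest station on the route just ridden.
        result.append(min(route_stations[trip["route"]],
                          key=lambda s: haversine_m(coords[s], target)))
    return result
```

The sketch searches every station on the route, as the claim states; restricting the search to stations downstream of the boarding station would be a further refinement that the claim does not describe.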
3. The method for analyzing and mining one-ticket public transportation data according to claim 1, characterized in that: for transfer station inference, the time consumed by the transfer process is expressed by formula 1:
$T_s = t_3 - t_2 = T_{walk} + T_{wait} = t_3 - t_1 - T_v$ (formula 1)
where $T_{walk}$ is the walking time from the getting-off station to the transfer station; $T_{wait}$ is the waiting time at the transfer station; and $T_v$ is the in-vehicle time of the previous ride;
S3.1 by analyzing the bus riding time $T_v$, the transfer walking time $T_{walk}$ and the transfer-station waiting time $T_{wait}$, the maximum transfer time interval $T_{max}$ is obtained, the maximum possible transfer threshold being 60 min; the transfer identification process is as follows:
S3.2 extracting a bus IC card record, whose card-swiping time is denoted $t_1$, and obtaining the adjacent card-swiping record of the same card number, whose card-swiping time is $t_2$;
S3.3 calculating the card-swiping time interval $T_i = t_2 - t_1$; if $T_i \le T_{max}$ and the distance $L$ between the transfer stations is less than 500 m, the passenger's next trip is considered a transfer; otherwise it is considered a separate trip;
S3.4 judging all data of the same card number and recording the identification results;
S3.5 repeating steps S3.1-S3.3 until the last card is reached, whereupon passenger transfer behavior identification is complete.
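The identification rule of S3.2-S3.3 can be sketched in Python as follows. This is an illustrative reading, not the patented implementation: it assumes that the distance L is measured between the getting-off station of the earlier ride and the boarding station of the later ride, that station positions are already projected to planar coordinates in metres, and that each swipe record carries the hypothetical keys `time`, `board_xy` and `alight_xy`.

```python
import math

T_MAX_S = 60 * 60            # 60 min threshold from S3.1, in seconds
MAX_TRANSFER_DIST_M = 500    # distance limit from S3.3, in metres


def identify_transfers(swipes):
    """Flag which rides of one card are transfers from the preceding ride.

    swipes: [{"time" (seconds), "board_xy", "alight_xy"}], where the xy
    values are planar (x, y) positions in metres (an assumption).
    Returns a list of booleans aligned with the time-sorted swipes; the
    first ride is never a transfer.
    """
    swipes = sorted(swipes, key=lambda s: s["time"])
    flags = [False] * len(swipes)
    for k in range(1, len(swipes)):
        prev, cur = swipes[k - 1], swipes[k]
        interval = cur["time"] - prev["time"]          # swipe interval Ti
        # Assumed meaning of L: previous getting-off point to this boarding point.
        gap_m = math.dist(prev["alight_xy"], cur["board_xy"])
        flags[k] = interval <= T_MAX_S and gap_m < MAX_TRANSFER_DIST_M
    return flags
```

Running the function once per card number and storing the flags corresponds to S3.4 and S3.5.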
CN202010111713.0A 2020-02-24 2020-02-24 Analysis mining method for one-ticket public transport data Active CN111339159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010111713.0A CN111339159B (en) 2020-02-24 2020-02-24 Analysis mining method for one-ticket public transport data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010111713.0A CN111339159B (en) 2020-02-24 2020-02-24 Analysis mining method for one-ticket public transport data

Publications (2)

Publication Number Publication Date
CN111339159A (en) 2020-06-26
CN111339159B (en) 2023-08-18

Family

ID=71183600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010111713.0A Active CN111339159B (en) 2020-02-24 2020-02-24 Analysis mining method for one-ticket public transport data

Country Status (1)

Country Link
CN (1) CN111339159B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858806A (en) * 2020-07-09 2020-10-30 武汉译码当先科技有限公司 Passenger travel track detection method, device, equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102156732A (en) * 2011-04-11 2011-08-17 北京工业大学 Bus IC card data stop matching method based on characteristic stop
CN102324128A (en) * 2011-05-24 2012-01-18 北京交通大学 Method for predicting OD (Origin-Destination) passenger flow among bus stations on basis of IC (Integrated Circuit)-card record and device
US20140095423A1 (en) * 2012-09-29 2014-04-03 International Business Machines Corporation Infering travel path in public transportation system
WO2015096400A1 (en) * 2013-12-24 2015-07-02 中兴通讯股份有限公司 Bus planning method using mobile communication data mining
WO2016045195A1 (en) * 2014-09-22 2016-03-31 北京交通大学 Passenger flow estimation method for urban rail network
CN105788260A (en) * 2016-04-13 2016-07-20 西南交通大学 Public transportation passenger OD calculation method based on intelligent public transportation system data
CN106874432A (en) * 2017-01-24 2017-06-20 华南理工大学 A kind of public transport passenger trip space-time track extraction method
CN107545730A (en) * 2017-09-08 2018-01-05 哈尔滨工业大学 A kind of website based on Based on Bus IC Card Data is got on or off the bus passenger's number estimation method
CN108389420A (en) * 2018-03-13 2018-08-10 重庆邮电大学 A kind of bus passenger get-off stop real-time identification method based on history trip characteristics
CN109035770A (en) * 2018-07-31 2018-12-18 上海世脉信息科技有限公司 The real-time analyzing and predicting method of public transport passenger capacity under a kind of big data environment
CN109903553A (en) * 2019-02-19 2019-06-18 华侨大学 The bus that multi-source data excavates is got on or off the bus station recognition and the method for inspection
CN110084442A (en) * 2019-05-16 2019-08-02 重庆大学 A kind of method of joint public transport and the progress passenger flow OD calculating of rail traffic brushing card data
CN110264710A (en) * 2019-05-21 2019-09-20 天津大学 It is swiped the card the bus passenger flow estimating method with public transport GPS data based on IC card

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FENG CHEN;JINLEI ZHANG;ZIJIA WANG;SHUNWEI SHI;HAIXU LIU: "Passenger travel characteristics and bus operational states: a study based on IC card and GPS data in Yinchuan, China", 《TRANSPORTATION PLANNING AND TECHNOLOGY 》, vol. 42, no. 8, pages 825 - 847 *
杨鑫: "基于IC卡数据的公交客流智能推断方法研究", 《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》, pages 034 - 220 *

Also Published As

Publication number Publication date
CN111339159B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN109035770B (en) Real-time analysis and prediction method for bus passenger capacity in big data environment
CN109308546B (en) Method and system for predicting bus trip get-off station of passenger
CN109903553B (en) Multi-source data mining bus station identification and inspection method
CN105185105B (en) Bus transfer identification method based on vehicle GPS and bus IC card data
CN111862606B (en) Illegal operating vehicle identification method based on multi-source data
CN105788260A (en) Public transportation passenger OD calculation method based on intelligent public transportation system data
CN102324128A (en) Method for predicting OD (Origin-Destination) passenger flow among bus stations on basis of IC (Integrated Circuit)-card record and device
CN108062857B (en) Prediction technique for cab-getter's trip purpose
CN110188923B (en) Multi-mode bus passenger flow calculation method based on big data technology
CN109102114B (en) Bus trip getting-off station estimation method based on data fusion
CN108364464B (en) Probability model-based public transport vehicle travel time modeling method
CN109034566A (en) A kind of intelligent dispatching method and device based on passenger flow above and below bus station
CN101615340A (en) Real-time information processing method in the bus dynamic dispatching
CN111932925A (en) Method, device and system for determining travel passenger flow of public transport station
CN109903555B (en) Bus passenger getting-off data prediction method and system based on big data
CN110853156B (en) Passenger OD identification method integrating bus GPS track and IC card data
CN105390013A (en) Method for predicting bus arrival time based on bus IC card
CN114363842B (en) Bus passenger departure station prediction method and device based on mobile phone signaling data
CN115168529B (en) Hub passenger flow tracing method based on mobile phone positioning data
CN114358808A (en) Public transport OD estimation and distribution method based on multi-source data fusion
CN111046937A (en) Two-segment passenger crowd trip purpose analysis method fusing public transportation data and POI data
CN104464280B (en) Vehicle advance expenditure prediction method and system
CN112036757A (en) Parking transfer parking lot site selection method based on mobile phone signaling and floating car data
CN114118766A (en) Passenger flow OD algorithm based on bus passenger travel multiple matching
CN111339159B (en) Analysis mining method for one-ticket public transport data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant