CN114444795A

CN114444795A - Single-line bus passenger travel data generation method

Info

Publication number: CN114444795A
Application number: CN202210081034.2A
Authority: CN
Inventors: 李军; 区静怡; 骆刚
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2022-01-24
Filing date: 2022-01-24
Publication date: 2022-05-06

Abstract

The invention discloses a single-line bus passenger travel data generation method. G candidate lines are randomly selected, and the spatial vector of every station on each candidate line is obtained from station spatial information. These spatial vectors are used to obtain the spatial similarity total value between each candidate line and the target line, and the candidate lines with the first Q spatial similarity total values are selected as preferred candidate lines. The similarity between each preferred candidate line and the target line is then obtained by calculating a data similarity index and the similarity, and the preferred candidate line with the highest similarity is selected as the optimal candidate line. Finally, the optimal candidate line is taken as the learning sample, and passenger travel data of the target sample are generated through a cycle-consistent generative adversarial network (CycleGAN) algorithm. The method can conveniently, quickly and inexpensively generate a large amount of bus passenger travel data for the target line based on real data. The generated data conform to real travel patterns, and the method can be applied to fields such as bus analysis.

Description

Single-line bus passenger travel data generation method

Technical Field

The invention relates to the field of transportation engineering, belongs to the category of urban public transport, and particularly relates to a single-line bus passenger trip data generation method.

Background

Bus passenger travel data are the data records generated when passengers travel by bus. They include the boarding station, the alighting station, the corresponding boarding and alighting times, the line number, and similar fields. These data support passenger flow analysis and passenger classification for the urban public transport system, and public transport travel information is widely used in the operational scheduling of urban buses.

The rapid development of the mobile internet has brought a series of innovations in bus fare payment. Urban buses in China use two fare schemes: a distance-based (segmented) fare system and a flat-fare (single-ticket) system. The distance-based system calculates fares from recorded boarding and alighting information. The flat-fare system records only the boarding time and location of passengers and does not record the alighting time and location. For urban buses operating under the flat-fare system, alighting data are therefore missing and the passenger data are incomplete. At present, this gap is commonly filled by ride-along (on-board) surveys or by alighting-station inference. Alighting-station inference depends on data assumptions and has low accuracy. Ride-along surveys involve a complex and tedious data workflow that relies on manual recording on paper forms, which makes them inefficient and costly.

Disclosure of Invention

The invention provides a single-line bus passenger travel data generation method. Based on an existing complete travel data set of a similar single bus line, the method can conveniently, quickly and inexpensively generate passenger travel data for a single target bus line operating under a flat-fare system. These data are needed for application scenarios such as bus analysis.

In order to achieve the purpose, the technical scheme of the invention is as follows:

A single-line bus passenger travel data generation method comprises the following steps:

S1: randomly select G bus lines as candidate lines, and acquire the historical ridership data and station spatial information of all candidate lines;

S2: determine the spatial vectors of all stations on the G candidate lines from the station spatial information;

S3: use the station spatial vectors to calculate the spatial similarity index between every station on the target line and every station on a given candidate line, which gives the spatial similarity total value of that candidate line and the target line;

S4: obtain the spatial similarity total value with the target line for each candidate line as in step S3, sort the spatial similarity total values of all candidate lines in descending order, and select the candidate lines with the first Q values as preferred candidate lines;

S5: calculate the data similarity index between every station on the target line and every station on a given preferred candidate line, which gives the data similarity total value of that preferred candidate line and the target line;

S6: obtain the data similarity total value with the target line for each preferred candidate line as in step S5, calculate the similarity between each preferred candidate line and the target line from its data similarity total value, and select the preferred candidate line with the maximum similarity as the optimal candidate line;

S7: take the optimal candidate line as the learning sample and the target line as the target sample;

S8: generate passenger travel data for the target sample from the route travel data of the learning sample through a cycle-consistent generative adversarial network (CycleGAN) algorithm.

The method of the invention first randomly selects G bus lines as candidate lines and obtains the spatial vectors of all stations on these lines from the station spatial information. From these spatial vectors it computes the spatial similarity total value between each candidate line and a pre-selected target line. The candidate lines with the first Q spatial similarity total values (Q < G, both positive integers) become the preferred candidate lines. Next, the data similarity index and the similarity are calculated for the Q preferred candidate lines to obtain the similarity between each preferred candidate line and the target line, and the preferred candidate line with the highest similarity is selected as the optimal candidate line. Finally, the optimal candidate line is used as the learning sample and the target line as the target sample, and a CycleGAN algorithm generates passenger travel data for the target sample from the travel data of the learning sample.

Further, in step S2, the station spatial information includes longitude/latitude information and point-of-interest (POI) category information. The station spatial vectors of a candidate line are determined from this information as follows:

Determine the longitude and latitude of each station and the number of points of interest within a tolerance radius. The longitude and latitude of each station are recorded as a 2-dimensional vector l. With the tolerance radius of the station denoted r, the points of interest of each of the k categories within radius r of the station are counted and recorded as a k-dimensional vector h. The spatial vector of each station is then the (2 + k)-dimensional vector (l, h).
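The station vector (l, h) can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The POI layer is a hypothetical list of (lat, lon, category) tuples. Distances use the haversine great-circle formula with r in metres, and l is stored in (lat, lon) order. None of these choices are specified in the text.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def station_vector(lat, lon, pois, categories, r):
    """Build the (2 + k)-dimensional station vector (l, h).

    pois:       iterable of (lat, lon, category) tuples (hypothetical POI layer)
    categories: ordered list of the k POI categories
    r:          tolerance radius in metres
    """
    index = {cat: n for n, cat in enumerate(categories)}
    h = [0] * len(categories)  # one POI count per category
    for p_lat, p_lon, cat in pois:
        if cat in index and haversine_m(lat, lon, p_lat, p_lon) <= r:
            h[index[cat]] += 1
    return [lat, lon] + h  # l = (lat, lon), followed by h


pois = [
    (23.1000, 113.3000, "school"),
    (23.1005, 113.3002, "hospital"),
    (23.2000, 113.4000, "school"),  # roughly 15 km away, outside r
]
vec = station_vector(23.1001, 113.3001, pois, ["school", "hospital", "mall"], r=500)
```

With k = 3 categories, the example station gets a 5-dimensional vector. Only the two nearby POIs fall within the 500 m radius.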

Further, in step S3, the calculation formula of the spatial similarity index is as follows:

In the formula, $s_i$ is the $i$-th station on the target line and $b_j$ is the $j$-th station on the candidate line. $P(s_i, b_j)$ is the spatial similarity index of stations $s_i$ and $b_j$. $l_i$ and $h_i$ are the longitude/latitude and point-of-interest information of station $s_i$, and $l_j$ and $h_j$ are the same information for station $b_j$. $\alpha$ and $\beta$ are the influence coefficients of the longitude/latitude and point-of-interest information on the spatial similarity index, respectively. The spatial similarity index $P$ takes values in $[-1, 1]$, and a larger value means a higher spatial similarity between the two stations.

Further, in step S3, the specific process of obtaining the total spatial similarity value between the candidate line and the target line is as follows:

Let the target line have $n$ stations, $s_i \in \{s_1, s_2, \dots, s_n\}$, $i = 1, 2, \dots, n$, and the candidate line have $m$ stations, $b_j \in \{b_1, b_2, \dots, b_m\}$, $j = 1, 2, \dots, m$.

First, the spatial similarity index between station $s_1$ of the target line and each of the $m$ stations of the candidate line is calculated, which gives $m$ spatial similarity indices for $s_1$. The maximum of these is taken as the maximum spatial similarity index $P_1$ of station $s_1$ with respect to the candidate line. Repeating this for all $n$ stations of the target line yields the set $\{P_1, P_2, \dots, P_n\}$, where $P_i = \max_{j} P(s_i, b_j)$. Finally, $P_1, P_2, \dots, P_n$ are summed to obtain the spatial similarity total value of the candidate line and the target line, $\sum_{i=1}^{n} P_i$.
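The max-then-sum aggregation can be sketched in Python. It takes the station-pair index as a parameter because the text does not reproduce the formula for $P(s_i, b_j)$. The `placeholder_index` below is a hypothetical stand-in, not the patent's formula. It takes the cosine similarity of the POI parts h only and ignores l, $\alpha$ and $\beta$, and it is used only to make the example runnable.

```python
import math


def similarity_total(target, candidate, index):
    """Return (sum_i max_j index(s_i, b_j), [max_j index(s_i, b_j) for each i]).

    target, candidate: lists of station vectors (s_1..s_n and b_1..b_m)
    index:             callable giving the station-pair similarity index P(s_i, b_j)
    """
    per_station = [max(index(s, b) for b in candidate) for s in target]  # P_1..P_n
    return sum(per_station), per_station


def cosine(u, v):
    """Cosine similarity in [-1, 1]; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def placeholder_index(s, b):
    """HYPOTHETICAL stand-in for P(s_i, b_j): cosine of the POI parts h only.
    It ignores l, alpha and beta, unlike the formula referred to in the text."""
    return cosine(s[2:], b[2:])


# Station vectors (lat, lon, h...) with made-up values for illustration.
target = [[23.10, 113.30, 1, 0], [23.11, 113.31, 0, 1]]
candidate = [[23.20, 113.40, 1, 0], [23.21, 113.41, 1, 1]]
total, maxima = similarity_total(target, candidate, placeholder_index)
```

Because `similarity_total` takes the index as a callable, the real $P(s_i, b_j)$ can replace the placeholder without changing the aggregation.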

Further, in step S5, the data similarity index is calculated as follows:

In the formula, $E(s_i, b_j)$ is the data similarity index of stations $s_i$ and $b_j$. $R_s$ and $R_b$ are the total numbers of passengers on the target line and the preferred candidate line, respectively. $T_u$ and $T_d$ are the numbers of passengers boarding and alighting at station $s_i$ of the target line. $C_u$ and $C_d$ are the numbers of passengers boarding and alighting at station $b_j$ of the preferred candidate line. The data similarity index takes values in $[0, 1]$, and a larger value means a lower data similarity between the two stations.

Further, in step S5, the specific process of obtaining the data similarity total value of the preferred candidate route and the target route is as follows:

First, the data similarity index between station $s_1$ of the target line and each of the $m$ stations of the preferred candidate line is calculated, which gives $m$ data similarity indices for $s_1$. The maximum of these is taken as the maximum data similarity index $E_1$ of station $s_1$ with respect to the preferred candidate line. Repeating this for all $n$ stations of the target line yields the set $\{E_1, E_2, \dots, E_n\}$. Finally, $E_1, E_2, \dots, E_n$ are summed to obtain the data similarity total value of the preferred candidate line and the target line.
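Step S5 can be sketched in Python. The text does not reproduce the formula for $E(s_i, b_j)$, so `placeholder_data_index` is a hypothetical stand-in built only from the listed variables. It averages the absolute differences between each station's share of its line's boardings and alightings. Like $E$, it lies in $[0, 1]$, and a larger value means less similar. The sketch also assumes that $R_s$ and $R_b$ are the total boardings of each line. The aggregation takes the maximum over candidate stations and sums over target stations, as stated above.

```python
def placeholder_data_index(t_u, t_d, r_s, c_u, c_d, r_b):
    """HYPOTHETICAL stand-in for E(s_i, b_j); NOT the patent's formula.

    Compares each station's share of its line's passengers for boarding (u) and
    alighting (d). The result lies in [0, 1]; larger means less similar, matching
    the stated behaviour of E. Assumes station counts never exceed the line total.
    """
    return (abs(t_u / r_s - c_u / r_b) + abs(t_d / r_s - c_d / r_b)) / 2


def data_similarity_total(target_counts, candidate_counts):
    """target_counts / candidate_counts: per-station (boardings, alightings) lists.

    Returns (sum_i E_i, [E_1..E_n]) with E_i = max_j E(s_i, b_j), as in step S5.
    """
    r_s = sum(u for u, _ in target_counts)     # assumed R_s: total boardings, target line
    r_b = sum(u for u, _ in candidate_counts)  # assumed R_b: total boardings, candidate line
    per_station = [
        max(placeholder_data_index(t_u, t_d, r_s, c_u, c_d, r_b) for c_u, c_d in candidate_counts)
        for t_u, t_d in target_counts
    ]
    return sum(per_station), per_station


total, e_values = data_similarity_total([(10, 0), (0, 10)], [(5, 0), (0, 5)])
```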

Further, in step S6, the calculation formula of the similarity is as follows:

In the formula, $V(s, b)$ is the similarity between the preferred candidate line and the target line. $E_i \in \{E_1, E_2, \dots, E_n\}$ is the maximum data similarity index of station $s_i$ of the target line with respect to the preferred candidate line, i.e. $E_i = \max_{j} E(s_i, b_j)$.

The similarity $V$ is derived from the data similarity total value of the preferred candidate line and the target line and takes values in $[0, 1]$. A larger value means the preferred candidate line is more similar to the target line.
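Selecting the optimal candidate line in step S6 is an arg-max over $V$, sketched below in Python. The text does not reproduce the formula for $V(s, b)$. The `placeholder_route_similarity` used here, $1 - \frac{1}{n}\sum_i E_i$, is a hypothetical choice. It is used only because it lies in $[0, 1]$ and grows as the dissimilarities $E_i$ shrink, which matches the stated behaviour of $V$.

```python
def placeholder_route_similarity(e_values):
    """HYPOTHETICAL V(s, b) = 1 - mean(E_i), in [0, 1] when every E_i is in [0, 1].

    Chosen only because it is larger when the E_i (dissimilarities) are smaller;
    the patent's own V formula is not reproduced here.
    """
    return 1.0 - sum(e_values) / len(e_values)


def pick_optimal(preferred):
    """preferred: dict mapping preferred-line name -> [E_1..E_n].

    Returns (optimal line name, its similarity V), i.e. the arg-max of V (step S6).
    """
    scores = {name: placeholder_route_similarity(e) for name, e in preferred.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


best, threshold = pick_optimal({"A": [0.2, 0.4], "B": [0.1, 0.1], "C": [0.5, 0.5]})
```

The returned $V$ of the optimal line is named `threshold` because step S8 reuses it as the decision device's acceptance threshold.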

Further, in step S7, the flat-fare (boarding-only) ridership data set of the target line is the data source of the target sample, and the complete passenger ridership data set of the optimal candidate line is the data source of the learning sample.

Further, in step S8, a CycleGAN model is used as the generator. The travel data matrix of the learning sample is input into the generator, which iteratively generates simulated data. The simulated data are passed to a decision device. If their similarity is not lower than a set threshold, they are considered valid and output as the passenger travel data of the target sample.

Further, the decision device first calculates the similarity between the simulated data and the real travel data of the learning sample and then compares it with the set threshold. If the similarity is not lower than the threshold, the simulated data are considered valid and output;

the set threshold is the similarity between the optimal candidate line and the target line obtained in step S6.
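The decision-device loop can be sketched in Python. A trained CycleGAN is out of scope here, so `generate` is a hypothetical callable that returns one simulated travel-data matrix per call. `toy_similarity` is likewise a hypothetical metric, because the text does not define how the similarity between simulated and real data is computed. The threshold is the similarity $V$ of the optimal candidate line from step S6.

```python
def generate_until_valid(generate, similarity, real, threshold, max_iters=100):
    """Decision-device loop of step S8 (sketch).

    generate:   callable returning one simulated travel-data matrix; stands in
                for a trained CycleGAN generator (hypothetical interface)
    similarity: callable(simulated, real) -> float (hypothetical metric)
    real:       real travel-data matrix of the learning sample
    threshold:  similarity V of the optimal candidate line from step S6
    Returns the first simulated matrix whose similarity is >= threshold, else None.
    """
    for _ in range(max_iters):
        simulated = generate()
        if similarity(simulated, real) >= threshold:
            return simulated  # valid: output as the target sample's travel data
    return None


def toy_similarity(simulated, real):
    """HYPOTHETICAL metric: 1 - mean absolute cell error / largest real cell."""
    cells = [(a, b) for row_s, row_r in zip(simulated, real) for a, b in zip(row_s, row_r)]
    scale = max(abs(b) for _, b in cells) or 1.0
    return 1.0 - sum(abs(a - b) for a, b in cells) / (len(cells) * scale)


real = [[12.0, 3.0], [4.0, 9.0]]
draws = iter([
    [[0.0, 0.0], [0.0, 0.0]],   # poor draw: similarity about 0.42, rejected
    [[11.0, 3.0], [4.0, 8.0]],  # close draw: similarity about 0.96, accepted
])
result = generate_until_valid(lambda: next(draws), toy_similarity, real, threshold=0.9)
```

The `max_iters` cap is an added safeguard so the loop stops when the generator never meets the threshold. The text does not specify this.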

The invention has the beneficial effects that:

The method can generate passenger travel data for a target line operating under a flat-fare system from the spatial information of bus stations and the real data of the optimal candidate line. Compared with traditional ride-along surveys and alighting-station inference, it uses the historical real data of the optimal candidate line and otherwise needs only station spatial information. It therefore generates bus passenger travel data for the target line conveniently, quickly and at low cost. The method is widely applicable, the generated data are highly accurate, and the method can be applied to fields such as bus analysis.

Drawings

Fig. 1 is a flow chart of a single-line bus passenger travel data generation method of the present invention.

Detailed Description

The drawings are for illustrative purposes only and are not to be construed as limiting the patent; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent.

Example 1:

As shown in Fig. 1, the single-line bus passenger travel data generation method of this embodiment comprises steps S1 to S8 described above. G candidate lines are screened by spatial similarity down to Q preferred candidate lines, and the preferred candidate line with the highest similarity is selected as the optimal candidate line. Its travel data then serve as the learning sample, from which a CycleGAN generates passenger travel data for the target line.

In this embodiment, G and Q can be set according to actual conditions. For example, 100 bus lines may be randomly selected as candidate lines (G = 100). After the spatial similarity total values between all candidate lines and the pre-selected target line are obtained, the candidate lines with the top 10 values are selected as preferred candidate lines (Q = 10).
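Steps S1 and S4 with G = 100 and Q = 10 can be sketched in Python using `random.Random.sample` and `heapq.nlargest`. The line IDs and the spatial similarity totals below are hypothetical placeholders. In practice the totals would come from step S3.

```python
import heapq
import random


def select_preferred(totals, q):
    """Return the q candidate lines with the largest spatial similarity totals,
    in descending order of total (step S4)."""
    return [name for name, _ in heapq.nlargest(q, totals.items(), key=lambda kv: kv[1])]


rng = random.Random(0)  # fixed seed for repeatability
all_lines = [f"line_{n}" for n in range(500)]  # hypothetical line IDs
candidates = rng.sample(all_lines, 100)  # S1: G = 100
totals = {name: rng.uniform(0, 30) for name in candidates}  # placeholder S3 totals
preferred = select_preferred(totals, 10)  # S4: Q = 10
```

`heapq.nlargest` avoids a full sort of all G totals. For G = 100 a plain `sorted(..., reverse=True)[:q]` works equally well.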

In step S2 of this embodiment, the station spatial information includes longitude/latitude information and point-of-interest category information. The station spatial vectors of each candidate line are determined from this information as described above, with each station represented by the (2 + k)-dimensional vector (l, h).

In step S3 of the present embodiment, the calculation formula of the spatial similarity index is as follows:

In the formula, $s_i$ is the $i$-th station on the target line and $b_j$ is the $j$-th station on the candidate line. $P(s_i, b_j)$ is the spatial similarity index of stations $s_i$ and $b_j$. $l_i$ and $h_i$ are the longitude/latitude and point-of-interest information of station $s_i$, and $l_j$ and $h_j$ are the same information for station $b_j$. $\alpha$ and $\beta$ are the influence coefficients of the longitude/latitude and point-of-interest information on the spatial similarity index, respectively. The spatial similarity index $P$ takes values in $[-1, 1]$, and a larger value means a higher spatial similarity between the two stations.

According to the spatial similarity index, the specific process of obtaining the spatial similarity total value of the candidate line and the target line is as follows:

First, the spatial similarity index between station $s_1$ of the target line and each of the $m$ stations of the candidate line is calculated, which gives $m$ spatial similarity indices for $s_1$. The maximum of these is taken as the maximum spatial similarity index $P_1$ of station $s_1$ with respect to the candidate line. Repeating this for all $n$ stations of the target line yields the set $\{P_1, P_2, \dots, P_n\}$. Finally, $P_1, P_2, \dots, P_n$ are summed to obtain the spatial similarity total value of the candidate line and the target line.

In step S5 of the present embodiment, the calculation formula of the data similarity index is as follows:

According to the data similarity index, the specific process of obtaining the data similarity total value of the preferred candidate line and the target line is as follows:

First, the data similarity index between station $s_1$ of the target line and each of the $m$ stations of the preferred candidate line is calculated, which gives $m$ data similarity indices for $s_1$. The maximum of these is taken as the maximum data similarity index $E_1$ of station $s_1$ with respect to the preferred candidate line. Repeating this for all $n$ stations of the target line yields the set $\{E_1, E_2, \dots, E_n\}$. Finally, $E_1, E_2, \dots, E_n$ are summed to obtain the data similarity total value of the preferred candidate line and the target line.

In step S6 of the present embodiment, the calculation formula of the similarity is as follows:

In the formula, $V(s, b)$ is the similarity between the preferred candidate line and the target line. $E_i \in \{E_1, E_2, \dots, E_n\}$ is the maximum data similarity index of station $s_i$ of the target line with respect to the preferred candidate line, i.e. $E_i = \max_{j} E(s_i, b_j)$.

In step S7 of this embodiment, the flat-fare (boarding-only) ridership data set of the target line is the data source of the target sample, and the complete passenger ridership data set of the optimal candidate line is the data source of the learning sample.

In step S8 of this embodiment, a CycleGAN model is used as the generator. The travel data matrix of the learning sample is input into the generator, which iteratively generates simulated data. The simulated data are passed to a decision device. If their similarity is not lower than a set threshold, they are considered valid and output as the passenger travel data of the target sample.

The decision device calculates the similarity between the simulated data and the real travel data of the learning sample and compares it with the set threshold. If the similarity is not lower than the threshold, the simulated data are considered valid and output. The set threshold is the similarity between the optimal candidate line and the target line obtained in step S6.

Example 2:

This example provides a system that applies the single-line bus passenger travel data generation method of Example 1. The system comprises:

a data acquisition module, configured to randomly select G bus lines as candidate lines and acquire the historical ridership data and station spatial information of all candidate lines;

a spatial vector module, configured to determine the spatial vectors of all stations on the G candidate lines from the station spatial information;

a spatial similarity total value calculation module, configured to use the station spatial vectors to calculate the spatial similarity indices between all stations on the target line and all stations on a given candidate line, obtaining the spatial similarity total value of that candidate line and the target line, and to obtain such a total value for every candidate line in the same way;

a preferred candidate line screening module, configured to sort the spatial similarity total values of all candidate lines in descending order and select the candidate lines with the first Q values as preferred candidate lines;

a data similarity total value calculation module, configured to calculate the data similarity indices between all stations on the target line and all stations on a given preferred candidate line, obtaining the data similarity total value of that preferred candidate line and the target line, and to obtain such a total value for every preferred candidate line in the same way;

an optimal candidate line screening module, configured to calculate the similarity between each preferred candidate line and the target line from the data similarity total values and to select the preferred candidate line with the maximum similarity as the optimal candidate line;

a generation module, configured to take the optimal candidate line as the learning sample and the target line as the target sample, and to generate passenger travel data for the target sample from the route travel data of the learning sample through a CycleGAN algorithm;

the data acquisition module, spatial vector module, spatial similarity total value calculation module, preferred candidate line screening module, data similarity total value calculation module, optimal candidate line screening module and generation module communicate with one another by wireless or wired means.

Example 3:

This example is similar to Example 2. The difference is that the data acquisition module, spatial vector module, spatial similarity total value calculation module, preferred candidate line screening module, data similarity total value calculation module, optimal candidate line screening module and generation module of the system in Example 2 are integrated into one processor. The simulated data finally output by the processor are shown on a display screen so that an operator can inspect the output visually.

It should be understood that the above examples merely illustrate the invention clearly and do not limit its embodiments. Those skilled in the art can make other variations and modifications on the basis of the above description, and it is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall fall within the protection scope of the claims of the invention.

Claims

1. A single-line bus passenger travel data generation method is characterized by comprising the following steps:

2. The single-line bus passenger travel data generation method according to claim 1, wherein in step S2, the station spatial information includes longitude and latitude information and interest point classification information, and the process of determining the station spatial vector of the candidate line according to the station spatial information is as follows:

3. The single-line bus passenger travel data generation method according to claim 2, wherein in step S3, the calculation formula of the spatial similarity index is as follows:

4. The single-line bus passenger travel data generation method according to claim 3, wherein in step S3, the specific process of obtaining the total spatial similarity value between the candidate line and the target line is as follows:

First, the spatial similarity index between station $s_1$ of the target line and each of the $m$ stations of the candidate line is calculated, which gives $m$ spatial similarity indices for $s_1$. The maximum of these is taken as the maximum spatial similarity index $P_1$ of station $s_1$ with respect to the candidate line. Repeating this for all $n$ stations of the target line yields the set $\{P_1, P_2, \dots, P_n\}$. Finally, $P_1, P_2, \dots, P_n$ are summed to obtain the spatial similarity total value of the candidate line and the target line.

5. The single-line bus passenger travel data generation method according to claim 4, wherein in step S5, the calculation formula of the data similarity index is as follows:

6. The single-line bus passenger travel data generation method according to claim 5, wherein in step S5, the specific process of obtaining the data similarity total value of the preferred candidate route and the target route is as follows:

First, the data similarity index between station $s_1$ of the target line and each of the $m$ stations of the preferred candidate line is calculated, which gives $m$ data similarity indices for $s_1$. The maximum of these is taken as the maximum data similarity index $E_1$ of station $s_1$ with respect to the preferred candidate line. Repeating this for all $n$ stations of the target line yields the set $\{E_1, E_2, \dots, E_n\}$. Finally, $E_1, E_2, \dots, E_n$ are summed to obtain the data similarity total value of the preferred candidate line and the target line.

7. The single-line bus passenger travel data generation method according to claim 6, wherein in step S6, the calculation formula of the similarity is as follows:

8. The single-line bus passenger travel data generation method according to claim 1, wherein in step S7, the flat-fare (boarding-only) ridership data set of the target line is used as the data source of the target sample, and the complete passenger ridership data set of the optimal candidate line is used as the data source of the learning sample.

9. The single-line bus passenger travel data generation method according to claim 1, wherein in step S8, a CycleGAN model is used as a generator, the travel data matrix of the learning sample is input into the generator, simulated data are iteratively generated in the generator and passed to a decision device, and if the similarity of the simulated data is not lower than a set threshold, the simulated data are considered valid and output as the passenger travel data of the target sample.

10. The single-line bus passenger travel data generation method according to claim 9, wherein the decision device first calculates the similarity between the simulated data and the real travel data of the learning sample and then compares the similarity with the set threshold, and if the similarity of the simulated data is not lower than the set threshold, the simulated data are considered valid and output;