Data Analytics in Urban Transportation

Transportation agencies have traditionally been hampered in planning, managing, and evaluating their services by their heavy reliance on costly and unreliable manual data collection. However, the development of information and telecommunication technology is changing the amount, type, and quality of data available to planners and managers. We use multiple automatic data sources, such as smart cards, GPS vehicle locations, cell phone Call Detail Records, and mobility tracking apps, to estimate and predict travel demand, explore behavioral regularities, quantify service reliability, and evaluate travel demand management programs.

Trip Detection Using Sparse CDR Data based on Supervised Statistical Learning

Despite a large body of literature on trip detection using Call Detail Record (CDR) data, a fundamental understanding of the data's limitations is lacking; in particular, its sparse nature is not well addressed in existing work. This paper develops an explicit mapping between the telecommunication patterns captured by CDRs and the physical travel patterns of interest to the transportation community. To reduce the over-reliance of existing CDR-based trip detection methods on heuristics and arbitrary assumptions, we use data fusion to form labeled data for supervised statistical learning. In the absence of complementary data, this can be done by extracting labeled observations from more granular cellular data access records and extracting feature vectors from voice-call and SMS records. The proposed approach is demonstrated with real-world CDR data from a large Chinese city of 6 million people by inferring whether a hidden visit exists between two consecutive visits observed in the CDR data. Logistic regression, support vector machine, and artificial neural network models are used to develop statistical classifiers, and all show significant improvement over the naïve no-hidden-visit rule, an implicit assumption adopted by most existing studies. The proposed data fusion approach offers a systematic, statistical way to infer individual mobility patterns from telecommunication records and can be generalized to other types of large-scale data.
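
To make the classification step concrete, here is a minimal sketch of a logistic regression classifier for the hidden-visit question, fitted by batch gradient descent in plain Python and compared against the naïve no-hidden-visit rule. The two features (time gap and distance between consecutive CDR-observed visits), the toy labels, and all function names are hypothetical illustrations, not the paper's feature set or data; in the paper, labels come from cellular data access records and features from voice-call and SMS records.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    # Batch gradient descent on the average log-loss.
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


# Hypothetical toy data: (time gap in hours, distance in km) between two
# consecutive CDR-observed visits; label 1 = a hidden visit occurred in between.
X = [(0.5, 1.0), (1.0, 0.5), (0.8, 2.0), (1.5, 1.0),
     (4.0, 6.0), (5.0, 3.0), (3.5, 8.0), (6.0, 5.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w = fit_logistic(X, y)
model_acc = accuracy([int(predict_proba(w, x) >= 0.5) for x in X], y)
naive_acc = accuracy([0] * len(y), y)  # naive rule: never a hidden visit
print(f"model accuracy {model_acc:.2f}, naive accuracy {naive_acc:.2f}")
```

Swapping this hand-rolled model for a support vector machine or neural network changes only the classifier; the substantive work lies in forming the labels and features through data fusion.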

Team: Zhan Zhao, Jinhua Zhao and Haris Koutsopoulos

Publication: Zhan Zhao, Jinhua Zhao and Haris Koutsopoulos (2016) Trip Detection using Sparse Call Detail Record Data based on Supervised Statistical Learning, working paper

Demand Management in Public Transit: Design and Evaluate Crowding Reduction Strategies in Hong Kong

Increases in ridership are outpacing capacity expansions in a number of transit systems. By shifting their focus to demand management, agencies can instead influence how customers use the system, getting more out of the capacity they already have. However, while demand management is well researched for personal vehicle use, its application to public transportation is still emerging. This thesis explores the strategies transit agencies can use to reduce overcrowding, with a particular focus on how automatically collected fare data can support the design and evaluation of these measures.
A framework for developing demand management policies is introduced to help guide agencies through this process. It covers establishing the motivations for a program, aspects to consider in its design, and dimensions and metrics for evaluating its impacts. Additional considerations for updating a policy are also discussed, as are possible data sources and methods for supporting the analysis.
This framework was applied to a fare incentive strategy implemented on Hong Kong’s MTR system. In addition to establishing existing congestion patterns, a customer classification analysis was performed to understand typical travel patterns among MTR users. These results were used to evaluate the promotion at three levels of customer aggregation: all users, user groups, and a panel of high-frequency travelers. The incentive was found to have small but non-negligible impacts on morning travel, particularly at the beginning of the peak hour and among users with commuter-like behavior. A change point analysis made it possible to identify the panel members who responded to the promotion, and a discrete choice model was used to quantify the factors that influenced their decision. The findings of these analyses are used to recommend potential improvements to MTR’s current scheme.
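
The change point step can be illustrated with a minimal sketch that finds the single split of a traveler's daily series that best separates two mean levels (least squares, one change point). Using each panel member's daily first tap-in time as the series, the minimum segment length, and the 10-minute "responder" threshold are all hypothetical choices for illustration, not the settings used in the study.

```python
def sse(values):
    # Sum of squared deviations from the segment mean.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_change_point(series, min_seg=5):
    """Single mean-shift change point by least squares.

    Returns (index, mean_before, mean_after) for the split that minimizes
    the total within-segment sum of squared errors.
    """
    if len(series) < 2 * min_seg:
        raise ValueError("series too short for the minimum segment length")
    best_k, best_cost = None, float("inf")
    for k in range(min_seg, len(series) - min_seg + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    before, after = series[:best_k], series[best_k:]
    return best_k, sum(before) / len(before), sum(after) / len(after)


def is_responder(series, min_shift=10.0, min_seg=5):
    # Hypothetical rule: a responder starts travelling at least
    # `min_shift` minutes earlier after the detected change point.
    _, mean_before, mean_after = best_change_point(series, min_seg)
    return mean_before - mean_after >= min_shift


# Hypothetical first tap-in times (minutes after midnight) over 20 weekdays.
shifted = [505, 508, 503, 506, 507, 504, 506, 505,
           489, 486, 488, 487, 485, 488, 490, 486, 487, 489, 488, 486]
unchanged = [505, 508, 503, 506, 507, 504, 506, 505,
             506, 503, 507, 505, 504, 508, 506, 505, 503, 507, 505, 506]
print(best_change_point(shifted))
print(is_responder(shifted), is_responder(unchanged))
```

Identified responders and non-responders could then serve as the binary outcome of a discrete choice model relating the response to traveler characteristics.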

Team: Anne Halvorsen, Haris Koutsopoulos, Jinhua Zhao

Publication: Anne Halvorsen, Haris Koutsopoulos, Jinhua Zhao (2016) Reducing Subway Crowding: Analysis of An Off-peak Discount Experiment in Hong Kong, Transportation Research Board Annual Conference, Washington, D.C.

Exploring Regularity and Structure in Travel Behavior Using Smart Card Data

As the economic opportunities fostered by large cities become more diverse, the travel patterns of public transport users become more heterogeneous. From personalized customer information to improved travel demand models, understanding these heterogeneous travel patterns is useful for a number of applications relevant to public transport agencies. This thesis explores how smart card data can be used to analyze and compare the structure of individual travel patterns observed over several weeks. Specifically, the journey sequence of each user is used to evaluate how multiple journeys and activities are ordered and combined into repeated patterns, both by the same individual over time and across individuals. The research is structured around three objectives. First, we introduce a representation of individual travel patterns and develop a measure of travel sequence regularity. The mobility of each individual is modeled as a stochastic process with memory, in which each new realization represents an activity or journey. Entropy rate, a measure of randomness in a stochastic process, is used to quantify repetition in the order of journeys and activities. This analysis reveals that the order of events is an important component of regularity not explicitly captured in previous literature. Second, we develop an approach to identify clusters of travel patterns with similar structure with respect to both public transport usage and activity patterns. Finally, we present an exploratory evaluation of the associations between the identified clusters and socio-demographic characteristics by linking smart card data to an annual travel diary survey. These three objectives are considered in the context of a practical application using the transactions of a sample of approximately 100,000 users in London, collected between February 10 and March 10, 2015.
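
One way to make the entropy-rate idea concrete is a Lempel-Ziv-type estimator, Ĥ = n log₂ n / Σᵢ Λᵢ, where n is the sequence length and Λᵢ is the length of the shortest substring starting at position i that does not appear in the first i symbols. Sequences that keep repeating earlier subsequences produce long Λᵢ and therefore a low estimate. The sketch below is an assumption for illustration: it encodes each activity or journey as a single character (e.g. H for home, W for work) and uses one common variant of the estimator, which is not necessarily the one used in the thesis.

```python
import math
import random


def shortest_new_substring(seq, i):
    """Length of the shortest substring starting at i that is absent from seq[:i]."""
    prefix = seq[:i]
    for k in range(1, len(seq) - i + 1):
        if seq[i:i + k] not in prefix:
            return k
    # Every substring up to the end was seen before; use the usual end convention.
    return len(seq) - i + 1


def lz_entropy_rate(seq):
    """Lempel-Ziv-type entropy rate estimate in bits per event."""
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two events")
    total = sum(shortest_new_substring(seq, i) for i in range(n))
    return n * math.log2(n) / total


# Hypothetical sequences of 40 events: a strict home/work alternation
# versus an irregular mix of four activity types.
regular = "HW" * 20
rng = random.Random(42)
irregular = "".join(rng.choice("HWSO") for _ in range(40))

print(f"regular:   {lz_entropy_rate(regular):.2f} bits/event")
print(f"irregular: {lz_entropy_rate(irregular):.2f} bits/event")
```

Because the estimate depends on event order and not only on event frequencies, two users with the same mix of activities can receive different scores, which is the aspect of regularity the thesis highlights.
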
Team: Gabriel Goulet-Langlois, Haris Koutsopoulos, Jinhua Zhao

Publication: Gabriel Goulet-Langlois, Haris Koutsopoulos, Jinhua Zhao (2016) Inferring Patterns in the Multi-week Activity Sequences of Public Transport Users

Unified Estimator for Excess Journey Time under Heterogeneous Passenger Incidence Behavior Using Smartcard Data

Excess journey time (EJT), the difference between actual passenger journey times and the journey times implied by the published timetable, strikes a useful balance between the passenger’s and the operator’s perspectives on public transport service quality. Using smartcard data, this paper characterizes transit service quality with EJT under heterogeneous incidence behavior (the timing of passenger arrivals at boarding stations). A rigorous framework is established for analyzing EJT, in particular for reasoning about passengers’ journey time standards as implied by varying incidence behavior. It was found that, although incorrect assumptions about passenger incidence behavior and journey time standards can bias EJT estimates for individual journeys, the unified estimator of EJT proposed in this paper is unbiased at the aggregate level regardless of passenger incidence behavior (random incidence, scheduled incidence, or a mixture of both). A case study of the London Overground network, which uses a tap-in-and-tap-out smartcard system, demonstrates the applicability of the proposed method. EJT was estimated from smartcard (Oyster) data at various levels of spatial and temporal aggregation to measure and evaluate service quality. Aggregate EJT was found to vary substantially across London Overground lines and across weekday service periods.
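
The basic EJT definition can be illustrated for individual journeys with a minimal sketch. The scheduled journey time here assumes the passenger would board the first scheduled departure at or after their tap-in time, and it ignores access and egress walking time; this simple convention and the function names are assumptions for illustration, not the paper's unified estimator, which addresses how such standards interact with heterogeneous incidence behavior.

```python
from bisect import bisect_left


def scheduled_journey_time(tap_in, timetable):
    """Minutes from tap-in to the scheduled arrival of the first train departing
    at or after tap-in. `timetable` is a list of (departure_at_origin,
    arrival_at_destination) pairs in minutes, sorted by departure."""
    departures = [dep for dep, _ in timetable]
    idx = bisect_left(departures, tap_in)
    if idx == len(timetable):
        raise ValueError("no scheduled departure at or after tap-in")
    return timetable[idx][1] - tap_in


def excess_journey_time(tap_in, tap_out, timetable):
    # EJT = observed journey time minus timetable-implied journey time.
    return (tap_out - tap_in) - scheduled_journey_time(tap_in, timetable)


# Hypothetical timetable for one origin-destination pair (minutes after 07:00).
timetable = [(0, 10), (15, 25), (30, 40)]
journeys = [(5, 28), (14, 26), (29, 45)]  # (tap-in, tap-out) pairs
ejts = [excess_journey_time(a, b, timetable) for a, b in journeys]
mean_ejt = sum(ejts) / len(ejts)
print(ejts, mean_ejt)
```

Averaging such per-journey values by line and time period gives aggregate EJT figures of the kind compared across the London Overground network.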

Team: Jinhua Zhao, Michael Frumin, Nigel Wilson and Zhan Zhao

Publication: Jinhua Zhao, Michael Frumin, Nigel Wilson, Zhan Zhao (2013) "Unified estimator for excess journey time under heterogeneous passenger incidence behavior using smartcard data", Transportation Research Part C, v34, doi: 10.1016/j.trc.2013.05.009 

Estimating a Rail Passenger Trip Origin-Destination Matrix Using Automatic Data Collection Systems

Automatic data collection (ADC) systems are becoming increasingly common in transit systems throughout the world. Although these systems are often designed to support specific, fairly narrow functions, the resulting data can have wide-ranging applications well beyond their design purpose. This paper illustrates the potential of ADC systems to provide transit agencies with rich new data sources at low marginal cost, as well as the critical gap between what ADC systems directly offer and what transit agencies need in practice. Closing this gap requires data processing and analysis methods supported by technologies such as Database Management Systems (DBMS) and Geographic Information Systems (GIS). This research presents a case study of the Automatic Fare Collection (AFC) system of the Chicago Transit Authority (CTA) rail system and develops a method for inferring rail passenger trip origin-destination (OD) matrices from an origin-only AFC system to replace expensive passenger OD surveys. A software tool was developed to facilitate implementation of the method, and the results of its application at CTA are reported.
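
A common way to infer destinations in an origin-only system is trip chaining. The sketch below assumes two simple rules: a trip's destination is the origin station of the same card's next trip that day, and the last trip's destination is the day's first origin. These rules, the exclusion of single-trip days and zero-length trips, and the data layout are assumptions for illustration; the method in the paper may apply additional conditions.

```python
from collections import Counter


def chain_destinations(origins):
    """Infer (origin, destination) pairs from one card's chronological daily taps."""
    if len(origins) < 2:
        return []  # a single tap gives no chaining information
    pairs = []
    for i, origin in enumerate(origins):
        # Next trip's origin; the last trip wraps around to the first origin.
        destination = origins[(i + 1) % len(origins)]
        if destination != origin:  # skip chains that imply a zero-length trip
            pairs.append((origin, destination))
    return pairs


def build_od_matrix(cards):
    """Aggregate inferred trips from many cards into OD counts."""
    od = Counter()
    for origins in cards.values():
        od.update(chain_destinations(origins))
    return od


# Hypothetical daily origin-station taps keyed by card ID.
cards = {
    "card1": ["Howard", "Clark/Lake"],
    "card2": ["Belmont", "Clark/Lake", "Belmont"],
    "card3": ["Howard"],
}
od = build_od_matrix(cards)
print(od)
```

Scaling the inferred OD counts up to total entries at each origin would be a natural next step toward a full matrix, under the further assumption that chained trips are representative.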

Team: Jinhua Zhao, Adam Rahbee and Nigel Wilson

Publication: Jinhua Zhao, Adam Rahbee, Nigel Wilson, "Estimating a Rail Passenger Trip Origin-Destination Matrix Using Automatic Data Collection Systems", Computer-Aided Civil and Infrastructure Engineering, 22(5), doi:10.1111/j.1467-8667.2007.00494.x

Automatic Data for Applied Railway Management: A Case Study on the London Overground

In 2009, London Overground management implemented a new tactical plan for AM and PM peak service on the North London Line (NLL). This paper documents that tactical planning intervention and evaluates its outcomes in terms of aspects of service delivery (the operator’s perspective on system performance) and service quality (the passenger’s perspective). Analyses of service delivery, service quality, and passenger demand contributed to the development, proposal, and implementation of the new tactical plan. NLL trains were found to be routinely delayed en route, with excessive dwell time a major cause. Near-random passenger incidence behavior suggested that an even-headway service might be more appropriate for the NLL. These findings are corroborated by the corresponding excess journey time (EJT) results. A longitudinal evaluation shows that on-time performance increased substantially and observed journey time (OJT) decreased with the introduction of the new plan. On balance, the effects of the intervention appear to have been positive. This case study thus demonstrates the applicability of automatic data generally, and of certain measures and techniques at London Overground specifically, to support the tactical planning of an urban railway.
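
Where delay accumulates en route can be examined by decomposing train event data. A minimal sketch, assuming per-stop scheduled and actual arrival and departure times are available (the tuple layout and times below are hypothetical): excess dwell is actual minus scheduled dwell at each stop, excess running time is actual minus scheduled time between consecutive stops, and the two sum to the delay gained between arrival at the first stop and departure from the last.

```python
def delay_components(stops):
    """Split the delay gained along a trip into excess dwell and excess running time.

    `stops` is a chronological list of (sched_arr, sched_dep, act_arr, act_dep)
    tuples in minutes for one train trip.
    """
    excess_dwell = sum((ad - aa) - (sd - sa) for sa, sd, aa, ad in stops)
    excess_run = sum(
        (nxt[2] - cur[3]) - (nxt[0] - cur[1])  # actual minus scheduled link time
        for cur, nxt in zip(stops, stops[1:])
    )
    return excess_dwell, excess_run


# Hypothetical event times for one train calling at three stations.
stops = [
    (0.0, 1.0, 0.0, 1.5),
    (5.0, 6.0, 6.0, 8.5),
    (10.0, 11.0, 12.5, 14.0),
]
dwell, run = delay_components(stops)
gained = (stops[-1][3] - stops[-1][1]) - (stops[0][2] - stops[0][0])
print(dwell, run, gained)
```

Summing these components over many trips and peak periods indicates whether dwell or running time is the dominant source of lateness.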

Team: Michael Frumin, Jinhua Zhao, Nigel Wilson, Zhan Zhao

Publication: Michael Frumin, Jinhua Zhao, Nigel Wilson, Zhan Zhao, "Automatic Data for Applied Railway Management", Journal of the Transportation Research Board, accepted Feb 2013

Analyzing Passenger Incidence Behavior in Heterogeneous Transit Services Using Smartcard Data and Schedule-Based Assignment

Passenger incidence (station arrival) behavior has been studied primarily to understand how changes to a transit service will affect passenger waiting times. Depending on the assumed incidence behavior, the impact of one intervention (e.g., increasing frequency) could be overestimated relative to another (e.g., improving reliability). It is therefore important to understand passenger incidence so that management decisions are based on realistic behavioral assumptions. Prior studies of passenger incidence drew their data samples from stations with a single service pattern, so that linking passengers to services was straightforward. This simplifies the analysis but severely limits the stations that can be studied: in any moderately complex network, many stations have more than one service pattern. This limitation prevents the approach from being applied systematically to the whole network and limits its use in practice.
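
Why the frequency-versus-reliability comparison hinges on incidence assumptions can be seen from the standard result for random incidence: a passenger arriving at a uniformly random time waits on average E[W] = E[H²] / (2 E[H]) = (E[H] / 2)(1 + CV²), where H is the headway and CV its coefficient of variation. Under random incidence, irregular headways raise waiting time even at a fixed frequency, whereas passengers who time their arrival to the timetable are affected more by reliability than by headway length. The sketch below simply evaluates the formula for two illustrative headway samples.

```python
def mean_random_incidence_wait(headways):
    """Expected wait E[H^2] / (2 E[H]) for passengers arriving at random times."""
    n = len(headways)
    mean_h = sum(headways) / n
    mean_h2 = sum(h * h for h in headways) / n
    return mean_h2 / (2 * mean_h)


even = [10, 10, 10, 10]   # regular 10-minute service
uneven = [5, 15, 5, 15]   # same frequency, bunched trains
print(mean_random_incidence_wait(even), mean_random_incidence_wait(uneven))
```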

This paper addresses incidence behavior at stations with heterogeneous services. It proposes a method to estimate incidence headway and waiting time by integrating disaggregate smartcard data with published timetables using schedule-based assignment. We apply this method to stations across the entire London Overground network to demonstrate its practicality, and observe that incidence behavior varies across the network and across times of day, reflecting differences in headways and reliability. Incidence is much less timetable-dependent on the North London Line than on the other lines because of its shorter headways and poorer reliability. Where incidence is timetable-dependent, passengers reduce their mean scheduled waiting time by over 3 minutes compared with random incidence.
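
The schedule-based assignment idea can be sketched as follows: for a passenger who taps in at time t bound for destination d, only departures that serve d are feasible; the passenger is assigned to the first feasible departure at or after t, the incidence headway is the gap since the previous feasible departure, and the scheduled wait is the time from tap-in to the assigned departure. The data layout, the function name, and this first-feasible-departure rule are simplifying assumptions for illustration, not the paper's exact procedure.

```python
def assign_to_service(tap_in, destination, departures):
    """Return (assigned_departure, incidence_headway, scheduled_wait).

    `departures` is a time-sorted list of (departure_time, stations_served)
    for one platform; times are in minutes.
    """
    feasible = [t for t, served in departures if destination in served]
    previous = [t for t in feasible if t < tap_in]
    following = [t for t in feasible if t >= tap_in]
    if not previous or not following:
        raise ValueError("tap-in not bracketed by feasible departures")
    dep = following[0]
    return dep, dep - previous[-1], dep - tap_in


# Hypothetical platform timetable: two service patterns with different stops.
departures = [(0, {"X", "Y"}), (10, {"Y"}), (20, {"X", "Y"}), (30, {"Y"})]
to_x = assign_to_service(12, "X", departures)  # only the 0 and 20 departures serve X
to_y = assign_to_service(12, "Y", departures)
print(to_x, to_y)
```

Comparing the mean scheduled wait across passengers with half the mean incidence headway (the random-incidence benchmark for even headways) indicates how timetable-dependent incidence is at a station.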

Team: Michael Frumin, Jinhua Zhao

Publication: Michael Frumin, Jinhua Zhao, "Transit Passenger Incidence Behavior", Journal of the Transportation Research Board, 2274, Mar 2012, doi: 10.3141/2274-05