Journal Article
Trip Detection using Sparse Call Detail Record Data based on Supervised Statistical Learning

Despite a large body of literature on trip detection using Call Detail Record (CDR) data, a fundamental understanding of the data's limitations is lacking; in particular, the sparse nature of the data is not well addressed in existing work. This paper develops an explicit mapping between telecommunication patterns captured by CDRs and physical travel patterns that are of interest to the transportation community. To reduce the over-reliance of existing CDR-based trip detection methods on heuristics and arbitrary assumptions, we use data fusion to form labeled data for supervised statistical learning. In the absence of complementary data, this can be done by extracting labeled observations from more granular cellular data access records and extracting feature vectors from voice-call and SMS records. The proposed approach is demonstrated, using real-world CDR data from a large Chinese city of 6 million people, by inferring whether a hidden visit exists between two consecutive visits observed in the CDR data. Logistic regression, support vector machine and artificial neural network models are used to develop statistical classifiers, and all show significant improvement over the naïve no-hidden-visit rule, an implicit assumption adopted by most existing studies. The proposed data fusion approach offers a systematic, statistical way to infer individual mobility patterns from telecommunication records and can be generalized to other types of large-scale data.
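The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the single feature (elapsed time between two consecutive observed visits) and the rule that generates the labels are assumptions made for illustration, and all function names are hypothetical. In the paper the labels would instead come from data access records and the features from voice-call and SMS records. The sketch trains a plain logistic regression and compares it with the no-hidden-visit rule, which always predicts that no visit is hidden.

```python
import math
import random


def make_toy_pairs(n, seed):
    """Synthetic visit pairs: ([elapsed_hours], hidden_visit_label). Not real CDR data."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        elapsed_h = rng.uniform(0.0, 12.0)
        # Toy assumption: longer gaps between observed visits hide a visit more often.
        p_hidden = 1.0 / (1.0 + math.exp(-(elapsed_h - 6.0)))
        pairs.append(([elapsed_h], 1 if rng.random() < p_hidden else 0))
    return pairs


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(data, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss; returns a predict function."""
    n, d = len(data), len(data[0][0])
    # Standardize each feature with training-set statistics.
    mean = [sum(x[j] for x, _ in data) / n for j in range(d)]
    std = [math.sqrt(sum((x[j] - mean[j]) ** 2 for x, _ in data) / n) or 1.0
           for j in range(d)]

    def scale(x):
        return [(x[j] - mean[j]) / std[j] for j in range(d)]

    scaled = [(scale(x), y) for x, y in data]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in scaled:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict(x):
        z = sum(wj * xj for wj, xj in zip(w, scale(x))) + b
        return 1 if sigmoid(z) >= 0.5 else 0

    return predict


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def no_hidden_visit(x):
    # Naive baseline: never predict a hidden visit.
    return 0


train = make_toy_pairs(400, seed=1)
test = make_toy_pairs(200, seed=2)
model = fit_logistic(train)
model_acc = accuracy(model, test)
baseline_acc = accuracy(no_hidden_visit, test)
print(f"logistic regression: {model_acc:.2f}, no-hidden-visit rule: {baseline_acc:.2f}")
```

On this toy data the baseline can only be as accurate as the share of pairs without a hidden visit, whereas the fitted model can exploit the elapsed time. The numbers it prints say nothing about the paper's actual results.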

Publication Type: Journal Article
Year of Publication: Submitted
Authors: Zhao Z, Zhao J, Koutsopoulos HN
Keywords: Call Detail Record, data fusion, elapsed time interval, hidden visit, statistical inference, supervised learning