Abstract

Accurate and timely traffic flow prediction is crucial to proactive traffic management and control in data-driven intelligent transportation systems (D²ITS), and it has attracted great research interest in the last few years. In this paper, we propose a Spatial-Temporal Weighted K-Nearest Neighbor model, named STW-KNN, built in a general MapReduce framework for distributed modeling on a Hadoop platform, to enhance the accuracy and efficiency of short-term traffic flow forecasting. More specifically, STW-KNN considers the spatial-temporal correlation and weight of traffic flow, together with trend adjustment features, to optimize the search mechanism, which comprises the state vector, proximity measure, prediction function, and selection of K. Furthermore, STW-KNN is implemented on the widely adopted Hadoop distributed computing platform with the MapReduce parallel processing paradigm, enabling parallel prediction of traffic flow in real time. Finally, through extensive experiments on real-world big taxi trajectory data, STW-KNN is compared with state-of-the-art prediction models, including conventional K-Nearest Neighbor (KNN), Artificial Neural Networks (ANNs), Naive Bayes (NB), Random Forest (RF), and C4.5. The results demonstrate that the proposed model is more accurate than existing models: it decreases the mean absolute percentage error (MAPE) by more than 11.59% when only the time domain is considered, and it achieves an accuracy improvement of 89.71%, with MAPEs between 3.34% and 6.00%, when both the space and time domains are considered. It also significantly improves the efficiency and scalability of short-term traffic flow forecasting over existing approaches.
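To make the general idea of a weighted nearest-neighbor forecast and the MAPE metric concrete, the following is a minimal sketch only. It is not the STW-KNN implementation: the inverse-distance weighting, the Euclidean proximity measure, the function names `weighted_knn_forecast` and `mape`, and the toy state vectors are all assumptions chosen for illustration, and the MapReduce parallelization and trend adjustment are omitted.

```python
import math


def weighted_knn_forecast(history, targets, query, k=3, eps=1e-9):
    """Forecast the next flow value from the k most similar historical states.

    history: list of state vectors (assumed here to hold recent flow values,
             possibly from a road segment and its spatial neighbors)
    targets: flow value that followed each historical state
    query:   current state vector
    Weights are inverse Euclidean distances (an illustrative assumption).
    """
    dists = [math.dist(state, query) for state in history]
    # Indices of the k historical states closest to the query
    nearest = sorted(range(len(history)), key=lambda i: dists[i])[:k]
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    weighted_sum = sum(w * targets[i] for w, i in zip(weights, nearest))
    return weighted_sum / sum(weights)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical toy data: two-element state vectors and the flow that followed
    history = [(100, 110), (200, 210), (300, 310)]
    targets = [120, 220, 320]
    print(weighted_knn_forecast(history, targets, (150, 160), k=2))
    print(mape([100, 200], [110, 180]))
```

In this sketch, closer historical states contribute more to the forecast, which is the basic intuition behind weighting neighbors rather than averaging them equally.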