Travel Destination Prediction of Public Transport Commuters by Integrating XGBoost Algorithm and Graph Adjustment Method
-
Abstract: Accurately identifying the destinations of public transport commuters clarifies passengers' travel needs and helps improve public transport service. One month of public transport trip data and revealed preference (RP) survey data are collected in Beijing. Complete travel chains of 563 public transport commuters are obtained by linking passengers' smart card numbers with smart card transaction records and route and stop data. Association rules are then used to identify 302 public transport commuters with high, medium, and low travel stability. The XGBoost ensemble learning algorithm is introduced to build a next-destination prediction model for individual public transport commuters, constructed separately for each stability class: the factors that significantly influence the travel destinations of that class are the input variables, and the next trip destination is the output variable. After parameter tuning, the destination prediction accuracy for passengers with high, medium, and low stability is 90%, 66.67%, and 50%, respectively. The transfer probabilities of each passenger's individual travel graph are then used to revise the predicted results, raising the accuracy to 91.2%, 83.21%, and 69.5%, respectively, so the graph correction mainly improves prediction for passengers with medium and low stability.
The aggregated next-trip destination predictions are compared with destination data recorded by the Transit Metropolis system: the absolute percentage error between the predicted and observed passenger flow change gradients is less than 10%. Thus, on the basis of classifying commuters by travel stability, the destination prediction method that combines XGBoost with travel graph correction achieves high accuracy.
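The abstract does not specify how the graph transfer probabilities revise the XGBoost output. The following is a minimal sketch under stated assumptions only: the individual travel graph is reduced to a first-order transition table between destinations, estimated from the passenger's own ordered trip history, and the corrected score is a linear blend of the classifier's class probabilities and the transition probabilities out of the current destination, controlled by a hypothetical weight `alpha`. The function names and the toy labels ("Home", "Work", "Gym") are illustrative, not the authors' implementation.

```python
from collections import Counter, defaultdict


def transition_probabilities(destinations):
    """Estimate P(next | current) from one passenger's ordered destination sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(destinations, destinations[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


def correct_prediction(model_probs, transitions, current, alpha=0.5):
    """Blend classifier probabilities with graph transition probabilities.

    model_probs: dict destination -> probability from a classifier such as XGBoost.
    transitions: output of transition_probabilities().
    alpha: hypothetical weight on the graph term (0 = model only, 1 = graph only).
    Returns (best destination, dict of blended scores).
    """
    graph_probs = transitions.get(current, {})
    candidates = set(model_probs) | set(graph_probs)
    scores = {
        d: (1 - alpha) * model_probs.get(d, 0.0) + alpha * graph_probs.get(d, 0.0)
        for d in candidates
    }
    # Sorting first makes tie-breaking deterministic (alphabetical).
    best = max(sorted(scores), key=scores.get)
    return best, scores
```

With a history in which "Home" is followed by "Work" three times and by "Gym" once, a classifier that slightly favours "Gym" (0.55 vs 0.45) is overruled at `alpha=0.5`, because the graph term (0.75 vs 0.25) pulls the blended score toward "Work". In practice `alpha` would have to be chosen on held-out trips.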
-
Key words:
- urban transport /
- public transport commuter /
- destination prediction /
- XGBoost algorithm /
- travel graph
-
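Table 1. Trip-chain data of individual passengers
| Card no. | Mode | Boarding time | Alighting time | Boarding line | Alighting line | Trip distance/m | Boarding stop | Alighting stop | Boarding stop longitude/(°) | Boarding stop latitude/(°) | Alighting stop longitude/(°) | Alighting stop latitude/(°) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24050273 | Subway | 2017-04-01T08.8 | 2017-04-01T08.5 | 4 | 1 | 8115 | Beijing South Railway Station | Muxidi | 116.378 | 39.864 | 116.337 | 39.908 |
| 24050273 | Subway | 2017-04-01T17.3 | 2017-04-01T16.9 | 1 | 4 | 8115 | Muxidi | Beijing South Railway Station | 116.337 | 39.908 | 116.378 | 39.864 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 24050273 | Bus | 2017-04-30T17.4 | 2017-04-30T17:39 | 114 | 114 | 8620 | Baiyunqiao West | Kaiyangqiao South | 116.340 | 39.897 | 116.347 | 39.867 |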
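Records like those in Table 1 supply the output variable "next trip destination". A minimal sketch, assuming each record carries a card number, a boarding timestamp in a uniform ISO 8601 format, and an alighting stop (the `Trip` fields are hypothetical): sort each card's trips by boarding time and pair every trip's alighting stop with the alighting stop of that card's following trip.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass
class Trip:
    card_id: str
    board_time: str   # uniform ISO 8601 string, so string order equals time order
    alight_stop: str


def next_destination_pairs(trips):
    """Return (card_id, current_destination, next_destination) tuples."""
    pairs = []
    # groupby needs the records sorted by the grouping key first.
    ordered = sorted(trips, key=attrgetter("card_id", "board_time"))
    for card_id, group in groupby(ordered, key=attrgetter("card_id")):
        chain = list(group)
        for current, following in zip(chain, chain[1:]):
            pairs.append((card_id, current.alight_stop, following.alight_stop))
    return pairs
```

The last trip of each card has no successor and therefore yields no training pair. The timestamps used in the check below are illustrative, not values from Table 1.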
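Table 2. Model variables for predicting the next travel destination
| Passenger type | Input variables | Output variable |
|---|---|---|
| High stability | Trip purpose; flexible travel; peak-hour travel; traffic control or emergency events; weekday; current trip destination | Next trip destination |
| Medium stability | Travel time; trip distance; trip purpose; flexible travel; peak-hour travel; current trip destination | Next trip destination |
| Low stability | Travel time; trip distance; traffic control or emergency events; weekday; current trip destination | Next trip destination |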
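Table 2 gives each stability class its own input set while all classes share the same label. One way to encode that, sketched here with hypothetical field names, is a lookup table from stability class to feature list that slices each trip record before it reaches the classifier.

```python
# Feature subsets transcribed from Table 2; the field names are hypothetical.
FEATURES_BY_STABILITY = {
    "high": ["trip_purpose", "flexible_travel", "peak_hour", "control_or_incident",
             "weekday", "current_destination"],
    "medium": ["travel_time", "trip_distance", "trip_purpose", "flexible_travel",
               "peak_hour", "current_destination"],
    "low": ["travel_time", "trip_distance", "control_or_incident", "weekday",
            "current_destination"],
}
LABEL = "next_destination"


def to_sample(record, stability):
    """Split one trip record (a dict) into (feature list, label) for a stability class.

    Raises KeyError if the record lacks a feature required by that class.
    """
    features = [record[name] for name in FEATURES_BY_STABILITY[stability]]
    return features, record.get(LABEL)
```

Categorical fields such as trip purpose or destination would still need numeric encoding (for example one-hot or integer codes) before being passed to XGBoost.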
Table 3. Parameter optimization results of the next-destination prediction model for passenger A
| Parameter | Optimal value | Mean prediction accuracy | Standard deviation of accuracy |
|---|---|---|---|
| learning_rate | 0.45 | 0.8671 | 0.1053 |
| n_estimators | 30 | 0.8675 | 0.1173 |
| max_depth | 3 | 0.8678 | 0.1173 |
| min_child_weight | 1 | 0.8680 | 0.0771 |
| gamma | 0 | 0.8682 | 0.0669 |
| subsample | 1 | 0.8689 | 0.0573 |
| colsample_bytree | 1 | 0.8719 | 0.0565 |
| alpha | 4 | 0.8721 | 0.0244 |
| lambda | 4 | 0.8746 | 0.0203 |
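The mean accuracy in Table 3 rises row by row, which is consistent with (though the table alone does not establish) tuning one parameter at a time while keeping the best values found so far fixed. A minimal sketch of such a coordinate-wise search follows; `evaluate` is a hypothetical caller-supplied function that would, for example, run k-fold cross-validation of an XGBoost classifier and return the mean and standard deviation of accuracy.

```python
def stepwise_tune(evaluate, grid, base_params):
    """Coordinate-wise search: tune each parameter in order, keeping earlier winners.

    evaluate: callable(params) -> (mean_accuracy, std_accuracy)
    grid: list of (parameter_name, candidate_values), tuned in this order
    base_params: starting parameter dict
    Returns the final params and a log of (name, best_value, mean, std).
    """
    params = dict(base_params)
    log = []
    for name, candidates in grid:
        best_value, best_mean, best_std = None, float("-inf"), None
        for value in candidates:
            mean, std = evaluate({**params, name: value})
            # Prefer higher mean accuracy; break ties with a lower spread.
            if mean > best_mean or (mean == best_mean and std < best_std):
                best_value, best_mean, best_std = value, mean, std
        params[name] = best_value
        log.append((name, best_value, best_mean, best_std))
    return params, log
```

A coordinate-wise search costs far fewer evaluations than a full grid but can miss interactions between parameters. In the xgboost scikit-learn wrapper, the Table 3 names correspond to the `XGBClassifier` arguments `learning_rate`, `n_estimators`, `max_depth`, `min_child_weight`, `gamma`, `subsample`, `colsample_bytree`, `reg_alpha` and `reg_lambda` (`alpha` and `lambda` in the native parameter dictionary).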
Table 4. Prediction results for passenger A
| Prediction | Accuracy/% | Macro F1 | Micro F1 |
|---|---|---|---|
| Before correction | 90 | 0.69 | 0.90 |
| After correction | 91.2 | 0.83 | 0.93 |
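Table 4 reports accuracy alongside macro- and micro-averaged F1. For a single-label multi-class task such as next-destination prediction, these can be computed from the true and predicted destinations as below. This is a pure-Python sketch meant to mirror scikit-learn's `f1_score` with `average="macro"` / `average="micro"`, where the label set is the union of true and predicted labels and a class with no true positives scores F1 = 0.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return (accuracy, macro_f1, micro_f1) for single-label multi-class predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    labels = set(y_true) | set(y_pred)
    # Per-class F1 = 2TP / (2TP + FP + FN), equal to the harmonic mean of P and R.
    per_class_f1 = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    macro_f1 = sum(per_class_f1) / len(labels)
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_f1 = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    accuracy = total_tp / len(y_true)
    return accuracy, macro_f1, micro_f1
```

For example, true labels `["A", "A", "B", "C"]` against predictions `["A", "B", "B", "A"]` give accuracy 0.5, micro F1 0.5 and macro F1 7/18 (about 0.389), because class C is never predicted correctly.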
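Table 5. Next-destination prediction results for public transport commuters with different travel stability
| Passenger group | Prediction | Accuracy/% | Macro F1 | Micro F1 |
|---|---|---|---|---|
| High stability | Before correction | 90.61 | 0.78 | 0.91 |
| High stability | After correction | 92.12 | 0.80 | 0.92 |
| Medium stability | Before correction | 72.53 | 0.50 | 0.73 |
| Medium stability | After correction | 80.92 | 0.52 | 0.81 |
| Low stability | Before correction | 53.13 | 0.46 | 0.53 |
| Low stability | After correction | 67.27 | 0.48 | 0.67 |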
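The Micro F1 column in Table 5 tracks accuracy closely, which is expected for single-label classification: every trip has exactly one true and one predicted destination, so each misclassification adds one false positive (to the predicted class) and one false negative (to the true class). With N predictions and TP, FP, FN pooled over all destination classes:

```latex
\begin{aligned}
FP &= FN = N - TP, \\
P_{\text{micro}} &= \frac{TP}{TP + FP} = \frac{TP}{N}, \qquad
R_{\text{micro}} = \frac{TP}{TP + FN} = \frac{TP}{N}, \\
F1_{\text{micro}} &= \frac{2\,P_{\text{micro}}\,R_{\text{micro}}}{P_{\text{micro}} + R_{\text{micro}}}
= \frac{TP}{N} = \text{Accuracy}.
\end{aligned}
```

Macro F1, by contrast, gives every destination equal weight, so poorly predicted infrequent destinations lower it even when overall accuracy is high.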
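Table 6. Predicted hot destinations for the next trip of high-stability passengers
| No. | Next trip destination | Share of destinations/% |
|---|---|---|
| 1 | Olympic Sports Center | 8.33 |
| 2 | Malianwa | 8.33 |
| 3 | Sanlihe | 8.33 |
| 4 | Wangjing | 8.33 |
| 5 | Chengshousi | 6.25 |
| 6 | Donghuamen | 6.25 |
| 7 | Weigongcun | 6.25 |
| 8 | Xinjiekou | 6.25 |
| 9 | Zhongguancun | 6.25 |
| 10 | Kechuang Street | 4.17 |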
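Table 7. Predicted hot destinations for the next trip of medium-stability passengers
| No. | Next trip destination | Share of destinations/% |
|---|---|---|
| 1 | Zhongguancun | 9.20 |
| 2 | Chegongzhuang | 6.90 |
| 3 | Olympic Sports Center | 5.75 |
| 4 | Donghuamen | 5.75 |
| 5 | Dongzhimen | 5.75 |
| 6 | Qianmen | 5.75 |
| 7 | Sanlihe | 4.60 |
| 8 | Xi'erqi | 4.60 |
| 9 | World Park | 3.45 |
| 10 | Sihui | 3.45 |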
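Table 8. Predicted hot destinations for the next trip of low-stability passengers
| No. | Next trip destination | Share of destinations/% |
|---|---|---|
| 1 | Sanlihe | 8.75 |
| 2 | Olympic Sports Center | 6.25 |
| 3 | Weigongcun | 6.25 |
| 4 | Donghuamen | 5.00 |
| 5 | Chegongzhuang | 3.75 |
| 6 | Dongzhimen | 3.75 |
| 7 | Financial Street | 3.75 |
| 8 | Qianmen | 3.75 |
| 9 | Xiju | 3.75 |
| 10 | Zhongguancun | 3.75 |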
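Tables 6 to 8 rank destinations by their share of the predicted next trips within each stability group. A minimal sketch of that aggregation, assuming the input is simply the list of predicted next-trip destinations for one group:

```python
from collections import Counter


def hot_destinations(predicted, top_n=10):
    """Return [(destination, share_percent)] for the top_n most frequent destinations.

    Shares are percentages of all predictions, rounded to two decimals as in
    Tables 6-8. Counter.most_common keeps first-seen order for equal counts.
    """
    counts = Counter(predicted)
    total = len(predicted)
    return [(dest, round(100 * n / total, 2)) for dest, n in counts.most_common(top_n)]
```

Because shares are taken over all predictions rather than over the top-N only, the percentages in each table need not sum to 100.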
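Table 9. Comparison of predicted and observed hot destinations of commuters
| No. | Predicted hot destination | Observed hot destination |
|---|---|---|
| 1 | Sanlihe | Donghuamen |
| 2 | Olympic Sports Center | Zhongguancun |
| 3 | Zhongguancun | Sanlihe |
| 4 | Donghuamen | Dongzhimen |
| 5 | Weigongcun | Chegongzhuang |
| 6 | Chegongzhuang | Weigongcun |
| 7 | Dongzhimen | Qianmen |
| 8 | Qianmen | Olympic Sports Center |
| 9 | Malianwa | Malianwa |
| 10 | Wangjing | Dongdan |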
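Table 9 can be summarized by how many destinations the predicted and observed top-10 lists share, regardless of rank. The sketch below computes this from the table's own entries (romanized as in the table): 9 of the 10 predicted hot destinations also appear among the observed ones, with Wangjing predicted but not observed and Dongdan observed but not predicted.

```python
# Top-10 lists transcribed from Table 9.
predicted = ["Sanlihe", "Olympic Sports Center", "Zhongguancun", "Donghuamen", "Weigongcun",
             "Chegongzhuang", "Dongzhimen", "Qianmen", "Malianwa", "Wangjing"]
observed = ["Donghuamen", "Zhongguancun", "Sanlihe", "Dongzhimen", "Chegongzhuang",
            "Weigongcun", "Qianmen", "Olympic Sports Center", "Malianwa", "Dongdan"]


def top_list_overlap(pred, obs):
    """Return (shared, missed, extra) destinations between two ranked lists."""
    pred_set, obs_set = set(pred), set(obs)
    shared = [d for d in pred if d in obs_set]      # in both lists, predicted rank order
    missed = [d for d in obs if d not in pred_set]  # observed but not predicted
    extra = [d for d in pred if d not in obs_set]   # predicted but not observed
    return shared, missed, extra


shared, missed, extra = top_list_overlap(predicted, observed)
print(f"{len(shared)}/{len(observed)} shared; missed={missed}; extra={extra}")
```

Set overlap ignores ranking; a rank correlation over the shared destinations would be a natural complement if rank agreement matters.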
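Table 10. Comparison of passenger flow change gradients
| No. | Predicted change gradient | Observed change gradient | Absolute percentage error/% |
|---|---|---|---|
| 1 | 1.07 | 1.04 | 2.89 |
| 2 | 1.00 | 1.05 | 4.76 |
| 3 | 1.17 | 1.24 | 5.65 |
| 4 | 1.09 | 1.02 | 6.87 |
| 5 | 1.22 | 1.35 | 9.63 |
| 6 | 1.25 | 1.35 | 7.41 |
| 7 | 1.00 | 1.01 | 1.00 |
| 8 | 1.11 | 1.02 | 8.82 |
| 9 | 1.00 | 1.01 | 0.99 |
-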
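Table 10 compares predicted and observed passenger flow change gradients through the absolute percentage error, assumed here to follow the standard definition APE = |g_pred - g_obs| / g_obs × 100%. Recomputing it from the two-decimal values printed in the table reproduces most reported errors (for example 4.76% for row 2 and 9.63% for row 5); a few rows differ in the last digit because the printed inputs are rounded.

```python
# (predicted gradient, observed gradient) pairs as printed in Table 10.
GRADIENTS = [(1.07, 1.04), (1.00, 1.05), (1.17, 1.24), (1.09, 1.02), (1.22, 1.35),
             (1.25, 1.35), (1.00, 1.01), (1.11, 1.02), (1.00, 1.01)]


def ape(predicted, observed):
    """Absolute percentage error of a prediction relative to the observed value, in %."""
    if observed == 0:
        raise ValueError("APE is undefined for an observed value of 0")
    return abs(predicted - observed) / observed * 100


errors = [round(ape(p, o), 2) for p, o in GRADIENTS]
print(errors)
print("max APE below 10%:", max(errors) < 10)
```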