An Analysis of Factors Affecting Injury of Electric Two-wheeler Riders Based on CIDAS Data and Ensemble Learning
-
Abstract: The growing number of electric two-wheelers has led to an increasing number of serious accident injuries. To study the factors affecting the injury severity of electric two-wheeler riders in collisions between electric two-wheelers and motor vehicles, three ensemble learning models (random forest, XGBoost, and LightGBM) are compared on 1 246 electric two-wheeler and motor vehicle accident cases from the China In-Depth Accident Study (CIDAS) dataset. Based on accuracy and other indicators, the LightGBM model performs best and is chosen to predict the injury severity of electric two-wheeler riders. Analysis with the SHAP interpretability method reveals clearly nonlinear relationships between the independent variables and the dependent variable. The effect of the rider's throw distance on the risk of death shows a clear threshold: when the throw distance is less than 5 m, fatal outcomes are unlikely; when it exceeds 5 m, throw distance is positively correlated with the risk of death. Accidents that occur outside urban areas or on highways, and collisions with heavy vehicles, significantly increase the rider's risk of death. Vehicle factors such as the absence of pedals, a seat height greater than 70 cm, a handlebar width of 61~65 cm, and a backward-bent or horn-shaped handlebar design can reduce the risk of death. Rider-related factors associated with a lower risk of death include being female, being aged 30~50, and being more familiar with the accident location.
-
Key words:
- traffic safety
- CIDAS data
- ensemble learning
- LightGBM model
- electric two-wheeler
- injury severity
- influencing factors
-
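Table 1. Classification of independent variables

| Category | Variable | Category code (share/%) |
| --- | --- | --- |
| Accident information | Accident season | 0: spring (28); 1: summer (33); 2: autumn (25); 3: winter (14) |
| Accident information | Accident location | 0: inside urban area (58); 1: outside urban area (42) |
| Road information | Road type | 0: highway (14); 1: urban road (70); 2: other (16) |
| Road information | Street-light status | 0: no street lights (13); 1: on (18); 2: off (69) |
| Electric two-wheeler information | Vehicle condition after collision | 0: can still be ridden normally (53); 1: can roll, brake, or steer (33); 2: cannot roll (14) |
| Electric two-wheeler information | Two-wheeler type | 0: electric two-wheeler with pedals (22); 1: electric two-wheeler without pedals (78) |
| Electric two-wheeler information | Main impact area of the two-wheeler | 0: front (20); 1: left side (41); 2: right side (31); 3: rear (7); 4: other (1) |
| Electric two-wheeler information | Front-wheel brake type | 0: rim brake (11); 1: drum brake (70); 2: single-disc brake (18); 3: other (1) |
| Electric two-wheeler information | Handlebar design | 0: straight (68); 1: bent backward (17); 2: horn-shaped (12); 3: other (3) |
| Electric two-wheeler information | Handlebar width/cm | 0: ≤60 (20); 1: >60~65 (55); 2: >65~70 (22); 3: >70~75 (2); 4: >75 (1) |
| Electric two-wheeler information | Seat height/cm | 0: ≤70 (27); 1: >70~75 (41); 2: >75~80 (27); 4: >80 (5) |
| Electric two-wheeler information | Total mass of the two-wheeler at collision/kg | 0: ≤100 (8); 1: >100~125 (27); 2: >125~150 (43); 3: >150~175 (12); 4: >175~200 (6); 5: >200 (4) |
| Electric two-wheeler rider information | Sex¹ | 0: male (61); 1: female (39) |
| Electric two-wheeler rider information | Age¹/years | 0: ≤18 (1); 1: >18~30 (17); 2: >30~40 (16); 3: >40~50 (23); 4: >50~60 (20); 5: >60 (23) |
| Electric two-wheeler rider information | Braking action before collision | 0: no braking (67); 1: braking (4); 2: decelerating (25); 3: other (4) |
| Electric two-wheeler rider information | Familiarity with the accident location¹ | 0: almost daily (40); 1: several times a week (43); 2: rarely (4); 3: other (13) |
| Electric two-wheeler rider information | Throw distance/m | 0: ≤1.0 (10); 1: >1.0~3.0 (31); 2: >3.0~5.0 (20); 3: >5.0~10.0 (21); 4: >10.0~20.0 (12); 5: >20.0 (6) |
| Motor vehicle information | Vehicle type | 0: passenger car (85); 1: heavy (load-carrying) vehicle (15) |
| Motor vehicle information | Brake response | 0: braked (33); 1: did not brake (65); 2: other (2) |
| Motor vehicle information | Vehicle trajectory before collision | 0: stationary (1); 1: straight ahead (78); 2: left turn (15); 3: right turn (5); 4: [ ]-shaped (1) |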
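The interval coding used in Table 1 can be illustrated with a minimal sketch for the throw-distance variable. The function name `encode_throw_distance` and the constant `THROW_DISTANCE_EDGES` are illustrative assumptions; only the bin boundaries are taken from the table.

```python
from bisect import bisect_left

# Upper bounds (in metres) of the throw-distance bins listed in Table 1.
# Each bin is closed on the right, e.g. code 1 covers (1.0, 3.0].
THROW_DISTANCE_EDGES = [1.0, 3.0, 5.0, 10.0, 20.0]


def encode_throw_distance(distance_m: float) -> int:
    """Map a throw distance in metres to its Table 1 category code (0 to 5)."""
    if distance_m < 0:
        raise ValueError("throw distance must be non-negative")
    # bisect_left returns the index of the first edge >= distance_m,
    # which reproduces the right-closed intervals of the table.
    return bisect_left(THROW_DISTANCE_EDGES, distance_m)


if __name__ == "__main__":
    for d in (0.8, 3.0, 5.0, 5.1, 25.0):
        print(d, encode_throw_distance(d))
```

Under this coding, the 5 m threshold reported in the abstract falls on the boundary between codes 2 and 3.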
Table 2. Confusion matrix for binary classification

| Confusion matrix | Predicted = 1 | Predicted = 0 |
| --- | --- | --- |
| Actual = 1 | TP | FN |
| Actual = 0 | FP | TN |
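As a minimal sketch of how the four cells of Table 2 are obtained, the following function counts them from paired label lists. The function name `confusion_counts` and the example labels are hypothetical and are not taken from the CIDAS data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary labels, with 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # actual positive, predicted positive
        elif truth == 1 and pred == 0:
            fn += 1  # actual positive, predicted negative
        elif truth == 0 and pred == 1:
            fp += 1  # actual negative, predicted positive
        elif truth == 0 and pred == 0:
            tn += 1  # actual negative, predicted negative
        else:
            raise ValueError("labels must be 0 or 1")
    return tp, fn, fp, tn


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    print(confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```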
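Table 3. Evaluation metrics for binary classification and their meanings

| Metric | Meaning | Formula | Criterion |
| --- | --- | --- | --- |
| Accuracy | Proportion of all samples that are predicted correctly | (TP + TN)/(TP + TN + FP + FN) | Higher is better |
| Precision | Proportion of samples predicted positive that are actually positive | TP/(TP + FP) | Higher is better |
| Recall | Proportion of actual positives that are predicted positive | TP/(TP + FN) | Higher is better |
| F1-Score | Harmonic mean of precision and recall | $\frac{2 \times Precision \times Recall}{Precision + Recall}$ | The closer to 1, the better |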
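The formulas in Table 3 can be written out as a short sketch. The function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; the source does not specify how undefined cases are handled.

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute the Table 3 metrics from the confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in binary_metrics(tp=2, fn=1, fp=1, tn=2).items():
        print(f"{name}: {value:.3f}")
```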
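Table 4. Optimized parameters of the LightGBM model

| Parameter group | Name | Description | Optimized value |
| --- | --- | --- | --- |
| Core parameters | learning_rate | Learning rate (step size) of the model iterations | 0.1 |
| Core parameters | num_leaves | Maximum number of leaves per tree | 30 |
| Learning control parameters | max_depth | Maximum depth of a tree | 15 |
| Learning control parameters | min_data_in_leaf | Minimum number of samples in one leaf | 30 |
| Learning control parameters | bagging_fraction | Fraction of the data used in each iteration | 0.4 |
| Learning control parameters | bagging_freq | Bagging frequency: bagging is performed every k iterations | 20 |
| Learning control parameters | feature_fraction | Fraction of the features used in each iteration | 0.6 |
| Learning control parameters | lambda_l1 | L1 regularization coefficient | $1 \times 10^{-5}$ |
| Learning control parameters | lambda_l2 | L2 regularization coefficient | 0.001 |
| Learning control parameters | min_split_gain | Minimum gain required to split | 0.0 |
| IO parameters | max_bin | Maximum number of bins | 10 |
| IO parameters | categorical_feature | Declares the categorical variables | - |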
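For reference, the values in Table 4 can be collected into a parameter dictionary of the kind LightGBM accepts. This is a sketch only: the helper name `lightgbm_params` is hypothetical, and the `objective` and `num_class` entries are assumptions inferred from the three severity classes reported in Table 5, not settings stated in the source.

```python
from typing import Any, Dict


def lightgbm_params() -> Dict[str, Any]:
    """Return the Table 4 settings as a LightGBM-style parameter dictionary."""
    return {
        # Assumed, not from Table 4: a three-class severity target.
        "objective": "multiclass",
        "num_class": 3,
        # Core parameters
        "learning_rate": 0.1,
        "num_leaves": 30,
        # Learning control parameters
        "max_depth": 15,
        "min_data_in_leaf": 30,
        "bagging_fraction": 0.4,
        "bagging_freq": 20,
        "feature_fraction": 0.6,
        "lambda_l1": 1e-5,
        "lambda_l2": 0.001,
        "min_split_gain": 0.0,
        # IO parameters
        "max_bin": 10,
    }


if __name__ == "__main__":
    for key, value in lightgbm_params().items():
        print(f"{key} = {value}")
```

A dictionary like this could be passed as the `params` argument of `lightgbm.train` together with a `lightgbm.Dataset`; the training data themselves are not reproduced here.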
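Table 5. Prediction performance of the LightGBM classification model

| Injury severity | Precision/% | Recall/% | F1-Score/% | Support (samples) |
| --- | --- | --- | --- | --- |
| Property damage only | 25 | 5 | 8 | 22 |
| Injury | 90 | 95 | 92 | 317 |
| Death | 58 | 54 | 56 | 35 |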
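The figures in Table 5 can be cross-checked with a small sketch. It recomputes each F1-Score from the rounded precision and recall, and estimates the overall accuracy implied by the per-class recall and support. The names `TABLE5`, `f1_from_percent`, and `implied_accuracy` are illustrative assumptions, and because the inputs are rounded percentages the accuracy is only approximate.

```python
# Rows of Table 5: (precision %, recall %, reported F1 %, support).
TABLE5 = {
    "property damage only": (25, 5, 8, 22),
    "injury": (90, 95, 92, 317),
    "death": (58, 54, 56, 35),
}


def f1_from_percent(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def implied_accuracy(rows: dict) -> float:
    """Approximate overall accuracy: correctly recalled samples over all samples."""
    total = sum(support for _, _, _, support in rows.values())
    correct = sum(recall / 100 * support for _, recall, _, support in rows.values())
    return correct / total


if __name__ == "__main__":
    for label, (p, r, reported, _) in TABLE5.items():
        print(f"{label}: recomputed F1 = {f1_from_percent(p, r):.1f}, reported = {reported}")
    print(f"implied accuracy = {implied_accuracy(TABLE5):.3f}")
```

The recomputed F1 values round to the reported ones. The supports sum to 374, which would be consistent with holding out roughly 30% of the 1 246 cases for testing, although the split is not stated here.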