Modeling Car Following Behavior of Autonomous Driving Vehicles Based on Deep Reinforcement Learning

CHEN Yue, JIAO Pengpeng, BAI Ruyu, LI Rujian

Citation: CHEN Yue, JIAO Pengpeng, BAI Ruyu, LI Rujian. Modeling Car Following Behavior of Autonomous Driving Vehicles Based on Deep Reinforcement Learning[J]. Journal of Transport Information and Safety, 2023, 41(2): 67-75. doi: 10.3963/j.jssn.1674-4861.2023.02.007


doi: 10.3963/j.jssn.1674-4861.2023.02.007
Funding:

National Natural Science Foundation of China (52172301)

National Social Science Fund of China (21ZAD029)

Beijing Social Science Fund (21GLA010)

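    About the author:

    CHEN Yue (born 1996), master's student. Research interests: intelligent transportation, autonomous driving. E-mail: chenyue_bucea@163.com

    Corresponding author:

    JIAO Pengpeng (born 1980), Ph.D., professor. Research interests: intelligent transportation, traffic management, transportation planning and management, traffic safety. E-mail: jiaopengpeng@bucea.edu.cn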

  • CLC number: U491.2+5


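  • Abstract: To improve the car-following performance of autonomous vehicles and mitigate the negative effects of traffic oscillations, a car-following model for autonomous driving based on deep reinforcement learning is developed. Building on existing reward-function designs, energy consumption is incorporated through a term constructed from the VT-Micro model. The common practice of building the driving-efficiency term from the time gap is also refined: a virtual speed is added to avoid numerical overflow and overly short spacing in traffic-oscillation scenarios. Previous oscillation-damping studies trained only on closed ring roads and simulated vehicle trajectories; to overcome this limitation, the training environment is built from driver behavior during the oscillation phases of NGSIM trajectory data, and the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is used to train a multi-objective car-following model. A performance evaluation framework is then constructed to compare the TD3 model with conventional models in two test scenarios: car following and traffic oscillation. In the car-following scenario, the TD3 model and a conventional adaptive cruise control (ACC) model perform similarly in comfort and driving efficiency, and both outperform human drivers. In safety, the TD3 model reduces safety risk by 53.65% relative to the conventional ACC model and by 36.24% relative to human drivers. In energy consumption, it reduces consumption by 6.73% and 15.65% relative to the conventional ACC model and human drivers, respectively. In the traffic-oscillation scenario, the TD3 model effectively reduces the negative effects of oscillations: at 100% TD3 penetration, compared with a purely human-driven environment, driving discomfort (sum of absolute jerk) decreases by 55.95%, driving efficiency (mean speed) increases by 8.82%, safety risk (mean iTTC) decreases by 73.21%, and fuel consumption decreases by 5.97%.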
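The abstract says a virtual speed is added to the time-gap efficiency term but does not give the exact form. A minimal sketch, assuming one plausible reading: the time gap is spacing divided by follower speed plus a constant virtual speed, so it stays finite when the follower stops, and the efficiency reward peaks at a target time gap. `V_VIRTUAL`, `TARGET_GAP_S`, `SIGMA`, and the log-Gaussian reward shape are hypothetical choices for illustration, not the paper's values.

```python
import math

# Hypothetical constants for illustration only; not the paper's calibrated values.
V_VIRTUAL = 1.0     # virtual speed added to the follower's speed, m/s
TARGET_GAP_S = 1.5  # time gap at which the efficiency reward peaks, s
SIGMA = 0.5         # width of the reward bump in log-time-gap space


def time_gap(spacing_m: float, speed_mps: float, v_virtual: float = V_VIRTUAL) -> float:
    """Time gap with a virtual speed in the denominator, so it stays finite at standstill."""
    if spacing_m <= 0.0:
        raise ValueError("spacing must be positive")
    if speed_mps < 0.0:
        raise ValueError("speed must be non-negative")
    return spacing_m / (speed_mps + v_virtual)


def efficiency_reward(spacing_m: float, speed_mps: float) -> float:
    """Hypothetical efficiency reward in (0, 1]; equals 1 when the time gap is TARGET_GAP_S."""
    h = time_gap(spacing_m, speed_mps)
    z = (math.log(h) - math.log(TARGET_GAP_S)) / SIGMA
    return math.exp(-0.5 * z * z)


# A stopped follower 5 m behind its leader: plain spacing / speed would divide by zero.
print(time_gap(5.0, 0.0))             # 5.0
print(efficiency_reward(22.5, 14.0))  # 1.0 (time gap 22.5 / 15 = 1.5 s)
```

Because the virtual speed keeps the denominator positive, a small spacing at low speed still yields a small time gap and a low reward, which is one way the term could discourage overly close following in stop-and-go traffic.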

     

  • Figure 1. TD3 model training process

    Figure 2. Variation in rolling mean episode reward

    Figure 3. Probability density of TTC

    Figure 4. Probability density of iTTC

    Figure 5. Probability density of energy consumption

    Figure 6. Variation in vehicle acceleration

    Figure 7. Variation in vehicle speed

    Figure 8. Variation in car-following spacing

    Figure 9. Probability density of time gap

    Figure 10. Probability density of jerk

    Figure 11. Comparison of vehicle speed

    Figure 12. Comparison of vehicle acceleration

    Figure 13. Comparison of vehicle trajectories at different TD3 penetration rates
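Figures 3 and 4 report TTC and iTTC. Assuming the standard definitions (the page does not restate them), TTC is the bumper-to-bumper spacing divided by the closing speed, and iTTC is its reciprocal, which stays finite when the vehicles are not closing in. Clipping iTTC at zero for an opening gap is one common convention and is assumed in this sketch.

```python
import math


def ttc(spacing_m: float, v_follow: float, v_lead: float) -> float:
    """Time to collision in s; infinite when the follower is not closing in."""
    if spacing_m <= 0.0:
        raise ValueError("spacing must be positive")
    closing_speed = v_follow - v_lead
    if closing_speed <= 0.0:
        return math.inf
    return spacing_m / closing_speed


def ittc(spacing_m: float, v_follow: float, v_lead: float) -> float:
    """Inverse TTC in 1/s; 0 when not closing, so larger values mean higher risk."""
    if spacing_m <= 0.0:
        raise ValueError("spacing must be positive")
    closing_speed = v_follow - v_lead
    return max(closing_speed, 0.0) / spacing_m


print(ttc(20.0, 15.0, 10.0), ittc(20.0, 15.0, 10.0))  # 4.0 0.25
print(ttc(20.0, 10.0, 15.0), ittc(20.0, 10.0, 15.0))  # inf 0.0
```

iTTC is convenient for averaging over a trajectory because it has no singularity at zero relative speed, whereas raw TTC jumps to infinity whenever the gap is opening.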

    Table 1. Hyperparameters of the model

    Parameter                                Value
    Actor network learning rate              0.0001
    Critic network learning rate             0.0002
    Batch size                               512
    Replay buffer size                       50000
    Discount factor                          0.95
    Soft update rate                         0.01
    Actor network delayed update frequency   2
    α0                                       5
    α1                                       -120
    α2                                       0.05
    α3                                       0.4
    α4                                       0.1
    α5                                       -1.2
    α6                                       1
    α7                                       -0.3
    t0                                       0.5
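Table 1 fixes the discount factor, soft-update rate, and actor update delay. The plain-Python sketch below shows where those three values enter a TD3 update: the clipped double-Q target, Polyak soft updates of the target networks, and the delayed actor update. It is not the authors' implementation: networks are replaced by scalar values and parameter lists, and the target-policy smoothing settings (`NOISE_STD`, `NOISE_CLIP`) and acceleration bounds (`ACC_MIN`, `ACC_MAX`) are assumed placeholders because Table 1 does not list them.

```python
import random

# Values taken from Table 1.
GAMMA = 0.95      # discount factor
TAU = 0.01        # soft-update rate
POLICY_DELAY = 2  # actor (and target nets) updated every 2nd critic update

# Assumed placeholders: not listed in Table 1.
NOISE_STD = 0.2
NOISE_CLIP = 0.5
ACC_MIN, ACC_MAX = -3.0, 3.0  # acceleration bounds, m/s^2


def smoothed_target_action(mu_next: float, rng: random.Random) -> float:
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action bounds."""
    noise = max(-NOISE_CLIP, min(NOISE_CLIP, rng.gauss(0.0, NOISE_STD)))
    return max(ACC_MIN, min(ACC_MAX, mu_next + noise))


def td3_target(reward: float, done: bool, q1_next: float, q2_next: float) -> float:
    """Clipped double-Q target y = r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next are the two target critics evaluated at the next
    state and the smoothed target action.
    """
    return reward + GAMMA * (0.0 if done else 1.0) * min(q1_next, q2_next)


def soft_update(target: list[float], online: list[float], tau: float = TAU) -> list[float]:
    """Polyak averaging of parameters: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target, online)]


def actor_update_due(critic_step: int) -> bool:
    """Delayed policy update: True on every POLICY_DELAY-th critic step."""
    return critic_step % POLICY_DELAY == 0


rng = random.Random(0)
a = smoothed_target_action(2.9, rng)
print(ACC_MIN <= a <= ACC_MAX)                        # True
print(round(td3_target(1.0, False, 10.0, 8.0), 6))    # 8.6
print(soft_update([0.0, 1.0], [1.0, 1.0]))            # [0.01, 1.0]
```

Taking the minimum of the two critics counters the overestimation bias of single-critic DDPG, and the small soft-update rate (0.01) makes the target networks trail the online networks slowly, which stabilizes the bootstrapped targets.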

    Table 2. Comparison of safety and fuel consumption

    Penetration rate/%   Mean iTTC/s   Relative change/%   Mean fuel consumption/mL   Relative change/%
    0                    32.22         0                   247.49                     0
    20                   26.21         -18.65              246.18                     -0.52
    40                   22.10         -31.41              243.68                     -1.54
    60                   16.37         -49.19              238.77                     -3.52
    80                   10.12         -68.59              233.18                     -5.78
    100                  8.63          -73.21              232.71                     -5.97
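The relative-change columns compare each penetration rate with the 0% (all-human) row. The sketch below recomputes them from the rounded Table 2 entries; results agree with the reported values to within about 0.01 percentage points (for example, fuel at 20% gives -0.53 against the reported -0.52), presumably because the reported rates were computed from unrounded values.

```python
def relative_change(value: float, baseline: float) -> float:
    """Percentage change of value relative to baseline (the 0% penetration row)."""
    return (value - baseline) / baseline * 100.0


# Mean iTTC and mean fuel consumption from Table 2, keyed by TD3 penetration rate (%).
mean_ittc = {0: 32.22, 20: 26.21, 40: 22.10, 60: 16.37, 80: 10.12, 100: 8.63}
mean_fuel_ml = {0: 247.49, 20: 246.18, 40: 243.68, 60: 238.77, 80: 233.18, 100: 232.71}

for rate in mean_ittc:
    print(rate,
          round(relative_change(mean_ittc[rate], mean_ittc[0]), 2),
          round(relative_change(mean_fuel_ml[rate], mean_fuel_ml[0]), 2))
```

The same function applies to Table 3, with mean speed and summed absolute jerk in place of iTTC and fuel.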

    Table 3. Comparison of traffic efficiency and comfort

    Penetration rate/%   Mean speed, 100-200 s/(m/s)   Relative change/%   Mean sum of absolute jerk/(m/s³)   Relative change/%
    0                    7.59                          0                   51.81                              0
    20                   7.71                          1.58                45.57                              -12.04
    40                   7.82                          3.03                39.68                              -23.41
    60                   8.06                          6.19                33.35                              -35.63
    80                   8.21                          8.16                24.76                              -52.21
    100                  8.26                          8.82                22.82                              -55.95
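Table 3 measures discomfort by summed absolute jerk, the time derivative of acceleration. A minimal sketch, assuming accelerations sampled at a fixed step `dt` (NGSIM trajectories are recorded at 0.1 s) and reading the column as the per-vehicle sum of |jerk| over time steps, averaged across vehicles; that reading of the column is an assumption.

```python
def jerk_series(accels: list[float], dt: float) -> list[float]:
    """Finite-difference jerk (m/s^3) from accelerations sampled every dt seconds."""
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    return [(a1 - a0) / dt for a0, a1 in zip(accels, accels[1:])]


def sum_abs_jerk(accels: list[float], dt: float) -> float:
    """Sum of |jerk| over one vehicle's trajectory."""
    return sum(abs(j) for j in jerk_series(accels, dt))


def mean_sum_abs_jerk(fleet: list[list[float]], dt: float) -> float:
    """Average of the per-vehicle sums across all vehicles (assumed reading of Table 3)."""
    return sum(sum_abs_jerk(a, dt) for a in fleet) / len(fleet)


print(jerk_series([0.0, 0.5, 0.5, -0.5], dt=0.5))                      # [1.0, 0.0, -2.0]
print(mean_sum_abs_jerk([[0.0, 0.5, 0.5, -0.5], [0.0, 0.0]], dt=0.5))  # 1.5
```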
13 figures / 3 tables
Publication history
  • Received: 2022-09-14
  • Published online: 2023-06-19
