---
type: "Conference Paper" # Conference Paper, Journal Paper, Ph.D. Thesis, Master's Thesis
layout: publication # Do not change this
group: publications # Do not change this
title: "LLM-PPO Driver: Improving Autonomous Driving via LLM-Guided Reward Shaping and Imitation Learning" # Title of the paper
# krtitle: # only for domestic papers
authors:
  - name: "Ahmad Mouri Zadeh Khaki"
  - name: "Kyunghwan Choi"
    corresponding: true # true if this author is the corresponding author
domestic_or_international: "International" # "International" or "Domestic"
# preprint: # Preprint information - REMOVE THIS FIELD IF NOT APPLICABLE!
#   - name: Techrxiv
#     doi: "10.36227/techrxiv.173014412.26480551/v1"
#     year: 2024
#     pdf: "/static/pub/2025-all-wheel.pdf"
#     state: "published" # published, accepted, submitted
pub: # Publication information - REMOVE THIS FIELD IF NOT APPLICABLE!
  - name: "Asian Control Conference (ASCC)"
    pdf: "/static/pub/2026-LLM-PPO.pdf"
    doi: # Leave it blank if not applicable
    vol: # Leave it blank if not applicable
    num: # Leave it blank if not applicable
    pp: # "380-385" # Leave it blank if not applicable
    year: "2026" # Leave it blank if not applicable
    state: "submitted" # published, accepted, submitted
    bib: # "/static/pub/2025-imposing.bib" # Leave it blank if not applicable
pub_date: "2026-02-16" # Date of publication. Change Techrxiv (or other preprint) date to Journal date once published.
image: "/static/pub/2026-LLM-PPO.png" # Representative image of the paper
abstract: "
  Proximal Policy Optimization (PPO) has shown promise for autonomous driving; however, it suffers from sparse rewards, slow convergence, and unsafe behaviors caused by exploration without prior knowledge. These limitations are particularly critical in safety-sensitive driving scenarios, where failure events are rare but severe. To address these limitations, we propose LLM-PPO Driver, a framework that enhances PPO-based motion planning by incorporating high-level semantic driving knowledge from a Large Language Model (LLM). The LLM does not participate in real-time decision-making; instead, it provides structured prior knowledge that is integrated through reward shaping and imitation learning. This lightweight, modular design eliminates deployment-time inference overhead while guiding policy learning toward safer and more efficient behaviors. Experiments in the Gym highway-v0 environment demonstrate consistent improvements in task success and safety over a baseline PPO agent, with imitation learning yielding the largest performance gain. These results highlight the effectiveness of leveraging LLM-based prior knowledge to mitigate unsafe exploration and improve learning efficiency in autonomous driving.
"
# additional: # additional information such as awards, etc.
#   - "🏆 Awarded **Best Paper Award** at the _2025 European Control Conference (ECC)_."
# links: # additional links
#   - name:
#     url:
---