Commit 208ac44

Working on introduction to post about PPO
1 parent 539c4b9

1 file changed: src/ppo/main.clj (10 additions, 2 deletions)

@@ -3,7 +3,7 @@
  :external-requirements []
  :quarto {:author [:janwedekind]
           :draft true
-          :description "A Clojure port of Jinghao's PPO implementation using Pytorch and Quil"
+          :description "A Clojure port of XinJingHao's PPO implementation using Pytorch and Quil"
           :image "pendulum.png"
           :type :post
           :date "2026-04-18"
@@ -16,5 +16,13 @@
 (require-python '[torch :as torch])
 
 ;; Recently I started to look into the problem of reentry trajectory planning in the context of developing the [sfsim](https://store.steampowered.com/app/3687560/sfsim/) space flight simulator.
-
+;; I had looked into reinforcement learning before and tried out Q-learning using the [lunar lander reference environment of OpenAI's Gym library](https://gymnasium.farama.org/environments/box2d/lunar_lander/).
+;; However, I had stability issues.
+;; The algorithm would learn a strategy and then suddenly diverge again.
+;;
+;; More recently (2017), the Proximal Policy Optimization (PPO) algorithm was published, and it has since gained popularity.
+;; PPO is inspired by Trust Region Policy Optimization (TRPO) but is much easier to implement.
+;; The [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) Python library has implementations of PPO, TRPO, and other reinforcement learning algorithms.
+;; However, I came across [XinJingHao's PPO implementation](https://github.com/XinJingHao/PPO-Continuous-Pytorch/), which I found easier to follow.
+;;
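+;; To give a flavour of why PPO is easy to implement, here is a minimal sketch of its clipped surrogate loss.
+;; This is an illustration rather than code from that implementation; `ratio` is assumed to be the probability ratio of the new to the old policy and `advantage` an advantage estimate, both as torch tensors.
+(defn ppo-clip-loss
+  "Clipped surrogate loss of PPO; epsilon is the clipping range (e.g. 0.2)."
+  [ratio advantage epsilon]
+  (let [unclipped (torch/mul ratio advantage)
+        clipped   (torch/mul (torch/clamp ratio (- 1.0 epsilon) (+ 1.0 epsilon)) advantage)]
+    ;; negate the mean, because optimisers minimise the loss
+    (torch/neg (torch/mean (torch/minimum unclipped clipped)))))
+;;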
 ;; ![pendulum](pendulum.png)
