src/ppo/main.clj
:external-requirements []
:quarto {:author [:janwedekind]
         :draft true
         :description "A Clojure port of XinJingHao's PPO implementation using Pytorch and Quil"
         :image "pendulum.png"
         :type :post
         :date "2026-04-18"
(require-python '[torch :as torch])

;; Recently I started to look into the problem of reentry trajectory planning in the context of developing the [sfsim](https://store.steampowered.com/app/3687560/sfsim/) space flight simulator.
;; I had looked into reinforcement learning before and tried out Q-learning using the [lunar lander reference environment of OpenAI's gym library](https://gymnasium.farama.org/environments/box2d/lunar_lander/).
;; However I had stability issues: the algorithm would learn a strategy and then suddenly diverge again.
;;
;; More recently (2017) the Proximal Policy Optimization (PPO) algorithm was published, and it has gained in popularity.
;; PPO is inspired by Trust Region Policy Optimization (TRPO) but is much easier to implement.
;; The [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) Python library has an implementation of PPO, TRPO, and other reinforcement learning algorithms.
;; However I found [XinJingHao's PPO implementation](https://github.com/XinJingHao/PPO-Continuous-Pytorch/) easier to follow.
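;; The reason PPO is simpler than TRPO is its clipped surrogate objective: instead of an explicit trust-region constraint, it clips the probability ratio between the new and old policy. A minimal sketch in plain NumPy (rather than the Pytorch used here; `eps` is the clip parameter, with 0.2 a commonly used default):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective of PPO.

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage: advantage estimate for the taken action
    eps:       clip range; the ratio is restricted to [1-eps, 1+eps]
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum makes the objective pessimistic: large policy
    # updates cannot increase it, which removes the incentive to move
    # the policy far from the old one in a single step.
    return np.minimum(unclipped, clipped)
```

;; For example, with a positive advantage the objective stops growing once the ratio exceeds `1 + eps`, while with a negative advantage the unclipped (worse) value is kept, so the penalty for bad moves is never clipped away.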