Commit 9caa8a4

[mdp] add markov property explanation
1 parent 59b1e3c commit 9caa8a4

1 file changed: mdp.qmd (8 additions & 1 deletion)
@@ -29,7 +29,14 @@ The notation $S_t = s'$ uses a capital letter $S$ to stand for
 
 - $\gamma \in [0, 1]$ is a *discount factor*, which we'll discuss in @sec-mdp_infinite_horizon.
 
-In this class, we assume the rewards are deterministic functions. Further, in this MDP chapter, we assume the state space and action space are discrete and finite.
+MDPs also satisfy the Markov property, which means the next-state distribution depends only on the current state and action, not on the past. Formally, the Markov property is expressed as:
+
+$$
+\Pr(S_{t+1} = s_{t+1} \mid S_t = s_t, A_t = a_t, S_{t-1} = s_{t-1}, A_{t-1} = a_{t-1}, \ldots, S_0 = s_0, A_0 = a_0) = \Pr(S_{t+1} = s_{t+1} \mid S_t = s_t, A_t = a_t)
+$$
+
+In other words, the future depends only on the present, not on the past.
+
+In this class, we also assume the rewards are deterministic functions. Further, in this MDP chapter, we assume the state space and action space are discrete and finite.
 
 :::{.callout-note}
 # Example
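
A minimal sketch of what the Markov property looks like in code, under the chapter's stated assumptions (discrete, finite state and action spaces; deterministic rewards). The names here (`TabularMDP`, `P`, `R`, `step`) are illustrative, not from mdp.qmd; the point is that the sampled next state is a function of the current state and action alone, with no history consulted.

```python
import numpy as np

class TabularMDP:
    """Illustrative tabular MDP (hypothetical names, not from mdp.qmd)."""

    def __init__(self, P, R, gamma):
        # P[s, a] is a probability vector over next states: Pr(S' = s' | S = s, A = a).
        self.P = np.asarray(P)   # shape (num_states, num_actions, num_states)
        # R[s, a] is a deterministic reward, matching the chapter's assumption.
        self.R = np.asarray(R)   # shape (num_states, num_actions)
        self.gamma = gamma       # discount factor in [0, 1]

    def step(self, state, action, rng):
        # The next state is sampled from P[state, action] only -- no past
        # states or actions enter here, which is the Markov property above.
        next_state = rng.choice(self.P.shape[2], p=self.P[state, action])
        return next_state, self.R[state, action]

# Tiny 2-state, 2-action example.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
mdp = TabularMDP(P, R, gamma=0.95)

rng = np.random.default_rng(0)
s = 0
for t in range(3):
    s, r = mdp.step(s, action=1, rng=rng)
    print(f"t={t}: next state {s}, reward {r}")
```

Because `step` never sees anything but the current `(state, action)` pair, any trajectory generated this way satisfies the conditional-independence equation in the diff by construction.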
