Commit 9caa8a4

[mdp] add markov property explanation
1 parent 59b1e3c commit 9caa8a4

1 file changed: mdp.qmd (8 additions & 1 deletion)
@@ -29,7 +29,14 @@ The notation $S_t = s'$ uses a capital letter $S$ to stand for
 
 - $\gamma \in [0, 1]$ is a *discount factor*, which we'll discuss in @sec-mdp_infinite_horizon.
 
-In this class, we assume the rewards are deterministic functions. Further, in this MDP chapter, we assume the state space and action space are discrete and finite.
+MDPs also satisfy the Markov property, which means the next-state distribution depends only on the current state and action, not on the past. Formally, the Markov property is expressed as:
+
+$$
+\Pr(S_{t+1} = s_{t+1} \mid S_t = s_t, A_t = a_t, S_{t-1} = s_{t-1}, A_{t-1} = a_{t-1}, \ldots, S_0 = s_0, A_0 = a_0) = \Pr(S_{t+1} = s_{t+1} \mid S_t = s_t, A_t = a_t)
+$$
+
+In other words, the future depends only on the present, not on the past.
+
+In this class, we also assume the rewards are deterministic functions. Further, in this MDP chapter, we assume the state space and action space are discrete and finite.
 
 :::{.callout-note}
 # Example
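
A minimal sketch of what the Markov property looks like in code, under the chapter's stated assumptions (discrete, finite state and action spaces; deterministic rewards). The names here (`TabularMDP`, `P`, `R`, `step`) are illustrative, not from mdp.qmd; the point is that the sampled next state is a function of the current state and action alone, with no history consulted.

```python
import numpy as np

class TabularMDP:
    """Illustrative tabular MDP (hypothetical names, not from mdp.qmd)."""

    def __init__(self, P, R, gamma):
        # P[s, a] is a probability vector over next states: Pr(S' = s' | S = s, A = a).
        self.P = np.asarray(P)   # shape (num_states, num_actions, num_states)
        # R[s, a] is a deterministic reward, matching the chapter's assumption.
        self.R = np.asarray(R)   # shape (num_states, num_actions)
        self.gamma = gamma       # discount factor in [0, 1]

    def step(self, state, action, rng):
        # The next state is sampled from P[state, action] only -- no past
        # states or actions enter here, which is the Markov property above.
        next_state = rng.choice(self.P.shape[2], p=self.P[state, action])
        return next_state, self.R[state, action]

# Tiny 2-state, 2-action example.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
mdp = TabularMDP(P, R, gamma=0.95)

rng = np.random.default_rng(0)
s = 0
for t in range(3):
    s, r = mdp.step(s, action=1, rng=rng)
    print(f"t={t}: next state {s}, reward {r}")
```

Because `step` never sees anything but the current `(state, action)` pair, any trajectory generated this way satisfies the conditional-independence equation in the diff by construction.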
