Bellman equation in RL

2025-02-02


Bellman equation

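The Bellman expectation equation is a fundamental and important concept in reinforcement learning, but some of its details are hard to grasp, especially the part involving $\mathbb{E}_{\pi}$ (the expectation with respect to the policy $\pi$). After working through the derivations in the two articles Understanding RL: The Bellman Equations and Derivation of Bellman’s Equation, I have collected the derivation of the Bellman equation here.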

Definition

Given a finite set of states ( $S$ ) and actions ( $A$ ), the state transition probability is $Pr(S_{t+1}=s' \big| S_t =s, A_t=a)$ and the reward distribution is $Pr(R_{t+1}=r \big| S_t =s, A_t=a, S_{t+1}=s')$. Notice that the reward is actually a distribution rather than a deterministic value. This is the root of some misunderstanding, because some articles only write its expectation.

Here we can combine the state transition and the reward distribution into a single joint probability $p(s',r|s,a) = Pr(S_{t+1}=s', R_{t+1}=r \big| S_t =s, A_t=a)$
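As a minimal sketch of how the two distributions combine, the joint probability can be built as the product $Pr(s'|s,a) \cdot Pr(r|s,a,s')$. The two-state MDP below and the names `transition`, `reward_dist` and `joint_dynamics` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy MDP: states 0 and 1, actions "stay" and "move".
# transition[(s, a)] maps a next state s' to Pr(s' | s, a).
transition = {
    (0, "stay"): {0: 0.9, 1: 0.1},
    (0, "move"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
    (1, "move"): {0: 0.7, 1: 0.3},
}


def reward_dist(s, a, s_next):
    """Pr(r | s, a, s') as a dict {r: probability}; an assumed reward model."""
    if s_next == 1:
        # Arriving in state 1 pays 1.0 half of the time, so the reward is random.
        return {1.0: 0.5, 0.0: 0.5}
    return {0.0: 1.0}


def joint_dynamics(s, a):
    """Return p(s', r | s, a) as a dict keyed by (s', r)."""
    joint = {}
    for s_next, p_s in transition[(s, a)].items():
        for r, p_r in reward_dist(s, a, s_next).items():
            joint[(s_next, r)] = joint.get((s_next, r), 0.0) + p_s * p_r
    return joint


p = joint_dynamics(0, "move")
total = sum(p.values())  # a valid joint distribution sums to 1
# The expected reward is what many articles write instead of the full distribution.
expected_reward = sum(r * prob for (_, r), prob in p.items())
```

Summing the joint over $r$ recovers the transition probability, and averaging $r$ under it recovers the expected reward, so no information is lost by working with $p(s',r|s,a)$.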

A policy maps each state in $S$ to a probability distribution over the actions in $A$, written $\pi(a|s) = Pr(A_t=a \big| S_t=s)$ .

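The state value function is defined as $V_\pi(s) = \mathbb{E}_\pi\{\sum_{k=0}^{\infty}{\gamma^{k}R_{t+k+1}}\big| S_t =s\}$ and the action value function as $q_\pi(s,a) = \mathbb{E}_\pi\{\sum_{k=0}^{\infty}{\gamma^{k}R_{t+k+1}}\big| S_t =s, A_t=a\}$, where $\gamma$ is the discount factor. The sum starts at $k=0$, so its first term is the immediate reward $R_{t+1}$.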
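The quantity inside both expectations is the discounted return $R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots$ of one sampled trajectory. As a rough sketch, it can be computed for a finite reward sequence as follows. The function name `discounted_return` is hypothetical, and truncating the infinite sum to a finite list is an assumption made for illustration.

```python
def discounted_return(rewards, gamma):
    """Compute sum_k gamma**k * rewards[k], where rewards[k] stands for R_{t+k+1}."""
    g = 0.0
    # Work backwards: each step applies g <- r + gamma * g, which mirrors
    # the recursion "return now = immediate reward + gamma * return from the next step".
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Rewards R_{t+1}, R_{t+2}, R_{t+3} of one hypothetical sampled trajectory.
g = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
```

The value functions are the averages of this quantity over all trajectories generated by the policy $\pi$ and the environment dynamics.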

According to the well-known law of total expectation and the state transition diagram, the action value can be written in terms of the state value of the next state:

$q_\pi(s,a) = \sum_{s',r}{[r+\gamma V_\pi(s')]p(s',r|s,a)}$
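The intermediate steps can be sketched as follows. Here $G_t$ is shorthand (introduced only for this sketch) for the discounted return $R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots$, which satisfies $G_t = R_{t+1} + \gamma G_{t+1}$.

$q_\pi(s,a) = \mathbb{E}_\pi\{G_t \big| S_t=s, A_t=a\} = \mathbb{E}_\pi\{R_{t+1} + \gamma G_{t+1} \big| S_t=s, A_t=a\}$

Conditioning on the next state and reward (law of total expectation):

$q_\pi(s,a) = \sum_{s',r}{p(s',r|s,a)\, \mathbb{E}_\pi\{R_{t+1} + \gamma G_{t+1} \big| S_t=s, A_t=a, S_{t+1}=s', R_{t+1}=r\}}$

Inside the conditional expectation $R_{t+1}$ is fixed to $r$, and by the Markov property the future return $G_{t+1}$ depends only on $S_{t+1}=s'$:

$q_\pi(s,a) = \sum_{s',r}{p(s',r|s,a)\, [r + \gamma\, \mathbb{E}_\pi\{G_{t+1} \big| S_{t+1}=s'\}]} = \sum_{s',r}{[r+\gamma V_\pi(s')]p(s',r|s,a)}$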

$V_\pi(s) = \sum_a{\pi(a|s) q_\pi(s,a)} = \sum_a{\pi(a|s) \sum_{s',r}{[r+\gamma V_\pi(s')]p(s',r|s,a)}}$
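To make the two equations concrete, here is a minimal sketch that applies them repeatedly as an update rule (iterative policy evaluation) until $V_\pi$ stops changing. The toy MDP, the uniform random policy, and the names `dynamics`, `policy`, `q_from_v` and `evaluate_policy` are all assumptions made for illustration.

```python
# Hypothetical toy MDP. dynamics[(s, a)] lists (s_next, r, prob) triples,
# i.e. the joint distribution p(s', r | s, a).
dynamics = {
    (0, "stay"): [(0, 0.0, 1.0)],
    (0, "move"): [(1, 1.0, 0.8), (0, 0.0, 0.2)],
    (1, "stay"): [(1, 0.5, 1.0)],
    (1, "move"): [(0, 0.0, 1.0)],
}
# policy[s][a] = pi(a | s); a uniform random policy, assumed for illustration.
policy = {
    0: {"stay": 0.5, "move": 0.5},
    1: {"stay": 0.5, "move": 0.5},
}
gamma = 0.9


def q_from_v(v, s, a):
    """q(s, a) = sum over (s', r) of p(s', r | s, a) * [r + gamma * v(s')]."""
    return sum(prob * (r + gamma * v[s_next]) for s_next, r, prob in dynamics[(s, a)])


def backup(v, s):
    """V(s) = sum over a of pi(a | s) * q(s, a)."""
    return sum(p_a * q_from_v(v, s, a) for a, p_a in policy[s].items())


def evaluate_policy(tol=1e-12, max_iters=10000):
    """Apply the Bellman expectation equation as an update until v stops changing."""
    v = {s: 0.0 for s in policy}
    for _ in range(max_iters):
        new_v = {s: backup(v, s) for s in policy}
        delta = max(abs(new_v[s] - v[s]) for s in policy)
        v = new_v
        if delta < tol:
            break
    return v


v = evaluate_policy()
# At the fixed point, V(s) equals the policy-weighted average of q(s, a).
residual = max(abs(v[s] - backup(v, s)) for s in policy)
```

Because $\gamma < 1$ the update is a contraction, so the loop converges to the unique $V_\pi$ that satisfies both equations, and `residual` ends up close to zero.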
