UA MATH566 Statistical Theory: Properties of the Fisher Information (Part 1)


The Cramér–Rao lower bound is stated in terms of the Fisher information. When we derived the C-R bound, we simply defined the Fisher information as the inverse of the bound, without asking what this quantity really is, why it is called an information measure, what properties it has, and how to compute it. This post addresses the first two questions; the next post discusses its properties, and computation is left to later posts with worked examples.

Definition of the Fisher Information

Let $f(x,\theta),\ \theta \in \Theta$ be a parametric family of densities, and suppose $\Theta \subset \mathbb{R}$. The score function is then one-dimensional in the parameter:

$$S(x,\theta) = \frac{\partial \log L(\theta)}{\partial \theta} = \frac{1}{f(x,\theta)} \frac{\partial f(x,\theta)}{\partial \theta}$$

Its first moment is zero (under the usual regularity conditions), and its second moment is called the Fisher information:

$$E[S(X,\theta)] = 0, \qquad I(\theta) = E[S(X,\theta)^2]$$
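As a concrete illustration (my own sketch, not part of the original post): for the normal family $N(\mu,\sigma^2)$ with $\sigma$ known, the score is $S(x,\mu) = (x-\mu)/\sigma^2$, so $E[S]=0$ and $I(\mu)=E[S^2]=1/\sigma^2$. A quick Monte Carlo check using only the standard library:

```python
import random

def score_normal(x, mu, sigma):
    """Score of N(mu, sigma^2) with respect to mu: d/dmu log f(x; mu) = (x - mu)/sigma^2."""
    return (x - mu) / sigma**2

def fisher_info_mc(mu, sigma, n=200_000, seed=0):
    """Monte Carlo estimates of E[S] and E[S^2] under X ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total_s, total_s2 = 0.0, 0.0
    for _ in range(n):
        s = score_normal(rng.gauss(mu, sigma), mu, sigma)
        total_s += s
        total_s2 += s * s
    return total_s / n, total_s2 / n

mean_score, info = fisher_info_mc(mu=1.0, sigma=2.0)
print(mean_score, info)  # E[S] near 0, E[S^2] near 1/sigma^2 = 0.25
```

With $\mu = 1$, $\sigma = 2$ the two estimates should land near $0$ and $1/\sigma^2 = 0.25$.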

The Mathematical Meaning of the Fisher Information

We now explain why $I(\theta)$ can be interpreted as a measure of information. In UA MATH636 Information Theory 1: Entropy we introduced entropy, divergence, and mutual information as measures of information; here we discuss how the Fisher information relates to these information-theoretic quantities.

Recall from UA MATH636 Information Theory 6: Differential Entropy that the differential entropy is defined as

$$h(\theta) = -E[\log f(X,\theta)] = -\int f(x,\theta) \log f(x,\theta)\,dx$$
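As a sanity check (my own sketch, not from the post): for $N(0,\sigma^2)$ the differential entropy has the closed form $h = \frac{1}{2}\log(2\pi e \sigma^2)$, and a simple midpoint-rule quadrature of $-\int f \log f\,dx$ reproduces it:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def differential_entropy(pdf, lo, hi, n=20_000):
    """h = -integral of f log f over [lo, hi], midpoint rule."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0:
            total -= fx * math.log(fx) * dx
    return total

sigma = 2.0
h_numeric = differential_entropy(lambda x: normal_pdf(x, 0.0, sigma), -20.0, 20.0)
h_closed = 0.5 * math.log(2 * math.pi * math.e * sigma**2)
print(h_numeric, h_closed)  # the two values should agree closely
```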

Now perturb the parameter and compare log-densities under the original distribution:

$$\Delta h = -\int f(x,\theta)\left[\log f(x,\theta+\Delta\theta) - \log f(x,\theta)\right]dx$$

Note that this is not literally $h(\theta+\Delta\theta) - h(\theta)$: the averaging is done under $f(\cdot,\theta)$ in both terms, so $\Delta h$ is a cross-entropy minus $h(\theta)$, i.e. the Kullback–Leibler divergence $D\big(f(\cdot,\theta)\,\|\,f(\cdot,\theta+\Delta\theta)\big)$.

Next, Taylor-expand the integrand in $\Delta\theta$:

$$\log f(x,\theta+\Delta\theta) - \log f(x,\theta) = \frac{\partial \log f(x,\theta)}{\partial \theta}\,\Delta\theta + \frac{1}{2!}\frac{\partial^2 \log f(x,\theta)}{\partial \theta^2}(\Delta\theta)^2 + o\big((\Delta\theta)^2\big)$$

The two derivatives are (primes denote $\partial/\partial\theta$)

$$\frac{\partial \log f(x,\theta)}{\partial \theta} = \frac{f'(x,\theta)}{f(x,\theta)}, \qquad \frac{\partial^2 \log f(x,\theta)}{\partial \theta^2} = \frac{f''(x,\theta)}{f(x,\theta)} - \frac{[f'(x,\theta)]^2}{f^2(x,\theta)}$$
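These two identities are easy to verify numerically with finite differences in the parameter; the normal density (in $\mu$, with $\sigma = 1$) below is my own illustrative choice:

```python
import math

def f(x, mu, sigma=1.0):
    """Normal density, viewed as a function of the parameter mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def d_dmu(g, x, mu, h=1e-5):
    """Central finite difference of g in the parameter mu."""
    return (g(x, mu + h) - g(x, mu - h)) / (2 * h)

def d2_dmu2(g, x, mu, h=1e-4):
    """Second central finite difference of g in the parameter mu."""
    return (g(x, mu + h) - 2 * g(x, mu) + g(x, mu - h)) / h**2

x, mu = 0.7, 0.2
logf = lambda x, mu: math.log(f(x, mu))
lhs1 = d_dmu(logf, x, mu)                 # d log f / d mu
rhs1 = d_dmu(f, x, mu) / f(x, mu)         # f' / f
lhs2 = d2_dmu2(logf, x, mu)               # d^2 log f / d mu^2
rhs2 = d2_dmu2(f, x, mu) / f(x, mu) - (d_dmu(f, x, mu) / f(x, mu)) ** 2
print(lhs1, rhs1)  # should agree
print(lhs2, rhs2)  # should agree
```

For this family the exact values are $\partial \log f/\partial\mu = (x-\mu)/\sigma^2$ and $\partial^2 \log f/\partial\mu^2 = -1/\sigma^2$.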

Substituting these into the difference gives

$$\Delta h = -\int f(x,\theta)\left[\frac{f'(x,\theta)}{f(x,\theta)}\Delta\theta + \frac{1}{2}\left(\frac{f''(x,\theta)}{f(x,\theta)} - \frac{[f'(x,\theta)]^2}{f^2(x,\theta)}\right)(\Delta\theta)^2 + o\big((\Delta\theta)^2\big)\right]dx$$

We analyze this term by term. The first term vanishes because the score has mean zero:

$$\int f(x,\theta)\,\frac{f'(x,\theta)}{f(x,\theta)}\,\Delta\theta\,dx = E[S(X,\theta)]\,\Delta\theta = 0$$

The remainder term is still a higher-order infinitesimal after integration and can be neglected:

$$\int f(x,\theta)\,o\big((\Delta\theta)^2\big)\,dx = o\big((\Delta\theta)^2\big)$$

The first part of the second-order term also vanishes, since (assuming regularity conditions that allow exchanging differentiation and integration)

$$\int f(x,\theta)\,\frac{f''(x,\theta)}{f(x,\theta)}\,dx = \int \frac{\partial^2 f(x,\theta)}{\partial \theta^2}\,dx = \frac{\partial^2}{\partial \theta^2}\int f(x,\theta)\,dx = 0$$

So only one term survives:

$$\Delta h = \frac{1}{2}\int f(x,\theta)\,\frac{[f'(x,\theta)]^2}{f^2(x,\theta)}\,(\Delta\theta)^2\,dx + o\big((\Delta\theta)^2\big) = \frac{1}{2}E[S(X,\theta)^2]\,(\Delta\theta)^2 + o\big((\Delta\theta)^2\big) = \frac{1}{2}I(\theta)(\Delta\theta)^2 + o\big((\Delta\theta)^2\big)$$
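This scalar result can be checked numerically. The example below (my own, not from the post) uses the exponential family $f(x,\theta)=\theta e^{-\theta x}$, for which $I(\theta)=1/\theta^2$, and compares the integral $\Delta h$ from the start of the derivation against $\frac{1}{2}I(\theta)(\Delta\theta)^2$:

```python
import math

def exp_pdf(x, theta):
    """Exponential density with rate theta."""
    return theta * math.exp(-theta * x)

def delta_h(theta, dtheta, hi=60.0, n=100_000):
    """Delta h = -integral f(x,theta)[log f(x,theta+dtheta) - log f(x,theta)] dx, midpoint rule on [0, hi]."""
    dx = hi / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        fx = exp_pdf(x, theta)
        total -= fx * (math.log(exp_pdf(x, theta + dtheta)) - math.log(exp_pdf(x, theta))) * dx
    return total

theta = 1.0
fisher = 1.0 / theta**2  # I(theta) = 1/theta^2 for the Exp(theta) family
for dtheta in (0.2, 0.1, 0.05):
    approx = 0.5 * fisher * dtheta**2
    print(dtheta, delta_h(theta, dtheta), approx)
```

As $\Delta\theta$ shrinks, the printed pairs should agree to higher and higher relative accuracy, as the $o\big((\Delta\theta)^2\big)$ term predicts.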

For a multidimensional parameter the same argument gives

$$\Delta h = \frac{1}{2}\,\Delta\theta^T I(\theta)\,\Delta\theta + o\big(\|\Delta\theta\|^2\big)$$

In other words, to leading order the change $\Delta h$ (which is exactly the KL divergence between the two nearby distributions) is a quadratic form in $\Delta\theta$ whose matrix is the Fisher information: the larger $I(\theta)$, the more distinguishable $f(\cdot,\theta+\Delta\theta)$ is from $f(\cdot,\theta)$, which is why $I(\theta)$ deserves to be called a measure of information.
