- Does there exist an "antisymmetric kernel theory" akin to RKHS theory? I.e., when is an antisymmetric function $K(\theta, \theta')$ given by applying $\phi:\Theta \to V$ to $\theta$ and $\theta'$ and then evaluating an antisymmetric bilinear form $\omega$ on the images, i.e. $K(\theta, \theta')=\omega(\phi(\theta), \phi(\theta'))$?
- Is 8.6 of that form?
- Is that of use for anything?
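One way to see that such kernels exist in abundance: pick any feature map into an even-dimensional space and pair the images with the standard symplectic form. A minimal numerical sketch (the feature map below is an arbitrary illustrative choice, not anything from the text):

```python
import numpy as np

m = 3
rng = np.random.default_rng(0)
w = rng.normal(size=2 * m)  # parameters of an illustrative feature map
b = rng.normal(size=2 * m)

def phi(theta):
    """Illustrative feature map phi: R -> R^(2m) (random-Fourier style)."""
    return np.cos(w * theta + b)

# Standard symplectic form omega(u, v) = u^T J v with J = [[0, I], [-I, 0]].
J = np.block([[np.zeros((m, m)), np.eye(m)],
              [-np.eye(m), np.zeros((m, m))]])

def K(a, c):
    """K(a, c) = omega(phi(a), phi(c)) -- antisymmetric by construction."""
    return phi(a) @ J @ phi(c)

print(K(0.3, 1.7), K(1.7, 0.3))  # equal magnitude, opposite sign
print(K(0.3, 0.3))               # numerically ~0: K vanishes on the diagonal
```

Any such $K$ is automatically antisymmetric, so $K(\theta,\theta)=0$; the open question above is the converse direction, i.e. which antisymmetric $K$ admit such a factorization.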
We assume various regularity conditions.
- If $p_\theta(x)$ is a family of probability distributions, we define $I(\theta)_{ij}=E\left[\frac{\partial \log p(x|\theta)}{\partial \theta_i}\,\frac{\partial \log p(x|\theta)}{\partial \theta_j}\right]$, the covariance of the score function (the derivative of $\log p$). Then, under some regularity conditions, $I(\theta)_{ij}=-E\left[\frac{\partial^2 \log p(x|\theta)}{\partial \theta_i\,\partial \theta_j}\right]$.
Proof: Abusing notation by writing subscripts for partial derivatives in $\theta$,
$(\log p)_{ij}=\left(\frac{p_i}{p}\right)_j=\frac{p_{ij}}{p}-\frac{p_i p_j}{p^2}=\frac{p_{ij}}{p}-(\log p)_i (\log p)_j$
So we just need to show that
$\int_X \frac{p_{ij}}{p}\, p \,dx=\int_X p_{ij}\, dx=\left(\int_X p\, dx\right)_{ij}=(1)_{ij}=0$
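The identity can be checked exactly on a one-parameter example of my choosing, the Bernoulli$(\theta)$ family, by summing over the two outcomes:

```python
import numpy as np

# Check I(theta) = E[(d log p)^2] = -E[d^2 log p] for Bernoulli(theta),
# where log p(x|theta) = x log(theta) + (1-x) log(1-theta).
theta = 0.3
xs = np.array([0.0, 1.0])
p = np.where(xs == 1, theta, 1 - theta)

score = xs / theta - (1 - xs) / (1 - theta)          # d/dtheta log p
hess  = -xs / theta**2 - (1 - xs) / (1 - theta)**2   # d^2/dtheta^2 log p

I_cov  = np.sum(p * score**2)   # E[score^2]
I_hess = -np.sum(p * hess)      # -E[second derivative of log p]

# Both equal the known closed form 1 / (theta * (1 - theta)).
print(I_cov, I_hess, 1 / (theta * (1 - theta)))
```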
- From Wikipedia: Suppose $X$ has pdf $f_\theta(X)$, and a statistic $T=T(X)$ has pdf $g_\theta(T)$. Let $I(\theta)$ be the Fisher information of $f_\theta$ and $J(\theta)$ the Fisher information of $g_\theta$. Then $I(\theta)\geq J(\theta)$ (meaning the difference is positive semidefinite), with equality if and only if $T$ is sufficient.
Proof:
Write
$(\log f_\theta (X))_i = (\log p_\theta(X|T))_i +(\log g_\theta(T))_i$
Taking covariances of both sides, the cross terms vanish (condition on $T$ and use that the conditional score $(\log p_\theta(X|T))_i$ has conditional mean zero), so $I(\theta) = \mathrm{Cov}\left[(\log p_\theta(X|T))_i\right] + J(\theta) \geq J(\theta)$, with equality iff the conditional score vanishes almost surely, i.e. iff $p_\theta(X|T)$ does not depend on $\theta$. By the factorization theorem, this happens exactly when $T$ is sufficient,
as wanted.
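As a concrete check of the inequality (the Gaussian family here is my choice, not from the text): for $X=(X_1,X_2)$ iid $N(\theta,1)$, all the Fisher informations are available in closed form, so no simulation is needed.

```python
# Full sample X = (X1, X2) iid N(theta, 1): I(theta) = n = 2.
# Sufficient statistic T = (X1+X2)/2 ~ N(theta, 1/n): J(theta) = n (equality).
# Non-sufficient statistic T = X1 ~ N(theta, 1): J(theta) = 1 < 2 (strict).
# (Fisher information of N(theta, sigma^2) in theta is 1/sigma^2.)
n = 2
I_full = n            # info of the whole sample
J_mean = 1 / (1 / n)  # info of the sample mean, variance 1/n
J_x1   = 1            # info of X1 alone, which discards X2

print(I_full, J_mean, J_x1)
```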
Remark: This last computation is related to the one in Chain rule for mutual information.
This line of reasoning about "disjunction of elementary 'atomic' propositions" seems equally inappropriate as a criticism of either Bayesian or frequentist probability, as long as one keeps in mind the distinction between events (which have probabilities, but are often not 'atomic') and outcomes (which are 'atomic', but often have no probability). See section 8.11 instead.
This is exactly the wrong example to complain about "isolated clever tricks": setting up and analysing a Markov chain is a fairly general method for solving similar probability problems, well connected to other key areas of probability theory. This serves as an ironic illustration of a deeper point: many clever tricks, when well understood, become powerful methods, much more powerful indeed than straightforward but uninspiring computations.
There is less disagreement here than may at first appear. I'm all for "general mathematical techniques which will work not only on our present problem, but on hundreds of others"; it's just that your current "general technique" may solve a given problem, but not explain what is going on in it (Paul Zeitz calls this "How vs. Why"). A clever trick may lead you to a better general theory, closer to answering the "why" question --- as indeed the Peter and Paul coin tossing example illustrates. So, no to gamesmanship, yes to bringing the game to the next level.