Of course mixed partials are equal (assuming continuity of the second derivatives; this is Clairaut's theorem), so the two expressions are the same.
Continuing to get 2.19 from 2.18:
We then have $G(x,y)G(y,z)=P(x,z)$. Fix any $z$ and denote $P(x,z)=A(x)$, $G(y,z)=B(y)$. Dividing gives $G(x,y)=\frac{A(x)}{B(y)}$ for all $x,y$ [and in particular, substituting $y$ for $x$ and $z$ for $y$, $G(y,z)=\frac{A(y)}{B(z)}$].
Plug this back into $G(x,y)G(y,z)=P(x,z)=A(x)$ to get $\frac{A(x)A(y)}{B(y)B(z)}=A(x)$.
So $A(y)/B(y)=B(z)$ is independent of $y$, hence equal to some constant $r$. This means $G(x,y)=\frac{A(x)}{B(y)}=\frac{rA(x)}{A(y)}$.
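As a quick numerical sanity check (illustrative choices of $A$ and $r$, not from the text), one can verify that $G(x,y)=rA(x)/A(y)$ indeed makes the product $G(x,y)G(y,z)$ independent of the middle variable:

```python
# Sanity check: with G(x, y) = r*A(x)/A(y) the product G(x, y)*G(y, z)
# collapses to r^2 * A(x)/A(z), with no dependence on y.
# A and r are arbitrary illustrative choices, not from the text.
import math

def A(t):
    return math.exp(t) + 1.0   # any positive function works

r = 0.7

def G(x, y):
    return r * A(x) / A(y)

x, z = 0.3, 0.9
products = [G(x, y) * G(y, z) for y in (0.1, 0.4, 0.8)]
assert all(abs(p - r * r * A(x) / A(z)) < 1e-12 for p in products)
```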
The variables $v, y, z$ are related by $v=F(y,z)$. One can interpret 2.22 as an equality of differential 1-forms on the surface $\Sigma=\{(v,y,z)\mid v=F(y,z)\}$ in the 3D space with coordinates $(v,y,z)$. The forms $\frac{dv}{H(v)}$, $\frac{dy}{H(y)}$ and $\frac{dz}{H(z)}$ are exact, meaning they are differentials of functions. We can find these functions by single-variable integration, because each of the 1-forms depends only on one of the variables (in the language of forms, it is pulled back via a coordinate projection from the corresponding coordinate line); this means we can find the antiderivative of the corresponding 1-form on the coordinate line and then "pull back" the result, which just means interpreting it as a function on $\Sigma$. Namely if we set
$$f(x)=\int_{x_0}^x \frac{1}{H(t)}dt$$
(a function well-defined up to an additive constant), then the equality of the 1-forms in 2.22 implies equality of their antiderivatives up to an additive constant, i.e.
$$f(v)=f(y)+rf(z).$$
Exponentiating this and writing $w=\exp(f)$, we get on the surface $v=F(y, z)$, up to a multiplicative constant, the equality
$$w(v)=w(y)w(z)^r$$
or
$$w(F(y,z))=w(y)w(z)^r.$$
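A concrete instance (my own illustrative choice of $H$, not from the text): with $H(t)=t$ we get $f(x)=\int_1^x dt/t=\ln x$ and $w=\exp\circ f$, so the solution of $f(v)=f(y)+rf(z)$ is $F(y,z)=yz^r$, and both the additive and multiplicative forms can be checked numerically:

```python
# With H(t) = t (illustrative choice): f(x) = log x, w(x) = exp(f(x)) = x,
# and F(y, z) = y * z**r satisfies both f(F) = f(y) + r*f(z) and
# w(F) = w(y) * w(z)**r.
import math

r = 0.5

def f(x):
    return math.log(x)          # antiderivative of 1/H for H(t) = t

def w(x):
    return math.exp(f(x))       # w = exp(f); here just x

def F(y, z):
    return y * z**r             # solves f(F(y, z)) = f(y) + r*f(z)

y, z = 0.4, 0.7
assert abs(f(F(y, z)) - (f(y) + r * f(z))) < 1e-12
assert abs(w(F(y, z)) - w(y) * w(z)**r) < 1e-12
```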
Brief explanation of the overall line of reasoning from 2.45 to 2.58
TODO
Note: There is a typo in the book just above equation 2.45 on pg 31. (2.25) should be (2.40).
Symmetry of the domain of 2.45
The domain is $0\leq S(y)\leq x$ because $S(y)$ is the plausibility of $\bar{B}=AD$, which is at most the plausibility of $A$, i.e. $x$. For general $A,D$ these (together with $x,y\in [0,1]$) are the only requirements and anything else should be possible; hence the domain.
The symmetry of the domain comes from $S$ being self-inverse and monotone decreasing. In fact, by monotonicity we have $S(y)\leq x \Leftrightarrow S(S(y))\geq S(x)$ and by $SS=Id$ (Eq. 2.46) we have $S(S(y))\geq S(x) \Leftrightarrow y\geq S(x)$.
(In general the graph of the inverse function is obtained by flipping the graph of the original function through the line $x=y$, and so the graph of $S$ is symmetric under this flip precisely when $S$ is its own inverse; monotonicity makes the same true for the region above the graph.)
We can slightly rewrite the above argument as: $y\geq S(x)$ means $w(\overline{AD})\geq w(\bar{A})$, which, since $S$ is monotone decreasing, means $w(AD)\leq w(A)$, i.e. $x\geq S(y)$.
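As a concrete check (my own choice of $S$, not from the text), take the self-inverse, monotone decreasing $S(t)=(1-t^2)^{1/2}$ on $[0,1]$ and verify on a grid that the two descriptions of the domain agree:

```python
# S(t) = sqrt(1 - t^2) is decreasing on [0, 1] and is its own inverse;
# for such S the conditions S(y) <= x and y >= S(x) describe the same region.
import math

def S(t):
    return math.sqrt(1.0 - t * t)

grid = [i / 20.0 for i in range(21)]
for y in grid:
    assert abs(S(S(y)) - y) < 1e-9          # S is self-inverse
for x in grid:
    for y in grid:
        assert (S(y) <= x + 1e-9) == (y >= S(x) - 1e-9)
```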
Note: There's a missing bracket in 2.49 before the second =.
Proof of Equation 2.50
Source: stackexchange; I've reworded it and added detail to (hopefully) make it clearer.
We will use the Taylor series approximation of $f(t)$ around the point $a$: $$f(t)=f(a)+f'(a)(t-a)+O((t-a)^2).$$
We then use a Taylor series approximation of the function $f(\delta) = \frac{1}{1-\delta}$ around $a = 0$: $\frac{1}{1-\delta}=1+\delta+O(\delta^2)$.
$$S(y) = S[S(x)(1+\delta + O(\delta^2))]$$
$$S(y) = S[S(x) + S(x)\delta + S(x)O(\delta^2)]$$
Now we want to get rid of the $S[]$ surrounding the equation, so we will use another Taylor approximation of the function $S(t)$. We approximate around the point $a=S(x)$.
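The two Taylor steps can be checked numerically with a concrete $S$ (an illustrative choice of mine): the error of the linear approximation $S(S(x)(1+\delta))\approx x + S'(S(x))S(x)\delta$ should shrink like $\delta^2$.

```python
# Check that S(S(x)*(1 + delta)) = x + S'(S(x))*S(x)*delta + O(delta^2)
# for the illustrative choice S(t) = sqrt(1 - t^2).
import math

def S(t):
    return math.sqrt(1.0 - t * t)

def Sprime(t):
    return -t / math.sqrt(1.0 - t * t)

x = 0.6
errs = []
for delta in (1e-2, 1e-3):
    exact = S(S(x) * (1.0 + delta))
    linear = x + Sprime(S(x)) * S(x) * delta
    errs.append(abs(exact - linear))
# shrinking delta by 10 shrinks the error by about 100
assert errs[1] < errs[0] / 50.0
```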
From now on we will treat $x$ as fixed and only vary $q$, sending it to $+\infty$, which in light of 2.48 means keeping $x$ fixed and sending $y$ to $S(x)$ from below.
we would get $J(q+\alpha)+O(\exp{-2q})$, and 2.53 would follow. We don't get exactly that, but we get almost the same thing: the same expression, just without the 2 in the last exponent.
We use the following fact: if $f(q)=O(g(q))$ then, for any eventually non-zero $h(q)$, one has $f(q)=h(q)\, O(g(q)/h(q))$. This is immediate from the definition: both statements mean that $\frac{f(q)}{g(q)}=\frac{f(q)/h(q)}{g(q)/h(q)}$ stays bounded.
So, since $q$ is the variable, a single factor $\exp{\alpha}$ is a constant and can be absorbed into any $O(g(q))$, and we write
I find it simpler to do this "directly", rather than to show the asymptotic expansion 2.54.
Remark: To get 2.54 one must first make sure that $\alpha$ actually takes a continuum of values. Since $\alpha$ is a continuous function of $x$, the intermediate value theorem implies that the set of values of $\alpha$ is an interval; we just need to check that it is not a degenerate interval consisting of a single point. Indeed, that would mean $\alpha(x)$ is constant, i.e. $S'(x)/S(x)=-c/x$, so $(\ln S)'=-c/x$ and $\ln S(x)= a-c\ln x$, i.e. $S(x)=e^a x^{-c}$; but this breaks $S(0)=1$. So we do know that $\alpha$ takes a continuum of values. We will use this as well.
We start with 2.53 in the form
$$J(q+\alpha(x))-J(q)= \beta(x)+ O( \exp{-q})$$
We want to deduce that
$b(x)=\beta(x)/\alpha(x)$ is constant.
Intuitively, $J(q+\alpha(x))-J(q)= \beta(x)+ O(\exp{- q})$ does say that for every increment of $\alpha(x)$ in the input, the output of $J$ increases by $\beta(x)$ (plus a small error), so (asymptotically) $J$ must be linear with slope=rise/run=$b(x),$ and since there can be only one slope, $b(x)$ must be constant. The question is how to make it precise.
First, if the error term were absent, the resulting identity
$$J(q + \alpha(x)) - J (q) = \beta(x)$$
implies that if $\alpha(x_1)=\alpha(x_2)$ then $\beta(x_1)=\beta(x_2)$, so
$\beta$ is a well-defined function of $\alpha$, and if $J(q)$ is continuous $\beta(\alpha)$ is also continuous. Now we can write
$$J(q + \alpha) - J (q) = \beta(\alpha).$$
Now this implies by induction
$$J(q + n\alpha)= J (q) +n\beta(\alpha)$$
Then given any two $\alpha$ values $\alpha_0$, $\alpha_1$, we have
$$J(q + n_0\alpha_0)=J(q)+n_0\beta(\alpha_0),$$
$$J(q + n_1\alpha_1)=J(q)+n_1\beta(\alpha_1).$$
If $\alpha_0/\alpha_1$ is rational, write $\alpha_1=(n_0/n_1) \alpha_0$ with integers $n_0, n_1$, so that $n_0\alpha_0=n_1\alpha_1$. Plugging this into the above gives $n_0\beta(\alpha_0)=n_1\beta(\alpha_1)$,
meaning $\beta(\alpha_0)/\alpha_0=\beta(\alpha_1)/\alpha_1$. Since $\beta(\alpha)/\alpha$ is continuous, being constant on all rational multiples of a given $\alpha_0$ implies that it is constant (recall $\alpha$ varies over an interval, in which rational multiples of $\alpha_0$ are dense).
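The error-free argument can be sanity-checked with an affine $J$ (an illustrative choice, not from the text): then $\beta(\alpha)=J(q+\alpha)-J(q)=b\alpha$, and $\beta(\alpha)/\alpha$ is the slope $b$ for every $\alpha$.

```python
# For an affine J(q) = b*q + c, beta(alpha) = J(q + alpha) - J(q) = b*alpha
# independently of q, so beta(alpha)/alpha is the constant slope b.
b, c = 2.5, -1.0    # arbitrary illustrative constants

def J(q):
    return b * q + c

def beta(alpha, q=3.0):
    return J(q + alpha) - J(q)   # does not depend on q for affine J

ratios = [beta(a) / a for a in (0.1, 0.37, 2.0)]
assert all(abs(rt - b) < 1e-9 for rt in ratios)
```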
Now we want to repeat this argument with error terms.
Suppose $x_1$ and $x_2$ are such that $\alpha(x_1)=\alpha(x_2)$.
Then $J(q+\alpha(x))-J(q)= \beta(x)+ O(\exp{- q })$ implies, by plugging in sufficiently large $q$, that $|\beta(x_1)-\beta(x_2)|$ is smaller than any positive number, so it is zero. Thus, as before, $\beta(\alpha)$ is well-defined.
To see that $\beta(\alpha)$ is continuous, given $x_0$ corresponding to some $\alpha_0$ and any $\varepsilon>0$ pick $q$ such that $|O(\exp{-q})| <\varepsilon/2$ and, using continuity of $J$ at $q+\alpha_0$, pick $\delta$ such that $|\alpha-\alpha_0|<\delta$ implies $|J(q+\alpha)-J(q+\alpha_0)|<\varepsilon/2$. Then $|\beta(\alpha)-\beta(\alpha_0)|<\varepsilon$ on the same interval $|\alpha-\alpha_0|<\delta$, meaning that $\beta(\alpha)$ is continuous at $\alpha_0$, as wanted.
We know that for each $x$ and each $C>0$ there exists $Q(x, C)$ such that for $q\geq Q(x, C)$ we have $|O(\exp{-q })|< C\exp{ -q }$. Pick any $q(x,C)\geq Q(x, C)$.
Now we have by induction (with everything depending on $x$)
$$|J(q+n\alpha)-(J (q)+n\beta)|$$
$$\leq C \exp{-q}(1+\exp{- \alpha }+...+\exp{-(n-1) \alpha })$$
$$<\frac{C}{1-\exp{- \alpha }} \exp{- q } $$
As before, if $\alpha_1=(n_0/n_1)\alpha_0$, then writing the above for both $\alpha_0$ and $\alpha_1$ and picking sufficiently large $q$ we get
$$\beta(\alpha_0)/\beta(\alpha_1)=n_1/n_0=\alpha_0/\alpha_1.$$
The rest is the same as in the "error-less" case.
I think this problem is ambiguous and can be interpreted in multiple ways, see here for a different interpretation. But I think the following interpretation makes more sense.
With $X$ representing any background information:
$$
\begin{aligned}
p(C|(A+B)X) &= \frac{p(A+B|CX)p(C|X)}{p(A+B|X)}\\
&= \frac{[p(A|CX)+p(B|CX)-p(AB|CX)]p(C|X)}{p(A|X)+p(B|X)-p(AB|X)}\\
&= \frac{p(AC|X)+p(BC|X)-p(ABC|X)}{p(A|X)+p(B|X)-p(AB|X)}
\end{aligned}
$$
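The identity can be brute-force checked on a random joint distribution over the eight truth assignments of $(A, B, C)$ (a quick numerical sketch, with $X$ taken as the background defining the joint distribution):

```python
# Enumerate the 8 truth assignments of (A, B, C), give them random positive
# weights (this plays the role of p(.|X)), and compare both sides of the
# identity for p(C|(A+B)X).
import itertools, random

random.seed(0)
worlds = list(itertools.product([0, 1], repeat=3))           # (A, B, C)
weights = [random.random() + 0.01 for _ in worlds]
total = sum(weights)
p = {wrd: wt / total for wrd, wt in zip(worlds, weights)}

def prob(pred):
    return sum(q for wrd, q in p.items() if pred(*wrd))

lhs = prob(lambda a, b, c: (a or b) and c) / prob(lambda a, b, c: a or b)
rhs = (prob(lambda a, b, c: a and c) + prob(lambda a, b, c: b and c)
       - prob(lambda a, b, c: a and b and c)) / (
      prob(lambda a, b, c: a) + prob(lambda a, b, c: b)
      - prob(lambda a, b, c: a and b))
assert abs(lhs - rhs) < 1e-12
```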
Exercise 2.2
We will use the convention that all $P$ are conditioned on $X$, so $P(A|C)$ actually stands for $P(A|CX)$.
First we do a bunch of lemmas about mutually exclusive propositions.
If the $A_i$ are mutually exclusive, and $C$ is arbitrary, then