From 4621d8bb796a682ac19fca96b7129993b8ebeb12 Mon Sep 17 00:00:00 2001
From: flow254 <61912727+flow254@users.noreply.github.com>
Date: Wed, 10 Dec 2025 17:03:43 +0300
Subject: [PATCH] ch01: update subsection 1.1.3 - typo fix

---
 chapters/part1/ch01_information_compression.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chapters/part1/ch01_information_compression.tex b/chapters/part1/ch01_information_compression.tex
index d76fc65..d97d3e5 100644
--- a/chapters/part1/ch01_information_compression.tex
+++ b/chapters/part1/ch01_information_compression.tex
@@ -62,7 +62,7 @@ \subsection{Why Log Loss Is Codelength}
 
 Note: We are discussing optimal codelengths in theory. In practice, you would use arithmetic coding to actually achieve these codelengths when transmitting data. Arithmetic coding can asymptotically achieve the Shannon limit, encoding data using $H(p) + o(1)$ bits on average.
 
-An interesting variant is \textbf{universal coding}: designing codes that work well even when you do not know the true distribution $p$ in advance. Universal codes adapt to the data they see, achieving near optimal compression without requiring perfect prior knowledge. For instance, the Krichevsky-Trofimov estimator achieves regret of $O(\log n)$ when coding binary sequences, meaning the extra codelength compared to knowing $p$ from the start grows only logarithmically with the amount of data. This connects to online learning: as you see more data, your code (and your model) improves, approaching optimality asymptotically.
+An interesting variant is \textbf{universal coding}: designing codes that work well even when you do not know the true distribution $p$ in advance. Universal codes adapt to the data they see, achieving near optimal compression without requiring perfect prior knowledge. For instance, the Krichevsky-Trofimov estimator achieves regret of $O(\log n)$ when coding binary sequences, meaning the extra codelength compared to knowing $p$ from the start grows only logarithmically with the amount of data. This connects to machine learning: as you see more data, your code (and your model) improves, approaching optimality asymptotically.
 
 \subsection{Connection to Maximum Likelihood}
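The patched paragraph's $O(\log n)$ regret claim for the Krichevsky-Trofimov estimator is easy to check numerically. The following sketch (illustrative code, not part of the patch; function names are my own) codes a binary sequence with the KT sequential predictor, $P(\text{next}=1 \mid a \text{ ones}, b \text{ zeros}) = (a + 1/2)/(a + b + 1)$, and compares the resulting codelength with the ideal $n H(\hat{p})$ bits you would pay if the Bernoulli parameter were known from the start:

```python
import math

def kt_codelength(bits):
    """Total codelength (in bits) of a binary sequence under the
    Krichevsky-Trofimov sequential estimator:
    P(next = 1 | a ones, b zeros seen) = (a + 1/2) / (a + b + 1)."""
    ones = zeros = 0
    total = 0.0
    for b in bits:
        p_one = (ones + 0.5) / (ones + zeros + 1)
        p = p_one if b == 1 else 1.0 - p_one
        total += -math.log2(p)  # ideal codelength of this symbol
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return total

def ideal_codelength(bits):
    """Codelength with hindsight knowledge of the source: n * H(p_hat),
    using the empirical frequency of ones as the Bernoulli parameter."""
    n = len(bits)
    k = sum(bits)
    if k == 0 or k == n:
        return 0.0
    p = k / n
    return n * -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Regret = extra bits paid for not knowing p in advance.
bits = [1, 0, 1, 1, 0, 1, 1, 1] * 100   # n = 800, empirical p = 0.75
regret = kt_codelength(bits) - ideal_codelength(bits)
```

For $n = 800$ symbols the regret stays within the classical KT bound of $\tfrac{1}{2}\log_2 n + 1 \approx 5.8$ bits, a tiny overhead on top of roughly 649 bits of entropy, illustrating the logarithmic growth described in the text.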