<li>HW3 (<a href="/dgm-fall-2025/assets/hw/hw3/STAT453_hw03.zip">zip</a>) due Friday, October 17th 11:59PM to Canvas.</li>
<li>HW4 (<a href="https://colab.research.google.com/drive/1Jm2hrqbikyTC221moR9nfuaAoP638YJl?usp=sharing">Colab</a>) due Friday, November 21st 11:59PM to Canvas.</li>
<li>HW5 (<a href="https://colab.research.google.com/drive/1A8y1FfcrSb5O0HqI5oLXc1gZet4n4JtM?usp=sharing">Colab</a>) due Sunday, December 7th 11:59PM to Canvas.</li>
<li><ahref="/dgm-fall-2025/assets/hw/Stat453_F2025_ExamStudyGuide.pdf">Exam Study Guide</a> released.</li>
<li>Estimates: Parameters are learned by maximizing the <strong>joint likelihood</strong>: $\hat{\theta}_{NB} = \arg\max_{\theta} P(X, Y; \theta)$</li>
<li>Properties:
<ul>
<li><strong>Higher asymptotic error</strong>: Because the conditional independence assumption rarely holds exactly, Naive Bayes can produce biased estimates when features are correlated.</li>
<li><ahref="#12-summary-and-takeaways">12. Summary and Takeaways</a></li>
<li><ahref="#references">References</a></li>
</ul>
<h2id="1-introduction-and-overview">1. Introduction and Overview</h2>
<p>This lecture focuses on <strong>Deep Generative Models (DGMs)</strong> — models designed to learn the underlying data distribution, enabling both prediction and generation of new samples.
We move from <strong>discriminative models</strong>, which model $P(Y \mid X)$, to <strong>generative models</strong>, which model $P(X, Y)$ or $P(X)$.</p>
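<p>The two views are connected by Bayes' rule: a generative model of the joint $P(X, Y)$ implicitly determines the discriminative conditional (a standard identity, shown here for reference):</p>
<p>$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{\sum_{y'} P(X \mid y')\,P(y')}$$</p>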
<h2id="2-discriminative-vs-generative-modeling">2. Discriminative vs. Generative Modeling</h2>
183
+
<h2id="2-discriminative-vs-generative-modeling">2. Discriminative vs Generative Modeling</h2>
<table>
<thead>
<tr>
<th>Model Type</th>
<th>Learns</th>
<th>Objective</th>
<th>Examples</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Discriminative</strong></td>
<td>$P(Y \mid X)$</td>
<td>Classify or predict outcomes</td>
<td>Logistic Regression, CNNs</td>
</tr>
<tr>
<td><strong>Generative</strong></td>
<td>$P(X, Y)$ or $P(X \mid Y)$</td>
<td>Model data generation process</td>
<td>Autoencoders, VAEs, GANs</td>
</tr>
</tbody>
</table>
<p>In generative models, <strong>latent variables</strong> $z$ represent hidden structure in the data, making the following computations challenging:</p>
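<p>$$P(x) = \int P(x \mid z)\,P(z)\,dz \quad \text{(marginal likelihood)}, \qquad P(z \mid x) = \frac{P(x \mid z)\,P(z)}{P(x)} \quad \text{(posterior)}$$</p>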
<p>Because $z$ is unobserved, both the marginal likelihood and posterior inference are intractable for complex data: computing $P(x)$ requires integrating over all possible $z$, and $P(z \mid x)$ depends on $P(x)$ in turn. This motivates <strong>approximate inference</strong>.</p>
<hr/>
<h2 id="3-deep-generative-models-dgms">3. Deep Generative Models (DGMs)</h2>
<ul>
<li>Learn probabilistic mappings between $x$ and $z$.</li>
<li>Use neural networks for non-linear transformations.</li>
<li>Combine deep learning’s representational power with probabilistic reasoning.</li>
<li>Latent variables $z$ capture hidden structure that explains high-dimensional observations $x$; a concrete formulation is sketched after this list.</li>
</ul>
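<p>As an illustration of such a probabilistic mapping (one standard formulation, not the only choice), a neural network can parameterize the observation distribution:</p>
<p>$$p_\theta(x \mid z) = \mathcal{N}\!\left(x;\ \mu_\theta(z),\ \sigma^2 I\right),$$</p>
<p>where $\mu_\theta$ is a deep network mapping the latent code $z$ to the mean of the distribution over observations $x$.</p>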
<hr/>
<h2id="4-autoencoders-concept-and-motivation">4. Autoencoders: Concept and Motivation</h2>
<p>An <strong>Autoencoder (AE)</strong> is an <strong>unsupervised</strong> neural network (it requires no labels) trained to reproduce its input.
It compresses the input into a <strong>latent representation (code)</strong> and reconstructs the input from this compressed form.</p>
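<p>A minimal PyTorch sketch of this idea (the layer sizes, the 784-dimensional input, and the MSE objective are illustrative choices, not the lecture's specification):</p>
<pre><code class="language-python">import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Compress x into a low-dimensional code z, then reconstruct x from z."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder: input -> latent code
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim),
        )
        # Decoder: latent code -> reconstruction
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)       # latent representation (the "code")
        return self.decoder(z)    # reconstruction of x

model = Autoencoder()
x = torch.randn(64, 784)                      # dummy batch standing in for real data
loss = nn.functional.mse_loss(model(x), x)    # reconstruction error to minimize
loss.backward()
</code></pre>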
<p><strong>Applications:</strong></p>
<p><strong>Motivation:</strong> Different parts of the input relate to different parts of the output, and these important relationships can span long distances, as in machine translation. Attention dynamically computes which input positions matter for each output.</p>
<h3id="why-attention-was-needed-long-range-dependency-problem">Why Attention Was Needed (Long-Range Dependency Problem)</h3>
<p>RNNs compress an entire input sequence into a <strong>single hidden vector</strong>, which makes capturing long-range dependencies difficult.<br/>
During backpropagation, gradients must pass through many time steps, causing <strong>vanishing/exploding gradients</strong>.</p>
<p>Attention solves this by <strong>directly referencing the entire input sequence</strong> when predicting each output, instead of relying on hidden states to store all information.</p>
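<p>A small NumPy sketch of (scaled) dot-product attention, the most common formulation; the shapes and variable names here are illustrative, not taken from the lecture:</p>
<pre><code class="language-python">import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: every query can look at the whole
    input sequence, so no single hidden vector has to carry all context."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of each query to each key
    weights = softmax(scores)         # each row of weights sums to 1
    return weights @ V                # weighted sum of the values

# Toy example: 3 output positions attending over 5 input positions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))
out = attention(Q, K, V)              # shape (3, 8)
</code></pre>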
<ul>
<li><strong>Origin:</strong> Originally from Natural Language Processing (NLP) and language translation.</li>
</ul>
<h4 id="soft-attention-vs-rnn-for-image-captioning">Soft Attention vs. RNN for Image Captioning</h4>
<blockquote>
<p><strong>Aside:</strong> CNNs were an example of <strong>Hard Attention</strong>. As the filter slides over the image, the part of the image inside the filter gets attention weight 1, and the rest gets weight 0.</p>
</blockquote>
<h3id="why-attention-reduces-the-need-for-recurrence">Why Attention Reduces the Need for Recurrence</h3>
<ul>
<li>Attention repeatedly refers back to the input, so the hidden state no longer needs to store all global information.</li>
<li>This insight led to eliminating recurrence entirely in the Transformer.</li>
</ul>