<!DOCTYPE html>
<html>
<title>Improved Techniques for Training GANs - A Reproduction Study</title>
<style>
table,
th,
td {
border: 1px solid black;
font-size: 24px;
}
img {
height: 300px;
width: 300px;
}
figure {
text-align: center;
display: inline-block;
align-content: center;
margin: 0 auto;
}
.intro_img {
height: 400px;
width: 1000px;
display: block;
margin-left: auto;
margin-right: auto;
}
body {
margin-top: 50px;
margin-bottom: 50px;
margin-right: 50px;
margin-left: 50px;
}
p {
font-size: 24px;
}
h1 {
font-size: 44px;
}
h2 {
font-size: 34px;
}
</style>
<body>
<h1>Improved Techniques for Training GANs - A Reproduction Study</h1>
<h2>Arka Daw - Mohannad Elhamod - Snehal More</h2>
<br />
<h1>Introduction</h1>
<p>In this report, we attempt to reproduce the results of paper <a href="#ref1">[1]</a>. Generative Adversarial
Networks (GANs) <a href="#ref2">[2]</a> fall into the broad category of generative models and were first
introduced by Ian Goodfellow et al. in 2014. Generative models try to capture the data distribution by
approximating the joint distribution P(Y, X), or just P(X) if there are no labels. Yann LeCun called adversarial
training “the most interesting idea in machine learning in the last decade.”</p>
<p>GANs are composed of two primary components. The first is the generator, which tries to generate plausible
synthetic data. The second is the discriminator, which tries to distinguish the samples produced by the
generator from the real data. The paper <a href="#ref2">[2]</a> uses an analogy of a team of counterfeiters and
the police: the generator is analogous to the counterfeiters, who try to produce fake currency and fool the
police into believing it is real, while the police try to distinguish the fake currency from the real one. This
competition drives both sides to improve their methods until the counterfeits are indistinguishable from the
genuine data. This framework is referred to as the adversarial modelling framework, in which the generator and
the discriminator play a min-max game until they reach a Nash equilibrium.</p>
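<p>The min-max game can be sketched numerically. The following is a toy illustration of the standard GAN losses with hypothetical discriminator outputs; it is not the training code we ran:</p>

```python
import math

# The GAN value function: min_G max_D  E[log D(x)] + E[log(1 - D(G(z)))],
# where D(x) in (0, 1) is the discriminator's estimated probability that x is real.

def d_loss(d_real, d_fake):
    # The discriminator maximizes log D(x) + log(1 - D(G(z))),
    # i.e. minimizes the negative of that sum.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    # Non-saturating generator loss from the original GAN paper:
    # maximize log D(G(z)) instead of minimizing log(1 - D(G(z))).
    return -math.log(d_fake)

print(round(d_loss(0.9, 0.1), 4))  # confident discriminator: low loss, 0.2107
print(round(g_loss(0.1), 4))       # fooled generator? no: high loss, 2.3026
print(round(d_loss(0.5, 0.5), 4))  # equilibrium: 2*log(2) = 1.3863
```

<p>At the Nash equilibrium the discriminator outputs 0.5 everywhere, and its loss settles at 2·log(2) ≈ 1.386.</p>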
<figure>
<img class="intro_img" src="./gan_image.png" />
<figcaption>Courtesy of <a
href="https://www.freecodecamp.org/news/an-intuitive-introduction-to-generative-adversarial-networks-gans-7a2264a81394/">Thalles
Silva</a></figcaption>
</figure>
<p>The major limitations of the vanilla GAN proposed in <a href="#ref2">[2]</a> are that training is slow due to
the vanishing gradient problem when training the generator, and that the generator's output often collapses to a
single mode or a limited set of modes. Another problem is that the models sometimes never converge or, worse,
become unstable. These problems were addressed by the paper “Improved Techniques for Training GANs”
<a href="#ref1">[1]</a> by Tim Salimans et al. in 2016. The major contributions of that paper are feature
matching, minibatch discrimination, and semi-supervised learning using GANs. Feature matching addresses the
instability of the vanilla generator loss (which might not converge) by formulating a new L2-based generator
loss, while the minibatch discriminator mitigates mode collapse. The paper also provides other methods, such as
batch normalization, virtual batch normalization, historical averaging, and one-sided label smoothing, to
further improve GAN training. The paper <a href="#ref1">[1]</a> also proposes a semi-supervised learning method
that leverages fake unlabeled data samples to improve results.</p>
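<p>As a rough illustration of feature matching, the generator is trained to match the mean of an intermediate discriminator feature on real versus generated batches. The sketch below uses plain Python lists with invented toy feature vectors; it is not the paper's implementation:</p>

```python
# Feature matching (toy sketch): the generator minimizes the squared L2
# distance between mean discriminator features on real and generated data,
#   || E_x f(x) - E_z f(G(z)) ||^2.

def mean_features(batch):
    # batch: list of feature vectors (lists of floats) from an
    # intermediate layer of the discriminator.
    n = len(batch)
    dim = len(batch[0])
    return [sum(v[i] for v in batch) / n for i in range(dim)]

def feature_matching_loss(real_feats, fake_feats):
    mu_real = mean_features(real_feats)
    mu_fake = mean_features(fake_feats)
    return sum((r - f) ** 2 for r, f in zip(mu_real, mu_fake))

real = [[1.0, 2.0], [3.0, 4.0]]   # mean feature: [2.0, 3.0]
fake = [[0.0, 1.0], [2.0, 3.0]]   # mean feature: [1.0, 2.0]
print(feature_matching_loss(real, fake))  # (2-1)^2 + (3-2)^2 = 2.0
```

<p>Because this loss only asks the generator to match feature statistics rather than directly fool the discriminator, it tends to give a more stable (if weaker) training signal.</p>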
<p>The paper “On distinguishability criteria for estimating generative models” <a href="#ref3">[3]</a> by Ian
Goodfellow (2014) focuses on the difference between GANs and noise-contrastive estimation (NCE). The two
classes of models are very similar, since NCE is also trained to distinguish data samples from noise
samples. However, NCE trains an internal data model corresponding to the discriminator with a fixed generator
network, whereas GANs learn a dynamic generator network.</p>
<p>Paper <a href="#ref1">[1]</a> demonstrates its improvements through several holistic and ablation
experiments over several datasets. Here, we choose to reproduce a subset of those results on two datasets:
CIFAR-10 and MNIST.</p>
<p>On the MNIST dataset, <a href="#ref1">[1]</a> shows that, using semi-supervised training with a few labeled
examples together with minibatch discrimination, the generated samples are indistinguishable from real digits.</p>
<p>On the CIFAR-10 dataset, the paper shows the impact of different combinations of the proposed improvements
through
ablation experiments.</p>
<h1>Hardware and Software Specifications</h1>
<p>We trained the models for both CIFAR and MNIST on two TitanX GPUs. The GPU cluster ran Ubuntu 18.04,
with an Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz and 64GB of system memory. Each run took ~5 minutes for MNIST and
~6 hours for CIFAR, resulting in a total of about 2 continuous days of GPU usage.</p>
<p>The machine learning framework used in the original paper is TensorFlow; for our experiments we used PyTorch
1.3.0 with Python 3.6.8. Given time and computational constraints, we ran each experiment
3 times. Our results are averaged over 3 runs, compared to 10 runs in the original paper.</p>
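<p>For concreteness, the mean ± deviation figures reported below can be computed over the per-seed runs as follows (the numbers here are hypothetical, not our measured errors):</p>

```python
import statistics

# Hypothetical per-seed test error rates for one experiment.
errors = [22.0, 23.5, 24.1]

mean = statistics.mean(errors)
std = statistics.stdev(errors)   # sample standard deviation over the 3 seeds
print(f"{mean:.1f} +/- {std:.1f}")  # 23.2 +/- 1.1
```

<p>With only 3 seeds instead of 10, these deviation estimates are naturally noisier than the paper's.</p>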
<h1>CIFAR-10</h1>
<p>The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. Our implementation is customised from <a href="https://github.com/theidentity/Improved-GAN-PyTorch">theidentity</a>. To successfully conduct the experiments, we had to fix multiple bugs, update some packages, and add new code for generating images.</p>
<p>The table below compares our results to those mentioned in the paper:</p>
<table style="width:100%;text-align:center">
<tr>
<th>Experiment <br />(Number of labeled examples)</th>
<th>Original paper's results <br />(Test error rate)</th>
<th>Our results <br />(Test error rate)</th>
</tr>
<tr>
<td>1000</td>
<td>21.83 ± 2.1</td>
<td>26.3 ± 3.5</td>
</tr>
<tr>
<td>2000</td>
<td>19.61 ± 2.09</td>
<td>23.2 ± 2.8</td>
</tr>
<tr>
<td>4000</td>
<td>18.63 ± 2.32</td>
<td>21.3 ± 2.95</td>
</tr>
</table>
<br />
<p>We evaluated the model's test error rate in the semi-supervised setting against the rates reported in the paper, varying the number of labelled instances. We observed that the test error rates reported by the paper are slightly lower than ours. Overall, we believe that with some hyperparameter tuning it should be possible to reproduce the results reported in the paper.
<br/>
The following shows fake samples generated by the model with feature matching from the CIFAR dataset.
</p>
<table style="width:100%;text-align:center">
<tr>
<th>The original paper's result <br />(with feature matching)</th>
<th colspan="3">Our results <br /> (250 epochs with 1000 labeled samples)</th>
</tr>
<tr>
<td><img class="result" src=".\cifar_original.png"></td>
<td><img src=".\250_epoch_1000labels.png"></td>
<td><img src=".\250_epoch1000label2.png"></td>
<td><img src=".\250_epoch1000label3.png"></td>
</tr>
</table>
<p>Qualitatively, the generated images highly resemble those reported in the paper.</p>
<h1>MNIST</h1>
<p>MNIST is a large image dataset of handwritten single digits. In this report, we adopt the code created by <a
href="https://github.com/Sleepychord/ImprovedGAN-pytorch">Sleepychord</a>. Some effort was needed to rerun
the code: we made minor changes to store the generated pictures and updated outdated packages. The
table below compares our results to
those mentioned in the paper:</p>
<table style="width:100%;text-align:center">
<tr>
<th>Experiment <br />(Number of labeled examples)</th>
<th>The paper's results<br /> (Number of incorrectly predicted test examples out of 10000)</th>
<th>Our results<br /> (Number of incorrectly predicted test examples out of 10000)</th>
</tr>
<tr>
<td>20</td>
<td>1677 ± 452</td>
<td>1952 ± 226.7</td>
</tr>
<tr>
<td>50</td>
<td>221 ± 136</td>
<td>569 ± 11.1</td>
</tr>
<tr>
<td>100</td>
<td>93 ± 6.5</td>
<td>433 ± 7.9</td>
</tr>
<tr>
<td>200</td>
<td>90 ± 4.2</td>
<td>398 ± 13.9</td>
</tr>
</table>
<p>Our results follow the same trend observed in the paper: the higher the number
of labeled training samples, the lower the number of incorrectly predicted test examples. However, our
results have a somewhat higher error rate. While the paper does not mention any details about the test set, we
assumed the authors used the entire MNIST test set as provided. Also, while <a href="#ref1">[1]</a> reports the
mean and variance over 10 different seeds, we ran our model with only 3 seeds per experiment, due to limited
time and resources.</p>
<p>The following also shows generated fake data from each experiment and compares to that shown in the paper:</p>
<table style="width:100%;text-align:center">
<tr>
<th>Experiment<br /> (Number of labeled examples)</th>
<th colspan="3">Fake data</th>
</tr>
<tr>
<td>20</td>
<td><img src=".\20_1.png"></td>
<td><img src=".\20_2.png"></td>
<td><img src=".\20_3.png"></td>
</tr>
<tr>
<td>50</td>
<td><img src=".\50_1.png"></td>
<td><img src=".\50_2.png"></td>
<td><img src=".\50_3.png"></td>
</tr>
<tr>
<td>100</td>
<td><img src=".\100_1.png"></td>
<td><img src=".\100_2.png"></td>
<td><img src=".\100_3.png"></td>
</tr>
<tr>
<td>200</td>
<td><img src=".\200_1.png"></td>
<td><img src=".\200_2.png"></td>
<td><img src=".\200_3.png"></td>
</tr>
<tr>
<td>Original paper</td>
<td><img src=".\real.PNG"></td>
<td></td>
<td></td>
</tr>
</table>
<p>The fake images we obtained are similar in nature to those reported in the paper. However, we suspect that the
authors applied some image processing, such as smoothing, to their results, as the raw output of the code we ran
contains some visual noise. This, however, is not central to the contribution of the paper itself and could
easily be achieved using standard, well-known image processing methods.</p>
<h1>Conclusion:</h1>
<p>In general, while we were not able to reproduce the results exactly as they appeared in <a href="#ref1">[1]</a>,
we were able to demonstrate their general findings. For example, we were able to show that adding more labeled
examples helps the GAN generate images with lower error. We were also able to show that the proposed
improvements are able to generate images that strongly resemble the training set.</p>
<p>It is also worth mentioning that our results for CIFAR-10 are much closer to what <a href="#ref1">[1]</a> reports than those for MNIST. With the architecture provided in the paper, <a
href="https://github.com/Sleepychord/ImprovedGAN-pytorch">Sleepychord</a> reports that the GAN suffered from the exploding gradient problem, and that the last layer of the generator had to be replaced with a Softplus instead of a Sigmoid. This, we think, explains why our results for MNIST differ noticeably in value from those reported by the paper.</p>
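<p>For reference, the two output activations differ mainly in their range: a Sigmoid squashes its input into (0, 1) and saturates for large |x|, while a Softplus, log(1 + e^x), is unbounded above. A minimal comparison using the standard definitions:</p>

```python
import math

def sigmoid(x):
    # Bounded in (0, 1); gradients vanish for large |x|.
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    # Smooth, unbounded-above version of ReLU: log(1 + e^x) in (0, inf).
    return math.log(1.0 + math.exp(x))

for x in (-5.0, 0.0, 5.0):
    print(round(sigmoid(x), 4), round(softplus(x), 4))
```

<p>For x = 5 the Sigmoid has already saturated near 1 (output 0.9933), while the Softplus still grows roughly linearly (output 5.0067), which keeps the gradient from vanishing at the generator's output layer.</p>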
<h1>References:</h1>
<p><a name="ref1" href="https://arxiv.org/abs/1606.03498">[1] Salimans, Tim, Ian Goodfellow, Wojciech Zaremba, Vicki
Cheung,
Alec Radford, and Xi Chen. "Improved
techniques for training gans." In Advances in neural information processing systems, pp. 2234-2242.
2016.</a></p>
<p><a name="ref2" href="https://arxiv.org/abs/1406.2661">[2] Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing
Xu, David
Warde-Farley, Sherjil Ozair, Aaron
Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing
systems, pp. 2672-2680. 2014.</a></p>
<p><a name="ref3" href="https://arxiv.org/pdf/1412.6515.pdf">[3] Goodfellow, Ian. "On distinguishability criteria for
estimating generative models." arXiv preprint arXiv:1412.6515. 2014.</a></p>
</body>
</html>