<!DOCTYPE html>
<html>
<title>Improved Techniques for Training GANs - A Reproduction Study</title>
<style>
table,
th,
td {
border: 1px solid black;
font-size: 24px;
}
img {
height: 300px;
width: 300px;
}
figure {
text-align: center;
display: inline-block;
align-content: center;
margin: 0 auto;
}
.intro_img {
height: 400px;
width: 1000px;
display: block;
margin-left: auto;
margin-right: auto;
}
body {
margin-top: 50px;
margin-bottom: 50px;
margin-right: 50px;
margin-left: 50px;
}
p {
font-size: 24px;
}
h1 {
font-size: 44px;
}
h2 {
font-size: 34px;
}
</style>
<body>
<h1>Improved Techniques for Training GANs - A Reproduction Study</h1>
<h2>Arka Daw - Mohannad Elhamod - Snehal More</h2>
<br />
<h1>Introduction</h1>
<p>In this report, we attempt to reproduce the results of paper <a href="#ref1">[1]</a>. Generative Adversarial
Networks (GANs) <a href="#ref2">[2]</a> fall into the broad category of generative models and were first
introduced by Ian Goodfellow et al. in 2014. Generative models try to capture the data distribution by
approximating the joint distribution P(Y, X), or just P(X) if there are no labels. Yann LeCun called adversarial
training “the most interesting idea in machine learning in the last decade.”</p>
<p>GANs are composed of two primary components. The first is the generator, which tries to generate plausible
synthetic data. The second is the discriminator, which tries to distinguish the samples produced by the
generator from the real data. The paper <a href="#ref2">[2]</a> uses an analogy of a team of counterfeiters and
the police: the generator is analogous to the counterfeiters, who try to produce fake currency and fool the
police into believing it is real, while the police try to distinguish the fake currency from the real one. This
competition drives both sides to improve their methods until the counterfeits are indistinguishable from the
genuine data. This framework is referred to as the adversarial modelling framework, in which the generator and
the discriminator play a min-max game until they reach a Nash equilibrium.</p>
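<p>The min-max game can be sketched numerically. The following is a toy illustration of the standard GAN losses with hypothetical discriminator outputs; it is not the training code we ran:</p>

```python
import math

# The GAN value function: min_G max_D  E[log D(x)] + E[log(1 - D(G(z)))],
# where D(x) in (0, 1) is the discriminator's estimated probability that x is real.

def d_loss(d_real, d_fake):
    # The discriminator maximizes log D(x) + log(1 - D(G(z))),
    # i.e. minimizes the negative of that sum.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    # Non-saturating generator loss from the original GAN paper:
    # maximize log D(G(z)) instead of minimizing log(1 - D(G(z))).
    return -math.log(d_fake)

print(round(d_loss(0.9, 0.1), 4))  # confident discriminator: low loss, 0.2107
print(round(g_loss(0.1), 4))       # fooled generator? no: high loss, 2.3026
print(round(d_loss(0.5, 0.5), 4))  # equilibrium: 2*log(2) = 1.3863
```

<p>At the Nash equilibrium the discriminator outputs 0.5 everywhere, and its loss settles at 2·log(2) ≈ 1.386.</p>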
<figure>
<img class="intro_img" src="./gan_image.png" />
<figcaption>Courtesy of <a
href="https://www.freecodecamp.org/news/an-intuitive-introduction-to-generative-adversarial-networks-gans-7a2264a81394/">Thalles
Silva</a></figcaption>
</figure>
<p>The major limitations of the vanilla GAN proposed in <a href="#ref2">[2]</a> are that training is slow due to
the vanishing gradient problem when training the generator, and that the generator's output often collapses to a
single mode or a limited set of modes. Another problem is that the models sometimes never converge or, worse,
become unstable. These problems were addressed by the paper “Improved Techniques for Training GANs”
<a href="#ref1">[1]</a> by Tim Salimans et al. in 2016. The major contributions of that paper are feature
matching, minibatch discrimination, and semi-supervised learning using GANs. Feature matching addresses the
instability of the vanilla generator loss (which might not converge) by formulating a new L2-based generator
loss, while the minibatch discriminator mitigates mode collapse. The paper also provides other methods, such as
batch normalization, virtual batch normalization, historical averaging, and one-sided label smoothing, to
further improve GAN training. The paper <a href="#ref1">[1]</a> also proposes a semi-supervised learning method
that leverages fake unlabeled data samples to improve results.</p>
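<p>As a rough illustration of feature matching, the generator is trained to match the mean of an intermediate discriminator feature on real versus generated batches. The sketch below uses plain Python lists with invented toy feature vectors; it is not the paper's implementation:</p>

```python
# Feature matching (toy sketch): the generator minimizes the squared L2
# distance between mean discriminator features on real and generated data,
#   || E_x f(x) - E_z f(G(z)) ||^2.

def mean_features(batch):
    # batch: list of feature vectors (lists of floats) from an
    # intermediate layer of the discriminator.
    n = len(batch)
    dim = len(batch[0])
    return [sum(v[i] for v in batch) / n for i in range(dim)]

def feature_matching_loss(real_feats, fake_feats):
    mu_real = mean_features(real_feats)
    mu_fake = mean_features(fake_feats)
    return sum((r - f) ** 2 for r, f in zip(mu_real, mu_fake))

real = [[1.0, 2.0], [3.0, 4.0]]   # mean feature: [2.0, 3.0]
fake = [[0.0, 1.0], [2.0, 3.0]]   # mean feature: [1.0, 2.0]
print(feature_matching_loss(real, fake))  # (2-1)^2 + (3-2)^2 = 2.0
```

<p>Because this loss only asks the generator to match feature statistics rather than directly fool the discriminator, it tends to give a more stable (if weaker) training signal.</p>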
<p>The paper “On distinguishability criteria for estimating generative models” <a href="#ref3">[3]</a> by Ian
Goodfellow (2014) focuses on the difference between GANs and noise-contrastive estimation (NCE). The two
classes of models are very similar, since NCE is also trained to distinguish data samples from noise
samples. However, NCE trains an internal data model corresponding to the discriminator with a fixed generator
network, whereas GANs learn a dynamic generator network.</p>
<p>Paper <a href="#ref1">[1]</a> demonstrates its improvements through several holistic and ablation
experiments over several datasets. Here, we choose to reproduce a subset of those results on two datasets:
CIFAR-10 and MNIST.</p>
<p>On the MNIST dataset, <a href="#ref1">[1]</a> shows that, using semi-supervised training with a few labeled
examples together with minibatch discrimination, the generated samples are indistinguishable from real digits.</p>
<p>On the CIFAR-10 dataset, the paper shows the impact of different combinations of the proposed improvements
through
ablation experiments.</p>
<h1>Hardware and Software Specifications</h1>
<p>We trained the models for both CIFAR and MNIST on two TitanX GPUs. The GPU cluster ran Ubuntu 18.04,
with an Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz and 64GB of system memory. Each run took ~5 minutes for MNIST and
~6 hours for CIFAR, resulting in a total of about 2 continuous days of GPU usage.</p>
<p>The machine learning framework used in the original paper is TensorFlow; for our experiments we used PyTorch
1.3.0 with Python 3.6.8. Given time and computational constraints, we ran each experiment
3 times. Our results are averaged over 3 runs, compared to 10 runs in the original paper.</p>
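<p>For concreteness, the mean ± deviation figures reported below can be computed over the per-seed runs as follows (the numbers here are hypothetical, not our measured errors):</p>

```python
import statistics

# Hypothetical per-seed test error rates for one experiment.
errors = [22.0, 23.5, 24.1]

mean = statistics.mean(errors)
std = statistics.stdev(errors)   # sample standard deviation over the 3 seeds
print(f"{mean:.1f} +/- {std:.1f}")  # 23.2 +/- 1.1
```

<p>With only 3 seeds instead of 10, these deviation estimates are naturally noisier than the paper's.</p>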
<h1>CIFAR-10</h1>
<p>The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. Our implementation is customised from <a href="https://github.com/theidentity/Improved-GAN-PyTorch">theidentity</a>. To successfully conduct the experiments, we had to fix multiple bugs, update some packages, and add new code for generating images.</p>
<p>The table below compares our results to those mentioned in the paper:</p>
<table style="width:100%;text-align:center">
<tr>
<th>Experiment <br />(Number of labeled examples)</th>
<th>Original paper's results <br />(Test error rate)</th>
<th>Our results <br />(Test error rate)</th>
</tr>
<tr>
<td>1000</td>
<td>21.83 ± 2.1</td>
<td>26.3 ± 3.5</td>
</tr>
<tr>
<td>2000</td>
<td>19.61 ± 2.09</td>
<td>23.2 ± 2.8</td>
</tr>
<tr>
<td>4000</td>
<td>18.63 ± 2.32</td>
<td>21.3 ± 2.95</td>
</tr>
</table>
<br />
<p>We evaluated the model's test error rate in the semi-supervised setting against the rates reported in the paper, varying the number of labelled instances. We observed that the test error rates reported by the paper are slightly lower than ours. Overall, we believe that with some hyperparameter tuning it should be possible to reproduce the results reported in the paper.
<br/>
The following shows fake samples generated by the model with feature matching from the CIFAR dataset.
</p>
<table style="width:100%;text-align:center">
<tr>
<th>The original paper's result <br />(with feature matching)</th>
<th colspan="3">Our results <br /> (250 epochs with 1000 labeled samples)</th>
</tr>
<tr>
<td><img class="result" src=".\cifar_original.png"></td>
<td><img src=".\250_epoch_1000labels.png"></td>
<td><img src=".\250_epoch1000label2.png"></td>
<td><img src=".\250_epoch1000label3.png"></td>
</tr>
</table>
<p>Qualitatively, the generated images highly resemble those reported in the paper.</p>
<h1>MNIST</h1>
<p>MNIST is a large image dataset of handwritten single digits. In this report, we adopt the code created by <a
href="https://github.com/Sleepychord/ImprovedGAN-pytorch">Sleepychord</a>. Some effort was needed to rerun
the code: we made minor changes to store the generated pictures and updated outdated packages. The
table below compares our results to
those mentioned in the paper:</p>
<table style="width:100%;text-align:center">
<tr>
<th>Experiment <br />(Number of labeled examples)</th>
<th>The paper's results<br /> (Number of incorrectly predicted test examples out of 10000)</th>
<th>Our results<br /> (Number of incorrectly predicted test examples out of 10000)</th>
</tr>
<tr>
<td>20</td>
<td>1677 ± 452</td>
<td>1952 ± 226.7</td>
</tr>
<tr>
<td>50</td>
<td>221 ± 136</td>
<td>569 ± 11.1</td>
</tr>
<tr>
<td>100</td>
<td>93 ± 6.5</td>
<td>433 ± 7.9</td>
</tr>
<tr>
<td>200</td>
<td>90 ± 4.2</td>
<td>398 ± 13.9</td>
</tr>
</table>
<p>Our results follow the same trend observed in the paper: the higher the number
of labeled training samples, the lower the number of incorrectly predicted test examples. However, our
results have a somewhat higher error rate. While the paper does not mention any details about the test set, we
assumed the authors used the entire MNIST test set as provided. Also, while <a href="#ref1">[1]</a> reports the
mean and variance over 10 different seeds, we ran our model with only 3 seeds per experiment, due to limited
time and resources.</p>
<p>The following also shows generated fake data from each experiment and compares to that shown in the paper:</p>
<table style="width:100%;text-align:center">
<tr>
<th>Experiment<br /> (Number of labeled examples)</th>
<th colspan="3">Fake data</th>
</tr>
<tr>
<td>20</td>
<td><img src=".\20_1.png"></td>
<td><img src=".\20_2.png"></td>
<td><img src=".\20_3.png"></td>
</tr>
<tr>
<td>50</td>
<td><img src=".\50_1.png"></td>
<td><img src=".\50_2.png"></td>
<td><img src=".\50_3.png"></td>
</tr>
<tr>
<td>100</td>
<td><img src=".\100_1.png"></td>
<td><img src=".\100_2.png"></td>
<td><img src=".\100_3.png"></td>
</tr>
<tr>
<td>200</td>
<td><img src=".\200_1.png"></td>
<td><img src=".\200_2.png"></td>
<td><img src=".\200_3.png"></td>
</tr>
<tr>
<td>Original paper</td>
<td><img src=".\real.PNG"></td>
<td></td>
<td></td>
</tr>
</table>
<p>The fake images we obtained are similar in nature to those reported in the paper. However, we suspect that the
authors applied some image processing, such as smoothing, to their results, as the raw output of the code we ran
contains some visual noise. This, however, is not central to the contribution of the paper itself and could
easily be achieved using standard, well-known image processing methods.</p>
<h1>Conclusion:</h1>
<p>In general, while we were not able to reproduce the results exactly as they appeared in <a href="#ref1">[1]</a>,
we were able to demonstrate their general findings. For example, we were able to show that adding more labeled
examples helps the GAN generate images with lower error. We were also able to show that the proposed
improvements are able to generate images that strongly resemble the training set.</p>
<p>It is also worth mentioning that our results for CIFAR-10 are much closer to what <a href="#ref1">[1]</a> reports than those for MNIST. With the architecture provided in the paper, <a
href="https://github.com/Sleepychord/ImprovedGAN-pytorch">Sleepychord</a> reports that the GAN suffered from the exploding gradient problem, and that the last layer of the generator had to be replaced with a Softplus instead of a Sigmoid. This, we think, explains why our results for MNIST differ noticeably in value from those reported by the paper.</p>
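<p>For reference, the two output activations differ mainly in their range: a Sigmoid squashes its input into (0, 1) and saturates for large |x|, while a Softplus, log(1 + e^x), is unbounded above. A minimal comparison using the standard definitions:</p>

```python
import math

def sigmoid(x):
    # Bounded in (0, 1); gradients vanish for large |x|.
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    # Smooth, unbounded-above version of ReLU: log(1 + e^x) in (0, inf).
    return math.log(1.0 + math.exp(x))

for x in (-5.0, 0.0, 5.0):
    print(round(sigmoid(x), 4), round(softplus(x), 4))
```

<p>For x = 5 the Sigmoid has already saturated near 1 (output 0.9933), while the Softplus still grows roughly linearly (output 5.0067), which keeps the gradient from vanishing at the generator's output layer.</p>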
<h1>References:</h1>
<p><a name="ref1" href="https://arxiv.org/abs/1606.03498">[1] Salimans, Tim, Ian Goodfellow, Wojciech Zaremba, Vicki
Cheung,
Alec Radford, and Xi Chen. "Improved
techniques for training gans." In Advances in neural information processing systems, pp. 2234-2242.
2016.</a></p>
<p><a name="ref2" href="https://arxiv.org/abs/1406.2661">[2] Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing
Xu, David
Warde-Farley, Sherjil Ozair, Aaron
Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing
systems, pp. 2672-2680. 2014.</a></p>
<p><a name="ref3" href="https://arxiv.org/pdf/1412.6515.pdf">[3] Goodfellow, Ian. "On distinguishability criteria for
estimating generative models." arXiv preprint arXiv:1412.6515. 2014.</a></p>
</body>
</html>