# -*- coding: utf-8 -*-
"""kmaranga_hw1_problem1.py
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1Ct8ssMGtm0LBupL3txrxE7f0ZFm5-wvE
"""
!pip install scikit-learn scikit-image
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
ds = fetch_openml('mnist_784', as_frame = False)
x, x_val, y, y_val = train_test_split(ds.data, ds.target, test_size = 0.2, random_state = 42)
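#As a quick sanity check on what test_size = 0.2 does, here is a minimal sketch
#on a synthetic array (the fake_x/fake_y names are just illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 fake "images" of 784 features each, with labels 0-9 repeating
fake_x = np.arange(100 * 784).reshape(100, 784)
fake_y = np.arange(100) % 10

tr_x, va_x, tr_y, va_y = train_test_split(fake_x, fake_y, test_size=0.2, random_state=42)

# test_size=0.2 sends 20% of the samples to the validation split
print(tr_x.shape, va_x.shape)
```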
# c) check that i've downloaded my data correctly:
import matplotlib.pyplot as plt
a = x[0].reshape((28, 28)) #using x rather than x_train, as that's what i've defined
plt.imshow(a) #reshapes the 784-long vector into a 28 x 28 image (i get a number 5 - cool!)
#downsample data - so instead of 28 x 28 we have 14 x 14
import cv2
b = cv2.resize(a, (14, 14)) #hmm, resizing a and redefining it as b, changed pdf code a bit :)
plt.imshow(b) #and now i get a much blurrier 5 bc it's downsampled! huzzah!
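#The same 28 x 28 -> 14 x 14 downsampling can also be done without cv2, e.g. by
#2 x 2 mean pooling in plain numpy - a sketch, not the cv2.resize interpolation
#used above (pool2x2 is a hypothetical helper name):

```python
import numpy as np

def pool2x2(img):
    """Downsample a 2D array by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

demo = np.arange(28 * 28, dtype=float).reshape(28, 28)
small = pool2x2(demo)
print(small.shape)  # (14, 14)
```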
#d) Creating a dataset of 1000 samples for each digit, and subsample from x to make it
#create svm classifier in sklearn:
from sklearn import svm
#added cache size bc i needed speedup when training my SVM
#bc otherwise it took AGES!!!!
classifier = svm.SVC(C = 1.0, kernel = 'rbf', gamma = 'auto', cache_size = 2000, verbose = False)
#fit the SVM classifier to the data
classifier.fit(x, y)
#predict the labels of the validation dataset using the trained classifier
y_pred = classifier.predict(x_val)
y_pred_val = y_pred #same predictions; no need to run predict twice
print("Predicted Digits: ", y_pred_val)
print("Classifier Score: ", classifier.score(x_val, y_val))
print("Confusion Matrix: ")
print(confusion_matrix(y_pred_val,y_val))
#Answer:
'''
- the parameter gamma is the kernel coefficient for rbf, and is inversely
proportional to the variance of the Gaussian kernel
- the parameter 'C' on the other hand is a penalty parameter on the error term, i.e. how
strongly the classifier is penalized for misclassified (margin-violating) training points
- the defaults are C = 1 and, with gamma = 'auto', gamma = 1/number of features
'''
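#To see the effect of C concretely, a small sketch on synthetic blobs (exact
#support-vector counts depend on the data): a small C gives a soft margin where
#many points become support vectors, a large C keeps fewer.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X_toy, y_toy = make_blobs(n_samples=100, centers=2, random_state=0)

soft = SVC(kernel='rbf', C=0.01, gamma='auto').fit(X_toy, y_toy)
hard = SVC(kernel='rbf', C=100.0, gamma='auto').fit(X_toy, y_toy)

# a smaller C tolerates more margin violations, so it typically keeps
# more points as support vectors than a larger C does
print(soft.n_support_.sum(), hard.n_support_.sum())
```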
#report the validation error
print("Predicted Digits: ", y_pred)
print("Classifier Score: ", classifier.score(x_val, y_val))
#ratio of # support vectors to total # of training samples for the trained classifier
print("Support Vectors Percentage: ", classifier.n_support_.sum()/x.shape[0])
#10-class confusion matrix on the validation data
print("Confusion Matrix: ")
print(confusion_matrix(y_pred, y_val))
#any patterns noted about the kinds of mistakes being made?
#can i explain these mistakes intuitively?
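#One way to look for such patterns programmatically is to zero out the diagonal
#of the confusion matrix and read off the largest off-diagonal entry; a sketch
#on a made-up 3-class matrix (the numbers are hypothetical):

```python
import numpy as np

# hypothetical confusion matrix: rows = predicted, cols = true (as printed above)
cm = np.array([[50,  2,  3],
               [ 4, 45, 12],
               [ 1,  8, 40]])

off = cm.copy()
np.fill_diagonal(off, 0)            # keep only the mistakes
pred_c, true_c = np.unravel_index(off.argmax(), off.shape)
print(pred_c, true_c, off.max())    # the most frequent confusion
```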
#Part e)
#read svm manual. identify options i may not have seen in previous courses on SVMs.
'''
- I found it interesting that one can define special custom kernels beyond the
regular/commonly provided ones, making SVMs really versatile and adaptable to
different types of problems
- I hadn't ever quite thought about how computationally intensive it is to get
probability estimates from SVMs, and that one has to perform 5-fold cross
validation internally to get these.
'''
#what does "shrinking" param in svm.SVC do?
'''
The shrinking parameter in svm.SVC enables a heuristic that speeds up the
optimization. During training it tries to identify variables that are already
bounded (and unlikely to change) and temporarily removes them, so that a
smaller optimization problem is solved. This saves training time and improves
the hit rate of the kernel cache; it does not change the final solution.
'''
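#Since shrinking is only a speed heuristic, it should not change the fitted
#model; a quick sketch checking that predictions agree with and without it,
#on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X_toy, y_toy = make_classification(n_samples=200, n_features=10, random_state=0)

on_  = SVC(shrinking=True).fit(X_toy, y_toy)
off_ = SVC(shrinking=False).fit(X_toy, y_toy)

# shrinking affects training time, not the solution, so predictions match
print(np.array_equal(on_.predict(X_toy), off_.predict(X_toy)))
```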
#"what optimization algorithm is used to fit the SVM in sklearn?
'''
Sequential Minimal Optimisation (SMO) is the optimisation algorithm used to fit
the SVM in sklearn (via libsvm). It solves the dual quadratic problem by updating
a small subset of Lagrange multipliers (a pair at a time) at each iteration of
the algorithm.
'''
#fitting SVMs requires a significant amount of RAM. Why?
'''
Fitting SVMs can need a significant amount of RAM, especially when one uses a lot
of training samples, because SVMs are structured around a kernel function. Most
implementations cache the kernel (Gram) matrix of n x n pairwise similarities
between training points to avoid recomputing entries. This matrix grows as
O(n^2) in the number of training samples, so it quickly requires a lot of RAM
to store.
'''
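#The quadratic growth is easy to quantify: a dense n x n kernel matrix of
#float64 values takes n^2 * 8 bytes. A back-of-the-envelope sketch
#(kernel_matrix_bytes is just an illustrative helper):

```python
# memory needed to store a dense float64 kernel (Gram) matrix for n samples
def kernel_matrix_bytes(n, bytes_per_float=8):
    return n * n * bytes_per_float

for n in (1_000, 10_000, 56_000):   # 56_000 = our 80% MNIST training split
    print(n, kernel_matrix_bytes(n) / 1e9, "GB")
```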
#part f)
#how does svm.SVC handle multiple classes?
'''
svm.SVC handles multiple classes with a one-vs-one scheme: it trains a binary
classifier for every pair of classes (k*(k-1)/2 classifiers for k classes) and
combines their votes to pick the predicted class.
'''
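#This can be checked directly: with decision_function_shape='ovo', the decision
#function has one column per class pair, i.e. k*(k-1)/2. A sketch on the
#3-class iris data (3 classes -> 3 pairs):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X_ir, y_ir = load_iris(return_X_y=True)
clf = SVC(decision_function_shape='ovo').fit(X_ir, y_ir)

# 3 classes -> 3*(3-1)/2 = 3 pairwise classifiers, one column each
print(clf.decision_function(X_ir[:1]).shape)
```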
#can i think of any alternative ways to use binary classifiers to perform multi-class classification?
'''
One alternative is one-vs-rest: train one binary classifier per class that
separates it from all the other classes, then assign each point to the class
whose classifier is most confident. One could also group points by the class
they are closest to in distance, i.e. isolating a point, assigning it to the
class it is closest to, and repeating until all points are classified.
'''
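#sklearn ships the one-vs-rest strategy as a wrapper; a sketch showing that it
#trains one binary SVM per class:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X_ir, y_ir = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(SVC()).fit(X_ir, y_ir)

# one underlying binary classifier per class
print(len(ovr.estimators_))
```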
#part g)
#pick a better value than the default one for the hyper-param C
#sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring = None, n_jobs = None, refit = True,
# cv = None, verbose = 0, pre_dispatch = '2*n_jobs', error_score = nan, return_train_score = True)#default for this last one's False
#try >=5 different hyperparameters, show them by their methods and accuracies.
from sklearn.model_selection import GridSearchCV
params = [{'C':[0.1,10,100,1000,10000], 'gamma':[1e-9,1e-7,1e-6,1e-5,1e-4,5*1e-3,1], 'kernel':['rbf','poly']}]
classifier = GridSearchCV(svm.SVC(cache_size=2000), params, cv=5, scoring='accuracy', verbose=10,n_jobs=4)
classifier.fit(x[:5000],y[:5000])
#how do i pick the best value of the hyper-param using my validation set??
'''
I would pick the hyperparameter values that give the best accuracy on the
validation set. With GridSearchCV one can also use the attributes
.best_params_ / .best_estimator_ to find the best C, kernel, and gamma and
tune the hyperparameters accordingly.
'''
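#A minimal end-to-end sketch of that workflow on sklearn's small built-in
#digits set (not MNIST; the grid values here are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_d, y_d = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X_d, y_d, test_size=0.3, random_state=42)

gs = GridSearchCV(SVC(), {'C': [1, 10], 'gamma': ['scale', 0.001]}, cv=3)
gs.fit(X_tr, y_tr)

# best_params_ / best_estimator_ hold the winning combination;
# the held-out split then gives an honest estimate of its accuracy
print(gs.best_params_)
print(gs.best_estimator_.score(X_va, y_va))
```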
#downsample to reduce the size of my dataset and then use Gabor filter
import numpy as np

def balancedsample_dataset(x, y, keep_elems):
    all_class_dat = []
    all_class_lab = []
    for cl in np.unique(y):
        per_class_mat = x[(y == cl)]
        num_elems = per_class_mat.shape[0]
        np.random.shuffle(per_class_mat)
        all_class_dat.append(per_class_mat[:min(keep_elems, num_elems)])
        all_class_lab.append([int(cl)]*min(keep_elems, num_elems))
    all_class_dat = np.concatenate(all_class_dat)
    all_class_lab = np.concatenate(all_class_lab)
    #shuffle data and labels together by stacking the labels on as a last column
    all_class_dat = np.hstack((all_class_dat, all_class_lab.reshape(-1, 1)))
    np.random.shuffle(all_class_dat)
    all_class_lab = all_class_dat[:, -1]
    all_class_dat = all_class_dat[:, :-1]
    return all_class_dat, all_class_lab.astype(int) #np.int is deprecated; use plain int

nx_train, ny_train = balancedsample_dataset(x, y, 500)
nx_val, ny_val = balancedsample_dataset(x_val, y_val, 500)
small_nx_train, small_ny_train = balancedsample_dataset(x, y, 50)
small_nx_val, small_ny_val = balancedsample_dataset(x_val, y_val, 50)
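#The balanced-sampling idea can be verified on synthetic labels; a
#self-contained sketch (not calling the function above) showing the per-class
#cap with np.bincount:

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=2000)     # roughly 200 per class
feats = rng.normal(size=(2000, 4))

keep = 50
# take the first `keep` indices of each class and pool them
idx = np.concatenate([np.flatnonzero(labels == c)[:keep] for c in range(10)])
bal_x, bal_y = feats[idx], labels[idx]

# every class now contributes exactly `keep` samples
print(np.bincount(bal_y))
```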
#part h) Make a training set, use a training dataset of 100samples and a validation set of 100 images.
#sample x,y randomly to create this dataset.
#Gabor filter, where F is the spatial frequency of the filter, theta the rotation
#angle of the Gaussian, sigma_x and sigma_y the stdev of the kernel in the x and y
#directions, and bandwidth is inversely related to the stdev for a fixed frequency.
from skimage.filters import gabor_kernel, gabor
import numpy as np
freq, theta, bandwidth = 0.1, np.pi/4, 1
gk = gabor_kernel(frequency = freq, theta = theta, bandwidth = bandwidth)
f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(gk.real)
axarr[0,1].imshow(gk.imag)
#plt.figure(1); plt.clf(), plt.imshow(gk.real)
#plt.figure(2); plt.clf(), plt.imshow(gk.imag)
#convolve the input image with the kernel and get coefficients, we will only use
#the real part and toss the imaginary part of the coefficients.
image = x[0].reshape((28, 28)) #x images are still 28 x 28 here (784-long vectors)
coeff_real, _ = gabor(image, frequency = freq, theta = theta, bandwidth = bandwidth)
axarr[1,0].imshow(coeff_real)
axarr[1,1].imshow(image)
#plt.figure(1); plt.clf(); plt.imshow(coeff_real)
#part (j):
#run part above couple of times with different values for the parameters and see how
#the filter changes in shape and size and the corresponding output after convolution
# We'll create a filter bank of multiple Gabor filters and fixed params, and the
#coefficients of the Gabor filter bank will be used to train the SVM.
import time
'''
e.g
theta = np.arange(0, np.pi, np.pi/4)
frequency = np.arange(0.05, 0.5, 0.15)
bandwidth = np.arange(0.3, 1, 0.3)
'''
#filterbank params given in worksheet
theta = np.arange(0, np.pi, np.pi/4)
frequency = np.arange(0.05, 0.5, 0.15)
bandwidth = np.arange(0.3, 1, 0.3)
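#With these worksheet parameters the bank has 4 x 3 x 3 = 36 filters, which
#matches the 196 -> 196*36 = 7056 feature expansion noted further down; a
#quick check:

```python
import numpy as np

theta = np.arange(0, np.pi, np.pi/4)        # 4 orientations
frequency = np.arange(0.05, 0.5, 0.15)      # 3 frequencies
bandwidth = np.arange(0.3, 1, 0.3)          # 3 bandwidths

n_filters = len(frequency) * len(theta) * len(bandwidth)
print(n_filters, 14 * 14 * n_filters)  # filters, features per 14x14 image
```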
#alternative params also tried (kept commented out):
'''
theta = np.array([np.pi/4,np.pi/2,3*np.pi/4, np.pi])
frequency = np.array([0.05,0.25])
bandwidth = np.array([0.1,1])
#theta = np.arange(0,np.pi,np.pi/4)
#frequency = np.arange(0.05,0.5,0.15)
#bandwidth = np.arange(0.3,1,0.3)
n_filters = len(frequency)*len(theta)*len(bandwidth)
print(n_filters) #expect 16
'''
#now from 14 x 14 = 196 to 196 x 36 = 7056
#plot filter bank to ensure I get a good spread of different filters.
#(want a diverse filter bank that can capture different rotations and scales)
l = 0
for i in frequency:
    for j in theta:
        for k in bandwidth:
            gk = gabor_kernel(frequency=i, theta=j, bandwidth=k)
            if l < 8:
                f, axarr = plt.subplots(1, 2)
                axarr[0].imshow(gk.real)
                axarr[1].imshow(gk.imag)
                time.sleep(0.1)
            l += 1
#train SVM on these features and report training and validation accuracy
from tqdm import tqdm_notebook

def gfilter_dataset(x, freq, theta, band):
    n_filters = len(freq)*len(theta)*len(band)
    side = int(np.sqrt(x.shape[1])) #infer image side length (28 for raw, 14 if downsampled)
    x_gabor = np.zeros((x.shape[0], n_filters*x.shape[1]))
    for img in tqdm_notebook(range(x.shape[0])):
        turn = 1
        for f in freq: #use the parameters, not the globals
            for t in theta:
                for b in band:
                    x_gabor[img, (turn-1)*x.shape[1]:turn*x.shape[1]] = gabor(
                        x[img].reshape((side, side)), frequency=f, theta=t, bandwidth=b)[0].flatten()
                    turn += 1
    return x_gabor
x_small_gabor_train = gfilter_dataset(small_nx_train,frequency,theta,bandwidth)
x_small_gabor_val = gfilter_dataset(small_nx_val,frequency,theta,bandwidth)
np.save('x_gabor_train',x_small_gabor_train)
np.save('x_gabor_val', x_small_gabor_val)
x_small_gabor_train = np.load('x_gabor_train.npy')
x_small_gabor_val = np.load('x_gabor_val.npy')
print(x_small_gabor_train.shape)
#increase # filters, use PCA to reduce dimensionality of the dataset so SVM
#can fit in RAM. use sklearn.PCA :)
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
scaler = StandardScaler()
scaler.fit(x_small_gabor_train)
x_small_gabor_train = scaler.transform(x_small_gabor_train)
x_small_gabor_val = scaler.transform(x_small_gabor_val)
pca = PCA(0.98) #where 0.98 is the explained variance
pca.fit(x_small_gabor_train)
x_small_gabor_train_pca = pca.transform(x_small_gabor_train)
x_small_gabor_val_pca = pca.transform(x_small_gabor_val)
print(x_small_gabor_val_pca.shape)
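#When n_components is a float like the 0.98 above, PCA keeps the smallest
#number of components whose cumulative explained variance reaches that
#fraction; a sketch on random data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 50))

p = PCA(n_components=0.98)       # keep 98% of the variance
reduced = p.fit_transform(data)

# the kept components explain at least 98% of the variance by construction
print(reduced.shape[1], p.explained_variance_ratio_.sum())
```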
#report accuracy scores
classifier = svm.SVC(C=1.0, kernel='poly', gamma='auto', cache_size=1000, verbose=True)
classifier.fit(x_small_gabor_train_pca, small_ny_train) #train on the PCA-reduced features
y_pred = classifier.predict(x_small_gabor_val_pca)
print(y_pred)
print("Classifier Accuracy:", classifier.score(x_small_gabor_val_pca, small_ny_val))