-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathML1
More file actions
86 lines (64 loc) · 2.32 KB
/
ML1
File metadata and controls
86 lines (64 loc) · 2.32 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
Perform the following operations using Python on the given data sets
a)Importing the libraries
b) Importing the Dataset
c) Handling of Missing Data
d) Handling of Categorical Data
e) Splitting the dataset into training and testing datasets
f) Feature Scaling
Region Age Income Online Shopper
India 49 86400 No
Brazil 32 57600 Yes
USA 35 64800 No
Brazil 43 73200 No
USA 45 Yes
India 40 69600 Yes
Brazil 62400 No
India 53 94800 Yes
USA 55 99600 No
India 42 80400 Yes
a)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
b)
dataset = pd.read_csv("given_data.csv")
c)
print(dataset.isnull())
print(dataset.isnull().sum()) #check if any
#also can be done using SimpleImputer of sci-kit library
dataset["Age"].fillna(dataset["Age"].mean(), inplace=True)
dataset["Income"].fillna(dataset["Income"].mean(), inplace=True)
dataset["Region"].fillna(dataset["Region"].mode().iloc[0], inplace=True)
print(dataset.isnull().sum())
d)
categorical_columns=dataset.select_dtype(include=[np.number]).columns
print(categorial_columns)
imputer=SimpleImputer(strategy='most_frequent')
data[categorical_columns]=imputer.fit_transform(data[categorical_columns])
print(dataset.isnull().sum())
#oneHotEncoder
encoder=OneHotEncoder(sparse_output=False,drop='first')
encoder_data=encoder.fit_transform(data[categorical_columns])
encoder_columns=encoder.get_feature_names_out(categorical_columns)
data=data.drop(categorical_columns,axis=1)
data=pd.concat([data,pd.DataFrame(encode_data,columns=encoded_columns)},axis=1)
e)
x=data.drop('Online Shopper',axis=1)
y=data['Online Shopper'] #dependant variable
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,random_state=42)
print("Traning set shape :",x_train.shape,y_train.shape)
print("Testing set shape :",x_test.shape,y_test.shape)
f)
# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test_scaled)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')