Good work.
I liked your code for converting the timedelta64 into integers.
data['age_days']= data['age'].astype("timedelta64[D]")
My code was kind of weird but had the same results.
student_logins['account_age']/=np.timedelta64(1, 'D')
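For anyone comparing the two conversions, here is a minimal sketch on made-up data (the `age` series below is hypothetical, not from either notebook). It assumes a reasonably recent pandas, where dividing by a one-day `np.timedelta64` keeps fractional days, while the `.dt.days` accessor returns whole days only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the original notebooks.
age = pd.Series(pd.to_timedelta(["1 days", "2 days 12:00:00", "10 days"]))

# Dividing by a one-day timedelta yields floats and keeps fractional days.
age_days_float = age / np.timedelta64(1, "D")

# The .dt.days accessor yields integers and drops the partial day.
age_days_int = age.dt.days
```

Which one is preferable probably depends on whether partial days matter for the model.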
Your code for adding the class dummy variables was really good since it could work with any number of dummy variables.
for j in data['class_id'].unique():
    label = "class_" + str(j)
    data[label] = data['class_id'] == j
My code was a little different and not as reusable:
class_dummies = pd.core.reshape.get_dummies(logins['class_id'])
logins[['class_a', 'class_c', 'class_e', 'class_g', 'class_m']]=class_dummies
Excluding one of the class ids makes a lot of sense since every entry is in exactly one class, so the full set of dummies always sums to one and is redundant with the intercept. I didn't even think about that when I ran my model and got the collinearity warning.
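A possible way to get both things at once (dummies for any number of classes, with one class left out) is sketched below. It uses the public `pd.get_dummies` function with `prefix` and `drop_first`; the small `logins` frame is invented for illustration, and the class letters are just the ones from my column names above.

```python
import pandas as pd

# Hypothetical stand-in for the real logins table.
logins = pd.DataFrame({"class_id": ["a", "c", "e", "a", "g", "m"]})

# prefix names the columns automatically; drop_first omits the first
# category (here "a"), which then acts as the baseline class.
class_dummies = pd.get_dummies(
    logins["class_id"], prefix="class", drop_first=True, dtype=int
)

# Attach the dummy columns without hard-coding their names.
logins = logins.join(class_dummies)
```

Rows belonging to the dropped class show up as all zeros across the remaining dummy columns.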
For the OLS regression I set X and y like you did, except without the '.values' at the end:
y = logins['duration'] instead of y = logins['duration'].values. That meant my results summary showed the names of the X variables, which I think makes the results easier to understand.
Our final models were very similar. We both had an R-squared of .486, meaning our models explained 48.6% of the variation in y. Calculating MSE was a good idea. For a fixed y, minimizing MSE is equivalent to maximizing R-squared, since R-squared = 1 - (MSE / variance(y)).
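The relationship between MSE and R-squared can be checked numerically. The sketch below uses synthetic data and a plain NumPy least-squares fit rather than either of our actual models, and it assumes MSE is the mean of the squared residuals and variance(y) is the population variance (NumPy's default, `ddof=0`).

```python
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 + 2.0 * x + rng.normal(size=200)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# R-squared computed from MSE and the population variance of y.
mse = np.mean(residuals ** 2)
r2_from_mse = 1.0 - mse / np.var(y)

# R-squared computed the usual way from sums of squares.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_direct = 1.0 - ss_res / ss_tot
```

The two values agree up to floating-point error because both the numerator and denominator are just the sums of squares divided by the same n.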
@ghego, @craigsakuma, @kebaler