Description
In Chapter 1, in the explanation of the scatter plot of birth weight against age for smoking and non-smoking mothers, the code produces a legend that does not represent the two linear functions. It would be better to define a label_name variable inside the for loop, before the plt.plot() call, and pass it as the label argument to plt.scatter().
CURRENT ONE:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
urlprefix = 'http://vincentarelbundock.github.io/Rdatasets/csv/'
dataname = 'MASS/birthwt.csv'
bwt = pd.read_csv(urlprefix + dataname)
bwt = bwt.drop('Unnamed: 0' ,1)
#drop unnamed column
styles = {0: ['o','red'], 1: ['^','blue']}
for k in styles:
    grp = bwt[bwt.smoke == k]
    m,b = np.polyfit(grp.age , grp.bwt , 1) # fit a straight line
    plt.scatter(grp.age , grp.bwt , c= styles[k][1] , s=15 , linewidth =0,
                marker = styles[k][0])
    plt.plot(grp.age , m*grp.age + b, '-', color = styles[k][1])
plt.xlabel('age')
plt.ylabel('birth weight (g)')
plt.legend(['non - smokers','smokers'],prop={'size':8},loc=(0.5 ,0.8))
plt.show()
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
CORRECTION:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
styles = {0: ['o','red'], 1: ['^','blue']}
for k in styles:
    grp = bwt[bwt.smoke == k]
    m,b = np.polyfit(grp.age, grp.bwt, 1) # fit a straight line
    label_name = 'non-smokers' if k==0 else 'smokers'
    plt.scatter(grp.age, grp.bwt, c=styles[k][1], s=15, linewidth=0,
                marker=styles[k][0], label=label_name)
    plt.plot(grp.age, m*grp.age+b, '-', color=styles[k][1])
plt.xlabel('age')
plt.ylabel('birth weight (g)')
plt.legend(prop={'size':8}, loc=(0.5,0.8))
plt.show()
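%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
To check the effect without downloading the dataset, here is a minimal self-contained sketch of the same labeling approach. It runs on synthetic stand-in data, not the real birthwt values. The names groups, names and legend_labels are illustrative assumptions and do not come from the book.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (assumption: NOT the birthwt dataset).
rng = np.random.default_rng(0)
groups = {
    0: (rng.uniform(15, 45, 30), rng.normal(3000, 500, 30)),
    1: (rng.uniform(15, 45, 30), rng.normal(2800, 500, 30)),
}
styles = {0: ['o', 'red'], 1: ['^', 'blue']}
names = {0: 'non-smokers', 1: 'smokers'}

fig, ax = plt.subplots()
for k, (age, weight) in groups.items():
    m, b = np.polyfit(age, weight, 1)  # fit a straight line
    # Label the scatter explicitly so the legend entry is tied to this group.
    ax.scatter(age, weight, c=styles[k][1], s=15, linewidth=0,
               marker=styles[k][0], label=names[k])
    # The fitted line gets no label, so it is left out of the legend.
    ax.plot(age, m * age + b, '-', color=styles[k][1])
ax.set_xlabel('age')
ax.set_ylabel('birth weight (g)')
legend = ax.legend(prop={'size': 8}, loc=(0.5, 0.8))

# Collect the legend text to confirm which entries were created.
legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)
```

Because each label is attached to the artist it describes, the legend no longer depends on the order in which artists were added to the axes. If the fitted lines should appear in the legend as well, a label argument could be passed to the plot call in the same way.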