---
title: "Kobe"
output: rmarkdown::github_document
---
Kobe Bryant is a retired NBA legend with one of the most decorated careers of all time. However, sports pundits have criticized Bryant for taking too many shots and not passing the ball enough. The goal of this study is to determine whether there is a relationship between Kobe Bryant’s ball-dominant play style and his team’s margin of victory.

My sample of 1238 games includes every game Kobe played except games from the '96-'98 seasons and games he left early due to injury. Kobe was not a key part of the team during his first couple of years in the league, so those statistics, along with the incomplete games, would not help in understanding whether his ball-dominant play style was beneficial. The data come from a verified basketball statistics website: BasketballReference.com. The response variable is the margin of victory (in points) in each game Kobe played. The two explanatory variables are the number of assists Kobe recorded and the number of shots he took per game. The CSV used in this project is called "kobe.csv" and sits in the current working directory.
```{r}
# Load Packages
knitr::opts_chunk$set(echo = TRUE)
library(car)
library(ggplot2)
```
```{r}
# Upload Data
kobe <- read.csv("kobe.csv", stringsAsFactors = FALSE)
# Histogram of Bryant's Margin of Victory per game
hist(kobe$Margin, breaks = 30, xlab = "Margin of Victory",
     main = "Histogram of Bryant's Margin of Victory per game")
# Mean and SD
paste("Mean:",round(mean(kobe$Margin), digits = 3),"SD:",round(sd(kobe$Margin), digits = 3))
```
The distribution of the response variable (margin of victory per game) is approximately normal, so no transformation was needed. Because the distribution is roughly symmetric, the mean of ~2.8 points and the standard deviation of ~13 points are appropriate statistics to describe it.
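The claim of approximate normality can be checked beyond the histogram. The chunk below is a minimal sketch, assuming a hypothetical helper named `describe_shape` (not part of the original analysis), that draws a normal Q-Q plot and reports a simple moment-based skewness estimate alongside the mean, SD, and median.

```{r}
# Sketch: normal Q-Q plot plus summary statistics for one numeric vector.
# `describe_shape` is a hypothetical helper name used for illustration only.
describe_shape <- function(x, label = "x") {
  x <- x[!is.na(x)]
  z <- (x - mean(x)) / sd(x)
  skew <- mean(z^3)  # simple moment-based estimate; near 0 for symmetric data
  qqnorm(x, main = paste("Normal Q-Q plot:", label))
  qqline(x)
  c(mean = mean(x), sd = sd(x), median = median(x), skewness = skew)
}

# Guarded so the chunk also runs where `kobe` has not been loaded
if (exists("kobe")) {
  describe_shape(kobe$Margin, "Margin of Victory")
}
```

Points that track the reference line and a skewness near zero would support describing the distribution with a mean and standard deviation.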
```{r}
# Histogram of Bryant's Assists per game
hist(kobe$Assists,xlab = "Assists",
main = "Histogram of Bryant's Assists per game")
# Median and IQR
paste("Median:",round(median(kobe$Assists), digits = 3),"IQR:",round(IQR(kobe$Assists), digits = 3))
```
The marginal distribution of the first explanatory variable, assists per game, is right-skewed rather than normal. Because of the skew, the median of 5 assists and the IQR of 4 assists are used to describe it.
```{r}
# Histogram of Bryant's shots taken per game
hist(kobe$Shots.Taken, breaks = 30, xlab = "Shots Taken",
     main = "Histogram of Bryant's Shots Taken per game")
# Mean and SD
paste("Mean:",round(mean(kobe$Shots.Taken), digits = 3),"SD:",round(sd(kobe$Shots.Taken), digits = 3))
```
The marginal distribution of the second explanatory variable, shots taken per game, is roughly symmetric and approximately normal. It is described by a mean of 21.01 shots and a standard deviation of 6.39 shots.
```{r}
scatter.smooth(kobe$Assists, kobe$Margin,
main = "Margin of Victory vs Assist Total",
evaluation = 50,
xlab = "Assists", ylab = "Margin of Victory",
col=ifelse(kobe$Margin>0,"purple","black"))
cor(kobe$Assists, kobe$Margin)
```
There does seem to be a slight positive relationship between Bryant's assists and margin of victory. The purple circles mark games with a margin of victory greater than zero (wins) and the black circles mark games with a margin below zero (losses). The correlation of .15 between assists and margin of victory indicates a weak positive association: games with more assists tended to have slightly higher margins, though a correlation alone does not show that the assists caused them.
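A single correlation coefficient says little about its own uncertainty. As a rough sketch, `cor.test()` from base R's `stats` package returns the Pearson correlation together with a confidence interval and p-value. The wrapper name `cor_summary` below is an assumption made for illustration.

```{r}
# Sketch: Pearson correlation with its 95% confidence interval and r squared.
# `cor_summary` is a hypothetical helper name.
cor_summary <- function(x, y) {
  ct <- cor.test(x, y, method = "pearson")
  r <- unname(ct$estimate)
  c(r = r,
    ci_low = ct$conf.int[1],
    ci_high = ct$conf.int[2],
    r_squared = r^2,
    p_value = ct$p.value)
}

# Guarded so the chunk also runs where `kobe` has not been loaded
if (exists("kobe")) {
  round(cor_summary(kobe$Assists, kobe$Margin), 3)
}
```

An interval that excludes zero would indicate the association is unlikely to be noise, while `r_squared` shows how little of the variation in margin it accounts for.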
```{r}
# Average margin of victory for each assist total from 0 to 15
assist_totals <- 0:15
B <- sapply(assist_totals, function(a) mean(kobe$Margin[kobe$Assists == a]))
barplot(B, main = "Margin of Victory by Assist Total",
        xlab = "Assists", ylab = "Average Margin of Victory",
        ylim = c(-5, 10), xlim = c(0, 20),
        names.arg = assist_totals, cex.names = .5,
        col = ifelse(B > 4, "purple", "grey"))
```
Although the correlation between assists and margin of victory is rather weak, we can still explore varying levels of assists. In the bar plot above, bars shaded purple mark assist totals with an average margin of victory greater than 4. It looks like Bryant enjoyed the greatest margin of victory in games where he had 6-8 assists.
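Bar heights alone hide how many games sit behind each average, and high assist totals are likely to be rare. The following sketch, assuming a hypothetical helper called `margin_by_group`, tabulates the game count, mean margin, and standard error for each assist total so the bars can be read with their uncertainty in mind.

```{r}
# Sketch: game count, mean margin, and standard error per group.
# `margin_by_group` is a hypothetical helper name.
margin_by_group <- function(margin, group) {
  n <- tapply(margin, group, length)
  avg <- tapply(margin, group, mean)
  se <- tapply(margin, group, sd) / sqrt(n)  # NA when a group has one game
  data.frame(group = names(n), n = as.vector(n),
             mean = as.vector(avg), se = as.vector(se))
}

# Guarded so the chunk also runs where `kobe` has not been loaded
if (exists("kobe")) {
  margin_by_group(kobe$Margin, kobe$Assists)
}
```

Groups with only a handful of games will show large standard errors, so their averages deserve less weight than those in the middle of the assist range.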
```{r}
scatter.smooth(kobe$Shots.Taken, kobe$Margin,
               main = "Margin of Victory vs Shots Taken",
               evaluation = 50,
               xlab = "Shots Taken", ylab = "Margin of Victory",
               col = ifelse(kobe$Margin > 0, "purple", "black"))
cor(kobe$Shots.Taken, kobe$Margin)
```
As in the previous comparison, there seems to be a slight relationship between Bryant's shots taken and margin of victory, this time a negative one. The purple circles mark games with a margin of victory greater than zero (wins) and the black circles mark games with a margin below zero (losses). The correlation of -.15 between shots taken and margin of victory indicates a weak negative association: games in which Bryant took more shots tended to have slightly lower margins of victory.
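To put the size of this correlation in perspective, recall that in a simple linear regression with one predictor the share of variance explained equals the squared correlation. Using the rounded value reported above:

$$
r^2 = (-0.15)^2 = 0.0225 \approx 2\%
$$

So a straight-line fit on shots taken alone would account for only about 2% of the variation in margin of victory.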
```{r}
# Average Margin of Victory By Shots Taken
# Inclusive lower and upper bounds of each bin; the last bin runs 45-50
lower <- seq(10, 45, by = 5)
upper <- c(seq(14, 44, by = 5), 50)
C <- mapply(function(lo, hi) {
  mean(kobe$Margin[kobe$Shots.Taken >= lo & kobe$Shots.Taken <= hi])
}, lower, upper)
barplot(C, main = "Average Margin of Victory by Shots Taken",
        xlab = "Shots Taken", ylab = "Margin of Victory",
        ylim = c(-2, 8), xlim = c(0, 11),
        names.arg = paste(lower, upper, sep = "-"),
        cex.names = .5, col = ifelse(C > 0, "green", "red"))
```
Despite the low correlation between shots taken and margin of victory, we can still assess how the team fared at specific levels of shooting volume. In this bar chart, green bars represent a positive average margin of victory and red bars a negative one. I excluded games where Bryant took fewer than 10 shots because data for those games are scarce. From this visualization, Bryant's teams had their largest average margins of victory when he took 10-19 shots. The 15-19 range had the greatest average margin, roughly 5 points. The chart also shows margin of victory tapering off once Bryant attempts 20 or more shots.
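The scarcity of games with fewer than 10 shots can be made explicit by counting games per bin. Below is a minimal sketch using base R's `cut()`. The helper name `margin_by_shot_bin` and its default break points (which add a 0-9 bin to the ranges used above) are assumptions for illustration.

```{r}
# Sketch: bin shot attempts with cut() and report games and mean margin per bin.
# `margin_by_shot_bin` and its default breaks are illustrative assumptions.
margin_by_shot_bin <- function(shots, margin,
                               breaks = c(0, 10, 15, 20, 25, 30, 35, 40, 45, 51)) {
  # right = FALSE makes each bin [lower, upper), so [10,15) holds 10-14 shots
  bin <- cut(shots, breaks = breaks, right = FALSE)
  data.frame(bin = levels(bin),
             games = as.vector(table(bin)),
             mean_margin = as.vector(tapply(margin, bin, mean)))
}

# Guarded so the chunk also runs where `kobe` has not been loaded
if (exists("kobe")) {
  margin_by_shot_bin(kobe$Shots.Taken, kobe$Margin)
}
```

Empty bins show zero games and an `NA` mean, which makes it easy to see which averages rest on too few games to interpret.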
```{r}
# Linear model with centered predictors and their interaction
kobe$Shots.Taken_c <- kobe$Shots.Taken - mean(kobe$Shots.Taken)
kobe$Assists_c <- kobe$Assists - mean(kobe$Assists)
myglm <- lm(Margin ~ Assists_c * Shots.Taken_c, data = kobe)
summary(myglm)
# Split games at the median of 5 assists for plotting
low.assists <- kobe[kobe$Assists <= 5, ]
high.assists <- kobe[kobe$Assists > 5, ]
plot(low.assists$Shots.Taken, low.assists$Margin,
     main = "Margin of Victory by Assists and Shots Taken",
     xlab = "Shots Taken", ylab = "Margin of Victory",
     xlim = c(0, 50), ylim = c(-50, 50),
     cex = 0.6, col = "blue", pch = 17)
points(high.assists$Shots.Taken, high.assists$Margin, col = "red",
       cex = 0.6)
# Separate least-squares lines for each assist group
abline(lm(Margin ~ Shots.Taken, data = low.assists), col = "blue", lty = 2)
abline(lm(Margin ~ Shots.Taken, data = high.assists), col = "red")
# The high group is every game with more than 5 assists; pch 1 is the default open circle
legend("bottomright", title = "Assists", c("0-5 Assists", "6+ Assists"),
       col = c("blue", "red"), pch = c(17, 1), lty = c(2, 1), inset = 0.01)
```
From our general linear model, the two explanatory variables (assists and shots taken) were significant in predicting margin of victory. However, an R squared of 4% raises some red flags. Such a small R squared implies that other factors play a large role in the margin of victory. After all, basketball is a team sport, and looking at just two variables for one player does not tell the whole story. Despite this, our analysis of margin of victory by levels of assists and shots taken suggests that Bryant's teams had a higher margin of victory when he took 15-19 shots and recorded 6-8 assists.

Furthermore, there does seem to be an interaction between assists and shots taken. This is plausible because in games where Bryant takes a very large number of shots, his assists are likely to be lower. A confounding variable could be the presence or absence of better teammates. Kobe had a higher win percentage when he played with Hall of Fame players Shaquille O'Neal and Pau Gasol and a lower win percentage without them, which affects the response variable, margin of victory. Future research will aim to find other explanatory variables that account for variation in margin of victory and to take into account the periods when Bryant played with better teammates.

Bottom line: looking solely at Bryant's assists and shots taken per game does not accurately assess whether his ball-dominant play style hurt the Lakers. However, the team had its highest average margin of victory when he had 6-8 assists and attempted 15-19 shots per game.
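One way to probe the interaction and the link between the two predictors is to compare an additive model with the interaction model and to look at the correlation between assists and shots taken. The chunk below is a sketch under stated assumptions: `compare_models` is a hypothetical helper, and it expects a data frame with columns named `Margin`, `Assists`, and `Shots.Taken`, as in `kobe.csv`.

```{r}
# Sketch: additive vs. interaction model, plus the predictor correlation.
# `compare_models` is a hypothetical helper; it assumes columns named
# Margin, Assists, and Shots.Taken.
compare_models <- function(df) {
  df$Assists_c <- df$Assists - mean(df$Assists)
  df$Shots_c <- df$Shots.Taken - mean(df$Shots.Taken)
  additive <- lm(Margin ~ Assists_c + Shots_c, data = df)
  interact <- lm(Margin ~ Assists_c * Shots_c, data = df)
  list(r2_additive = summary(additive)$r.squared,
       r2_interaction = summary(interact)$r.squared,
       f_test = anova(additive, interact),  # F-test for the interaction term
       predictor_cor = cor(df$Assists, df$Shots.Taken))
}

# Guarded so the chunk also runs where `kobe` has not been loaded
if (exists("kobe")) {
  compare_models(kobe)
}
```

A small gain in R squared from the interaction term, or a weak correlation between the predictors, would suggest treating the interaction reading with caution.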