plotly_book/animation.Rmd at master · heisenbug47/plotly_book · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
# Animating views

## Key frame animations

Both `plot_ly()` and `ggplotly()` support [key frame](https://en.wikipedia.org/wiki/Key_frame) animations through the `frame` attribute/aesthetic. They also support an `ids` attribute/aesthetic to ensure smooth transitions between objects with the same id (which helps facilitate [object constancy](https://bost.ocks.org/mike/constancy/)). Figure \@ref(fig:animation-ggplotly) recreates the famous gapminder animation of the evolution in the relationship between GDP per capita and life expectancy evolved over time [@gapminder]. The data is recorded on a yearly basis, so the year is assigned to `frame`, and each point in the scatterplot represents a country, so the country is assigned to `ids`, ensuring a smooth transition from year to year for a given country.

```{r animation-ggplotly, fig.cap = "Animation of the evolution in the relationship between GDP per capita and life expectancy in numerous countries.", screenshot.alt = "screenshots/animation-ggplotly"}
data(gapminder, package = "gapminder")
gg <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
  geom_point(aes(size = pop, frame = year, ids = country)) +
  scale_x_log10()
ggplotly(gg)
```

As long as a `frame` variable is provided, an animation is produced with play/pause button(s) and a slider component for controlling the animation. These components can be removed or customized via the `animation_button()` and `animation_slider()` functions. Moreover, various animation options, like the amount of time between frames, the smooth transition duration, and the type of transition easing may be altered via the `animation_opts()` function. Figure \@ref(fig:animation-opts) shows the same data as Figure \@ref(fig:animation-ggplotly), but doubles the amount of time between frames, uses linear transition easing, places the animation buttons closer to the slider, and modifies the default `currentvalue.prefix` settings for the slider.

```{r animation-opts, fig.cap = "Modifying animation defaults with `animation_opts()`, `animation_button()`, and `animation_slider()`.", screenshot.alt = "screenshots/animation-opts"}
base <- gapminder %>%
  plot_ly(x = ~gdpPercap, y = ~lifeExp, size = ~pop,
          text = ~country, hoverinfo = "text") %>%
  layout(xaxis = list(type = "log"))

base %>%
  add_markers(color = ~continent, frame = ~year, ids = ~country) %>%
  animation_opts(1000, easing = "elastic", redraw = FALSE) %>%
  animation_button(
    x = 1, xanchor = "right", y = 0, yanchor = "bottom"
  ) %>%
  animation_slider(
    currentvalue = list(prefix = "YEAR ", font = list(color="red"))
  )
```

If `frame` is a numeric variable (or a character string), frames are always ordered in increasing (alphabetical) order; but for factors, the ordering reflects the ordering of the levels. Consequently, factors provide the most control over the ordering of frames. In Figure \@ref(fig:animation-factors), the continents (i.e., frames) are ordered according their average life expectancy across countries within the continent. Furthermore, since there is no meaningful relationship between objects in different frames of Figure \@ref(fig:animation-factors), the smooth transition duration is set to 0. This helps avoid any confusion that there is a meaningful connection between the smooth transitions. Note that these options control both animations triggered by the play button or via the slider.

```{r animation-factors, fig.cap = "Animation of GDP per capita versus life expectancy by continent. The ordering of the contintents goes from lowest average (across countries) life expectancy to highest.", screenshot.alt = "screenshots/animation-factors"}
meanLife <- with(gapminder, tapply(lifeExp, INDEX = continent, mean))
gapminder$continent <- factor(
  gapminder$continent, levels = names(sort(meanLife))
)

base %>%
  add_markers(data = gapminder, frame = ~continent) %>%
  hide_legend() %>%
  animation_opts(frame = 1000, transition = 0, redraw = FALSE)
```

Both the `frame` and `ids` attributes operate on the trace level -- meaning that we can target specific layers of the graph to be animated. One obvious use case for this is to provide a background which displays every possible frame (which is not animated) and overlay the animated frames onto that background. Figure \@ref(fig:animation-targets) shows the same information as Figure \@ref(fig:animation-opts), but layers animated frames on top of a background of all the frames. As a result, it is easier to put a specific year into a global context.

```{r animation-targets, fig.cap = "Overlaying animated frames on top of a background of all possible frames.", screenshot.alt = "screenshots/animation-targets"}
base %>%
  add_markers(color = ~continent, alpha = 0.2, showlegend = F) %>%
  add_markers(color = ~continent, frame = ~year, ids = ~country) %>%
  animation_opts(1000, redraw = FALSE)
```

## Linking animated views

The section [linking views without shiny](#linking-views-without-shiny) details a framework for linking views through direct manipulation. This same framework can be leveraged to highlight objects as they progress through an animation, or even link objects between animations. Figure \@ref(fig:gapminder-highlight-animation) extends Figure \@ref(fig:animation-ggplotly) by layering on linear models specific to each frame and specifying `continent` as a key variable. As a result, one may interactively highlight any continent they wish, and track the relationship through the animation. In the animated version of Figure \@ref(fig:animation-ggplotly), the user highlights the Americas, which makes it much easier to see that the relationship between GDP per capita and life expectancy was very strong starting in the 1950s, but progressively weakened throughout the years.

```{r gapminder-highlight-animation, echo = FALSE, fig.cap = "Highlighting the relationship between GDP per capita and life expectancy in the Americas and tracking that relationship through several decades."}
knitr::include_graphics("images/gapminder-highlight-animation.gif")
```

```{r gapminder-highlight-animation-fake, include = knitr:::is_html_output()}
g <- crosstalk::SharedData$new(gapminder, ~continent)
gg <- ggplot(g, aes(gdpPercap, lifeExp, color = continent, frame = year)) +
  geom_point(aes(size = pop, ids = country)) +
  geom_smooth(se = FALSE, method = "lm") +
  scale_x_log10()
ggplotly(gg) %>%
  highlight("plotly_hover")
```

In addition to highlighting objects within an animation, objects may also be linked between animations. Figure \@ref(fig:animation-gapminder) links two animated views: on the left-hand side is population density by country and on the right-hand side is GDP per capita versus life expectancy. By default, all of the years are shown in black and the current year is shown in red. By pressing play to animate through the years, we can see that all three of these variables have increased (on average) fairly consistently over time. By linking the animated layers, we may condition on an interesting region of this data space to make comparisons in the overall relationship over time.

For example, in Figure \@ref(fig:animation-gapminder), countries below the 50th percentile in terms of population density are highlighted in blue, then the animation is played again to reveal a fairly interesting difference in these groups. From 1952 to 1977, countries with a low population density seem to enjoy large increases in GDP per capita and moderate increases in life expectancy, then in the early 80s, their GPD seems to decrease while the life expectancy greatly increases. In comparison, the high density countries seems to enjoy a more consistent and steady increase in both GDP and life expectancy. Of course, there are a handful of exceptions to the overall trend, such as the noticeable drop in life expectancy for a handful of countries during the nineties, which are mostly African countries feeling the affects of war.

```{r animation-gapminder, echo = FALSE, fig.cap = "Comparing the evolution in the relationship between per capita GDP and life expectancy in countries with large populations (red) and small populations (blue)."}
knitr::include_graphics("images/animation-gapminder.gif")
```

The `gapminder` data used thus far does not include surface area information, so Figure \@ref(fig:animation-gapminder) leverages a list of countries by area on Wikipedia. The R script used to obtain and clean that list is [here](https://gist.github.com/cpsievert/d4a4ccb7ce61e2cfaecf9736de4f67fa), but the cleaned version is directly available, plus add the areas to the `gapminder` data with the following code:

```{r}
countryByArea <- read.table(
  "https://bit.ly/2h6vscu",
  header = TRUE, stringsAsFactors = FALSE
)

gap <- gapminder %>%
  dplyr::left_join(countryByArea, by = "country") %>%
  transform(popDen = pop / area) %>%
  transform(country = forcats::fct_reorder(country, popDen))
```

The enhanced version of the `gapminder` data, `gap`, includes population density (population per square kilometer) and is used for the background layer (i.e., black points) in Figure \@ref(fig:animation-gapminder). In order to link the animated layers (i.e., red points), we need another version of `gap` that marks the country variable as the link between the plots (`gapKey`). The `new()` method for the `SharedData` class from the **crosstalk** package provides one way to define this link.^[You can also use the `key`/`set` attributes when linking views within **plotly**. The `set` attribute is equivalent to the `group` argument in the `SharedData$new()` function.]

```{r animation-gapminder-fake, include = knitr:::is_html_output()}
gapKey <- crosstalk::SharedData$new(gap, ~country)

p1 <- plot_ly(gap, y = ~country, x = ~popDen, hoverinfo = "x") %>%
  add_markers(alpha = 0.1, color = I("black")) %>%
  add_markers(data = gapKey, frame = ~year, ids = ~country, color = I("red")) %>%
  layout(xaxis = list(type = "log"))

p2 <- plot_ly(gap, x = ~gdpPercap, y = ~lifeExp, size = ~popDen,
              text = ~country, hoverinfo = "text") %>%
  add_markers(color = I("black"), alpha = 0.1) %>%
  add_markers(data = gapKey, frame = ~year, ids = ~country, color = I("red")) %>%
  layout(xaxis = list(type = "log"))

subplot(p1, p2, nrows = 1, widths = c(0.3, 0.7), titleX = TRUE) %>%
  hide_legend() %>%
  animation_opts(1000, redraw = FALSE) %>%
  layout(hovermode = "y", margin = list(l = 100)) %>%
  highlight("plotly_selected", color = "blue", opacityDim = 1, hoverinfo = "none")
```

Although Figure \@ref(fig:animation-gapminder) links two animated layers, it is probably more common to link non-animated display(s) with an animation. A sophisticated use within the statistical graphics literature is to link views with a grand tour to view model predictions projected onto a high-dimensional space [@model-vis-paper]. The grand tour is a special kind of animation that interpolates between random 2D projections of numeric data allowing us, to perceive the shape of a high-dimensional point cloud [@grand-tour]. Figure \@ref(fig:tour-USArrests) links a grand tour to a dendrogram displaying the results of a hierarchical clustering algorithm on the 50 US states with respect to Murder arrests (per 100,000), Assault arrests (per 100,000), Rape arrests (per 100,000), and percent urban population.

```{r tour-USArrests, echo = FALSE, fig.cap = "Linking a dendrogram to a grand tour and map of the `USArrests` data to visualize a classification in 5 dimensions."}
knitr::include_graphics("images/tour-USArrests.gif")
```

Figure \@ref(fig:tour-USArrests) makes use of [hierarchial selection](#hierarchial-selection) to select all the states (as well as all the child nodes) under a given node in both the dendrogram and the grand tour. This effectively provides a model selection tool in an unsupervised setting where one may choose a number of clusters by choosing relevant nodes in the dendrogram and viewing the model fit projected onto the data space. As shown in Figure \@ref(fig:tour-USArrests), after picking the 3 most obvious clusters, it looks as though a straight line could be drawn to completely separate the groups in the initial projection of the tour -- which suggests a split along this linear combination of these variables would provide a good classifier.^[In this situation, it may be desirable to retrieve the relevant linear combination after finding it. Since the slider displays the value of the current frame, one may go back to the data used to create the visualization, subset it to this value, and retrieve this linear combination.]

The code to generate Figure \@ref(fig:tour-USArrests), as well as a few other examples of the grand tour and linking brushing can be found in the package demos. To run the code for Figure \@ref(fig:tour-USArrests), run `demo("tour-USArrests", package = "plotly")`. To see a list of the other available demos, run `readLines(system.file("demo/00Index", package = "plotly"))`.


<!--
IDEAS:
  * Grand tour
  * Demonstrate high variance in density estimation via binning (i.e., same data and different anchor points for the bins can result in very different values for the binned frequencies)
  -->