forked from jkeast/wikitablr
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# wikitablr <img src="man/figures/wikitablr_hex_logo.png" align="right" height=150/>
<!-- badges: start -->
[Travis build status](https://travis-ci.org/jkeast/wikitablr)
<!-- badges: end -->
`wikitablr` is an R package for scraping tables from Wikipedia and cleaning up common formatting issues. It is intended to help beginners explore data on practically any subject that interests them (as long as there is a Wikipedia table on it), but anyone can use it.
`wikitablr` takes data that looks like this:
```{r, warning = FALSE, message = FALSE, echo = FALSE}
# Edit this part
library(wikitablr)
head(read_wiki_raw("https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles", 2))
```
and makes it look like this:
```{r, warning = FALSE, message = FALSE, echo = FALSE}
# Edit this part
head(read_wiki_table("https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles", 2))
```
## Installation
You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("baumer-lab/wikitablr")
```
## Example
The first step is to read in all tables on a given Wikipedia page using `read_wikitables()`. The input is a URL, and the output is a data frame that contains the tables on the page along with some additional information about each table.
```{r}
library(wikitablr)

# read in all tables on the page
colleges <- read_wikitables("https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_Massachusetts")
head(colleges)
```
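For readers curious about what a step like this involves, here is a minimal sketch of pulling tables out of HTML with `rvest`. It is not `wikitablr`'s implementation: the inline HTML snippet, its contents, and the `table.wikitable` selector are assumptions chosen so that the example runs without a network connection.

```r
library(rvest)

# A tiny stand-in for a Wikipedia page (hypothetical content, not real data)
page <- minimal_html('
  <table class="wikitable">
    <tr><th>School</th><th>Founded</th></tr>
    <tr><td>Example College[1]</td><td>1900</td></tr>
    <tr><td>Sample University</td><td>1950[a]</td></tr>
  </table>
')

# Select every table with the "wikitable" class and parse each one
tables <- page %>%
  html_elements("table.wikitable") %>%
  html_table()

# `tables` is a list with one element per matched table
length(tables)
tables[[1]]
```

Note that the footnote markers (`[1]`, `[a]`) survive the scrape, which is why a cleaning step is needed afterwards.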
The next step is to clean the tables, either by extracting a single table and applying cleaning functions to it individually, or by mapping the cleaning functions over all of the tables.
```{r}
# extract table 1 (pull() comes from dplyr, pluck() from purrr)
library(dplyr)
library(purrr)

table1 <- colleges %>%
  pull(table) %>%
  pluck(1)
table1
```
```{r}
# remove footnotes from table 1
head(remove_footnotes(table1))
```
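To illustrate the kind of cleanup involved, the sketch below strips bracketed footnote markers with a regular expression in base R. `strip_footnotes()` is a hypothetical helper written for this example, and the toy data frame is made up; `remove_footnotes()` may work differently internally.

```r
# Hypothetical helper, not part of wikitablr: remove bracketed markers
# such as "[1]" or "[a]" and trim any leftover whitespace
strip_footnotes <- function(x) {
  trimws(gsub("\\[[^\\]]*\\]", "", x, perl = TRUE))
}

# Toy data standing in for a scraped table (not real data)
toy <- data.frame(
  school = c("Example College[1]", "Sample University"),
  founded = c("1900", "1950[a]"),
  stringsAsFactors = FALSE
)

# Apply the helper to every column while keeping the data frame structure
toy[] <- lapply(toy, strip_footnotes)
toy
```

This simple pattern removes any text in square brackets, so it would also delete bracketed content that is not a footnote.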
```{r}
# remove footnotes from all tables
colleges %>%
  mutate(clean_table = map(table, remove_footnotes))
```
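The mapping pattern above relies on the tables being stored in a list-column. A self-contained sketch with toy data may make that clearer. Everything here is assumed for illustration: the `table_number` column, the contents of the tables, and `drop_markers()`, which stands in for `remove_footnotes()`.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy stand-in for the output of read_wikitables(): one row per table,
# with each table stored as an element of the `table` list-column
toy_tables <- tibble(
  table_number = 1:2,
  table = list(
    tibble(school = c("Example College[1]", "Sample University")),
    tibble(city = c("Town A[a]", "Town B"))
  )
)

# Hypothetical cleaner standing in for remove_footnotes()
drop_markers <- function(df) {
  mutate(df, across(where(is.character),
                    ~ gsub("\\[[^\\]]*\\]", "", .x, perl = TRUE)))
}

# map() calls the cleaner once per table and returns a list,
# which mutate() stores as a new list-column
cleaned <- toy_tables %>%
  mutate(clean_table = map(table, drop_markers))

cleaned$clean_table[[1]]
```

Keeping the cleaned tables in a new column, rather than overwriting `table`, leaves the raw data available for comparison.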