speechbr/README.Rmd at master · dcardosos/speechbr · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
---
output: github_document
always_allow_html: true
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# speechbr <img src="man/figures/hexlogo.png" align="right" width = "120px"/>


<!-- badges: start -->
[![](https://cranlogs.r-pkg.org/badges/speechbr)](https://cran.r-project.org/package=speechbr)
<!-- badges: end -->

## Overview

The goal of `{speechbr}` is to democratize access to the speeches of the deputies, that is, their ideias and thoughts.

The data is obtained on [Discursos e Notas Taquigráficas](https://www2.camara.leg.br/atividade-legislativa/discursos-e-notas-taquigraficas) of [Câmara dos Deputados](https://www.camara.leg.br/).

## Observation

The released version from CRAN is limited to speeches before 2022. For access speeches after 2021-12-31, use the development version.

## Installation

You can install the released version of `{speechbr}` from [CRAN](https://cran.r-project.org/) with:

```{r eval=FALSE, error=FALSE, message=FALSE, warning=FALSE}
install.packages("speechbr")
```

You can install the development version of `{speechbr}` from [GitHub](https://github.com/) with:

```{r eval=FALSE, error=FALSE, message=FALSE, warning=FALSE}
# install.packages("devtools")
devtools::install_github("dcardosos/speechbr")
```

## Example

An example of a base searching for the term "tecnologia" between 2021-09-01 and 2021-10-01:

```{r example, warning=FALSE, message=FALSE}
library(speechbr)

tab <- speechbr::speech_data(
  keyword = "tecnologia",
  start_date = "2021-09-01",
  end_date = "2021-10-01")

dplyr::glimpse(tab)
```

The others parameters are `party` (political party), `speaker` (speaker's name) and `uf` (state acronym). Their default values are _empty_ ("").

A simple application using the base, a wordcloud:

```{r example_2, eval = FALSE}
# install.package("wordlcoud2")
# install.package("tidytext")

stop_words <- tidytext::get_stopwords("pt")

others_words <- c("nao", "ter", "termos", "r", "fls", "sr", "ja", "sao",
                  "porque", "aqui","ha", "ser", "ano", "presidente", "tambem")

tab %>%
  tibble::rowid_to_column("id") %>%
  dplyr::select(id, discurso) %>%
  tidytext::unnest_tokens(word, discurso) %>%
  dplyr::filter(!grepl('[0-9]', word)) %>%
  dplyr::mutate(word = abjutils::rm_accent(word)) %>%
  dplyr::anti_join(stop_words) %>%
  dplyr::group_by(word) %>%
  dplyr::count(word, sort = TRUE) %>%
  dplyr::filter(n > 5, !word %in% others_words) %>%
  wordcloud2::wordcloud2()
```

### Example of a base

```{r example_3, echo=FALSE}

tab %>%
  head(2) %>%
  knitr::kable()
```

## How to cite

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5921104.svg)](https://doi.org/10.5281/zenodo.5921104)