Open
53 commits
c0dfc5d
decide on a project idea
idgetto Feb 25, 2016
aa160aa
find contribution data
idgetto Feb 28, 2016
74bb05d
add gitignore
idgetto Feb 28, 2016
4e25ce2
format contribution data
idgetto Feb 28, 2016
28a78b2
setup contribution database
idgetto Feb 28, 2016
db64c06
start exploring
idgetto Feb 29, 2016
d0735b6
add some figures
idgetto Feb 29, 2016
ae3bf6b
explore MA senators
idgetto Feb 29, 2016
a68943d
first iteration of roll call votes, need subjects/titles
dinopants174 Mar 1, 2016
644c17f
get data for 2013-2014
idgetto Mar 3, 2016
eca3226
Zoher's work scraping data before exploring
dinopants174 Mar 3, 2016
f335b06
Merge branch 'master' of github.com:idgetto/DataScience16CTW
dinopants174 Mar 3, 2016
6f2847d
write text of mid project checkin
idgetto Mar 4, 2016
640b749
Merge branch 'master' of github.com:idgetto/DataScience16CTW
idgetto Mar 4, 2016
6ac1bfe
Started exploring cleaned roll call votes
dinopants174 Mar 4, 2016
1ed315c
Merge branch 'master' of github.com:idgetto/DataScience16CTW
dinopants174 Mar 4, 2016
414cee4
add plots to checkin
idgetto Mar 4, 2016
98836ee
plot total monthly contributions
idgetto Mar 6, 2016
3e1343d
investigate party contributions
idgetto Mar 6, 2016
06935ee
compare monthly contributions between parties
idgetto Mar 8, 2016
7b8be72
add monthly contributions to deliverable
idgetto Mar 8, 2016
f075fa3
add title
idgetto Mar 9, 2016
a65df9c
outline deliverable
idgetto Mar 10, 2016
927144a
Working
dinopants174 Mar 10, 2016
7cec515
plot top contributors by party
idgetto Mar 10, 2016
0a7f5e6
select a senator
idgetto Mar 10, 2016
ae90274
start senator profile
idgetto Mar 10, 2016
d164fde
add top contributors to senator
idgetto Mar 10, 2016
38179e3
select a voting topic
idgetto Mar 10, 2016
de696b0
add search boxes
idgetto Mar 11, 2016
097b135
Zoher's work for deliverable
dinopants174 Mar 11, 2016
ce2a41f
Merge branch 'master' of github.com:idgetto/DataScience16CTW
dinopants174 Mar 11, 2016
4c1c72b
senator inline with party pie
idgetto Mar 11, 2016
547ecdd
fix title
idgetto Mar 11, 2016
05b687d
update titles
idgetto Mar 11, 2016
9288340
finished but uncommented deliverable
dinopants174 Mar 11, 2016
d7db66e
add setup instructions
idgetto Mar 11, 2016
ff922dd
Merge branch 'master' of github.com:idgetto/DataScience16CTW
dinopants174 Mar 11, 2016
3f2f17d
Merge branch 'master' of github.com:idgetto/DataScience16CTW
dinopants174 Mar 11, 2016
b0d59a5
ZG comments on deliverable finished
dinopants174 Mar 11, 2016
bacdfa6
fix file name
idgetto Mar 11, 2016
9ded071
Added further work section
dinopants174 Mar 11, 2016
bd70c2c
Merge branch 'master' of github.com:idgetto/DataScience16CTW
dinopants174 Mar 11, 2016
bc8e989
Testing readme data documentation
dinopants174 Mar 11, 2016
f4099b6
ZG data finished readme
dinopants174 Mar 11, 2016
dd23679
Fixed tiny unclear thing
dinopants174 Mar 11, 2016
de6d595
add some commentary
idgetto Mar 11, 2016
e9774f8
database info
idgetto Mar 11, 2016
e01d196
add data to README
idgetto Mar 12, 2016
d15c803
add query example
idgetto Mar 12, 2016
d8744a7
fix seed file name
idgetto Mar 12, 2016
fa0f224
add reflection
idgetto Mar 12, 2016
c0431b0
fix README link
idgetto Mar 12, 2016
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
*.pyc
*~
*.swp
.ipynb_checkpoints/
Binary file added ProjectProposal.pdf
Binary file not shown.
Binary file added ProjectReflection.pdf
Binary file not shown.
144 changes: 144 additions & 0 deletions README.md
@@ -1,2 +1,146 @@
# DataScience16CTW
This is the base repo for the "Change the World" project for Data Science at Olin College, Spring 2016.

## Setup

Follow these steps to run our deliverable Jupyter notebook:

1. Install ipywidgets, sqlalchemy, pandas, matplotlib, and seaborn.
2. In `data/congress_db.py`, change the variable `ROOT` to the absolute path of your cloned DataScience16CTW directory.
3. Run `jupyter notebook deliverable/deliverable.ipynb`.
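
As an alternative to hard-coding the path in step 2, the root could be derived from the module's own location. This is only a sketch: `repo_root` is a hypothetical helper that is not part of the repository, and it assumes `congress_db.py` sits directly inside the `data/` folder of the clone.

```python
import os


def repo_root(module_path):
    """Return the repository root, assuming the module lives in <root>/data/."""
    data_dir = os.path.dirname(os.path.abspath(module_path))
    return os.path.dirname(data_dir)


# Hypothetical use inside data/congress_db.py, replacing the hard-coded path:
# ROOT = repo_root(__file__)
```

With this approach the notebook would not need editing after each fresh clone, at the cost of tying `ROOT` to the current directory layout.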

## Data

### SQL Database: FEC

One source of data for this project was the [FEC website](http://www.fec.gov/finance/disclosure/ftpdet.shtml#a2015_2016). The FEC provides detailed information about candidates running for the Senate, the House, and the presidency, as well as contribution information related to committees.

We used three files from the FEC:

* The Candidate Master File
* The Committee Master File
* The Contributions to Candidate from Committees file

Using these three files, we set up a SQLite database with SQLAlchemy as our object-relational mapper. The code that creates the database can be found in [data/setup_db.py](./data/setup_db.py) and [data/seed_db.py](./data/seed_db.py). In `setup_db.py` we create three tables: `Candidate`, `Committee`, and `Contribution`. We then query the database for information related to each table.

The tables contain the following data:

### Candidate

| **Column** | **Contains** |
|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `id` | a 9-character alpha-numeric code assigned to a candidate by the Federal Election Commission. |
| `name` | candidate's name |
| `party` | candidate's party |
| `election_year` | candidate's election year |
| `office_st` | candidate's state of representation |
| `office` | office that the candidate is running for (H=House, P=President, S=Senate) |
| `office_district` | candidate's congressional district number |
| `ici` | incumbent challenger status (C=Challenger, I=Incumbent, O=Open Seat) |
| `status` | candidate's status (C=Statutory candidate, F=Statutory candidate for future election, N=Not yet a statutory candidate, P=Statutory candidate in prior cycle) |
| `pcc` | candidate's principal campaign committee ID |
| `mail_street1` | candidate's mailing address street line 1 |
| `mail_street2` | candidate's mailing address street line 2 |
| `mail_state` | candidate's mailing address state |
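
The single-letter codes in `office` and `ici` can be decoded with small lookup tables. The sketch below copies the code meanings from the table above; `describe_candidate` is a hypothetical helper for illustration and does not exist in the repository.

```python
# Code-to-label lookups copied from the Candidate table above.
OFFICE = {'H': 'House', 'P': 'President', 'S': 'Senate'}
ICI = {'C': 'Challenger', 'I': 'Incumbent', 'O': 'Open Seat'}


def describe_candidate(office, ici):
    """Turn raw `office` and `ici` codes into a readable phrase."""
    return '{} running for {}'.format(ICI.get(ici, 'Unknown status'),
                                      OFFICE.get(office, 'unknown office'))
```

For example, `describe_candidate('S', 'I')` gives `'Incumbent running for Senate'`.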

### Committee

| **Column** | **Contains** |
|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `id` | a 9-character alpha-numeric code assigned to a committee by the Federal Election Commission |
| `name` | committee's name |
| `treasurer` | officially registered treasurer for the committee |
| `street1` | committee street address line 1 |
| `street2` | committee street address line 2 |
| `city` | committee city |
| `state` | committee state |
| `zip` | committee zip code |
| `designation` | committee designation (A=Authorized by a candidate, B=Lobbyist/Registrant PAC, D=Leadership PAC, J=Joint fundraiser, P=Principal campaign committee of a candidate, U=Unauthorized) |
| `committee_type` | type of committee; [committee types](http://www.fec.gov/finance/disclosure/metadata/CommitteeTypeCodes.shtml) |
| `party` | political party associated with committee; [party list](http://www.fec.gov/finance/disclosure/metadata/DataDictionaryPartyCodeDescriptions.shtml) |
| `filling_freq` | frequency of committee reports (A=Administratively terminated, D=Debt, M=Monthly filer, Q=Quarterly filer, T=Terminated, W=Waived) |
| `organization_type` | interest group category (C=Corporation, L=Labor organization, M=Membership organization, T=Trade association, V=Cooperative, W=Corporation without capital stock) |
| `organization_name` | connected organization's name |
| `candidate_id` | id of associated candidate |

### Contribution

| **Column** | **Contains** |
|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `id` | unique contribution id |
| `committee_id` | a 9-character alpha-numeric code assigned to a committee by the Federal Election Commission |
| `amendment` | indicates if the report being filed is new (N), an amendment (A) to a previous report, or a termination (T) report |
| `report_type` | type of report; [report types](http://www.fec.gov/finance/disclosure/metadata/ReportTypeCodes.shtml) |
| `tx_pgi` | code indicating the election for which the contribution was made |
| `entity_type` | type of entity (CAN=Candidate, CCM=Candidate Committee, COM=Committee, IND=Individual (a person), ORG=Organization (not a committee and not a person), PAC=Political Action Committee, PTY=Party Organization) |
| `name` | recipient/payee |
| `city` | city/town |
| `state` | state |
| `zip` | zip code |
| `employer` | employer |
| `occupation` | occupation |
| `tx_date` | transaction date |
| `tx_amount` | transaction amount |
| `other_id` | FEC ID of the recipient committee or the supported or opposed candidate ID |
| `candidate_id` | id of the candidate receiving the contribution |
| `tx_id` | unique identifier associated with each itemization or transaction appearing in an FEC electronic file |
| `file` | unique report id |
| `memo_code` | memo code |
| `memo_text` | description of the activity |

### Querying Contribution Data

We use SQLAlchemy to query the contribution database. For example, we can find the total amount contributed to candidates, grouped by state:

```python
import congress_db
from setup_db import Candidate, Contribution, Committee
from sqlalchemy import desc, func

session = congress_db.create_session()

# Total contributions received by candidates, grouped by state, largest first.
session.query(Candidate.office_st,
              func.sum(Contribution.tx_amount).label('total_contr')).\
    join(Contribution).\
    group_by(Candidate.office_st).\
    order_by(desc('total_contr')).\
    all()

# => [(u'US', 110900511.0), (u'CA', 94463385.0), (u'NC', 86051685.0), ...]
```
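
For readers less familiar with the ORM, the same aggregation can be sketched as plain SQL using the standard-library `sqlite3` module. The table names, the reduced column set, and the toy rows below are assumptions made for illustration; they are not taken from `setup_db.py` or from the FEC data.

```python
import sqlite3

# In-memory toy database; schema and rows are illustrative assumptions.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE candidate (id TEXT PRIMARY KEY, office_st TEXT);
    CREATE TABLE contribution (id INTEGER PRIMARY KEY,
                               candidate_id TEXT REFERENCES candidate(id),
                               tx_amount REAL);
    INSERT INTO candidate VALUES ('C1', 'MA'), ('C2', 'CA'), ('C3', 'CA');
    INSERT INTO contribution (candidate_id, tx_amount) VALUES
        ('C1', 500.0), ('C2', 1000.0), ('C3', 250.0), ('C1', 100.0);
""")

# Join contributions to candidates, sum per state, largest total first.
totals = conn.execute("""
    SELECT candidate.office_st, SUM(contribution.tx_amount) AS total_contr
    FROM candidate
    JOIN contribution ON contribution.candidate_id = candidate.id
    GROUP BY candidate.office_st
    ORDER BY total_contr DESC
""").fetchall()

print(totals)  # [('CA', 1250.0), ('MA', 600.0)]
```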

### rollCallVotes_cleaned.csv

To create the initial `rollCallVotes_iter4.csv` file:

1. Install bs4, lxml, and pandas.
2. Run `jupyter notebook data/rollCallVotes_createCsv.ipynb`.

In the resulting `rollCallVotes_iter4.csv` file, each row is a roll call vote that occurred in the US Senate during the first session of the 114th Congress. Each of the first 100 columns corresponds to one senator and is labeled with a string in the following format:

`Last_name (Party-State)`
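
A column label in this format can be split into its parts with a regular expression. This is a minimal sketch: `parse_senator` is a hypothetical helper, and the pattern assumes an uppercase party abbreviation followed by a two-letter state code, as in the illustrative label `Warren (D-MA)`.

```python
import re

# Matches labels such as "Warren (D-MA)"; other column names do not match.
SENATOR_LABEL = re.compile(
    r'^(?P<last_name>.+?) \((?P<party>[A-Z]+)-(?P<state>[A-Z]{2})\)$')


def parse_senator(label):
    """Split a `Last_name (Party-State)` column label into its three parts."""
    match = SENATOR_LABEL.match(label)
    if match is None:
        return None
    return match.group('last_name'), match.group('party'), match.group('state')
```

Because non-senator columns such as `voteDate` return `None`, the same function can also be used to tell the 100 senator columns apart from the metadata columns.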

The remaining 6 columns are organized as follows:

Column | Contains
--- | ---
`billTitle` | the title used by congress.gov to refer to this legislation
`sponsor` | the senator who sponsored this legislation
`subjects` | a list, stored as a string, of all subjects this legislation concerns according to congress.gov
`title` | links each amendment to its parent bill so that the bill's subjects can be looked up, since amendments on congress.gov do not list subjects
`voteDate` | the date of the roll call vote
`voteResult` | the result of the vote

To create `rollCallVotes_cleaned.csv` from the initial `rollCallVotes_iter4.csv` file, run `jupyter notebook data/rollCallVotes_cleanCsv.ipynb`.

The resulting `rollCallVotes_cleaned.csv` file contains the same 106 columns as `rollCallVotes_iter4.csv`, plus an additional 662 columns, one for each topic that a piece of legislation concerned during this session of Congress. Each of these columns holds 1s and 0s indicating whether or not a given bill concerns that subject.
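
The expansion from a `subjects` string into 0/1 topic columns can be sketched in plain Python. This is not the notebook's code: it assumes the `subjects` value is a Python-style list literal, and both the `subject_indicators` helper and the two rows are made up for illustration.

```python
import ast


def subject_indicators(rows):
    """Add one 0/1 column per subject to each row (a dict with a `subjects` string)."""
    # Parse each stringified list back into a real list.
    parsed = [ast.literal_eval(row['subjects']) for row in rows]
    # Every subject seen in any row becomes its own column.
    all_subjects = sorted({s for subjects in parsed for s in subjects})
    result = []
    for row, subjects in zip(rows, parsed):
        expanded = dict(row)
        for subject in all_subjects:
            expanded[subject] = 1 if subject in subjects else 0
        result.append(expanded)
    return result


# Made-up rows for illustration only.
votes = [
    {'billTitle': 'Bill A', 'subjects': "['Qatar', 'Trade']"},
    {'billTitle': 'Bill B', 'subjects': "['Trade']"},
]
expanded = subject_indicators(votes)
```

Here `expanded[0]['Qatar']` is 1 and `expanded[1]['Qatar']` is 0, which is the shape of the topic columns described above.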

After loading `rollCallVotes_cleaned.csv`, the code below selects all votes concerning 'Qatar' and stores them in `qatar_df`:

```python
import pandas as pd
df = pd.read_csv('../data/rollCallVotes_cleaned.csv')
qatar_df = df[df['Qatar'] == 1]
```
8 changes: 8 additions & 0 deletions data/Makefile
@@ -0,0 +1,8 @@
create_db:
	python setup_db.py

drop_db: congress.db
	rm congress.db

seed_db: congress.db
	python seed_db.py