
@rcliao commented Apr 10, 2016

Instructions

This is an example of the homework 2 pull request format. Please follow the instructions below to create your pull request for submission.

First, create a new branch. I'd suggest naming your branch something like homework2 or feature/data-acquisition.

Once you have created this branch, please collaborate with your teammate(s) within this branch and only this branch. In other words, all of your work for homework 2 should be on this branch.
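If you prefer the command line, the branch setup described above can be sketched as a short sequence of git commands. The Python helper below is hypothetical: it assumes git is installed, the shared remote is named `origin`, and the base branch is `master`. By default it only prints the commands instead of running them.

```python
import subprocess
from typing import List


def branch_setup_commands(branch: str = "homework2", remote: str = "origin") -> List[List[str]]:
    """Return the git commands that create a homework branch and publish it.

    Hypothetical helper: assumes the remote is called `origin` and the base is `master`.
    """
    return [
        ["git", "checkout", "master"],          # start from the base branch
        ["git", "pull", remote, "master"],      # make sure master is up to date
        ["git", "checkout", "-b", branch],      # create the new branch and switch to it
        ["git", "push", "-u", remote, branch],  # publish it so teammates can use it
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(branch_setup_commands("homework2"))
```

After the branch is pushed, teammates can run `git fetch origin` followed by `git checkout homework2` to work on the same branch.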

When you are done with the homework 2 implementation, open a pull request by clicking the Pull requests tab, as shown below:
[Screenshot (2016-04-14, 7:47 PM): the Pull requests tab]

Then click the New Pull Request button:
[Screenshot (2016-04-14, 7:48 PM): the New Pull Request button]

The base should be the master branch, and the branch you merge in should be your own homework2 branch:
[Screenshot (2016-04-14, 7:49 PM): selecting master as the base and homework2 as the branch to merge]

After these steps, your pull request should be created. You can now submit its link on CSNS.

Note that only one person per team needs to submit on CSNS.
In the CSNS submission, please include your team members' names and the link to your pull request.

Next, answer the following questions in your pull request description.

Questions

What question(s) did you decide to work on as a team?

We want to estimate how much time California has left before it uses up all of its water, based on each city's water usage and on weather data.

What are your data source(s)?

http://water.usgs.gov/watuse/
http://openweathermap.org/

How long did it take you to download the data? Have you downloaded the complete data set?

It took us about 24 hours to download the entire data set from usgs.gov. This is the complete data set we obtained from the government.

As for OpenWeatherMap, it took us about 48 hours to download the weather data from 2015 to the present. We are still collecting the real-time values from the OpenWeatherMap API.

Although new data is still coming in from OpenWeatherMap, we believe this is the complete data set available to us from OpenWeatherMap.
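The ongoing real-time collection can be sketched with OpenWeatherMap's current-weather endpoint (`/data/2.5/weather`). This is a minimal sketch under stated assumptions: you have an API key from openweathermap.org and query by city name. `API_KEY`, the city list, and the polling interval are hypothetical placeholders, and the 2015-to-present historical download is not covered here.

```python
import json
import time
import urllib.parse
import urllib.request

# Current-weather endpoint of the OpenWeatherMap API (v2.5).
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"
API_KEY = "YOUR_API_KEY"  # hypothetical placeholder; substitute your own key


def weather_url(city: str, api_key: str = API_KEY) -> str:
    """Build the request URL for the current weather in `city` (metric units)."""
    query = urllib.parse.urlencode({"q": city, "appid": api_key, "units": "metric"})
    return f"{BASE_URL}?{query}"


def fetch_current_weather(city: str, api_key: str = API_KEY) -> dict:
    """Download one city's current weather and decode the JSON response."""
    with urllib.request.urlopen(weather_url(city, api_key), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll(cities, interval_seconds: int = 3600, rounds: int = 1) -> list:
    """Fetch every city once per round, sleeping between rounds."""
    results = []
    for i in range(rounds):
        for city in cities:
            results.append(fetch_current_weather(city))
        if i < rounds - 1:
            time.sleep(interval_seconds)
    return results
```

When choosing `interval_seconds`, stay within the request limits of your OpenWeatherMap plan.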

As for water usage, we would like to get real-time water usage data, but we have not yet found a source for it. Thus, our water usage data is not complete.

How large is your data (size-wise and by number of records)?

8 GB total.
About 2 million records of text data.
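Figures like these can be reproduced by walking the download directory and summing file sizes and CSV rows. This sketch assumes the data is stored as CSV files with one header row each; the directory layout and the function name `dataset_stats` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Tuple


def dataset_stats(data_dir: str, pattern: str = "*.csv", has_header: bool = True) -> Tuple[int, int]:
    """Return (total_bytes, total_records) for every file under data_dir matching pattern."""
    total_bytes = 0
    total_records = 0
    for path in Path(data_dir).rglob(pattern):
        total_bytes += path.stat().st_size
        with path.open(newline="", encoding="utf-8") as f:
            # Count non-blank rows; csv.reader also handles quoted fields with embedded newlines.
            rows = sum(1 for row in csv.reader(f) if row)
        # Do not count the header row as a record.
        total_records += max(rows - 1, 0) if has_header else rows
    return total_bytes, total_records
```

For example, `dataset_stats("data")` on the folder holding the downloads returns the byte count (divide by 10**9 for gigabytes) and the number of records.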

Did you face any dirty data issues? If so, how did you clean up your data?

Yes. The usgs.gov source sometimes does not return all of the data. We found some incomplete records and had to remove them manually.
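The manual cleanup above could also be automated. A minimal sketch, assuming the USGS downloads are CSV files with a header row and that a record counts as "incomplete" when a required column is blank or the row has more fields than the header. The function names and any column names you pass in are hypothetical.

```python
import csv
from typing import Iterable, Optional


def is_complete(row: dict, required: Iterable[str]) -> bool:
    """A row is complete if it has no surplus fields and every required column is non-blank."""
    # DictReader stores surplus fields under the key None (its default restkey)
    # and fills missing trailing fields with the value None (its default restval).
    if None in row:
        return False
    return all((row.get(col) or "").strip() for col in required)


def drop_incomplete_rows(src_path: str, dst_path: str,
                         required: Optional[Iterable[str]] = None) -> int:
    """Copy src_path to dst_path without incomplete rows; return how many were dropped."""
    removed = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fields = reader.fieldnames or []
        # By default every column is required.
        required = list(required) if required is not None else fields
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for row in reader:
            if is_complete(row, required):
                writer.writerow(row)
            else:
                removed += 1
    return removed
```

Returning the number of dropped rows makes it easy to report how much data the cleanup discarded.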

How do you store the data you downloaded?

We store all the data in CSV format.

[Screenshot (2016-04-24, 1:27 PM)]
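Because new weather values keep arriving, storing them in CSV usually means appending to existing files. A minimal sketch, assuming one CSV file per data stream; the column list `FIELDS` is a hypothetical schema, not the team's actual one.

```python
import csv
import os
from typing import Iterable, List

FIELDS: List[str] = ["city", "timestamp", "temp_c", "humidity"]  # hypothetical schema


def append_records(path: str, records: Iterable[dict], fieldnames: List[str] = FIELDS) -> None:
    """Append dict records to a CSV file, writing the header only when the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(records)
```

Writing the header only once keeps repeated appends readable by `csv.DictReader` and by spreadsheet tools.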
