query sumlevels, tables, geo by variable #4
Description
This is more of a feature request than a bug report.
I have been hugely impressed with your API. I think it could be very useful for social scientists like myself, who often use census and economic data and are frustrated with the census site. I especially like that you can specify a variable and a geographic sumlevel, and the results are returned at the desired unit of analysis. This is a big step forward!
I have created a short guide for constructing queries in R to make your API accessible to scholars:
https://gist.github.com/lecy/0aa782a873cd174573f32d243233ca5b
The hardest part of using this data is that the validity of arguments varies by the underlying data source. In one instance I can use "show=geo&sumlevel=msa" and in another I cannot, which is confusing for users and makes it hard to figure out whether data are available at the desired level.
To address this problem I have included a couple of helper functions in the guide to allow the user to peek at attributes, and to print the set of valid cases based upon the logic view at "http://api.datausa.io/api/logic/?".
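A minimal sketch of how a logic-view URL can be assembled from its arguments. The function name buildLogicUrl is hypothetical and is not one of the helpers in the guide; it relies only on the base URL and the show and sumlevel parameters quoted in this issue:

```r
# Hypothetical helper: paste the logic-view base URL together with the
# 'show' and 'sumlevel' arguments. No request is sent; it only builds the string.
buildLogicUrl <- function( show, sumlevel = "all" )
{
  base <- "http://api.datausa.io/api/logic/?"
  paste0( base, "show=", show, "&sumlevel=", sumlevel )
}

buildLogicUrl( "geo", "all" )
```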
For example, I have formatted the query "http://api.datausa.io/api/logic/?show=geo&sumlevel=all" to print a bunch of tables that look like this:
getUsage( "geo" )
# TABLE: ygi_num_emp
# DATA SOURCE: ACS 3-year Estimate
# DEPARTMENT: Census Bureau
# LINK: http://www.census.gov/programs-surveys/acs/
#
# SUPPORTED SUMLEVELS:
#
# acs_ind: 0, 1, 2, all
# geo: nation, state, msa, all
#
# ...67 more tables printed
The problem is that this use case searches by data attribute (show=geo) and references a data table (ygi_num_emp), but in the typical case the user would want to search for valid sumlevels associated with a specific variable. For example, can I find the number of philosophy majors by county?
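Once the table-level listing is parsed, it can at least be filtered by sumlevel, although that still answers the question per table rather than per variable. A minimal sketch, assuming the logic-view output has already been parsed into an R list with one element per table. The list layout, the table example_table, and the function tablesSupporting are hypothetical illustrations, not part of the API or the guide:

```r
# Hand-written stand-in for parsed logic-view output (layout is an assumption).
tables <- list(
  list( table = "ygi_num_emp",
        sumlevels = list( acs_ind = c( "0", "1", "2", "all" ),
                          geo     = c( "nation", "state", "msa", "all" ) ) ),
  # Hypothetical second table, included only to show a different result.
  list( table = "example_table",
        sumlevels = list( geo = c( "nation", "state", "county", "all" ) ) )
)

# Return the names of tables whose attribute 'attr' supports sumlevel 'level'.
# Tables that lack the attribute entirely are treated as not supporting it.
tablesSupporting <- function( tables, attr, level )
{
  keep <- vapply( tables,
                  function( x ) level %in% x$sumlevels[[ attr ]],
                  logical( 1 ) )
  vapply( tables[ keep ], function( x ) x$table, character( 1 ) )
}

tablesSupporting( tables, "geo", "msa" )      # both tables
tablesSupporting( tables, "geo", "county" )   # only example_table
tablesSupporting( tables, "acs_ind", "2" )    # only ygi_num_emp
```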
Ideally there would be a way to specify an API query that submits a variable name and returns all valid attribute-sumlevel pairs (for example, show=geo with sumlevel=nation, state, or msa).
It would be possible to create some functions to do this if the following queries are available:
- Return all data tables and their valid sumlevels
- Return all variables associated with a data table (even better if they include definitions)
- Return all valid sumlevels associated with an attribute
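If the first two of these queries existed, the variable-level lookup could be assembled from them on the client side. A minimal sketch, using hand-written lists as stand-ins for the responses. All table names, variable names, sumlevel values, and the function getValidSumlevels are hypothetical placeholders:

```r
# Stand-in for "all data tables and their valid sumlevels" (illustrative values).
table.sumlevels <- list(
  table_a = list( geo = c( "nation", "state", "msa" ) ),
  table_b = list( geo = c( "nation", "state", "county" ) )
)

# Stand-in for "all variables associated with a data table" (illustrative values).
table.variables <- list(
  table_a = c( "var_x", "var_y" ),
  table_b = c( "var_y", "var_z" )
)

# For one variable, return the attribute-sumlevel pairs of every table
# that contains it, as a list named by table.
getValidSumlevels <- function( variable, table.variables, table.sumlevels )
{
  has.var <- vapply( table.variables,
                     function( v ) variable %in% v,
                     logical( 1 ) )
  table.sumlevels[ names( table.variables )[ has.var ] ]
}

# Which sumlevels are valid for var_z, and is county among them?
res <- getValidSumlevels( "var_z", table.variables, table.sumlevels )
"county" %in% res$table_b$geo
```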
Please let me know if these functions currently exist, or can possibly be added.