Created by: Hevander Da Costa
This project was the capstone of the Correlation One Data Science for All program. Its purpose was to analyze New York City data on asthma contributors and Social Determinants of Health to uncover what potentially drives asthma disparity in the city. Although there is a wide variety of potential asthma contributors, this project focused on indoor and outdoor air quality because they are widely believed to be the main contributors.
Full Report: What Contributes to Asthma Disparity in New York City
PowerPoint: Team 45 Presentation
Data Summary: Data | Source Links
Dataset: airq_34_all
Contains data about the average amounts of three toxins: fine particulate matter, nitrogen dioxide, and ozone. The data is categorized by UHF34 neighborhood for the years 2009 to 2018.
There are 330 rows and 6 columns.
| Field | Type | Description |
|---|---|---|
| year | STRING | Year |
| Borough | STRING | NYC Borough Name associated with UHF 34 neighborhood |
| geo_place_name | STRING | UHF 34 Neighborhood name |
| mean_fpm | Float | Average yearly amount of fine particulate matter |
| mean_no | Float | Average yearly amount of nitrogen dioxide |
| Ozone mean (ppb) | Float | Average yearly amount of ozone |
Dataset: airq_42_all
Contains data about the average amounts of three toxins: fine particulate matter, nitrogen dioxide, and ozone. The data is categorized by UHF42 neighborhood for the years 2009 to 2018.
There are 420 rows and 6 columns.
| Field | Type | Description |
|---|---|---|
| year | STRING | Year |
| Borough | STRING | NYC Borough Name associated with UHF 42 neighborhood |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| mean_fpm | Float | Average yearly amount of fine particulate matter |
| mean_no | Float | Average yearly amount of nitrogen dioxide |
| Ozone mean (ppb) | Float | Average yearly amount of ozone |
Dataset: benzene_42
Contains data about the average concentration of benzene in the air. The data is categorized by UHF42 neighborhood for the years 2005 and 2011.
There are 84 rows and 3 columns.
| Field | Type | Description |
|---|---|---|
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| mean_benzene | Float | Average yearly concentration of benzene in the air |
Dataset: formaldehyde_42
Contains data about the average concentration of formaldehyde in the air. The data is categorized by UHF42 neighborhood for the years 2005 and 2011.
There are 84 rows and 3 columns.
| Field | Type | Description |
|---|---|---|
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| mean_formaldehyde | Float | Average yearly concentration of formaldehyde in the air |
Dataset: boiler_emissions
Contains data about the average boiler emissions of three toxins: nitrogen dioxide, sulfur dioxide, and fine particulate matter. The data is categorized by UHF42 neighborhood for the years 2013 and 2015.
There are 84 rows and 6 columns.
| Field | Type | Description |
|---|---|---|
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| nox_num_per_km2 | Float | Boiler emissions of nitrogen dioxide, number per square kilometer |
| so2_num_per_km2 | Float | Boiler emissions of sulfur dioxide, number per square kilometer |
| pm2_num_per_km2 | Float | Boiler emissions of fine particulate matter, number per square kilometer |
Dataset: sulfur_34
Contains data about the average amount of sulfur dioxide in the air. The data is categorized by UHF34 neighborhood for the years 2008 to 2015.
There are 272 rows and 3 columns.
| Field | Type | Description |
|---|---|---|
| year | STRING | Year |
| geo_place_name | STRING | UHF 34 Neighborhood name |
| mean_so2 | Float | Average yearly amount of sulfur dioxide |
Dataset: sulfur_42
Contains data about the average amount of sulfur dioxide in the air. The data is categorized by UHF42 neighborhood for the years 2008 to 2015.
There are 336 rows and 3 columns.
| Field | Type | Description |
|---|---|---|
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| mean_so2 | Float | Average yearly amount of sulfur dioxide |
Dataset: o3_pm2_attributable_hospital_visits
Contains data about the number of emergency department visits and hospitalizations for asthma attributed to fine particulate matter and ozone. The data is categorized by UHF 42 neighborhood and grouped into the following time periods: 2005-2007, 2009-2011, 2012-2014, and 2015-2017.
There are 168 rows and 9 columns.
| Field | Type | Description |
|---|---|---|
| Time Period | STRING | Three-year range (e.g. 2005-2007) |
| Start_Date | STRING | Start date for time period |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| child_o3_asthma_hospital_per_100k | Float | Rate of hospitalizations for asthma in children attributed to ozone, per 100,000 |
| adult_o3_asthma_hospital_per_100k | Float | Rate of hospitalizations for asthma in adults attributed to ozone, per 100,000 |
| adult_pm2_asthma_ed_visits_per_100k | Float | Rate of emergency department visits for asthma in adults attributed to fine particulate matter, per 100,000 |
| child_pm2_asthma_ed_visits_per_100k | Float | Rate of emergency department visits for asthma in children attributed to fine particulate matter, per 100,000 |
| adult_o3_asthma_ed_visits_per_100k | Float | Rate of emergency department visits for asthma in adults attributed to ozone, per 100,000 |
| child_o3_asthma_ed_visits_per_100k | Float | Rate of emergency department visits for asthma in children attributed to ozone, per 100,000 |
Dataset: traffic_merged
Contains data about the number of miles driven by cars and trucks in UHF42 neighborhoods. The data covers the years 2005 and 2016.
There are 84 rows and 6 columns.
| Field | Type | Description |
|---|---|---|
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| cars_million_miles | Float | Number of miles traveled by cars in millions |
| trucks_million_miles | Float | Number of miles traveled by trucks in millions |
| total_million_miles | Float | Sum of miles traveled by cars and trucks in millions |
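Most of the neighborhood-level tables above appear to hold one row per UHF neighborhood per year (or time period), so the stated row counts can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the `expected_rows` helper is hypothetical, and the one-row-per-neighborhood-per-period layout is an assumption, not something this document states.

```python
def expected_rows(n_neighborhoods: int, n_periods: int) -> int:
    """Rows expected if each neighborhood has exactly one row per period."""
    return n_neighborhoods * n_periods

# (dataset, neighborhoods, periods, documented row count)
# Period counts are read from the descriptions above; sulfur_42 is
# assumed to span 2008 through 2015 (8 years), like sulfur_34.
documented = [
    ("airq_34_all", 34, 10, 330),
    ("airq_42_all", 42, 10, 420),
    ("benzene_42", 42, 2, 84),
    ("formaldehyde_42", 42, 2, 84),
    ("boiler_emissions", 42, 2, 84),
    ("sulfur_34", 34, 8, 272),
    ("sulfur_42", 42, 8, 336),
    ("o3_pm2_attributable_hospital_visits", 42, 4, 168),
    ("traffic_merged", 42, 2, 84),
]

# Collect every dataset whose documented count differs from the expectation.
mismatches = [
    (name, expected_rows(n, p), rows)
    for name, n, p, rows in documented
    if expected_rows(n, p) != rows
]
print(mismatches)
```

Under these assumptions only `airq_34_all` differs (340 expected versus 330 documented), which may simply reflect neighborhood-years with no measurements.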
Dataset: adult_smoking_joined_UHF34_CLEAN
Contains data on adult smoking and adult exposure to secondhand smoke. Additional metadata was dropped, and values were converted to numeric types so they can be used in analysis.
There are 120 rows and 10 columns.
| Field | Type | Description |
|---|---|---|
| Year | Category | Year of data, in format yyyy |
| geo_type_name | Category | Granularity level of geography category |
| borough | Category | Borough for data |
| secondhand_smoke_home_adult_count | FLOAT | Count of adults reporting secondhand smoke at home |
| secondhand_smoke_home_adult_percent | FLOAT | Percent of adults reporting secondhand smoke at home |
| smoking_adults_count | FLOAT | Count of adults reporting smoking |
| smoking_adults_percent | FLOAT | Percent of adults reporting smoking |
| secondhand_smoke_work_adult_count | FLOAT | Count of adults reporting secondhand smoke at work |
| secondhand_smoke_work_adult_percent | FLOAT | Percent of adults reporting secondhand smoke at work |
Dataset: NYC_SDOH
The social determinants of health (SDOH) are the non-medical factors that influence health outcomes. They are the conditions in which people are born, grow, work, live, and age, and the wider set of forces and systems shaping the conditions of daily life. Variables in the SDOH database correspond to the 5 key domains identified by AHRQ: social context, economic context, education, physical infrastructure, and healthcare context. In addition to these domains, there is a category for Geography, which includes ID variables (County, FIPS code, ZCTA, State, and Year) as well as 14 county adjacency variables and urban/rural codes. Data was cleaned based on the values available for the five counties of New York City for 2009-2018. Counties: Bronx County - The Bronx, Kings County - Brooklyn, New York County - Manhattan, Queens County - Queens, Richmond County - Staten Island.
There are 51 rows and 231 columns.
| Field | Type | Description |
|---|---|---|
| COUNTY | STRING | County name |
| FIPSCODE | INTEGER | State-county FIPS code, 5 digits (County only) |
| YEAR | DATE | The year the data is from |
| ACS_PCT_AGE_65UP | FLOAT | Percentage of population age 65 and over |
| ACS_PCT_AGE_0_17 | FLOAT | Percentage of population age 0-17 |
| ACS_PCT_AGE_15_17 | FLOAT | Percentage of population age 15-17 |
| ACS_PCT_AGE_0_4 | FLOAT | Percentage of population age 0-4 |
| etc. | | Full description in NYC_SDOH_dictionary |
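Because the SDOH table is keyed by county while several other tables carry a borough column, a small lookup can help when lining them up. The following is a minimal sketch: `COUNTY_TO_BOROUGH` and `borough_for` are hypothetical names, the keys are the standard 5-digit state-county FIPS codes for the five NYC counties, and the exact borough spellings used in the other datasets are an assumption.

```python
# FIPS code -> (county name, borough name) for the five NYC counties.
# Borough spellings here may need adjusting to match the other tables.
COUNTY_TO_BOROUGH = {
    36005: ("Bronx County", "The Bronx"),
    36047: ("Kings County", "Brooklyn"),
    36061: ("New York County", "Manhattan"),
    36081: ("Queens County", "Queens"),
    36085: ("Richmond County", "Staten Island"),
}

def borough_for(fips_code: int) -> str:
    """Return the borough name for a NYC state-county FIPS code."""
    return COUNTY_TO_BOROUGH[fips_code][1]

print(borough_for(36047))
```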
Dataset: NYC_SDOH_dictionary
Contains information for researchers about the structure and contents of the database and descriptions of each data source used to populate the database.
There are 236 rows and 4 columns.
Dataset: Asthma_ED_Visits
Asthma emergency room visits for NYC residents. Data was cleaned to the “lowest common denominator,” that is, to the level of the least detailed dataset: the SDOH dataset, which contains more general data gathered by county rather than by individual UHF 42 neighborhood. The average of the total ED visits from all neighborhoods in each county was taken and organized by year. The age-adjusted rate (for adults only, per 10,000 residents) and the estimated annual rate (per 10,000 residents) from all counties were taken and organized by year. Asthma ED visit data was only available for the years 2009-2016, with no county-level data available for 2015.
There are 106 rows and 6 columns.
| Field | Type | Description |
|---|---|---|
| COUNTY | STRING | County name |
| YEAR | DATE | Year of data collection |
| INDICATOR_NAME | STRING | Population name by age. i.e. Adults (18+), Children (0-4), Children (5-17) |
| NUMBER | INTEGER | Average of the total ED visits from all neighborhoods in each county. |
| AGE_ADJUSTED_RATE | FLOAT | Number of ED visits per county adjusted for population older than 18 years (adults), per 10,000 residents. |
| ESTIMATED_ANNUAL_RATE | FLOAT | Number of ED visits per county adjusted for population older than 18 years, per 10,000 residents for that year. |
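The county-level aggregation described for this dataset (averaging neighborhood ED visit totals within each county, by year) can be sketched as follows. This is an illustration under assumptions, not the team's actual code: the function name `county_year_average`, the tuple layout, and the sample numbers are all made up.

```python
from collections import defaultdict
from statistics import mean

def county_year_average(rows):
    """Average neighborhood visit totals per (county, year).

    rows: iterable of (county, year, neighborhood_visits) tuples.
    """
    grouped = defaultdict(list)
    for county, year, visits in rows:
        grouped[(county, year)].append(visits)
    return {key: mean(values) for key, values in grouped.items()}

# Made-up numbers purely for illustration; not taken from the dataset.
sample = [
    ("Kings", 2009, 100),
    ("Kings", 2009, 300),
    ("Queens", 2009, 50),
]
averages = county_year_average(sample)
print(averages)
```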
Dataset: Indoor_air_quality_all
Contains resident-reported complaints on indoor air quality. Each row is an individual complaint report; report dates range from 2010 to 2021. The columns Borough, geo_place_name, Zip code, Longitude, and Latitude identify the location.
There are 65050 rows and 7 columns.
| Field | Type | Description |
|---|---|---|
| Year | STRING | Year |
| Borough | STRING | NYC Borough Name associated with UHF 42 neighborhood |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| Zip code | STRING | Zip code of complaint address |
| Complaint type | STRING | Type of complaint reported by residents |
| Longitude | FLOAT | Longitude of complaint location |
| Latitude | FLOAT | Latitude of complaint location |
Dataset: Adultswith Asthma in the Past 12 Months.csv
Description: Prevalence of adults with asthma in the past 12 months. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.
There are 521 rows and 8 columns.
| Field | Type | Description |
|---|---|---|
| year | category | The year of that data point |
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| adults_12mo_asthma_age_adjusted_percent | float | Percentage of adults with asthma adjusted for age |
| adults_12mo_asthma_number | float | Number of adults with asthma |
| adults_12mo_asthma_percent | float | Percentage of adults with asthma |
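The cleaning steps listed in the description (removing commas, renaming columns, lowercasing, and converting datatypes) might look roughly like the sketch below. It is an assumption-laden illustration, not the original cleaning script: the helpers `clean_header` and `clean_number` and the raw column header are hypothetical.

```python
def clean_header(name: str) -> str:
    """Lowercase a column name and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")

def clean_number(value: str) -> float:
    """Drop thousands separators (commas) and convert to float."""
    return float(value.replace(",", ""))

# Hypothetical raw row; the real source headers and values may differ.
raw_row = {
    "Year": "2016",
    "Borough": "Queens",
    "Adults 12mo Asthma Number": "1,234",
}

# Rename columns, then lowercase text values and convert numeric ones.
cleaned = {clean_header(key): value for key, value in raw_row.items()}
cleaned["borough"] = cleaned["borough"].lower()
cleaned["adults_12mo_asthma_number"] = clean_number(
    cleaned["adults_12mo_asthma_number"]
)
print(cleaned)
```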
Dataset: Public School Children (5-14 YrsOld) with Asthma.csv
Description: Prevalence of public school children ages 5-14 with asthma. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.
There are 193 rows and 7 columns.
| Field | Type | Description |
|---|---|---|
| year | category | The year of that data point |
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| children_5_14_estimated_annual_rate_per_1000 | float | Rate of children age 5-14 with asthma (per 1,000) |
| children_5_14_number | float | Number of children age 5-14 with asthma |
Dataset: Asthma Emergency Department Visits(Adults).csv
Description: Asthma-related emergency department visits for adults. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.
There are 530 rows and 8 columns.
| Field | Type | Description |
|---|---|---|
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| ed_annual_adult_estimated_age_adjusted_rate_per10k | float | Age-adjusted rate of adults (per 10,000) that visited the emergency department for asthma |
| ed_annual_adult_rate_per10k | float | Rate of adults (per 10,000) that visited the emergency department for asthma |
| ed_annual_adult_number | float | Number of adults that visited the emergency department for asthma |
| year | category | The year of that data point |
Dataset: Asthma Emergency Department Visits(Children 5 to 17 YrsOld).csv
Description: Asthma-related emergency department visits for children 5-17 years old. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.
There are 577 rows and 7 columns.
| Field | Type | Description |
|---|---|---|
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| ed_annual_5_17_rate_per10k | float | Rate of children 5-17 years old (per 10,000) that visited the emergency department for asthma |
| ed_5_17_number | float | Number of children 5-17 years old that visited the emergency department for asthma |
| year | category | The year of that data point |
Dataset: Asthma Hospitalizations(Adults).csv
Description: Number of adults hospitalized for asthma. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.
There are 530 rows and 8 columns.
| Field | Type | Description |
|---|---|---|
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| asthma_hosp_adult_estimated__age_adjusted_rate_per10k | float | Age-adjusted rate of adults (per 10,000) that were hospitalized for asthma |
| asthma_hosp_adult_estimated__rate_per10k | float | Rate of adults (per 10,000) that were hospitalized for asthma |
| asthma_hosp_adult_number | float | Number of adults that were hospitalized for asthma |
| year | category | The year of that data point |
Dataset: AsthmaHospitalizations(Children5to17YrsOld).csv
Description: Number of children 5-17 years old hospitalized for asthma. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.
There are 577 rows and 7 columns.
| Field | Type | Description |
|---|---|---|
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| asthma_hosp_5_17_estimated_annual_rate_per_10000 | float | Rate of children 5-17 years old (per 10,000) that were hospitalized for asthma |
| asthma_hosp_5_17_number | float | Number of children 5-17 years old hospitalized for asthma |
| year | category | The year of that data point |
Dataset: MedianHouseholdIncomeByRacebyTract,2012-2016
Contains data about median household income by race at the state and regional levels, organized by year from 2012 to 2016. The dataset has over 30 fields, so only the fields that seem most relevant to our research are listed below.
There are 72730 rows and 33 columns.
| Field | Type | Description |
|---|---|---|
| STATE_NAME | STRING | Ex: Maryland or New York |
| ST_ABBREV | STRING | EX: MD or NY |
| Median Household Income in Past 12 Months, Some Other Race Householder - Estimate | FLOAT | Calculates the median income over a year for a specific group – these are estimates from public census data |
| Median Household Income in Past 12 Months - Estimate | FLOAT | Calculates the median income over a year for a specific group – these are estimates from public census data |
| Median Household Income in Past 12 Months, 2 or More Races Householder - Estimate | FLOAT | Calculates the median income over a year for a specific group – these are estimates from public census data |
| Median Household Income in Past 12 Months, American Indian and Alaska Native Householder - Estimate | FLOAT | Calculates the median income over a year for a specific group – these are estimates from public census data |
| Median Household Income in Past 12 Months, Asian Householder - Estimate | FLOAT | Calculates the median income over a year for a specific group – these are estimates from public census data |
| Median Household Income in Past 12 Months, Black or African American Householder - Estimate | FLOAT | Calculates the median income over a year for a specific group – these are estimates from public census data |
| Median Household Income in Past 12 Months, Hispanic or Latino Householder - Estimate | FLOAT | Calculates the median income over a year for a specific group – these are estimates from public census data |
| Median Household Income in Past 12 Months, Native Hawaiian and Other Pacific Islander Householder - Estimate | FLOAT | Calculates the median income over a year for a specific group – these are estimates from public census data |
| Median Household Income in Past 12 Months, Non-Hispanic White Householder – Estimate | FLOAT | Calculates the median income over a year for a specific group – these are estimates from public census data |
