
Correlation One data science project on NY City asthma rate disparities.


Hevander27/AsthmaAnalysis


Asthma Disparities in New York City

Created by: Hevander Da Costa

  1. Purpose
  2. Analysis
  3. Report and Presentation
  4. Data Table Schema

Purpose

This project was the capstone of the Correlation One Data Science for All program. Its purpose was to analyze New York City data on asthma contributors and Social Determinants of Health to uncover what potentially drives asthma disparity in the city. Although there is a wide variety of potential asthma contributors, this project focused on indoor and outdoor air quality because they are widely believed to be the main contributors.

Team45_projectBoard

Analysis

Regression Analysis

Chi-Squared Analysis

Report and Presentation

Full Report: What Contributes to Asthma Disparity in New York City

Power Point: Team 45 Presentation

Data Summary: Data | Source Links

Data Table Schema

Dataset: airq_34_all

Contains data about the average amounts of three air pollutants: fine particulate matter, nitrogen dioxide, and ozone. The data is categorized by UHF 34 neighborhood for the years 2009 to 2018.

There are 330 rows and 6 columns.

| Field | Type | Description |
| --- | --- | --- |
| name_of_column | The python consumable format | Brief description of the field. If the field follows a specific format (e.g. a specific date format) include that here too. |
| year | STRING | Year |
| Borough | STRING | NYC Borough Name associated with UHF 34 neighborhood |
| geo_place_name | STRING | UHF 34 Neighborhood name |
| mean_fpm | FLOAT | Average yearly amount of fine particulate matter |
| mean_no | FLOAT | Average yearly amount of nitrogen dioxide |
| Ozone mean (ppb) | FLOAT | Average yearly amount of ozone |
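
A schema like this one can double as a lightweight validation check when a file is loaded. The sketch below is a minimal example using only the Python standard library. The sample rows, the `SCHEMA` mapping, and the `validate_rows` helper are illustrative assumptions, not code or data from this repository.

```python
import csv
import io

# Documented columns for airq_34_all and the Python type each should parse as.
SCHEMA = {
    "year": str,
    "Borough": str,
    "geo_place_name": str,
    "mean_fpm": float,
    "mean_no": float,
    "Ozone mean (ppb)": float,
}

# Made-up sample rows for illustration only (not real measurements).
SAMPLE_CSV = """year,Borough,geo_place_name,mean_fpm,mean_no,Ozone mean (ppb)
2009,Bronx,Example Neighborhood A,10.5,25.1,28.0
2010,Queens,Example Neighborhood B,9.8,22.4,30.2
"""


def validate_rows(text, schema):
    """Parse CSV text and convert each field to its documented type.

    Raises ValueError if the header does not match the schema or a
    value cannot be converted.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != list(schema):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [
        {name: convert(row[name]) for name, convert in schema.items()}
        for row in reader
    ]


rows = validate_rows(SAMPLE_CSV, SCHEMA)
print(len(rows), "rows,", len(SCHEMA), "columns")
```

The same pattern should carry over to the other tables by swapping in their documented field names and types.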

Dataset: airq_42_all

Contains data about the average amounts of three air pollutants: fine particulate matter, nitrogen dioxide, and ozone. The data is categorized by UHF 42 neighborhood for the years 2009 to 2018.

There are 420 rows and 6 columns.

| Field | Type | Description |
| --- | --- | --- |
| year | STRING | Year |
| Borough | STRING | NYC Borough Name associated with UHF 42 neighborhood |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| mean_fpm | FLOAT | Average yearly amount of fine particulate matter |
| mean_no | FLOAT | Average yearly amount of nitrogen dioxide |
| Ozone mean (ppb) | FLOAT | Average yearly amount of ozone |

Dataset: benzene_42

Contains data about the average concentration of benzene in the air. The data is categorized by UHF 42 neighborhood for the years 2005 and 2011.

There are 84 rows and 3 columns.

| Field | Type | Description |
| --- | --- | --- |
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| mean_benzene | FLOAT | Average yearly concentration of benzene in the air |

Dataset: formaldehyde_42

Contains data about the average concentration of formaldehyde in the air. The data is categorized by UHF 42 neighborhood for the years 2005 and 2011.

There are 84 rows and 3 columns.

| Field | Type | Description |
| --- | --- | --- |
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| mean_formaldehyde | FLOAT | Average yearly concentration of formaldehyde in the air |

Dataset: boiler_emissions

Contains data about the average boiler emissions of nitrogen dioxide, sulfur dioxide, and fine particulate matter. The data is categorized by UHF 42 neighborhood for the years 2013 and 2015.

There are 84 rows and 6 columns.

| Field | Type | Description |
| --- | --- | --- |
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| nox_num_per_km2 | FLOAT | Number of NOx emissions per square kilometer |
| so2_num_per_km2 | FLOAT | Number of SO2 emissions per square kilometer |
| pm2_num_per_km2 | FLOAT | Number of PM2.5 emissions per square kilometer |

Dataset: sulfur_34

Contains data about the average amount of sulfur dioxide in the air. The data is categorized by UHF 34 neighborhood for the years 2008-2015.

There are 272 rows and 3 columns.

| Field | Type | Description |
| --- | --- | --- |
| year | STRING | Year |
| geo_place_name | STRING | UHF 34 Neighborhood name |
| mean_so2 | FLOAT | Average yearly amount of sulfur dioxide |

Dataset: sulfur_42

Contains data about the average amount of sulfur dioxide in the air. The data is categorized by UHF 42 neighborhood for the years 2008-2015.

There are 336 rows and 3 columns.

| Field | Type | Description |
| --- | --- | --- |
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| mean_so2 | FLOAT | Average yearly amount of sulfur dioxide |
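
The row counts reported for the two sulfur dioxide tables are consistent with one row per neighborhood per year over the eight years 2008 through 2015. A quick arithmetic check, assuming that layout (the `expected_rows` helper is hypothetical):

```python
def expected_rows(neighborhoods, first_year, last_year):
    """Rows expected with one row per neighborhood per year (inclusive range)."""
    return neighborhoods * (last_year - first_year + 1)


# Row counts as documented for each table.
documented = {"sulfur_34": 272, "sulfur_42": 336}

# Number of neighborhoods in each UHF scheme.
neighborhoods = {"sulfur_34": 34, "sulfur_42": 42}

for name, rows in documented.items():
    expected = expected_rows(neighborhoods[name], 2008, 2015)
    print(name, "documented:", rows, "expected:", expected, "match:", rows == expected)
```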

Dataset: o3_pm2_attributable_hospital_visits

Contains data about the number of emergency department visits and hospitalizations for asthma attributed to fine particulate matter and ozone. The data is categorized by UHF 42 neighborhood and by the following time periods: 2005-2007, 2009 - 2011, 2012-2014, 2015-2017.

There are 168 rows and 9 columns.

| Field | Type | Description |
| --- | --- | --- |
| Time Period | STRING | Year range (e.g. 2005-2007) |
| Start_Date | STRING | Start date for time period |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| child_o3_asthma_hospital_per_100k | FLOAT | Rate of hospitalizations for asthma in children attributed to ozone, per 100,000 |
| adult_o3_asthma_hospital_per_100k | FLOAT | Rate of hospitalizations for asthma in adults attributed to ozone, per 100,000 |
| adult_pm2_asthma_ed_visits_per_100k | FLOAT | Rate of emergency department visits for asthma in adults attributed to fine particulate matter, per 100,000 |
| child_pm2_asthma_ed_visits_per_100k | FLOAT | Rate of emergency department visits for asthma in children attributed to fine particulate matter, per 100,000 |
| adult_o3_asthma_ed_visits_per_100k | FLOAT | Rate of emergency department visits for asthma in adults attributed to ozone, per 100,000 |
| child_o3_asthma_ed_visits_per_100k | FLOAT | Rate of emergency department visits for asthma in children attributed to ozone, per 100,000 |
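
Because the Time Period field is stored as a string, splitting it into numeric start and end years makes it easier to align with the yearly tables. A minimal sketch that tolerates optional spaces around the hyphen (the `parse_period` helper is hypothetical, not part of the project code):

```python
import re

# Four-digit year, hyphen with optional surrounding spaces, four-digit year.
PERIOD_RE = re.compile(r"^\s*(\d{4})\s*-\s*(\d{4})\s*$")


def parse_period(text):
    """Split a 'YYYY-YYYY' label into (start_year, end_year) integers."""
    match = PERIOD_RE.match(text)
    if match is None:
        raise ValueError(f"not a year range: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError(f"end year before start year: {text!r}")
    return start, end


# The four time periods listed for this dataset.
for label in ["2005-2007", "2009 - 2011", "2012-2014", "2015-2017"]:
    print(label, "->", parse_period(label))
```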

Dataset: traffic_merged

Contains data about the number of miles driven by cars and trucks in UHF 42 neighborhoods. The data covers the years 2005 and 2016.

There are 84 rows and 6 columns.

| Field | Type | Description |
| --- | --- | --- |
| year | STRING | Year |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| cars_million_miles | FLOAT | Number of miles traveled by cars, in millions |
| trucks_million_miles | FLOAT | Number of miles traveled by trucks, in millions |
| total_million_miles | FLOAT | Sum of miles traveled by cars and trucks, in millions |
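
Since `total_million_miles` is described as the sum of the car and truck columns, that relationship can be checked row by row. A minimal sketch with made-up values (the rows and the `total_matches` helper are illustrative assumptions):

```python
import math

# Made-up rows for illustration; values are in millions of miles.
rows = [
    {"geo_place_name": "Example Neighborhood A", "cars_million_miles": 120.5,
     "trucks_million_miles": 8.25, "total_million_miles": 128.75},
    {"geo_place_name": "Example Neighborhood B", "cars_million_miles": 95.0,
     "trucks_million_miles": 6.5, "total_million_miles": 101.5},
]


def total_matches(row, tol=1e-6):
    """Return True when total equals cars + trucks within a small tolerance."""
    expected = row["cars_million_miles"] + row["trucks_million_miles"]
    return math.isclose(row["total_million_miles"], expected, abs_tol=tol)


# Collect the neighborhoods whose total does not equal the sum of its parts.
mismatches = [r["geo_place_name"] for r in rows if not total_matches(r)]
print("rows failing the sum check:", mismatches)
```

A tolerance is used rather than exact equality because the values are floats.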

Dataset: adult_smoking_joined_UHF34_CLEAN

Contains data on adult smoking and adult exposure to secondhand smoke. Additional metadata was dropped, and values were converted to numeric types so they can be used in analysis.

There are 120 rows and 10 columns.

| Field | Type | Description |
| --- | --- | --- |
| Year | Category | Year of data, in format yyyy |
| geo_type_name | Category | Granularity level of geography category |
| borough | Category | Borough for data |
| secondhand_smoke_home_adult_count | FLOAT | Count of adults reporting secondhand smoke at home |
| secondhand_smoke_home_adult_percent | FLOAT | Percent of adults reporting secondhand smoke at home |
| smoking_adults_count | FLOAT | Count of adults reporting smoking |
| smoking_adults_percent | FLOAT | Percent of adults reporting smoking |
| secondhand_smoke_work_adult_count | FLOAT | Count of adults reporting secondhand smoke at work |
| secondhand_smoke_work_adult_percent | FLOAT | Percent of adults reporting secondhand smoke at work |

Dataset: NYC_SDOH

The social determinants of health (SDOH) are the non-medical factors that influence health outcomes. They are the conditions in which people are born, grow, work, live, and age, and the wider set of forces and systems shaping the conditions of daily life. Variables in the SDOH database correspond to the 5 key domains identified by AHRQ: social context, economic context, education, physical infrastructure, and healthcare context. In addition to these domains, there is a category for Geography, which includes ID variables (County, FIPS code, ZCTA, State, and Year) as well as 14 county adjacency variables and urban/rural codes. Data was cleaned based on the values available for the five counties of New York City for 2009-2018. Counties: Bronx County - The Bronx, Kings County - Brooklyn, New York County - Manhattan, Queens County - Queens, Richmond County - Staten Island.

There are 51 rows and 231 columns.

| Field | Type | Description |
| --- | --- | --- |
| COUNTY | STRING | County name |
| FIPSCODE | INTEGER | State-county FIPS code, 5 digits (County only) |
| YEAR | DATE | The year the data is from |
| ACS_PCT_AGE_65UP | FLOAT | Percentage of population age 65 and over |
| ACS_PCT_AGE_0_17 | FLOAT | Percentage of population age 0-17 |
| ACS_PCT_AGE_15_17 | FLOAT | Percentage of population age 15-17 |
| ACS_PCT_AGE_0_4 | FLOAT | Percentage of population age 0-4 |

etc. Full descriptions of the remaining fields are in NYC_SDOH_dictionary.
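
Because the SDOH data is keyed by county while most other tables use borough names, joining them needs a county-to-borough lookup. A minimal sketch, assuming the FIPSCODE column holds the standard five-digit state-county codes (the `NYC_COUNTIES` table and `borough_for` helper are hypothetical names introduced here):

```python
# Standard five-digit FIPS codes for the five NYC counties,
# mapped to (county name, borough name).
NYC_COUNTIES = {
    36005: ("Bronx County", "The Bronx"),
    36047: ("Kings County", "Brooklyn"),
    36061: ("New York County", "Manhattan"),
    36081: ("Queens County", "Queens"),
    36085: ("Richmond County", "Staten Island"),
}


def borough_for(fips_code):
    """Return the borough name for a NYC county FIPS code (int or digit string)."""
    _county, borough = NYC_COUNTIES[int(fips_code)]
    return borough


print(borough_for(36047))
```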

Dataset: NYC_SDOH_dictionary

Contains information for researchers about the structure and contents of the database and descriptions of each data source used to populate the database.

There are 236 rows and 4 columns.

Dataset: Asthma_ED_Visits

Asthma emergency department (ED) visits for NYC residents. The data was cleaned to the “lowest common denominator”, that is, to the granularity of the least detailed dataset: the SDOH dataset, which is reported by county rather than by individual UHF 42 neighborhood. The total ED visits from all neighborhoods in each county were averaged and organized by year. The age-adjusted rate (adults only, per 10,000 residents) and the estimated annual rate (per 10,000 residents) from all counties were taken and organized by year. Asthma ED visit data was available only for the years 2009-2016, with no county-level data for 2015.

There are 106 rows and 6 columns.

| Field | Type | Description |
| --- | --- | --- |
| COUNTY | STRING | County name |
| YEAR | DATE | Year of data collection |
| INDICATOR_NAME | STRING | Population name by age, e.g. Adults (18+), Children (0-4), Children (5-17) |
| NUMBER | INTEGER | Average of the total ED visits from all neighborhoods in each county |
| AGE_ADJUSTED_RATE | FLOAT | Number of ED visits per county adjusted for population older than 18 years (adults), per 10,000 residents |
| ESTIMATED_ANNUAL_RATE | FLOAT | Number of ED visits per county adjusted for population older than 18 years, per 10,000 residents for that year |
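
The NUMBER field is described as an average of neighborhood totals, taken per county and year. A minimal sketch of that grouping with made-up neighborhood totals (the input rows and the `county_year_means` helper are illustrative assumptions, not the project's code):

```python
from collections import defaultdict
from statistics import mean

# Made-up neighborhood-level rows: (county, year, neighborhood, ED visits).
neighborhood_rows = [
    ("Kings", 2009, "Example A", 1200),
    ("Kings", 2009, "Example B", 800),
    ("Queens", 2009, "Example C", 500),
    ("Kings", 2010, "Example A", 1100),
]


def county_year_means(rows):
    """Average the neighborhood totals within each (county, year) group."""
    groups = defaultdict(list)
    for county, year, _neighborhood, visits in rows:
        groups[(county, year)].append(visits)
    return {key: mean(values) for key, values in groups.items()}


averages = county_year_means(neighborhood_rows)
for (county, year), value in sorted(averages.items()):
    print(county, year, value)
```

Note that an unweighted mean treats every neighborhood equally regardless of population, which is worth keeping in mind when comparing counties.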

Dataset: Indoor_air_quality_all

Dataset contains resident reported complaints on indoor air quality. Complaints are tabulated individually per report; report dates range from 2010 to 2021. The included columns: Borough, geo_place_name, zip code, longitude and latitude, are used to identify location.

There are 65050 rows and 7 columns.

| Field | Type | Description |
| --- | --- | --- |
| Year | STRING | Year |
| Borough | STRING | NYC Borough Name associated with UHF 42 neighborhood |
| geo_place_name | STRING | UHF 42 Neighborhood name |
| Zip code | STRING | Zip code of complaint address |
| Complaint type | STRING | Type of complaint reported by residents |
| Longitude | FLOAT | Longitude of complaint location |
| Latitude | FLOAT | Latitude of complaint location |

Dataset: Adultswith Asthma in the Past 12 Months.csv

Description: Prevalence of adults with asthma in the past 12 months. Listed by NYC UHF neighborhood and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed each column to its appropriate datatype.

There are 521 rows and 8 columns.

| Field | Type | Description |
| --- | --- | --- |
| year | category | The year of that data point |
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| adults_12mo_asthma_age_adjusted_percent | float | Percentage of adults with asthma adjusted for age |
| adults_12mo_asthma_number | float | Number of adults with asthma |
| adults_12mo_asthma_percent | float | Percentage of adults with asthma |
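
The cleaning steps described for this dataset (removing commas from numbers, renaming columns, lowercasing, and casting types) can be sketched as follows. This is a minimal illustration using only the standard library. The raw header names, the sample values, and the helper names are assumptions, and the original cleaning code may differ.

```python
import csv
import io

# Made-up raw export for illustration; header names and values are hypothetical.
RAW_CSV = '''Year,Geo Type Name,Borough,Number,Percent
2017,Borough,Bronx,"52,000",4.9
2017,Citywide,New York City,"210,000",3.2
'''


def clean_name(name):
    """Lowercase a column name and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


def to_float(value):
    """Remove thousands separators and convert to float."""
    return float(value.replace(",", ""))


# Cleaned column names that should be cast to numbers.
NUMERIC = {"number", "percent"}

cleaned = []
for raw in csv.DictReader(io.StringIO(RAW_CSV)):
    row = {}
    for key, value in raw.items():
        name = clean_name(key)
        row[name] = to_float(value) if name in NUMERIC else value.strip().lower()
    cleaned.append(row)

print(cleaned[0])
```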

Dataset: Public School Children (5-14 YrsOld) with Asthma.csv

Description: Prevalence of asthma among public school children ages 5-14. Listed by NYC UHF neighborhood and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed each column to its appropriate datatype.

There are 193 rows and 7 columns.

| Field | Type | Description |
| --- | --- | --- |
| year | category | The year of that data point |
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| children_5_14_estimated_annual_rate_per_1000 | float | Rate of children age 5-14 with asthma (per 1,000) |
| children_5_14_number | float | Number of children age 5-14 with asthma |

Dataset: Asthma Emergency Department Visits(Adults).csv

Description: Asthma-related emergency department visits for adults. Listed by NYC UHF neighborhood and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed each column to its appropriate datatype.

There are 530 rows and 8 columns.

| Field | Type | Description |
| --- | --- | --- |
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| ed_annual_adult_estimated_age_adjusted_rate_per10k | float | Age-adjusted rate of adults (per 10,000) that visited the emergency department for asthma |
| ed_annual_adult_rate_per10k | float | Rate of adults (per 10,000) that visited the emergency department for asthma |
| ed_annual_adult_number | float | Number of adults that visited the emergency department for asthma |
| year | category | The year of that data point |

Dataset: Asthma Emergency Department Visits(Children 5 to 17 YrsOld).csv

Description: Asthma-related emergency department visits for children 5-17 years old. Listed by NYC UHF neighborhood and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed each column to its appropriate datatype.

There are 577 rows and 7 columns.

| Field | Type | Description |
| --- | --- | --- |
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| ed_annual_5_17_rate_per10k | float | Rate of children 5-17 years old (per 10,000) that visited the emergency department for asthma |
| ed_5_17_number | float | Number of children 5-17 years old that visited the emergency department for asthma |
| year | category | The year of that data point |

Dataset: Asthma Hospitalizations(Adults).csv

Description: Number of adults hospitalized for asthma. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.

There are 530 rows and 8 columns.

| Field | Type | Description |
| --- | --- | --- |
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| asthma_hosp_adult_estimated__age_adjusted_rate_per10k | float | Age-adjusted rate of adults (per 10,000) that were hospitalized for asthma |
| asthma_hosp_adult_estimated__rate_per10k | float | Rate of adults (per 10,000) that were hospitalized for asthma |
| asthma_hosp_adult_number | float | Number of adults that were hospitalized for asthma |
| year | category | The year of that data point |

Dataset: AsthmaHospitalizations(Children5to17YrsOld).csv

Description: Number of children 5-17 years old hospitalized for asthma. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.

There are 577 rows and 7 columns.

| Field | Type | Description |
| --- | --- | --- |
| geo_type_name | category | The type of geography for that data point (ex. Citywide, Neighborhood, Borough) |
| borough | category | Name of borough in NYC |
| geography | category | The most specific geographical location |
| geography_id | category | Unique geographical ID for every geographical location |
| asthma_hosp_5_17_estimated_annual_rate_per_10000 | float | Rate of children 5-17 years old (per 10,000) that were hospitalized for asthma |
| asthma_hosp_5_17_number | float | Number of children 5-17 years old hospitalized for asthma |
| year | category | The year of that data point |

Dataset: MedianHouseholdIncomeByRacebyTract,2012-2016

Contains data about the median household income by race at the state and regional levels. The data is organized by year from 2012 to 2016. Because the dataset has over 30 fields, I will be organizing and cleaning it up; the fields listed below are those that seem most relevant to our research.

There are 72730 rows and 33 columns.

| Field | Type | Description |
| --- | --- | --- |
| STATE_NAME | STRING | Ex: Maryland or New York |
| ST_ABBREV | STRING | Ex: MD or NY |
| Median Household Income in Past 12 Months, Some Other Race Householder - Estimate | FLOAT | Median income over a year for a specific group; these are estimates from public census data |
| Median Household Income in Past 12 Months - Estimate | FLOAT | Median income over a year for a specific group; these are estimates from public census data |
| Median Household Income in Past 12 Months, 2 or More Races Householder - Estimate | FLOAT | Median income over a year for a specific group; these are estimates from public census data |
| Median Household Income in Past 12 Months, American Indian and Alaska Native Householder - Estimate | FLOAT | Median income over a year for a specific group; these are estimates from public census data |
| Median Household Income in Past 12 Months, Asian Householder - Estimate | FLOAT | Median income over a year for a specific group; these are estimates from public census data |
| Median Household Income in Past 12 Months, Black or African American Householder - Estimate | FLOAT | Median income over a year for a specific group; these are estimates from public census data |
| Median Household Income in Past 12 Months, Hispanic or Latino Householder - Estimate | FLOAT | Median income over a year for a specific group; these are estimates from public census data |
| Median Household Income in Past 12 Months, Native Hawaiian and Other Pacific Islander Householder - Estimate | FLOAT | Median income over a year for a specific group; these are estimates from public census data |
| Median Household Income in Past 12 Months, Non-Hispanic White Householder - Estimate | FLOAT | Median income over a year for a specific group; these are estimates from public census data |
