GitHub - Hevander27/AsthmaAnalysis: Correlation One data science project on NY City asthma rate disparities.

Asthma Disparities in New York City

Purpose
Analysis
Report and Presentation
Data Table Schema

Purpose

This project was the capstone of the Correlation One Data Science for All program. Its purpose was to analyze New York City data on asthma contributors and Social Determinants of Health to uncover what potentially drives asthma disparities in the city. Although there is a wide variety of potential asthma contributors, this project focused on indoor and outdoor air quality because they are widely believed to be the main contributors.

Analysis

Regression Analysis

Chi-Squared Analysis

Report and Presentation

Full Report: What Contributes to Asthma Disparity in New York City

PowerPoint: Team 45 Presentation

Data Summary: Data | Source Links

Data Table Schema

Dataset: airq_34_all

Contains data about the average amounts of toxins: fine particulate matter, nitrogen dioxide, and ozone. The data is categorized by UHF34 neighborhood for years 2009 to 2018.

There are 330 rows and 6 columns.

Field	Type	Description
year	STRING	Year
Borough	STRING	NYC Borough Name associated with UHF 34 neighborhood
geo_place_name	STRING	UHF 34 Neighborhood name
mean_fpm	Float	Average yearly amount of fine particulate matter
mean_no	Float	Average yearly amount of nitrogen dioxide
Ozone mean (ppb)	Float	Average yearly amount of ozone
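
As a hedged illustration of how this schema might be applied when loading the table with pandas, the sketch below reads a small inline sample and casts each column to the type listed above. The CSV text, its values, and the helper name `load_airq` are assumptions made for illustration; they are not taken from the repository.

```python
import io

import pandas as pd

# Column types taken from the airq_34_all schema table above.
AIRQ_DTYPES = {
    "year": "string",
    "Borough": "string",
    "geo_place_name": "string",
    "mean_fpm": "float64",
    "mean_no": "float64",
    "Ozone mean (ppb)": "float64",
}


def load_airq(source):
    """Read an air-quality CSV and enforce the documented column types."""
    df = pd.read_csv(source)
    missing = set(AIRQ_DTYPES) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df.astype(AIRQ_DTYPES)


# Made-up placeholder rows, for illustration only (not real measurements).
sample_csv = io.StringIO(
    "year,Borough,geo_place_name,mean_fpm,mean_no,Ozone mean (ppb)\n"
    "2009,Bronx,Example Neighborhood A,1.0,2.0,3.0\n"
    "2010,Queens,Example Neighborhood B,1.5,2.5,3.5\n"
)
airq = load_airq(sample_csv)
print(airq.dtypes)
```

The same pattern should carry over to the other air-quality tables by swapping in their column lists.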

Dataset: airq_42_all

Contains data about the average amounts of toxins: fine particulate matter, nitrogen dioxide, and ozone. The data is categorized by UHF42 neighborhood for years 2009 to 2018.

There are 420 rows and 6 columns.

Field	Type	Description
year	STRING	Year
Borough	STRING	NYC Borough Name associated with UHF 42 neighborhood
geo_place_name	STRING	UHF 42 Neighborhood name
mean_fpm	Float	Average yearly amount of fine particulate matter
mean_no	Float	Average yearly amount of nitrogen dioxide
Ozone mean (ppb)	Float	Average yearly amount of ozone

Dataset: benzene_42

Contains data about the average concentration of benzene in the air. The data is categorized by UHF42 neighborhood for years 2005 and 2011.

There are 84 rows and 3 columns.

Field	Type	Description
year	STRING	Year
geo_place_name	STRING	UHF 42 Neighborhood name
mean_benzene	Float	Average yearly concentration of benzene in the air

Dataset: formaldehyde_42

Contains data about the average concentration of formaldehyde in the air. The data is categorized by UHF42 neighborhood for years 2005 and 2011.

There are 84 rows and 3 columns.

Field	Type	Description
year	STRING	Year
geo_place_name	STRING	UHF 42 Neighborhood name
mean_formaldehyde	Float	Average yearly concentration of formaldehyde in the air

Dataset: boiler_emissions

Contains data about the average boiler emissions of the toxins nitrogen dioxide, sulfur dioxide, and fine particulate matter. The data is categorized by UHF42 neighborhood for years 2013 and 2015.

There are 84 rows and 6 columns.

Field	Type	Description
year	STRING	Year
geo_place_name	STRING	UHF 42 Neighborhood name
nox_num_per_km2	Float	Boiler emissions of nitrogen dioxide per square kilometer
so2_num_per_km2	Float	Boiler emissions of sulfur dioxide per square kilometer
pm2_num_per_km2	Float	Boiler emissions of fine particulate matter per square kilometer

Dataset: sulfur_34

Contains data about the average amount of sulfur dioxide in the air. The data is categorized by UHF34 neighborhood for years 2008-2015.

There are 272 rows and 3 columns.

Field	Type	Description
year	STRING	Year
geo_place_name	STRING	UHF 34 Neighborhood name
mean_so2	Float	Average yearly amount of sulfur dioxide

Dataset: sulfur_42

Contains data about the average amount of sulfur dioxide in the air. The data is categorized by UHF42 neighborhood for years 2008-2015.

There are 336 rows and 3 columns.

Field	Type	Description
year	STRING	Year
geo_place_name	STRING	UHF 42 Neighborhood name
mean_so2	Float	Average yearly amount of sulfur dioxide

Dataset: o3_pm2_attributable_hospital_visits

Contains data about the number of emergency department visits and hospitalizations for asthma attributed to fine particulate matter and ozone toxins. The data is categorized by UHF 42 neighborhood and by the following time periods: 2005-2007, 2009-2011, 2012-2014, 2015-2017.

There are 168 rows and 9 columns.

Field	Type	Description
Time Period	STRING	Year range of the time period (e.g. 2005-2007)
Start_Date	STRING	Start date for time period
geo_place_name	STRING	UHF 42 Neighborhood name
child_o3_asthma_hospital_per_100k	Float	Rate of hospitalizations for asthma in children attributed to ozone, per 100,000
adult_o3_asthma_hospital_per_100k	Float	Rate of hospitalizations for asthma in adults attributed to ozone, per 100,000
adult_pm2_asthma_ed_visits_per_100k	Float	Rate of emergency department visits for asthma in adults attributed to fine particulate matter, per 100,000
child_pm2_asthma_ed_visits_per_100k	Float	Rate of emergency department visits for asthma in children attributed to fine particulate matter, per 100,000
adult_o3_asthma_ed_visits_per_100k	Float	Rate of emergency department visits for asthma in adults attributed to ozone, per 100,000
child_o3_asthma_ed_visits_per_100k	Float	Rate of emergency department visits for asthma in children attributed to ozone, per 100,000
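
Because the Time Period field stores a year range as a string, it usually needs to be split into numeric bounds before it can be compared with the single-year tables. The sketch below shows one way to do that with the standard library; `parse_time_period` is a hypothetical helper, not a function from the repository, and it tolerates optional spaces around the hyphen.

```python
import re

# Matches labels such as "2005-2007" or "2009 - 2011" (spaces optional).
_PERIOD_RE = re.compile(r"^\s*(\d{4})\s*-\s*(\d{4})\s*$")


def parse_time_period(period):
    """Split a 'YYYY-YYYY' label into (start_year, end_year) integers."""
    match = _PERIOD_RE.match(period)
    if match is None:
        raise ValueError(f"unrecognized time period: {period!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError(f"end year precedes start year: {period!r}")
    return start, end


# The four time periods listed in the dataset description.
for label in ["2005-2007", "2009 - 2011", "2012-2014", "2015-2017"]:
    print(label, parse_time_period(label))
```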

Dataset: traffic_merged

Contains data about the number of miles driven by cars and trucks in UHF42 neighborhoods. The data covers years 2005 and 2016.

There are 84 rows and 6 columns.

Field	Type	Description
year	STRING	Year
geo_place_name	STRING	UHF 42 Neighborhood name
cars_million_miles	Float	Number of miles traveled by cars in millions
trucks_million_miles	Float	Number of miles traveled by trucks in millions
total_million_miles	Float	Sum of miles traveled by cars and trucks in millions

Dataset: adult_smoking_joined_UHF34_CLEAN

This data covers adult smoking and adult exposure to secondhand smoke. Additional metadata was dropped, and values were converted to numeric types so they can be used in analysis.

There are 120 rows and 10 columns.

Field	Type	Description
Year	Category	Year of data, in format yyyy
geo_type_name	Category	Granularity level of geography category
borough	Category	Borough for data
secondhand_smoke_home_adult_count	FLOAT	Count of adults reporting secondhand smoke at home
secondhand_smoke_home_adult_percent	FLOAT	Percent of adults reporting secondhand smoke at home
smoking_adults_count	FLOAT	Count of adults reporting smoking
smoking_adults_percent	FLOAT	Percent of adults reporting smoking
secondhand_smoke_work_adult_count	FLOAT	Count of adults reporting secondhand smoke at work
secondhand_smoke_work_adult_percent	FLOAT	Percent of adults reporting secondhand smoke at work

Dataset: NYC_SDOH

The social determinants of health (SDOH) are the non-medical factors that influence health outcomes. They are the conditions in which people are born, grow, work, live, and age, and the wider set of forces and systems shaping the conditions of daily life. Variables in the SDOH database correspond to the 5 key domains identified by AHRQ: social context, economic context, education, physical infrastructure, and healthcare context. In addition to these domains, there is a category for Geography, which includes ID variables (County, FIPS code, ZCTA, State, and Year) as well as 14 county adjacency variables and urban/rural codes. Data was cleaned based on the values available for the five counties of New York City for 2009-2018. Counties: Bronx County - The Bronx, Kings County - Brooklyn, New York County - Manhattan, Queens County - Queens, Richmond County - Staten Island.

There are 51 rows and 231 columns.

Field	Type	Description
COUNTY	STRING	County name
FIPSCODE	INTEGER	State-county FIPS code, 5 digits (County only)
YEAR	DATE	The year the data is from
ACS_PCT_AGE_65UP	FLOAT	Percentage of population age 65 and over
ACS_PCT_AGE_0_17	FLOAT	Percentage of population age 0-17
ACS_PCT_AGE_15_17	FLOAT	Percentage of population age 15-17
ACS_PCT_AGE_0_4	FLOAT	Percentage of population age 0-4
etc. The full description of every field is in NYC_SDOH_dictionary.

Dataset: NYC_SDOH_dictionary

Contains information for researchers about the structure and contents of the database and descriptions of each data source used to populate the database.

There are 236 rows and 4 columns.

Dataset: Asthma_ED_Visits

Asthma emergency room visits for NYC residents. Data was cleaned to the “lowest common denominator”, that is, to match the least detailed data set: the SDOH data set, which contains more general data gathered by county rather than by individual UHF 42 neighborhood. The average of the total ED visits from all neighborhoods in each county was taken and organized by year. The age-adjusted rate (for adults only, per 10,000 residents) and the estimated annual rate (per 10,000 residents) from all counties were taken and organized by year. Asthma ED visit data was only available for the years 2009-2016, with no per-county data available for 2015.

There are 106 rows and 6 columns.

Field	Type	Description
COUNTY	STRING	County name
YEAR	DATE	Year of data collection
INDICATOR_NAME	STRING	Population group by age, e.g. Adults (18+), Children (0-4), Children (5-17)
NUMBER	INTEGER	Average of the total ED visits from all neighborhoods in each county.
AGE_ADJUSTED_RATE	FLOAT	Number of ED visits per county adjusted for population older than 18 years (adults), per 10,000 residents.
ESTIMATED_ANNUAL_RATE	FLOAT	Number of ED visits per county adjusted for population older than 18 years, per 10,000 residents for that year.
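
The county-level roll-up described for this dataset (averaging the neighborhood ED visit totals within each county and year) could be sketched with a pandas group-by. The input column names `neighborhood` and `ed_visits`, and every value shown, are hypothetical placeholders; this is not the project's actual cleaning code.

```python
import pandas as pd

# Hypothetical neighborhood-level input; values are placeholders, not real data.
neighborhood_visits = pd.DataFrame(
    {
        "COUNTY": ["Kings", "Kings", "Queens", "Queens"],
        "YEAR": [2009, 2009, 2009, 2009],
        "INDICATOR_NAME": ["Adults (18+)"] * 4,
        "neighborhood": ["A", "B", "C", "D"],
        "ed_visits": [100, 300, 50, 150],
    }
)

# Average the neighborhood totals within each county, year, and age group.
county_visits = (
    neighborhood_visits
    .groupby(["COUNTY", "YEAR", "INDICATOR_NAME"], as_index=False)["ed_visits"]
    .mean()
    .rename(columns={"ed_visits": "NUMBER"})
)
print(county_visits)
```

Grouping on INDICATOR_NAME as well keeps the adult and child populations separate, which matches the one-row-per-age-group layout of the table above.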

Dataset: Indoor_air_quality_all

Dataset contains resident reported complaints on indoor air quality. Complaints are tabulated individually per report; report dates range from 2010 to 2021. The included columns: Borough, geo_place_name, zip code, longitude and latitude, are used to identify location.

There are 65050 rows and 7 columns.

Field	Type	Description
Year	STRING	Year
Borough	STRING	NYC Borough Name associated with UHF 42 neighborhood
geo_place_name	STRING	UHF 42 Neighborhood name
Zip code	STRING	Zip code of complaint address
Complaint type	STRING	Type of complaint reported by residents
Longitude	FLOAT	Longitude of complaint location
Latitude	FLOAT	Latitude of complaint location

Dataset: Adultswith Asthma in the Past 12 Months.csv

Description: Prevalence of adults with asthma in the past 12 months. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.

There are 521 rows and 8 columns.

Field	Type	Description
year	category	The year of that data point
geo_type_name	category	The type of geography for that data point (ex. Citywide, Neighborhood, Borough)
borough	category	Name of borough in NYC
geography	category	The most specific geographical location
geography_id	category	Unique geographical ID for every geographical location
adults_12mo_asthma_age_adjusted_percent	float	Percentage of adults with asthma adjusted for age
adults_12mo_asthma_number	float	Number of adults with asthma
adults_12mo_asthma_percent	float	Percentage of adults with asthma
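
The cleaning steps listed in the description (removing commas, renaming columns, lowercasing, and fixing data types) might look roughly like the following in pandas. The raw column names and values are invented for illustration, and `clean_asthma_table` is a hypothetical helper rather than the code used in the project.

```python
import pandas as pd


def clean_asthma_table(raw, numeric_cols):
    """Lowercase names and values, strip commas, and cast column types."""
    df = raw.copy()
    # Lowercase column names and replace spaces with underscores.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    for col in df.columns:
        values = df[col].astype("string").str.strip().str.lower()
        if col in numeric_cols:
            # Drop thousands separators, then cast to float.
            no_commas = values.str.replace(",", "", regex=False)
            df[col] = pd.to_numeric(no_commas).astype("float64")
        else:
            df[col] = values.astype("category")
    return df


# Invented raw rows for illustration only (not real data).
raw = pd.DataFrame(
    {
        "Year": ["2014", "2014"],
        "Borough": ["Bronx", "Queens"],
        "Adults 12mo Asthma Number": ["12,000", "8,500"],
        "Adults 12mo Asthma Percent": ["4.1", "3.2"],
    }
)
clean = clean_asthma_table(
    raw,
    numeric_cols={"adults_12mo_asthma_number", "adults_12mo_asthma_percent"},
)
print(clean.dtypes)
```

Passing the numeric column names explicitly keeps comma stripping away from text columns, where a comma may be part of a place name.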

Dataset: Public School Children (5-14 YrsOld) with Asthma.csv

Description: Prevalence of asthma among public school children ages 5-14. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.

There are 193 rows and 7 columns.

Field	Type	Description
year	category	The year of that data point
geo_type_name	category	The type of geography for that data point (ex. Citywide, Neighborhood, Borough)
borough	category	Name of borough in NYC
geography	category	The most specific geographical location
geography_id	category	Unique geographical ID for every geographical location
children_5_14_estimated_annual_rate_per_1000	float	Rate of children age 5-14 with asthma (per 1,000)
children_5_14_number	float	Number of children age 5-14 with asthma

Dataset: Asthma Emergency Department Visits(Adults).csv

Description: Asthma related emergency department visits for adults. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.

There are 530 rows and 8 columns.

Field	Type	Description
geo_type_name	category	The type of geography for that data point (ex. Citywide, Neighborhood, Borough)
borough	category	Name of borough in NYC
geography	category	The most specific geographical location
geography_id	category	Unique geographical ID for every geographical location
ed_annual_adult_estimated_age_adjusted_rate_per10k	float	Age-adjusted rate of adults (per 10,000) that visited the emergency department for asthma
ed_annual_adult_rate_per10k	float	Rate of adults (per 10,000) that visited the emergency department for asthma
ed_annual_adult_number	float	Number of adults that visited the emergency department for asthma
year	category	The year of that data point

Dataset: Asthma Emergency Department Visits(Children 5 to 17 YrsOld).csv

Description: Asthma related emergency department visits for children 5-17 years old. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.

There are 577 rows and 7 columns.

Field	Type	Description
geo_type_name	category	The type of geography for that data point (ex. Citywide, Neighborhood, Borough)
borough	category	Name of borough in NYC
geography	category	The most specific geographical location
geography_id	category	Unique geographical ID for every geographical location
ed_annual_5_17_rate_per10k	float	Rate of children 5-17 years old (per 10,000) that visited the emergency department for asthma
ed_5_17_number	float	Number of children 5-17 years old that visited the emergency department for asthma
year	category	The year of that data point

Dataset: Asthma Hospitalizations(Adults).csv

Description: Number of adults hospitalized for asthma. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.

There are 530 rows and 8 columns.

Field	Type	Description
geo_type_name	category	The type of geography for that data point (ex. Citywide, Neighborhood, Borough)
borough	category	Name of borough in NYC
geography	category	The most specific geographical location
geography_id	category	Unique geographical ID for every geographical location
asthma_hosp_adult_estimated__age_adjusted_rate_per10k	float	Age-adjusted rate of adults (per 10,000) that were hospitalized for asthma
asthma_hosp_adult_estimated__rate_per10k	float	Rate of adults (per 10,000) that were hospitalized for asthma
asthma_hosp_adult_number	float	Number of adults that were hospitalized for asthma
year	category	The year of that data point

Dataset: AsthmaHospitalizations(Children5to17YrsOld).csv

Description: Number of children 5-17 years old hospitalized for asthma. Listed by NYC UHF Neighborhoods and year. I removed metadata, removed commas, changed column names, made everything lowercase, and changed the datatypes to the appropriate datatypes for each column.

There are 577 rows and 7 columns.

Field	Type	Description
geo_type_name	category	The type of geography for that data point (ex. Citywide, Neighborhood, Borough)
borough	category	Name of borough in NYC
geography	category	The most specific geographical location
geography_id	category	Unique geographical ID for every geographical location
asthma_hosp_5_17_estimated_annual_rate_per_10000	float	Rate of children 5-17 years old (per 10,000) that were hospitalized for asthma
asthma_hosp_5_17_number	float	Number of children 5-17 years old hospitalized for asthma
year	category	The year of that data point

Dataset: MedianHouseholdIncomeByRacebyTract,2012-2016

Contains data about median household income, organized by race at state and regional levels, for the years 2012 to 2016. Because the dataset has over 30 fields, I will be organizing and cleaning it up; the table below lists the fields that seem most relevant to our research.

There are 72730 rows and 33 columns.

Field	Type	Description
STATE_NAME	STRING	State name, e.g. Maryland or New York
ST_ABBREV	STRING	State abbreviation, e.g. MD or NY
Median Household Income in Past 12 Months, Some Other Race Householder - Estimate	FLOAT	Calculates the median income over a year for a specific group – these are estimates from public census data
Median Household Income in Past 12 Months - Estimate	FLOAT	Calculates the median income over a year for a specific group – these are estimates from public census data
Median Household Income in Past 12 Months, 2 or More Races Householder - Estimate	FLOAT	Calculates the median income over a year for a specific group – these are estimates from public census data
Median Household Income in Past 12 Months, American Indian and Alaska Native Householder - Estimate	FLOAT	Calculates the median income over a year for a specific group – these are estimates from public census data
Median Household Income in Past 12 Months, Asian Householder - Estimate	FLOAT	Calculates the median income over a year for a specific group – these are estimates from public census data
Median Household Income in Past 12 Months, Black or African American Householder - Estimate	FLOAT	Calculates the median income over a year for a specific group – these are estimates from public census data
Median Household Income in Past 12 Months, Hispanic or Latino Householder - Estimate	FLOAT	Calculates the median income over a year for a specific group – these are estimates from public census data
Median Household Income in Past 12 Months, Native Hawaiian and Other Pacific Islander Householder - Estimate	FLOAT	Calculates the median income over a year for a specific group – these are estimates from public census data
Median Household Income in Past 12 Months, Non-Hispanic White Householder – Estimate	FLOAT	Calculates the median income over a year for a specific group – these are estimates from public census data
