---
title: "writeup"
author: "Jose Serrano"
date: "2024-12-07"
output: pdf_document
---
For my project, I investigated how participation in the Community Eligibility Provision (CEP), a universal free school meal program, affects chronic absenteeism and truancy in Illinois public schools. Specifically, I examined whether schools participating in CEP differ from non-CEP schools in absenteeism and truancy rates across varying levels of poverty. To address this question, I combined and analyzed three data sets from two sources. The first, from the Illinois State Board of Education (ISBE), contains 2019 school-level data for almost every public school in the state, including many variables such as rates of chronic absenteeism and chronic truancy. The second, also from ISBE, contains 2019 school-level CEP participation and eligibility data. The third contains 2019 poverty metrics for Illinois school districts from the American Community Survey (ACS), retrieved via the US Census API. I chose 2019 for two reasons. First, 2019 was the most recent year for which all three data sets were available. Second, because the USDA granted flexibility in program operations during the COVID-19 pandemic, CEP data for 2021 and 2022 do not exist.
The data sets were cleaned, merged, and analyzed to identify trends and relationships. Following federal Free and Reduced-Price Lunch (FRPL) definitions, I grouped schools into four poverty categories (low, mid-low, mid-high, and high poverty) based on the percentage of enrolled students classified as low-income, which allowed me to compare schools with similar poverty levels. Using the combined ISBE data sets, I created several bar plots comparing chronic absenteeism and truancy between CEP and non-CEP schools. I also estimated two multiple linear regression models, one for chronic absenteeism and one for truancy, to examine the effects of CEP participation, poverty level, and Title 1 status on these rates. I also built two interactive Shiny dashboards. The first, built from the ACS data, is a choropleth map of all Illinois school districts showing the family poverty rate in each district; its purpose is to put school-district poverty in context and show where a program like CEP could best be implemented. The second lets the user choose among several plots of chronic absenteeism and truancy rates. Finally, for the text analysis, I wrote a web scraper to collect AP News coverage of CEP and of school lunches more generally, and used that text to conduct a sentiment analysis and build a word-frequency bar plot gauging overall perceptions of school lunch programs.
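The poverty grouping and the two regression models can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the project's code (which used dplyr): the cutoffs assume the NCES convention for FRPL-based poverty categories (25% or less, 25.1 to 50%, 50.1 to 75%, above 75%), and the column names (`chronic_absenteeism`, `chronic_truancy`, `cep`, `title1`, `pct_low_income`) and helper names are hypothetical.

```r
# ASSUMPTION: cutoffs follow the NCES FRPL convention
# (<= 25%, 25.1-50%, 50.1-75%, > 75% low-income students).
categorize_poverty <- function(pct_low_income) {
  cut(pct_low_income,
      breaks = c(-Inf, 25, 50, 75, Inf),  # right-closed intervals, e.g. (25, 50]
      labels = c("Low poverty", "Mid-low poverty",
                 "Mid-high poverty", "High poverty"))
}

# ASSUMPTION: hypothetical column names. `cep` and `title1` are logical,
# `poverty_level` is the factor returned by categorize_poverty(), and
# `pct_low_income` is the continuous share of low-income students.
fit_attendance_models <- function(schools) {
  list(
    absenteeism = lm(chronic_absenteeism ~ cep + poverty_level + title1 + pct_low_income,
                     data = schools),
    truancy     = lm(chronic_truancy ~ cep + poverty_level + title1 + pct_low_income,
                     data = schools)
  )
}

categorize_poverty(c(10, 25, 40, 75, 90))
# maps to: Low, Low, Mid-low, Mid-high, High poverty
```

Because "Low poverty" is the first factor level, `lm()` treats it as the reference category, so each poverty coefficient compares a group with low-poverty schools; the `cepTRUE` coefficient is the CEP/non-CEP difference holding the other terms fixed.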
The project was conducted using the latest version of R. Data cleaning used the dplyr package to standardize column names, remove irrelevant rows, and handle missing values. Merging the school-level data proved to be one of the most challenging parts of the project, because the data sets used inconsistent identifiers such as district names, school names, and unique IDs. For example, both data sets contained a Region-County-District-Type-School (RCDTS) code, but the CEP data set included only the Region, County, District, and Type portion of the code for each school, so the data sets could not be merged on that code alone. To merge the data, I instead had to use an imperfect combination of school name, district name, and city name. Visualization was another key component of the project: static plots were created with ggplot2 and interactive maps with leaflet. Another issue was that the shapefiles required for mapping were too large to host on GitHub, so I hosted them on Google Drive instead. The code automatically checks whether the required files are in the user's directory and, if they are not, prompts the user with a link to download them from Google Drive. For the text analysis, I used rvest to scrape AP News articles related to CEP and conducted sentiment and word-frequency analysis with tidytext. This part of the project also had difficulties: scraping AP News was complicated by changing URL structures, and recurring boilerplate text distorted the text-analysis results.
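The name-based merge and the shapefile check described above can be sketched in base R as follows. This is a sketch under assumptions, not the project's code: the column names (`school`, `district`, `city`, `cep_participant`), the normalization rules, and all helper names are hypothetical, and the shapefile check only reports missing files rather than downloading them.

```r
# Hypothetical helper: normalize names so case, punctuation, and extra
# whitespace differences between the two data sets do not block a match.
normalize_name <- function(x) {
  x <- toupper(x)
  x <- gsub("[^A-Z0-9 ]", " ", x)    # replace punctuation with spaces
  x <- gsub("[[:space:]]+", " ", x)  # collapse repeated whitespace
  trimws(x)
}

# Composite key from school, district, and city names.
make_key <- function(school, district, city) {
  paste(normalize_name(school), normalize_name(district), normalize_name(city),
        sep = "|")
}

# Left join: every report-card school is kept; unmatched schools get NA.
merge_on_names <- function(report_card, cep) {
  report_card$key <- make_key(report_card$school, report_card$district, report_card$city)
  cep$key <- make_key(cep$school, cep$district, cep$city)
  if (anyDuplicated(cep$key) > 0) {
    warning("Duplicate name keys in CEP data; matched schools may be duplicated.")
  }
  merge(report_card, cep[, c("key", "cep_participant")], by = "key", all.x = TRUE)
}

# Hypothetical helper: return TRUE if all shapefile components exist in `dir`,
# otherwise print the missing files and where to download them.
shapefiles_present <- function(dir, files, download_url) {
  missing <- files[!file.exists(file.path(dir, files))]
  if (length(missing) > 0) {
    message("Missing: ", paste(missing, collapse = ", "),
            ". Download them from ", download_url)
  }
  length(missing) == 0
}
```

Schools whose `cep_participant` is `NA` after the join failed to match and can be reviewed by hand. Normalizing before building the key reduces such misses, but abbreviations (for example "Elem" versus "Elementary") would still need manual handling.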
Despite these challenges, the analysis produced several interesting findings. Non-CEP schools consistently had lower absenteeism and truancy rates than CEP schools, even after controlling for poverty level, Title 1 status, and the percentage of low-income students. The pattern was particularly pronounced among low-poverty schools, where CEP participation was associated with significantly higher truancy rates. Separately, schools with higher percentages of low-income students generally had worse attendance outcomes, underscoring the persistent link between economic disadvantage and absenteeism. Title 1 status was also associated with lower rates of absenteeism and truancy. The text analysis found that coverage used positive and negative words at similar rates, with negative words slightly outnumbering positive ones. Word frequency results showed
However, the limitations of this study cannot be overstated. The analysis likely suffers from significant selection bias: CEP participation is voluntary and eligibility depends on the share of students identified as low-income, so CEP schools may differ systematically from non-CEP schools in demographics or baseline challenges. In addition, many unobserved factors, such as parental engagement, transportation access, and school disciplinary policies, could influence absenteeism and truancy but were not accounted for in the models. Data inconsistencies and reporting biases may also have skewed the results, since absenteeism and truancy rates are reported by the districts themselves.
To better isolate the causal effect of CEP participation, future research should consider quasi-experimental designs such as difference-in-differences or propensity score matching. Randomized controlled trials (RCTs) would provide the strongest evidence: randomly assigning schools to CEP or to a control condition would eliminate selection bias. Incorporating longitudinal data would also allow for the examination of trends over time, giving a more comprehensive picture of how CEP affects attendance outcomes. Finally, expanding the data set to include variables related to family engagement, school policies, and community resources would further strengthen the analysis.
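As an illustration of the difference-in-differences design, a standard two-group, two-period specification is sketched below. It assumes panel data on schools observed before and after some of them adopt CEP, which this project's 2019 cross-section does not contain. Here $Y_{st}$ is the absenteeism or truancy rate of school $s$ in year $t$, $\mathrm{CEP}_s = 1$ for schools that adopt CEP, and $\mathrm{Post}_t = 1$ for years after adoption:

$$
Y_{st} = \alpha + \beta\,\mathrm{CEP}_s + \gamma\,\mathrm{Post}_t + \delta\,(\mathrm{CEP}_s \times \mathrm{Post}_t) + \varepsilon_{st}
$$

The coefficient $\delta$ is the difference-in-differences estimate of CEP's effect, valid under the assumption that CEP and non-CEP schools would have followed parallel attendance trends without the program. In R this corresponds to a model such as `lm(y ~ cep * post, data = panel)`, where the variable and data frame names are hypothetical.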