Public-facing repository containing the code used to develop my MSDS Capstone at the University of Wisconsin La Crosse

bchileen/ChileenMartinez_MSDS_Capstone_Final

Predictive Dredging Models for Upper Mississippi River and Illinois Waterway

Overview

This repository contains the code for my capstone project in the University of Wisconsin-La Crosse Master of Science in Data Science program, titled “Predictive Dredging Models for the Mississippi River and Illinois Waterway”. Input data is stored in the Data/ folder and includes historic river gage data, dredge data, Corps Shoaling Analysis Tool (CSAT) output, and gage metadata. The code is organized in an R Markdown (.Rmd) document that can be run end to end or in sections.

This capstone project uses 25 years of historic river gage observations and shoaling rates derived from hydrosurveys to develop machine learning models that predict shoaling rates as a proxy for dredging need. These models extend the dredging forecast window to support U.S. Army Corps of Engineers Rock Island District dredging operations.

This script also creates additional data plots and products that may not have been presented in the capstone project. These products will, however, support operational use of this modeling framework and communications with key stakeholders.

Author: Barrie Chileen Martinez, Geographer, Rock Island District, U.S. Army Corps of Engineers (ORCID iD: https://orcid.org/0000-0002-6960-8167)
Program: Department of Data Science, University of Wisconsin – La Crosse
Course: DS 785: Capstone

Disclaimer

The LSTM model and suggestions for performance-metric visualizations were developed using Claude AI (Opus 4.5) on 12/01/2025. AI was used to build out the data preparation, performance checks, and workflow for the LSTM. AI also assisted with debugging and helped build a model architecture that correctly handles temporal splits. Other reference materials include R Bloggers Forecasting Sunspots and Time Series Forecasting with LSTM RNN. The LSTM structure is modeled after Asborno et al. (2024).

Project Structure

├── Chileen_Martinez_Capstone_Code_Final.Rmd    # Main analysis script
├── Data/
│   ├── UMR_IWW_1999_2024.csv           # River gage observations (1999-2024)
│   ├── CSAT_DATA_Combined.csv          # Shoaling rates from CSAT tool
│   ├── gage_metadata.csv               # Gage locations and metadata
│   └── Dredge_Event_data.csv           # Historical dredging events
├── Output/
│   ├── PCA/                            # PCA biplots and variance tables
│   ├── xGBoost/                        # xGBoost results and feature importance
│   ├── LSTM/                           # LSTM predictions and horizon analysis
│   ├── Pool_Models/                    # Pool-level model comparisons
│   ├── Comparisons/                    # Model comparison tables
│   ├── Maps/                           # Dredging urgency maps
│   └── EDA/                            # Exploratory data analysis plots
└── README.Rmd

Requirements

R Packages

# Data manipulation and visualization
library(tidyverse)
library(lubridate)
library(zoo)
library(scales)
library(patchwork)
library(gridExtra)
library(viridis)

# Tables and reporting
library(knitr)
library(kableExtra)
library(gt)

# Machine learning
library(caret)
library(xgboost)
library(forecast)

# Deep learning
library(keras3)
library(tensorflow)

# Visualization
library(corrplot)
library(ggfortify)
library(webshot)

# Parallel processing
library(doParallel)
library(foreach)
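
Before knitting, it can help to confirm that the packages listed above are installed. The following is a minimal sketch in base R and is not part of the capstone code; the variable names `required_pkgs` and `missing_pkgs` are illustrative assumptions.

```r
# Package names copied from the library() calls in this README
required_pkgs <- c(
  "tidyverse", "lubridate", "zoo", "scales", "patchwork", "gridExtra", "viridis",
  "knitr", "kableExtra", "gt",
  "caret", "xgboost", "forecast",
  "keras3", "tensorflow",
  "corrplot", "ggfortify", "webshot",
  "doParallel", "foreach"
)

# Compare against the packages installed in the current library paths
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```

Note that `keras3` and `tensorflow` also need a working Python environment with TensorFlow; `keras3::install_keras()` is one way to set that up.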

Version Information

  • R version 4.3.3
  • Python 3.10
  • TensorFlow v2.20.0

Models Implemented

| Model | Purpose | Key Parameters |
|---|---|---|
| ARIMA | Baseline comparison | Seasonal, frequency=12 |
| xGBoost | Regression + Classification | 5-fold temporal CV, tuned hyperparameters |
| LSTM | Extended forecasting | 30-day lookback, 45-day horizon, 64→32 architecture |
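
Two of the parameters above, the 5-fold temporal CV and the 30-day lookback with a 45-day horizon, can be illustrated with a short base R sketch. This is not the code in the Rmd. The function names `temporal_folds` and `make_windows` are hypothetical, the folds are assumed to be expanding-window splits, and the horizon is assumed to mean a single target 45 steps after the last input value. The capstone may define either differently.

```r
# Expanding-window folds (assumed scheme): each fold trains on all earlier
# rows and tests on the block that follows, so no future data reaches training.
temporal_folds <- function(n, k = 5) {
  block <- floor(n / (k + 1))
  if (block < 1) stop("not enough rows for the requested number of folds")
  lapply(seq_len(k), function(i) {
    train_end <- block * i
    test_end <- if (i == k) n else block * (i + 1)
    list(train = seq_len(train_end), test = (train_end + 1):test_end)
  })
}

# Sliding windows (assumed scheme): `lookback` consecutive values form the
# input, and the value `horizon` steps after the last input is the target.
make_windows <- function(series, lookback = 30, horizon = 45) {
  n_samples <- length(series) - lookback - horizon + 1
  if (n_samples < 1) stop("series too short for the requested lookback and horizon")
  x <- array(NA_real_, dim = c(n_samples, lookback, 1))
  y <- numeric(n_samples)
  for (i in seq_len(n_samples)) {
    x[i, , 1] <- series[i:(i + lookback - 1)]
    y[i] <- series[i + lookback + horizon - 1]
  }
  list(x = x, y = y)
}

folds <- temporal_folds(120)

# Synthetic stand-in for a daily gage series (NOT project data)
stage <- sin(seq_len(200) / 10)
windows <- make_windows(stage)
dim(windows$x)  # samples x lookback x features
```

The three-dimensional shape of `windows$x` (samples, timesteps, features) is the layout Keras LSTM layers expect as input.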

Usage

Run Full Analysis

# Open and knit the RMarkdown file
rmarkdown::render("Chileen_Martinez_Capstone_Code_Final.Rmd")

Run Individual Sections

The R Markdown file is organized into modular chunks that can be run step by step:

  1. Data Loading & Cleaning - Load and preprocess gage/CSAT data
  2. EDA - Distribution plots, correlation matrix, seasonal analysis
  3. PCA - Principal component analysis by river
  4. Baseline Models - ARIMA, persistence, mean baselines
  5. xGBoost - River and pool-level gradient boosting
  6. LSTM - Deep learning time series forecasting
  7. Visualizations - Maps and comparison plots
  8. Results Summary - Export tables for paper
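
The persistence and mean baselines named in step 4 can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the capstone implementation: the series is synthetic, the 80/20 chronological split is arbitrary, and the names `persistence_pred`, `mean_pred`, and `rmse` are hypothetical.

```r
# Synthetic stand-in for a shoaling-rate series (NOT project data)
set.seed(42)
shoaling <- cumsum(rnorm(100))

# Chronological split: the first 80 observations train, the last 20 test
split_at <- 80
train <- shoaling[seq_len(split_at)]
test <- shoaling[(split_at + 1):length(shoaling)]

# Persistence baseline: predict each value with the observation just before it
persistence_pred <- c(train[split_at], test[-length(test)])

# Mean baseline: predict the training mean for every test observation
mean_pred <- rep(mean(train), length(test))

# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

baseline_scores <- data.frame(
  model = c("persistence", "mean"),
  rmse = c(rmse(test, persistence_pred), rmse(test, mean_pred))
)
print(baseline_scores)
```

A model that cannot beat these two scores on the same split adds little over simply carrying the last observation forward.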

References

Asborno, M., et al. (2024). Forecasting sediment accumulation in the Southwest Pass with machine-learning models. Journal of Waterway, Port, Coastal, and Ocean Engineering, 150(2), 04023022.

Dunkin, L. M., Coe, L. A., & Ratcliff, J. J. (2018). Corps shoaling analysis tool: Predicting channel shoaling. U.S. Army Engineer Research and Development Center.

License

This project was developed for academic purposes and for internal operational use within the USACE Rock Island District. Code is available upon request.

Data Availability

Input data sources are publicly available.
