# Assignment 10 — Market Data Storage and Querying

This folder contains a fully modular workflow for the Market Data Storage and Querying with SQLite3 and Parquet assignment. Each concern lives in its own module so you can mix and match components depending on whether you are demonstrating ingestion, SQL analytics, Parquet analytics, or cross-format comparisons.

## File guide

| File / Dir | Purpose |
| --- | --- |
| `data_loader.py` | Loads the raw CSVs, validates ticker references, and returns ready-to-ingest DataFrames. |
| `sqlite_storage.py` | Creates the SQLite schema, inserts data, and exposes the four SQL query tasks. |
| `parquet_storage.py` | Materializes a ticker-partitioned Parquet dataset and implements the three Parquet analytics tasks plus storage comparisons. |
| `assignment10.py` | Command-line entry point that orchestrates ingestion, querying, and reporting. |
| `market_data.db` | SQLite3 database generated by the script (deleted and recreated when running with `--rebuild`). |
| `market_data/` | Parquet directory partitioned by ticker for fast symbol-specific scans. |
| `query_tasks.md` | Narrative of SQL and Parquet query outputs along with observed runtimes. |
| `comparison.md` | Discussion of SQLite vs. Parquet storage characteristics and recommended use cases. |
| `tests/` | Unit tests covering ingestion and representative queries for both storage layers. |

Source CSVs (`market_data_multi.csv`, `tickers.csv`) plus the schema definition remain unchanged and live alongside these modules.
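The schema definition itself is not reproduced in this README. As a rough illustration of what `sqlite_storage.py` sets up, the layout might resemble the sketch below (table and column names are assumptions, not the assignment's actual definition; an in-memory database stands in for `market_data.db`):

```python
import sqlite3

# Illustrative schema sketch: one row per minute bar, plus a small
# tickers reference table that ingestion can validate against.
# All names here are assumptions for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tickers (
    ticker TEXT PRIMARY KEY,
    name   TEXT
);
CREATE TABLE IF NOT EXISTS market_data (
    ticker    TEXT NOT NULL REFERENCES tickers(ticker),
    timestamp TEXT NOT NULL,              -- ISO-8601 minute-bar timestamp
    open REAL, high REAL, low REAL, close REAL,
    volume    INTEGER,
    PRIMARY KEY (ticker, timestamp)
);
"""

conn = sqlite3.connect(":memory:")  # the real script writes market_data.db
conn.executescript(SCHEMA)
```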

## Running the solution

```shell
cd assignment-10
python3 assignment10.py --rebuild
```

- `--rebuild` drops the existing SQLite database and Parquet directory before reloading from the CSVs. Omit the flag to reuse prior artifacts and skip the expensive load step.
- Use `--results /path/to/file.json` to change the location of the JSON summary (`results.json` by default).

Dependencies: `pandas` (already available) and `pyarrow` (install once via `pip3 install pyarrow`). No other services are needed.

## Delivered analytics

### SQLite tasks

1. **TSLA windowed extract** – Pulls every Tesla trade between 2025-11-17 and 2025-11-18 and captures the query latency for the storage comparison.
2. **Average daily volume per ticker** – Aggregates each ticker’s average volume by trade date.
3. **Top 3 tickers by total return** – Uses first open and last close prices over the sample period to rank tickers.
4. **First/last trade price per day** – Returns closing prices for the first and last minute bar per ticker per day.
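The windowed extract in task 1 can be sketched as follows. This is a minimal self-contained illustration with toy data, not the code in `sqlite_storage.py`; the table and column names are assumptions:

```python
import sqlite3
import time

# Toy in-memory table standing in for the ingested market_data table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE market_data (ticker TEXT, timestamp TEXT, close REAL)")
conn.executemany(
    "INSERT INTO market_data VALUES (?, ?, ?)",
    [
        ("TSLA", "2025-11-17 09:30:00", 340.0),
        ("TSLA", "2025-11-18 16:00:00", 345.5),
        ("TSLA", "2025-11-19 09:30:00", 350.0),  # outside the window
        ("AAPL", "2025-11-17 09:30:00", 190.0),  # different ticker
    ],
)

# Time the windowed extract; ISO-8601 strings compare correctly as text.
start = time.perf_counter()
rows = conn.execute(
    """
    SELECT timestamp, close
    FROM market_data
    WHERE ticker = 'TSLA'
      AND timestamp BETWEEN '2025-11-17 00:00:00' AND '2025-11-18 23:59:59'
    ORDER BY timestamp
    """
).fetchall()
elapsed = time.perf_counter() - start  # latency feeds the storage comparison
```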

### Parquet tasks

1. **AAPL 5-minute rolling mean** – Computes rolling averages over the close column using the partitioned Parquet dataset.
2. **5-day rolling volatility** – Resamples to daily closes, calculates daily returns, and builds a rolling standard deviation per ticker.
3. **SQLite vs. Parquet comparison** – Compares runtime and file sizes for the TSLA window query between the two storage formats.
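The rolling-mean analytic in task 1 boils down to a windowed aggregation over minute bars. A minimal sketch with inline toy data (the real workflow would read only the AAPL partition, e.g. via `pd.read_parquet("market_data", filters=[("ticker", "=", "AAPL")])`; the column names here are assumptions):

```python
import pandas as pd

# Six toy 1-minute bars standing in for the AAPL partition of market_data/.
bars = pd.DataFrame(
    {
        "timestamp": pd.date_range("2025-11-17 09:30", periods=6, freq="1min"),
        "close": [190.0, 191.0, 192.0, 193.0, 194.0, 195.0],
    }
)

# 5-minute rolling mean over 1-minute bars: a 5-row trailing window.
# The first four rows are NaN because the window is not yet full.
bars["rolling_close"] = bars["close"].rolling(window=5).mean()
```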

Open `results.json` after running the script to see the latest row counts, timings, and storage comparisons; re-run the script whenever you need fresh metrics or want to regenerate the SQLite/Parquet artifacts from the source CSVs.
