Speech Forecasting and Pricing

Predicting mentions in public speech and identifying probabilistic mispricing in prediction markets.

Overview

This project forecasts whether specific words or phrases will be mentioned in a future public speech (for example, a central bank statement). It assigns probabilities to those events using a statistical predictive model and compares them against market-implied probabilities from prediction markets.
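To make the comparison concrete, here is a minimal sketch, not the project's actual code. It assumes a binary contract that pays 100 cents if the phrase is mentioned, so a YES price in cents divided by 100 can be read as the market-implied probability. Fees and the bid-ask spread are ignored, and the function names and numbers are hypothetical.

```python
def implied_probability(yes_price_cents: float) -> float:
    """Convert a YES contract price in cents (0-100) to a probability."""
    if not 0 <= yes_price_cents <= 100:
        raise ValueError("price must be between 0 and 100 cents")
    return yes_price_cents / 100.0


def edge(model_prob: float, yes_price_cents: float) -> float:
    """Model probability minus market-implied probability."""
    return model_prob - implied_probability(yes_price_cents)


# Hypothetical numbers: the model says 0.70, the market quotes YES at 55 cents.
print(round(edge(0.70, 55), 2))  # 0.15
```

A positive value means the model rates the mention as more likely than the market does; a negative value means the opposite.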

The system is intentionally conservative and robustness-focused. Rather than maximizing raw backtest performance, it emphasizes stability, disciplined filtering, and risk control. Trades are selected only when multiple independent signals align, and parameters are chosen based on robustness to small changes rather than peak optimization.
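One way to read "robustness to small changes rather than peak optimization" is to rank each parameter value by the average score of its neighbourhood on the grid instead of by its own score. The sketch below illustrates that idea only; the function name, the window radius, and the scores are all hypothetical, and the project's own selection rule may differ.

```python
from statistics import mean


def robust_choice(scores: dict, radius: int = 1):
    """Return the parameter whose grid neighbourhood has the best mean score.

    scores maps each tested parameter value to its backtest score.
    """
    params = sorted(scores)
    best_param, best_avg = params[0], float("-inf")
    for i, p in enumerate(params):
        # Neighbours within `radius` grid steps on either side (clipped at the ends).
        window = params[max(0, i - radius): i + radius + 1]
        avg = mean(scores[q] for q in window)
        if avg > best_avg:
            best_param, best_avg = p, avg
    return best_param


# Hypothetical sweep: setting 5 has the highest score but poor neighbours.
scores = {1: 0.15, 2: 0.22, 3: 0.20, 4: 0.05, 5: 0.30, 6: 0.02}
print(max(scores, key=scores.get))  # 5, the raw peak
print(robust_choice(scores))        # 2, the centre of the stable region
```

The isolated spike at 5 loses to the plateau around 2 because its neighbours score poorly, which is the behaviour a robustness-first selection is after.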

At a high level, the project combines natural language processing, probabilistic modeling, and market pricing analysis to study where language expectations and market beliefs diverge in a controlled, systematic way.

Repository Structure

scrape-fomc.py
extract-powell.py
clean-text.py
fetch-kalshi.py
clean-kalshi.py
build-features.py
model-v1.py
score-and-test.py
sweep-params.py
README.md

Workflow

Run the scripts in the order listed above. The one exception is score-and-test.py, which sweep-params.py executes automatically, so it does not need to be run by hand.
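The run order can be captured in a small driver. This is a sketch under assumptions: the repository does not ship such a driver, the scripts are assumed to take no command-line arguments, and they are assumed to be run from the repository root. The names PIPELINE, build_commands, and run_pipeline are hypothetical.

```python
import subprocess
import sys

# Scripts in the order listed under Repository Structure.
# score-and-test.py is left out because sweep-params.py runs it.
PIPELINE = [
    "scrape-fomc.py",
    "extract-powell.py",
    "clean-text.py",
    "fetch-kalshi.py",
    "clean-kalshi.py",
    "build-features.py",
    "model-v1.py",
    "sweep-params.py",
]


def build_commands(python=sys.executable):
    """Return one [interpreter, script] command per pipeline step."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run each step in order and stop at the first failure."""
    for cmd in build_commands():
        print("running", cmd[1])
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Calling run_pipeline() from the repository root would execute the eight steps in sequence; check=True keeps a failed fetch or cleaning step from feeding stale data into later stages.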

The overall workflow is as follows:

Fetch transcripts of all press conferences given by Federal Reserve Chair Jerome Powell, using publicly available data from www.federalreserve.gov
Clean the transcripts to retain only Powell’s spoken remarks, restricted to alphanumeric content
Fetch and clean market data from Kalshi
Build a feature set using only Powell’s speech data
Train a logistic regression model to predict word and phrase mentions
Apply a scoring system to identify suitable prediction targets and run walk-forward backtests
Perform hyperparameter tuning to identify a stable maximum across key performance metrics
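The walk-forward backtest mentioned in the steps above can be sketched as an expanding window: each event is forecast using only the events before it. The code below is an illustration under assumptions, not the logic in score-and-test.py. A simple base-rate forecaster stands in for the logistic regression model, the Brier score stands in for whatever metrics the project reports, and all names and data are hypothetical.

```python
def walk_forward_splits(n_events, min_train=3):
    """Yield (train_indices, test_index) pairs in chronological order."""
    for test_idx in range(min_train, n_events):
        yield list(range(test_idx)), test_idx


def walk_forward_brier(mentioned, min_train=3):
    """Mean squared error of a base-rate forecast, evaluated walk-forward.

    mentioned: list of 0/1 outcomes for one phrase, ordered oldest to newest.
    """
    errors = []
    for train_idx, test_idx in walk_forward_splits(len(mentioned), min_train):
        history = [mentioned[i] for i in train_idx]
        forecast = sum(history) / len(history)  # uses past events only
        errors.append((forecast - mentioned[test_idx]) ** 2)
    if not errors:
        raise ValueError("need more than min_train events to evaluate")
    return sum(errors) / len(errors)


# Hypothetical history: was the phrase mentioned at each of five past events?
print(list(walk_forward_splits(5)))
print(walk_forward_brier([1, 1, 0, 1, 1]))
```

Because the training window never includes the event being scored, the evaluation avoids look-ahead bias, which matters when the same history is later used to tune parameters.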

The primary outputs are:

recommendations.csv: a ranked list of potential trades for the upcoming event
trade_log.csv: detailed backtest results for historical evaluation

Requirements

Python 3.9+
Standard Python libraries
requests (for market data fetching)

Disclaimer

This project is for educational and demonstration purposes only.
It does not constitute financial advice or a recommendation to trade.
