An NLP pipeline that scores earnings call transcripts using domain-specific financial sentiment lexicons and tests whether management tone predicts short-term abnormal equity returns — replicating the methodology of academic finance NLP research.
Does positive (negative) sentiment in earnings call transcripts predict positive (negative) abnormal stock returns in the 1, 3, and 5 trading days following the call?
SEC EDGAR 8-K filings + synthetic fallback
│
▼
┌─────────────────────────┐
│ 1. Transcript │ SEC EDGAR API → text extraction
│ Collection │ → synthetic fallback for unavailable filings
│ fetcher.py │
└──────────┬──────────────┘
│ raw transcript text
▼
┌─────────────────────────┐
│ 2. Sentiment │ Loughran-McDonald financial lexicon
│ Analysis │ + VADER rule-based scorer
│ analyser.py │ → composite score + label
└──────────┬──────────────┘
│ sentiment scores per earnings event
▼
┌─────────────────────────┐
│ 3. Price Reaction │ Event study: abnormal returns
│ Analysis │ → correlation test + t-test
│ price_reaction.py │ → interactive dashboard
└─────────────────────────┘
Two complementary approaches are combined into a composite score:
Loughran-McDonald (LM) Financial Lexicon: a domain-specific word list calibrated for financial text. Words like "uncertainty" and "risk" are scored as negative in this context, whereas general-purpose sentiment models typically treat them as neutral. Standard in academic finance NLP. Weight: 60% of composite score.
VADER (rule-based): handles negation ("not profitable"), intensifiers ("significantly exceeded"), and capitalisation. A fast and interpretable baseline. Weight: 40% of composite score.
Composite score: 0.6 × LM_net + 0.4 × VADER_compound ∈ [-1, 1]
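
The weighting above can be sketched in a few lines. This is a minimal illustration, not the code in `analyser.py`: the word lists are tiny stand-ins for the real LM lexicon, the definition of `LM_net` as `(pos - neg) / (pos + neg)` is an assumption, the `±0.05` label threshold is hypothetical, and the VADER compound score is passed in as a plain number rather than computed.

```python
import re

# Tiny illustrative word lists (assumption); the real LM lexicon is far larger.
LM_POSITIVE = {"strong", "improved", "exceeded", "growth"}
LM_NEGATIVE = {"uncertainty", "risk", "decline", "weak"}


def lm_net_score(text: str) -> float:
    """Net LM tone in [-1, 1]: (pos - neg) / (pos + neg), or 0.0 with no hits.

    This normalisation is an assumption; other definitions divide by total words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in LM_POSITIVE for t in tokens)
    neg = sum(t in LM_NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def composite_score(lm_net: float, vader_compound: float) -> float:
    """Weighted blend from the README: 60% LM, 40% VADER."""
    return 0.6 * lm_net + 0.4 * vader_compound


def label(score: float, threshold: float = 0.05) -> str:
    """Map a composite score to a label; the threshold is hypothetical."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


text = "Revenue growth was strong, but we see some risk."
lm = lm_net_score(text)             # two positive hits, one negative hit
score = composite_score(lm, 0.5)    # 0.5 stands in for a VADER compound score
print(round(lm, 3), round(score, 3), label(score))
```

Because both inputs lie in [-1, 1] and the weights sum to 1, the composite stays in [-1, 1] as well.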
- Abnormal return: `AR(t) = R_stock(t) - R_market(t)` (market-adjusted model)
- Cumulative abnormal return: `CAR[1,N] = Σ AR(t+k) for k = 1..N`, i.e. the sum of abnormal returns from t+1 to t+N
- Event windows: 1-day, 3-day, 5-day post-earnings
- Statistical tests: Pearson correlation + independent samples t-test (positive vs negative sentiment groups)
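
As a rough sketch of the market-adjusted calculation, the snippet below computes AR and CAR from two aligned lists of daily returns. The function names and the toy return series are illustrative assumptions and do not come from `price_reaction.py`.

```python
def abnormal_returns(stock_returns, market_returns):
    """Market-adjusted AR(t) = R_stock(t) - R_market(t) for aligned daily returns."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    return [rs - rm for rs, rm in zip(stock_returns, market_returns)]


def cumulative_abnormal_return(abnormal, event_idx, window):
    """CAR[1, N]: sum of AR over trading days event_idx+1 .. event_idx+window."""
    end = event_idx + window
    if end >= len(abnormal):
        raise ValueError("not enough post-event trading days for this window")
    return sum(abnormal[event_idx + 1 : end + 1])


# Toy data (made up for illustration): index 0 is the earnings call date.
stock = [0.010, 0.020, -0.005, 0.015, 0.000, 0.010]
market = [0.005, 0.010, 0.000, 0.005, 0.005, 0.000]

ar = abnormal_returns(stock, market)
for n in (1, 3, 5):
    print(f"CAR[1,{n}] = {cumulative_abnormal_return(ar, 0, n):.4f}")
```

The window starts at t+1, so the return on the call date itself is excluded from every CAR.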
- Queries SEC EDGAR submissions API for recent 8-K filings
- Extracts text from filing documents
- Falls back to deterministic synthetic transcripts where SEC data is unavailable
- Outputs: `output/transcripts/transcripts.csv`
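
For orientation, here is one way the 8-K lookup could work once a submissions document has been downloaded. The SEC serves these at `https://data.sec.gov/submissions/CIK<10-digit CIK>.json` and expects a descriptive `User-Agent` header on requests. The function name, the `limit` parameter, and the sample record (including its CIK and accession numbers) are made up for illustration; `fetcher.py` may be organised differently.

```python
def recent_8k_filings(submissions: dict, limit: int = 5) -> list[dict]:
    """Pick the most recent 8-K entries from an EDGAR submissions document.

    `filings.recent` stores parallel arrays, one entry per filing.
    """
    recent = submissions["filings"]["recent"]
    cik = int(submissions["cik"])  # archive URLs use the CIK without zero padding
    rows = zip(
        recent["form"],
        recent["accessionNumber"],
        recent["filingDate"],
        recent["primaryDocument"],
    )
    results = []
    for form, accession, filed, document in rows:
        if form != "8-K":
            continue
        folder = accession.replace("-", "")  # archive folder drops the dashes
        url = f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{document}"
        results.append({"accession": accession, "filed": filed, "url": url})
        if len(results) == limit:
            break
    return results


# Made-up sample mirroring the shape of a submissions response (not real filings).
sample = {
    "cik": "1234567",
    "filings": {
        "recent": {
            "form": ["10-Q", "8-K", "4", "8-K"],
            "accessionNumber": [
                "0001234567-24-000004",
                "0001234567-24-000003",
                "0001234567-24-000002",
                "0001234567-24-000001",
            ],
            "filingDate": ["2024-05-02", "2024-05-01", "2024-03-15", "2024-02-01"],
            "primaryDocument": ["q1.htm", "er.htm", "form4.xml", "er_q4.htm"],
        }
    },
}

for filing in recent_8k_filings(sample):
    print(filing["filed"], filing["url"])
```

Keeping the filtering separate from the HTTP call makes it easy to exercise against a saved response, which also fits the synthetic-fallback design.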
- Scores each transcript with LM lexicon and VADER
- Computes composite score and categorical label (positive/neutral/negative)
- Outputs: `output/sentiment/sentiment_scores.csv`, `output/sentiment/ticker_sentiment_summary.csv`
- Joins sentiment scores to forward abnormal returns
- Runs correlation analysis and t-tests across event windows
- Generates interactive Plotly dashboard
- Outputs: `output/analysis/event_study_results.csv`, `output/analysis/statistical_summary.csv`, `output/analysis/sentiment_analysis_dashboard.html`
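
The two tests can be illustrated with the standard library alone. This sketch computes the Pearson coefficient and a pooled-variance (Student) t-statistic on made-up numbers. It does not compute p-values; with SciPy, which is in the requirements, `scipy.stats.pearsonr` and `scipy.stats.ttest_ind` return those directly. Splitting the groups by the sign of the composite score is an assumption here.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pooled_t_statistic(a, b):
    """Independent two-sample t-statistic assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Made-up composite scores and 3-day CARs, for illustration only.
sentiment = [0.4, 0.2, -0.3, -0.5]
car_3d = [0.02, 0.01, -0.01, -0.02]

r = pearson_r(sentiment, car_3d)

# Group CARs by the sign of the composite score (grouping rule is an assumption).
positive = [c for s, c in zip(sentiment, car_3d) if s > 0]
negative = [c for s, c in zip(sentiment, car_3d) if s < 0]
t_stat = pooled_t_statistic(positive, negative)

print(f"r = {r:.3f}, t = {t_stat:.3f}")
```

A positive `r` together with a positive `t` would both point the same way: higher sentiment scores go with higher post-call abnormal returns.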
1. Install dependencies: `pip install -r requirements.txt`
2. Run the full pipeline: `python run_pipeline.py`
3. View results: `open output/analysis/sentiment_analysis_dashboard.html`

Or run modules individually:

    python 1_collection/fetcher.py
    python 2_sentiment/analyser.py
    python 3_analysis/price_reaction.py

| File | Description |
|---|---|
| `output/transcripts/transcripts.csv` | Raw transcript text per ticker per quarter |
| `output/sentiment/sentiment_scores.csv` | Per-transcript LM, VADER, and composite scores |
| `output/sentiment/ticker_sentiment_summary.csv` | Aggregated sentiment per ticker |
| `output/analysis/event_study_results.csv` | Sentiment joined to forward CAR |
| `output/analysis/statistical_summary.csv` | Correlation and t-test results |
| `output/analysis/sentiment_analysis_dashboard.html` | Interactive dashboard |
yfinance>=0.2.28
pandas>=2.0.0
numpy>=1.24.0
scipy>=1.10.0
plotly>=5.0.0
requests>=2.28.0
Known limitations:
- Synthetic transcripts are used where SEC filings are unavailable — results using synthetic data should be interpreted as methodology demonstration only, not empirical findings
- Simple market-adjusted abnormal returns (no beta estimation)
- No correction for multiple testing across windows
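
On the multiple-testing point, one simple remedy would be a Bonferroni adjustment across the three windows. The sketch below is not part of the pipeline, and the p-values are placeholders chosen to show the mechanics, not results.

```python
def bonferroni(p_values: dict, alpha: float = 0.05) -> dict:
    """Bonferroni correction: multiply each p-value by the number of tests."""
    m = len(p_values)
    adjusted = {}
    for name, p in p_values.items():
        p_adj = min(1.0, p * m)  # adjusted p-values are capped at 1
        adjusted[name] = {"p_adj": p_adj, "reject": p_adj <= alpha}
    return adjusted


# Placeholder p-values for the three event windows (illustrative only).
raw = {"CAR_1d": 0.03, "CAR_3d": 0.01, "CAR_5d": 0.20}

for window, result in bonferroni(raw).items():
    print(window, round(result["p_adj"], 3), result["reject"])
```

Bonferroni is conservative when the tests are correlated, as overlapping CAR windows are, so a step-down method such as Holm would be a reasonable alternative.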
Natural extensions:
- Replace synthetic transcripts with a paid transcript provider (Refinitiv, Bloomberg, Motley Fool)
- Add FinBERT transformer-based scoring for comparison
- Incorporate conference call Q&A section separately from prepared remarks
- Extend to analyst report sentiment
Atrija Haldar, MSc Engineering, Technology and Business Management, University of Leeds