Atri2-code/earnings-sentiment-analyser

Earnings Sentiment & Price Reaction Analyser

An NLP pipeline that scores earnings call transcripts using domain-specific financial sentiment lexicons and tests whether management tone predicts short-term abnormal equity returns — replicating the methodology of academic finance NLP research.


Research question

Does positive (negative) sentiment in earnings call transcripts predict positive (negative) abnormal stock returns in the 1, 3, and 5 trading days following the call?


Pipeline architecture

SEC EDGAR 8-K filings + synthetic fallback
              │
              ▼
┌─────────────────────────┐
│  1. Transcript          │  SEC EDGAR API → text extraction
│     Collection          │  → synthetic fallback for unavailable filings
│     fetcher.py          │
└──────────┬──────────────┘
           │  raw transcript text
           ▼
┌─────────────────────────┐
│  2. Sentiment           │  Loughran-McDonald financial lexicon
│     Analysis            │  + VADER rule-based scorer
│     analyser.py         │  → composite score + label
└──────────┬──────────────┘
           │  sentiment scores per earnings event
           ▼
┌─────────────────────────┐
│  3. Price Reaction      │  Event study: abnormal returns
│     Analysis            │  → correlation test + t-test
│     price_reaction.py   │  → interactive dashboard
└─────────────────────────┘

Methods

Sentiment scoring

Two complementary approaches are combined into a composite score:

Loughran-McDonald (LM) Financial Lexicon: a domain-specific word list calibrated for financial text. Words like "uncertainty" and "risk" count as negative in this context, whereas general-purpose sentiment models typically treat them as neutral. Standard in academic finance NLP. Weight: 60% of composite score.

VADER (rule-based): handles negation ("not profitable"), intensifiers ("significantly exceeded"), and capitalisation. A fast, interpretable baseline. Weight: 40% of composite score.

Composite score: 0.6 × LM_net + 0.4 × VADER_compound ∈ [-1, 1]
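
A minimal sketch of how the composite could be computed. The README does not define LM_net, so the net score below, (positive - negative) / (positive + negative), is an assumption, and the two word sets are tiny illustrative stand-ins for the real Loughran-McDonald lists. The VADER compound score is passed in as a plain number rather than computed here.

```python
import re

# Illustrative stand-ins only; the real LM lists hold thousands of words.
LM_POSITIVE = {"strong", "growth", "improved", "exceeded", "profitable"}
LM_NEGATIVE = {"uncertainty", "risk", "decline", "loss", "weak"}


def lm_net_score(text: str) -> float:
    """Assumed definition: (pos - neg) / (pos + neg), which lies in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in LM_POSITIVE for token in tokens)
    neg = sum(token in LM_NEGATIVE for token in tokens)
    total = pos + neg
    # A transcript with no lexicon hits is treated as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


def composite_score(lm_net: float, vader_compound: float) -> float:
    """Weighted blend from the formula above: 0.6 * LM_net + 0.4 * VADER_compound."""
    return 0.6 * lm_net + 0.4 * vader_compound


if __name__ == "__main__":
    sample = "Strong growth this quarter despite some uncertainty."
    lm_net = lm_net_score(sample)           # two positive hits, one negative
    print(composite_score(lm_net, 0.5))     # 0.5 is a made-up VADER compound
```

Because the weights sum to 1 and both inputs lie in [-1, 1], the composite stays in [-1, 1] without clipping.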

Event study

  • Abnormal return: AR(t) = R_stock(t) - R_market(t) (market-adjusted model)
  • Cumulative abnormal return: CAR[1,N] = Σ AR from t+1 to t+N
  • Event windows: 1-day, 3-day, 5-day post-earnings
  • Statistical tests: Pearson correlation + independent samples t-test (positive vs negative sentiment groups)
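
The abnormal-return definitions above can be sketched as follows. This assumes daily simple returns that are already aligned so that index 0 is trading day t+1. The function names and the sample numbers are illustrative, not taken from price_reaction.py.

```python
def abnormal_returns(stock_returns, market_returns):
    """Market-adjusted model: AR(t) = R_stock(t) - R_market(t)."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    return [rs - rm for rs, rm in zip(stock_returns, market_returns)]


def cumulative_abnormal_return(ar_post_event, window):
    """CAR[1,N]: sum of AR over the first N post-event trading days."""
    if window > len(ar_post_event):
        raise ValueError("not enough post-event days for this window")
    return sum(ar_post_event[:window])


if __name__ == "__main__":
    # Made-up daily returns for days t+1 .. t+5.
    stock = [0.020, -0.010, 0.005, 0.000, 0.010]
    market = [0.005, 0.000, 0.002, -0.001, 0.004]
    ar = abnormal_returns(stock, market)
    for n in (1, 3, 5):
        print(f"CAR[1,{n}] = {cumulative_abnormal_return(ar, n):.4f}")
```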

Modules

Module 1 — Transcript Collection (1_collection/fetcher.py)

  • Queries SEC EDGAR submissions API for recent 8-K filings
  • Extracts text from filing documents
  • Falls back to deterministic synthetic transcripts where SEC data is unavailable
  • Outputs: output/transcripts/transcripts.csv
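
A hedged sketch of the two ideas in this module: picking 8-K filings out of a submissions payload, and generating a deterministic synthetic fallback. It assumes the payload follows the EDGAR submissions layout, where `filings.recent` holds parallel arrays such as `form`, `accessionNumber`, `filingDate` and `primaryDocument`. The function names and the synthetic text are hypothetical, and no network call is made here.

```python
import hashlib
import random


def recent_8k_filings(submissions: dict, limit: int = 4) -> list:
    """Return up to `limit` 8-K entries from an already-downloaded payload."""
    recent = submissions.get("filings", {}).get("recent", {})
    rows = zip(
        recent.get("form", []),
        recent.get("accessionNumber", []),
        recent.get("filingDate", []),
        recent.get("primaryDocument", []),
    )
    hits = [
        {"accession": acc, "date": date, "document": doc}
        for form, acc, date, doc in rows
        if form == "8-K"
    ]
    return hits[:limit]


def synthetic_transcript(ticker: str, quarter: str) -> str:
    """Deterministic fallback: the same ticker and quarter always give the same text."""
    # hashlib digests are stable across runs, unlike the built-in hash() of a str.
    seed = int(hashlib.sha256(f"{ticker}:{quarter}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    tone = rng.choice(["strong growth", "continued uncertainty", "stable demand"])
    return f"{ticker} {quarter} earnings call (synthetic): management reported {tone}."
```

Seeding from a hash of the ticker and quarter is one way to get the "deterministic" property: reruns of the pipeline produce identical synthetic inputs, so downstream scores are reproducible.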

Module 2 — Sentiment Analysis (2_sentiment/analyser.py)

  • Scores each transcript with LM lexicon and VADER
  • Computes composite score and categorical label (positive/neutral/negative)
  • Outputs: output/sentiment/sentiment_scores.csv, output/sentiment/ticker_sentiment_summary.csv
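
The labelling and per-ticker aggregation steps might look like the sketch below. The ±0.05 cut-off is a hypothetical threshold, since the README does not state the one used, and the ticker symbols and scores are placeholders.

```python
from collections import defaultdict
from statistics import mean


def sentiment_label(score: float, threshold: float = 0.05) -> str:
    """Map a composite score to positive / neutral / negative (threshold assumed)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def ticker_summary(rows):
    """Aggregate (ticker, composite_score) pairs into one summary per ticker."""
    by_ticker = defaultdict(list)
    for ticker, score in rows:
        by_ticker[ticker].append(score)
    return {
        ticker: {
            "mean_composite": mean(scores),
            "n_events": len(scores),
            "label": sentiment_label(mean(scores)),
        }
        for ticker, scores in by_ticker.items()
    }


if __name__ == "__main__":
    # Placeholder tickers and scores, for illustration only.
    print(ticker_summary([("AAA", 0.4), ("AAA", 0.2), ("BBB", -0.3)]))
```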

Module 3 — Price Reaction Analysis (3_analysis/price_reaction.py)

  • Joins sentiment scores to forward abnormal returns
  • Runs correlation analysis and t-tests across event windows
  • Generates interactive Plotly dashboard
  • Outputs: output/analysis/event_study_results.csv, output/analysis/statistical_summary.csv, output/analysis/sentiment_analysis_dashboard.html
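
To make the two tests concrete, here is a standard-library sketch of the Pearson correlation and a pooled-variance independent-samples t statistic. It only produces the statistics. In the pipeline itself, `scipy.stats.pearsonr` and `scipy.stats.ttest_ind` would be the natural way to also obtain p-values. Splitting the groups by the sign of the score is an assumption, and the sample numbers are made up.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between sentiment scores and CARs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def pooled_t_statistic(group_a, group_b):
    """Student's t for two independent samples, assuming equal variances."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def split_by_sentiment(scores, cars):
    """Assumed grouping: CARs of events with score > 0 versus score < 0."""
    positive = [c for s, c in zip(scores, cars) if s > 0]
    negative = [c for s, c in zip(scores, cars) if s < 0]
    return positive, negative


if __name__ == "__main__":
    scores = [-0.5, -0.1, 0.2, 0.6]      # made-up composite scores
    cars = [-0.02, -0.01, 0.01, 0.03]    # made-up 3-day CARs
    pos, neg = split_by_sentiment(scores, cars)
    print(pearson_r(scores, cars), pooled_t_statistic(pos, neg))
```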

How to run

1. Install dependencies

pip install -r requirements.txt

2. Run the full pipeline

python run_pipeline.py

3. View results

open output/analysis/sentiment_analysis_dashboard.html

Or run modules individually:

python 1_collection/fetcher.py
python 2_sentiment/analyser.py
python 3_analysis/price_reaction.py

Outputs

| File | Description |
|------|-------------|
| output/transcripts/transcripts.csv | Raw transcript text per ticker per quarter |
| output/sentiment/sentiment_scores.csv | Per-transcript LM, VADER, and composite scores |
| output/sentiment/ticker_sentiment_summary.csv | Aggregated sentiment per ticker |
| output/analysis/event_study_results.csv | Sentiment joined to forward CAR |
| output/analysis/statistical_summary.csv | Correlation and t-test results |
| output/analysis/sentiment_analysis_dashboard.html | Interactive dashboard |

Dependencies

yfinance>=0.2.28
pandas>=2.0.0
numpy>=1.24.0
scipy>=1.10.0
plotly>=5.0.0
requests>=2.28.0

Limitations and extensions

Known limitations:

  • Synthetic transcripts are used where SEC filings are unavailable — results using synthetic data should be interpreted as methodology demonstration only, not empirical findings
  • Simple market-adjusted abnormal returns (no beta estimation)
  • No correction for multiple testing across windows

Natural extensions:

  • Replace synthetic transcripts with a paid transcript provider (Refinitiv, Bloomberg, Motley Fool)
  • Add FinBERT transformer-based scoring for comparison
  • Score the conference call Q&A section separately from prepared remarks
  • Extend to analyst report sentiment

Author

Atrija Haldar (LinkedIn)
MSc Engineering, Technology and Business Management, University of Leeds
