A comprehensive Python application for analyzing SEC 10-K filings and generating automated SWOT (Strengths, Weaknesses, Opportunities, Threats) reports using natural language processing and machine learning techniques.
- Automated SEC Filing Analysis: Download and process 10-K filings directly from the SEC EDGAR database
- AI-Powered SWOT Classification: Uses machine learning with keyword-based weak supervision to classify filing sentences into SWOT categories
- Interactive Dashboard: Modern Streamlit web interface with dark theme and professional styling
- Comprehensive Visualizations: Interactive charts and graphs powered by Plotly
- Export Options: Download results in CSV and JSON formats (PDF export coming soon)
- Multi-Company Support: Analyze multiple companies and time periods
- Real-Time Processing: Live progress tracking during analysis
The application features three main modes:
- Quick Analysis: Select a ticker and date range for automated analysis
- Upload Documents: Process custom SEC filings (coming soon)
- View Results: Browse and visualize previously generated reports
- Clone the repository:
    git clone <repository-url>
    cd nlp-project
- Install dependencies:
    pip install -r requirements.txt
- Run the dashboard:
    streamlit run dashboard.py
streamlit>=1.28.0
plotly>=5.15.0
pandas>=1.5.0
datamule
tqdm
pathlib

├── dashboard.py                  # Main Streamlit dashboard
├── swot_analysis.ipynb           # Jupyter notebook for SWOT analysis
├── requirements.txt              # Python dependencies
├── sec_10k_sentences.csv         # Raw SEC filing sentences
├── sec_10k_sentences_clean.csv   # Cleaned sentences
├── sec_portfolio/                # Downloaded SEC filings
│   ├── 000032019323000106.tar
│   └── 000032019324000123.tar
└── sec_swot_output/              # Analysis results
    ├── index.json                # Master index of reports
    ├── swot_AAPL_*.csv           # Individual SWOT data
    └── swot_report_AAPL_*.json   # Structured reports
- Launch the dashboard:
    streamlit run dashboard.py
- Select Quick Analysis from the sidebar
- Choose a company ticker (e.g., AAPL, MSFT, GOOGL)
- Set your desired date range
- Click Run Analysis
- View results in the View Results section
- Company Selection: Choose from popular tickers (AAPL, MSFT, GOOGL, AMZN, TSLA, META, NVDA) or enter a custom ticker
- Date Range: Set start and end dates for filing analysis (2020-2025)
- One-Click Analysis: Automated processing with real-time progress tracking
- Support for .txt, .pdf, and .html files
- Custom document processing capabilities
- Batch upload functionality
- Interactive report selector
- Professional SWOT visualizations
- Detailed category breakdowns with key themes and insights
- Export options (CSV and JSON; PDF coming soon)
For advanced users and development:
- Open swot_analysis.ipynb
- Configure parameters:
    TICKERS = ["AAPL"]                          # Companies to analyze
    FORMS = ["10-K"]                            # SEC form types
    DATE_RANGE = ("2023-01-01", "2024-12-31")
- Run all cells to perform analysis
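As a rough illustration of what the DATE_RANGE parameter implies, the sketch below checks whether a filing date falls inside the configured window. The helper name `in_date_range` is hypothetical and is not taken from the notebook; it only assumes that dates are ISO-formatted strings, as in the configuration shown.

```python
from datetime import date

# Same shape as the notebook parameter: (start, end) as ISO date strings.
DATE_RANGE = ("2023-01-01", "2024-12-31")


def in_date_range(filing_date, date_range=DATE_RANGE):
    """Return True if an ISO-format filing date lies within the inclusive range.

    Hypothetical helper for illustration; the notebook may filter differently.
    """
    start, end = (date.fromisoformat(d) for d in date_range)
    return start <= date.fromisoformat(filing_date) <= end


print(in_date_range("2024-11-01"))  # a filing dated inside the window
print(in_date_range("2022-10-28"))  # a filing dated before the window
```

Treating both endpoints as inclusive is a design choice made here for simplicity; the actual download step may apply its own convention.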
The analysis generates several output files:
- CSV Files: Raw SWOT classifications with confidence scores
- JSON Reports: Structured reports with key themes and insights
- Index File: Master catalog of all generated reports
{
"meta": {
"ticker": "AAPL",
"accession": "000032019324000123",
"filing_date": "2024-11-01"
},
"report": {
"Strength": {
"count": 25,
"top_bullets": ["Key strength indicators..."],
"key_themes": ["company", "growth", "innovation"],
"key_insights": ["Strategic advantages identified..."],
"summary": "25 strength indicators found"
},
"Weakness": { ... },
"Opportunity": { ... },
"Threat": { ... }
}
}
- Gradient backgrounds and modern styling
- Color-coded SWOT categories:
  - 💪 Strengths: Green gradient
  - ⚠️ Weaknesses: Red gradient
  - 🎯 Opportunities: Blue gradient
  - ⚡ Threats: Orange gradient
- Pie Charts: SWOT category distribution with pull-out effects
- Key Themes: Highlighted tag-style display of common topics
- Sample Evidence: Expandable sections with filing excerpts
- Metrics Cards: Professional summary statistics
- CSV Export: Raw classification data for further analysis
- JSON Export: Complete structured reports
- PDF Generation: Professional report formatting (coming soon)
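An exported JSON report can also be read outside the dashboard. The following is a minimal sketch, assuming the report follows the `meta` / `report` layout of the sample structure in this README. The function name `summarize_report` is hypothetical, and the inline sample stands in for a real file (which could be loaded with `json.load` instead).

```python
import json

# Inline stand-in for an exported report file.
# The Threat values are made up for illustration only.
SAMPLE_REPORT = json.loads("""
{
  "meta": {"ticker": "AAPL", "accession": "000032019324000123", "filing_date": "2024-11-01"},
  "report": {
    "Strength": {"count": 25, "key_themes": ["company", "growth", "innovation"]},
    "Threat": {"count": 4, "key_themes": ["competition"]}
  }
}
""")


def summarize_report(report):
    """Return (ticker, {category: count}) for a report dict shaped like the sample."""
    ticker = report["meta"]["ticker"]
    # Missing counts default to 0 so partially filled reports still summarize.
    counts = {name: section.get("count", 0) for name, section in report["report"].items()}
    return ticker, counts


ticker, counts = summarize_report(SAMPLE_REPORT)
for name, count in counts.items():
    print(f"{ticker} {name}: {count}")
```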
dashboard.py is the main Streamlit application, featuring:
- Modern dark theme with professional styling
- Interactive visualizations with Plotly
- Real-time analysis progress tracking
- Multi-format export capabilities
- Responsive design with custom CSS
swot_analysis.ipynb is the core analysis engine, providing:
- SEC filing download via datamule
- Text preprocessing and sentence extraction
- ML-based SWOT classification
- Report generation and export
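The sentence extraction step can be pictured with a small regex-based splitter. This is only a sketch under simple assumptions (English prose, sentences starting with a capital letter). The function name `extract_sentences` and the `min_words` threshold are hypothetical and not taken from the notebook.

```python
import re


def extract_sentences(text, min_words=5):
    """Split raw filing text into sentences and drop very short fragments."""
    # Collapse runs of whitespace, since filings often contain hard line breaks.
    normalized = re.sub(r"\s+", " ", text).strip()
    # Split after '.', '!' or '?' when followed by whitespace and a capital letter.
    candidates = re.split(r"(?<=[.!?])\s+(?=[A-Z])", normalized)
    # Short fragments are usually headings or cross-references, not statements.
    return [s for s in candidates if len(s.split()) >= min_words]


raw = (
    "Our business depends on continued\n   demand for our products. "
    "Competition is intense. "
    "We may face supply constraints in future periods."
)
for sentence in extract_sentences(raw):
    print(sentence)
```

A splitter this simple will mis-split on abbreviations such as "U.S." followed by a capitalized word, so a dedicated sentence tokenizer may be preferable for production use.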
The system uses keyword-based weak supervision to classify sentences:
- Strengths: Competitive advantages, strong performance metrics, market leadership
- Weaknesses: Risk factors, operational challenges, regulatory concerns
- Opportunities: Growth potential, market expansion, new technologies
- Threats: External risks, competitive pressures, economic factors
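To make the keyword-based labeling concrete, here is a minimal sketch of a labeling function of the kind used in weak supervision. The keyword lists, the function name `label_sentence`, and the confidence formula are all assumptions for illustration; the project's actual lexicon and scoring are not shown in this README.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only to mirror the category descriptions above.
SWOT_KEYWORDS = {
    "Strength": ["market leader", "competitive advantage", "strong demand"],
    "Weakness": ["material weakness", "operational challenge", "dependence on"],
    "Opportunity": ["expansion", "new market", "growth potential"],
    "Threat": ["intense competition", "economic downturn", "adverse"],
}


def label_sentence(sentence):
    """Return (label, confidence) from keyword hits, or (None, 0.0) if nothing matches."""
    text = sentence.lower()
    hits = Counter()
    for label, keywords in SWOT_KEYWORDS.items():
        for keyword in keywords:
            # Word-boundary match so a keyword does not fire inside a longer word.
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                hits[label] += 1
    if not hits:
        return None, 0.0
    label, count = hits.most_common(1)[0]
    # Confidence here is simply the share of all keyword hits won by the top label.
    return label, count / sum(hits.values())


print(label_sentence("We face intense competition and a possible economic downturn."))
print(label_sentence("Our competitive advantage supports expansion into a new market."))
```

Labels produced this way are noisy by design. In a weak supervision setup they typically serve as training signal for a downstream classifier rather than as final predictions.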
- Distribution Analysis: Pie charts showing SWOT category proportions
- Theme Analysis: Most frequent topics per category
- Evidence Display: Representative sentences for each category
- Executive Metrics: Key performance indicators and summaries
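The distribution and theme analyses boil down to counting. The sketch below uses only the standard library and assumes classified sentences are available as `(label, sentence)` pairs. The function names, the stop-word list, and the sample rows are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "our", "we", "is", "are", "for", "on", "may"}


def category_distribution(rows):
    """Return each category's share of the classified sentences (fractions summing to 1)."""
    counts = Counter(label for label, _ in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def top_themes(rows, category, k=3):
    """Return the k most frequent non-stop-words among one category's sentences."""
    words = Counter()
    for label, sentence in rows:
        if label == category:
            tokens = re.findall(r"[a-z]+", sentence.lower())
            words.update(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in words.most_common(k)]


# Made-up rows for illustration only.
rows = [
    ("Strength", "Strong services growth"),
    ("Strength", "Services growth accelerated"),
    ("Strength", "Growth in wearables"),
    ("Threat", "Competition may reduce margins"),
]
print(category_distribution(rows))
print(top_themes(rows, "Strength", k=2))
```

The fractions from `category_distribution` are the kind of values a pie chart would take as input, and `top_themes` corresponds to the per-category topic lists.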
- Dark Theme: Modern gradient backgrounds
- Color Coding: Intuitive category identification
- Responsive Layout: Adapts to different screen sizes
- Professional Typography: Clean, readable font choices
- SEC EDGAR Database: Official 10-K filings
- Supported Companies: All publicly traded US companies
- Time Range: 2020-present (configurable)
- Filing Types: 10-K annual reports (expandable)
- PDF document upload support
- Enhanced text preprocessing
- Batch analysis capabilities
- PDF report generation
- Advanced NLP models (BERT, GPT)
- Comparative analysis across companies
- Trend analysis over time
- Email notification system
- Multi-language support
- Real-time market integration
- Automated report scheduling
- API development
- First-Time Setup:
    git clone <repository-url>
    cd nlp-project
    pip install -r requirements.txt
- Launch Dashboard:
    streamlit run dashboard.py
- Run Your First Analysis:
  - Navigate to "Quick Analysis" mode
  - Select "AAPL" from the ticker dropdown
  - Set date range to last year
  - Click "Run Analysis"
  - View results in "View Results" mode
- datamule: For SEC filing download and processing
- Streamlit: For the interactive web interface framework
- Plotly: For advanced data visualizations
- SEC EDGAR: For providing public access to corporate filings