If this tool is helping you, please star the repo! It really helps discoverability.
SEC 13F Filing Tracker | Institutional Portfolio Analysis | AI-Powered Stock Research
A comprehensive Python tool for tracking hedge fund portfolios through SEC filings (13F, 13D/G, Form 4). Transform raw SEC EDGAR data into actionable investment insights. Built for financial analysts, quantitative traders, and retail investors seeking to analyze institutional investor strategies, portfolio changes, and discover stock opportunities by following elite fund managers.
Keywords: SEC filings tracker, 13F analysis, hedge fund portfolio, institutional investors, stock research, investment intelligence, CUSIP converter, financial data scraper, AI stock analysis
- Hedge Fund Tracker
- Table of Contents
- Quick Start
- Key Features
- Installation
- Project Structure
- How This Tool Tracks Hedge Funds
- Hedge Funds Selection
- AI Models Selection
- Limitations & Considerations
- Automation with GitHub Actions
- Technical Stack
- Contributing & Support
- References
- Acknowledgments
- License
# Clone the repository
git clone https://github.com/dokson/hedge-fund-tracker.git
cd hedge-fund-tracker
# Install dependencies
pipenv install
# Set up environment variables
cp .env.example .env
# Add your tokens/API keys (FinnHub, GitHub, Google AI Studio, Groq, HuggingFace, OpenRouter) to the .env file
# Run the application
pipenv run python -m app.main

| Feature | Description |
|---|---|
| Comparative Analysis | Combines quarterly (13F) and non-quarterly (13D/G, Form 4) filings for an up-to-date view |
| Detailed Reports | Generates clear, console-based reports with intuitive formatting |
| Curated Database | Includes a list of top hedge funds and AI models, both easily editable via CSV files |
| Ticker Resolution | Converts CUSIPs to tickers using a smart fallback system (yfinance, Finnhub, FinanceDatabase) |
| Multi-Provider AI Analysis | Leverages different AI models to identify promising stocks based on filings |
| Flexible Management | Offers multiple analysis modes: all funds, a single fund, or custom CIKs |
| Automated Data Update | Includes a GitHub Actions workflow to automatically fetch and commit the latest SEC filings |
| GICS Hierarchy | Features an autonomous parser to build a full GICS classification database |
- Python 3.13+
- pipenv (install with pip install pipenv)

- Clone and navigate:

      git clone https://github.com/dokson/hedge-fund-tracker.git
      cd hedge-fund-tracker

- Install dependencies: Navigate to the project root and run the following command. This will create a virtual environment and install all required packages.

      pipenv install

  Tip: If pipenv is not found, you might need to use python -m pipenv install. This can happen if the user scripts directory is not in your system's PATH.

- Configure environment: Create a .env file in the root directory of the project and add the API keys for the services you plan to use:

      # Create environment file
      cp .env.example .env
      # Edit .env file and add your API keys:
      # FINNHUB_API_KEY="your_finnhub_key"
      # GITHUB_TOKEN="your_github_token"
      # GOOGLE_API_KEY="your_google_api_key"
      # GROQ_API_KEY="your_groq_api_key"
      # HF_TOKEN="your_hugging_face_token"
      # OPENROUTER_API_KEY="your_openrouter_api_key"

- Run the script: Execute within the project's virtual environment:

      pipenv run python -m app.main

- Choose an action: Once the script starts, you'll see the main interactive menu for data analysis:

      Hedge Fund Tracker
      0. Exit
      1. View latest non-quarterly filings activity by funds (from 13D/G, Form 4)
      2. Analyze overall hedge-funds stock trends for a quarter
      3. Analyze a specific fund's quarterly portfolio
      4. Analyze a specific stock's activity for a quarter
      5. Run AI Analyst to find most promising stocks
      6. Run AI Due Diligence on a stock

The data update operations (downloading and processing filings) are inside a dedicated script. This keeps the main application focused on analysis, while the updater handles populating and refreshing the database.
To run the data update operations, use the updater.py script from the project root:

pipenv run python -m database.updater

The updater.py script includes semi-automated maintenance tasks:
- Sorting: Upon exit (option 0), the script automatically sorts the database/stocks.csv file by ticker to maintain performance and prevent Git diff noise.
- Auto-Documentation: This README's excluded funds section is synchronized whenever the database is refreshed manually.

Running the script opens a separate menu for data management:

    Hedge Fund Tracker - Database Updater
    0. Exit
    1. Generate latest 13F reports for all known hedge funds
    2. Fetch latest non-quarterly filings for all known hedge funds
    3. Generate 13F report for a known hedge fund
    4. Manually enter a hedge fund CIK to generate a 13F report

The project includes an autonomous GICS (Global Industry Classification Standard) parser (database/gics/updater.py). GICS was originally developed by MSCI and S&P; the parser scrapes Wikipedia to build the full hierarchy, down to its 163 sub-industries. This provides the AI Analyst with granular industry context while remaining independent of third-party libraries.
The tool can utilize API keys for enhanced functionality, but all are optional:
| Service | Purpose | Get Free API Key |
|---|---|---|
| Finnhub | CUSIP to stock ticker conversion | Finnhub Keys |
| GitHub Models | Access to top-tier models (e.g., xAI Grok-3, OpenAI GPT-5) | GitHub Tokens |
| Google AI Studio | Access to Google Gemini models | AI Studio Keys |
| Groq AI | Access to various LLMs (e.g., OpenAI gpt-oss, Meta Llama) | Groq Keys |
| Hugging Face | Access to open weights models (e.g., DeepSeek R1, Kimi-Linear-48B) | HF Tokens |
| OpenRouter | Access to various LLMs (e.g., Claude 4.5 Opus, GLM 4.5 Air) | OpenRouter Keys |
Note: Ticker resolution primarily uses yfinance, which is free and requires no API key. If that fails, the system falls back to Finnhub (if an API key is provided), with the final fallback being FinanceDatabase.
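The fallback order described in this note can be sketched as a simple chain of resolvers. This is only an illustration, not the project's actual code: `resolve_ticker` and the three stand-in functions are hypothetical names, and the stand-ins return canned values instead of calling yfinance, Finnhub, or FinanceDatabase.

```python
from typing import Callable, Optional

# Hypothetical resolver signature: takes a CUSIP, returns a ticker or None.
Resolver = Callable[[str], Optional[str]]


def resolve_ticker(cusip: str, resolvers: list[Resolver]) -> Optional[str]:
    """Try each resolver in order and return the first ticker found."""
    for resolver in resolvers:
        try:
            ticker = resolver(cusip)
        except Exception:
            # A failing provider (e.g. a missing API key) must not stop the chain.
            continue
        if ticker:
            return ticker
    return None


# Stand-ins for illustration only; real lookups would query each data source.
def from_yfinance(cusip: str) -> Optional[str]:
    return None  # pretend the primary source found nothing


def from_finnhub(cusip: str) -> Optional[str]:
    raise RuntimeError("no API key configured")


def from_finance_database(cusip: str) -> Optional[str]:
    return {"037833100": "AAPL"}.get(cusip)


CHAIN = [from_yfinance, from_finnhub, from_finance_database]
print(resolve_ticker("037833100", CHAIN))  # AAPL
```

Keeping the resolvers in an ordered list makes the priority explicit and lets a provider be skipped without touching the loop.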
Note: You don't need to use all the APIs. For the generative AI models (Google AI Studio, GitHub Models, Groq AI, Hugging Face, and OpenRouter), you only need the API keys for the services you plan to use. For instance, if you want to experiment with models like OpenAI GPT-4o mini, you just need a GitHub Token. Experimenting with different models is encouraged, as the quality of AI-generated analysis, both for identifying promising stocks and for conducting due diligence, can vary. However, top-performing stocks are typically identified consistently across all tested models. All APIs used in this project are currently free (with GitHub Models providing a generous free tier for developers).
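Because every key is optional, it can help to check which providers are actually configured before picking a model. The sketch below assumes the variable names from the .env template shown earlier; the `configured_providers` helper is hypothetical and not part of the project.

```python
import os

# Environment variable names taken from the .env template above.
PROVIDER_KEYS = {
    "Finnhub": "FINNHUB_API_KEY",
    "GitHub Models": "GITHUB_TOKEN",
    "Google AI Studio": "GOOGLE_API_KEY",
    "Groq AI": "GROQ_API_KEY",
    "Hugging Face": "HF_TOKEN",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None) -> list[str]:
    """Return the providers whose key is present and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


# Passing a plain dict keeps the example independent of the real environment.
print(configured_providers({"GITHUB_TOKEN": "example-token", "GROQ_API_KEY": " "}))
# ['GitHub Models']
```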
hedge-fund-tracker/
├── .github/
│   ├── scripts/
│   │   └── fetcher.py  # Daily script for data fetching (scheduled by workflows/daily-fetch.yml)
│   └── workflows/  # GitHub Actions for automation
│       ├── filings-fetch.yml  # GitHub Actions: Filings fetching job
│       └── python-tests.yml  # GitHub Actions: Unit tests
├── app/  # Main application logic
│   └── main.py  # Main entry point for Data & AI analysis
├── database/  # Data storage
│   ├── 2025Q1/  # Quarterly reports
│   │   ├── fund_1.csv  # Individual fund quarterly report
│   │   ├── fund_2.csv
│   │   └── fund_n.csv
│   ├── YYYYQN/
│   ├── GICS/
│   │   ├── hierarchy.csv  # Full GICS hierarchy
│   │   └── updater.py  # GICS updater script
│   ├── hedge_funds.csv  # Curated hedge funds list -> EDIT THIS to add or remove funds to track
│   ├── models.csv  # LLMs list to use for AI Financial Analyst -> EDIT THIS to add or remove AI models
│   ├── non_quarterly.csv  # Stores latest 13D/G and Form 4 filings
│   ├── stocks.csv  # Master data for stocks (CUSIP-Ticker-Name)
│   └── updater.py  # Main entry point for updating the database
├── tests/  # Test suite
├── .env.example  # Template for your API keys
├── .gitignore  # Git ignore rules
├── LICENSE  # MIT License
├── Pipfile  # Project dependencies
├── Pipfile.lock  # Locked dependency versions
└── README.md  # Project documentation (this file)
Hedge Funds Configuration File: database/hedge_funds.csv contains the list of hedge funds to monitor (CIK, name, manager) and can also be edited at runtime.

LLMs Configuration File: database/models.csv contains the list of available LLMs for AI analysis and can also be edited at runtime.
This tracker leverages the following types of SEC filings to provide a comprehensive view of institutional activity.
- Quarterly 13F Filings
  - Required for funds managing $100M+
  - Filed within 45 days of quarter-end
  - Shows portfolio snapshot on last day of quarter
- Non-Quarterly 13D/G Filings
  - Required when acquiring 5%+ of company shares
  - Schedule 13D is due within 5 business days of crossing the threshold (10 days before the SEC's 2024 rule amendments); Schedule 13G deadlines vary by filer type
  - Provides a timely view of significant investments
- Non-Quarterly SEC Form 4 Insider Filings
  - Filed by insiders (executives, directors) or large shareholders (>10%) when they trade company stocks
  - Must be filed within 2 business days of the transaction
  - Offers real-time insight into the actions of key individuals and institutions
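To make the 45-day rule for 13F filings concrete, the sketch below computes the nominal deadline for a quarter label in the same YYYYQN form used for the database folders. The helper names are hypothetical, and the sketch ignores the SEC's adjustment for deadlines that fall on weekends or holidays.

```python
from datetime import date, timedelta


def quarter_end(label: str) -> date:
    """Return the last calendar day of a quarter label such as '2025Q1'."""
    year, quarter = int(label[:4]), int(label[5])
    if label[4] != "Q" or quarter not in (1, 2, 3, 4):
        raise ValueError(f"unexpected quarter label: {label!r}")
    # First day of the following quarter, minus one day.
    if quarter == 4:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, quarter * 3 + 1, 1)
    return next_start - timedelta(days=1)


def filing_deadline_13f(label: str) -> date:
    """Nominal 13F deadline: 45 calendar days after quarter-end."""
    return quarter_end(label) + timedelta(days=45)


print(filing_deadline_13f("2025Q1"))  # 2025-05-15
```

This is why a 13F-only view of the first quarter can lag the market until mid-May.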
This tool tracks a curated list of what I found to be the top-performing institutional investors that file with the U.S. SEC, based on their performance over the last 3-5 years. The list is the result of my own methodology, detailed below, which is designed to identify the top percentile of global investment funds.
Modern portfolio theory (MPT) offers many methods for quantifying the risk-return trade-off, but they are often ill-suited for analyzing the limited data available in public filings. The hedge_funds.csv list was therefore generated using my own custom selection algorithm, designed to identify top-performing funds while accounting for volatility.
Note: The selection algorithm is external to this project and was used only to produce the curated hedge_funds.csv list.
My approach prioritizes high cumulative returns but also analyzes the path taken to achieve them. It penalizes volatility, similar to the Sharpe Ratio, but this penalty is dynamically adjusted based on performance consistency. Likewise, it penalizes drawdowns, echoing the principle of the Sterling Ratio, but that penalty is intentionally dampened to avoid overly punishing funds that recover effectively from temporary downturns.
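The actual selection algorithm is external to this project and is not published here, so the following is only a toy illustration of the general idea: reward cumulative return, subtract a volatility penalty, and subtract a dampened drawdown penalty. The function names, the weights, and the sample returns are all assumptions, and the dynamic adjustment of the volatility penalty is deliberately left out.

```python
from statistics import pstdev


def cumulative_return(returns: list[float]) -> float:
    """Compound a series of periodic returns into a total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough loss along the compounded path."""
    peak = growth = 1.0
    worst = 0.0
    for r in returns:
        growth *= 1.0 + r
        peak = max(peak, growth)
        worst = max(worst, (peak - growth) / peak)
    return worst


def illustrative_score(returns, vol_weight=1.0, dd_weight=0.5):
    """Toy score: return minus volatility penalty minus dampened drawdown penalty.

    The weights are arbitrary placeholders, not the author's values.
    """
    return (
        cumulative_return(returns)
        - vol_weight * pstdev(returns)
        - dd_weight * max_drawdown(returns)
    )


steady = [0.05, 0.05, 0.05, 0.05]   # hypothetical quarterly returns
bumpy = [0.30, -0.20, 0.25, -0.08]  # similar end result, rougher path
print(illustrative_score(steady) > illustrative_score(bumpy))  # True
```

With a drawdown weight below 1, a fund that dips and recovers loses less score than it would under a full Sterling-style penalty, which mirrors the dampening described above.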
The list of hedge funds is actively managed to maintain its quality; funds that underperform may be replaced, while new top performers are periodically added.
However, despite their strong performance, several funds with portfolios predominantly focused on Healthcare and Biotech, such as Nextech Invest, Enavate Sciences, Caligan Partners, and Boxer Capital Management, have been intentionally excluded. These funds invest in highly specialized sectors where I lack the necessary expertise. Consequently, I consider them too risky for my personal investment profile, given the complexity and volatility inherent in biotech and healthcare ventures.
The quality of the output analysis is directly tied to the quality of the input data. To enhance the accuracy of the insights and opportunities identified, many popular high-profile funds have been intentionally excluded by design (the list below is automatically managed and capped at 50 funds; the full list is in excluded_hedge_funds.csv):
- Warren Buffett's Berkshire Hathaway
- Ken Griffin's Citadel Advisors
- Ray Dalio's Bridgewater Associates
- Michael Burry's Scion Asset Management
- Peter Thiel's Thiel Macro
- Cathie Wood's ARK Invest
- Bill Ackman's Pershing Square
- Dmitry Balyasny's Balyasny Asset Management
- Alec Litowitz's Magnetar Capital
- Cliff Asness's AQR Capital Management
- David Tepper's Appaloosa
- Israel Englander's Millennium Management
- Frank Sands's Sands Capital Management
- Murray Stahl's Horizon Kinetics
- David Abrams's Abrams Capital Management
- Jeffrey Ubben's ValueAct Capital
- Paul Singer's Elliott Investment
- Chris Hohn's The Children's Investment
- Daniel Loeb's Third Point
- Boaz Weinstein's Saba Capital
- William Huffman's Nuveen
- George Soros's Soros Fund Management
- Bill Gates's Gates Foundation Trust
- Carl Icahn's Icahn Enterprises
- Dev Kantesaria's Valley Forge Capital Management
- Lewis Sanders's Sanders Capital
- Brad Gerstner's Altimeter Capital Management
- Andreas Halvorsen's Viking Global Investors
- Paul Tudor Jones's Tudor Investment Corporation
- Chris Davis's Davis Advisors
- Paul Isaac's Arbiter Partners
- Robert Robotti's Robotti Value Investors
- Jim Cracchiolo's Ameriprise Financial
- Li Lu's Himalaya Capital Management
- Francis Chou's Chou Associates
- Anand Parekh's Alyeska Investment Group
- Ken Fisher's Fisher Asset Management
- David Katz's Matrix Asset Advisors
- Lee Ainslie's Maverick Capital
- Joel Greenblatt's Gotham Funds
- Barry Ritholtz's Ritholtz Wealth Management
- Robert Pitts's Steadfast Capital Management
- John Paulson's Paulson & Co.
- Jeremy Grantham's GMO
- Paul Marshall & Ian Wace's Marshall Wace
- Seymour Kaufman's Crosslink Capital
- Mario Gabelli's GAMCO Investors
- John Overdeck's Two Sigma
- Richard Pzena's Pzena Investment Management
- Bill Nygren's Harris Associates
- and many more... (see database/excluded_hedge_funds.csv for the full list)

Note: For convenience, key information for these funds, including their CIKs, is maintained in the database/excluded_hedge_funds.csv file.
Want to track additional funds? Simply edit database/hedge_funds.csv and add your preferred institutional investors. For example, to add Berkshire Hathaway, Pershing Square, and ARK Invest, you would add the following lines:

"CIK","Fund","Manager","Denomination","CIKs"
...
"0001067983","Berkshire Hathaway","Warren Buffett","Berkshire Hathaway Inc",""
"0001336528","Pershing Square","Bill Ackman","Pershing Square Capital Management, L.P.",""
"0001697748","ARK Invest","Cathie Wood","ARK Investment Management LLC",""

Note: hedge_funds.csv currently includes not only traditional hedge funds but also other institutional investors (private equity funds, large banks, VCs, pension funds, etc., that file 13F with the SEC) selected from what I consider the top 5% of performers.

If you wish to track any of the Notable Exclusions hedge funds, you can copy the relevant rows from excluded_hedge_funds.csv into hedge_funds.csv.

Columns for Custom Funds:
- Denomination: This is the exact legal name used by the fund in its filings. It is essential for accurately processing non-quarterly filings (13D/G, Form 4) as the scraper uses it to identify the fund's specific transactions within complex filing documents.
- CIKs: A comma-separated list of additional CIKs. This field is used to track filings from related entities or subsidiaries. Some investment firms have complex structures where different legal entities file separately (e.g., a management company and a holding company).
  - Example: Jeffrey Ubben's ValueAct Holdings (CIK=0001418814) also has filings under ValueAct Capital Management (CIK=0001418812). By adding 0001418812 to the CIKs column, the tool aggregates non-quarterly filings from both entities for a complete view.

        "CIK","Fund","Manager","Denomination","CIKs"
        "0001418814","ValueAct","Jeffrey Ubben","ValueAct Holdings, L.P.","0001418812"
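As a rough illustration of how the CIKs column can be read, the sketch below parses rows in the hedge_funds.csv layout and collects every CIK to query for each fund. The `all_ciks` helper is hypothetical; the project's own loader may work differently.

```python
import csv
import io

# Two rows in the hedge_funds.csv layout shown above.
SAMPLE = '''"CIK","Fund","Manager","Denomination","CIKs"
"0001418814","ValueAct","Jeffrey Ubben","ValueAct Holdings, L.P.","0001418812"
"0001067983","Berkshire Hathaway","Warren Buffett","Berkshire Hathaway Inc",""
'''


def all_ciks(row: dict) -> list[str]:
    """Primary CIK first, followed by any additional comma-separated CIKs."""
    extra = [cik.strip() for cik in row.get("CIKs", "").split(",") if cik.strip()]
    return [row["CIK"], *extra]


funds = {row["Fund"]: all_ciks(row) for row in csv.DictReader(io.StringIO(SAMPLE))}
print(funds["ValueAct"])            # ['0001418814', '0001418812']
print(funds["Berkshire Hathaway"])  # ['0001067983']
```

CIKs are kept as strings so that their leading zeros survive parsing.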
The AI Financial Analyst's primary goal is to identify stocks with the highest growth potential based on hedge fund activity. It achieves this by calculating a "Promise Score" for each stock. This score is a weighted average of various metrics derived from 13F filings. The AI's first critical task is to act as a strategist, dynamically defining the heuristic by assigning the optimal weights for these metrics based on the market conditions of the selected quarter. Its second task is to provide quantitative scores (e.g., momentum, risk) for the top-ranked stocks.
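The weighted-average idea behind the "Promise Score" can be sketched in a few lines. The metric names, weights, and values below are invented for illustration; in the tool itself, the weights are chosen by the AI model for the selected quarter and the metrics come from the 13F data.

```python
def promise_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the metrics named in `weights`."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # A metric missing for a stock counts as 0.0 in this sketch.
    weighted = sum(w * metrics.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


# Hypothetical weights and normalized (0..1) metrics.
weights = {"net_buyers": 0.5, "new_positions": 0.3, "portfolio_pct_delta": 0.2}
stocks = {
    "AAA": {"net_buyers": 0.8, "new_positions": 0.5, "portfolio_pct_delta": 0.2},
    "BBB": {"net_buyers": 0.4, "new_positions": 0.9, "portfolio_pct_delta": 0.1},
}

ranking = sorted(stocks, key=lambda t: promise_score(stocks[t], weights), reverse=True)
print(ranking)  # ['AAA', 'BBB']
```

Dividing by the total weight means the weights do not have to sum to 1, which is convenient when they are produced by a model rather than set by hand.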
The models included in database/models.csv have been selected because they have demonstrated the best performance and reliability for these specific tasks. Through experimentation, they have proven effective at interpreting the prompts and providing insightful, well-structured responses.
Note on Meta's llama-3.3-70b-versatile: while it can occasionally be less precise in defining the heuristic for the "Promise Score" compared to other top-tier models, it remains a valuable option. Its exceptional speed and lightweight nature make it ideal for rapid experimentation and iterative analysis, providing a useful trade-off between accuracy and performance. As the AI landscape evolves, it is expected that this model will eventually be replaced by newer alternatives that offer similar or better speed and efficiency.

Note on xAI's Grok-3: This tool now supports GitHub Models, which provides access to Grok-3 and other next-generation models like GPT-5 and Llama 4. This integration allows for state-of-the-art financial reasoning and due diligence directly through your GitHub account.

Note on OpenRouter: OpenRouter was initially included because it offered free access to top-tier models; while some are no longer free, you can still use it with this tool if you have an existing API key.
You can easily add or change the AI models used for analysis by editing the database/models.csv file. This allows you to experiment with different Large Language Models (LLMs) from supported providers.
To add a new model, open database/models.csv and add a new row with the following columns:
- ID: The specific model identifier as required by the provider's API.
- Description: A brief, user-friendly description that will be displayed in the selection menu.
- Client: The provider of the model. Must be one of GitHub, Google, Groq, HuggingFace, or OpenRouter.
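A new row can also be prepared programmatically. The sketch below validates the Client value against the allowed providers and writes the row with the standard csv module; the helper name, the description text, and the quoted-field style (copied from hedge_funds.csv) are assumptions, and it writes to an in-memory buffer rather than to database/models.csv.

```python
import csv
import io

ALLOWED_CLIENTS = {"GitHub", "Google", "Groq", "HuggingFace", "OpenRouter"}
FIELDS = ["ID", "Description", "Client"]


def model_row(model_id: str, description: str, client: str) -> dict:
    """Build a models.csv row, rejecting unknown providers."""
    if client not in ALLOWED_CLIENTS:
        raise ValueError(f"unsupported client: {client!r}")
    return {"ID": model_id, "Description": description, "Client": client}


# In-memory stand-in for the real file, so the example has no side effects.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerow(model_row("llama-3.3-70b-versatile", "Fast Llama model", "Groq"))

print(buffer.getvalue())
```

Validating the provider before writing catches typos such as "Hugging Face" (with a space) early.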
Each provider publishes an official list of its available models, which is where to look up the exact ID.
It's crucial to understand the inherent limitations of tracking investment strategies solely through SEC filings:
| Limitation | Impact | Mitigation |
|---|---|---|
| Filing Delay | Data can be 45+ days old | Focus on long-term strategies |
| Incomplete Picture | Only US long positions shown | Use as part of broader analysis |
| No Short Positions | Missing hedge information | Consider reported positions carefully |
| Limited Scope | No non-US stocks or other assets | Supplement with additional data |
Many tracking websites rely solely on quarterly 13F filings, which means their data can be over 45 days old and miss many significant trades. Non-quarterly filings like 13D/G and Form 4 are often ignored because they are more complex to process and merge.
This tracker helps overcome that limitation by integrating multiple filing types. When analyzing the most recent quarter, the tool automatically incorporates the latest data from 13D/G and Form 4 filings. As a result, the holdings, deltas, and portfolio percentages reflect not just the static 13F snapshot, but also any significant trades that have occurred since. This provides a more dynamic and complete picture of institutional activity.
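One way to picture this merge: start from the 13F snapshot and let any later non-quarterly filing override the position for its ticker. The data shapes and the `apply_non_quarterly` helper below are hypothetical, and the sketch assumes each filing reports the total shares held after the transaction.

```python
from datetime import date


def apply_non_quarterly(snapshot: dict, filings: list[dict], quarter_end: date) -> dict:
    """Overlay filings dated after quarter-end onto the 13F snapshot."""
    holdings = dict(snapshot)  # leave the original snapshot untouched
    # Process in date order so the most recent filing wins.
    for filing in sorted(filings, key=lambda f: f["date"]):
        if filing["date"] > quarter_end:
            holdings[filing["ticker"]] = filing["shares"]
    return holdings


# Invented example data: ticker -> shares held at quarter-end.
snapshot_13f = {"AAA": 1000, "BBB": 500}
later_filings = [
    {"ticker": "AAA", "shares": 1500, "date": date(2025, 4, 20)},
    {"ticker": "CCC", "shares": 300, "date": date(2025, 5, 2)},
    {"ticker": "BBB", "shares": 0, "date": date(2025, 3, 15)},  # predates the snapshot
]

merged = apply_non_quarterly(snapshot_13f, later_filings, date(2025, 3, 31))
print(merged)  # {'AAA': 1500, 'BBB': 500, 'CCC': 300}
```

Filings dated on or before quarter-end are ignored because the 13F snapshot already reflects them.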
This repository includes a GitHub Actions workflow (.github/workflows/filings-fetch.yml) designed to keep your data effortlessly up-to-date by automatically fetching the latest SEC filings.
- Scheduled Runs: The workflow runs automatically to check for new 13F, 13D/G, and Form 4 filings from the funds you are tracking (hedge_funds.csv). It runs four times a day from Monday to Friday (at 01:30, 13:30, 17:30, and 21:30 UTC) and once on Saturday (at 04:00 UTC).
- Safe Branching Strategy: Instead of committing directly to your main branch, the workflow pushes all new data to a dedicated branch named automated/filings-fetch.
- User-Controlled Merging: This approach gives you full control. You can review the changes committed by the bot and then merge them into your main branch whenever you're ready. This prevents unexpected changes and allows you to manage updates at your own pace.
- Automated Alerts: If the script encounters a non-quarterly filing where it cannot identify the fund owner based on your hedge_funds.csv configuration, it will automatically open a GitHub Issue in your repository, alerting you to a potential data mismatch that needs investigation.
- Fork the Repository: Create your own fork of this project on GitHub.
- Enable Actions: GitHub Actions are typically enabled by default on forked repositories. You can verify this under the Actions tab of your fork.
- Configure Secrets: For the workflow to resolve tickers and create issues, add your FINNHUB_API_KEY as a repository secret. In your forked repository, go to Settings > Secrets and variables > Actions to add it.
| Category | Technology |
|---|---|
| Core | Python 3.13+, pipenv |
| Web Scraping | Requests, Beautiful Soup, lxml |
| Reliability | Tenacity (Smart retries for API rate-limiting and AI responses) |
| Config | python-dotenv |
| Data Processing | pandas, csv |
| Stocks Libraries | Finnhub-Stock-API, FinanceDatabase |
| Gen AI | python-toon, Google Gen AI SDK, OpenAI |
- Bug Reports
- Feature Requests
- Fork & PR
- Share on X or LinkedIn
This tool is in active development, and your input is valuable. If you have any suggestions or ideas for new features, please feel free to get in touch.
- SEC Developer Resources
- SEC: Frequently Asked Questions About Form 13F
- SEC: Guidance on Beneficial Ownership Reporting (Sections 13D/G)
- Wikipedia: Global Industry Classification Standard
- MSCI: Global Industry Classification Standard (GICS)
- S&P Global: GICS Structure & Methodology
- CUSIP (Committee on Uniform Security Identification Procedures)
- Modern Portfolio Theory (MPT)
This project began as a fork of sec-web-scraper-13f by Gary Pang. The original tool provided a solid foundation for scraping 13F filings from the SEC's EDGAR database. It has since been significantly re-architected and expanded into a comprehensive analysis platform, incorporating multiple filing types, AI-driven insights, and automated data management.
This project is released under the MIT License, an open-source license that grants you the freedom to use, modify, and distribute the software. For the full terms, please see the LICENSE file.





