An AI-powered tool that automatically generates and tests hypotheses on data, providing actionable insights through statistical analysis with real-time streaming responses.
```bash
# Install with uv
uv run git+https://github.com/prudhvi1709/hypoforge-python.git

# Or clone and run locally
git clone https://github.com/prudhvi1709/hypoforge-python.git
cd hypoforge-python
uv run app.py
```

Open http://localhost:8000, configure API settings (⚙️), and start analyzing data.
Hypothesis Forge analyzes your data and generates hypotheses that you can test. It then automatically tests them and provides detailed results with statistical significance, all powered by a FastAPI backend with streaming LLM responses.
```mermaid
flowchart LR
    subgraph "👤 User Actions"
        A[Upload Data<br/>CSV/SQLite/URL]
        B[Configure API<br/>Settings]
    end
    subgraph "🤖 AI Processing"
        C[Generate<br/>Hypotheses]
        D[Test Each<br/>Hypothesis]
        E[Summarize<br/>Results]
    end
    subgraph "💾 Data & Storage"
        F[Load & Process<br/>Data Files]
        G[Store Sessions<br/>Temporarily]
    end

    %% Simple User Flow
    A --> F
    F --> C
    C --> D
    D --> E
    B -.-> C
    B -.-> D

    %% Storage Flow
    F --> G
    G --> D

    %% Styling for clarity
    classDef userAction fill
    classDef aiProcess fill
    classDef dataStore fill
    class A,B userAction
    class C,D,E aiProcess
    class F,G dataStore
```
- Automated Hypothesis Generation: Creates relevant hypotheses based on data context and audience
- Statistical Testing:
  - Automatic selection of appropriate statistical tests
  - Support for t-tests, chi-square, correlation significance tests
  - P-value calculation and interpretation
  - Server-side Python execution for reliable results
- Real-time Streaming: See responses generate in real-time as the AI processes your data
- Interactive Interface:
  - Dynamic results visualization with live updates
  - Dark mode support
  - Mobile-responsive design
  - "Run All" feature to test multiple hypotheses at once
  - Result synthesis for actionable insights
- Configurable API Settings:
  - Frontend settings modal for API configuration
  - Support for any OpenAI-compatible API endpoint
  - Secure localStorage for API credentials
  - Default integration with LLM Foundry
- Multiple Data Formats:
  - CSV files
  - SQLite databases (.sqlite3, .sqlite, .db, .s3db, .sl3)
  - Support for various data types (numeric, categorical, temporal)
- Demo Datasets: Pre-configured datasets for immediate experimentation
- uv-compatible: Uses PEP 723 inline dependency specification
- Streaming responses: Real-time LLM response streaming via Server-Sent Events
- Type safety: Full type hints with Pydantic models
- Secure execution: Server-side Python code execution in controlled environment
- API endpoints (see the streaming client sketch below):
  - `/upload` - File upload and data processing
  - `/generate-hypotheses` - Streaming hypothesis generation
  - `/test-hypothesis` - Streaming hypothesis testing with multi-phase responses
  - `/synthesize` - Streaming results synthesis
- Modern JavaScript: ES6+ with native fetch and streaming APIs
- Settings management: localStorage-based API configuration
- Progressive rendering: Content appears as it streams from the backend
- Responsive design: Bootstrap 5 with dark mode support
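
Any HTTP client that understands Server-Sent Events can consume the streaming endpoints. The sketch below is illustrative only: the request fields (`session_id`, `audience`) and the `data:` frame format are assumptions, not the documented API contract.

```python
# Illustrative client for the streaming /generate-hypotheses endpoint.
# The payload fields and SSE frame format below are assumptions.
import requests

BASE_URL = "http://localhost:8000"

def stream_hypotheses(session_id: str, audience: str = "data analysts"):
    """Yield text chunks as the backend streams them over Server-Sent Events."""
    payload = {"session_id": session_id, "audience": audience}
    with requests.post(f"{BASE_URL}/generate-hypotheses", json=payload, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            # SSE frames look like "data: <chunk>", separated by blank lines
            if line and line.startswith("data: "):
                yield line[len("data: "):]

if __name__ == "__main__":
    for chunk in stream_hypotheses(session_id="demo"):
        print(chunk, end="", flush=True)
```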
- Configure API Settings: Click the gear icon and enter your API credentials
- Load Data: Select a demo dataset or upload your own file
- Generate Hypotheses: Watch as the AI generates relevant hypotheses in real-time
- Test Hypotheses: Click "Test" on any hypothesis to see streaming analysis
- Review Results: See statistical analysis, p-values, and plain English summaries
- Synthesize Insights: Click "Synthesize" to get actionable recommendations
- Hypothesis Generation: See JSON content build up as hypotheses are created
- Hypothesis Testing: Three-phase streaming (an example analysis snippet appears below):
  - Analysis code generation
  - Statistical execution results
  - Plain English summary
- Results Synthesis: Watch markdown recommendations appear progressively
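
To make the first two phases concrete, here is a hand-written example of the kind of analysis code the backend might generate and execute; the dataframe and column names are invented for illustration and are not taken from the actual prompts or demo datasets.

```python
# Invented example of generated analysis code: an independent t-test with
# pandas and scipy, the libraries the backend ships with.
import pandas as pd
from scipy import stats

# Toy data standing in for an uploaded dataset
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "outcome": [3.1, 2.8, 3.4, 3.0, 2.9, 4.2, 4.0, 3.8, 4.5, 4.1],
})

# Compare mean outcome between the two groups
group_a = df.loc[df["group"] == "A", "outcome"]
group_b = df.loc[df["group"] == "B", "outcome"]
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Interpret the p-value at the conventional 0.05 level
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Significant" if p_value < 0.05 else "Not significant")
```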
- FastAPI: Modern async web framework
- uvicorn: ASGI server for production
- pandas: Data manipulation and analysis
- scipy: Statistical computing
- numpy: Numerical computing
- aiohttp: Async HTTP client for LLM API calls
- Bootstrap 5: UI framework and responsive design
- Bootstrap Icons: Icon system
- d3.js: Data processing and CSV parsing
- Marked: Markdown parsing and rendering
- Highlight.js: Code syntax highlighting
- Type Safety: Full type hints throughout Python code (see the model sketch below)
- Error Handling: Comprehensive error handling with user-friendly messages
- Security: API keys stored client-side, never transmitted to backend
- Performance: Streaming responses for immediate feedback
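
As one small example of the Pydantic-based typing, a request/response pair might look like the sketch below; the model names and fields are hypothetical, not the ones defined in app.py.

```python
# Hypothetical request/response models; the actual models in app.py may
# use different names and fields.
from pydantic import BaseModel

class TestHypothesisRequest(BaseModel):
    session_id: str
    hypothesis: str

class HypothesisResult(BaseModel):
    hypothesis: str
    p_value: float
    significant: bool
    summary: str
```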
Configure through the frontend settings modal:
- API Base URL: Any OpenAI-compatible endpoint
- API Key: Your authentication token
- Model Name: The LLM model to use
Settings are stored securely in browser localStorage and never sent to the backend server.
No server-side environment variables needed. All configuration is handled through the frontend interface.
Included datasets for immediate experimentation:
- EHR Data: Electronic health records for pharmaceutical analysis
- Tourist Spend: Tourism economic data
- Card Transactions: Financial transaction patterns
- Employee Data: HR and workforce analytics
- Marvel Powers: Character abilities analysis
- World Cities: Geographic and demographic data
- NBA Games: Sports statistics and performance
- Craft Beer: Brewery and product analysis
- Atherosclerosis: Medical research data
- Fork the repository
- Create a feature branch
- Make your changes
- Test with `uv run app.py`
- Submit a pull request
For issues and questions:
- GitHub Issues: Report bugs or request features
- Documentation: This README and inline code documentation
This project includes comprehensive automated tests using pytest. The test suite covers unit tests, integration tests, and end-to-end workflows.
All Python files include inline requirements that work with uv, allowing you to run tests without any setup:
```bash
# Run all tests (auto-installs dependencies)
uv run pytest
# or
make uv-test

# Note: Always use pytest to run tests, not the test files directly

# Run the main application
uv run app.py
# or
make uv-run

# Test that uv works
make uv-verify
```

Alternatively, without uv, install and run the tests the traditional way:

- Install test dependencies:

  ```bash
  pip install -e ".[test]"  # or make install-test
  ```

- Run all tests:

  ```bash
  pytest  # or make test
  ```

- Run specific test types:

  ```bash
  # Unit tests only
  make test-unit
  # Integration tests only
  make test-integration
  # Fast tests (excluding slow ones)
  make test-fast
  ```

- Run tests with coverage:

  ```bash
  make test-coverage
  # Opens HTML coverage report
  make coverage-html
  ```

- Run specific tests:

  ```bash
  # Run a specific test file
  make test-file FILE=tests/test_all.py
  make test-pattern PATTERN=test_load_data
  ```
### Test Structure
- `tests/test_all.py` - Comprehensive test suite covering all functionality
- `tests/conftest.py` - Test fixtures and configuration
### Test Categories
- **Unit Tests**: Test individual functions and components
- **Integration Tests**: Test complete workflows and component interactions
- **Performance Tests**: Test system behavior under load (marked as `slow`)
### Writing Tests
The test suite uses pytest with the following conventions:
- Test files start with `test_`
- Test functions start with `test_`
- Use descriptive test names explaining what is being tested
- Group related tests in classes
- Use appropriate pytest markers (`@pytest.mark.integration`, `@pytest.mark.slow`)
Example test structure:
```python
import pytest


class TestAPIEndpoints:
    def test_load_csv_data(self, client, sample_csv_file):
        """Test CSV data loading"""
        response = client.post("/load-data", json={"source": sample_csv_file})
        assert response.status_code == 200
        # ... more assertions


class TestUtilityFunctions:
    def test_data_loading_functions(self, sample_csv_file):
        """Test data loading utility functions"""
        # ... test implementation


class TestIntegration:
    @pytest.mark.integration
    def test_complete_csv_workflow(self, client, sample_csv_file):
        """Test complete workflow: CSV loading → hypothesis testing"""
        # ... test implementation
```
Tests are designed to run in CI environments and include (see the fixture sketch after this list):
- Automatic cleanup of test artifacts
- Isolated test sessions
- Comprehensive error handling
- Cross-platform compatibility
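
The isolation and cleanup above are typically provided by fixtures. A minimal sketch of what such fixtures could look like is shown below, assuming the FastAPI instance is exposed as `app` in `app.py`; the real fixtures live in `tests/conftest.py` and may differ.

```python
# Hypothetical conftest.py-style fixtures for isolated, self-cleaning tests.
import pandas as pd
import pytest
from fastapi.testclient import TestClient

from app import app  # assumption: the FastAPI instance is named `app`

@pytest.fixture
def client():
    """Fresh test client per test so sessions do not leak between tests."""
    with TestClient(app) as c:
        yield c

@pytest.fixture
def sample_csv_file(tmp_path):
    """Small CSV written to a pytest temp dir; removed automatically afterwards."""
    path = tmp_path / "sample.csv"
    pd.DataFrame({"group": ["A", "B"], "outcome": [1.0, 2.0]}).to_csv(path, index=False)
    return str(path)
```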
Each Python file in this project includes inline dependencies for uv:
```python
# /// script
# dependencies = [
#   "pytest>=7.4.0",
#   "fastapi>=0.104.1",
#   "pandas>=2.1.3",
# ]
# ///
```

This allows any Python file to be run directly with `uv run <filename>.py` without requiring a separate virtual environment or dependency installation step.
For more testing commands, run:
```bash
make help
```