43 changes: 29 additions & 14 deletions CLAUDE.md

## 7. Local Quality Gate (Required Before Every Commit or Push)

**This gate is mandatory — no exceptions.** All checks must pass with zero errors before any `git push` or PR submission. CI runs the same checks; a failing CI is a sign the gate was skipped locally. Fix locally first, then push.

AI agents must run these steps in order and fix every failure before proceeding. Do not open or update a PR until the full gate passes locally.

### Step 1 — Auto-fix what can be fixed automatically
```bash
# Auto-fix lint violations and apply formatting
ruff check --fix .
ruff format .
```

### Step 2 — Verify everything is clean
```bash
# Lint and format
ruff check .
ruff format --check .

# Type checking
mypy .

# Tests with coverage (must stay above 80%)
pytest --cov=. --cov-report=term-missing --cov-fail-under=80 tests/
```

All four commands must exit with code 0. If any fail, fix the reported errors and re-run the full gate from Step 1 before pushing.

### Common failure patterns and fixes

| Failure | Fix |
|---|---|
| `ruff` import order / formatting | Run `ruff check --fix . && ruff format .` |
| `ruff` unused import (`F401`) | Remove the import or add `# noqa: F401` only if intentional |
| `ruff` unused variable (`F841`, `B007`) | Remove assignment or rename to `_varname` |
| `ruff` line too long (`E501`) | Extract to a named variable or add file to `per-file-ignores` in `pyproject.toml` |
| `mypy` type mismatch | Fix the type annotation or add an explicit cast; do not use `# type: ignore` without a comment explaining why |
| `pytest` test failure | Fix the code or test — never skip or delete a failing test |
| `pytest` coverage below 80% | Add tests for the new code path |
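
As an example of the `E501` row, an over-long line is usually best split by naming its parts. The `describe` helper below is a hypothetical illustration, not project code:

```python
def describe(track: dict[str, str]) -> str:
    # Before (over the line limit):
    # return f"{track['artist']} - {track['title']} ({track['album']}, {track['year']})"
    #
    # After: extract named variables, each comfortably within the limit.
    header = f"{track['artist']} - {track['title']}"
    detail = f"({track['album']}, {track['year']})"
    return f"{header} {detail}"
```

This keeps the logic identical while giving each fragment a descriptive name, which is generally preferable to sprinkling `per-file-ignores` entries.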

### Installing tools

Install all tools into the venv if not present:
```bash
pip install ruff mypy pytest pytest-cov types-requests
```
103 changes: 92 additions & 11 deletions README.md

## Project Structure

```
autobiographer.py        # Last.fm API fetch + data save CLI
visualize.py             # Streamlit dashboard (assembles views from plugins)
analysis_utils.py        # Shared data processing and caching logic
core/
    broker.py            # DataBroker: loads plugins, merges what-when + where-when
plugins/
    sources/
        base.py          # SourcePlugin ABC + validate_schema()
        __init__.py      # REGISTRY + @register decorator + load_builtin_plugins()
        lastfm/loader.py # Last.fm source plugin
        swarm/loader.py  # Foursquare/Swarm source plugin
notebooks/               # Jupyter notebooks for custom analysis
tools/                   # Utility scripts (audio muxing, etc.)
data/                    # Local data storage (CSVs, cache, Swarm JSON exports)
tests/                   # Pytest suite (80%+ coverage)
```

## Plugin System

Data sources are implemented as `SourcePlugin` subclasses and self-register via a decorator. The `DataBroker` loads them at runtime and makes their data available to the dashboard.

### Adding a source plugin

**1. Create the plugin file**, e.g. `plugins/sources/letterboxd/loader.py`:

```python
from __future__ import annotations

from typing import Any

import pandas as pd

from plugins.sources import register
from plugins.sources.base import SourcePlugin, validate_schema


@register
class LetterboxdPlugin(SourcePlugin):
    PLUGIN_TYPE = "what-when"  # or "where-when"
    PLUGIN_ID = "letterboxd"
    DISPLAY_NAME = "Letterboxd Film Diary"

    def get_config_fields(self) -> list[dict[str, Any]]:
        return [{"key": "data_path", "label": "Letterboxd CSV export", "type": "path"}]

    def load(self, config: dict[str, Any]) -> pd.DataFrame:
        # Load your data, then map to the normalized schema columns:
        #   what-when:  timestamp, label, sublabel, category, source_id
        #   where-when: timestamp, lat, lng, place_name, place_type, source_id
        df = ...  # your loading logic
        df = df.assign(
            label=df["film"],
            sublabel=df["director"],
            category=df["year"],
            source_id=self.PLUGIN_ID,
        )
        validate_schema(df, self.PLUGIN_TYPE)
        return df
```

**2. Register it** in `plugins/sources/__init__.py`:

```python
def load_builtin_plugins() -> None:
    import plugins.sources.lastfm.loader  # noqa: F401
    import plugins.sources.swarm.loader  # noqa: F401
    import plugins.sources.letterboxd.loader  # noqa: F401  ← add this
```

**3. Add tests** in `tests/test_source_plugins.py` using the existing `TestLastFmPlugin` class as a template. Mock your data loader to keep tests fast and offline.
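
The test might look roughly like this sketch. `FakeLetterboxdPlugin` is a hypothetical stand-in defined inline so the snippet is self-contained; in the repo you would import your real plugin class and mock only its file I/O:

```python
from typing import Any

import pandas as pd


class FakeLetterboxdPlugin:
    """Stand-in for a real plugin, with the data source mocked in memory."""

    PLUGIN_ID = "letterboxd"
    PLUGIN_TYPE = "what-when"

    def load(self, config: dict[str, Any]) -> pd.DataFrame:
        # Mocked rows instead of reading config["data_path"] from disk
        raw = pd.DataFrame(
            {
                "timestamp": pd.to_datetime(["2024-01-01"]),
                "film": ["Alien"],
                "director": ["Ridley Scott"],
                "year": [1979],
            }
        )
        return raw.assign(
            label=raw["film"],
            sublabel=raw["director"],
            category=raw["year"],
            source_id=self.PLUGIN_ID,
        )


def test_letterboxd_maps_to_what_when_schema() -> None:
    df = FakeLetterboxdPlugin().load({})
    for col in ("timestamp", "label", "sublabel", "category", "source_id"):
        assert col in df.columns
```

Keeping the mocked frame tiny makes schema assertions fast and deterministic.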

### Plugin types and required schema columns

| `PLUGIN_TYPE` | Required columns |
|---|---|
| `what-when` | `timestamp`, `label`, `sublabel`, `category`, `source_id` |
| `where-when` | `timestamp`, `lat`, `lng`, `place_name`, `place_type`, `source_id` |

`validate_schema()` raises `ValueError` at load time if any required column is absent, so errors surface immediately.
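
The real `validate_schema()` lives in `plugins/sources/base.py`; a minimal equivalent, shown here purely for illustration, could be:

```python
import pandas as pd

# Required columns per plugin type (mirrors the table above)
REQUIRED_COLUMNS = {
    "what-when": ["timestamp", "label", "sublabel", "category", "source_id"],
    "where-when": ["timestamp", "lat", "lng", "place_name", "place_type", "source_id"],
}


def validate_schema(df: pd.DataFrame, plugin_type: str) -> None:
    """Raise ValueError if df lacks any required column for plugin_type."""
    missing = [c for c in REQUIRED_COLUMNS[plugin_type] if c not in df.columns]
    if missing:
        raise ValueError(f"{plugin_type} source is missing columns: {missing}")
```

Failing fast at load time means a misconfigured plugin surfaces as a clear error rather than a confusing `KeyError` deep inside a dashboard view.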

### Using the DataBroker directly

```python
from plugins.sources import load_builtin_plugins, REGISTRY
from core.broker import DataBroker

load_builtin_plugins()
broker = DataBroker()
broker.load(REGISTRY["lastfm"](), {"data_path": "data/tracks.csv"})
broker.load(REGISTRY["swarm"](), {"swarm_dir": "data/swarm"})

df = broker.get_merged_frame(assumptions=my_assumptions) # temporally joined
broker.is_type_available("where-when") # → True
```

## Contributing

Contributions are welcome! Please follow the engineering standards in `CLAUDE.md`:
1. Create a descriptive feature branch using [Conventional Commits](https://www.conventionalcommits.org/) (`feat:`, `fix:`, `perf:`, etc.).
2. Implement your changes with test coverage (80% minimum).
3. Run the local quality gate before pushing: `ruff check . && ruff format --check . && mypy . && pytest --cov=. --cov-fail-under=80 tests/`
4. Submit a Pull Request — the PR title must also follow Conventional Commits format.

## License

Binary file added assets/dashboard_mockup.png
Empty file added core/__init__.py
Empty file.
131 changes: 131 additions & 0 deletions core/broker.py
"""DataBroker: loads, aligns, and merges data from registered source plugins.

The DataBroker is the central data coordinator. It holds loaded DataFrames
from each plugin, tracks which source types are available, and provides a
merged DataFrame that combines what-when and where-when sources via temporal
join (powered by apply_swarm_offsets for the Swarm/Last.fm case).

Typical usage::

from plugins.sources import load_builtin_plugins, REGISTRY
from plugins.sources.base import SourcePlugin
from core.broker import DataBroker

load_builtin_plugins()

broker = DataBroker()
broker.load(REGISTRY["lastfm"], {"data_path": "data/tracks.csv"})
broker.load(REGISTRY["swarm"], {"swarm_dir": "data/swarm"})

df = broker.get_merged_frame(assumptions=assumptions)
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Any

import pandas as pd

if TYPE_CHECKING:
    from plugins.sources.base import SourcePlugin


class DataBroker:
    """Coordinates loading and merging of multiple source plugins.

    Attributes:
        _sources: Loaded DataFrames keyed by plugin PLUGIN_ID.
        _available_types: Distinct PLUGIN_TYPE values of loaded sources.
    """

    def __init__(self) -> None:
        self._sources: dict[str, pd.DataFrame] = {}
        self._available_types: list[str] = []

    @property
    def available_types(self) -> list[str]:
        """Return list of distinct plugin types currently loaded.

        Returns:
            List of strings, each either "what-when" or "where-when".
        """
        return list(self._available_types)

    def load(self, plugin: SourcePlugin, config: dict[str, Any]) -> pd.DataFrame:
        """Load a source plugin and store the resulting DataFrame.

        Args:
            plugin: An instantiated SourcePlugin subclass.
            config: Config dict matching the plugin's get_config_fields() keys.

        Returns:
            The DataFrame returned by the plugin (may be empty on failure).
        """
        df = plugin.load(config)
        self._sources[plugin.PLUGIN_ID] = df
        if plugin.PLUGIN_TYPE not in self._available_types:
            self._available_types.append(plugin.PLUGIN_TYPE)
        return df

    def get_frame(self, plugin_id: str) -> pd.DataFrame:
        """Return the raw loaded DataFrame for a given plugin.

        Args:
            plugin_id: The PLUGIN_ID of the desired source.

        Returns:
            The loaded DataFrame, or an empty DataFrame if not loaded.
        """
        return self._sources.get(plugin_id, pd.DataFrame())

    def get_frames(self) -> dict[str, pd.DataFrame]:
        """Return all loaded DataFrames keyed by plugin ID.

        Returns:
            Dict of {plugin_id: DataFrame}.
        """
        return dict(self._sources)

    def get_merged_frame(
        self, assumptions: dict[str, Any] | None = None
    ) -> pd.DataFrame:
        """Return a merged DataFrame combining what-when and where-when sources.

        If both a what-when source (Last.fm) and a where-when source (Swarm)
        are loaded, applies temporal merging via apply_swarm_offsets() to
        annotate what-when records with location and timezone data.

        If only a what-when source is loaded, returns it unmodified.
        If no what-when source is loaded, returns an empty DataFrame.

        Args:
            assumptions: Location assumptions dict from load_assumptions().
                Required for the Swarm temporal join; pass None to
                skip location enrichment.

        Returns:
            Merged DataFrame, or the raw what-when frame if no where-when
            source is available.
        """
        lastfm_df = self._sources.get("lastfm", pd.DataFrame())

        if lastfm_df.empty:
            return lastfm_df

        swarm_df = self._sources.get("swarm", pd.DataFrame())

        if swarm_df.empty or assumptions is None:
            return lastfm_df

        from analysis_utils import apply_swarm_offsets

        return apply_swarm_offsets(lastfm_df, swarm_df, assumptions)

    def is_type_available(self, plugin_type: str) -> bool:
        """Check whether any loaded source provides the given plugin type.

        Args:
            plugin_type: Either "what-when" or "where-when".

        Returns:
            True if at least one loaded source has the given type.
        """
        return plugin_type in self._available_types
Empty file added plugins/__init__.py
Empty file.
46 changes: 46 additions & 0 deletions plugins/sources/__init__.py
"""Source plugin registry.

Plugins self-register by applying the @register decorator to their
SourcePlugin subclass. The registry is keyed by PLUGIN_ID.

Usage::

from plugins.sources import REGISTRY, register
from plugins.sources.base import SourcePlugin

@register
class MyPlugin(SourcePlugin):
PLUGIN_ID = "my_plugin"
...
"""

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
from plugins.sources.base import SourcePlugin

REGISTRY: dict[str, type[SourcePlugin]] = {}


def register(cls: type[SourcePlugin]) -> type[SourcePlugin]:
"""Register a SourcePlugin subclass in the global registry.

Args:
cls: SourcePlugin subclass to register.

Returns:
The class unchanged (decorator pattern).
"""
REGISTRY[cls.PLUGIN_ID] = cls
return cls


def load_builtin_plugins() -> None:
"""Import built-in plugins so they self-register via @register.

Call this once at application startup before reading REGISTRY.
"""
import plugins.sources.lastfm.loader # noqa: F401
import plugins.sources.swarm.loader # noqa: F401