A robust, enterprise-grade automation tool designed to organise your Spotify "Liked Songs" into genre-specific playlists. Engineered for high reliability, it navigates the complexities of the 2026 Spotify API landscape using intelligent synchronisation, strict rate limiting, and persistent state management.
- Smart Incremental Sync: The sorter tracks your library state in a local SQLite database, processing only new tracks since the last run. It handles large libraries (10,000+ songs) efficiently by using Spotify's
snapshot_idto avoid redundant API calls. - Interactive CLI: Re-architected into 8 segmented, independent operations. You can safely stop, resume, or replay any step (fetching, tagging, creating playlists) without locking up the entire application.
- Precision Tagging: Utilises the Last.fm API to fetch granular song and artist tags. It employs a multi-tiered fallback strategy (Track Tags -> Artist Tags -> Cache) to ensure high classification accuracy.
- Enterprise-Grade Rate Limiting: Implements a Leaky Bucket algorithm with jitter and exponential backoff. It proactively respects Spotify's strict "Development Mode" limits and automatically handles
HTTP 429 Too Many Requestsresponses withRetry-Afterheaders. - Robust Persistence: All state—including processed tracks, API rate limits, and tag caches—is stored in a local SQLite database (
state.db) with Write-Ahead Logging (WAL) enabled for performance and data integrity. - Non-Destructive Operation: Appends tracks to playlists safely. It will never delete or re-order your existing songs, allowing you to manually curate playlists alongside the automation.
This application is built on a modular, service-oriented architecture designed for maintainability and scalability.
graph TD
subgraph Core Logic
Main[main.py] -->|Orchestrates| Client[spotify_client.py]
Main -->|Classifies| Sorter[sorter.py]
Main -->|Persists State| State[app_state.py]
Main -->|Enriches| LastFM[lastfm_client.py]
end
subgraph Infrastructure
Client -->|Rate Limited Requests| Session[rate_limiter.py]
LastFM -->|Rate Limited Requests| Session
Session -->|HTTP/HTTPS| SpotifyAPI[Spotify Web API]
Session -->|HTTP/HTTPS| LastFM_API[Last.fm API]
end
subgraph Persistence
State -->|Reads/Writes| DB[(SQLite: state.db)]
end
subgraph Configuration
Config[config.py] -.->|Settings| Main
Config -.->|Creds| Client
end
| Component | Responsibility | Key Technical Details |
|---|---|---|
main.py |
Orchestration | Manages the interactive 8-step CLI loop. Connects user input to independent phase execution. |
spotify_client.py |
API Gateway | Uses pure requests with custom error handling. Implements batched operations (fetching 50 liked songs, adding 100 tracks) to minimise network RTT. |
rate_limiter.py |
Traffic Control | Implements a thread-safe LeakyBucket. Spotify: 1 req/30s (Safety). Last.fm: 5 req/sec. Adds random jitter (10-20%) to mimic human behaviour. |
app_state.py |
Persistence Layer | A raw SQL wrapper around sqlite3. Manages tables for likedSongs, artistTagsCache, and snapshots. Ensures independence across the 8 operations. |
lastfm_client.py |
Metadata Provider | Fetches top tags for tracks/artists. Caches results in SQLite to reduce external API dependency by 90%+ on subsequent runs. |
Spotify's API (especially in non-commercial usage) can be aggressive with rate limits.
- Solution: We moved from a reactive "try and catch error" approach to a proactive Leaky Bucket. The application "pays" for every request from a local bucket. If the bucket is empty, it sleeps.
- Resilience: A custom
requests.HTTPAdapterintercepts429errors and sleeps for the exact duration specified in theRetry-Afterheader, plus a safety buffer. Note: For very large libraries, fetches and syncs may take hours due to these strict enforced wait times.
Syncing 10,000 songs linearly is too slow, error-prone, and impossible to debug if it crashes on song 9,999.
- Segmented CLI: The application is broken into 8 independent operations. You can pull songs today, classify them tomorrow, and push them to Spotify the next day.
- Snapshots: We store the
snapshot_idof every target playlist. If the snapshot hasn't changed on Spotify, we skip fetching its tracks entirely, saving huge amounts of time. - Incremental Checkpointing: Resumable offsets prevent data loss. The fetcher stops immediately once it sees a song older than the last sync date.
Spotify removed public genre data from their API years ago.
- Solution: We integrate Last.fm. However, querying Last.fm for every single track is slow.
- Optimisation: We implement a Two-Layer Cache:
- Memory: For the current runtime session.
- SQLite: Permanent storage of Artist->Genre mappings.
- Result: After the initial run, 95% of tags are served locally from disk in milliseconds.
- Python 3.9+
- Spotify Premium Account
- Last.fm API Key (Free to apply)
git clone https://github.com/sahmed0/spotify-sorter.git
cd spotify-sorter
pip install -r requirements.txtCreate a .env file in the root directory (process is detailed in .env.example):
SPOTIPY_CLIENT_ID=your_spotify_client_id
SPOTIPY_CLIENT_SECRET=your_spotify_client_secret
SPOTIPY_REDIRECT_URI=http://127.0.0.1:8888/callback
LASTFM_API_KEY=your_lastfm_api_keyRun the script manually to open the interactive CLI:
python main.pyThe 8-Operation Sequence: The tool is broken down into 8 independent operations. You can run them sequentially or resume from any point:
- Authenticate: Generates your initial Refresh Token via a manual, terminal-based OAuth flow.
- Fetch Liked Songs: Downloads your liked songs to the local
state.db. Resumable if interrupted. - Fetch Last.fm Tags: Enriches your downloaded tracks with Last.fm genres locally.
- Sort Songs: Categorises songs into buckets based on your
config.pygenre mappings. - Fetch User Playlists: Caches your existing Spotify playlists into the database.
- Create Missing Playlists: Analyses
config.pyagainst your cache and creates missing playlists on Spotify. - Snapshot Playlist: Backs up the current state of a playlist locally to prevent redundant future API calls.
- Sync & Add Songs: Pushes the newly sorted, missing tracks to their respective Spotify playlists.
Dry Run Mode:
To test without modifying your library or consuming Spotify rate limits, set DRY_RUN = True in config.py or use the environment variable. The script will simulate all sorting logic.
This repository includes a GitHub Actions workflow (.github/workflows/sync.yml) designed to run the sorter automatically and update your playlists at a regular interval.
- It caches the
state.dbfile between runs to maintain incremental sync state. - It uses GitHub Secrets to securely inject your API credentials.
Copyright (c) 2026 Sajid Ahmed. All Rights Reserved.
This repository is a Proprietary Project.
While I am a strong supporter of Open Source Software, this specific codebase represents a significant personal investment of time and effort and is therefore provided with the following restrictions:
- Permitted: Viewing, forking (within GitHub only), and local execution for evaluation and personal, non-commercial usage only.
- Prohibited: Modification, redistribution, commercial use, and AI/LLM training.
For the full legal terms, please see the LICENSE file.

