feat: Feedly Read Later list as a data source #30

@jschloman

Overview

Add support for importing a user's Feedly Read Later list as a data source in Autobiographer. This allows users to include articles they've saved for later reading as part of their personal history/timeline.

Motivation

Feedly's Read Later list represents a meaningful signal of a user's interests and reading habits over time. Ingesting this data enables richer autobiographical context alongside existing sources (Last.fm, Foursquare/Swarm, etc.).

Core Design Principles

1. Download-then-Display

All source plugins follow a strict two-phase model:

  1. Download phase (run manually / offline): a CLI script fetches data from the external source and writes it to a local file in data/.
  2. Display phase (runtime): the plugin reads only from that local file. No outbound network calls are made at Streamlit runtime.

2. Data Sovereignty

Each SourcePlugin is the sole authority over its own data. A plugin:

  • Knows: its own schema, how to load its raw data, and how to normalize it into a clean DataFrame.
  • Does not know: how its output will be filtered, joined, merged, or correlated with any other source.

No cross-source logic, foreign-key awareness, or join hints belong inside a plugin. The DataBroker (or equivalent orchestration layer) is the only place where data from multiple sources is combined. This keeps each plugin independently testable and replaceable without ripple effects.

┌──────────────────────┐   ┌──────────────────────┐   ┌──────────────────────┐
│  FeedlyPlugin        │   │  LastFmPlugin        │   │  SwarmPlugin         │
│  fetch() → DataFrame │   │  fetch() → DataFrame │   │  fetch() → DataFrame │
│                      │   │                      │   │                      │
│  No knowledge of     │   │  No knowledge of     │   │  No knowledge of     │
│  other sources       │   │  other sources       │   │  other sources       │
└──────────┬───────────┘   └──────────┬───────────┘   └──────────┬───────────┘
           │                          │                          │
           └──────────────────────────┼──────────────────────────┘
                                      ▼
                               ┌─────────────┐
                               │  DataBroker │  ← joins, merges, correlates
                               └─────────────┘
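As a sketch, the broker-side combination could look like the following. The DataBroker interface is not pinned down in this issue, so `combined_timeline` and the added `source` column are illustrative, not decided API:

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class DataBroker:
    """Hypothetical sketch: the only layer that combines plugin output."""

    plugins: list = field(default_factory=list)

    def combined_timeline(self) -> pd.DataFrame:
        # Each plugin returns only its own canonical DataFrame.
        # Cross-source concerns (tagging rows with their source,
        # sorting the merged timeline) live here, never in a plugin.
        frames = []
        for plugin in self.plugins:
            df = plugin.fetch()
            frames.append(df.assign(source=plugin.source_name))
        return (
            pd.concat(frames, ignore_index=True)
            .sort_values("timestamp")
            .reset_index(drop=True)
        )
```

Because the broker only relies on `fetch()` and `source_name`, any plugin can be swapped or removed without touching the others.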
┌─────────────────────────────────┐     ┌──────────────────────────────────┐
│  DOWNLOAD  (one-time / manual)  │     │  DISPLAY  (Streamlit runtime)    │
│                                 │     │                                  │
│  python -m autobiographer.sync  │────▶│  FeedlyReadLaterPlugin.fetch()   │
│    feedly                       │     │    reads data/feedly_read_later  │
│                                 │     │    .json — no network calls      │
│  Uses AUTOBIO_FEEDLY_TOKEN      │     │                                  │
└─────────────────────────────────┘     └──────────────────────────────────┘

Proposed Implementation

This follows the existing SourcePlugin ABC and the plugin registry pattern introduced in #8.

Download script

A CLI entry-point (e.g. autobiographer/sync/feedly.py) handles all Feedly API interaction:

  • Auth via AUTOBIO_FEEDLY_TOKEN env var
  • GET /v3/streams/contents?streamId=user/{userId}/tag/global.saved with continuation pagination
  • Writes raw response to data/feedly_read_later.json
  • Accepts an optional --export-path flag for users who already have a Feedly JSON export (no API call needed)
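The continuation loop can be kept separate from the HTTP layer so the pagination logic is testable offline. A possible sketch, where `iter_saved_items` and the injected `fetch_page` are illustrative names (a real `fetch_page` would wrap the authenticated GET to `/v3/streams/contents`):

```python
from typing import Callable, Iterator, Optional


def iter_saved_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item from a paginated Feedly stream.

    fetch_page(continuation) returns one decoded JSON page; pass None
    for the first page.
    """
    continuation = None
    while True:
        page = fetch_page(continuation)
        yield from page.get("items", [])
        # Feedly signals "more pages" with a continuation token;
        # its absence means the end of the stream has been reached.
        continuation = page.get("continuation")
        if not continuation:
            break
```

Injecting `fetch_page` keeps all network code out of the loop, so the unit tests for pagination listed in the acceptance criteria can run against canned page dicts.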

Plugin (runtime-only, no network, no cross-source awareness)

class FeedlyReadLaterPlugin(SourcePlugin):
    source_name = "feedly_read_later"

    def fetch(self) -> pd.DataFrame:
        """Load previously downloaded Feedly Read Later data from local JSON.

        The returned DataFrame represents only this source's data in its
        canonical schema. No filtering, joining, or merging with other
        sources is performed here.

        Returns:
            DataFrame with columns: timestamp, title, url, tags, source_feed.

        Raises:
            FileNotFoundError: if the local data file has not been downloaded yet.
        """
        ...
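For illustration, the elided fetch() body could delegate to a loader along these lines. `DATA_PATH` and the `{"items": [...]}` layout of the raw file are assumptions, not decided API:

```python
import json
from pathlib import Path

import pandas as pd

# Assumed location of the file written by the download script.
DATA_PATH = Path("data/feedly_read_later.json")


def load_read_later(path: Path = DATA_PATH) -> pd.DataFrame:
    """Load the locally downloaded Read Later data, or fail clearly."""
    if not path.exists():
        raise FileNotFoundError(
            f"Feedly data not found at {path}; "
            "run the download script first (python -m autobiographer.sync feedly)"
        )
    raw = json.loads(path.read_text(encoding="utf-8"))
    # Assumes the download script stored the raw API payload, whose
    # saved entries live under an "items" key.
    return pd.DataFrame(raw.get("items", []))
```

This keeps the plugin purely file-based and makes the "missing file" acceptance criterion a one-line check to test.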

Output Schema

Column       Type                 Description
timestamp    datetime64[ns, UTC]  When the article was saved
title        str                  Article title
url          str                  Canonical article URL
tags         list[str]            User-applied Feedly tags
source_feed  str                  Feed/publication name
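One plausible mapping from a raw Feedly entry to this schema. The raw-side field names (actionTimestamp, alternate, tags, origin) follow Feedly's entry format as commonly documented, but should be verified against a real payload before being relied on:

```python
import pandas as pd


def normalize_item(item: dict) -> dict:
    """Map one raw Feedly entry onto the plugin's canonical columns."""
    alternates = item.get("alternate") or []
    return {
        # Feedly timestamps are milliseconds since the epoch.
        "timestamp": pd.Timestamp(
            item.get("actionTimestamp", 0), unit="ms", tz="UTC"
        ),
        "title": item.get("title", ""),
        "url": alternates[0].get("href", "") if alternates else "",
        "tags": [t.get("label", "") for t in item.get("tags", [])],
        "source_feed": (item.get("origin") or {}).get("title", ""),
    }
```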

Acceptance Criteria

  • Download script lives separately from the plugin; plugin contains zero HTTP/network code
  • Plugin contains zero references to other source plugins, their schemas, or join keys
  • FeedlyReadLaterPlugin.fetch() raises FileNotFoundError with a clear message if local data file is absent
  • Download script reads auth token from AUTOBIO_FEEDLY_TOKEN; fails clearly if unset
  • Download script handles pagination via continuation token
  • Supports local JSON export as an alternative to live download (--export-path / AUTOBIO_FEEDLY_EXPORT_PATH)
  • Results cached in data/cache/ using existing get_cache_key logic
  • Unit tests cover: successful load from local file, missing file error, malformed JSON
  • Unit tests for download script cover: pagination, missing credentials, API error response
  • Integration test covers plugin registration and DataBroker wiring
  • Full local quality gate passes (ruff, mypy, pytest --cov ≥ 80%)
  • default_assumptions.json.example updated if any new config keys are introduced
  • No personal data, tokens, or URLs hardcoded anywhere

Impact on PR #29

PR #29 (multi-page navigation / components/sidebar.py) loads data via SourcePlugin. That PR should be audited to confirm:

  • No plugin makes outbound calls at Streamlit startup or page render time
  • No page or component passes cross-source context into a plugin's fetch()
  • Consider adding is_data_available() -> bool to the SourcePlugin ABC for clean empty-state handling
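The suggested is_data_available() could be a small default on the ABC. Using a `data_path` attribute as the hook is an assumption about the existing SourcePlugin shape, shown here only as a sketch:

```python
from abc import ABC, abstractmethod
from pathlib import Path

import pandas as pd


class SourcePlugin(ABC):
    """Sketch of the existing ABC extended with an availability check."""

    source_name: str
    data_path: Path  # assumed attribute; the real ABC may differ

    @abstractmethod
    def fetch(self) -> pd.DataFrame: ...

    def is_data_available(self) -> bool:
        # Cheap existence check pages can call before fetch(), so a
        # missing download renders an empty state, not a traceback.
        return self.data_path.exists()
```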

Out of Scope

  • Feedly RSS feed subscription management
  • Writing back to Feedly (marking read, etc.)
  • Other Feedly collections beyond Read Later (follow-up issue)
