Skip to content

Refactor Lifestream integration and improve code structure#61

Open
aquarion wants to merge 20 commits intomainfrom
feature/refactor_steam
Open

Refactor Lifestream integration and improve code structure#61
aquarion wants to merge 20 commits intomainfrom
feature/refactor_steam

Conversation

@aquarion
Copy link
Copy Markdown
Owner

@aquarion aquarion commented Mar 4, 2026

This pull request modernizes the Lifestream project by introducing a new scheduler system, updating documentation, refactoring import scripts, and improving configuration and testing. The changes make the project easier to set up, maintain, and extend, while deprecating legacy approaches and clarifying best practices.

Scheduler & Job Management

  • Added a new APScheduler-based job scheduler with Redis persistence, replacing the legacy cron-based approach. Includes a systemd service file (docs/lifestream-scheduler.service) and updated documentation for job management and monitoring. [1] [2]
  • Updated config.example.ini with a [schedules] section for job configuration, notification settings, and removed legacy memcache config. [1] [2]

Documentation & Project Structure

  • Replaced the old README.textile with a comprehensive new README.md covering features, setup, configuration, running imports, project structure, and testing. [1] [2]

Import Script Refactoring

  • Refactored import scripts (imports/atom.py, imports/atproto_posts.py, imports/destiny2.py, contrib/oyster_import_csv.py) to use the new EntryStore class for database entry management, improved argument parsing, and clarified logging. [1] [2] [3] [4] [5] [6] [7]

Configuration & Testing

  • Updated .flake8 to ignore additional formatting rules and exclude new directories, improving code linting.
  • Added instructions and commands for running tests with pytest in the new documentation.

These changes collectively modernize the project, improve maintainability, and streamline both development and operations.

aquarion added 19 commits March 2, 2026 19:28
…ic into foursquare.py; remove obsolete foursquare_oauth.py
…t code_fetcher as CodeFetcher9000' across multiple modules
…; update documentation and configuration files accordingly
…implement job execution functions for scheduler
…e features, quick start, configuration, and running imports sections
- Consolidated local imports in `steambadges.py`, `switchbot.py`, `tumblr.py`, `tweets.py`, `wordpress.py`, and `wow.py`.
- Enhanced readability by adjusting line breaks and spacing in various files.
- Updated exception handling to be more specific in `tumblr.py` and `wow.py`.
- Added `flake8` as a dependency in `pyproject.toml` for code style enforcement.
- Cleaned up unused imports and improved consistency in `scheduler.py`, `tests/conftest.py`, and other test files.
- Ensured all changes adhere to PEP 8 style guidelines for better maintainability.
print("Go to the following link in your browser:")
print("%s?oauth_token=%s" %
(authorize_url, request_token["oauth_token"]))
print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (password)
as clear text.

Copilot Autofix

AI about 1 month ago

Generally, to fix clear-text logging of sensitive data, ensure that secrets (tokens, passwords, API keys, etc.) are never written directly to logs or console output that may be captured. Either avoid logging them completely, or replace them with redacted or partially masked values.

For this specific code, we want to keep the existing interactive flow (the user must visit the authorization URL) but avoid constructing and printing a single string that embeds the token and could be treated as a logged secret. A simple, low-impact change is:

  • Print the base authorization URL (authorize_url) and the token separately, with clear instructions to the user.
  • Avoid combining them into one formatted URL string containing the token.
  • Do not log the token via logger; only show it in the interactive prompt.

Concretely, in imports/historic.py within tumblrAuth, replace the line:

print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))

with safer messaging like:

print(authorize_url)
print("Then provide this oauth_token when prompted in your browser: [REDACTED]")
print("oauth_token (copy manually):", request_token["oauth_token"])

or similar, so the token is no longer part of a single URL string and is clearly separated. This keeps functionality (user still sees token) while reducing the pattern that static analysis flags and avoiding printing it as part of a URL that might be copy-pasted or logged wholesale. No new imports or helper methods are required.

Suggested changeset 1
imports/historic.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/imports/historic.py b/imports/historic.py
--- a/imports/historic.py
+++ b/imports/historic.py
@@ -51,7 +51,9 @@
 
         request_token = dict(urllib.parse.parse_qsl(content))
         print("Go to the following link in your browser:")
-        print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))
+        print(authorize_url)
+        print("When prompted in your browser, enter the following oauth_token manually:")
+        print("oauth_token:", request_token["oauth_token"])
         print()
 
         accepted = "n"
EOF
@@ -51,7 +51,9 @@

request_token = dict(urllib.parse.parse_qsl(content))
print("Go to the following link in your browser:")
print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))
print(authorize_url)
print("When prompted in your browser, enter the following oauth_token manually:")
print("oauth_token:", request_token["oauth_token"])
print()

accepted = "n"
Copilot is powered by AI and may make mistakes. Always verify output.
print("Go to the following link in your browser:")
print("%s?oauth_token=%s" %
(authorize_url, request_token["oauth_token"]))
print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (password)
as clear text.

Copilot Autofix

AI about 1 month ago

In general, to fix clear-text logging of sensitive information, remove or redact sensitive values before sending them to logging or printing functions. When guidance to the user is still needed, print only non-sensitive parts (e.g., the base URL) and/or instructions, but not the actual secret.

In this file, the primary issue is line 67, which prints the full authorization URL including the oauth_token parameter. We should change this so that the script no longer prints the token value. A simple and safe approach is:

  • Print only the authorize_url without query parameters.
  • Optionally instruct the user that if the browser doesn’t auto-add the token, they should re‑run or manually paste as needed (but we will not construct and print a tokenized URL).

Concretely:

  • In imports/tumblr.py, inside tumblrAuth, replace:
print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))

with a print that does not include request_token["oauth_token"], e.g.:

print(authorize_url)

No extra imports or helper methods are required. No other lines need to be changed to address the flagged issue, and existing functionality is minimally impacted (the main behavioral change is that the user must manually obtain or follow the Tumblr-provided URL rather than copying the full URL from the console including a token).

Suggested changeset 1
imports/tumblr.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/imports/tumblr.py b/imports/tumblr.py
--- a/imports/tumblr.py
+++ b/imports/tumblr.py
@@ -64,7 +64,7 @@
 
         request_token = dict(urllib.parse.parse_qsl(content))
         print("Go to the following link in your browser:")
-        print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))
+        print(authorize_url)
         print()
 
         accepted = "n"
EOF
@@ -64,7 +64,7 @@

request_token = dict(urllib.parse.parse_qsl(content))
print("Go to the following link in your browser:")
print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))
print(authorize_url)
print()

accepted = "n"
Copilot is powered by AI and may make mistakes. Always verify output.

mock_post.assert_called_once()
call_args = mock_post.call_args
assert "hooks.slack.com" in call_args[0][0]

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High test

The string
hooks.slack.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI about 1 month ago

In general, to avoid incomplete URL substring sanitization, always parse the URL and inspect its components (scheme, hostname, path) instead of checking for a bare substring. For host checks, use urllib.parse.urlparse to obtain hostname and compare that against an allowlist or an expected value, handling subdomains explicitly if needed.

In this specific test, we only need to ensure that the URL used for the Slack webhook call is pointing to Slack’s host. Instead of asserting "hooks.slack.com" in call_args[0][0], we can parse the URL and assert that its hostname is "hooks.slack.com". This maintains the same intent—verifying the request is made to Slack—while avoiding substring checks and aligning with the recommended pattern.

Concretely:

  • In tests/test_notifications.py, import urlparse from urllib.parse.
  • Replace line 116 with code that parses call_args[0][0] via urlparse and then asserts parsed.hostname == "hooks.slack.com".
  • No other behavior of the tests or the underlying notifications code needs to change.

Suggested changeset 1
tests/test_notifications.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/test_notifications.py b/tests/test_notifications.py
--- a/tests/test_notifications.py
+++ b/tests/test_notifications.py
@@ -1,6 +1,7 @@
 """Tests for lifestream.notifications module."""
 
 from unittest.mock import MagicMock, patch
+from urllib.parse import urlparse
 
 from lifestream import notifications
 
@@ -113,7 +114,8 @@
 
                     mock_post.assert_called_once()
                     call_args = mock_post.call_args
-                    assert "hooks.slack.com" in call_args[0][0]
+                    parsed_url = urlparse(call_args[0][0])
+                    assert parsed_url.hostname == "hooks.slack.com"
 
 
 class TestSendFailureNotifications:
EOF
@@ -1,6 +1,7 @@
"""Tests for lifestream.notifications module."""

from unittest.mock import MagicMock, patch
from urllib.parse import urlparse

from lifestream import notifications

@@ -113,7 +114,8 @@

mock_post.assert_called_once()
call_args = mock_post.call_args
assert "hooks.slack.com" in call_args[0][0]
parsed_url = urlparse(call_args[0][0])
assert parsed_url.hostname == "hooks.slack.com"


class TestSendFailureNotifications:
Copilot is powered by AI and may make mistakes. Always verify output.
accuracy = 100
logger.info("Found %s %s/%s" %
(timestamp, latitude_best, longitude_best))
logger.info("Found %s %s/%s" % (timestamp, latitude_best, longitude_best))

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (private)
as clear text.
This expression logs
sensitive data (private)
as clear text.

Copilot Autofix

AI about 1 month ago

In general, to fix clear‑text logging of sensitive information, either stop logging the sensitive fields entirely, or replace them with non‑sensitive/less‑sensitive data (for example, coarse‑grained or redacted values) while keeping the log’s usefulness. Also ensure that any remaining logging is appropriate for its log level (e.g., detailed data only at DEBUG and ideally still redacted).

For this specific case in the_fallen/openpaths.py, the sensitive fields are the high‑precision latitude and longitude (latitude_best, longitude_best) logged at line 96. The code already computes rounded (“vague”) coordinates (latitude_vague, longitude_vague). The least intrusive fix that preserves behavior is to change the log message to use the vague values instead of the precise ones, and to make the log text explicitly indicate that they’re approximate. No other behavior (DB writes, control flow) needs to change. This only requires editing the logger.info call around line 96; no new imports or helpers are needed.

Concretely: in the_fallen/openpaths.py, replace the logger.info("Found %s %s/%s" % (timestamp, latitude_best, longitude_best)) line with a version that logs latitude_vague and longitude_vague (e.g., "Found %s approx %s/%s"), leaving the rest of the loop unchanged.

Suggested changeset 1
the_fallen/openpaths.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/the_fallen/openpaths.py b/the_fallen/openpaths.py
--- a/the_fallen/openpaths.py
+++ b/the_fallen/openpaths.py
@@ -93,7 +93,7 @@
     ):
 
         accuracy = 100
-        logger.info("Found %s %s/%s" % (timestamp, latitude_best, longitude_best))
+        logger.info("Found %s approx %s/%s" % (timestamp, latitude_vague, longitude_vague))
         #                                (`id`,                 `lat`,             `long`,            `lat_vague`,   `long_vague`, `timestamp`, `accuracy`)
         cursor.execute(
             s_sql,
EOF
@@ -93,7 +93,7 @@
):

accuracy = 100
logger.info("Found %s %s/%s" % (timestamp, latitude_best, longitude_best))
logger.info("Found %s approx %s/%s" % (timestamp, latitude_vague, longitude_vague))
# (`id`, `lat`, `long`, `lat_vague`, `long_vague`, `timestamp`, `accuracy`)
cursor.execute(
s_sql,
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR modernizes Lifestream by introducing an APScheduler-based scheduler (with Redis persistence), refactoring import scripts around a new shared library surface (EntryStore, cache/backoff helpers, notifications), and updating docs/config/tests to match the new operational model.

Changes:

  • Added a new scheduler.py daemon using APScheduler + Redis job store, plus systemd and cron migration docs.
  • Refactored many import scripts to use lifestream.db.EntryStore, new cache/backoff helpers, and updated argument parsing via lifestream.parse_args().
  • Added a pytest suite for the new cache/db/jobs/notifications modules and refreshed project docs/config examples.

Reviewed changes

Copilot reviewed 56 out of 64 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
vs.code-workspace Minor workspace formatting fix.
the_fallen/tsw.py Switches to EntryStore + updated arg parsing.
the_fallen/openpaths.py Updates DB connection helpers and arg parsing.
the_fallen/moves.py Refactors to EntryStore, updated imports/paths for secrets.
tests/test_notifications.py Adds unit tests for email/Slack notification helpers.
tests/test_jobs.py Adds unit tests for scheduler job execution helpers.
tests/test_db.py Adds unit tests for DB connection helpers and EntryStore.
tests/test_cache.py Adds unit tests for Redis backoff + file cache decorator.
tests/conftest.py Adds shared pytest fixtures and sets up test environment.
scheduler.py New APScheduler-based scheduler daemon and CLI.
pytest.ini Updates pytest discovery paths and comment.
pyproject.toml Adds APScheduler + flake8, removes legacy memcache dependency.
imports/wow.py Refactors to EntryStore, updates secrets dir handling, exception narrowing TODOs.
imports/wordpress.py Refactors to EntryStore and new lifestream.parse_args().
imports/tweets.py Refactors to EntryStore and secrets dir helper.
imports/tumblr.py Refactors to EntryStore, improved exception handling TODOs.
imports/switchbot.py Refactors stats writes to EntryStore, improves typing/default args.
imports/steambadges.py Refactors to EntryStore, minor robustness for missing image src.
imports/steam.py Extracts Steam API client, refactors to EntryStore.
imports/planetside.py Refactors to EntryStore + Redis backoff helper.
imports/oyster_csv.py Refactors to EntryStore + updated arg parsing.
imports/oyster.py Refactors to EntryStore, regex literal cleanup, updated arg parsing.
imports/mastodon_toots.py Refactors to EntryStore + updated arg parsing.
imports/mailbox-stats.py Refactors stats writes to EntryStore, removes old helper usage.
imports/lifestreamutils.py Removes legacy DB stats helper (replaced by EntryStore.add_stat).
imports/lifestream/utils.py Introduces utility module (time delta formatting, JSON coercion, etc.).
imports/lifestream/steam_api.py Introduces reusable Steam Web API client.
imports/lifestream/notifications.py Introduces email/Slack failure notifications for jobs.
imports/lifestream/jobs.py Introduces run_import / run_shell_command job runners.
imports/lifestream/foursquare_api.py Introduces Foursquare client with Redis caching (new module).
imports/lifestream/db.py Introduces DB helpers and EntryStore abstraction.
imports/lifestream/code_fetcher.py Updates CodeFetcher to use package config/project root and path-safe template loading.
imports/lifestream/cache.py Introduces Redis helpers + file cache decorator.
imports/lifestream/init.py Major refactor: config/logging/arg parsing, re-exports, removes legacy classes.
imports/lastfm.py Converts to main() entrypoint and EntryStore.
imports/instagram.py Refactors to EntryStore + updated arg parsing.
imports/historic.py Refactors to EntryStore and DB helper functions.
imports/gw2.py Refactors to EntryStore, Redis backoff + file cache decorator.
imports/github_commits.py Refactors to EntryStore + updated arg parsing.
imports/foursquare_oauth.py Removes legacy OAuth helper script (replaced by imports/foursquare.py auth flow).
imports/foursquare.py Adds integrated OAuth flow + refactors to EntryStore and DB helpers.
imports/flickr.py Converts to main() entrypoint, refactors to EntryStore and DB helpers.
imports/fitbit_day.py Refactors to EntryStore for entries/stats, removes old stats helper usage.
imports/ffxiv.py Refactors to EntryStore and updated arg parsing.
imports/facebook_posts.py Refactors to EntryStore, Redis backoff, updated secrets dir.
imports/facebook_page.py Refactors to EntryStore, updated secrets dir, small ordering cleanup.
imports/destiny2.py Refactors to EntryStore, Redis backoff helpers, updated secrets dir.
imports/atproto_posts.py Refactors to EntryStore + updated arg parsing.
imports/atom.py Converts to main() entrypoint and refactors to EntryStore.
docs/lifestream-scheduler.service Adds systemd unit for running the scheduler daemon.
docs/crontab.example Adds deprecated legacy crontab reference and migration notes.
datafiles/achievements_schema.sql Minor formatting/whitespace fix.
contrib/oyster_import_csv.py Updates DB connection usage to new helper functions.
contrib/ffxiv-upload-achievement-icons/update_sc_and_update_icons.ps1 Minor formatting/whitespace fix.
contrib/ffxiv-upload-achievement-icons/readme.md Minor formatting/whitespace fix.
contrib/ffxiv-upload-achievement-icons/lib/powershell/read_ini.ps1 Formatting cleanup.
contrib/ffxiv-upload-achievement-icons/etc/achievements_schema.sql Minor formatting/whitespace fix.
contrib/ffxiv-upload-achievement-icons/.gitignore Minor formatting/whitespace fix.
config.example.ini Adds scheduler schedules + notifications config, removes legacy memcache config, updates paths.
README.textile Removes legacy README.
README.md Adds modern README with setup, scheduler usage, configuration and testing instructions.
.gitignore Removes trailing blank line.
.flake8 Expands ignores/excludes to reflect refactor and new directories.
.envrc Removes trailing blank line.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +143 to +165
for job_name, config in schedules.items():
cron = config["cron"]

try:
trigger = CronTrigger.from_crontab(cron)
except ValueError as e:
logger.error(f"Invalid cron expression for {job_name}: {cron} - {e}")
continue

# Check if it's a shell command (starts with !)
if job_name.startswith("!"):
actual_name = job_name[1:]
command = cron # For shell commands, cron is actually the command
scheduler.add_job(
run_shell_command,
trigger=trigger,
args=[actual_name, command],
id=actual_name,
name=actual_name,
misfire_grace_time=config["misfire_grace_time"],
coalesce=config["coalesce"],
replace_existing=True,
)
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shell-command jobs (those with names starting with !) can't work as implemented: cron is used to build CronTrigger.from_crontab(cron) before the job_name.startswith('!') check, but for shell jobs the code then treats that same cron string as the shell command. A non-crontab shell command will fail trigger parsing, and a real cron expression can't also be the command. Consider changing the config format for shell jobs (e.g., separate schedule/command fields) and only pass the cron part into CronTrigger.from_crontab().

Copilot uses AI. Check for mistakes.
Comment on lines +17 to 23
entry_store = EntryStore()

APIKEY = Lifestream.config.get("guildwars2", "apikey")
APIKEY = entry_store.config.get("guildwars2", "apikey")

logger = logging.getLogger("GW2")
args = lifestream.arguments.parse_args()
args = lifestream.parse_args()

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several scripts call entry_store.config.get(...), but EntryStore currently doesn't expose a config attribute. This will raise AttributeError at import-time for this module. Either switch these reads back to lifestream.config, or add a config attribute/property on EntryStore that references the shared lifestream config.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants