Refactor Lifestream integration and improve code structure#61
Refactor Lifestream integration and improve code structure#61
Conversation
…mproved connection handling
…new helper methods
… functionality and backward compatibility
…ic into foursquare.py; remove obsolete foursquare_oauth.py
…t code_fetcher as CodeFetcher9000' across multiple modules
…; update documentation and configuration files accordingly
…implement job execution functions for scheduler
…or caching in FoursquareAPI
…e logging and error handling across modules
…e features, quick start, configuration, and running imports sections
- Consolidated local imports in `steambadges.py`, `switchbot.py`, `tumblr.py`, `tweets.py`, `wordpress.py`, and `wow.py`. - Enhanced readability by adjusting line breaks and spacing in various files. - Updated exception handling to be more specific in `tumblr.py` and `wow.py`. - Added `flake8` as a dependency in `pyproject.toml` for code style enforcement. - Cleaned up unused imports and improved consistency in `scheduler.py`, `tests/conftest.py`, and other test files. - Ensured all changes adhere to PEP 8 style guidelines for better maintainability.
| print("Go to the following link in your browser:") | ||
| print("%s?oauth_token=%s" % | ||
| (authorize_url, request_token["oauth_token"])) | ||
| print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"])) |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
Generally, to fix clear-text logging of sensitive data, ensure that secrets (tokens, passwords, API keys, etc.) are never written directly to logs or console output that may be captured. Either avoid logging them completely, or replace them with redacted or partially masked values.
For this specific code, we want to keep the existing interactive flow (the user must visit the authorization URL) but avoid constructing and printing a single string that embeds the token and could be treated as a logged secret. A simple, low-impact change is:
- Print the base authorization URL (
authorize_url) and the token separately, with clear instructions to the user. - Avoid combining them into one formatted URL string containing the token.
- Do not log the token via
logger; only show it in the interactive prompt.
Concretely, in imports/historic.py within tumblrAuth, replace the line:
print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))with safer messaging like:
print(authorize_url)
print("Then provide this oauth_token when prompted in your browser: [REDACTED]")
print("oauth_token (copy manually):", request_token["oauth_token"])or similar, so the token is no longer part of a single URL string and is clearly separated. This keeps functionality (user still sees token) while reducing the pattern that static analysis flags and avoiding printing it as part of a URL that might be copy-pasted or logged wholesale. No new imports or helper methods are required.
| @@ -51,7 +51,9 @@ | ||
|
|
||
| request_token = dict(urllib.parse.parse_qsl(content)) | ||
| print("Go to the following link in your browser:") | ||
| print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"])) | ||
| print(authorize_url) | ||
| print("When prompted in your browser, enter the following oauth_token manually:") | ||
| print("oauth_token:", request_token["oauth_token"]) | ||
| print() | ||
|
|
||
| accepted = "n" |
| print("Go to the following link in your browser:") | ||
| print("%s?oauth_token=%s" % | ||
| (authorize_url, request_token["oauth_token"])) | ||
| print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"])) |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
In general, to fix clear-text logging of sensitive information, remove or redact sensitive values before sending them to logging or printing functions. When guidance to the user is still needed, print only non-sensitive parts (e.g., the base URL) and/or instructions, but not the actual secret.
In this file, the primary issue is line 67, which prints the full authorization URL including the oauth_token parameter. We should change this so that the script no longer prints the token value. A simple and safe approach is:
- Print only the
authorize_urlwithout query parameters. - Optionally instruct the user that if the browser doesn’t auto-add the token, they should re‑run or manually paste as needed (but we will not construct and print a tokenized URL).
Concretely:
- In
imports/tumblr.py, insidetumblrAuth, replace:
print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"]))with a print that does not include request_token["oauth_token"], e.g.:
print(authorize_url)No extra imports or helper methods are required. No other lines need to be changed to address the flagged issue, and existing functionality is minimally impacted (the main behavioral change is that the user must manually obtain or follow the Tumblr-provided URL rather than copying the full URL from the console including a token).
| @@ -64,7 +64,7 @@ | ||
|
|
||
| request_token = dict(urllib.parse.parse_qsl(content)) | ||
| print("Go to the following link in your browser:") | ||
| print("%s?oauth_token=%s" % (authorize_url, request_token["oauth_token"])) | ||
| print(authorize_url) | ||
| print() | ||
|
|
||
| accepted = "n" |
|
|
||
| mock_post.assert_called_once() | ||
| call_args = mock_post.call_args | ||
| assert "hooks.slack.com" in call_args[0][0] |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
In general, to avoid incomplete URL substring sanitization, always parse the URL and inspect its components (scheme, hostname, path) instead of checking for a bare substring. For host checks, use urllib.parse.urlparse to obtain hostname and compare that against an allowlist or an expected value, handling subdomains explicitly if needed.
In this specific test, we only need to ensure that the URL used for the Slack webhook call is pointing to Slack’s host. Instead of asserting "hooks.slack.com" in call_args[0][0], we can parse the URL and assert that its hostname is "hooks.slack.com". This maintains the same intent—verifying the request is made to Slack—while avoiding substring checks and aligning with the recommended pattern.
Concretely:
- In
tests/test_notifications.py, importurlparsefromurllib.parse. - Replace line 116 with code that parses
call_args[0][0]viaurlparseand then assertsparsed.hostname == "hooks.slack.com". - No other behavior of the tests or the underlying notifications code needs to change.
| @@ -1,6 +1,7 @@ | ||
| """Tests for lifestream.notifications module.""" | ||
|
|
||
| from unittest.mock import MagicMock, patch | ||
| from urllib.parse import urlparse | ||
|
|
||
| from lifestream import notifications | ||
|
|
||
| @@ -113,7 +114,8 @@ | ||
|
|
||
| mock_post.assert_called_once() | ||
| call_args = mock_post.call_args | ||
| assert "hooks.slack.com" in call_args[0][0] | ||
| parsed_url = urlparse(call_args[0][0]) | ||
| assert parsed_url.hostname == "hooks.slack.com" | ||
|
|
||
|
|
||
| class TestSendFailureNotifications: |
| accuracy = 100 | ||
| logger.info("Found %s %s/%s" % | ||
| (timestamp, latitude_best, longitude_best)) | ||
| logger.info("Found %s %s/%s" % (timestamp, latitude_best, longitude_best)) |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
In general, to fix clear‑text logging of sensitive information, either stop logging the sensitive fields entirely, or replace them with non‑sensitive/less‑sensitive data (for example, coarse‑grained or redacted values) while keeping the log’s usefulness. Also ensure that any remaining logging is appropriate for its log level (e.g., detailed data only at DEBUG and ideally still redacted).
For this specific case in the_fallen/openpaths.py, the sensitive fields are the high‑precision latitude and longitude (latitude_best, longitude_best) logged at line 96. The code already computes rounded (“vague”) coordinates (latitude_vague, longitude_vague). The least intrusive fix that preserves behavior is to change the log message to use the vague values instead of the precise ones, and to make the log text explicitly indicate that they’re approximate. No other behavior (DB writes, control flow) needs to change. This only requires editing the logger.info call around line 96; no new imports or helpers are needed.
Concretely: in the_fallen/openpaths.py, replace the logger.info("Found %s %s/%s" % (timestamp, latitude_best, longitude_best)) line with a version that logs latitude_vague and longitude_vague (e.g., "Found %s approx %s/%s"), leaving the rest of the loop unchanged.
| @@ -93,7 +93,7 @@ | ||
| ): | ||
|
|
||
| accuracy = 100 | ||
| logger.info("Found %s %s/%s" % (timestamp, latitude_best, longitude_best)) | ||
| logger.info("Found %s approx %s/%s" % (timestamp, latitude_vague, longitude_vague)) | ||
| # (`id`, `lat`, `long`, `lat_vague`, `long_vague`, `timestamp`, `accuracy`) | ||
| cursor.execute( | ||
| s_sql, |
There was a problem hiding this comment.
Pull request overview
This PR modernizes Lifestream by introducing an APScheduler-based scheduler (with Redis persistence), refactoring import scripts around a new shared library surface (EntryStore, cache/backoff helpers, notifications), and updating docs/config/tests to match the new operational model.
Changes:
- Added a new
scheduler.pydaemon using APScheduler + Redis job store, plus systemd and cron migration docs. - Refactored many import scripts to use
lifestream.db.EntryStore, new cache/backoff helpers, and updated argument parsing vialifestream.parse_args(). - Added a pytest suite for the new cache/db/jobs/notifications modules and refreshed project docs/config examples.
Reviewed changes
Copilot reviewed 56 out of 64 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| vs.code-workspace | Minor workspace formatting fix. |
| the_fallen/tsw.py | Switches to EntryStore + updated arg parsing. |
| the_fallen/openpaths.py | Updates DB connection helpers and arg parsing. |
| the_fallen/moves.py | Refactors to EntryStore, updated imports/paths for secrets. |
| tests/test_notifications.py | Adds unit tests for email/Slack notification helpers. |
| tests/test_jobs.py | Adds unit tests for scheduler job execution helpers. |
| tests/test_db.py | Adds unit tests for DB connection helpers and EntryStore. |
| tests/test_cache.py | Adds unit tests for Redis backoff + file cache decorator. |
| tests/conftest.py | Adds shared pytest fixtures and sets up test environment. |
| scheduler.py | New APScheduler-based scheduler daemon and CLI. |
| pytest.ini | Updates pytest discovery paths and comment. |
| pyproject.toml | Adds APScheduler + flake8, removes legacy memcache dependency. |
| imports/wow.py | Refactors to EntryStore, updates secrets dir handling, exception narrowing TODOs. |
| imports/wordpress.py | Refactors to EntryStore and new lifestream.parse_args(). |
| imports/tweets.py | Refactors to EntryStore and secrets dir helper. |
| imports/tumblr.py | Refactors to EntryStore, improved exception handling TODOs. |
| imports/switchbot.py | Refactors stats writes to EntryStore, improves typing/default args. |
| imports/steambadges.py | Refactors to EntryStore, minor robustness for missing image src. |
| imports/steam.py | Extracts Steam API client, refactors to EntryStore. |
| imports/planetside.py | Refactors to EntryStore + Redis backoff helper. |
| imports/oyster_csv.py | Refactors to EntryStore + updated arg parsing. |
| imports/oyster.py | Refactors to EntryStore, regex literal cleanup, updated arg parsing. |
| imports/mastodon_toots.py | Refactors to EntryStore + updated arg parsing. |
| imports/mailbox-stats.py | Refactors stats writes to EntryStore, removes old helper usage. |
| imports/lifestreamutils.py | Removes legacy DB stats helper (replaced by EntryStore.add_stat). |
| imports/lifestream/utils.py | Introduces utility module (time delta formatting, JSON coercion, etc.). |
| imports/lifestream/steam_api.py | Introduces reusable Steam Web API client. |
| imports/lifestream/notifications.py | Introduces email/Slack failure notifications for jobs. |
| imports/lifestream/jobs.py | Introduces run_import / run_shell_command job runners. |
| imports/lifestream/foursquare_api.py | Introduces Foursquare client with Redis caching (new module). |
| imports/lifestream/db.py | Introduces DB helpers and EntryStore abstraction. |
| imports/lifestream/code_fetcher.py | Updates CodeFetcher to use package config/project root and path-safe template loading. |
| imports/lifestream/cache.py | Introduces Redis helpers + file cache decorator. |
| imports/lifestream/init.py | Major refactor: config/logging/arg parsing, re-exports, removes legacy classes. |
| imports/lastfm.py | Converts to main() entrypoint and EntryStore. |
| imports/instagram.py | Refactors to EntryStore + updated arg parsing. |
| imports/historic.py | Refactors to EntryStore and DB helper functions. |
| imports/gw2.py | Refactors to EntryStore, Redis backoff + file cache decorator. |
| imports/github_commits.py | Refactors to EntryStore + updated arg parsing. |
| imports/foursquare_oauth.py | Removes legacy OAuth helper script (replaced by imports/foursquare.py auth flow). |
| imports/foursquare.py | Adds integrated OAuth flow + refactors to EntryStore and DB helpers. |
| imports/flickr.py | Converts to main() entrypoint, refactors to EntryStore and DB helpers. |
| imports/fitbit_day.py | Refactors to EntryStore for entries/stats, removes old stats helper usage. |
| imports/ffxiv.py | Refactors to EntryStore and updated arg parsing. |
| imports/facebook_posts.py | Refactors to EntryStore, Redis backoff, updated secrets dir. |
| imports/facebook_page.py | Refactors to EntryStore, updated secrets dir, small ordering cleanup. |
| imports/destiny2.py | Refactors to EntryStore, Redis backoff helpers, updated secrets dir. |
| imports/atproto_posts.py | Refactors to EntryStore + updated arg parsing. |
| imports/atom.py | Converts to main() entrypoint and refactors to EntryStore. |
| docs/lifestream-scheduler.service | Adds systemd unit for running the scheduler daemon. |
| docs/crontab.example | Adds deprecated legacy crontab reference and migration notes. |
| datafiles/achievements_schema.sql | Minor formatting/whitespace fix. |
| contrib/oyster_import_csv.py | Updates DB connection usage to new helper functions. |
| contrib/ffxiv-upload-achievement-icons/update_sc_and_update_icons.ps1 | Minor formatting/whitespace fix. |
| contrib/ffxiv-upload-achievement-icons/readme.md | Minor formatting/whitespace fix. |
| contrib/ffxiv-upload-achievement-icons/lib/powershell/read_ini.ps1 | Formatting cleanup. |
| contrib/ffxiv-upload-achievement-icons/etc/achievements_schema.sql | Minor formatting/whitespace fix. |
| contrib/ffxiv-upload-achievement-icons/.gitignore | Minor formatting/whitespace fix. |
| config.example.ini | Adds scheduler schedules + notifications config, removes legacy memcache config, updates paths. |
| README.textile | Removes legacy README. |
| README.md | Adds modern README with setup, scheduler usage, configuration and testing instructions. |
| .gitignore | Removes trailing blank line. |
| .flake8 | Expands ignores/excludes to reflect refactor and new directories. |
| .envrc | Removes trailing blank line. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for job_name, config in schedules.items(): | ||
| cron = config["cron"] | ||
|
|
||
| try: | ||
| trigger = CronTrigger.from_crontab(cron) | ||
| except ValueError as e: | ||
| logger.error(f"Invalid cron expression for {job_name}: {cron} - {e}") | ||
| continue | ||
|
|
||
| # Check if it's a shell command (starts with !) | ||
| if job_name.startswith("!"): | ||
| actual_name = job_name[1:] | ||
| command = cron # For shell commands, cron is actually the command | ||
| scheduler.add_job( | ||
| run_shell_command, | ||
| trigger=trigger, | ||
| args=[actual_name, command], | ||
| id=actual_name, | ||
| name=actual_name, | ||
| misfire_grace_time=config["misfire_grace_time"], | ||
| coalesce=config["coalesce"], | ||
| replace_existing=True, | ||
| ) |
There was a problem hiding this comment.
Shell-command jobs (those with names starting with !) can't work as implemented: cron is used to build CronTrigger.from_crontab(cron) before the job_name.startswith('!') check, but for shell jobs the code then treats that same cron string as the shell command. A non-crontab shell command will fail trigger parsing, and a real cron expression can't also be the command. Consider changing the config format for shell jobs (e.g., separate schedule/command fields) and only pass the cron part into CronTrigger.from_crontab().
| entry_store = EntryStore() | ||
|
|
||
| APIKEY = Lifestream.config.get("guildwars2", "apikey") | ||
| APIKEY = entry_store.config.get("guildwars2", "apikey") | ||
|
|
||
| logger = logging.getLogger("GW2") | ||
| args = lifestream.arguments.parse_args() | ||
| args = lifestream.parse_args() | ||
|
|
There was a problem hiding this comment.
Several scripts call entry_store.config.get(...), but EntryStore currently doesn't expose a config attribute. This will raise AttributeError at import-time for this module. Either switch these reads back to lifestream.config, or add a config attribute/property on EntryStore that references the shared lifestream config.
…ror notification formatting
This pull request modernizes the Lifestream project by introducing a new scheduler system, updating documentation, refactoring import scripts, and improving configuration and testing. The changes make the project easier to set up, maintain, and extend, while deprecating legacy approaches and clarifying best practices.
Scheduler & Job Management
docs/lifestream-scheduler.service) and updated documentation for job management and monitoring. [1] [2]config.example.iniwith a[schedules]section for job configuration, notification settings, and removed legacy memcache config. [1] [2]Documentation & Project Structure
README.textilewith a comprehensive newREADME.mdcovering features, setup, configuration, running imports, project structure, and testing. [1] [2]Import Script Refactoring
imports/atom.py,imports/atproto_posts.py,imports/destiny2.py,contrib/oyster_import_csv.py) to use the newEntryStoreclass for database entry management, improved argument parsing, and clarified logging. [1] [2] [3] [4] [5] [6] [7]Configuration & Testing
.flake8to ignore additional formatting rules and exclude new directories, improving code linting.pytestin the new documentation.These changes collectively modernize the project, improve maintainability, and streamline both development and operations.