8 changes: 4 additions & 4 deletions .circleci/config.yml
@@ -213,7 +213,7 @@ jobs:
PYTHON_VERSION: "3_6"
CIRCLE_ARTIFACTS: /tmp/circleci-artifacts/3_6
CIRCLE_TEST_REPORTS: /tmp/circleci-test-results/3_6
-VERSION: 0.7.2
+VERSION: 0.8.0
PANDOC_RELEASES_URL: https://github.com/jgm/pandoc/releases
YARN_STATIC_DIR: notebooker/web/static/
IMAGE_NAME: mangroup/notebooker
@@ -229,7 +229,7 @@ jobs:
environment:
CIRCLE_ARTIFACTS: /tmp/circleci-artifacts/3_7
CIRCLE_TEST_REPORTS: /tmp/circleci-test-results/3_7
-VERSION: 0.7.2
+VERSION: 0.8.0
PANDOC_RELEASES_URL: https://github.com/jgm/pandoc/releases
YARN_STATIC_DIR: notebooker/web/static/
IMAGE_NAME: mangroup/notebooker
@@ -243,7 +243,7 @@ jobs:
environment:
CIRCLE_ARTIFACTS: /tmp/circleci-artifacts/3_8
CIRCLE_TEST_REPORTS: /tmp/circleci-test-results/3_8
-VERSION: 0.7.2
+VERSION: 0.8.0
PANDOC_RELEASES_URL: https://github.com/jgm/pandoc/releases
YARN_STATIC_DIR: notebooker/web/static/
IMAGE_NAME: mangroup/notebooker
@@ -257,7 +257,7 @@ jobs:
environment:
CIRCLE_ARTIFACTS: /tmp/circleci-artifacts/3_11
CIRCLE_TEST_REPORTS: /tmp/circleci-test-results/3_11
-VERSION: 0.7.2
+VERSION: 0.8.0
PANDOC_RELEASES_URL: https://github.com/jgm/pandoc/releases
YARN_STATIC_DIR: notebooker/web/static/
IMAGE_NAME: mangroup/notebooker
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,11 @@
0.8.0 (2026-01-15)
------------------

* feature: standalone scheduler process for improved reliability in Kubernetes deployments
* feature: new `--scheduler-management-only` flag for webapp to manage jobs without executing them
* bugfix: fix scheduler race condition by starting in paused state
* bugfix: fix template dropdown showing folder names instead of templates

0.7.2 (2025-01-17)
------------------

120 changes: 120 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,120 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Notebooker is a production system for executing and scheduling Jupyter Notebooks as parametrized reports. It converts notebooks (stored as .py files via Jupytext) into web-based reports with results stored in MongoDB.

## Common Commands

### Python Development
```bash
# Install in development mode
pip install -e ".[test]"

# Install kernel for notebook execution
python -m ipykernel install --user --name=notebooker_kernel

# Run tests (requires MongoDB)
pytest -svvvvv --junitxml=test-results/junit.xml

# Code quality
flake8 notebooker tests
black --check -l 120 notebooker tests

# Build docs
pip install -e ".[docs]"
sphinx-build -b html docs/ build/sphinx/html
```

### JavaScript Development
```bash
cd notebooker/web/static/

yarn install --frozen-lockfile
yarn run lint # ESLint
yarn run format # Prettier
yarn run bundle # Browserify scheduler.js
yarn test # Jest
```

### Quick Demo
```bash
cd docker && docker-compose up
# Access at http://localhost:8080/
```

## Architecture

### Core Components

- **`notebooker/execute_notebook.py`** - Notebook execution engine using Papermill
- **`notebooker/_entrypoints.py`** - Click-based CLI (`notebooker-cli`)
- **`notebooker/web/app.py`** - Flask webapp with Gevent WSGI server
- **`notebooker/serialization/`** - Storage backend interfaces
- **`notebooker/serializers/`** - MongoDB implementation (PyMongoResultSerializer)

### Entry Points
- `notebooker-cli` - Main CLI with subcommands: `start-webapp`, `start-scheduler`, `execute-notebook`, `cleanup-old-reports`
- `notebooker_execute` - Docker-compatible entrypoint
- `notebooker_template_sanity_check` - Template validation
- `notebooker_template_regression_test` - Regression testing

### Execution Flow
1. Templates stored as .py files (Jupytext format) in git
2. Converted to .ipynb via `generate_ipynb_from_py()`
3. Executed with Papermill using parameters
4. Output converted to HTML/PDF via nbconvert
5. Results stored in MongoDB with GridFS for large files

### Web App Routes
- `/run_report/` - Execute notebooks
- `/results/` - Serve completed reports
- `/pending/` - Monitor running reports
- `/scheduler/` - Schedule management

### Template Parameters
Define parameters in templates using the Jupytext tag format:
```python
# + {"tags": ["parameters"]}
param_name = "default_value"
```
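
To make the tag format concrete, here is a minimal, standard-library-only sketch that reads the default values out of a tagged parameters cell. It is not how Notebooker or Papermill process templates (Papermill injects an extra cell with the overriding values after the tagged one); `read_default_parameters` is a hypothetical helper, and it assumes the Jupytext light format where a cell opens with `# +` and closes with `# -` or the next `# +`.

```python
import ast
import json


def read_default_parameters(template_source):
    """Return {name: default} for literal assignments in the cell tagged "parameters".

    Hypothetical helper for illustration only; assumes Jupytext light-format markers.
    """
    in_cell = False
    cell_lines = []
    for line in template_source.splitlines():
        stripped = line.strip()
        if stripped.startswith("# +"):
            if in_cell:
                break  # the next cell marker closes the parameters cell
            metadata = stripped[3:].strip()
            try:
                tags = json.loads(metadata).get("tags", []) if metadata else []
            except (ValueError, AttributeError):
                tags = []  # metadata was not a JSON object
            in_cell = "parameters" in tags
            continue
        if in_cell:
            if stripped == "# -":
                break  # explicit end-of-cell marker
            cell_lines.append(line)

    defaults = {}
    for node in ast.parse("\n".join(cell_lines)).body:
        # Only simple `name = <literal>` assignments are picked up.
        if isinstance(node, ast.Assign) and len(node.targets) == 1 and isinstance(node.targets[0], ast.Name):
            try:
                defaults[node.targets[0].id] = ast.literal_eval(node.value)
            except ValueError:
                pass  # skip defaults that are not plain literals
    return defaults
```

Only plain literals (strings, numbers, lists, dicts) are returned; a default computed by a function call is skipped rather than evaluated.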

## Key Configuration

- `NOTEBOOK_KERNEL_NAME` - Kernel for execution (default: `notebooker_kernel`)
- `PY_TEMPLATE_BASE_DIR` - Git repo containing templates
- `SERIALIZER_CLS` / `SERIALIZER_CONFIG` - Storage backend config
- `NOTEBOOKER_DISABLE_GIT` - Skip git pulls during execution
- `SCHEDULER_MANAGEMENT_ONLY` - Webapp manages jobs but doesn't execute them (use with standalone scheduler)

## Standalone Scheduler

The scheduler can run as a standalone process instead of a background thread in the webapp:

```bash
# Webapp (manages jobs, doesn't execute)
notebooker-cli start-webapp --scheduler-management-only

# Standalone scheduler (executes jobs)
notebooker-cli start-scheduler
```

Key files: `scheduler_core.py` (shared infrastructure), `standalone_scheduler.py` (standalone process)
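
As a rough illustration of how the webapp flags divide responsibilities, the sketch below maps a flag combination to who manages and who executes scheduled jobs. `webapp_scheduler_mode` and its dictionary keys are hypothetical names, not part of Notebooker, and letting `--disable-scheduler` take precedence when both flags are passed is an assumption.

```python
def webapp_scheduler_mode(disable_scheduler=False, scheduler_management_only=False):
    """Describe the scheduling responsibilities implied by the webapp flags.

    Hypothetical helper for illustration; the real webapp reads these values
    from its config object (DISABLE_SCHEDULER, SCHEDULER_MANAGEMENT_ONLY).
    """
    if disable_scheduler:
        # Assumption: --disable-scheduler wins if both flags are given.
        return {"webapp_manages_jobs": False, "webapp_executes_jobs": False, "needs_start_scheduler": False}
    if scheduler_management_only:
        # The UI can create/update/delete jobs; a separate
        # `notebooker-cli start-scheduler` process executes them.
        return {"webapp_manages_jobs": True, "webapp_executes_jobs": False, "needs_start_scheduler": True}
    # Default: the scheduler runs as a background thread inside the webapp.
    return {"webapp_manages_jobs": True, "webapp_executes_jobs": True, "needs_start_scheduler": False}
```

Only the management-only combination requires the extra `start-scheduler` process; the other two are single-process deployments.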

## Version Consistency

When bumping versions, update all of:
- `notebooker/version.py`
- `CHANGELOG.md`
- `docs/conf.py`
- `notebooker/web/static/package.json`
- `.circleci/config.yml` (the `VERSION` environment variable in each job)
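
A version bump that misses one of these files is easy to overlook, so a small check can help. The following is a sketch, not an existing Notebooker script. `VERSION_PATTERNS`, `extract_versions` and `find_mismatches` are hypothetical names. The `release = "..."` and `VERSION: ...` forms match `docs/conf.py` and `.circleci/config.yml`; the layouts assumed for `notebooker/version.py` and `package.json` are guesses.

```python
import re

# One regex per file that carries the version string. The patterns for
# version.py and package.json are assumptions about files not shown here.
VERSION_PATTERNS = {
    "notebooker/version.py": r'__version__\s*=\s*["\']([^"\']+)["\']',
    "CHANGELOG.md": r"^(\d+\.\d+\.\d+)\s+\(",
    "docs/conf.py": r'^release\s*=\s*["\']([^"\']+)["\']',
    "notebooker/web/static/package.json": r'"version"\s*:\s*"([^"]+)"',
    ".circleci/config.yml": r"^\s*VERSION:\s*(\S+)",
}


def extract_versions(file_contents):
    """Map each known file name to the first version string found in its text."""
    versions = {}
    for name, pattern in VERSION_PATTERNS.items():
        match = re.search(pattern, file_contents.get(name, ""), flags=re.MULTILINE)
        versions[name] = match.group(1) if match else None
    return versions


def find_mismatches(file_contents, expected):
    """Return {file: detected_version} for files that do not match `expected`."""
    return {name: found for name, found in extract_versions(file_contents).items() if found != expected}
```

`file_contents` is a plain `{path: text}` dictionary, so the caller decides how the files are read. A file that is missing or has no recognisable version is reported with `None`. Only the first match per file is checked, so later `VERSION` entries in `.circleci/config.yml` are not verified.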

## Testing

Tests require MongoDB. The test suite uses pytest-server-fixtures for MongoDB test servers. Test directories:
- `tests/unit/` - Unit tests
- `tests/integration/` - Integration tests
- `tests/regression/` - Regression tests
- `tests/sanity/` - Sanity checks
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -23,7 +23,7 @@
author = "Man Group Quant Tech"

# The full version, including alpha/beta/rc tags
-release = "0.7.2"
+release = "0.8.0"


# -- General configuration ---------------------------------------------------
61 changes: 61 additions & 0 deletions docs/webapp/webapp.rst
@@ -181,3 +181,64 @@ environments or where the reports can reveal sensitive data if misconfigured.
Please note that read-only mode does not change the functionality of the scheduler; users will still be able to
modify schedules and it will execute as intended. To disable the scheduler you can add :code:`--disable-scheduler`
to the command line arguments of the webapp; likewise git pulls can be prevented by using :code:`--disable-git`.


Standalone scheduler
--------------------

.. note::
    Available from version 0.8.0 onwards.

By default, the scheduler runs as a background thread within the webapp process. While convenient,
this approach has a drawback: if the scheduler thread dies, the only way to recover is to restart
the entire webapp.

For production deployments, especially in Kubernetes, you can run the scheduler as a standalone
process. This allows the scheduler to be restarted independently without affecting the webapp,
improving reliability.

**Starting the standalone scheduler:**

.. code-block:: bash

    notebooker-cli \
        --py-template-base-dir /path/to/your/repo \
        --py-template-subdir notebook_templates \
        --mongo-host localhost:27017 \
        --database-name notebooker \
        --result-collection-name notebooker_results \
        start-scheduler

**Starting the webapp in management-only mode:**

When using a standalone scheduler, the webapp should be started with :code:`--scheduler-management-only`.
This allows users to create, update, and delete scheduled jobs via the UI, but the webapp won't
execute them - that's handled by the standalone scheduler.

.. code-block:: bash

    notebooker-cli \
        --py-template-base-dir /path/to/your/repo \
        --py-template-subdir notebook_templates \
        --mongo-host localhost:27017 \
        --database-name notebooker \
        --result-collection-name notebooker_results \
        start-webapp \
        --port 8080 \
        --scheduler-management-only

**Deployment configuration:**

+---------------------------+----------------------------------------+-----------------------------------+
| Deployment                | Webapp flags                           | Scheduler process                 |
+===========================+========================================+===================================+
| Traditional (default)     | (none)                                 | Not needed                        |
+---------------------------+----------------------------------------+-----------------------------------+
| Standalone scheduler      | :code:`--scheduler-management-only`    | :code:`start-scheduler`           |
+---------------------------+----------------------------------------+-----------------------------------+
| No scheduling             | :code:`--disable-scheduler`            | Not needed                        |
+---------------------------+----------------------------------------+-----------------------------------+

.. warning::
    Only run one scheduler process at a time. Running multiple schedulers won't corrupt data
    (APScheduler uses MongoDB locking), but may cause inefficiencies.
48 changes: 48 additions & 0 deletions notebooker/_entrypoints.py
@@ -11,6 +11,7 @@
from notebooker.serialization import SERIALIZER_TO_CLI_OPTIONS
from notebooker.settings import BaseConfig, WebappConfig
from notebooker.snapshot import snap_latest_successful_notebooks
from notebooker.standalone_scheduler import run_standalone_scheduler
from notebooker.utils.cleanup import delete_old_reports
from notebooker.web.app import main

@@ -137,6 +138,13 @@ def base_notebooker(
    help="If --disable-scheduler is given, then the scheduling back-end of the webapp will not start up. It will also "
    "not display the scheduler from the front-end of the webapp.",
)
@click.option(
    "--scheduler-management-only",
    default=False,
    is_flag=True,
    help="If --scheduler-management-only is given, the webapp can create/update/delete scheduled jobs but will not "
    "execute them. Use this when running a separate standalone scheduler process.",
)
@click.option(
    "--scheduler-mongo-database",
    default="",
@@ -164,6 +172,7 @@ def start_webapp(
    debug,
    base_cache_dir,
    disable_scheduler,
    scheduler_management_only,
    scheduler_mongo_database,
    scheduler_mongo_collection,
    readonly_mode,
@@ -174,12 +183,51 @@ def start_webapp(
    web_config.DEBUG = debug
    web_config.CACHE_DIR = base_cache_dir
    web_config.DISABLE_SCHEDULER = disable_scheduler
    web_config.SCHEDULER_MANAGEMENT_ONLY = scheduler_management_only
    web_config.SCHEDULER_MONGO_DATABASE = scheduler_mongo_database
    web_config.SCHEDULER_MONGO_COLLECTION = scheduler_mongo_collection
    web_config.READONLY_MODE = readonly_mode
    return main(web_config)


@base_notebooker.command()
@click.option("--logging-level", default="INFO", help="The logging level. Set to DEBUG for lots of extra info.")
@click.option(
    "--scheduler-mongo-database",
    default="",
    help="The name of the mongo database which is used for the scheduler. "
    "Defaults to the same as the serializer's mongo database.",
)
@click.option(
    "--scheduler-mongo-collection",
    default="",
    help="The name of the mongo collection for the scheduler. "
    "Defaults to the same as the serializer's mongo collection + '_scheduler'.",
)
@pass_config
def start_scheduler(config: BaseConfig, logging_level, scheduler_mongo_database, scheduler_mongo_collection):
    """
    Start the scheduler as a standalone process.

    Use this when you want to run the scheduler separately from the webapp,
    for example in a Kubernetes deployment where the scheduler can be
    restarted independently.

    The webapp should be started with --scheduler-management-only when
    using a standalone scheduler.
    """
    import logging

    logging.basicConfig(level=logging.getLevelName(logging_level))

    # Copy config and add scheduler-specific settings
    scheduler_config = BaseConfig.copy_existing(config)
    scheduler_config.SCHEDULER_MONGO_DATABASE = scheduler_mongo_database
    scheduler_config.SCHEDULER_MONGO_COLLECTION = scheduler_mongo_collection

    return run_standalone_scheduler(scheduler_config)


@base_notebooker.command()
@click.option("--report-name", help="The name of the template to execute, relative to the template directory.")
@click.option(