Design post-processing recipe system for stub generator

## Context

The stub generator (`dev/generate_stubs.R`) produces clean R wrapper functions from OpenAPI schemas. Some functions need post-processing logic beyond what the generated stub provides: dispatching across multiple stubs, annotation joins, string coercion, DTXSID extraction. Currently this logic lives in hand-written files, which are overwritten whenever stubs are regenerated after a schema change.

We need a system that:

1. Stores post-processing logic separately from generated output
2. Automatically applies it during stub generation
3. Survives schema-driven regeneration (the whole point)
4. Makes it easy to add new recipes and edit existing ones
5. Works with the existing lifecycle badge protection in `05_file_scaffold.R`

## Current functions that need post-processing

- **`ct_bioactivity`** — dispatches to 4 generated stubs by `search_type`, optional annotation join via secondary API call
- **`ct_lists_all`** — projection selection logic, DTXSID comma-separated string coercion
- **`ct_list`** — uppercase coercion, DTXSID extraction + string split + dedup

More will be added as untested endpoints are validated.
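
To make the kind of logic at stake concrete, here is a minimal sketch of the "string split + dedup" step described for `ct_list`. It assumes the endpoint returns DTXSIDs as a single comma-separated string; the helper name `split_dtxsids` is hypothetical and not part of the current code.

```r
# Hypothetical helper: turn one comma-separated DTXSID string into a
# de-duplicated character vector. Assumes "," is the only separator.
split_dtxsids <- function(x) {
  ids <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))
  # Drop empty fragments (e.g. from a trailing comma), then dedup
  unique(ids[nzchar(ids)])
}

ids <- split_dtxsids("DTXSID7020182, DTXSID2021315,DTXSID7020182,")
```

This is exactly the sort of small, endpoint-specific step that a regenerated stub would not contain and that a recipe would need to carry.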

## Design Options

### Option A: R list registry in a single file

All recipes live in one file (`dev/endpoint_eval/09_recipes.R`) as a named R list. Each entry contains metadata (title, lifecycle, params) and the function body as a character string. The generator reads this list and produces complete R files.
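
A rough sketch of what such a registry could look like, to ground the discussion. The field names (`title`, `lifecycle`, `params`, `body`), the helpers `check_recipe()` and `render_recipe()`, and the stub name `ct_list_stub` are all assumptions for illustration, not an agreed schema.

```r
# Hypothetical registry: one named entry per recipe, body stored as a string.
recipes <- list(
  ct_list = list(
    title     = "Get a chemical list",
    lifecycle = "experimental",
    params    = c("list_name"),
    body      = "  list_name <- toupper(list_name)\n  ct_list_stub(list_name)"
  )
)

# Returns TRUE if the body parses as R code, FALSE on a syntax error.
check_recipe <- function(recipe) {
  tryCatch({
    parse(text = recipe$body)
    TRUE
  }, error = function(e) FALSE)
}

# Assemble the function source text the generator would write out.
render_recipe <- function(name, recipe) {
  args <- paste(recipe$params, collapse = ", ")
  paste0(name, " <- function(", args, ") {\n", recipe$body, "\n}")
}

valid <- vapply(recipes, check_recipe, logical(1))
code  <- render_recipe("ct_list", recipes$ct_list)
```

Running something like `check_recipe()` over the whole list is one way the registry could be validated programmatically, though it still only catches syntax errors when the check is run, not while editing.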

**Advantages:**
- Single source of truth — one file to manage all recipes
- Follows existing pipeline conventions (all `dev/endpoint_eval/` modules are single R files)
- Easy to iterate the registry programmatically (validation, reporting, drift detection)
- No file discovery logic needed — just read the list

**Disadvantages:**
- Writing R code inside character strings — no syntax highlighting, autocomplete, or linting in IDE
- Harder to review diffs (string changes vs. real code changes)
- File grows linearly with number of recipes; complex recipes (like `chemi_safety` with ~100 lines) make the file unwieldy
- Syntax errors in recipe bodies are only caught at generation time, not at edit time

### Option B: Separate R files per recipe

Each recipe gets its own file in `dev/recipes/` (e.g., `dev/recipes/ct_bioactivity.R`). Each file contains a standard R function definition that the generator reads, wraps with roxygen docs, and writes to `R/`. Metadata (title, lifecycle, params) could be in roxygen-style comments or a companion list at the top of the file.
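
As a sketch of the discovery step, the example below uses the "companion list at the top of the file" variant. It assumes each recipe file defines a `recipe_meta` list plus one function named after the file; both conventions and the `load_recipe()` helper are hypothetical. A temporary directory stands in for `dev/recipes/` so the example is self-contained.

```r
# Stand-in for dev/recipes/, populated with one illustrative recipe file.
recipe_dir <- file.path(tempdir(), "recipes")
dir.create(recipe_dir, showWarnings = FALSE)
writeLines(c(
  "recipe_meta <- list(title = 'Get a chemical list', lifecycle = 'experimental')",
  "ct_list <- function(list_name) {",
  "  toupper(list_name)",
  "}"
), file.path(recipe_dir, "ct_list.R"))

# Source one recipe file into its own environment and pull out the
# metadata list and the function named after the file.
load_recipe <- function(path) {
  env <- new.env()
  sys.source(path, envir = env)
  fn_name <- tools::file_path_sans_ext(basename(path))
  list(name = fn_name, meta = env$recipe_meta, fn = env[[fn_name]])
}

paths  <- list.files(recipe_dir, pattern = "\\.R$", full.names = TRUE)
loaded <- lapply(paths, load_recipe)
names(loaded) <- vapply(loaded, function(r) r$name, character(1))
```

Because each file holds an ordinary function, it can also be sourced and called directly when testing a recipe in isolation.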

**Advantages:**
- Full IDE support — syntax highlighting, autocomplete, linting, debugging all work
- Each recipe is independently readable and reviewable
- Complex recipes stay manageable (own file, own git history)
- Easy to test recipes in isolation (source the file, call the function)
- Git blame works per-recipe

**Disadvantages:**
- File discovery logic needed (glob `dev/recipes/*.R`, parse metadata)
- Metadata format needs design (roxygen comments? A header list? A companion YAML?)
- More files to manage
- Need convention for how the generator extracts the function body vs. metadata

### Option C: Marker-protected regions in R/ files

Post-processing is written directly in the generated `R/` files as normal R code. Special marker comments (e.g., `# <<< RECIPE START >>>` / `# <<< RECIPE END >>>`) delineate hand-written sections. The generator preserves everything between markers during regeneration and only rewrites the generated portions.
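
A minimal sketch of the extract and re-inject cycle, assuming exactly one marker pair per file and that the generator emits an empty marker pair in fresh output. The helper names and the example file contents (including `ct_list_stub`) are hypothetical.

```r
start_marker <- "# <<< RECIPE START >>>"
end_marker   <- "# <<< RECIPE END >>>"

# Locate the single marker pair; fail loudly if it is missing or malformed.
find_markers <- function(lines) {
  s <- which(trimws(lines) == start_marker)
  e <- which(trimws(lines) == end_marker)
  if (length(s) != 1 || length(e) != 1 || s >= e) {
    stop("expected exactly one well-ordered marker pair")
  }
  c(start = s, end = e)
}

# Hand-written lines strictly between the markers.
extract_region <- function(lines) {
  m <- find_markers(lines)
  lines[seq_len(m[["end"]] - m[["start"]] - 1) + m[["start"]]]
}

# Put a saved region between the markers of freshly generated output.
inject_region <- function(generated, region) {
  m <- find_markers(generated)
  c(generated[seq_len(m[["start"]])], region,
    generated[m[["end"]]:length(generated)])
}

old_file <- c(
  "ct_list <- function(list_name) {",
  "  res <- ct_list_stub(list_name)",
  "  # <<< RECIPE START >>>",
  "  res <- unique(res)",
  "  # <<< RECIPE END >>>",
  "  res",
  "}"
)
new_generated <- c(
  "ct_list <- function(list_name, projection = NULL) {",
  "  res <- ct_list_stub(list_name, projection)",
  "  # <<< RECIPE START >>>",
  "  # <<< RECIPE END >>>",
  "  res",
  "}"
)
merged <- inject_region(new_generated, extract_region(old_file))
```

Even in this small form the generator has to decide what to do when a marker is missing or duplicated, which is where the fragility concerns listed below come from.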

**Advantages:**
- Most natural workflow — edit the actual R file you're working with
- Full IDE support with complete file context
- No separate recipe files or registries to maintain
- What you see is what you get — the file in `R/` IS the source of truth

**Disadvantages:**
- Fragile — marker comments can be accidentally deleted, moved, or malformed
- Merges and rebases can corrupt marker boundaries
- Mixes generated and hand-written code in the same file (unclear ownership)
- Generator needs complex parsing logic to extract and re-inject protected regions
- Harder to validate — is the marker region valid? Did the generated portion change in a way that breaks the protected region?
- No clean separation between "what the schema gives us" and "what we added"
