refactor: 3-layer architecture + web monitoring & UX improvements#55

Open

BenjaminNavet wants to merge 23 commits into main from dev

Conversation

@BenjaminNavet
Collaborator

Summary

  • 3-layer architecture: decomposed the aggregate_collect.py monolith (2272→960 lines) into scilex/config.py, scilex/pipeline/ (8 modules), and thin CLI/web wrappers. Eliminated the sys.argv hack and module-level side effects.
  • Web pipeline monitoring: real-time log capture, phase stepper, paper detail cards with DOI links, collection deletion (API + UI), API keys reorganized by category.
  • UX polish: rewrote 12 tooltips, added 6 missing tooltips, collapsible config panel, collection resume/restart handling.

Changes

  • scilex/config.py — centralized SciLExConfig with from_files() / from_dicts()
  • scilex/pipeline/ — 8 modules: orchestrator, tracker, text_filter, citation_filter, ranking, itemtype_filter, enrichment, post_filter
  • scilex/webapi/scilex_api.py — new endpoints (delete collection, log streaming)
  • scilex/webapi/web_interface.py — full UI overhaul (monitoring, cards, tooltips)
  • 41 files changed, +2849/−2018 lines
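The centralized config described above can be pictured as follows. This is a minimal sketch: SciLExConfig, from_files(), and from_dicts() come from the PR, but the specific fields and merge semantics shown here are illustrative assumptions.

```python
import json
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SciLExConfig:
    """Single config object threaded through the pipeline (illustrative fields)."""
    collection_name: str = "default"
    apis: list = field(default_factory=list)
    output_dir: str = "output"

    @classmethod
    def from_dicts(cls, *dicts):
        # Later dicts override earlier ones; unknown keys are dropped.
        merged = {}
        for d in dicts:
            merged.update(d)
        known = {k: v for k, v in merged.items() if k in cls.__dataclass_fields__}
        return cls(**known)

    @classmethod
    def from_files(cls, *paths):
        # Load each file, then delegate to from_dicts for the merge.
        loaded = []
        for p in paths:
            with open(p) as fh:
                loaded.append(json.load(fh))
        return cls.from_dicts(*loaded)
```

The point of the two classmethod constructors is that both the CLI and the web API build the same immutable object, instead of each entry point loading config files with its own side effects.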

Test plan

  • uv run python -m pytest tests/ — all tests pass
  • Verify the web pipeline end to end (collect → aggregate → export)
  • Verify collection deletion via the UI
  • Verify that tooltips display on every form field

🤖 Generated with Claude Code

datalogism and others added 23 commits March 9, 2026 16:58
…53)

- Add progress_callback and cancel_event params to CollectCollection
  (optional, defaults to None — CLI behavior unchanged)
- Add POST /pipelines/{job_id}/cancel endpoint with graceful shutdown
- Offload blocking collection/aggregation to thread pool via
  run_in_executor so the event loop stays responsive for polling
- Add Streamlit progress monitoring section with per-API stats,
  progress bar, and cancel button (polls every 2s)
- Fix bare except Exception to catch queue.Empty specifically
- Fix Zotero output string concatenation precedence bug
- Add enable_enrichment, enrichment_threshold, enrichment_limit fields
  to CollectionConfig Pydantic model
- Run enrich_with_hf.main() after aggregation when enrichment is enabled
  in run_collection_task(), offloaded to thread pool
- Add enrichment checkbox with conditional threshold/limit controls
  in Tab 1 (New Collection)
- Add standalone "Enrich with HuggingFace" section in Tab 3
  (Filter & Export) with threshold slider, limit input, and enrich
  button calling scilex-enrich via subprocess
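The offloading-plus-cancellation pattern in this commit can be sketched as below. The function and endpoint names in the PR are real; the toy `collect` body and the callback wiring here are illustrative assumptions.

```python
import asyncio
import threading

def collect(progress_callback=None, cancel_event=None):
    """Blocking collection loop; both params default to None so CLI behavior is unchanged."""
    for i in range(5):
        if cancel_event is not None and cancel_event.is_set():
            return "cancelled"
        if progress_callback is not None:
            progress_callback(i + 1, 5)  # e.g. per-API stats for the Streamlit poller
    return "done"

async def run_collection_task():
    cancel = threading.Event()
    loop = asyncio.get_running_loop()
    # Offload the blocking call to the default thread pool so the event loop
    # stays free to answer the UI's 2-second poll requests.
    future = loop.run_in_executor(None, collect, print, cancel)
    # A POST /pipelines/{job_id}/cancel handler would call cancel.set() here,
    # letting the worker exit at the next loop iteration (graceful shutdown).
    return await future
```

Using a `threading.Event` rather than killing the thread is what makes the shutdown graceful: the worker checks the flag at safe points and returns normally.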
Replace single dropdown with individual expanders per API in the
sidebar. Each shows a status indicator (✅ configured / ⬚ not).
"Clear" button wipes all credentials for that API at once.
Use default values from enrich_with_hf.py (threshold=85, no limit).
These parameters are too technical for the web UI.
These are output/enrichment tools, not collection sources.
Simplify Data Sources section to 2 columns: Free APIs and Paid APIs.
- Wrap all configuration (API keys, output dir) in a single collapsible
  expander (open by default, can be closed)
- Show all APIs in a flat list with green dot (●) when configured,
  grey circle (○) when not — visible at a glance without expanding
- Each API has inline Save/Clear buttons with a divider between them
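The at-a-glance indicator described above reduces to a small helper. This is an illustrative sketch (the real UI renders these lines with Streamlit widgets); the credential store shown as a plain dict is an assumption.

```python
def api_status_line(name, credentials):
    """Green dot when any credential is stored for the API, grey circle otherwise."""
    dot = "●" if credentials.get(name) else "○"
    return f"{dot} {name}"

def clear_credentials(name, credentials):
    """'Clear' wipes every stored value for that API at once."""
    credentials.pop(name, None)
```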
Add PubMedCentral and Istex to the free APIs list in both the
frontend and the /available-apis endpoint. Remove HuggingFace
and Zotero from /available-apis since they are not data sources.
Now shows all 11 registered collectors: 8 free + 3 paid.
Enrichment is a pipeline step, not an export action. The checkbox
in New Collection (Tab 1) is the correct place to enable it.
Add OpenAlex, PubMed (optional keys for higher rate limits) and
CrossRef (mailto for polite pool) to the API keys configuration.
Now lists all 9 APIs that accept credentials.
- Detect existing data on disk for the collection name
- Show info message explaining idempotent behavior
- Change button text to "Resume Collection" when partial data exists
- Improve cancelled state message to explain restart is safe
When partial data exists, show a "Start fresh" checkbox that deletes
previous results before starting. Button adapts: "Resume Collection",
"Start Fresh", or "Start Collection Pipeline" depending on context.
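The adaptive button logic above is essentially a two-flag decision. A minimal sketch (the helper name is hypothetical; the three labels are the ones the commit describes):

```python
def collection_button_label(has_partial_data, start_fresh):
    """Pick the button text from on-disk state and the 'Start fresh' checkbox."""
    if not has_partial_data:
        return "Start Collection Pipeline"
    return "Start Fresh" if start_fresh else "Resume Collection"
```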
Rewrite 12 vague one-liner tooltips with clearer 2-3 line descriptions
drawn from config file comments. Add 6 missing tooltips for Enable Base
Filters, abstract length sliders, Allowed Publication Types, and API
source selectors. Remove label_visibility="collapsed" from 3 multiselects
so the help icon is actually visible.
… polish

Add real-time log capture, phase stepper, paper detail cards with DOI links, collection deletion (API + UI), and reorganize API keys by category. Quality filters moved into a collapsible section. Includes ruff formatting across the codebase.
Extract the 2272-line aggregate_collect.py monolith into clean pipeline
modules. Eliminate sys.argv hack and module-level side effects.

- Add scilex/config.py: centralized SciLExConfig with from_files() and
  from_dicts() constructors, replacing scattered config loading
- Add scilex/pipeline/ modules: tracker, text_filter, citation_filter,
  ranking, itemtype_filter, enrichment, post_filter, orchestrator
- Add pipeline/orchestrator.py: run_aggregation() and run_collection()
  as pure function calls, accepting SciLExConfig + AggregationOptions
- Replace sys.argv manipulation in scilex_api.py with orchestrator call
- Remove sys.path.insert hacks from web files
- Move module-level side effects (setup_logging, load_all_configs, print)
  inside main() in run_collection.py and aggregate_collect.py
- Centralize FORMAT_CONVERTERS registry in crawlers/aggregate.py
- Standardize logging via setup_logging() in all entry points
- Add shared post_filter.py for web UI/API filtering consolidation
- Slim aggregate_collect.py from 2272 to ~960 lines (thin CLI wrapper)
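The shape of the orchestrator call described above can be sketched like this. run_aggregation and AggregationOptions are named in the commit; the specific fields and the toy dedupe step are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AggregationOptions:
    """Illustrative option bag passed alongside the config."""
    dedupe: bool = True
    export_format: str = "csv"

def run_aggregation(config, options):
    """Pure orchestration: no sys.argv reads, no module-level side effects."""
    papers = config.get("papers", [])
    if options.dedupe:
        papers = list(dict.fromkeys(papers))  # order-preserving dedupe
    return {"papers": papers, "format": options.export_format}
```

Because the web API and the CLI both call this one function with explicit arguments, the old pattern of rewriting sys.argv before importing the monolith disappears.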
# Conflicts:
#	paper/paper.md
