This template is now maintained inside the scitex-template monorepo at templates/research/.
pip install scitex-template
# Python
from scitex_template import clone_template_from_cache
clone_template_from_cache("research", "./my-new-project")
See: https://github.com/ywatanabe1989/scitex-template/tree/main/templates/research
This repository is kept as a historical archive and no longer receives updates.
A boilerplate template for scientific research projects using the SciTeX framework.
This is a template project designed to be used as a starting point for your research. It demonstrates the standard SciTeX workflow with an MNIST example pipeline.
Part of the scitex package (scitex.template module).
# Install scitex package
pip install scitex
# Verify installation
scitex --version
Requirements:
- Python >= 3.10
- scitex >= 2.0.0
# Clone and setup
git clone https://github.com/ywatanabe1989/scitex-research-template.git
cd scitex-research-template
make install # Install project dependencies
make setup # Full setup (install + verify)
# Run example pipeline
make run-mnist
scitex-research-template/
├── config/ # YAML configuration files
├── data/ # Centralized data storage (symlinked from scripts/*_out/)
├── scripts/ # Analysis scripts
│ ├── mnist/ # MNIST example pipeline
│ └── template.py # Template for new scripts
├── tests/ # Test suite
├── scitex/ # SciTeX managed resources
│ ├── writer/ # Manuscript projects (LaTeX)
│ ├── scholar/ # Research notes & bibliography
│ ├── vis/ # Figure management
│ ├── code/ # Code templates
│ ├── ai/ # AI prompts & conversations
│ └── uploads/ # File uploads
├── management/ # Project management scripts
├── externals/ # External dependencies
├── docs/ # Documentation
├── .venv -> ~/.venv # Python virtual environment (symlink)
└── Makefile # Automation commands
- Clone or fork this repository
- Remove the MNIST example if not needed: `make clean-mnist`
- Add your scripts to `scripts/your_project/`
- Configure paths in `config/PATH.yaml`
- Run your analysis with `make run-your-script`
Setup & Installation
| Command | Description |
|---|---|
| `make install` | Install dependencies |
| `make install-dev` | Install with dev dependencies |
| `make setup` | Full setup (install + verify) |
| `make setup-writer` | Initialize manuscript project |
| `make verify` | Verify installation |
Running Analysis
| Command | Description |
|---|---|
| `make run-mnist` | Run full MNIST pipeline |
| `make run-mnist-download` | Download MNIST data |
| `make run-mnist-plot-digits` | Plot sample digits |
| `make run-mnist-plot-umap` | Generate UMAP visualization |
| `make run-mnist-clf-svm` | Train SVM classifier |
| `make run-mnist-conf-mat` | Plot confusion matrix |
Development
| Command | Description |
|---|---|
| `make test` | Run test suite |
| `make test-verbose` | Run tests with verbose output |
| `make format` | Format code (Python + Shell) |
| `make lint` | Run linters |
| `make check` | Format + lint + test |
Cleanup
| Command | Description |
|---|---|
| `make clean` | Clean temporary files |
| `make clean-mnist` | Remove MNIST outputs |
| `make clean-outputs` | Remove all script outputs |
| `make clean-data` | Remove downloaded data |
| `make clean-logs` | Remove log files |
| `make clean-all` | Full cleanup |
| `make clean-python` | Remove Python cache |
| `make clean-writer` | Clean writer build files |
Information
| Command | Description |
|---|---|
| `make help` | Show all available commands |
| `make info` | Show project information |
| `make tree` | Display directory tree |
| `make show-config` | Show configuration |
The scitex/ directory integrates with SciTeX Cloud:
| Directory | Purpose | Setup |
|---|---|---|
| `writer/` | LaTeX manuscripts (00_shared, 01_manuscript, 02_supplementary, 03_revision) | `make setup-writer` |
| `scholar/` | Bibliography management and research library | Auto |
| `vis/` | Visualization workspace: figures, gallery templates | Auto |
| `code/` | Code templates and AI-assisted coding | Auto |
| `ai/` | AI prompts and conversation history | Auto |
| `uploads/` | File upload staging area | Auto |
The scitex/writer/ directory is not included in the template; it is cloned on demand so that it keeps an independent git history.
Initialize Writer Project
# Quick setup (recommended)
make setup-writer
# Or with specific git strategy
./management/scripts/setup-writer.sh --git-strategy child
Git Strategies:
| Strategy | Description |
|---|---|
| `parent` | Use parent repository (default) |
| `child` | Create isolated git in writer directory |
| `origin` | Preserve template's original git history |
| `none` | No git initialization |
After initialization:
scitex/writer/
├── 00_shared/ # Shared resources (title, authors, bibliography)
├── 01_manuscript/ # Main manuscript
├── 02_supplementary/ # Supplementary materials
├── 03_revision/ # Revision responses
└── compile.sh # LaTeX compilation script
Compile manuscript:
cd scitex/writer
./compile.sh manuscript # or: scitex writer compile manuscript
This template uses symbolic links to follow the DRY (Don't Repeat Yourself) principle and to record data provenance.
Environment Symlinks - Shared user configurations
.venv -> ~/.venv # Shared Python virtual environment
Points to a user-level virtual environment, so projects created from this template can share one Python environment instead of each maintaining its own.
AI Prompts Centralization - Single access point for all prompts
scitex/ai/prompts/
├── writer -> ../../writer/ai/prompts
├── scholar -> ../../scholar/ai/prompts
├── code -> ../../code/ai/prompts
└── vis -> ../../vis/ai/prompts
All AI prompts are accessible from a central location while being organized by module.
Writer Shared Resources - Edit once, sync everywhere (after make setup-writer)
scitex/writer/ # Created by: make setup-writer
├── 00_shared/ # Single source of truth
│ ├── title.tex
│ ├── authors.tex
│ ├── bibliography.bib
│ ├── keywords.tex
│ └── latex_styles/
├── 01_manuscript/contents/
│ ├── title.tex -> ../../00_shared/title.tex
│ ├── authors.tex -> ../../00_shared/authors.tex
│ └── bibliography.bib -> ../../00_shared/bibliography.bib
├── 02_supplementary/contents/ # Same symlinks
└── 03_revision/contents/ # Same symlinks
Edit 00_shared/ once, and all manuscript sections stay synchronized.
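To make the linking pattern concrete, here is a sketch that rebuilds these links with Python's standard library. It is illustrative only and makes no claim about what `make setup-writer` actually runs; `link_shared`, `SHARED_FILES`, and `SECTIONS` are hypothetical names, and only the three linked files shown in the tree above are covered.

```python
import os
from pathlib import Path

# Hypothetical lists mirroring the tree above.
SHARED_FILES = ["title.tex", "authors.tex", "bibliography.bib"]
SECTIONS = ["01_manuscript", "02_supplementary", "03_revision"]


def link_shared(writer_root: Path) -> list[Path]:
    """Hypothetical helper: link shared files into every section's contents/."""
    shared_dir = writer_root / "00_shared"
    created = []
    for section in SECTIONS:
        contents = writer_root / section / "contents"
        contents.mkdir(parents=True, exist_ok=True)
        for name in SHARED_FILES:
            link = contents / name
            if link.is_symlink():
                link.unlink()  # Refresh old links; a real file makes symlink_to raise.
            # Relative target (e.g. ../../00_shared/title.tex) stays valid
            # if the whole writer/ directory is moved or cloned elsewhere.
            target = os.path.relpath(shared_dir / name, start=contents)
            link.symlink_to(target)
            created.append(link)
    return created
```

Because the targets are relative, editing `00_shared/title.tex` is immediately visible through every `contents/title.tex` link.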
Script Output to Data Links - Provenance tracking
data/mnist/
├── train_flattened.npy -> ../../scripts/mnist/download_out/data/mnist/train_flattened.npy
├── test_labels.npy -> ../../scripts/mnist/download_out/data/mnist/test_labels.npy
├── models/
│ └── mnist_svm.pkl -> ../../../scripts/mnist/clf_svm_out/data/mnist/models/mnist_svm.pkl
└── figures/
└── umap.jpg -> ../../../scripts/mnist/plot_umap_space_out/data/mnist/figures/umap.jpg
Script outputs (*_out/) are symlinked to data/, providing:
- Provenance: Know which script generated each file
- Central access: All data accessible from the `data/` directory
- Reproducibility: Re-run a script to regenerate its linked output
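Since every link in `data/` points into a `*_out/` directory, the generating script can be recovered by reading the link target. A minimal sketch, assuming the naming visible in the tree above (the `*_out/` directory sits next to its script, and `01_download.py` maps to `download_out/`); `find_generating_script` is a hypothetical helper, not part of scitex.

```python
import os
import re
from pathlib import Path
from typing import Optional


def find_generating_script(data_link: Path) -> Optional[Path]:
    """Hypothetical helper: map a data/ symlink back to the script that wrote it."""
    if not data_link.is_symlink():
        return None  # A regular file carries no provenance link.
    # Interpret the (usually relative) link target from the link's own directory.
    target = Path(os.path.normpath(data_link.parent / os.readlink(data_link)))
    for parent in target.parents:
        if parent.name.endswith("_out"):
            stem = parent.name[: -len("_out")]  # "download_out" -> "download"
            # Allow an optional numeric prefix: 01_download.py -> download_out/
            pattern = re.compile(rf"(\d+_)?{re.escape(stem)}\.py")
            for script in sorted(parent.parent.glob("*.py")):
                if pattern.fullmatch(script.name):
                    return script
            return None
    return None
```

For example, calling it on `data/mnist/train_flattened.npy` would follow the link into `scripts/mnist/download_out/` and return `scripts/mnist/01_download.py`.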
Script Template & Conventions
Use the template as a starting point:
# Copy template
cp scripts/template.py scripts/my_analysis/01_preprocess.py
# Edit and run
python scripts/my_analysis/01_preprocess.py
Conventions:
- Numbered prefix for execution order: `01_`, `02_`, etc.
- Output directory: `{script_name}_out/`
- Use `main.sh` to orchestrate multiple steps
MNIST Example Pipeline:
scripts/mnist/
├── 01_download.py # Download MNIST dataset
├── 02_plot_digits.py # Visualize sample digits
├── 03_plot_umap_space.py # UMAP dimensionality reduction
├── 04_clf_svm.py # Train SVM classifier
├── 05_plot_conf_mat.py # Plot confusion matrix
└── main.sh # Run all steps sequentially
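The numbered prefixes turn execution order into a simple sort. The following is a hypothetical Python stand-in for the ordering a `main.sh` encodes; it is not the template's `main.sh`, and `ordered_steps` and `run_pipeline` are illustrative names.

```python
import re
import subprocess
import sys
from pathlib import Path

# Matches step scripts such as 01_download.py; template.py is ignored.
STEP_PATTERN = re.compile(r"(\d+)_.+\.py")


def ordered_steps(script_dir: Path) -> list[Path]:
    """Return NN_*.py scripts sorted by their numeric prefix."""
    steps = [p for p in script_dir.glob("*.py") if STEP_PATTERN.fullmatch(p.name)]
    # Sort numerically so 10_x.py runs after 9_y.py even without zero padding.
    return sorted(steps, key=lambda p: int(STEP_PATTERN.fullmatch(p.name).group(1)))


def run_pipeline(script_dir: Path) -> None:
    """Run each step in order, stopping at the first failure (like `set -e`)."""
    for step in ordered_steps(script_dir):
        print(f"Running {step.name}")
        subprocess.run([sys.executable, str(step)], check=True)
```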
This template is built on two core SciTeX modules, scitex.io and scitex.session, which standardize file I/O and experiment tracking to support reproducible research.
scitex.io - Universal I/O with automatic symlinks
Philosophy: "Load and save anything with one function"
import scitex as stx
# Universal interface - format auto-detected from extension
data = stx.io.load("data.csv") # DataFrame
model = stx.io.load("model.pth") # PyTorch state
config = stx.io.load("config.yaml") # Dict
# Save with automatic directory creation
stx.io.save(df, "results.parquet")
stx.io.save(fig, "figure.png", dpi=300, auto_crop=True)
Automatic Path Resolution:
When using relative paths, stx.io.save() automatically organizes outputs under {script_name}_out/:
# File: scripts/mnist/01_download.py
stx.io.save(train_data, "data/mnist/train.npy")
# ↓ Relative path
# Actual save location: scripts/mnist/download_out/data/mnist/train.npy
#                       ^^^^^^^^^^^^^^^^^^^^^^^^^^ Auto-generated from script name
Symlink Parameters for Data Centralization:
# In scripts/mnist/01_download.py
stx.io.save(
    train_data,
    "data/mnist/train.npy",  # Saved to: scripts/mnist/download_out/data/mnist/train.npy
    symlink_to="../../data/mnist/",  # Symlinked to: ./data/mnist/train.npy (relative to script_out)
)
| Parameter | Description |
|---|---|
| `symlink_to` | Create symlink at specified path (relative to output location) |
| `symlink_from_cwd` | Create symlink from current working directory |
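Conceptually, `symlink_to` pairs a normal save with a relative link in the central `data/` tree. This stdlib-only sketch imitates that pairing; `save_and_link` is a hypothetical helper that takes the link directory as an explicit path instead of reproducing scitex's own base-path rules, and it is not the `stx.io.save` implementation.

```python
import os
from pathlib import Path


def save_and_link(data: bytes, out_file: Path, link_dir: Path) -> Path:
    """Hypothetical helper: save under *_out/, then symlink into a central dir."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_bytes(data)

    link_dir.mkdir(parents=True, exist_ok=True)
    link = link_dir / out_file.name
    if link.is_symlink():
        link.unlink()  # Re-running the script refreshes the link.
    # A relative target keeps the link valid if the project is moved or cloned.
    link.symlink_to(os.path.relpath(out_file, start=link_dir))
    return link
```

Saving to `scripts/mnist/download_out/data/mnist/train.npy` with `link_dir=data/mnist` yields the link target `../../scripts/mnist/download_out/data/mnist/train.npy`, the same shape as the links in the data tree shown earlier.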
Path Resolution Rules:
- Relative path (e.g., `"data/file.npy"`) → Saves to `{script}_out/{path}`
- Absolute path (e.g., `"/tmp/file.npy"`) → Saves to exact path
- With `@stx.session` → Saves under session directory (e.g., `script_out/RUNNING/A7K2/`)
This enables the provenance-tracking symlink architecture shown above.
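The three rules fit in a small resolver. This is a sketch of the documented behaviour, not scitex's implementation: `output_dir` and `resolve_save_path` are hypothetical, and the prefix stripping (`01_download.py` → `download_out/`) is inferred from the MNIST example paths in this README.

```python
import re
from pathlib import Path
from typing import Optional


def output_dir(script: Path) -> Path:
    """scripts/mnist/01_download.py -> scripts/mnist/download_out"""
    stem = re.sub(r"^\d+_", "", script.stem)  # Drop the numeric ordering prefix.
    return script.parent / f"{stem}_out"


def resolve_save_path(script: Path, path: str, session_id: Optional[str] = None) -> Path:
    """Hypothetical resolver mirroring the documented path rules."""
    target = Path(path)
    if target.is_absolute():
        return target  # Absolute paths are used as-is.
    base = output_dir(script)
    if session_id is not None:
        base = base / "RUNNING" / session_id  # Inside an @stx.session run.
    return base / target
```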
Supported Formats (27+):
- Data: csv, tsv, parquet, json, yaml, pkl, joblib
- Arrays: npy, npz, hdf5, zarr, mat, nc
- ML: pth, pt, cbm (CatBoost), optuna
- Documents: txt, md, pdf, docx, xml, bib
- Images: png, jpg, tiff, gif (with auto-crop, metadata embedding)
- Bundles: figz, pltz, statsz
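The "one function for every format" idea rests on dispatching by file extension. A minimal sketch of that pattern with stdlib-only handlers follows; `load`, `save`, and the `_HANDLERS` registry are hypothetical and cover only three formats, whereas `stx.io` supports the much longer list above.

```python
import csv
import json
from pathlib import Path
from typing import Any, Callable

Loader = Callable[[Path], Any]
Saver = Callable[[Any, Path], Any]


def _load_csv(path: Path) -> list:
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def _save_csv(rows: list, path: Path) -> None:
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical registry: file extension -> (loader, saver).
_HANDLERS: dict = {
    ".json": (lambda p: json.loads(p.read_text()), lambda obj, p: p.write_text(json.dumps(obj))),
    ".txt": (lambda p: p.read_text(), lambda obj, p: p.write_text(obj)),
    ".csv": (_load_csv, _save_csv),
}


def _handlers_for(path: Path) -> tuple:
    try:
        return _HANDLERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported format: {path.suffix!r}") from None


def load(path: str) -> Any:
    p = Path(path)
    loader, _ = _handlers_for(p)
    return loader(p)


def save(obj: Any, path: str) -> None:
    p = Path(path)
    _, saver = _handlers_for(p)
    p.parent.mkdir(parents=True, exist_ok=True)  # Create missing directories first.
    saver(obj, p)
```

Adding a format means registering one more `(loader, saver)` pair; callers never change.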
scitex.session - Experiment lifecycle management
Philosophy: "Every run is reproducible and traceable"
import scitex as stx
@stx.session(seed=42)
def main(
    CONFIG=stx.INJECTED,  # Session metadata (ID, paths, timestamps)
    plt=stx.INJECTED,  # Configured matplotlib
    COLORS=stx.INJECTED,  # Color palette
    rng_manager=stx.INJECTED,  # Reproducible RNG (numpy, torch, random)
    logger=stx.INJECTED,  # Auto-logging to files
):
    print(f"Session: {CONFIG['ID']}")  # e.g., "A7K2"
    # All stdout/stderr captured to logs/
    # Random seeds fixed across all libraries
    # Output directory auto-managed
if __name__ == "__main__":
    main()
Session Directory Structure:
script.py
script_out/
├── RUNNING/ # Active sessions
│ └── A7K2/ # 4-char session ID
│ ├── logs/
│ │ ├── stdout.log
│ │ └── stderr.log
│ ├── CONFIGS/
│ │ └── CONFIG.yaml
│ └── [your outputs]
├── FINISHED_SUCCESS/ # Completed successfully
└── FINISHED_ERROR/ # Completed with errors
Features:
- Unique session IDs for every run
- Automatic stdout/stderr capture
- Fixed random seeds (numpy, torch, random, os)
- Runtime tracking and timestamps
- Exit status classification
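The lifecycle above (unique ID, `RUNNING/` directory, captured output, fixed seed, final status directory) can be approximated with a context manager. This is a hypothetical, stdlib-only sketch for understanding the layout, not how `@stx.session` works: `session` is an illustrative name, it seeds only Python's `random` module, captures only stdout, writes no `CONFIGS/` file, and the 4-character ID alphabet (uppercase letters and digits) is an assumption based on the example ID `A7K2`.

```python
import contextlib
import random
import secrets
import shutil
import string
from pathlib import Path
from typing import Iterator

_ALPHABET = string.ascii_uppercase + string.digits  # Assumed ID alphabet.


@contextlib.contextmanager
def session(out_root: Path, seed: int = 42) -> Iterator[Path]:
    """Hypothetical session: RUNNING/<ID> -> FINISHED_SUCCESS|FINISHED_ERROR/<ID>."""
    session_id = "".join(secrets.choice(_ALPHABET) for _ in range(4))
    run_dir = out_root / "RUNNING" / session_id
    (run_dir / "logs").mkdir(parents=True)
    random.seed(seed)  # Only Python's RNG here; the real decorator also covers numpy/torch.
    status = "FINISHED_ERROR"  # Assume failure until the body completes.
    try:
        with open(run_dir / "logs" / "stdout.log", "w") as log, contextlib.redirect_stdout(log):
            yield run_dir
        status = "FINISHED_SUCCESS"
    finally:
        # Classify the run by moving its directory to the matching status folder.
        final_dir = out_root / status / session_id
        final_dir.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(run_dir), str(final_dir))
```

An exception inside the `with session(...)` block still propagates to the caller; the `finally` clause only decides where the run directory ends up.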
How They Work Together - Complete workflow
┌─────────────────────────────────────────────────────────────────┐
│ scripts/mnist/01_download.py                                    │
│                                                                 │
│   @stx.session(seed=42)                                         │
│   def main(CONFIG, plt, ...):                                   │
│       data = download_mnist()                                   │
│       stx.io.save(data, "data/mnist/train.npy",                 │
│                   symlink_to="../../data/mnist/")               │
│                                                                 │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│ scripts/mnist/download_out/                                     │
│ └── FINISHED_SUCCESS/                                           │
│     └── A7K2/                       <- Session tracking         │
│         ├── logs/                                               │
│         │   ├── stdout.log          <- All prints captured      │
│         │   └── stderr.log                                      │
│         ├── CONFIGS/                                            │
│         │   └── CONFIG.yaml         <- Reproducibility record   │
│         └── data/mnist/                                         │
│             └── train.npy           <- Actual file              │
│                                                                 │
└────────────────────────┬────────────────────────────────────────┘
                         │ symlink
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│ data/mnist/                                                     │
│ └── train.npy -> ../../scripts/mnist/download_out/.../train.npy │
│                                                                 │
│ Central access with provenance tracking                         │
└─────────────────────────────────────────────────────────────────┘
Benefits:
- Provenance: Every file traces back to its generating script and session
- Reproducibility: Re-run with same seed = same results
- Central Access: All data accessible from `./data/`
- Logging: Complete record of every run
- Standardized structure for reproducible research
- Automated workflows via Makefile
- Manuscript management with LaTeX compilation
- Testing framework included
- Figure provenance tracking via symlinks
- AI integration for coding assistance
scitex-code Examples (for deeper understanding):
- `examples/session/demo_session_plt_io.py` - Complete session + io + symlink demo
- `examples/session/COMPARISON.md` - Manual vs. decorator comparison
- `src/scitex/io/README.md` - Full I/O documentation
- `src/scitex/session/README.md` - Session management details
License: AGPL-3.0
Contact: Yusuke Watanabe (ywatanabe@scitex.ai)