Skip to content
This repository was archived by the owner on Apr 24, 2026. It is now read-only.

ywatanabe1989/scitex-research-template

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

⚠️ This repository has moved

This template is now maintained inside the scitex-template monorepo at templates/research/.

Migration

pip install scitex-template

# Python
from scitex_template import clone_template_from_cache
clone_template_from_cache("research", "./my-new-project")

See: https://github.com/ywatanabe1989/scitex-template/tree/main/templates/research

This repository is archived for history and will no longer receive updates.


SciTeX Research Template

A boilerplate template for scientific research projects using the SciTeX framework.

What is This?

This is a template project designed to be used as a starting point for your research. It demonstrates the standard SciTeX workflow with an MNIST example pipeline.

Part of the scitex package (scitex.template module).

Prerequisites

# Install scitex package
pip install scitex

# Verify installation
scitex --version

Requirements:

  • Python >= 3.10
  • scitex >= 2.0.0

Quick Start

# Clone and setup
git clone https://github.com/ywatanabe1989/scitex-research-template.git
cd scitex-research-template
make install    # Install project dependencies
make setup      # Full setup (install + verify)

# Run example pipeline
make run-mnist

Project Structure

scitex-research-template/
├── config/              # YAML configuration files
├── data/                # Centralized data storage (symlinked from scripts/*_out/)
├── scripts/             # Analysis scripts
│   ├── mnist/           # MNIST example pipeline
│   └── template.py      # Template for new scripts
├── tests/               # Test suite
├── scitex/              # SciTeX managed resources
│   ├── writer/          # Manuscript projects (LaTeX)
│   ├── scholar/         # Research notes & bibliography
│   ├── vis/             # Figure management
│   ├── code/            # Code templates
│   ├── ai/              # AI prompts & conversations
│   └── uploads/         # File uploads
├── management/          # Project management scripts
├── externals/           # External dependencies
├── docs/                # Documentation
├── .venv -> ~/.venv     # Python virtual environment (symlink)
└── Makefile             # Automation commands

Using as a Template

  1. Clone or fork this repository
  2. Remove MNIST example if not needed: make clean-mnist
  3. Add your scripts to scripts/your_project/
  4. Configure paths in config/PATH.yaml
  5. Run your analysis with make run-your-script

Available Commands

Setup & Installation
Command Description
make install Install dependencies
make install-dev Install with dev dependencies
make setup Full setup (install + verify)
make setup-writer Initialize manuscript project
make verify Verify installation
Running Analysis
Command Description
make run-mnist Run full MNIST pipeline
make run-mnist-download Download MNIST data
make run-mnist-plot-digits Plot sample digits
make run-mnist-plot-umap Generate UMAP visualization
make run-mnist-clf-svm Train SVM classifier
make run-mnist-conf-mat Plot confusion matrix
Development
Command Description
make test Run test suite
make test-verbose Run tests with verbose output
make format Format code (Python + Shell)
make lint Run linters
make check Format + lint + test
Cleanup
Command Description
make clean Clean temporary files
make clean-mnist Remove MNIST outputs
make clean-outputs Remove all script outputs
make clean-data Remove downloaded data
make clean-logs Remove log files
make clean-all Full cleanup
make clean-python Remove Python cache
make clean-writer Clean writer build files
Information
Command Description
make help Show all available commands
make info Show project information
make tree Display directory tree
make show-config Show configuration

SciTeX Directory (scitex/)

The scitex/ directory integrates with SciTeX Cloud:

Directory Purpose Setup
writer/ LaTeX manuscripts (00_shared, 01_manuscript, 02_supplementary, 03_revision) make setup-writer
scholar/ Bibliography management and research library Auto
vis/ Visualization workspace - figures, gallery templates Auto
code/ Code templates and AI-assisted coding Auto
ai/ AI prompts and conversation history Auto
uploads/ File upload staging area Auto

Manuscript Setup (Writer)

The scitex/writer/ directory is not included in the template - it's cloned on-demand to maintain independent git history.

Initialize Writer Project
# Quick setup (recommended)
make setup-writer

# Or with specific git strategy
./management/scripts/setup-writer.sh --git-strategy child

Git Strategies:

Strategy Description
parent Use parent repository (default)
child Create isolated git in writer directory
origin Preserve template's original git history
none No git initialization

After initialization:

scitex/writer/
├── 00_shared/           # Shared resources (title, authors, bibliography)
├── 01_manuscript/       # Main manuscript
├── 02_supplementary/    # Supplementary materials
├── 03_revision/         # Revision responses
└── compile.sh           # LaTeX compilation script

Compile manuscript:

cd scitex/writer
./compile.sh manuscript    # or: scitex writer compile manuscript

Symlink Architecture

This template uses symbolic links for DRY (Don't Repeat Yourself) principles and data provenance.

Environment Symlinks - Shared user configurations
.venv -> ~/.venv          # Shared Python virtual environment

Points to user-level virtual environment, making the template portable across projects.

AI Prompts Centralization - Single access point for all prompts
scitex/ai/prompts/
├── writer  -> ../../writer/ai/prompts
├── scholar -> ../../scholar/ai/prompts
├── code    -> ../../code/ai/prompts
└── vis     -> ../../vis/ai/prompts

All AI prompts are accessible from a central location while being organized by module.

Writer Shared Resources - Edit once, sync everywhere (after make setup-writer)
scitex/writer/                       # Created by: make setup-writer
├── 00_shared/                       # Single source of truth
│   ├── title.tex
│   ├── authors.tex
│   ├── bibliography.bib
│   ├── keywords.tex
│   └── latex_styles/
├── 01_manuscript/contents/
│   ├── title.tex        -> ../../00_shared/title.tex
│   ├── authors.tex      -> ../../00_shared/authors.tex
│   └── bibliography.bib -> ../../00_shared/bibliography.bib
├── 02_supplementary/contents/       # Same symlinks
└── 03_revision/contents/            # Same symlinks

Edit 00_shared/ once, and all manuscript sections stay synchronized.

Script Output to Data Links - Provenance tracking
data/mnist/
├── train_flattened.npy -> ../../scripts/mnist/download_out/data/mnist/train_flattened.npy
├── test_labels.npy     -> ../../scripts/mnist/download_out/data/mnist/test_labels.npy
├── models/
│   └── mnist_svm.pkl   -> ../../../scripts/mnist/clf_svm_out/data/mnist/models/mnist_svm.pkl
└── figures/
    └── umap.jpg        -> ../../../scripts/mnist/plot_umap_space_out/data/mnist/figures/umap.jpg

Script outputs (*_out/) are symlinked to data/, providing:

  • Provenance: Know which script generated each file
  • Central access: All data accessible from data/ directory
  • Reproducibility: Re-run script to regenerate linked output

Creating New Scripts

Script Template & Conventions

Use the template as a starting point:

# Copy template
cp scripts/template.py scripts/my_analysis/01_preprocess.py

# Edit and run
python scripts/my_analysis/01_preprocess.py

Conventions:

  • Numbered prefix for execution order: 01_, 02_, etc.
  • Output directory: {script_name}_out/
  • Use main.sh to orchestrate multiple steps

MNIST Example Pipeline:

scripts/mnist/
├── 01_download.py         # Download MNIST dataset
├── 02_plot_digits.py      # Visualize sample digits
├── 03_plot_umap_space.py  # UMAP dimensionality reduction
├── 04_clf_svm.py          # Train SVM classifier
├── 05_plot_conf_mat.py    # Plot confusion matrix
└── main.sh                # Run all steps sequentially

Core Modules for Reproducibility

This template is powered by two essential SciTeX modules that ensure reproducible, standardized research.

scitex.io - Universal I/O with automatic symlinks

Philosophy: "Load and save anything with one function"

import scitex as stx

# Universal interface - format auto-detected from extension
data = stx.io.load("data.csv")      # DataFrame
model = stx.io.load("model.pth")    # PyTorch state
config = stx.io.load("config.yaml") # Dict

# Save with automatic directory creation
stx.io.save(df, "results.parquet")
stx.io.save(fig, "figure.png", dpi=300, auto_crop=True)

Automatic Path Resolution:

When using relative paths, stx.io.save() automatically organizes outputs under {script_name}_out/:

# File: scripts/mnist/01_download.py

stx.io.save(train_data, "data/mnist/train.npy")
#                        ↓ Relative path
# Actual save location:  scripts/mnist/download_out/data/mnist/train.npy
#                        ^^^^^^^^^^^^^^^^^^^^^^^^^ Auto-generated from script name

Symlink Parameters for Data Centralization:

# In scripts/mnist/01_download.py
stx.io.save(
    train_data,
    "data/mnist/train.npy",           # Saved to: scripts/mnist/download_out/data/mnist/train.npy
    symlink_to="../../data/mnist/"    # Symlinked to: ./data/mnist/train.npy (relative to script_out)
)
Parameter Description
symlink_to Create symlink at specified path (relative to output location)
symlink_from_cwd Create symlink from current working directory

Path Resolution Rules:

  1. Relative path (e.g., "data/file.npy") → Saves to {script}_out/{path}
  2. Absolute path (e.g., "/tmp/file.npy") → Saves to exact path
  3. With @stx.session → Saves under session directory (e.g., script_out/RUNNING/A7K2/)

This enables the provenance-tracking symlink architecture shown above.

Supported Formats (27+):

  • Data: csv, tsv, parquet, json, yaml, pkl, joblib
  • Arrays: npy, npz, hdf5, zarr, mat, nc
  • ML: pth, pt, cbm (CatBoost), optuna
  • Documents: txt, md, pdf, docx, xml, bib
  • Images: png, jpg, tiff, gif (with auto-crop, metadata embedding)
  • Bundles: figz, pltz, statsz
scitex.session - Experiment lifecycle management

Philosophy: "Every run is reproducible and traceable"

import scitex as stx

@stx.session(seed=42)
def main(
    CONFIG=stx.INJECTED,      # Session metadata (ID, paths, timestamps)
    plt=stx.INJECTED,         # Configured matplotlib
    COLORS=stx.INJECTED,      # Color palette
    rng_manager=stx.INJECTED, # Reproducible RNG (numpy, torch, random)
    logger=stx.INJECTED,      # Auto-logging to files
):
    print(f"Session: {CONFIG['ID']}")  # e.g., "A7K2"

    # All stdout/stderr captured to logs/
    # Random seeds fixed across all libraries
    # Output directory auto-managed

if __name__ == "__main__":
    main()

Session Directory Structure:

script.py
script_out/
├── RUNNING/              # Active sessions
│   └── A7K2/             # 4-char session ID
│       ├── logs/
│       │   ├── stdout.log
│       │   └── stderr.log
│       ├── CONFIGS/
│       │   └── CONFIG.yaml
│       └── [your outputs]
├── FINISHED_SUCCESS/     # Completed successfully
└── FINISHED_ERROR/       # Completed with errors

Features:

  • Unique session IDs for every run
  • Automatic stdout/stderr capture
  • Fixed random seeds (numpy, torch, random, os)
  • Runtime tracking and timestamps
  • Exit status classification
How They Work Together - Complete workflow
┌─────────────────────────────────────────────────────────────────┐
│  scripts/mnist/01_download.py                                   │
│                                                                  │
│  @stx.session(seed=42)                                          │
│  def main(CONFIG, plt, ...):                                    │
│      data = download_mnist()                                    │
│      stx.io.save(data, "data/mnist/train.npy",                 │
│                  symlink_to="../../data/mnist/")               │
│                                                                  │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│  scripts/mnist/download_out/                                     │
│  └── FINISHED_SUCCESS/                                          │
│      └── A7K2/                    <- Session tracking           │
│          ├── logs/                                              │
│          │   ├── stdout.log       <- All prints captured        │
│          │   └── stderr.log                                     │
│          ├── CONFIGS/                                           │
│          │   └── CONFIG.yaml      <- Reproducibility record     │
│          └── data/mnist/                                        │
│              └── train.npy        <- Actual file                │
│                                                                  │
└────────────────────────┬────────────────────────────────────────┘
                         │ symlink
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│  data/mnist/                                                     │
│  └── train.npy -> ../../scripts/mnist/download_out/.../train.npy│
│                                                                  │
│  Central access with provenance tracking                        │
└─────────────────────────────────────────────────────────────────┘

Benefits:

  • Provenance: Every file traces back to its generating script and session
  • Reproducibility: Re-run with same seed = same results
  • Central Access: All data accessible from ./data/
  • Logging: Complete record of every run

Key Features

  • Standardized structure for reproducible research
  • Automated workflows via Makefile
  • Manuscript management with LaTeX compilation
  • Testing framework included
  • Figure provenance tracking via symlinks
  • AI integration for coding assistance

Documentation

scitex-code Examples (for deeper understanding):

License

AGPL-3.0

Contact

Yusuke Watanabe (ywatanabe@scitex.ai)

About

Boilerplate template for scientific research projects using the SciTeX framework. MNIST example pipeline included.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors