From aec6b350cc36e3291bcba91a288678fb2c215826 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Tue, 2 Sep 2025 15:37:27 -0500 Subject: [PATCH 1/9] Add hello-world and basic datalad-pair tutorial Tested against typhon --- docs/source/index.rst | 1 + docs/source/tutorial-ssh.rst | 178 +++++++++++++++++++++++++++++++++++ 2 files changed, 179 insertions(+) create mode 100644 docs/source/tutorial-ssh.rst diff --git a/docs/source/index.rst b/docs/source/index.rst index bd70fa23..604f222b 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -5,6 +5,7 @@ ReproMan |---| tools for reproducible neuroimaging :maxdepth: 1 overview + tutorial-ssh acknowledgements Concepts and technologies diff --git a/docs/source/tutorial-ssh.rst b/docs/source/tutorial-ssh.rst new file mode 100644 index 00000000..c12ba044 --- /dev/null +++ b/docs/source/tutorial-ssh.rst @@ -0,0 +1,178 @@ +.. _tutorial-ssh: + +Tutorial: SSH Resource Workflows +********************************* + +This tutorial walks you through ReproMan workflows using SSH resources, from simple command execution to complex data analysis. +We'll start with a basic hello-world example, then progress to processing neuroimaging data. + +This tutorial demonstrates ReproMan's power in creating reproducible, traceable computational workflows across SSH-accessible computing environments. + +Overview +======== + +We'll cover two workflows: + +**Part 1: Hello World Example** +1. Create a ReproMan SSH resource +2. Execute a simple command remotely +3. Fetch and examine results + +**Part 2: Dataset Analysis Example** +1. Set up a DataLad dataset with input data +2. Execute MRIQC quality control analysis remotely +3. 
Collect and examine results with full provenance + +Prerequisites +============= + +- ReproMan installed (``pip install reproman``) +- Access to a remote server via SSH +- For Part 2: DataLad support (``pip install 'reproman[full]'``) + +Part 1: Hello World Example +============================ + +Step 1: Create an SSH Resource +------------------------------- + +First, let's add an SSH resource to ReproMan's inventory. Replace ``your-server.edu`` with your actual server:: + + reproman create myserver --resource-type ssh --backend-parameters host=your-server.edu + +Verify the resource was created:: + + reproman ls --refresh + +.. note:: + + The ``--refresh`` flag is needed to check the current status of resources. Without it, you'll only see cached status information. + +You should see output similar to:: + + RESOURCE NAME TYPE ID STATUS + ------------- ---- -- ------ + myserver ssh 1a23b456-789c- ONLINE + +Step 2: Execute a Simple Command +--------------------------------- + +Let's start with a simple test to verify our setup works. Create a working directory and run a basic command:: + + mkdir -p hello-world + cd hello-world + + reproman run --resource myserver \ + --sub local \ + --orc plain \ + --output results \ + sh -c 'mkdir -p results && echo "Hello from ReproMan on $(hostname)" > results/hello.txt' + + +Step 3: Fetch Results +--------------------- + +The job will execute on the remote. To check status and fetch results:: + + # Check job status and get job ID + reproman jobs + + # Fetch results for completed job (replace JOB_ID with actual ID) + reproman jobs JOB_ID + +When you run ``reproman jobs JOB_ID``, ReproMan will automatically: + +- Fetch the output files from the remote to your local working directory +- Display job information and logs +- Unregister the completed job + +You should now see the results locally:: + + cat results/hello.txt + +.. note:: + + ReproMan creates a working directory on the remote resource automatically. 
By default, it uses ``~/.reproman/run-root`` on the remote. You can verify the file exists there with ``reproman login myserver``. + +Part 2: Dataset Analysis Example +================================= + +Now let's try a more realistic example with DataLad dataset management and neuroimaging analysis. + +Step 1: Set Up the Analysis Dataset +------------------------------------ + +Create a new DataLad dataset for our analysis:: + + # Create dataset for MRIQC quality control results + datalad create -d demo-mriqc -c text2git + cd demo-mriqc + +Install input data (using a demo BIDS dataset):: + + # TODO does this have to be fetched locally? i think no? + # Install demo neuroimaging dataset + datalad install -d . -s https://github.com/ReproNim/ds000003-demo sourcedata/raw + + +Set up working directory to be ignored:: + + # TODO oneline with datalad run + echo "workdir/" > .gitignore + datalad save -m "Ignore processing workdir" .gitignore + +Step 2: Execute Analysis with DataLad Integration +------------------------------------------------- + +For full provenance tracking with DataLad:: + + reproman run --resource myserver \ + --sub local \ + --orc datalad-pair-run \ + --input sourcedata/raw \ + --output . \ + bash -c 'podman run --rm -v "$(pwd):/work:rw" poldracklab/mriqc:latest /work/sourcedata/raw /work/results participant group --participant-label 02' + +Step 3: Monitor Execution +------------------------- + +ReproMan jobs run in detached mode by default. Monitor progress:: + + # List all jobs + reproman jobs + + # Check specific job status (replace JOB_ID with actual ID) + reproman jobs JOB_ID + + # Fetch completed job results + reproman jobs JOB_ID --fetch + +For attached execution (wait for completion):: + + reproman run --resource myserver --follow \ + [... rest of command ...] 
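+
+As a quick local sanity check after fetching, the same POSIX tools used in
+the remote command can confirm that the output made the round trip (the
+path is the ``results`` directory fetched in Step 3)::
+
+    # Succeeds only if the fetched file exists and is non-empty
+    test -s results/hello.txt && echo "fetch OK"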
+ +Step 4: Examine Results and Provenance +-------------------------------------- + +Once the job completes, examine what was captured:: + + # View the provenance record + git log --oneline -1 + + # Look at captured job information + ls .reproman/jobs/myserver/ + + # View job specification + cat .reproman/jobs/myserver/JOB_ID/spec.yaml + + # Check MRIQC outputs + ls -la results/ + +The DataLad orchestrators create rich provenance records:: + + # View the detailed run record + git show --stat + + # See what files were modified/added + git show --name-status From 9dfebc6cde3aa2335f070c5a220eaa4c9ab720d1 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Sep 2025 12:06:15 -0500 Subject: [PATCH 2/9] fixup: nipreps repo for mriqc container --- docs/source/execution.rst | 70 ++++++++++++++++++++++-------------- docs/source/tutorial-ssh.rst | 2 +- 2 files changed, 44 insertions(+), 28 deletions(-) diff --git a/docs/source/execution.rst b/docs/source/execution.rst index 474fc292..5905eec3 100644 --- a/docs/source/execution.rst +++ b/docs/source/execution.rst @@ -53,10 +53,8 @@ necessary). Choosing an orchestrator ------------------------ -Before running a command, we need to decide on an orchestrator. The -orchestrator is responsible for the first and third :ref:`tasks above -`, preparing the remote and collecting the results. The complete -set of orchestrators, accompanied by descriptions, can be seen by +Orchestrators are responsible for preparing the remote and collecting the results. + The complete set of orchestrators, accompanied by descriptions, can be seen by calling ``reproman run --list=orchestrators``. .. note:: @@ -66,29 +64,47 @@ calling ``reproman run --list=orchestrators``. only a limited set of functionality is available. If you are new to DataLad, consider reading the `DataLad handbook`_. -The main orchestrator choices are ``datalad-pair``, -``datalad-pair-run``, and ``datalad-local-run``. 
If the remote has -DataLad available, you should go with one of the ``datalad-pair*`` orchestrators. -These will sync your local dataset with a dataset on the remote machine -(using `datalad push`_), creating one if it doesn't already exist -(using `datalad create-sibling`_). - -``datalad-pair`` differs from the ``datalad-*-run`` orchestrators in the -way it captures results. After execution has completed, ``datalad-pair`` -commits the result *on the remote* via DataLad. On fetch, it will pull -that commit down with `datalad update`_. Outputs (specified via -``--outputs`` or as a job parameter) are retrieved with `datalad get`_. - -``datalad-pair-run`` and ``datalad-local-run``, on the other hand, -determine a list of output files based on modification times and -packages these files in a tarball. (This approach is inspired by -`datalad-htcondor`_.) On fetch, this tarball is downloaded locally and -used to create a `datalad run`_ commit in the *local* repository. - -There is one more orchestrator, ``datalad-no-remote``, that is designed -to work only with a local shell resource. It is similar to -``datalad-pair``, except that the command is executed in the same -directory from which ``reproman run`` is invoked. 
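+The orchestrator is picked per invocation with the ``--orchestrator``
+option of ``reproman run``. For example, with an SSH resource named "foo"
+already in the inventory (the input, output, and script names here are
+placeholders)::
+
+    reproman run --resource foo --orchestrator datalad-pair \
+        --input data/raw --output results \
+        ./run-analysis.sh
+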
+Choose the orchestrator based on your setup and needs: + +**For remote resources with DataLad (recommended):** + +- **``datalad-pair``** - Best for persistent remote datasets + + - Creates and maintains DataLad datasets on the remote + - Commits results directly on the remote with full provenance + - Retrieves results using `datalad update`_ and `datalad get`_ + - Marks completed jobs with git refs (refs/reproman/JOBID) + +- **``datalad-pair-run``** - Best for capturing runs in local dataset + + - Prepares remote dataset like ``datalad-pair`` + - Packages results in tarball based on file modification times + - Creates a `datalad run`_ commit in your *local* repository + - Marks local commit with git ref (refs/reproman/JOBID) + +**For remote resources without DataLad:** + +- **``datalad-local-run``** - Remote execution, local DataLad integration + + - Uses plain remote directory (no DataLad on remote required) + - Captures results as `datalad run`_ commit locally + - Good when remote lacks DataLad but you want local provenance + +- **``plain``** - Simple remote execution + + - Basic file transfer using session.put() and session.get() + - No DataLad integration or provenance tracking + - Creates working directory named with job ID + - Sufficient for simple tasks but DataLad orchestrators recommended + +**For local execution:** + +- **``datalad-no-remote``** - Local dataset execution + + - Executes in current local dataset directory + - Behaves like ``datalad-pair`` but stays local + - Available for local shell resources only + - Good for testing workflows locally Revisiting :ref:`our concrete example ` and assuming we have an SSH resource named "foo" in our inventory, here's how we could diff --git a/docs/source/tutorial-ssh.rst b/docs/source/tutorial-ssh.rst index c12ba044..650643f7 100644 --- a/docs/source/tutorial-ssh.rst +++ b/docs/source/tutorial-ssh.rst @@ -131,7 +131,7 @@ For full provenance tracking with DataLad:: --orc datalad-pair-run \ --input 
sourcedata/raw \ --output . \ - bash -c 'podman run --rm -v "$(pwd):/work:rw" poldracklab/mriqc:latest /work/sourcedata/raw /work/results participant group --participant-label 02' + bash -c 'podman run --rm -v "$(pwd):/work:rw" nipreps/mriqc:latest /work/sourcedata/raw /work/results participant group --participant-label 02' Step 3: Monitor Execution ------------------------- From 711e3421069a60e5e21eddcece3c276a16172f7f Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Sep 2025 13:17:14 -0500 Subject: [PATCH 3/9] Use datalad run for workdir setup Co-Authored-By: Claude --- docs/source/tutorial-ssh.rst | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/source/tutorial-ssh.rst b/docs/source/tutorial-ssh.rst index 650643f7..93634a29 100644 --- a/docs/source/tutorial-ssh.rst +++ b/docs/source/tutorial-ssh.rst @@ -117,9 +117,7 @@ Install input data (using a demo BIDS dataset):: Set up working directory to be ignored:: - # TODO oneline with datalad run - echo "workdir/" > .gitignore - datalad save -m "Ignore processing workdir" .gitignore + datalad run -m "Ignore processing workdir" 'echo "workdir/" > .gitignore' Step 2: Execute Analysis with DataLad Integration ------------------------------------------------- From 2f3b48af6fcc1355c6f043c7be37c1ea8b8b05fc Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Sep 2025 13:21:29 -0500 Subject: [PATCH 4/9] Add brief explanation of datalad install to tutorial --- docs/source/tutorial-ssh.rst | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/source/tutorial-ssh.rst b/docs/source/tutorial-ssh.rst index 93634a29..f939eb25 100644 --- a/docs/source/tutorial-ssh.rst +++ b/docs/source/tutorial-ssh.rst @@ -110,10 +110,14 @@ Create a new DataLad dataset for our analysis:: Install input data (using a demo BIDS dataset):: - # TODO does this have to be fetched locally? i think no? # Install demo neuroimaging dataset datalad install -d . 
-s https://github.com/ReproNim/ds000003-demo sourcedata/raw +.. note:: + This only installs the dataset structure - the actual data files are not + downloaded locally. DataLad will automatically fetch any data specified + by `--input` when the analysis runs. + Set up working directory to be ignored:: From d5554b4df6d470cb6d9624956e660ff72d90c694 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Sep 2025 13:22:52 -0500 Subject: [PATCH 5/9] add newbie docker/podman explanation of volume mounts --- docs/source/tutorial-ssh.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/source/tutorial-ssh.rst b/docs/source/tutorial-ssh.rst index f939eb25..a09ff721 100644 --- a/docs/source/tutorial-ssh.rst +++ b/docs/source/tutorial-ssh.rst @@ -135,6 +135,11 @@ For full provenance tracking with DataLad:: --output . \ bash -c 'podman run --rm -v "$(pwd):/work:rw" nipreps/mriqc:latest /work/sourcedata/raw /work/results participant group --participant-label 02' +.. note:: + The ``-v "$(pwd):/work:rw"`` part mounts your current directory into the + container at ``/work``, allowing the containerized software to access the + top level dataset. + Step 3: Monitor Execution ------------------------- From a1a5e5c8285f4f29ab244341fd31067d9447a052 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Sep 2025 13:26:27 -0500 Subject: [PATCH 6/9] use full length option names in tutorial --- docs/source/tutorial-ssh.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/tutorial-ssh.rst b/docs/source/tutorial-ssh.rst index a09ff721..0d114a8c 100644 --- a/docs/source/tutorial-ssh.rst +++ b/docs/source/tutorial-ssh.rst @@ -63,8 +63,8 @@ Let's start with a simple test to verify our setup works. 
Create a working direc cd hello-world reproman run --resource myserver \ - --sub local \ - --orc plain \ + --submitter local \ + --orchestrator plain \ --output results \ sh -c 'mkdir -p results && echo "Hello from ReproMan on $(hostname)" > results/hello.txt' @@ -129,8 +129,8 @@ Step 2: Execute Analysis with DataLad Integration For full provenance tracking with DataLad:: reproman run --resource myserver \ - --sub local \ - --orc datalad-pair-run \ + --submitter local \ + --orchestrator datalad-pair-run \ --input sourcedata/raw \ --output . \ bash -c 'podman run --rm -v "$(pwd):/work:rw" nipreps/mriqc:latest /work/sourcedata/raw /work/results participant group --participant-label 02' From e2687d9e049e0b6f137fd8e964474fad2dfe5cf3 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Sep 2025 13:29:17 -0500 Subject: [PATCH 7/9] fixup list spacing --- docs/source/tutorial-ssh.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/tutorial-ssh.rst b/docs/source/tutorial-ssh.rst index 0d114a8c..9c15539c 100644 --- a/docs/source/tutorial-ssh.rst +++ b/docs/source/tutorial-ssh.rst @@ -14,11 +14,13 @@ Overview We'll cover two workflows: **Part 1: Hello World Example** + 1. Create a ReproMan SSH resource 2. Execute a simple command remotely 3. Fetch and examine results **Part 2: Dataset Analysis Example** + 1. Set up a DataLad dataset with input data 2. Execute MRIQC quality control analysis remotely 3. 
Collect and examine results with full provenance From 286744e2cf98267a052a29e579489d9f5579be5f Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Sep 2025 13:33:45 -0500 Subject: [PATCH 8/9] clarify requirements on local vs remote --- docs/source/tutorial-ssh.rst | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/docs/source/tutorial-ssh.rst b/docs/source/tutorial-ssh.rst index 9c15539c..0fd7bb69 100644 --- a/docs/source/tutorial-ssh.rst +++ b/docs/source/tutorial-ssh.rst @@ -15,22 +15,28 @@ We'll cover two workflows: **Part 1: Hello World Example** -1. Create a ReproMan SSH resource +1. Create a ReproMan SSH resource 2. Execute a simple command remotely 3. Fetch and examine results **Part 2: Dataset Analysis Example** 1. Set up a DataLad dataset with input data -2. Execute MRIQC quality control analysis remotely +2. Execute MRIQC quality control analysis remotely 3. Collect and examine results with full provenance Prerequisites ============= -- ReproMan installed (``pip install reproman``) +For Part 1: + +- ReproMan installed on local machine (``pip install reproman``) - Access to a remote server via SSH -- For Part 2: DataLad support (``pip install 'reproman[full]'``) + +For Part 2: + +- DataLad support (``pip install 'reproman[full]'``) +- DataLad installed on remote server Part 1: Hello World Example ============================ @@ -63,7 +69,7 @@ Let's start with a simple test to verify our setup works. Create a working direc mkdir -p hello-world cd hello-world - + reproman run --resource myserver \ --submitter local \ --orchestrator plain \ @@ -85,7 +91,7 @@ The job will execute on the remote. 
To check status and fetch results:: When you run ``reproman jobs JOB_ID``, ReproMan will automatically: - Fetch the output files from the remote to your local working directory -- Display job information and logs +- Display job information and logs - Unregister the completed job You should now see the results locally:: @@ -96,7 +102,7 @@ You should now see the results locally:: ReproMan creates a working directory on the remote resource automatically. By default, it uses ``~/.reproman/run-root`` on the remote. You can verify the file exists there with ``reproman login myserver``. -Part 2: Dataset Analysis Example +Part 2: Dataset Analysis Example ================================= Now let's try a more realistic example with DataLad dataset management and neuroimaging analysis. @@ -112,12 +118,12 @@ Create a new DataLad dataset for our analysis:: Install input data (using a demo BIDS dataset):: - # Install demo neuroimaging dataset + # Install demo neuroimaging dataset datalad install -d . -s https://github.com/ReproNim/ds000003-demo sourcedata/raw .. note:: - This only installs the dataset structure - the actual data files are not - downloaded locally. DataLad will automatically fetch any data specified + This only installs the dataset structure - the actual data files are not + downloaded locally. DataLad will automatically fetch any data specified by `--input` when the analysis runs. @@ -138,7 +144,7 @@ For full provenance tracking with DataLad:: bash -c 'podman run --rm -v "$(pwd):/work:rw" nipreps/mriqc:latest /work/sourcedata/raw /work/results participant group --participant-label 02' .. note:: - The ``-v "$(pwd):/work:rw"`` part mounts your current directory into the + The ``-v "$(pwd):/work:rw"`` part mounts your current directory into the container at ``/work``, allowing the containerized software to access the top level dataset. 
From f4c81aefc5f165b43778ad2d9315540bd28b54e0 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Fri, 12 Sep 2025 07:44:21 -0500 Subject: [PATCH 9/9] fixup: spacing --- docs/source/execution.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/execution.rst b/docs/source/execution.rst index 5905eec3..fec66d62 100644 --- a/docs/source/execution.rst +++ b/docs/source/execution.rst @@ -54,7 +54,7 @@ Choosing an orchestrator ------------------------ Orchestrators are responsible for preparing the remote and collecting the results. - The complete set of orchestrators, accompanied by descriptions, can be seen by +The complete set of orchestrators, accompanied by descriptions, can be seen by calling ``reproman run --list=orchestrators``. .. note::