Add SupTrain: Decentralized AI training coordination module#73

Draft
Copilot wants to merge 5 commits into master from copilot/add-suptrain-toolbar-button

Copilot AI commented Feb 17, 2026

Implements decentralized AI training coordination, using IPFS to distribute models and deltas and on-chain keyword messages for discovery. Users discover jobs via #suptrain hashtags, train locally on their own GPU, and publish LoRA deltas back to the network.

UI Integration

  • New toolbar button (btnSupTrain with 🤖 emoji) opens modeless form
  • 5-tab interface: Discover → Configure → Run → Publish → Monitor
  • Follows SupFlix pattern: Task.Run for heavy ops, Invoke for UI updates, FlowLayoutPanel for lazy-loaded lists

Service Layer

SupTrainService.cs - Core coordination logic:

  • SearchJobsByKeywordAsync() - Parse blockchain messages for job announcements
  • PublishUpdateAsync() - Post worker updates with keyword protocol
  • IPFS helpers via ProcessStartInfo (matches existing patterns)

SupTrainModels.cs - Protocol definitions:

  • Job manifests, worker updates, aggregate checkpoints
  • Keyword constants: #suptrain #update #job:<id> #round:<n> #delta:<cid>

Training Workers

Python CLI scripts launched as external processes:

worker.py - Produces LoRA delta + metrics from local data

python worker.py --job-id job123 --base-checkpoint Qm... \
  --data ./mydata --epochs 1 --lr 0.0001 --output-dir ./out
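
For reviewers who want to see how the flags above could be parsed, here is a minimal sketch using argparse. It is not the worker.py in this PR: the flag names come from the command above, but which flags are required, the defaults, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags shown in the worker.py command."""
    parser = argparse.ArgumentParser(prog="worker.py")
    parser.add_argument("--job-id", required=True, help="SupTrain job identifier")
    parser.add_argument("--base-checkpoint", required=True,
                        help="IPFS CID of the base checkpoint")
    parser.add_argument("--data", required=True, help="path to local training data")
    # Defaults below are assumptions, chosen to match the example invocation.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--output-dir", required=True,
                        help="directory for the delta and metrics files")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
example_args = build_parser().parse_args([
    "--job-id", "job123",
    "--base-checkpoint", "QmExample",  # placeholder, not a real CID
    "--data", "./mydata",
    "--output-dir", "./out",
])
```

argparse converts dashes to underscores, so the values are read as example_args.job_id, example_args.base_checkpoint, and example_args.output_dir.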

aggregate.py - Merges worker deltas with weighted averaging

python aggregate.py --job-id job123 --round 1 \
  --update-dirs ./w1,./w2,./w3 --output-dir ./merged
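
The weighted averaging step can be sketched as follows. This is an illustration under stated assumptions, not the code in aggregate.py: each delta is modelled as a dict of parameter name to a flat list of floats (standing in for tensors), and the weights are hypothetical per-worker values such as sample counts.

```python
from typing import Dict, List, Sequence

Delta = Dict[str, List[float]]


def weighted_average(deltas: Sequence[Delta], weights: Sequence[float]) -> Delta:
    """Merge worker deltas by averaging each parameter with normalized weights."""
    if not deltas or len(deltas) != len(weights):
        raise ValueError("need one weight per delta and at least one delta")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    merged: Delta = {}
    for name, reference in deltas[0].items():
        accumulator = [0.0] * len(reference)
        for delta, weight in zip(deltas, weights):
            values = delta[name]  # KeyError if a worker is missing a parameter
            if len(values) != len(reference):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for i, value in enumerate(values):
                accumulator[i] += (weight / total) * value
        merged[name] = accumulator
    return merged


# The comma-separated --update-dirs value splits into one directory per worker.
update_dirs = "./w1,./w2,./w3".split(",")

# Toy deltas standing in for what would be loaded from those directories.
deltas = [{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 4.0]}, {"lora_A": [5.0, 6.0]}]
merged = weighted_average(deltas, weights=[1, 1, 2])
```

Normalizing by the weight total keeps the merged delta on the same scale as a single worker's delta, regardless of how many workers contribute.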

Protocol Design

All messages use structured hashtags for discoverability:

Job Genesis:    #suptrain #jobgenesis #model:<slug> #job:<id> #checkpoint:<cid> #manifest:<cid>
Worker Update:  #suptrain #update #job:<id> #round:<n> #base:<baseCid> #delta:<cid> #metrics:<cid>
Aggregate:      #suptrain #aggregate #job:<id> #round:<n> #checkpoint:<cid> #inputs:<cid>

Large artifacts (models, deltas) stored on IPFS; only CIDs posted on-chain.
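
To illustrate how a client might read these messages, here is a minimal parsing sketch. The function name, the regular expression, and the returned dict layout are assumptions made for illustration; the C# SearchJobsByKeywordAsync() in this PR may parse messages differently.

```python
import re
from typing import Dict, Optional

# Bare tags that identify the message type in the three formats listed above.
MESSAGE_TYPES = ("jobgenesis", "update", "aggregate")

# Matches "#name" or "#name:value", where the value runs to the next whitespace.
TAG_RE = re.compile(r"#([A-Za-z]+)(?::(\S+))?")


def parse_suptrain_message(text: str) -> Optional[Dict[str, str]]:
    """Return the message type and its key:value tags, or None if not SupTrain."""
    tags = TAG_RE.findall(text)
    if "suptrain" not in [name for name, _ in tags]:
        return None

    parsed: Dict[str, str] = {}
    for name, value in tags:
        if value:
            parsed[name] = value      # e.g. "#job:abc123" -> {"job": "abc123"}
        elif name in MESSAGE_TYPES:
            parsed["type"] = name     # e.g. "#update" -> {"type": "update"}
    return parsed if "type" in parsed else None


# Placeholder job id and CIDs, not real identifiers.
example = parse_suptrain_message(
    "#suptrain #update #job:abc123 #round:2 "
    "#base:QmBase #delta:QmDelta #metrics:QmMetrics"
)
```

All values come back as strings, so a caller that needs the round as a number converts it explicitly with int(example["round"]).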

Integration Points

TODOs marked for connection to existing Sup!? infrastructure:

  • SearchJobsByKeywordAsync() needs hookup to Root.GetPublicAddressByKeyword()
  • PublishUpdateAsync() needs RPC transaction posting
  • Python workers currently simulate training; PyTorch/PEFT integration is still to be added

Stats

  • 14 files: 3,175 lines (1,891 C#, 481 Python, 803 docs)
  • No security vulnerabilities reported by the CodeQL scan
  • Minimal surface area: only SupMain modified for button integration
Original prompt

Goal

Add a new integrated module called SupTrain that appears as a new toolbar button in SupMain (like the existing SupFlix button). Clicking it opens a new form/window that lets a Sup!? user perform decentralized AI training using IPFS and keyword coordination.

Key User Flow (Must be Easy!)

  1. Discover a training job via Sup!? #keywords
  2. Download the base model checkpoint + training manifest from IPFS (CID)
  3. Select local training data (files/folders) and/or opt into shards referenced by CID
  4. Run local GPU training (on laptop GPU) producing a small delta update (LoRA or diff)
  5. Upload the delta + metrics to IPFS
  6. Announce/etch an update message to the Sup!? network using #keywords so others can find it
  7. Monitor the round: see latest checkpoint, round status, updates, aggregator announcements
  8. Support resume/continue training by picking up any posted checkpoint CID + manifest CID

Simple workflow: "Choose Job" → "Choose Data" → "Start Training" → "Publish Update"
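
Step 5 (uploading the delta and metrics to IPFS) could look roughly like the sketch below on the Python side. It assumes the `ipfs` command-line tool is installed and on PATH; the helper names and the injectable `runner` parameter are hypothetical and exist only so the function can be exercised without a running IPFS node.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_cli(command: List[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def ipfs_add(path: str, runner: Runner = run_cli) -> str:
    """Add a file or directory to IPFS and return the CID printed by the CLI."""
    # -r recurses into directories; -Q prints only the final (root) hash.
    output = runner(["ipfs", "add", "-r", "-Q", path])
    cid = output.strip()
    if not cid:
        raise RuntimeError(f"ipfs add returned no CID for {path!r}")
    return cid


# Demonstration with a stand-in runner so nothing is actually uploaded.
recorded_commands: List[List[str]] = []


def fake_runner(command: List[str]) -> str:
    recorded_commands.append(command)
    return "QmFakeCid\n"  # placeholder, not a real CID


delta_cid = ipfs_add("./out", runner=fake_runner)
```

In real use the `runner` argument is omitted, so `ipfs_add("./out")` shells out to the installed CLI.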


Phase 1: UI Integration (Button like SupFlix)

Tasks

  1. Add a new toolbar button in SupMain.Designer.cs

    • Name: btnSupTrain
    • Icon: pick any placeholder image from includes/ or reuse an existing one
    • Tooltip text similar to others (follow existing tooltip pattern in SupMain constructor)
    • Placement: near btnVideoSearch / btnWorkBench
  2. Wire click handler in SupMain.cs

    • Similar to SupFlix launch behavior:
      • Instantiate SupTrain form and show it modeless
      • Pass in:
        • default search key: current profileURN.Text or #keyword if active
        • current chain mode (testnet vs mainnet) matching how SupFlix uses testnet bool and rpc settings
  3. Create new WinForms Form

    • Files:
      • SupTrain.cs
      • SupTrain.Designer.cs
      • SupTrain.resx
    • Visual layout:
      • Left: "Job Browser"
      • Right: "Local Worker Panel"
      • Bottom: "Log / Status"
    • Use a FlowLayoutPanel pattern similar to SupFlix for listing items and lazy loading batches

Phase 2: SupTrain UX Flow (Simple and friendly)

SupTrain sections (tabs or panels)

A) Discover

  • Input box: txtJobKeyword default to #suptrain or last used
  • Buttons:
    • "Search Jobs"
    • "Use Latest Checkpoint"
  • Results list of "Job Cards":
    • Job name, round, base checkpoint CID, manifest CID, created date, owner/creator
    • Click card sets it as Active Job

B) Configure

  • Show Active Job details:
    • ModelName, Round, BaseCheckpointCID, ManifestCID, EvalCID (optional), PolicyCID (optional)
  • Local data selection:
    • "Add Folder"
    • "Add Files"
    • Show selected paths list
  • Training params (simple):
    • epochs/steps
    • lr (learning rate)
    • batch size
    • precision dropdown (fp16/bf16)
    • output type: LoRA (default) vs "diff"
  • "Dry Run / Validate" button

C) Run

  • "Start Training"
  • Progress bar
  • Live console output box
  • "Stop" button

D) Publish

  • After run:
    • show DeltaCID, MetricsCID
    • "Publish Update" button that posts Sup!? message with keywords

E) Monitor

  • Live feed list:
    • latest #checkpoint announcements
    • #update submissions
    • #aggregate announcements
  • Buttons:
    • "Refresh"
    • "Follow Job Keywords"
    • "Pin Latest Checkpoint"

Phase 3: Keyword + IPFS Protocol (Data Model)

Use a consistent keyword scheme

All SupTrain messages include:

  • #suptrain
  • #model:<ModelSlug> (example: #model:suplm)
  • #job:<JobId> (short hash)
  • #round:<N>

Message types:

  1. Job Genesis

    • #suptrain #jobgenesis #model:<slug> #job:<id> #cid:<jobCID> #manifest:<manifestCID> #checkpoint:<baseCkptCID>
  2. Task/Shard

    • #suptrain #task #job:<id> #round:<n> #shard:<k> #data:<dataCID>
  3. Worker Update

    • #suptrain #update #job:<id> #round:<n> #base:<baseCkptCID> #delta:<deltaCID> #metrics:<metricsCID> #from:<addressOrUrn>
  4. Aggregate Checkpoint

    • #suptrain #aggregate #job:<id> #round:<n> #checkpoint:<newCkptCID> #inputs:<listCID> #metrics:<metricsCID>
  5. Policy / Filter

    • #suptrain #policy #job:<id> #allow:<cid> #deny:<cid> #keywords:<cid>
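
The five message formats above can be captured as templates so every client emits identical strings. This is a sketch only: the `build_message` helper and the placeholder names (for example `sender` for the #from tag) are assumptions, not part of the requested API.

```python
# One template per message type, copied from the keyword scheme above.
TEMPLATES = {
    "jobgenesis": ("#suptrain #jobgenesis #model:{model} #job:{job} #cid:{cid} "
                   "#manifest:{manifest} #checkpoint:{checkpoint}"),
    "task": "#suptrain #task #job:{job} #round:{round} #shard:{shard} #data:{data}",
    "update": ("#suptrain #update #job:{job} #round:{round} #base:{base} "
               "#delta:{delta} #metrics:{metrics} #from:{sender}"),
    "aggregate": ("#suptrain #aggregate #job:{job} #round:{round} "
                  "#checkpoint:{checkpoint} #inputs:{inputs} #metrics:{metrics}"),
    "policy": ("#suptrain #policy #job:{job} #allow:{allow} #deny:{deny} "
               "#keywords:{keywords}"),
}


def build_message(kind: str, **fields: object) -> str:
    """Format one SupTrain message; raises KeyError if a required field is missing."""
    template = TEMPLATES[kind]
    values = {key: str(value) for key, value in fields.items()}
    for key, value in values.items():
        # Whitespace inside a value would split the tag when the message is parsed.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"field {key!r} must be non-empty with no whitespace")
    return template.format(**values)


# Placeholder identifiers, not real CIDs or addresses.
update_message = build_message(
    "update", job="abc123", round=2, base="QmBase",
    delta="QmDelta", metrics="QmMetrics", sender="urn-example",
)
```

Keeping the templates in one table makes it easier to check the builder and any parser against the same definitions.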

IPFS payload formats (JSON)

Define plain JSON files and store on IPFS:

  • job.json: model slug, description, license hint, created time, genesis checkpoint CID, manifest CID, eval CID, policy CID, aggregator rules

  • manifest.json: training recipe defaults, tokenizer/config references, allowed data sources, excluded keywords list CID

  • metrics.json: loss curve summary, eval scores, step count, runtime stats, gpu info

  • update.json: base checkpoint CID, delta CID, worker id, params used, hashes of data used

  • aggregate_inputs.json: array of delta CIDs included and weights
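
As an example of one of these payloads, the sketch below builds an update.json document. The JSON key names are assumptions derived from the field list above (the prompt does not fix a schema), and SHA-256 is an assumed choice for the "hashes of data used".

```python
import hashlib
import json
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, used to reference training data without sharing it."""
    return hashlib.sha256(data).hexdigest()


def build_update_payload(
    base_cid: str,
    delta_cid: str,
    worker_id: str,
    params: Dict[str, object],
    data_blobs: List[bytes],
) -> str:
    """Serialize an update.json document as deterministic, sorted JSON."""
    payload = {
        "base_checkpoint_cid": base_cid,
        "delta_cid": delta_cid,
        "worker_id": worker_id,
        "params": params,
        # Sorted so the same data set always produces the same document.
        "data_hashes": sorted(sha256_hex(blob) for blob in data_blobs),
    }
    return json.dumps(payload, indent=2, sort_keys=True)


# Placeholder CIDs and worker id; the byte strings stand in for file contents.
update_json = build_update_payload(
    base_cid="QmBase",
    delta_cid="QmDelta",
    worker_id="worker-1",
    params={"epochs": 1, "lr": 0.0001},
    data_blobs=[b"sample a", b"sample b"],
)
```

Sorting both the keys and the hash list means two workers serializing the same update produce byte-identical JSON, and therefore the same CID when the file is added to IPFS with the same settings.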


Phase 4: Implement Discovery + Publishing using existing Sup!? plumbing

Discovery

Implement:

  • `SupTrainService.SearchJobsByKeyword...

This pull request was created from Copilot chat.



Copilot AI and others added 4 commits February 17, 2026 01:57
Co-authored-by: embiimob <6278950+embiimob@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add new SupTrain toolbar button for decentralized AI training" to "Add SupTrain: Decentralized AI training coordination module" on Feb 17, 2026
Copilot AI requested a review from embiimob February 17, 2026 02:10