Bounded Logit Perturbation Channels for LLM-Guided Reinforcement Learning
We introduce ALA (Asynchronous LLM Advisor), a novel architecture that enables large language models to provide real-time strategic guidance to reinforcement learning agents via bounded logit perturbation channels. Unlike approaches that replace RL policies with LLM decisions or use LLMs only for pre-training, ALA creates a continuous, asynchronous advisory channel that nudges agent behavior while preserving learned policies.
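The asynchronous decoupling can be sketched with Python's asyncio. The sketch assumes the advisor writes into a single shared "latest advice" slot. The actor reads that slot on every tick and never awaits the LLM, so a slow query only changes which advice later steps see. AdviceSlot, llm_advisor, actor_loop, and the advice payload are hypothetical names. The sleeps are scaled-down stand-ins for the real LLM and actor latencies.

```python
import asyncio
from typing import List, Optional


class AdviceSlot:
    """Holds only the most recent advice; the actor reads it without waiting."""

    def __init__(self) -> None:
        self.latest: Optional[dict] = None


async def llm_advisor(slot: AdviceSlot, latency_s: float) -> None:
    # Stand-in for a slow LLM query; the real call takes seconds.
    await asyncio.sleep(latency_s)
    slot.latest = {"action": 2, "bias": 1.5}  # hypothetical advice payload


async def actor_loop(slot: AdviceSlot, steps: int, dt_s: float) -> List[Optional[dict]]:
    seen: List[Optional[dict]] = []
    for _ in range(steps):
        # Use whatever advice exists right now; None means act on the raw policy.
        seen.append(slot.latest)
        await asyncio.sleep(dt_s)  # stand-in for one fast actor step
    return seen


async def main() -> List[Optional[dict]]:
    slot = AdviceSlot()
    _, seen = await asyncio.gather(
        llm_advisor(slot, latency_s=0.1),
        actor_loop(slot, steps=10, dt_s=0.05),
    )
    return seen


seen = asyncio.run(main())
print(sum(s is None for s in seen), "steps ran before the first advice arrived")
```

The first steps run on the raw policy because no advice exists yet. Once the query completes, later steps pick up the new advice, and the loop never blocks on the LLM.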
Key innovations include:
- Time-bounded bias expiration — Stale advice automatically expires
- Multi-advisor voting — Parallel LLM queries with priority-weighted selection
- Importance sampling correction — Maintains unbiased PPO gradients despite biased action sampling
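The first two mechanisms can be read as follows, sketched here under assumptions. Each piece of advice carries a creation time and a time-to-live, and it is ignored once stale. Parallel advisors then vote for their most strongly biased action, using their priority as the vote weight. The Advice fields, the TTL values, and this particular vote rule are hypothetical; the project's actual selection logic may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Advice:
    advisor_id: str
    action_bias: Dict[int, float]  # action index -> proposed logit bias
    priority: float                # hypothetical per-advisor vote weight
    created_at: float              # seconds, on the actor's clock
    ttl_s: float                   # hypothetical time-to-live

    def expired(self, now: float) -> bool:
        return now - self.created_at > self.ttl_s

    def top_action(self) -> int:
        return max(self.action_bias, key=self.action_bias.__getitem__)


def select_advice(proposals: List[Advice], now: float) -> Optional[Advice]:
    """Drop stale advice, then let live advisors vote with their priority as weight."""
    live = [p for p in proposals if not p.expired(now)]
    if not live:
        return None  # no fresh advice: the actor falls back to its unbiased policy
    votes: Dict[int, float] = {}
    for p in live:
        votes[p.top_action()] = votes.get(p.top_action(), 0.0) + p.priority
    winner = max(votes, key=votes.__getitem__)
    # Among advisors backing the winning action, apply the highest-priority proposal.
    return max((p for p in live if p.top_action() == winner), key=lambda p: p.priority)


proposals = [
    Advice("a", {0: 1.0, 1: 0.2}, priority=1.0, created_at=99.0, ttl_s=5.0),
    Advice("b", {1: 1.0}, priority=0.8, created_at=99.5, ttl_s=5.0),
    Advice("c", {1: 0.5, 2: 0.1}, priority=0.7, created_at=99.5, ttl_s=5.0),
    Advice("d", {2: 2.0}, priority=5.0, created_at=90.0, ttl_s=5.0),  # stale at now=100
]
chosen = select_advice(proposals, now=100.0)
print(chosen.advisor_id)
```

Advisor d would carry the vote on priority alone, but its advice is ten seconds old and is discarded. Advisors b and c both back action 1 with a combined weight of 1.5, which outweighs a's vote for action 0. Of those two, b has the higher priority, so b's proposal is applied.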
Cahlen Humphreys
Enfuse Labs
ch@enfuse.io
- PDF: ala-paper.pdf
- LaTeX Source: ala-paper.tex
- Bibliography: references.bib
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    DGX Spark    │     │    RTX 5090     │     │   Jetson Orin   │
│      (LLM)      │────▶│    (Router)     │────▶│     (Actor)     │
│                 │     │                 │     │                 │
│   GPT-OSS-20B   │     │ Bounds & routes │     │ logits += bias  │
│    ~30 tok/s    │     │     biases      │     │ action=softmax  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
      3-5 sec                  <50ms                 ~10-15ms
       async                    sync                    sync
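The router-to-actor path in the diagram can be sketched in plain Python. The sketch assumes the router clamps each advised bias to a fixed range before the actor adds it to the policy logits and samples. MAX_BIAS, bound_bias, and act are hypothetical names, and clamping is one plausible reading of "bounds" here.

```python
import math
import random
from typing import Dict, List

MAX_BIAS = 2.0  # hypothetical bound enforced by the router


def bound_bias(raw: Dict[int, float], n_actions: int, max_bias: float = MAX_BIAS) -> List[float]:
    """Router: clamp each advised bias to [-max_bias, max_bias] and expand to a dense vector."""
    bias = [0.0] * n_actions
    for a, b in raw.items():
        if 0 <= a < n_actions:  # ignore advice for actions that do not exist
            bias[a] = max(-max_bias, min(max_bias, b))
    return bias


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def act(logits: List[float], bias: List[float], rng: random.Random) -> int:
    """Actor: logits += bias, then sample an action from the softmax."""
    probs = softmax([x + b for x, b in zip(logits, bias)])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


bias = bound_bias({1: 10.0, 2: -0.5, 7: 3.0}, n_actions=3)
print(bias)  # -> [0.0, 2.0, -0.5]
action = act([0.0, 0.0, 0.0], bias, random.Random(0))
```

Clamping gives a hard guarantee. If every |bias| ≤ B, no action's probability can change by more than a factor of e^(2B) relative to the unbiased policy. The advice can therefore nudge the learned policy but cannot override it outright.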
The importance sampling correction for PPO with ALA biases:
ratio = exp(log π_new(a|s) - log π_old(a|s) - β × bias[a])
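Subtracting the bias term measures each sample against the biased distribution that actually generated it, rather than against π_old alone. As a result, actions the advisor boosted are down-weighted and actions it suppressed are up-weighted. This keeps PPO's policy gradients unbiased despite biased action sampling.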
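A minimal sketch of the ratio follows. It assumes the actor sampled from μ(a|s) = softmax(logits_old + β × bias) and that log π_old is the log-probability under the unbiased logits. Under that assumption, computing the exact ratio π_new(a|s) / μ(a|s) with a log-softmax over the biased logits gives a slightly different value from the shorthand expression. The two differ by a per-state factor Z(s) = Σ_b π_old(b|s) exp(β × bias[b]), which equals 1 when no bias is applied. Whether the implementation accounts for this normalizer elsewhere is not stated here. The function names are hypothetical.

```python
import math
from typing import List


def log_softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))  # stable log-sum-exp
    return [x - lse for x in xs]


def ratio_shorthand(logp_new: float, logp_old: float, bias_a: float, beta: float) -> float:
    """exp(log pi_new(a|s) - log pi_old(a|s) - beta * bias[a]), as written above."""
    return math.exp(logp_new - logp_old - beta * bias_a)


def ratio_exact(new_logits: List[float], old_logits: List[float],
                bias: List[float], beta: float, a: int) -> float:
    """pi_new(a|s) / mu(a|s), assuming mu = softmax(old_logits + beta * bias) sampled a."""
    logp_new = log_softmax(new_logits)[a]
    logp_mu = log_softmax([x + beta * b for x, b in zip(old_logits, bias)])[a]
    return math.exp(logp_new - logp_mu)


old, new = [0.5, -0.2, 0.1], [0.6, -0.3, 0.1]
bias, beta, a = [1.0, 0.0, -1.0], 0.5, 0
short = ratio_shorthand(log_softmax(new)[a], log_softmax(old)[a], bias[a], beta)
exact = ratio_exact(new, old, bias, beta, a)
# The two differ by the per-state normalizer Z(s) = sum_b pi_old(b|s) * exp(beta * bias[b]).
z = sum(math.exp(lp + beta * b) for lp, b in zip(log_softmax(old), bias))
print(abs(exact - short * z) < 1e-9)
```

In a PPO update, either ratio r would take the place of exp(log π_new − log π_old) inside the clipped surrogate min(r·Â, clip(r, 1−ε, 1+ε)·Â). When all biases are zero, the two ratios coincide.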
| Component | Role | Specs |
|---|---|---|
| NVIDIA DGX Spark | LLM Server | GPT-OSS-20B, 128GB unified memory |
| NVIDIA RTX 5090 | Learner + ALA Router | 32GB VRAM |
| NVIDIA Jetson AGX Orin | Actor (20 bots) | 64GB unified memory |
@software{humphreys2026ala,
author = {Humphreys, Cahlen},
title = {ALA: Asynchronous LLM Advisor for Real-Time Guidance in Reinforcement Learning},
year = 2026,
publisher = {Zenodo},
doi = {10.5281/zenodo.18172889},
url = {https://doi.org/10.5281/zenodo.18172889}
}
This work is licensed under a Creative Commons Attribution 4.0 International License.
- Live System: mc.enfuse.ai
- Twitch: twitch.tv/enfuseio
- YouTube: youtube.com/@Enfuseio