llm
- use gemma 3 27b qat to replace qwen on the AI server
- get multimodal input working on gemma
- fix the bot being unable to switch back to user context
- add actual memory to the bot (maybe full-text search over a vector database via tool calling)
- allow the bot to reference past image uploads
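The memory item could start as a minimal retrieval tool the LLM calls. This is a toy sketch under stated assumptions: the bag-of-words "embedding", the `MemoryStore` class, and the `search` interface are all placeholders for a real embedding model plus vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real bot would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Stores past chat turns; would be exposed to the LLM as a search tool."""
    def __init__(self):
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MemoryStore()
store.add("user uploaded a photo of their cat named miso")
store.add("user prefers dark UI themes")
print(store.search("what was the cat called?", k=1))
```

The tool-calling part is just wiring: register `search` as a function the model may invoke, and inject the returned strings into the prompt.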
txt2img / img2img
- add chroma support
- add hidream i1 and hidream e1 support
- migrate flux execution to comfy backend
- add redux support
- add controlnet support
- remaining speed-up techniques
- use flash attention or sage attention (+15% it/s)
- use torch.compile or tensorrt (storage hog, +100% it/s)
- fp16 accumulation (requires beta pytorch, +20% it/s)
- use negative adaptive guidance to skip the uncond pass (negative prompts become less effective, +90% it/s)
- fix an undisclosed exploit
- allow adetailer to use groundingdino
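The adaptive-guidance item above boils down to: run classifier-free guidance normally, but drop the unconditional forward pass once the cond/uncond predictions stop diverging. A toy numpy sketch of that idea, where `denoise_step`, the threshold `tau`, and the update rule are all assumptions standing in for a real sampler:

```python
import numpy as np

def denoise_step(x, cond):
    # stand-in for a real UNet/DiT forward pass (assumption: fixed fake noise)
    rng = np.random.default_rng(0 if cond else 1)
    return x * 0.9 + rng.normal(0, 0.01, x.shape)

def cfg_adaptive(x, steps=10, scale=5.0, tau=0.05):
    """Classifier-free guidance that drops the uncond pass once the
    cond/uncond predictions converge (the adaptive-guidance idea)."""
    skipping, skipped = False, 0
    for _ in range(steps):
        eps_c = denoise_step(x, cond=True)
        if skipping:
            eps = eps_c              # uncond pass skipped: ~half the model calls
            skipped += 1
        else:
            eps_u = denoise_step(x, cond=False)
            eps = eps_u + scale * (eps_c - eps_u)
            if np.abs(eps_c - eps_u).mean() < tau:
                skipping = True      # converged; negative prompt stops mattering
        x = x - 0.1 * eps
    return x, skipped

x, skipped = cfg_adaptive(np.zeros((8, 8)))
print(f"uncond passes skipped: {skipped}/10")
```

This also makes the trade-off in the note concrete: once skipping starts, the negative prompt can no longer steer the result.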
img2model
- fix trellis being broken with pytorch 2.7
- add triposg support (higher res mesh)
- add mv-adapter support (dedicated img2texture/img2multi-view)
txt2vid / img2vid / ref2vid
- ref2vid (using VACE, consider it controlnet for txt2vid)
- remaining speed-up techniques
- attempt longer video generation (+128 frames)
- skyreel vid2vid workflow
- framepack (ughhhhhhhhh)
- get self-forcing dmd VACE to work (results are significantly worse, but it's 2-2.5x faster than the current workflow and can be further accelerated with teacache)
- investigate better prompt adherence (especially for img2vid)
- LoRA support
- more models (if my storage can handle it)
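The teacache item mentioned above is, at its core, output reuse: skip the expensive forward pass whenever its input has barely changed since the last real call. Real TeaCache estimates change from timestep-modulated inputs; this toy sketch keeps only the reuse idea, and `model`, `rel_tau`, and the update rule are assumptions:

```python
import numpy as np

def model(x):
    # stand-in for an expensive video DiT forward pass
    return np.tanh(x)

def sample_with_cache(x, steps=20, rel_tau=0.12):
    """Reuse the last model output while the input has barely changed
    since the previous real forward pass (TeaCache-style skipping)."""
    calls, last_in, last_out = 0, None, None
    for _ in range(steps):
        stale = (
            last_in is None
            or np.abs(x - last_in).mean() / (np.abs(last_in).mean() + 1e-8) >= rel_tau
        )
        if stale:
            last_out = model(x)  # expensive call
            last_in = x.copy()
            calls += 1
        # otherwise reuse last_out, the cached prediction
        x = x - 0.1 * last_out
    return x, calls

x, calls = sample_with_cache(np.ones(4))
print(f"model calls: {calls}/20")
```

The threshold trades quality for speed the same way the self-forcing note does: looser `rel_tau` means fewer calls and worse results.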
interrogate
- why does florence take forever to load???
mapperatorinator