Formerly Big Grab
CaDU turns Discord chat exports into anonymized user-assistant pairs for LLM fine-tuning.
python ./discord-clone.py <USERID> [--input-files <PATH:cwd>] [--timeout <MINS:10>] [--include-embeds <:false>]
When no --input-files argument is given, the script reads from the current working directory.
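As an illustration of the defaults in the usage line, here is a minimal argparse sketch. It is not the script's actual parser: the function name `build_parser`, the integer type for `--timeout`, and treating `--include-embeds` as a plain on/off switch are all assumptions made for the example.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented usage line."""
    parser = argparse.ArgumentParser(prog="discord-clone.py")
    # Required positional argument: the Discord user ID to extract pairs for.
    parser.add_argument("user_id", metavar="USERID")
    # Documented default: the current working directory.
    parser.add_argument("--input-files", metavar="PATH", default=os.getcwd())
    # Documented default: 10 minutes (integer type is an assumption).
    parser.add_argument("--timeout", metavar="MINS", type=int, default=10)
    # Documented default: false (modelled here as a switch, which is an assumption).
    parser.add_argument("--include-embeds", action="store_true")
    return parser


# Parse a sample command line that passes only the required user ID.
args = build_parser().parse_args(["123456789"])
print(args.user_id, args.input_files, args.timeout, args.include_embeds)
```

With only the user ID supplied, every optional value falls back to the default listed in the usage line.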
The resulting output will be saved to ./paired/<USERID>.json.
This program does not perform any in-message PII filtering whatsoever.
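Since no filtering is built in, one option is to scrub message text yourself before it reaches the training set. The sketch below is a hypothetical helper, not part of CaDU: the `scrub` function, the placeholder tokens, and both regular expressions are assumptions, and they only catch simple e-mail addresses and phone-like digit runs.

```python
import re

# Simple e-mail pattern; it will miss unusual but valid addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Deliberately broad phone pattern: any digit run of 9+ characters, allowing
# spaces, dots, dashes, and parentheses. It also matches other long numbers.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


print(scrub("mail jane.doe@example.com or call +1 (555) 123-4567 now"))
```

Pattern matching of this kind cannot catch names, addresses, or context-dependent details, so it reduces the risk described here without removing it.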
Because of overfitting, a trained model may reproduce its training data verbatim when given the corresponding prompt.
Feeding DMs or private/secure channels into CaDU is therefore risky and is not recommended.