Thank you for sharing this excellent work!
I've been experimenting with the approach and noticed that, although the Transformer architecture supports multi-dataset training, performance tends to be worse when the model is trained sequentially on individual datasets (one dataset after another) rather than on all of them together.
Are there any recommended techniques or best practices to mitigate this issue?
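To make the two setups concrete, here is a minimal sketch of the batch orderings being compared. It assumes each dataset can be reduced to a list of batch indices. The dataset names `dataset_a` and `dataset_b` and both helper functions are hypothetical illustrations, not code from this repository. The interleaved schedule is one common way to train jointly and is not necessarily what the project itself does.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical stand-in for a real batch: (dataset name, batch index).
Batch = Tuple[str, int]


def sequential_schedule(datasets: Dict[str, int]) -> List[Batch]:
    """Visit every batch of one dataset before moving on to the next one."""
    order: List[Batch] = []
    for name, num_batches in datasets.items():
        order.extend((name, i) for i in range(num_batches))
    return order


def interleaved_schedule(datasets: Dict[str, int], seed: int = 0) -> List[Batch]:
    """Pool the batches from all datasets and shuffle them, so each stretch
    of training sees a mix of sources (a simple joint-training schedule)."""
    order = sequential_schedule(datasets)
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    sizes = {"dataset_a": 3, "dataset_b": 3}  # hypothetical batch counts
    print(sequential_schedule(sizes))
    print(interleaved_schedule(sizes))
```

Both schedules contain exactly the same batches. Only the order differs, and the order is what separates the sequential setup from the joint one described above.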