Great work! While reproducing the results, I could not get FLUX to reach the performance reported in the paper on DrawBench200 (N=6, O=1, ImageReward=0.9953, CLIP=19.637). Since the complete evaluation code isn’t provided, I’m not sure whether the issue lies in my own implementation.
I used the following command for generation:
CUDA_VISIBLE_DEVICES=0 python src/sample.py \
    --prompt_file DrawBench200.txt \
    --width 1024 \
    --height 1024 \
    --model_name flux-dev \
    --add_sampling_metadata \
    --output_dir ./results \
    --num_steps 50
I evaluated the generated images using ImageReward (the ImageReward-v1.0 checkpoint) and CLIPScore (from torchmetrics.multimodal.clip_score, with the openai/clip-vit-large-patch14 model).
The final results I obtained were ImageReward = 0.9410 and CLIP = 17.1656, a significant gap from the reported results (0.0543 lower on ImageReward and 2.4714 lower on CLIP).
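For reference, here is a minimal sketch of the kind of evaluation loop described above. It is not the paper's evaluation code, and it rests on several assumptions: one generated image per prompt, images supplied in the same order as the lines of the prompt file, and per-image scores averaged with equal weight. The helper names (`load_prompts`, `evaluate`, `build_scorers`) are hypothetical. The scorers are passed in as callables so the averaging logic can be checked without downloading any model weights.

```python
from pathlib import Path
from statistics import mean
from typing import Callable, Sequence


def load_prompts(prompt_file: str) -> list[str]:
    """Read one prompt per line, skipping blank lines (assumed file layout)."""
    lines = Path(prompt_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def evaluate(
    prompts: Sequence[str],
    image_paths: Sequence[str],
    reward_fn: Callable[[str, str], float],
    clip_fn: Callable[[str, str], float],
) -> dict[str, float]:
    """Average per-image scores; assumes prompts[i] belongs to image_paths[i]."""
    if len(prompts) != len(image_paths):
        raise ValueError("need exactly one image per prompt")
    pairs = list(zip(prompts, image_paths))
    rewards = [float(reward_fn(p, img)) for p, img in pairs]
    clips = [float(clip_fn(p, img)) for p, img in pairs]
    return {"image_reward": mean(rewards), "clip_score": mean(clips)}


def build_scorers():
    """Build the two real scorers. Heavy imports live here because both
    models download their weights on first use."""
    import ImageReward as RM
    import numpy as np
    import torch
    from PIL import Image
    from torchmetrics.multimodal.clip_score import CLIPScore

    reward_model = RM.load("ImageReward-v1.0")
    clip_metric = CLIPScore(model_name_or_path="openai/clip-vit-large-patch14")

    def reward_fn(prompt: str, image_path: str) -> float:
        # ImageReward accepts a prompt and a path to a single image.
        return reward_model.score(prompt, image_path)

    def clip_fn(prompt: str, image_path: str) -> float:
        # CLIPScore expects a uint8 image tensor in (C, H, W) layout.
        image = np.array(Image.open(image_path).convert("RGB"))
        tensor = torch.from_numpy(image).permute(2, 0, 1)
        with torch.no_grad():
            return clip_metric(tensor, prompt).item()

    return reward_fn, clip_fn
```

To use it, call `build_scorers()` once, then pass the prompts from `load_prompts("DrawBench200.txt")` and the matching list of image paths under `./results` to `evaluate`. Differences in prompt-image pairing, image resolution or format at scoring time, the CLIP backbone, or the sampling seed could each shift these numbers, so it would help to know which choices the paper's evaluation used.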
I'm not sure whether something is wrong with this implementation or whether there are other factors I have overlooked.
Looking forward to your response. Thank you again for your outstanding work!