I conducted several experiments using the checkpoints released with the paper and made the following observations:
I. Poor Performance
- The overall performance of the model is very weak, as shown below:
II. Weakness of the Checkpoint Model
- My experiments suggest that visual information has little to no impact on the model's predictions.
- The model appears to rely primarily on the question text rather than on the image.
- I applied adversarial attacks using the downstream loss, but they had little effect: the outputs did not change.
- Furthermore, when I fed the model random-noise images and asked the same questions, it produced the same answers as it did for the real medical images.
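The two probes described above (swapping real images for noise, and attacking the image with a loss-based perturbation) can be illustrated with a small self-contained sketch. This is a minimal NumPy illustration under stated assumptions, not the code used for these experiments. `ToyVQA` is a hypothetical linear stand-in for the checkpoint, `alpha` is an assumed knob that scales how much the image contributes to the logits, and the attack is untargeted L-infinity PGD on the cross-entropy of the model's own clean answer, used here only as a stand-in for the downstream loss. With a real checkpoint, the image gradient would come from the framework's autodiff instead of the closed-form `loss_grad_wrt_image`.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


class ToyVQA:
    """Hypothetical linear VQA head, NOT the paper's model:
    logits = W_img @ image + W_txt @ question, with W_img scaled by alpha.
    alpha=0 gives a model that ignores the image entirely."""

    def __init__(self, n_answers, img_dim, txt_dim, alpha, seed=0):
        rng = np.random.default_rng(seed)
        self.W_img = alpha * rng.normal(size=(n_answers, img_dim))
        self.W_txt = rng.normal(size=(n_answers, txt_dim))

    def logits(self, image, question):
        return self.W_img @ image + self.W_txt @ question

    def answer(self, image, question):
        return int(np.argmax(self.logits(image, question)))

    def loss_grad_wrt_image(self, image, question, target):
        # dCE/d(image) = W_img^T (softmax(logits) - onehot(target))
        p = softmax(self.logits(image, question))
        g = p.copy()
        # p[target] - 1 computed as -(sum of the other probabilities) to avoid cancellation
        g[target] = -np.delete(p, target).sum()
        return self.W_img.T @ g


def noise_agreement(model, images, questions, rng):
    """Fraction of answers left unchanged when each real image is replaced
    by uniform noise over the same value range."""
    same = 0
    for img, q in zip(images, questions):
        noise = rng.uniform(img.min(), img.max(), size=img.shape)
        same += model.answer(img, q) == model.answer(noise, q)
    return same / len(images)


def pgd_flip_rate(model, images, questions, eps=0.5, step=0.1, iters=20):
    """Untargeted L-infinity PGD that ascends the cross-entropy of the model's
    clean answer; returns the fraction of answers that change."""
    flipped = 0
    for img, q in zip(images, questions):
        y = model.answer(img, q)
        adv = img.copy()
        for _ in range(iters):
            adv = adv + step * np.sign(model.loss_grad_wrt_image(adv, q, y))
            adv = np.clip(adv, img - eps, img + eps)  # project back into the eps-ball
        flipped += model.answer(adv, q) != y
    return flipped / len(images)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    images = [rng.normal(size=64) for _ in range(50)]
    questions = [rng.normal(size=16) for _ in range(50)]
    for alpha in (0.0, 10.0):
        model = ToyVQA(n_answers=5, img_dim=64, txt_dim=16, alpha=alpha)
        print(alpha,
              noise_agreement(model, images, questions, np.random.default_rng(2)),
              pgd_flip_rate(model, images, questions))
```

With `alpha=0` the noise swap leaves every answer unchanged and the image gradient is exactly zero, so PGD cannot move the output at all, which matches the behavior reported for the released checkpoint. With a large `alpha`, both probes change most answers. On a real model, a near-zero image-gradient norm would be consistent with the image being ignored, though it would not prove it on its own.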
