[20230504] Weekly VLM3 - Clinical-BERT #6

@dh58319

Paper

https://ojs.aaai.org/index.php/AAAI/article/view/20204

Speaker

@dh58319

Summary

[Image: screenshot, CleanShot 2023-05-08 at 16 19 32]

Key Point

Clinical-BERT is a vision-language pre-training model for the medical domain.
Medical Subject Headings (MeSH) words are important semantic components of radiograph reports.

Methods

  • Pretrained with MIMIC-CXR

  • Clinical Diagnosis (CD)

  • Masked MeSH Modeling (MMM)
    Same procedure as masked language modeling (MLM), but applied only to MeSH words rather than to all language tokens
    Selected tokens: 80% masked, 10% replaced, 10% unchanged

  • Image-MeSH Matching (IMM)
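
The 80% / 10% / 10% rule listed under MMM can be sketched as below. This is a minimal illustration, not the paper's code: the function name `mask_mesh_tokens`, the `select_prob` parameter, the `[MASK]` string, and the toy vocabularies are all assumptions made for the example. The notes only say that MeSH words are masked at random with the 80/10/10 split.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask symbol, as in BERT-style tokenizers


def mask_mesh_tokens(tokens, mesh_vocab, replace_vocab, select_prob=0.15, rng=None):
    """Apply MLM-style corruption to MeSH words only.

    tokens        : list of word strings from a report
    mesh_vocab    : set of lowercase MeSH words (hypothetical toy vocabulary)
    replace_vocab : list of words to draw random replacements from
    select_prob   : chance that a MeSH word is selected (assumed value)
    Returns (corrupted_tokens, labels); labels[i] is the original word at
    selected positions and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok.lower() not in mesh_vocab:
            continue  # non-MeSH words are never corrupted
        if rng.random() >= select_prob:
            continue  # this MeSH word was not selected
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN  # 80%: mask
        elif roll < 0.9:
            corrupted[i] = rng.choice(replace_vocab)  # 10%: replace
        # remaining 10%: leave the word unchanged
    return corrupted, labels


if __name__ == "__main__":
    report = ["mild", "cardiomegaly", "with", "pleural", "effusion"]
    mesh = {"cardiomegaly", "effusion"}
    print(mask_mesh_tokens(report, mesh, ["atelectasis"], select_prob=1.0))
```

The prediction loss would then be computed only at positions where `labels` is not `None`, so the training signal comes from MeSH words alone.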

[Image: screenshot, CleanShot 2023-05-09 at 23 18 08@2x]

Image-MeSH Matching (IMM)

The IMM task aligns images and MeSH words in a latent space by means of a cross-modal matching score.

The paper proposes two-level sparse attention:

  • RSA (region sparse attention): The RSA generates aligned region features for each word. This process mimics the focus of radiologists' interest when writing reports according to different observations.
  • WSA (word sparse attention): The WSA forces the model to focus on semantic components in the report to increase the contribution of MeSH words to the matching score.
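
To make the two levels concrete, here is a rough sketch of how a matching score could combine them. It is not the paper's formulation: keeping the `top_k` most similar regions per word stands in for RSA, and giving non-MeSH words a smaller weight stands in for WSA. Both choices, along with every function and parameter name, are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def region_sparse_attention(word, regions, top_k):
    """Build an aligned region feature for one word from its top_k regions."""
    sims = [cosine(word, r) for r in regions]
    keep = sorted(range(len(regions)), key=lambda i: sims[i], reverse=True)[:top_k]
    weights = softmax([sims[i] for i in keep])
    dim = len(regions[0])
    return [sum(w * regions[i][d] for w, i in zip(weights, keep)) for d in range(dim)]


def matching_score(words, is_mesh, regions, top_k=2, mesh_weight=1.0, other_weight=0.0):
    """Weighted mean of word-to-aligned-region similarities.

    MeSH words get mesh_weight and all other words other_weight, so MeSH
    words dominate the score (an assumed stand-in for word sparse attention).
    """
    weights = [mesh_weight if m else other_weight for m in is_mesh]
    total = sum(weights)
    if total == 0:
        return 0.0
    score = 0.0
    for word, w in zip(words, weights):
        if w == 0:
            continue
        aligned = region_sparse_attention(word, regions, top_k)
        score += w * cosine(word, aligned)
    return score / total


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]    # toy word embeddings
    regions = [[1.0, 0.0], [0.0, 1.0]]  # toy region features
    print(matching_score(words, [True, False], regions, top_k=1))
```

In this toy setup, an image whose regions resemble the MeSH word embeddings scores higher than one whose regions do not, which is the behavior a matching objective would reward.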

Overall Summary

Existing methods (masked language modeling, image-report matching) treat MeSH words and other words equally. However, MeSH words need to receive more attention during the pre-training tasks for the model to perform well on downstream tasks.

The paper proposes three pre-training tasks: Clinical Diagnosis (CD), Masked MeSH Modeling (MMM), and Image-MeSH Matching (IMM).

The CD task is formulated as a multi-label classification problem.
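
As an illustration of what a multi-label formulation means in practice, the sketch below scores each diagnosis label independently with a sigmoid and averages the binary cross-entropy over labels. The choice of loss is an assumption (a common default for multi-label problems, not something these notes state), and the function name and the three-label example are hypothetical.

```python
import math


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent labels.

    logits  : one raw score per diagnosis label
    targets : 0/1 per label; several labels may be 1 at the same time
    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


if __name__ == "__main__":
    # Hypothetical three-label example: two findings present at once.
    targets = [1, 0, 1]
    good_logits = [4.0, -4.0, 4.0]   # agrees with the targets
    bad_logits = [-4.0, 4.0, -4.0]   # disagrees with the targets
    print(multilabel_bce(good_logits, targets))
    print(multilabel_bce(bad_logits, targets))
```

Unlike a softmax over classes, this setup lets one image carry several positive labels, which matches the multi-label framing of the CD task.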

MMM randomly masks MeSH words, which helps the model focus more on MeSH.

IMM applies two-level sparse attention, which helps the model learn the alignment for MeSH words better.
