Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images

Contrastive visual language pretraining has emerged as a powerful method for either training new language-aware image encoders or augmenting existing pretrained models with zero-shot visual recognition capabilities. However, existing works typically train on large datasets of image-text pairs and have been designed to perform downstream tasks involving only small to medium-sized images, neither of which is applicable to the emerging field of computational pathology, where paired image-text datasets are scarce and each image can span up to 100,000 x 100,000 pixels. In this paper, we present MI-Zero, a simple and intuitive framework for unleashing the zero-shot transfer capabilities of contrastively aligned image and text models on gigapixel histopathology whole slide images, enabling multiple downstream diagnostic tasks to be carried out by pretrained encoders without requiring any additional labels. MI-Zero reformulates zero-shot transfer under the framework of multiple instance learning to overcome the computational challenge of inference on extremely large images. We used over 550k pathology reports and other available in-domain text corpora to pretrain our text encoder. By effectively leveraging strong pretrained encoders, our best model, pretrained on over 33k histopathology image-caption pairs, achieves an average median zero-shot accuracy of 70.2% across three different real-world cancer subtyping tasks.
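The core idea of the abstract — scoring each patch of a whole slide image against class text prompts, then pooling the patch-level scores into a slide-level prediction — can be sketched as follows. This is an illustrative sketch only, assuming L2-normalized embeddings and top-k mean pooling as the aggregation operator; the function name, shapes, and pooling choice are assumptions, not the thesis's exact implementation.

```python
import numpy as np

def mi_zero_predict(patch_embs, class_text_embs, top_k=5):
    """Slide-level zero-shot prediction by pooling patch-level similarity scores.

    patch_embs:      (n_patches, d) L2-normalized patch embeddings from an image encoder
    class_text_embs: (n_classes, d) L2-normalized embeddings of class prompts from a text encoder
    """
    # Patch-level cosine similarity scores: (n_patches, n_classes)
    sims = patch_embs @ class_text_embs.T
    # Aggregate each class's scores over the bag of patches.
    # Top-k mean pooling is one of several possible pooling operators.
    k = min(top_k, sims.shape[0])
    topk = np.sort(sims, axis=0)[-k:]     # (k, n_classes): k highest scores per class
    slide_logits = topk.mean(axis=0)      # (n_classes,)
    return int(np.argmax(slide_logits)), slide_logits

# Toy example: 100 random "patches", 3 candidate cancer subtypes
rng = np.random.default_rng(0)
patches = rng.normal(size=(100, 32))
patches /= np.linalg.norm(patches, axis=1, keepdims=True)
classes = rng.normal(size=(3, 32))
classes /= np.linalg.norm(classes, axis=1, keepdims=True)
pred, logits = mi_zero_predict(patches, classes)
```

Because only patch embeddings and a handful of text embeddings are compared, inference cost stays linear in the number of patches, which is what makes zero-shot transfer tractable on gigapixel slides.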

Bibliographic Details
Main Author: Lu, Ming Yang (Max)
Other Authors: Mahmood, Faisal
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Online Access: https://hdl.handle.net/1721.1/151651
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: S.M.
Date Issued: 2023-06
Rights: In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/