Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images

Contrastive visual language pretraining has emerged as a powerful method for either training new language-aware image encoders or augmenting existing pretrained models with zero-shot visual recognition capabilities. However, existing works typically train on large datasets of image-text pairs and have been designed to perform downstream tasks involving only small to medium-sized images, neither of which is applicable to the emerging field of computational pathology, where paired image-text datasets are scarce and each image can span up to 100,000 x 100,000 pixels. In this paper, we present MI-Zero, a simple and intuitive framework for unleashing the zero-shot transfer capabilities of contrastively aligned image and text models on gigapixel histopathology whole slide images, enabling multiple downstream diagnostic tasks to be carried out by pretrained encoders without requiring any additional labels. MI-Zero reformulates zero-shot transfer under the framework of multiple instance learning to overcome the computational challenge of inference on extremely large images. We used over 550k pathology reports and other available in-domain text corpora to pretrain our text encoder. By effectively leveraging strong pretrained encoders, our best model, pretrained on over 33k histopathology image-caption pairs, achieves an average median zero-shot accuracy of 70.2% across three different real-world cancer subtyping tasks.
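The core idea of the abstract — scoring each patch of a whole slide image against class text prompts, then pooling the patch-level scores into a slide-level prediction — can be sketched as follows. This is an illustrative sketch only, assuming L2-normalized embeddings and top-k mean pooling as the aggregation operator; the function name, shapes, and pooling choice are assumptions, not the thesis's exact implementation.

```python
import numpy as np

def mi_zero_predict(patch_embs, class_text_embs, top_k=5):
    """Slide-level zero-shot prediction by pooling patch-level similarity scores.

    patch_embs:      (n_patches, d) L2-normalized patch embeddings from an image encoder
    class_text_embs: (n_classes, d) L2-normalized embeddings of class prompts from a text encoder
    """
    # Patch-level cosine similarity scores: (n_patches, n_classes)
    sims = patch_embs @ class_text_embs.T
    # Aggregate each class's scores over the bag of patches.
    # Top-k mean pooling is one of several possible pooling operators.
    k = min(top_k, sims.shape[0])
    topk = np.sort(sims, axis=0)[-k:]     # (k, n_classes): k highest scores per class
    slide_logits = topk.mean(axis=0)      # (n_classes,)
    return int(np.argmax(slide_logits)), slide_logits

# Toy example: 100 random "patches", 3 candidate cancer subtypes
rng = np.random.default_rng(0)
patches = rng.normal(size=(100, 32))
patches /= np.linalg.norm(patches, axis=1, keepdims=True)
classes = rng.normal(size=(3, 32))
classes /= np.linalg.norm(classes, axis=1, keepdims=True)
pred, logits = mi_zero_predict(patches, classes)
```

Because only patch embeddings and a handful of text embeddings are compared, inference cost stays linear in the number of patches, which is what makes zero-shot transfer tractable on gigapixel slides.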

Bibliographic Details
Main Author: Lu, Ming Yang (Max)
Other Authors: Mahmood, Faisal
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Online Access: https://hdl.handle.net/1721.1/151651
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: S.M.
Date Issued: 2023-06
Rights: In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/