Multi-modal learning from video, eye tracking, and pupillometry for operator skill characterization in clinical fetal ultrasound
This paper presents a novel multi-modal learning approach for automated skill characterization of obstetric ultrasound operators using heterogeneous spatio-temporal sensory cues, namely, scan video, eye-tracking data, and pupillometric data, acquired in the clinical environment. We address pertinent challenges such as combining heterogeneous, small-scale and variable-length sequential datasets, to learn deep convolutional neural networks in real-world scenarios. We propose spatial encoding for multi-modal analysis using sonography standard plane images, spatial gaze maps, gaze trajectory images, and pupillary response images. We present and compare five multi-modal learning network architectures using late, intermediate, hybrid, and tensor fusion. We build models for the Heart and the Brain scanning tasks, and performance evaluation suggests that multi-modal learning networks outperform uni-modal networks, with the best-performing model achieving accuracies of 82.4% (Brain task) and 76.4% (Heart task) for the operator skill classification problem.
Main Authors: | Sharma, H; Drukker, L; Papageorghiou, AT; Noble, JA |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2021 |
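The abstract compares late, intermediate, hybrid, and tensor fusion for combining the video, gaze, and pupillometry branches. As a rough illustration only (not the paper's implementation), late fusion runs an independent classifier per modality and combines their output probabilities. The branch names, logit values, and averaging rule below are all hypothetical stand-ins; in the paper each branch would be a CNN over the corresponding spatial encoding.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical per-modality logits for one scan, for a 2-class skill
# problem (e.g. expert vs. trainee). These arrays stand in for the
# outputs of three separately trained modality branches.
video_logits = np.array([2.0, 0.5])   # standard-plane image branch
gaze_logits  = np.array([1.2, 0.8])   # spatial gaze-map branch
pupil_logits = np.array([0.3, 1.0])   # pupillary-response image branch

# Late fusion: average the per-branch class probabilities, then pick
# the most probable class.
probs = np.mean(
    [softmax(l) for l in (video_logits, gaze_logits, pupil_logits)],
    axis=0,
)
predicted_class = int(np.argmax(probs))
```

Intermediate fusion would instead concatenate the branches' internal feature maps before a shared classification head, trading the simplicity of late fusion for the ability to learn cross-modal interactions.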
id | oxford-uuid:a565c7b1-1705-4783-bf13-eeac06fcc86c |
institution | University of Oxford |