Multi-modal learning from video, eye tracking, and pupillometry for operator skill characterization in clinical fetal ultrasound

Bibliographic Details
Main Authors: Sharma, H; Drukker, L; Papageorghiou, AT; Noble, JA
Format: Conference item
Language: English
Published: IEEE, 2021

Description: This paper presents a novel multi-modal learning approach for automated skill characterization of obstetric ultrasound operators using heterogeneous spatio-temporal sensory cues, namely scan video, eye-tracking data, and pupillometric data, acquired in the clinical environment. We address pertinent challenges, such as combining heterogeneous, small-scale, and variable-length sequential datasets to train deep convolutional neural networks in real-world scenarios. We propose spatial encoding for multi-modal analysis using sonography standard plane images, spatial gaze maps, gaze trajectory images, and pupillary response images. We present and compare five multi-modal learning network architectures using late, intermediate, hybrid, and tensor fusion. We build models for the Heart and Brain scanning tasks, and performance evaluation suggests that multi-modal learning networks outperform uni-modal networks, with the best-performing model achieving accuracies of 82.4% (Brain task) and 76.4% (Heart task) on the operator skill classification problem.
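
The abstract above describes encoding eye-tracking and pupillometry sequences as images (spatial gaze maps, gaze trajectory images, pupillary response images) and fusing them with scan video through several CNN fusion schemes. Below is a minimal, illustrative PyTorch sketch of that general idea: per-modality CNN encoders whose features are concatenated before classification, roughly a feature-level fusion. The helper gaze_to_spatial_map, the layer sizes, the two skill classes, and all other names are assumptions for illustration only and do not reproduce the authors' late, intermediate, hybrid, or tensor fusion architectures.

```python
# Illustrative sketch only -- not the authors' architecture.
# Shows: (1) encoding a variable-length gaze sequence as a fixed-size image,
#        (2) per-modality CNN encoders fused by feature concatenation.
import torch
import torch.nn as nn


def gaze_to_spatial_map(gaze_xy, image_size=64):
    """Bin normalised (x, y) gaze points into a 2D occupancy map
    (a simple stand-in for a spatial gaze map)."""
    grid = torch.zeros(image_size, image_size)
    idx = (gaze_xy.clamp(0, 1) * (image_size - 1)).long()
    for x, y in idx:
        grid[y, x] += 1.0
    return grid / grid.max().clamp(min=1.0)  # normalise to [0, 1]


class ModalityEncoder(nn.Module):
    """Small CNN mapping one image-encoded modality to a feature vector."""
    def __init__(self, in_channels=1, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class ConcatFusionSkillClassifier(nn.Module):
    """Encode each modality separately, concatenate features, then classify."""
    def __init__(self, n_modalities=3, feat_dim=64, n_classes=2):
        super().__init__()
        self.encoders = nn.ModuleList(
            [ModalityEncoder(feat_dim=feat_dim) for _ in range(n_modalities)]
        )
        self.head = nn.Linear(n_modalities * feat_dim, n_classes)

    def forward(self, modality_images):
        feats = [enc(x) for enc, x in zip(self.encoders, modality_images)]
        return self.head(torch.cat(feats, dim=1))


if __name__ == "__main__":
    gaze_map = gaze_to_spatial_map(torch.rand(500, 2))   # (64, 64) image
    # Toy batch: ultrasound frame, spatial gaze map, pupillary response image.
    batch = [torch.rand(4, 1, 64, 64) for _ in range(3)]
    logits = ConcatFusionSkillClassifier()(batch)
    print(gaze_map.shape, logits.shape)                   # (64, 64), (4, 2)
```

For comparison, tensor fusion typically combines modality embeddings through an outer product rather than concatenation, which is one of the variants the paper evaluates empirically.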