Entry-flipped transformer for inference and prediction of participant behavior

Some group activities, such as team sports and choreographed dances, involve closely coupled interaction between participants. Here we investigate the tasks of inferring and predicting participant behavior, in terms of motion paths and actions, under such conditions. We narrow the problem to that of...

Bibliographic Details
Main Authors: Hu, Bo; Cham, Tat-Jen
Other Authors: School of Computer Science and Engineering
Format: Conference Paper
Language: English
Published: 2023
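Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Entry-Flipped Transformer; Prediction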
Online Access: https://hdl.handle.net/10356/172664
_version_ 1811696815537389568
author Hu, Bo
Cham, Tat-Jen
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Hu, Bo
Cham, Tat-Jen
author_sort Hu, Bo
collection NTU
description Some group activities, such as team sports and choreographed dances, involve closely coupled interaction between participants. Here we investigate the tasks of inferring and predicting participant behavior, in terms of motion paths and actions, under such conditions. We narrow the problem to that of estimating how a set of target participants react to the behavior of other observed participants. Our key idea is to model the spatio-temporal relations among participants in a manner that is robust to error accumulation during frame-wise inference and prediction. We propose a novel Entry-Flipped Transformer (EF-Transformer), which models the relations of participants by attention mechanisms on both spatial and temporal domains. Unlike typical transformers, we tackle the problem of error accumulation by flipping the order of query, key, and value entries, to increase the importance and fidelity of observed features in the current frame. Comparative experiments show that our EF-Transformer achieves the best performance on a newly-collected tennis doubles dataset, a Ceilidh dance dataset, and two pedestrian datasets. Furthermore, it is also demonstrated that our EF-Transformer is better at limiting accumulated errors and recovering from wrong estimations.
first_indexed 2024-10-01T07:45:22Z
format Conference Paper
id ntu-10356/172664
institution Nanyang Technological University
language English
last_indexed 2024-10-01T07:45:22Z
publishDate 2023
record_format dspace
spelling ntu-10356/1726642023-12-19T06:04:45Z Entry-flipped transformer for inference and prediction of participant behavior Hu, Bo Cham, Tat-Jen School of Computer Science and Engineering 17th European Conference on Computer Vision (ECCV 2022) Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Entry-Flipped Transformer Prediction Some group activities, such as team sports and choreographed dances, involve closely coupled interaction between participants. Here we investigate the tasks of inferring and predicting participant behavior, in terms of motion paths and actions, under such conditions. We narrow the problem to that of estimating how a set of target participants react to the behavior of other observed participants. Our key idea is to model the spatio-temporal relations among participants in a manner that is robust to error accumulation during frame-wise inference and prediction. We propose a novel Entry-Flipped Transformer (EF-Transformer), which models the relations of participants by attention mechanisms on both spatial and temporal domains. Unlike typical transformers, we tackle the problem of error accumulation by flipping the order of query, key, and value entries, to increase the importance and fidelity of observed features in the current frame. Comparative experiments show that our EF-Transformer achieves the best performance on a newly-collected tennis doubles dataset, a Ceilidh dance dataset, and two pedestrian datasets. Furthermore, it is also demonstrated that our EF-Transformer is better at limiting accumulated errors and recovering from wrong estimations. 2023-12-19T06:04:45Z 2023-12-19T06:04:45Z 2022 Conference Paper Hu, B. & Cham, T. (2022). Entry-flipped transformer for inference and prediction of participant behavior. 17th European Conference on Computer Vision (ECCV 2022), 439-456.
https://dx.doi.org/10.1007/978-3-031-19772-7_26 9783031197710 https://hdl.handle.net/10356/172664 10.1007/978-3-031-19772-7_26 2-s2.0-85142709965 439 456 en © 2022 Association for Computing Machinery. All rights reserved.
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Entry-Flipped Transformer
Prediction
Hu, Bo
Cham, Tat-Jen
Entry-flipped transformer for inference and prediction of participant behavior
title Entry-flipped transformer for inference and prediction of participant behavior
title_full Entry-flipped transformer for inference and prediction of participant behavior
title_fullStr Entry-flipped transformer for inference and prediction of participant behavior
title_full_unstemmed Entry-flipped transformer for inference and prediction of participant behavior
title_short Entry-flipped transformer for inference and prediction of participant behavior
title_sort entry flipped transformer for inference and prediction of participant behavior
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Entry-Flipped Transformer
Prediction
url https://hdl.handle.net/10356/172664
work_keys_str_mv AT hubo entryflippedtransformerforinferenceandpredictionofparticipantbehavior
AT chamtatjen entryflippedtransformerforinferenceandpredictionofparticipantbehavior