AI Commentator: Narrating Sports Games through Multimodal Perception and Large Language Models

Bibliographic Details
Main Author: Purohit, Sonia
Other Authors: Oliva, Aude
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/151608
collection MIT
description Automated visual understanding is an essential part of the sports industry, particularly in the context of major sports tournaments. The scale of generated video footage necessitates the use of automated systems to generate insights and enhance fan experiences. One area where this is particularly challenging is commentary, which requires detailed information about play-by-play action, a task that cannot be efficiently carried out by human commentators at scale. We tackle this problem for grand-slam tennis through an IBM partnership with the Championships, Wimbledon. This thesis introduces a novel system that utilizes computer vision to extract play-by-play metadata and convert it into fluent commentary using large language models. Our computer vision module utilizes a single camera feed to understand every detail of the game – court and net detection, player and ball tracking, player poses, and fine-grained shot classification, all in near-real-time. This metadata is then combined with additional information from other modalities, such as crowd audio and radar-measured ball speed, and fed into a "data2text" large language model to generate commentary in natural language. Our system not only supports the narration of match content at scale, but also powers the collection of additional metadata to facilitate additional match insights in the future.
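As a rough illustration of the "data2text" step the abstract describes, the sketch below flattens per-shot metadata into a text prompt that a language model could turn into commentary. It is a minimal sketch under stated assumptions, not the thesis's implementation: the `ShotEvent` fields, the `linearize` and `build_prompt` helpers, and the prompt wording are all hypothetical, and no model is called.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShotEvent:
    # Hypothetical schema: the record does not list the real metadata fields.
    player: str
    shot_type: str                          # e.g. a fine-grained shot class from the vision module
    ball_speed_kmh: Optional[float] = None  # radar-measured speed, when available
    crowd_level: Optional[str] = None       # coarse crowd-audio cue, when available


def linearize(events: List[ShotEvent]) -> str:
    """Flatten a rally into a 'key: value' string, skipping missing modalities."""
    parts = []
    for i, e in enumerate(events, start=1):
        fields = [f"shot {i}", f"player: {e.player}", f"type: {e.shot_type}"]
        if e.ball_speed_kmh is not None:
            fields.append(f"speed_kmh: {e.ball_speed_kmh:.0f}")
        if e.crowd_level is not None:
            fields.append(f"crowd: {e.crowd_level}")
        parts.append(", ".join(fields))
    return " | ".join(parts)


def build_prompt(events: List[ShotEvent]) -> str:
    """Wrap the linearized rally in an instruction (wording is an assumption)."""
    return (
        "Write one sentence of tennis commentary for this rally.\n"
        "Data: " + linearize(events)
    )


if __name__ == "__main__":
    rally = [
        ShotEvent("Player A", "serve", ball_speed_kmh=198.4),
        ShotEvent("Player B", "backhand return", crowd_level="loud"),
    ]
    print(build_prompt(rally))
```

Keeping the linearization separate from the prompt wrapper means the same metadata string could be reused for other downstream insights, which matches the abstract's point that the collected metadata serves more than commentary.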
id mit-1721.1/151608
institution Massachusetts Institute of Technology
spelling mit-1721.1/151608; 2023-08-01T03:19:54Z; AI Commentator: Narrating Sports Games through Multimodal Perception and Large Language Models; Purohit, Sonia; Oliva, Aude; Feris, Rogerio; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; M.Eng.; 2023-07-31T19:52:20Z; 2023-07-31T19:52:20Z; 2023-06; 2023-06-06T16:35:17.730Z; Thesis; https://hdl.handle.net/1721.1/151608; In Copyright - Educational Use Permitted; Copyright retained by author(s); https://rightsstatements.org/page/InC-EDU/1.0/; application/pdf; Massachusetts Institute of Technology