SAGE: Segmenting and Grouping Data Effectively using Large Language Models
Grouping is a technique used to organize data into manageable pieces, reducing cognitive load and enabling users to focus on discovering higher-level insights and generating new questions. However, creating groups remains a challenge, often requiring users to have prior domain knowledge or an unders...
Main Author: | Pedraza Pineros, Isabella |
---|---|
Other Authors: | Satyanarayan, Arvind |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2024 |
Online Access: | https://hdl.handle.net/1721.1/156764 |
_version_ | 1826190741260468224 |
---|---|
author | Pedraza Pineros, Isabella |
author2 | Satyanarayan, Arvind |
author_facet | Satyanarayan, Arvind Pedraza Pineros, Isabella |
author_sort | Pedraza Pineros, Isabella |
collection | MIT |
description | Grouping is a technique used to organize data into manageable pieces, reducing cognitive load and enabling users to focus on discovering higher-level insights and generating new questions. However, creating groups remains a challenge, often requiring users to have prior domain knowledge or an understanding of the underlying structure of the data. We introduce SAGE, a novel technique that leverages the knowledge base and pattern recognition abilities of large language models (LLMs) to segment and group data with domain awareness. We instantiate our technique through two structures: bins and highlights; bins are contiguous, non-overlapping ranges that segment a single field into groups; highlights are multi-field intersections of ranges that surface broader groups in the data. We integrate these structures into Olli, an open-source tool that converts data visualizations into accessible, keyboard-navigable textual formats to facilitate a study with 15 blind and low-vision (BLV) participants, recognizing them as experts in assessing agency. Through this study, we evaluate how SAGE impacts a user’s interpretation of data and visualizations, and find our technique provides a rich contextual framework for users to independently scaffold their initial sensemaking process. |
first_indexed | 2024-09-23T08:44:39Z |
format | Thesis |
id | mit-1721.1/156764 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T08:44:39Z |
publishDate | 2024 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/156764 2024-09-17T04:02:20Z SAGE: Segmenting and Grouping Data Effectively using Large Language Models Pedraza Pineros, Isabella Satyanarayan, Arvind Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Grouping is a technique used to organize data into manageable pieces, reducing cognitive load and enabling users to focus on discovering higher-level insights and generating new questions. However, creating groups remains a challenge, often requiring users to have prior domain knowledge or an understanding of the underlying structure of the data. We introduce SAGE, a novel technique that leverages the knowledge base and pattern recognition abilities of large language models (LLMs) to segment and group data with domain awareness. We instantiate our technique through two structures: bins and highlights; bins are contiguous, non-overlapping ranges that segment a single field into groups; highlights are multi-field intersections of ranges that surface broader groups in the data. We integrate these structures into Olli, an open-source tool that converts data visualizations into accessible, keyboard-navigable textual formats to facilitate a study with 15 blind and low-vision (BLV) participants, recognizing them as experts in assessing agency. Through this study, we evaluate how SAGE impacts a user’s interpretation of data and visualizations, and find our technique provides a rich contextual framework for users to independently scaffold their initial sensemaking process. M.Eng. 2024-09-16T13:47:42Z 2024-09-16T13:47:42Z 2024-05 2024-07-11T14:36:48.705Z Thesis https://hdl.handle.net/1721.1/156764 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Pedraza Pineros, Isabella SAGE: Segmenting and Grouping Data Effectively using Large Language Models |
title | SAGE: Segmenting and Grouping Data Effectively using Large Language Models |
title_sort | sage segmenting and grouping data effectively using large language models |
url | https://hdl.handle.net/1721.1/156764 |
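
The description above defines bins as contiguous, non-overlapping ranges over a single field and highlights as multi-field intersections of ranges. The following is a minimal sketch of those two structures only, not the thesis's implementation: the names `Range`, `make_bins`, `assign_bin`, and `highlight`, the half-open interval convention, and the sample rows are all assumptions made for illustration. In SAGE the range boundaries are proposed by an LLM; here they are hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A half-open interval [low, high) over one numeric field (assumed convention)."""
    field: str
    low: float
    high: float

    def contains(self, row):
        return self.low <= row[self.field] < self.high


def make_bins(field, edges):
    """Build contiguous, non-overlapping bins from strictly increasing edges."""
    if not all(a < b for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing")
    return [Range(field, lo, hi) for lo, hi in zip(edges, edges[1:])]


def assign_bin(row, bins):
    """Return the index of the bin containing the row, or None if out of range."""
    for i, b in enumerate(bins):
        if b.contains(row):
            return i
    return None


def highlight(rows, ranges):
    """Return rows that fall inside every range (a multi-field intersection)."""
    return [r for r in rows if all(rg.contains(r) for rg in ranges)]


# Hypothetical sample data; field names and values are illustrative only.
rows = [
    {"mpg": 12.0, "hp": 200.0},
    {"mpg": 25.0, "hp": 95.0},
    {"mpg": 38.0, "hp": 60.0},
]
mpg_bins = make_bins("mpg", [0, 20, 30, 50])
efficient_low_power = highlight(rows, [Range("mpg", 30, 50), Range("hp", 0, 100)])
```

Because the bins share edges and use half-open intervals, every in-range value lands in exactly one bin, which matches the "contiguous, non-overlapping" wording; a highlight simply requires a row to satisfy one range per field.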