Pretraining Table Embeddings for Knowledge Graph Based Provenance Systems
We aim to build a knowledge graph based provenance system for data objects across institutions and teams. The world of data objects and systems is complex and heterogeneous. For effective collaboration, a shared data model is needed. Specifically, this work examines the problem of provenance subgrap...
Main Author: | Yang, Steven |
---|---|
Other Authors: | Cafarella, Michael J. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2022 |
Online Access: | https://hdl.handle.net/1721.1/145143 |
_version_ | 1826197369567313920 |
---|---|
author | Yang, Steven |
author2 | Cafarella, Michael J. |
author_facet | Cafarella, Michael J. Yang, Steven |
author_sort | Yang, Steven |
collection | MIT |
description | We aim to build a knowledge-graph-based provenance system for data objects across institutions and teams. The world of data objects and systems is complex and heterogeneous. For effective collaboration, a shared data model is needed. Specifically, this work examines the problem of provenance subgraph classification: given a coarser low-level provenance subgraph that is not easily digestible by humans, we want to annotate the subgraph with human-readable labels describing the operations performed on each data object. This work first involves creating the infrastructure needed to select and label subgraphs. Next, it focuses on producing table embedding techniques using the pretrain-and-finetune paradigm, with an emphasis on the downstream task of Operator Classification. |
first_indexed | 2024-09-23T10:46:38Z |
format | Thesis |
id | mit-1721.1/145143 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T10:46:38Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1451432022-08-30T03:46:44Z Pretraining Table Embeddings for Knowledge Graph Based Provenance Systems Yang, Steven Cafarella, Michael J. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science We aim to build a knowledge graph based provenance system for data objects across institutions and teams. The world of data objects and systems is complex and heterogeneous. For effective collaboration, a shared data model is needed. Specifically, this work examines the problem of provenance subgraph classification: given a coarser low-level provenance subgraph that is not easily digestible by humans, we want to annotate the subgraph with human readable labels describing the operations done on each data object. This work first involves creating the infrastructure needed to select and label subgraphs. Next, this work focuses on producing table embedding techniques using the pretrain and finetune paradigm with an emphasis on the downstream task of Operator Classification. M.Eng. 2022-08-29T16:36:12Z 2022-08-29T16:36:12Z 2022-05 2022-05-27T16:19:39.096Z Thesis https://hdl.handle.net/1721.1/145143 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
title | Pretraining Table Embeddings for Knowledge Graph Based Provenance Systems |
url | https://hdl.handle.net/1721.1/145143 |