Pretraining Table Embeddings for Knowledge Graph Based Provenance Systems


Bibliographic Details
Main Author: Yang, Steven
Other Authors: Cafarella, Michael J.
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/145143
_version_ 1826197369567313920
author Yang, Steven
author2 Cafarella, Michael J.
author_facet Cafarella, Michael J.
Yang, Steven
author_sort Yang, Steven
collection MIT
description We aim to build a knowledge-graph-based provenance system for data objects across institutions and teams. The world of data objects and systems is complex and heterogeneous, so effective collaboration requires a shared data model. Specifically, this work examines the problem of provenance subgraph classification: given a coarse, low-level provenance subgraph that is not easily digestible by humans, we want to annotate the subgraph with human-readable labels describing the operations performed on each data object. This work first creates the infrastructure needed to select and label subgraphs. It then focuses on producing table embedding techniques using the pretrain-and-finetune paradigm, with an emphasis on the downstream task of Operator Classification.
first_indexed 2024-09-23T10:46:38Z
format Thesis
id mit-1721.1/145143
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T10:46:38Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1451432022-08-30T03:46:44Z Pretraining Table Embeddings for Knowledge Graph Based Provenance Systems Yang, Steven Cafarella, Michael J. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science We aim to build a knowledge graph based provenance system for data objects across institutions and teams. The world of data objects and systems is complex and heterogeneous. For effective collaboration, a shared data model is needed. Specifically, this work examines the problem of provenance subgraph classification: given a coarser low-level provenance subgraph that is not easily digestible by humans, we want to annotate the subgraph with human readable labels describing the operations done on each data object. This work first involves creating the infrastructure needed to select and label subgraphs. Next, this work focuses on producing table embedding techniques using the pretrain and finetune paradigm with an emphasis on the downstream task of Operator Classification. M.Eng. 2022-08-29T16:36:12Z 2022-08-29T16:36:12Z 2022-05 2022-05-27T16:19:39.096Z Thesis https://hdl.handle.net/1721.1/145143 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title Pretraining Table Embeddings for Knowledge Graph Based Provenance Systems
url https://hdl.handle.net/1721.1/145143