A novel pipeline for table extraction using deep learning
Table extraction refers to the detection and extraction of tables from documents and images while preserving their structural layout and content. With the ever-growing volume of digital files and content, there is an increasing demand for the automated extraction of tables for consumption in a progr...
Main Author: | Lee, Seng Cheong |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Final Year Project (FYP) |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/136597 |
_version_ | 1826111006650138624 |
---|---|
author | Lee, Seng Cheong |
author2 | School of Computer Science and Engineering |
author_facet | School of Computer Science and Engineering Lee, Seng Cheong |
author_sort | Lee, Seng Cheong |
collection | NTU |
description | Table extraction refers to the detection and extraction of tables from documents and images while preserving their structural layout and content. With the ever-growing volume of digital files and content, there is an increasing demand for the automated extraction of tables for consumption in a programmatic format, as well as in support of advanced applications such as information retrieval and natural language processing.
This project proposes an automated pipeline for table extraction using convolutional neural networks (CNNs). The pipeline consists of a table detection module, which detects the presence of tables and extracts the table regions using an object detection CNN model, and a table structure recognition module, which extracts table cells and their contents before reconstructing the table structure. To enhance the performance of the table detection module, modifications were made to the table detection model and evaluated against the non-modified version.
The report first reviews existing literature on table detection and table structure recognition. Next, it introduces the datasets used for training, the data augmentation methods, the architectures used in the evaluation of single-stage approaches, and the experiments on modifications carried out to improve performance. The evaluation metrics and results are then presented and discussed. Several of the experiments showed promising results over their non-modified counterparts. Additionally, the pipeline was shown to perform table extraction successfully, demonstrating the viability of the overall process. |
first_indexed | 2024-10-01T02:43:51Z |
format | Final Year Project (FYP) |
id | ntu-10356/136597 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2024-10-01T02:43:51Z |
publishDate | 2020 |
publisher | Nanyang Technological University |
record_format | dspace |
spelling | ntu-10356/136597 2020-01-06T06:18:29Z A novel pipeline for table extraction using deep learning Lee, Seng Cheong School of Computer Science and Engineering Loke Yuan Ren yrloke@ntu.edu.sg Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2020-01-06T06:17:30Z 2020-01-06T06:17:30Z 2019 Final Year Project (FYP) https://hdl.handle.net/10356/136597 en application/pdf application/pdf text/html Nanyang Technological University |
spellingShingle | Engineering::Computer science and engineering Lee, Seng Cheong A novel pipeline for table extraction using deep learning |
title | A novel pipeline for table extraction using deep learning |
title_full | A novel pipeline for table extraction using deep learning |
title_fullStr | A novel pipeline for table extraction using deep learning |
title_full_unstemmed | A novel pipeline for table extraction using deep learning |
title_short | A novel pipeline for table extraction using deep learning |
title_sort | novel pipeline for table extraction using deep learning |
topic | Engineering::Computer science and engineering |
url | https://hdl.handle.net/10356/136597 |
work_keys_str_mv | AT leesengcheong anovelpipelinefortableextractionusingdeeplearning AT leesengcheong novelpipelinefortableextractionusingdeeplearning |
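
The description above outlines a two-stage pipeline: a table detection module that returns table regions, followed by a table structure recognition module that extracts cells and rebuilds the table. The record gives no implementation details, so the following is only a minimal sketch of that control flow. Every name in it (`Box`, `Cell`, `crop`, `rebuild_grid`, `extract_tables`, and the two stub functions) is hypothetical, and the stubs stand in for the CNN models, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed convention: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]
# Hypothetical stand-in for an image: rows of grey-level pixel values.
Image = Sequence[Sequence[int]]


@dataclass
class Cell:
    """One recognized table cell: its grid position and its text content."""
    row: int
    col: int
    text: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region described by `box` out of `image`."""
    left, top, right, bottom = box
    return [list(r[left:right]) for r in image[top:bottom]]


def rebuild_grid(cells: Sequence[Cell]) -> List[List[str]]:
    """Reconstruct a row-major grid of strings from recognized cells."""
    if not cells:
        return []
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.row][c.col] = c.text
    return grid


def extract_tables(
    image: Image,
    detect_tables: Callable[[Image], List[Box]],
    recognize_cells: Callable[[Image], List[Cell]],
) -> List[List[List[str]]]:
    """Stage 1: detect table regions. Stage 2: recognize structure per region."""
    tables = []
    for box in detect_tables(image):
        region = crop(image, box)
        tables.append(rebuild_grid(recognize_cells(region)))
    return tables


# Stubs standing in for the two models (assumptions, for illustration only).
def fake_detector(image: Image) -> List[Box]:
    return [(0, 0, 2, 2)]


def fake_recognizer(region: Image) -> List[Cell]:
    return [Cell(0, 0, "a"), Cell(0, 1, "b"), Cell(1, 0, "c"), Cell(1, 1, "d")]


if __name__ == "__main__":
    page = [[0] * 4 for _ in range(4)]
    print(extract_tables(page, fake_detector, fake_recognizer))
```

Passing the two stages in as callables keeps the detection model swappable, which matches the description's point that modified and non-modified detection models were compared within the same overall process.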