A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach
Extracting text from PDF documents is never easy when high accuracy and consistency are the two main criteria. Although a considerable number of such systems exist, most of them fall short of the desired performance, especially when...
Main Authors: | Yong, Tien Fui; Azad, Saiful; Rahman, Mohammed Mostafizur; Kamal Z., Zamli; Gollam, Rabby |
---|---|
Format: | Article |
Language: | English |
Published: | American Scientific Publisher 2018 |
Subjects: | QA76 Computer software |
Online Access: | http://umpir.ump.edu.my/id/eprint/21838/1/A%20Highly%20Accurate%20PDF.pdf |
_version_ | 1796992834168946688 |
---|---|
author | Yong, Tien Fui; Azad, Saiful; Rahman, Mohammed Mostafizur; Kamal Z., Zamli; Gollam, Rabby |
author_sort | Yong, Tien Fui |
collection | UMP |
description | Extracting text from PDF documents is never easy when high accuracy and consistency are the two main criteria. Although a considerable number of such systems exist, most of them fall short of the desired performance, especially on academic literature. Researchers who are heavily involved in text mining and project analysis need an accurate and consistent supporting tool for PDF-To-Text (PTT) conversion. Therefore, in this paper, we propose a Natural Language Processing based PDF-to-text (NLPDF) conversion system, which comprises two major steps: (i) reading the contents of the PDF and (ii) reconstructing the text. The performance of the proposed system is evaluated via four metrics, namely Precision, Recall, F-Measure (AF), and standard deviation, and compared with eight other similar benchmarked systems available in the market. The experimental results demonstrate the effectiveness of the proposed system. |
first_indexed | 2024-03-06T12:25:31Z |
format | Article |
id | UMPir21838 |
institution | Universiti Malaysia Pahang |
language | English |
last_indexed | 2024-03-06T12:25:31Z |
publishDate | 2018 |
publisher | American Scientific Publisher |
record_format | dspace |
spelling | UMPir21838 2018-11-29T03:06:43Z http://umpir.ump.edu.my/id/eprint/21838/ QA76 Computer software American Scientific Publisher 2018-10-01 Article PeerReviewed pdf en http://umpir.ump.edu.my/id/eprint/21838/1/A%20Highly%20Accurate%20PDF.pdf Yong, Tien Fui and Azad, Saiful and Rahman, Mohammed Mostafizur and Kamal Z., Zamli and Gollam, Rabby (2018) A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach. Advanced Science Letters, 24 (10). pp. 7844-7849. ISSN 1936-6612. (Published) https://doi.org/10.1166/asl.2018.13029 doi:10.1166/asl.2018.13029 |
title | A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach |
topic | QA76 Computer software |
url | http://umpir.ump.edu.my/id/eprint/21838/1/A%20Highly%20Accurate%20PDF.pdf |
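
The description above names Precision, Recall, F-Measure, and standard deviation as the evaluation metrics but does not say how they are computed. The following is a minimal sketch of one common way to score extracted text against a reference, assuming whitespace tokenization and bag-of-words matching; the function names `precision_recall_f1` and `f1_spread` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import pstdev


def precision_recall_f1(extracted, reference):
    """Token-level precision, recall and F-measure.

    Assumption: tokens are whitespace-separated and matched as a multiset,
    which may differ from the matching used in the paper.
    """
    ext = Counter(extracted.split())
    ref = Counter(reference.split())
    # Multiset intersection: each token counts at most as often as it
    # appears in both the extracted text and the reference text.
    overlap = sum((ext & ref).values())
    precision = overlap / sum(ext.values()) if ext else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def f1_spread(pairs):
    """Population standard deviation of F-measure over (extracted, reference)
    pairs, used here as a rough indicator of consistency across documents."""
    scores = [precision_recall_f1(e, r)[2] for e, r in pairs]
    return pstdev(scores)
```

Under these assumptions, a lower `f1_spread` over a set of papers would indicate more consistent extraction quality, which is one plausible reading of why standard deviation is reported alongside the accuracy metrics.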