Web Scraping Tool For Newspapers And Images Data Using Jsonify

Web scraping is the process of extracting data from a website quickly and efficiently. In this setting, Python offers a useful set of methods that help web editors improve the quality of the service they provide. This scraper works in three steps: 1) to understand the structure o...
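The three steps named in the description (inspect the page structure, design a regular expression pattern, apply the pattern) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sample HTML, the pattern names `HEADLINE_RE` and `IMAGE_RE`, and the `extract` function are all hypothetical, and a real scraper would first download the page.

```python
import re

# Hypothetical sample page; a real scraper would download the HTML first.
SAMPLE_HTML = """
<html><body>
  <h2 class="headline"><a href="/news/1">Floods hit the coast</a></h2>
  <img src="/static/flood.jpg" alt="Flooded street">
  <h2 class="headline"><a href="/news/2">Markets rally</a></h2>
  <img src="/static/market.png" alt="Trading floor">
</body></html>
"""

# Step 2: patterns designed after inspecting the page structure (step 1).
HEADLINE_RE = re.compile(
    r'<h2 class="headline"><a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a></h2>'
)
IMAGE_RE = re.compile(r'<img\s+[^>]*?src="(?P<src>[^"]+)"', re.IGNORECASE)


def extract(html: str) -> dict:
    """Step 3: apply the patterns as a rule set and collect the matches."""
    headlines = [m.groupdict() for m in HEADLINE_RE.finditer(html)]
    images = [m.group("src") for m in IMAGE_RE.finditer(html)]
    return {"headlines": headlines, "images": images}


result = extract(SAMPLE_HTML)
```

Patterns like these are tied to one page layout, which is why the description treats them as a set of rules built per site after the structure has been studied.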


Bibliographic Details
Main Authors: Qingli Niu, Irfan Ali Kandhro, Anil Kumar, Shahnawaz shah, Muhammad Hasan, Hifza Mehfooz Ahmed, Fei Liang
Format: Article
Language: English
Published: Tamkang University Press 2022-09-01
Series: Journal of Applied Science and Engineering
Subjects: web scraping, extracting, retrieving, python framework, api, manually collecting data
Online Access: http://jase.tku.edu.tw/articles/jase-202304-26-4-0002
_version_ 1811273719613489152
author Qingli Niu
Irfan Ali Kandhro
Anil Kumar
Shahnawaz shah
Muhammad Hasan
Hifza Mehfooz Ahmed
Fei Liang
author_facet Qingli Niu
Irfan Ali Kandhro
Anil Kumar
Shahnawaz shah
Muhammad Hasan
Hifza Mehfooz Ahmed
Fei Liang
author_sort Qingli Niu
collection DOAJ
description Web scraping is the process of extracting data from a website quickly and efficiently. In this setting, Python offers a useful set of methods that help web editors improve the quality of the service they provide. The scraper works in three steps: 1) understand the structure of the web page, 2) design a regular expression pattern, and 3) use that pattern to extract the required data. In this paper, we also used the Flask, Request, and JSONify libraries to get the data; after processing, the data is transformed into JSON and made ready for CSV with the help of an API. Once all the required regex patterns have been generated, the system uses them as a set of rules. With these rules, the scraper works efficiently and, with the help of supporting libraries, achieved outstanding results in extracting and storing news and web-based information. The proposed web scraping tool eliminates the time and effort of manually collecting or copying data by automating the process. The scraper is found to be an easy and direct approach to extracting data from newspapers, websites, blogs, and images.
first_indexed 2024-04-12T23:04:27Z
format Article
id doaj.art-2d4b26d5ff224366a5b1b0c16eb713ab
institution Directory Open Access Journal
issn 2708-9967
2708-9975
language English
last_indexed 2024-04-12T23:04:27Z
publishDate 2022-09-01
publisher Tamkang University Press
record_format Article
series Journal of Applied Science and Engineering
spelling doaj.art-2d4b26d5ff224366a5b1b0c16eb713ab
2022-12-22T03:12:57Z
eng
Tamkang University Press
Journal of Applied Science and Engineering
2708-9967
2708-9975
2022-09-01
26
4
465
474
10.6180/jase.202304_26(4).0002
Web Scraping Tool For Newspapers And Images Data Using Jsonify
Qingli Niu (0): College of Information Engineering, Zhengzhou University of Science & Technology, Zhengzhou 450064, China
Irfan Ali Kandhro (1): Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
Anil Kumar (2): Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
Shahnawaz shah (3): Department of telecommunication engineering, University of Sindh Jamshoro, Pakistan
Muhammad Hasan (4): Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
Hifza Mehfooz Ahmed (5): Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
Fei Liang (6): College of Information Engineering, Zhengzhou University of Science & Technology, Zhengzhou 450064, China
http://jase.tku.edu.tw/articles/jase-202304-26-4-0002
web scraping
extracting
retrieving
python framework
api
manually collecting data
spellingShingle Qingli Niu
Irfan Ali Kandhro
Anil Kumar
Shahnawaz shah
Muhammad Hasan
Hifza Mehfooz Ahmed
Fei Liang
Web Scraping Tool For Newspapers And Images Data Using Jsonify
Journal of Applied Science and Engineering
web scraping
extracting
retrieving
python framework
api
manually collecting data
title Web Scraping Tool For Newspapers And Images Data Using Jsonify
title_full Web Scraping Tool For Newspapers And Images Data Using Jsonify
title_fullStr Web Scraping Tool For Newspapers And Images Data Using Jsonify
title_full_unstemmed Web Scraping Tool For Newspapers And Images Data Using Jsonify
title_short Web Scraping Tool For Newspapers And Images Data Using Jsonify
title_sort web scraping tool for newspapers and images data using jsonify
topic web scraping
extracting
retrieving
python framework
api
manually collecting data
url http://jase.tku.edu.tw/articles/jase-202304-26-4-0002
work_keys_str_mv AT qingliniu webscrapingtoolfornewspapersandimagesdatausingjsonify
AT irfanalikandhro webscrapingtoolfornewspapersandimagesdatausingjsonify
AT anilkumar webscrapingtoolfornewspapersandimagesdatausingjsonify
AT shahnawazshah webscrapingtoolfornewspapersandimagesdatausingjsonify
AT muhammadhasan webscrapingtoolfornewspapersandimagesdatausingjsonify
AT hifzamehfoozahmed webscrapingtoolfornewspapersandimagesdatausingjsonify
AT feiliang webscrapingtoolfornewspapersandimagesdatausingjsonify
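
The description in this record says the processed data is transformed into JSON and made ready for CSV. The sketch below illustrates that hand-off under stated assumptions: it uses the standard library's `json` and `csv` modules in place of Flask's `jsonify` so that it runs without a web server, and the `ARTICLES` records and both function names are hypothetical, not taken from the paper.

```python
import csv
import io
import json

# Hypothetical records, shaped like the output of a regex extraction step.
ARTICLES = [
    {"title": "Floods hit the coast", "url": "/news/1", "image": "/static/flood.jpg"},
    {"title": "Markets rally", "url": "/news/2", "image": "/static/market.png"},
]


def to_json(records: list[dict]) -> str:
    """Serialize the records as a JSON array (the role jsonify plays in Flask)."""
    return json.dumps(records, ensure_ascii=False)


def json_to_csv(payload: str) -> str:
    """Turn a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(payload)
    if not records:
        return ""
    buffer = io.StringIO(newline="")
    # Column order follows the keys of the first record (an assumption that
    # every record shares the same keys).
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = json_to_csv(to_json(ARTICLES))
```

In a Flask application the `to_json` step would typically be replaced by returning `jsonify(records)` from a route, with the CSV conversion applied by whichever client consumes that API response.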