Cloud Spark Cluster to analyse English prescription big data for NHS intelligence

Spark is a large-scale data processing engine that is at least a hundred times faster than the Hadoop big data processing engine. Although Spark is a complete in-memory framework with more limited big data platform facilities than Hadoop, the Spark analytics engine with the Hadoop distr...

Bibliographic Details
Main Authors: Fernando, Sandra; Sowinski-Mydlarz, Victor; Virdee, Bal Singh
Format: Conference or Workshop Item
Language: English
Published: Springer 2024
Subjects:
Online Access: https://repository.londonmet.ac.uk/8957/5/Analysis%20of%20English%20Prescription%20Big%20Data%20with%20Cloud%20Cluster%20for%20NHS%20Intelligence%20Final2.pdf
_version_ 1804072960024117248
author Fernando, Sandra
Sowinski-Mydlarz, Victor
Virdee, Bal Singh
author_facet Fernando, Sandra
Sowinski-Mydlarz, Victor
Virdee, Bal Singh
author_sort Fernando, Sandra
collection LMU
description Spark is a large-scale data processing engine that is at least a hundred times faster than the Hadoop big data processing engine. Although Spark is a complete in-memory framework with more limited big data platform facilities than Hadoop, the Spark analytics engine combined with the Hadoop distributed file system (HDFS) gives better throughput than Hadoop alone. The main contribution of this paper is insight into the behaviour of an HDFS-based Azure Cloud Spark Cluster, with discussion and evaluation of its strengths and limitations using a large NHS prescription dataset. Data on NHS prescriptions from 2015 to April 2022 exceeds 500 GB of records. A public dashboard for individual BNF code analysis and studies on NHS cost analysis exist, but no analysis of this data range and volume of NHS prescriptions, especially one using newer big data processing engines such as Spark, has been conducted. This study also contributes descriptive statistics and machine learning models of prescription data trends using the Cloud Spark engine and PySpark technology, which have not been used in this context before. The study compares regions and GP practices in terms of reimbursement cost, drug consumption level, drug type, and disease type; shows the varied demand for dispensed chemical substances over the years; and shows which diseases have increased or decreased over the years, as well as the total cost and its trends.
first_indexed 2024-07-09T04:07:26Z
format Conference or Workshop Item
id oai:repository.londonmet.ac.uk:8957
institution London Metropolitan University
language English
last_indexed 2024-07-09T04:07:26Z
publishDate 2024
publisher Springer
record_format eprints
spelling oai:repository.londonmet.ac.uk:89572024-01-30T14:19:58Z http://repository.londonmet.ac.uk/8957/ Cloud Spark Cluster to analyse English prescription big data for NHS intelligence Fernando, Sandra Sowinski-Mydlarz, Victor Virdee, Bal Singh 000 Computer science, information & general works Spark is a large-scale data processing engine that is at least a hundred times faster than the Hadoop big data processing engine. Although Spark is a complete in-memory framework with more limited big data platform facilities than Hadoop, the Spark analytics engine combined with the Hadoop distributed file system (HDFS) gives better throughput than Hadoop alone. The main contribution of this paper is insight into the behaviour of an HDFS-based Azure Cloud Spark Cluster, with discussion and evaluation of its strengths and limitations using a large NHS prescription dataset. Data on NHS prescriptions from 2015 to April 2022 exceeds 500 GB of records. A public dashboard for individual BNF code analysis and studies on NHS cost analysis exist, but no analysis of this data range and volume of NHS prescriptions, especially one using newer big data processing engines such as Spark, has been conducted. This study also contributes descriptive statistics and machine learning models of prescription data trends using the Cloud Spark engine and PySpark technology, which have not been used in this context before. The study compares regions and GP practices in terms of reimbursement cost, drug consumption level, drug type, and disease type; shows the varied demand for dispensed chemical substances over the years; and shows which diseases have increased or decreased over the years, as well as the total cost and its trends.
Springer 2024-01-14 Conference or Workshop Item PeerReviewed text en https://repository.londonmet.ac.uk/8957/5/Analysis%20of%20English%20Prescription%20Big%20Data%20with%20Cloud%20Cluster%20for%20NHS%20Intelligence%20Final2.pdf Fernando, Sandra, Sowinski-Mydlarz, Victor and Virdee, Bal Singh (2024) Cloud Spark Cluster to analyse English prescription big data for NHS intelligence. In: ICDAM2023, 23-24 June 2023, London Metropolitan University - London, UK. https://link.springer.com/chapter/10.1007/978-981-99-6544-1_27 10.1007/978-981-99-6544-1_27
spellingShingle 000 Computer science, information & general works
Fernando, Sandra
Sowinski-Mydlarz, Victor
Virdee, Bal Singh
Cloud Spark Cluster to analyse English prescription big data for NHS intelligence
title Cloud Spark Cluster to analyse English prescription big data for NHS intelligence
title_full Cloud Spark Cluster to analyse English prescription big data for NHS intelligence
title_fullStr Cloud Spark Cluster to analyse English prescription big data for NHS intelligence
title_full_unstemmed Cloud Spark Cluster to analyse English prescription big data for NHS intelligence
title_short Cloud Spark Cluster to analyse English prescription big data for NHS intelligence
title_sort cloud spark cluster to analyse english prescription big data for nhs intelligence
topic 000 Computer science, information & general works
url https://repository.londonmet.ac.uk/8957/5/Analysis%20of%20English%20Prescription%20Big%20Data%20with%20Cloud%20Cluster%20for%20NHS%20Intelligence%20Final2.pdf
work_keys_str_mv AT fernandosandra cloudsparkclustertoanalyseenglishprescriptionbigdatafornhsintelligence
AT sowinskimydlarzvictor cloudsparkclustertoanalyseenglishprescriptionbigdatafornhsintelligence
AT virdeebalsingh cloudsparkclustertoanalyseenglishprescriptionbigdatafornhsintelligence