Cloud Spark Cluster to analyse English prescription big data for NHS intelligence
Spark is a large-scale data processing engine that is at least a hundred times faster than the Hadoop big data processing engine. Even though Spark is a complete in-memory framework, although limited with its big data platforms facilities compared to Hadoop, Spark analytics engine with Hadoop distr...
Main Authors: | Fernando, Sandra, Sowinski-Mydlarz, Victor, Virdee, Bal Singh |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | Springer 2024 |
Subjects: | 000 Computer science, information & general works |
Online Access: | https://repository.londonmet.ac.uk/8957/5/Analysis%20of%20English%20Prescription%20Big%20Data%20with%20Cloud%20Cluster%20for%20NHS%20Intelligence%20Final2.pdf |
_version_ | 1804072960024117248 |
---|---|
author | Fernando, Sandra; Sowinski-Mydlarz, Victor; Virdee, Bal Singh |
author_facet | Fernando, Sandra; Sowinski-Mydlarz, Victor; Virdee, Bal Singh |
author_sort | Fernando, Sandra |
collection | LMU |
description | Spark is a large-scale data processing engine that is at least a hundred times faster than the Hadoop big data processing engine. Even though Spark is a complete in-memory framework, although limited with its big data platforms facilities compared to Hadoop, Spark analytics engine with Hadoop distributed file system gives better throughput than Hadoop alone. The main contribution of this paper is the insight into the behaviour of HDFS-based Azure Cloud Spark Cluster with discussion and evaluation of its strengths and limitations using NHS prescription large dataset. Data on NHS prescriptions obtained from 2015 to April 2022 exceeds 500 GB of records. A public dashboard for individual BNF code analysis and studies on NHS cost analysis exist, but no analysis of this data range and volume of NHS prescription and especially using new big data processing engines such as Spark, was conducted. This study also contributes descriptive statistics and machine learning models of prescription data trends using Cloud Spark engine, and PySpark technology that has not been used in this context before. This study illustrates regions as well as GP practices in terms of reimbursement cost, drug consumption level, the type of the drug, and the disease type; varied demand for dispensed chemical substances over the years; shows what diseases have increased or decreased over the years as well as the total cost and its trends. |
first_indexed | 2024-07-09T04:07:26Z |
format | Conference or Workshop Item |
id | oai:repository.londonmet.ac.uk:8957 |
institution | London Metropolitan University |
language | English |
last_indexed | 2024-07-09T04:07:26Z |
publishDate | 2024 |
publisher | Springer |
record_format | eprints |
spelling | oai:repository.londonmet.ac.uk:8957 2024-01-30T14:19:58Z http://repository.londonmet.ac.uk/8957/ Cloud Spark Cluster to analyse English prescription big data for NHS intelligence Fernando, Sandra Sowinski-Mydlarz, Victor Virdee, Bal Singh 000 Computer science, information & general works Spark is a large-scale data processing engine that is at least a hundred times faster than the Hadoop big data processing engine. Even though Spark is a complete in-memory framework, although limited with its big data platforms facilities compared to Hadoop, Spark analytics engine with Hadoop distributed file system gives better throughput than Hadoop alone. The main contribution of this paper is the insight into the behaviour of HDFS-based Azure Cloud Spark Cluster with discussion and evaluation of its strengths and limitations using NHS prescription large dataset. Data on NHS prescriptions obtained from 2015 to April 2022 exceeds 500 GB of records. A public dashboard for individual BNF code analysis and studies on NHS cost analysis exist, but no analysis of this data range and volume of NHS prescription and especially using new big data processing engines such as Spark, was conducted. This study also contributes descriptive statistics and machine learning models of prescription data trends using Cloud Spark engine, and PySpark technology that has not been used in this context before. This study illustrates regions as well as GP practices in terms of reimbursement cost, drug consumption level, the type of the drug, and the disease type; varied demand for dispensed chemical substances over the years; shows what diseases have increased or decreased over the years as well as the total cost and its trends. 
Springer 2024-01-14 Conference or Workshop Item PeerReviewed text en https://repository.londonmet.ac.uk/8957/5/Analysis%20of%20English%20Prescription%20Big%20Data%20with%20Cloud%20Cluster%20for%20NHS%20Intelligence%20Final2.pdf Fernando, Sandra, Sowinski-Mydlarz, Victor and Virdee, Bal Singh (2024) Cloud Spark Cluster to analyse English prescription big data for NHS intelligence. In: ICDAM2023, 23-24 June 2023, London Metropolitan University - London, UK. https://link.springer.com/chapter/10.1007/978-981-99-6544-1_27 10.1007/978-981-99-6544-1_27 |
spellingShingle | 000 Computer science, information & general works Fernando, Sandra Sowinski-Mydlarz, Victor Virdee, Bal Singh Cloud Spark Cluster to analyse English prescription big data for NHS intelligence |
title | Cloud Spark Cluster to analyse English prescription big data for NHS intelligence |
title_full | Cloud Spark Cluster to analyse English prescription big data for NHS intelligence |
title_fullStr | Cloud Spark Cluster to analyse English prescription big data for NHS intelligence |
title_full_unstemmed | Cloud Spark Cluster to analyse English prescription big data for NHS intelligence |
title_short | Cloud Spark Cluster to analyse English prescription big data for NHS intelligence |
title_sort | cloud spark cluster to analyse english prescription big data for nhs intelligence |
topic | 000 Computer science, information & general works |
url | https://repository.londonmet.ac.uk/8957/5/Analysis%20of%20English%20Prescription%20Big%20Data%20with%20Cloud%20Cluster%20for%20NHS%20Intelligence%20Final2.pdf |
work_keys_str_mv | AT fernandosandra cloudsparkclustertoanalyseenglishprescriptionbigdatafornhsintelligence AT sowinskimydlarzvictor cloudsparkclustertoanalyseenglishprescriptionbigdatafornhsintelligence AT virdeebalsingh cloudsparkclustertoanalyseenglishprescriptionbigdatafornhsintelligence |