Establishing of big data clinical dataset in brain vessel aneurysm research

The variability and heterogeneity of digital medical data require modern algorithms that provide appropriate data processing. The aim of the study was to delineate the main steps in forming a clinical dataset of patients with brain aneurysms from the stage of producing primary min...

Bibliographic Details
Main Authors: Ju. V. Kivelev, I. Saarenpää, A. L. Krivoshapkin
Format: Article
Language: Russian
Published: Russian Academy of Sciences, Siberian Branch Publishing House 2023-06-01
Series: Сибирский научный медицинский журнал
Subjects: digitalization; medical data; dataset; mining; crosschecking
Online Access: https://sibmed.elpub.ru/jour/article/view/1112
_version_ 1797269507033530368
author Ju. V. Kivelev
I. Saarenpää
A. L. Krivoshapkin
collection DOAJ
description The variability and heterogeneity of digital medical data require modern algorithms that provide appropriate data processing. The aim of the study was to delineate the main steps in forming a clinical dataset of patients with brain aneurysms, from producing the primary mining specifications to assembling the final version. Material and methods. Data collection, crosschecking of cases, and analysis of the dataset were carried out at Turku University Hospital. Over the last two decades, the medical data available at our hospital have been stored in a digital data lake, which allows automated data mining. In our study, data mining was performed by a data scientist using R software. Inclusion criteria were based on a set of diagnoses coded in medical charts according to the International Classification of Diseases (ICD-10). Results and Discussion. Primary data mining identified 3850 patients with brain aneurysms treated at our hospital from January 2000 to May 2018. After independent manual crosschecking of these patients' medical charts, we found 1218 (32 %) cases that had no aneurysm (false positives). Data of the remaining true aneurysm cases were divided into clinical and intensive care unit subsets, in which every event linked to a particular date of treatment was defined as an info-unit. All data in both subsets were structured into separate Excel files and presented in chronological order for each patient. Altogether, the dataset included 70 000 000 rows of info-units from 2632 patients. Conclusions. Data mining allowed a detailed clinical dataset of patients with brain aneurysms to be established. The mining algorithm produced was limited by false-positive cases (32 % of patients). We therefore recommend manual crosschecking of an automatically collected dataset before statistical analysis.
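The workflow summarized in the description (code-based selection, manual crosschecking, chronological ordering per patient) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which was written in R and is not shown in this record. The ICD-10 prefixes, the field names, the toy events, and the manual review outcome below are all assumptions made for illustration; only the counts 3850 and 1218 come from the abstract.

```python
from collections import defaultdict
from datetime import date

# Hypothetical inclusion codes: the record does not list the ICD-10 codes used.
INCLUSION_PREFIXES = ("I60", "I67.1")


def select_candidates(events):
    """Return ids of patients with at least one event matching an inclusion prefix."""
    return {e["patient_id"] for e in events
            if e["icd10"].startswith(INCLUSION_PREFIXES)}


def build_timelines(events, confirmed_ids):
    """Group events of confirmed patients and sort each patient's events by date."""
    timelines = defaultdict(list)
    for e in events:
        if e["patient_id"] in confirmed_ids:
            timelines[e["patient_id"]].append(e)
    for rows in timelines.values():
        rows.sort(key=lambda e: e["date"])
    return dict(timelines)


def false_positive_share(n_mined, n_false_positive):
    """Fraction of automatically mined patients rejected on manual review."""
    return n_false_positive / n_mined


# Toy events (invented): each one stands in for an "info-unit".
events = [
    {"patient_id": 1, "date": date(2005, 3, 2), "icd10": "I67.1", "note": "clinic visit"},
    {"patient_id": 1, "date": date(2004, 11, 20), "icd10": "I60.2", "note": "ICU admission"},
    {"patient_id": 2, "date": date(2010, 6, 1), "icd10": "I67.1", "note": "imaging"},
    {"patient_id": 3, "date": date(2012, 1, 15), "icd10": "G40.9", "note": "clinic visit"},
]

candidates = select_candidates(events)
manually_rejected = {2}  # hypothetical outcome of the manual chart review
confirmed = candidates - manually_rejected
timelines = build_timelines(events, confirmed)

# Consistency check on the figures reported in the abstract.
remaining = 3850 - 1218
share = false_positive_share(3850, 1218)
print(remaining, f"{share:.1%}")  # prints: 2632 31.6%
```

The last two lines reproduce the reported numbers: 3850 mined patients minus 1218 false positives leaves 2632, and the false-positive share of 31.6 % rounds to the 32 % given in the abstract.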
first_indexed 2024-03-08T14:32:39Z
format Article
id doaj.art-180a26334a174370a463eb1095d3c4b5
institution Directory Open Access Journal
issn 2410-2512
2410-2520
language Russian
last_indexed 2024-04-25T01:49:28Z
publishDate 2023-06-01
publisher Russian Academy of Sciences, Siberian Branch Publishing House
record_format Article
series Сибирский научный медицинский журнал
spelling doaj.art-180a26334a174370a463eb1095d3c4b5
2024-03-07T18:50:01Z
rus
Russian Academy of Sciences, Siberian Branch Publishing House
Сибирский научный медицинский журнал
2410-2512
2410-2520
2023-06-01
433869410.18699/SSMJ20230311508
Establishing of big data clinical dataset in brain vessel aneurysm research
Ju. V. Kivelev (Turku University Hospital; European Medical Center)
I. Saarenpää (Turku University Hospital)
A. L. Krivoshapkin (European Medical Center; Peoples’ Friendship University of Russia (RUDN University); Meshalkin National Medical Research Center of Minzdrav of Russia)
https://sibmed.elpub.ru/jour/article/view/1112
digitalization
medical data
dataset
mining
crosschecking
title Establishing of big data clinical dataset in brain vessel aneurysm research
topic digitalization
medical data
dataset
mining
crosschecking
url https://sibmed.elpub.ru/jour/article/view/1112