Semantic representation and comparative analysis of physical activity sensor observations using MOX2-5 sensor in real and synthetic datasets: a proof-of-concept-study

Abstract The widespread use of devices like mobile phones and wearables allows for automatic monitoring of human daily activities, generating vast datasets that offer insights into long-term human behavior. A structured and controlled data collection process is essential to unlock the full potential...

Bibliographic Details
Main Authors: Ayan Chatterjee, Martin W. Gerdes, Andreas Prinz, Michael A. Riegler, Santiago G. Martinez
Format: Article
Language: English
Published: Nature Portfolio 2024-02-01
Series: Scientific Reports
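Subjects: Semantic ontology; Semantic sensor network; Generative adversarial network; Gaussian copula; MOX2-5; Multilayer perceptron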
Online Access: https://doi.org/10.1038/s41598-024-55183-6
author Ayan Chatterjee
Martin W. Gerdes
Andreas Prinz
Michael A. Riegler
Santiago G. Martinez
author_sort Ayan Chatterjee
collection DOAJ
description Abstract The widespread use of devices such as mobile phones and wearables allows automatic monitoring of daily human activities, generating vast datasets that offer insights into long-term human behavior. A structured and controlled data collection process is essential to unlock the full potential of this information. While wearable sensors for physical activity monitoring have gained significant traction in healthcare, sports science, and fitness applications, securing diverse and comprehensive datasets for research and algorithm development remains a notable challenge. In this proof-of-concept study, we underscore the significance of semantic representation in enhancing data interoperability and facilitating advanced analytics for physical activity sensor observations. Our approach focuses on enhancing the usability of physical activity datasets by using data from a medical-grade (CE-certified) sensor as the basis for generating synthetic datasets. Additionally, we provide insights into ethical considerations related to synthetic datasets. The study conducts a comparative analysis between real and synthetic activity datasets, assessing their effectiveness in mitigating model bias and promoting fairness in predictive analysis. We have created an ontology for semantically representing observations from physical activity sensors and conducted predictive analysis on data collected using MOX2-5 activity sensors. Until now, there has been a lack of publicly available physical activity datasets collected with the MOX2-5, a medical-grade (CE-certified) activity monitoring device. The MOX2-5 captures and transmits high-resolution data, including activity intensity, weight-bearing, sedentary, standing, low, moderate, and vigorous physical activity, as well as steps per minute. Our dataset consists of physical activity data collected from 16 adults (Male: 12; Female: 4) over a period of 30–45 days (roughly 1 to 1.5 months), yielding a relatively small volume of 539 records. To address this limitation, we employ several synthetic data generation methods, namely Gaussian Copula (GC), Conditional Tabular Generative Adversarial Network (CTGAN), and Tabular Generative Adversarial Network (TABGAN), to augment the dataset with synthetic data. For both the authentic and synthetic datasets, we have developed a Multilayer Perceptron (MLP) classification model for classifying daily physical activity levels. The findings underscore the effectiveness of semantic ontology in semantic search, knowledge representation, data integration, reasoning, and capturing meaningful relationships between data. The analysis supports the hypothesis that the efficiency of predictive models improves as the volume of additional synthetic training data increases. Ontology and Generative AI hold the potential to expedite advancements in behavioral monitoring research. The data presented, encompassing both the real MOX2-5 data and its synthetic counterpart, serves as a valuable resource for developing robust methods in activity type classification. Furthermore, it opens avenues for exploration into research directions related to synthetic data, including model efficiency, detection of generated data, and considerations regarding data privacy.
first_indexed 2024-03-07T15:08:57Z
format Article
id doaj.art-3e7b491b6e974806bcf1e56df46b1616
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-03-07T15:08:57Z
publishDate 2024-02-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-3e7b491b6e974806bcf1e56df46b1616 2024-03-05T18:44:02Z eng Nature Portfolio Scientific Reports 2045-2322 2024-02-01 vol. 14, no. 1, pp. 1–21 10.1038/s41598-024-55183-6
Ayan Chatterjee (0): Department of Holistic Systems, Simula Metropolitan Center for Digital Engineering
Martin W. Gerdes (1): Department of Information and Communication Technologies (ICT), Centre for E-Health, University of Agder
Andreas Prinz (2): Department of Information and Communication Technologies (ICT), Centre for E-Health, University of Agder
Michael A. Riegler (3): Department of Holistic Systems, Simula Metropolitan Center for Digital Engineering
Santiago G. Martinez (4): Department of Health and Nursing Science, Centre for E-Health, University of Agder
title Semantic representation and comparative analysis of physical activity sensor observations using MOX2-5 sensor in real and synthetic datasets: a proof-of-concept-study
topic Semantic ontology
Semantic sensor network
Generative adversarial network
Gaussian copula
MOX2-5
Multilayer perceptron
url https://doi.org/10.1038/s41598-024-55183-6