Standardized Description of the Feature Extraction Process to Transform Raw Data Into Meaningful Information for Enhancing Data Reuse: Consensus Study

Background: Despite the many opportunities data reuse offers, its implementation presents many difficulties, and raw data cannot be reused directly. Information is not always directly available in the source database and needs to be computed afterwards with raw data for definin...


Bibliographic Details
Main Authors:	Antoine Lamer, Mathilde Fruchart, Nicolas Paris, Benjamin Popoff, Anaïs Payen, Thibaut Balcaen, William Gacquer, Guillaume Bouzillé, Marc Cuggia, Matthieu Doutreligne, Emmanuel Chazard
Format:	Article
Language:	English
Published:	JMIR Publications 2022-10-01
Series:	JMIR Medical Informatics
Online Access:	https://medinform.jmir.org/2022/10/e38936

_version_	1797734654824939520
author	Antoine Lamer; Mathilde Fruchart; Nicolas Paris; Benjamin Popoff; Anaïs Payen; Thibaut Balcaen; William Gacquer; Guillaume Bouzillé; Marc Cuggia; Matthieu Doutreligne; Emmanuel Chazard
author_sort	Antoine Lamer
collection	DOAJ
description	Background: Despite the many opportunities data reuse offers, its implementation presents many difficulties, and raw data cannot be reused directly. Information is not always directly available in the source database and must be computed afterwards from raw data by defining an algorithm. Objective: The main purpose of this article is to present a standardized description of the steps and transformations required during the feature extraction process when conducting retrospective observational studies. A secondary objective is to identify how the features could be stored in the schema of a data warehouse. Methods: This study involved the following 3 main steps: (1) the collection of relevant study cases related to feature extraction and based on the automatic and secondary use of data; (2) the standardized description of the raw data, steps, and transformations that were common to the study cases; and (3) the identification of an appropriate table to store the features in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). Results: We interviewed 10 researchers from 3 French university hospitals and a national institution, who were involved in 8 retrospective and observational studies. Based on these studies, 2 states (track and feature) and 2 transformations (track definition and track aggregation) emerged. A “track” is a time-dependent signal or period of interest, defined by a statistical unit, a value, and 2 milestones (a start event and an end event). A “feature” is time-independent high-level information with dimensionality identical to the statistical unit of the study, defined by a label and a value. In a feature, the time dimension becomes implicit in the value or name of the variable. We propose the 2 tables “TRACK” and “FEATURE” to store variables obtained in feature extraction and to extend the OMOP CDM. Conclusions: We propose a standardized description of the feature extraction process. The process combines the 2 steps of track definition and track aggregation. Dividing feature extraction into these 2 steps confines the difficulty to track definition. The standardization of tracks requires great expertise with regard to the data, but allows the application of an infinite number of complex transformations. In contrast, track aggregation is a very simple operation with a finite number of possibilities. A complete description of these steps could enhance the reproducibility of retrospective studies.
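The two states and two transformations described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the article: the field names (`unit_id`, `label`, `value`, `start`, `end`), the threshold rule used for track definition, the two aggregations, and the sample values. It does not reproduce the TRACK and FEATURE tables proposed by the authors.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Track:
    """Time-dependent period of interest (field names are hypothetical)."""
    unit_id: int      # statistical unit, e.g. a hospital stay
    label: str
    value: float
    start: datetime   # start event
    end: datetime     # end event


@dataclass(frozen=True)
class Feature:
    """Time-independent information: one label and one value per unit."""
    unit_id: int
    label: str
    value: float


def _close(unit_id, label, run):
    # The track value is assumed here to be the lowest sample in the run.
    return Track(unit_id, label, min(v for _, v in run), run[0][0], run[-1][0])


def define_tracks(unit_id, label, samples, threshold):
    """Track definition (assumed rule): group consecutive samples
    strictly below `threshold` into one track each."""
    tracks, run = [], []
    for ts, value in sorted(samples):
        if value < threshold:
            run.append((ts, value))
        elif run:
            tracks.append(_close(unit_id, label, run))
            run = []
    if run:
        tracks.append(_close(unit_id, label, run))
    return tracks


def aggregate_tracks(unit_id, label, tracks, how):
    """Track aggregation: collapse all tracks of a unit into one feature."""
    return Feature(unit_id, label, float(how(tracks)))


# Made-up raw data: one measurement per minute for a single unit.
t0 = datetime(2022, 1, 1, 8, 0)
raw = [(t0 + timedelta(minutes=i), v)
       for i, v in enumerate([70, 60, 58, 72, 55, 50, 80])]

tracks = define_tracks(1, "low_pressure_episode", raw, threshold=65)
n_episodes = aggregate_tracks(1, "n_low_pressure_episodes", tracks, len)
minutes = aggregate_tracks(
    1, "low_pressure_minutes", tracks,
    lambda ts: sum((t.end - t.start).total_seconds() for t in ts) / 60,
)
```

In this sketch the data-specific judgment sits in `define_tracks`, while `aggregate_tracks` only applies a simple function such as a count or a sum, which mirrors the division of difficulty the abstract describes.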
first_indexed	2024-03-12T12:47:28Z
format	Article
id	doaj.art-816219f409854debbc633510d0fb41e3
institution	Directory of Open Access Journals
issn	2291-9694
language	English
last_indexed	2024-03-12T12:47:28Z
publishDate	2022-10-01
publisher	JMIR Publications
record_format	Article
series	JMIR Medical Informatics
spelling	doaj.art-816219f409854debbc633510d0fb41e3; 2023-08-28T23:16:30Z; eng; JMIR Publications; JMIR Medical Informatics; 2291-9694; 2022-10-01; 10; 10; e38936; 10.2196/38936; Standardized Description of the Feature Extraction Process to Transform Raw Data Into Meaningful Information for Enhancing Data Reuse: Consensus Study; Antoine Lamer (https://orcid.org/0000-0002-9546-1808); Mathilde Fruchart (https://orcid.org/0000-0002-7826-0713); Nicolas Paris (https://orcid.org/0000-0002-1533-5087); Benjamin Popoff (https://orcid.org/0000-0003-2854-0909); Anaïs Payen (https://orcid.org/0000-0003-3311-324X); Thibaut Balcaen (https://orcid.org/0000-0003-3694-4508); William Gacquer (https://orcid.org/0000-0002-4876-3803); Guillaume Bouzillé (https://orcid.org/0000-0002-3637-6558); Marc Cuggia (https://orcid.org/0000-0001-6943-3937); Matthieu Doutreligne (https://orcid.org/0000-0001-8072-9966); Emmanuel Chazard (https://orcid.org/0000-0001-7841-5925); https://medinform.jmir.org/2022/10/e38936
title	Standardized Description of the Feature Extraction Process to Transform Raw Data Into Meaningful Information for Enhancing Data Reuse: Consensus Study
url	https://medinform.jmir.org/2022/10/e38936


Similar Items