Correlation-based outlier detection for ships’ in-service datasets

Bibliographic Details
Main authors: Prateek Gupta, Adil Rasheed, Sverre Steen
Format: Article
Language: English
Published: SpringerOpen 2024-06-01
Collection: Journal of Big Data
Subjects:
Online access: https://doi.org/10.1186/s40537-024-00937-2
Description
Summary: With the advent of big data, it has become increasingly difficult to obtain high-quality data. Solutions are required to remove undesired outlier samples from very large datasets. Ship operators rely on high-frequency in-service datasets recorded onboard their ships to monitor the performance of their fleets. These large in-service datasets are known to be highly unbalanced, which makes it difficult to adopt ordinary outlier detection techniques, as they would also remove rare but valuable data samples. The current work therefore proposes a correlation-based outlier detection scheme for ships’ in-service datasets using two well-known dimensionality reduction methods, namely Principal Component Analysis (PCA) and autoencoders. The correlation-based approach detects samples that do not fit the prominent correlations present in the dataset and avoids misidentifying the rare but correlation-following samples in the sparse regions of the data domain. The study also attempts to give a physical meaning to the latent variables obtained using PCA. The effectiveness of the proposed methodology is demonstrated using an actual dataset recorded onboard a ship.
ISSN: 2196-1115
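
The correlation-based idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain PCA computed with NumPy's SVD, scores each sample by its squared reconstruction error after projection onto the leading principal components, and flags samples whose error exceeds a fixed threshold. The function names, the synthetic two-variable data, and the threshold value of 1.0 are all hypothetical choices made for illustration. The autoencoder variant mentioned in the abstract is not shown.

```python
import numpy as np


def pca_reconstruction_error(X, n_components):
    """Squared reconstruction error per sample, in standardized units.

    Assumes every column of X has non-zero variance.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T
    # Project onto the leading components and map back to the original space.
    Z_hat = Z @ V @ V.T
    return np.sum((Z - Z_hat) ** 2, axis=1)


def flag_outliers(X, n_components=1, threshold=1.0):
    """Flag samples that do not follow the dominant linear correlation.

    The threshold is a hypothetical, illustrative value.
    """
    return pca_reconstruction_error(X, n_components) > threshold


# Synthetic, unbalanced data (illustrative only): a dense region, a sparse
# region that still follows y = 2x, and one sample that breaks the correlation.
rng = np.random.default_rng(0)
x = np.concatenate([
    rng.uniform(0.0, 1.0, 980),   # dense region
    rng.uniform(4.0, 5.0, 19),    # rare but correlation-following samples
    [0.5],                        # injected correlation-breaking sample
])
y = 2.0 * x + rng.normal(0.0, 0.05, x.size)
y[-1] = 8.0                       # break the correlation for the last sample
X = np.column_stack([x, y])

flags = flag_outliers(X, n_components=1, threshold=1.0)
print("flagged indices:", np.flatnonzero(flags))
```

In this toy setting the rare samples in the sparse region lie far from the bulk of the data but close to the dominant correlation, so their reconstruction error stays small, whereas the injected sample sits inside the dense range of x yet far from the line. A purely distance-based or density-based detector would tend to do the opposite.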