Data Discovery and Anomaly Detection Using Atypicality for Real-Valued Data

The aim of using atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such “interesting” parts of data, universal approaches are required, since it is not...

Bibliographic Details
Main Authors: Elyas Sabeti, Anders Høst-Madsen
Format: Article
Language: English
Published: MDPI AG 2019-02-01
Series: Entropy
Subjects: atypicality, minimum description length, big data, codelength
Online Access: https://www.mdpi.com/1099-4300/21/3/219
_version_ 1798003719319584768
author Elyas Sabeti
Anders Høst-Madsen
author_facet Elyas Sabeti
Anders Høst-Madsen
author_sort Elyas Sabeti
collection DOAJ
description The aim of using atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such “interesting” parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data, and the current paper extends this to real-valued data. This is done by using minimum description length (MDL). We develop the information-theoretic methodology for a number of “universal” signal processing models, and finally apply them to recorded hydrophone data and heart rate variability (HRV) signal.
first_indexed 2024-04-11T12:12:14Z
format Article
id doaj.art-cf1b0e9d24e6435e9676b82a1efa3299
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-04-11T12:12:14Z
publishDate 2019-02-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-cf1b0e9d24e6435e9676b82a1efa3299
2022-12-22T04:24:34Z
eng
MDPI AG
Entropy
1099-4300
2019-02-01
21
3
219
10.3390/e21030219
e21030219
Data Discovery and Anomaly Detection Using Atypicality for Real-Valued Data
Elyas Sabeti 0
Anders Høst-Madsen 1
Department of Computational Medicine and Bioinformatics, University of Michigan, NCRC 10-A108, 2800 Plymouth Rd, Ann Arbor, MI 48109-2800, USA
Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, HI 96822, USA
The aim of using atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such “interesting” parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data, and the current paper extends this to real-valued data. This is done by using minimum description length (MDL). We develop the information-theoretic methodology for a number of “universal” signal processing models, and finally apply them to recorded hydrophone data and heart rate variability (HRV) signal.
https://www.mdpi.com/1099-4300/21/3/219
atypicality
minimum description length
big data
codelength
spellingShingle Elyas Sabeti
Anders Høst-Madsen
Data Discovery and Anomaly Detection Using Atypicality for Real-Valued Data
Entropy
atypicality
minimum description length
big data
codelength
title Data Discovery and Anomaly Detection Using Atypicality for Real-Valued Data
title_sort data discovery and anomaly detection using atypicality for real valued data
topic atypicality
minimum description length
big data
codelength
url https://www.mdpi.com/1099-4300/21/3/219
work_keys_str_mv AT elyassabeti datadiscoveryandanomalydetectionusingatypicalityforrealvalueddata
AT andershøstmadsen datadiscoveryandanomalydetectionusingatypicalityforrealvalueddata