Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection

We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender, which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent s...

Bibliographic Details
Main Authors: Yan Liang, Peter Melchior, Sicong Lu, Andy Goulding, Charlotte Ward
Format: Article
Language: English
Published: IOP Publishing 2023-01-01
Series: The Astronomical Journal
Subjects: Galaxies, Spectroscopy, Astrostatistics
Online Access: https://doi.org/10.3847/1538-3881/ace100
collection DOAJ
description We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender, which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent space as a probability distribution, and identify outliers as low-probability objects with a normalizing flow. However, we found that the latent-space position is not, as expected from the architecture, redshift invariant, which introduces stochasticity into the latent space and the outlier detection method. We solve this problem by adding two novel loss terms during training, which explicitly link latent-space distances to data-space distances, preserving locality in the autoencoding process. Minimizing the additional losses leads to a redshift-invariant, nondegenerate latent-space distribution with clear separations between common and anomalous data. We inspect the spectra with the lowest probability and find them to include blends with foreground stars, extremely reddened galaxies, galaxy pairs and triples, and stars that are misclassified as galaxies. We release the newly trained spender model and the latent-space probability for the entire SDSS-I galaxy sample to aid further investigations.
first_indexed 2024-03-12T03:49:01Z
id doaj.art-fbc3ccb893ce413e9e681d1a7bc433b5
institution Directory Open Access Journal
issn 1538-3881
last_indexed 2024-03-12T03:49:01Z
spelling doaj.art-fbc3ccb893ce413e9e681d1a7bc433b5, 2023-09-03T12:35:53Z, eng, IOP Publishing, The Astronomical Journal, ISSN 1538-3881, 2023-01-01, vol. 166, no. 2, art. 75, https://doi.org/10.3847/1538-3881/ace100
Yan Liang (https://orcid.org/0000-0002-1001-1235): Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA; yanliang@princeton.edu
Peter Melchior (https://orcid.org/0000-0002-8873-5065): Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA; Center for Statistics & Machine Learning, Princeton University, Princeton, NJ 08544, USA
Sicong Lu (https://orcid.org/0000-0002-8814-1670): Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA
Andy Goulding (https://orcid.org/0000-0003-4700-663X): Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA
Charlotte Ward (https://orcid.org/0000-0002-4557-6682): Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA
topic Galaxies
Spectroscopy
Astrostatistics
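
The description in this record mentions two ideas that a small example may make more concrete: tying latent-space distances to data-space distances, and flagging low-probability objects in latent space as outliers. The sketch below is illustrative only and is not the spender code or the paper's loss terms. The function names (`distance_matching_penalty`, `gaussian_log_prob`), the toy linear "decoder", and the random data are all assumptions made for this example, and a single Gaussian fit is swapped in for the normalizing flow that the abstract describes.

```python
import numpy as np

def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x, shape (n, n)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def distance_matching_penalty(spectra, latents):
    """Hypothetical locality penalty (not the paper's loss): mean squared
    mismatch between scale-normalized pairwise distances in data space
    and in latent space."""
    d_data = pairwise_sq_dists(spectra)
    d_lat = pairwise_sq_dists(latents)
    # Divide by the mean so the two spaces are compared on a common scale.
    d_data = d_data / d_data.mean()
    d_lat = d_lat / d_lat.mean()
    return float(np.mean((d_data - d_lat) ** 2))

def gaussian_log_prob(latents):
    """Stand-in density model: log-density of each row under one Gaussian
    fitted to the latents (the abstract uses a normalizing flow instead)."""
    k = latents.shape[1]
    mu = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False) + 1e-6 * np.eye(k)
    diff = latents - mu
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + k * np.log(2 * np.pi))

# Toy data: 200 latent vectors, with one planted anomaly at index 0.
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 4))
latents[0] = 8.0
spectra = latents @ rng.normal(size=(4, 50))  # toy linear "decoder"

# Rank objects by latent-space log-probability; the lowest are outlier candidates.
log_p = gaussian_log_prob(latents)
lowest = np.argsort(log_p)[:5]

# The penalty is small when latent neighbors are also data-space neighbors,
# and grows when that correspondence is scrambled.
penalty_matched = distance_matching_penalty(spectra, latents)
penalty_shuffled = distance_matching_penalty(spectra, rng.permutation(latents))
```

In this toy setup the planted anomaly receives the lowest log-probability, and shuffling the latents against the spectra increases the penalty. A real application would replace the Gaussian with a trained flow and the random arrays with encoded SDSS spectra.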