Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection
We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender, which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent s...
Main Authors: | Yan Liang, Peter Melchior, Sicong Lu, Andy Goulding, Charlotte Ward |
---|---|
Format: | Article |
Language: | English |
Published: | IOP Publishing 2023-01-01 |
Series: | The Astronomical Journal |
Subjects: | Galaxies, Spectroscopy, Astrostatistics |
Online Access: | https://doi.org/10.3847/1538-3881/ace100 |
_version_ | 1797698044037169152 |
---|---|
author | Yan Liang Peter Melchior Sicong Lu Andy Goulding Charlotte Ward |
author_sort | Yan Liang |
collection | DOAJ |
description | We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender, which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent space as a probability distribution, and identify outliers as low-probability objects with a normalizing flow. However, we found that the latent-space position is not, as expected from the architecture, redshift invariant, which introduces stochasticity into the latent space and the outlier detection method. We solve this problem by adding two novel loss terms during training, which explicitly link latent-space distances to data-space distances, preserving locality in the autoencoding process. Minimizing the additional losses leads to a redshift-invariant, nondegenerate latent-space distribution with clear separations between common and anomalous data. We inspect the spectra with the lowest probability and find them to include blends with foreground stars, extremely reddened galaxies, galaxy pairs and triples, and stars that are misclassified as galaxies. We release the newly trained spender model and the latent-space probability for the entire SDSS-I galaxy sample to aid further investigations. |
first_indexed | 2024-03-12T03:49:01Z |
format | Article |
id | doaj.art-fbc3ccb893ce413e9e681d1a7bc433b5 |
institution | Directory Open Access Journal |
issn | 1538-3881 |
language | English |
last_indexed | 2024-03-12T03:49:01Z |
publishDate | 2023-01-01 |
publisher | IOP Publishing |
record_format | Article |
series | The Astronomical Journal |
spelling | doaj.art-fbc3ccb893ce413e9e681d1a7bc433b5; 2023-09-03T12:35:53Z; eng; IOP Publishing; The Astronomical Journal; 1538-3881; 2023-01-01; 166; 2; 75; 10.3847/1538-3881/ace100; Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection. Authors: Yan Liang [0] https://orcid.org/0000-0002-1001-1235, Peter Melchior [1] https://orcid.org/0000-0002-8873-5065, Sicong Lu [2] https://orcid.org/0000-0002-8814-1670, Andy Goulding [3] https://orcid.org/0000-0003-4700-663X, Charlotte Ward [4] https://orcid.org/0000-0002-4557-6682. Affiliations: [0] Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA; yanliang@princeton.edu. [1] Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA; yanliang@princeton.edu; Center for Statistics & Machine Learning, Princeton University, Princeton, NJ 08544, USA. [2] Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA. [3] Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA; yanliang@princeton.edu. [4] Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA; yanliang@princeton.edu. https://doi.org/10.3847/1538-3881/ace100; Galaxies; Spectroscopy; Astrostatistics |
title | Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection |
topic | Galaxies Spectroscopy Astrostatistics |
url | https://doi.org/10.3847/1538-3881/ace100 |
work_keys_str_mv | AT yanliang autoencodinggalaxyspectraiiredshiftinvarianceandoutlierdetection AT petermelchior autoencodinggalaxyspectraiiredshiftinvarianceandoutlierdetection AT siconglu autoencodinggalaxyspectraiiredshiftinvarianceandoutlierdetection AT andygoulding autoencodinggalaxyspectraiiredshiftinvarianceandoutlierdetection AT charlotteward autoencodinggalaxyspectraiiredshiftinvarianceandoutlierdetection |
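
The description in this record treats the density of samples in the autoencoder latent space as a probability distribution and identifies outliers as low-probability objects. A minimal sketch of that ranking step is given here under stated assumptions: the latent vectors are synthetic, a single multivariate Gaussian stands in for the normalizing flow named in the description, and the function names `gaussian_log_prob` and `lowest_probability` are hypothetical and not part of spender.

```python
import numpy as np


def gaussian_log_prob(latents):
    """Log-density of each latent vector under one Gaussian fit to the sample.

    Assumption: a single Gaussian replaces the normalizing flow used in the
    paper; it only illustrates the "low probability = outlier" idea.
    """
    dim = latents.shape[1]
    mean = latents.mean(axis=0)
    # Small diagonal term keeps the covariance matrix invertible.
    cov = np.cov(latents, rowvar=False) + 1e-6 * np.eye(dim)
    diff = latents - mean
    # Squared Mahalanobis distance per row, using solve instead of an inverse.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + dim * np.log(2.0 * np.pi))


def lowest_probability(latents, k):
    """Indices of the k least probable latent vectors, least probable first."""
    return np.argsort(gaussian_log_prob(latents))[:k]


# Synthetic stand-in for latent vectors (hypothetical data, not SDSS).
rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 6))
latents[:3] += 12.0  # three artificial anomalies placed far from the bulk

outliers = lowest_probability(latents, 3)
print(sorted(outliers.tolist()))
```

In this toy setup the three shifted rows receive the lowest log-probabilities. A flow-based density estimate would replace `gaussian_log_prob` while the ranking step stays the same.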