Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery

Bibliographic Details
Main Authors: Preeti Saryan, Shubham Gupta, Vinita Gowda
Format: Article
Language: English
Published: Wiley 2020-07-01
Series: Applications in Plant Sciences
Online Access: https://doi.org/10.1002/aps3.11377
Description:
Premise: Statistical methods used by most morphologists to validate species boundaries (such as principal component analysis [PCA] and non‐metric multidimensional scaling [nMDS]) are limiting because these methods are mostly used as visualization methods, and because the groups are identified by taxonomists (i.e., supervised), adding human bias. Here, we use a spectral clustering algorithm for the unsupervised discovery of species boundaries followed by the analysis of the cluster‐defining characters.
Methods: We used spectral clustering, nMDS, and PCA on 16 morphological characters within the genus Hedychium to group 93 individuals from 10 taxa. A radial basis function kernel was used for the spectral clustering with user‐specified tuning values (gamma). The goodness of the discovered clusters using each gamma value was quantified using eigengap, a normalized mutual information score, and the Rand index. Finally, mutual information–based character selection and a t‐test were used to identify cluster‐defining characters.
Results: Spectral clustering revealed five, nine, and 12 clusters of taxa in the species complexes examined here. Character selection identified at least four characters that defined these clusters.
Discussion: Together with our proposed character analysis methods, spectral clustering enabled the unsupervised discovery of species boundaries along with an explanation of their biological significance. Our results suggest that spectral clustering combined with a character selection analysis can enhance morphometric analyses and is superior to current clustering methods for species delimitation.
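The workflow summarized under Methods (an RBF-kernel affinity, spectral clustering over several gamma values, scoring by eigengap, normalized mutual information, and the Rand index, then mutual information-based character ranking) can be illustrated with a minimal Python sketch. This is not the authors' code: it assumes scikit-learn and NumPy, uses a small synthetic matrix in place of the Hedychium measurements, and every function name here (`make_toy_data`, `eigengap_k`, `evaluate_gamma`, `rank_characters`) is hypothetical. The t-test step is left out.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import normalized_mutual_info_score, rand_score
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import StandardScaler


def make_toy_data(seed=0):
    """Synthetic specimens x characters matrix (assumption: NOT the Hedychium data)."""
    rng = np.random.default_rng(seed)
    centers = np.array([
        [0.0, 0.0, 0.0, 0.0],
        [6.0, 6.0, 6.0, 6.0],
        [-6.0, 6.0, -6.0, 6.0],
    ])
    X = np.vstack([c + rng.normal(scale=0.5, size=(30, 4)) for c in centers])
    reference = np.repeat(np.arange(3), 30)  # stand-in for taxonomist-assigned labels
    return X, reference


def eigengap_k(affinity, max_k=10):
    """Suggest a cluster count from the largest gap between consecutive
    small eigenvalues of the symmetric normalized graph Laplacian."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(affinity.shape[0]) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))[:max_k]
    return int(np.argmax(np.diff(eigvals))) + 1


def evaluate_gamma(X, reference, gamma, n_clusters):
    """Cluster with one gamma value and score the result against reference labels."""
    X_scaled = StandardScaler().fit_transform(X)
    affinity = rbf_kernel(X_scaled, gamma=gamma)  # exp(-gamma * ||x - y||^2)
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed", random_state=0)
    labels = model.fit_predict(affinity)
    return {
        "gamma": gamma,
        "eigengap_k": eigengap_k(affinity),
        "nmi": normalized_mutual_info_score(reference, labels),
        "rand": rand_score(reference, labels),
        "labels": labels,
    }


def rank_characters(X, labels):
    """Rank characters (columns) by mutual information with the cluster labels."""
    scores = mutual_info_classif(X, labels, random_state=0)
    return np.argsort(scores)[::-1], scores


if __name__ == "__main__":
    X, reference = make_toy_data()
    for gamma in (0.5, 1.0, 2.0):
        result = evaluate_gamma(X, reference, gamma=gamma, n_clusters=3)
        print(gamma, result["eigengap_k"], round(result["nmi"], 3), round(result["rand"], 3))
    order, scores = rank_characters(X, result["labels"])
    print("characters ranked by mutual information:", order)
```

In this sketch the cluster count is fixed by the caller and the eigengap value is reported only as a diagnostic, so the scores of different gamma values can be compared on equal terms; how the record's authors combined these criteria is not stated in the abstract.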
ISSN: 2168-0450
Collection: DOAJ (Directory of Open Access Journals)
Record ID: doaj.art-fb06137762ab4bd1bfde4c5a81431c6f
Volume/Issue: 8(7)
Author Affiliations:
Preeti Saryan: Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal Bypass Road, Bhopal, Madhya Pradesh 462066, India
Shubham Gupta: Department of Computer Science and Automation, Indian Institute of Science, Bengaluru, Karnataka 560012, India
Vinita Gowda: Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal Bypass Road, Bhopal, Madhya Pradesh 462066, India
Keywords: cluster characterization; Hedychium; morphological analysis; spectral clustering