Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery
Premise Statistical methods used by most morphologists to validate species boundaries (such as principal component analysis [PCA] and non‐metric multidimensional scaling [nMDS]) are limiting because these methods are mostly used as visualization methods, and because the groups are identified by taxo...
Main Authors: | Preeti Saryan, Shubham Gupta, Vinita Gowda |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2020-07-01 |
Series: | Applications in Plant Sciences |
Subjects: | cluster characterization; Hedychium; morphological analysis; spectral clustering |
Online Access: | https://doi.org/10.1002/aps3.11377 |
_version_ | 1818152268912918528 |
---|---|
author | Preeti Saryan, Shubham Gupta, Vinita Gowda |
collection | DOAJ |
description | Premise: Statistical methods used by most morphologists to validate species boundaries (such as principal component analysis [PCA] and non‐metric multidimensional scaling [nMDS]) are limiting because these methods are mostly used as visualization methods, and because the groups are identified by taxonomists (i.e., supervised), adding human bias. Here, we use a spectral clustering algorithm for the unsupervised discovery of species boundaries followed by the analysis of the cluster‐defining characters. Methods: We used spectral clustering, nMDS, and PCA on 16 morphological characters within the genus Hedychium to group 93 individuals from 10 taxa. A radial basis function kernel was used for the spectral clustering with user‐specified tuning values (gamma). The goodness of the discovered clusters using each gamma value was quantified using eigengap, a normalized mutual information score, and the Rand index. Finally, mutual information–based character selection and a t‐test were used to identify cluster‐defining characters. Results: Spectral clustering revealed five, nine, and 12 clusters of taxa in the species complexes examined here. Character selection identified at least four characters that defined these clusters. Discussion: Together with our proposed character analysis methods, spectral clustering enabled the unsupervised discovery of species boundaries along with an explanation of their biological significance. Our results suggest that spectral clustering combined with a character selection analysis can enhance morphometric analyses and is superior to current clustering methods for species delimitation. |
first_indexed | 2024-12-11T13:52:02Z |
format | Article |
id | doaj.art-fb06137762ab4bd1bfde4c5a81431c6f |
institution | Directory Open Access Journal |
issn | 2168-0450 |
language | English |
last_indexed | 2024-12-11T13:52:02Z |
publishDate | 2020-07-01 |
publisher | Wiley |
record_format | Article |
series | Applications in Plant Sciences |
spelling | doaj.art-fb06137762ab4bd1bfde4c5a81431c6f; 2022-12-22T01:04:15Z; eng; Wiley; Applications in Plant Sciences; ISSN 2168-0450; 2020-07-01; vol. 8, no. 7; https://doi.org/10.1002/aps3.11377. Authors and affiliations: Preeti Saryan (Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal Bypass Road, Bhopal, Madhya Pradesh 462066, India); Shubham Gupta (Department of Computer Science and Automation, Indian Institute of Science, Bengaluru, Karnataka 560012, India); Vinita Gowda (Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal Bypass Road, Bhopal, Madhya Pradesh 462066, India). |
title | Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery |
topic | cluster characterization Hedychium morphological analysis spectral clustering |
url | https://doi.org/10.1002/aps3.11377 |
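The description field in this record names a radial basis function (RBF) kernel with a tuning value gamma, and two agreement scores (the Rand index and a normalized mutual information score) for judging discovered clusters against taxon labels. The sketch below illustrates those three quantities using only the Python standard library. It is not the authors' implementation: the function names, the toy measurements, and the choice to normalize mutual information by the arithmetic mean of the two entropies are all assumptions made for illustration, and the eigengap and clustering steps are not shown.

```python
import math
from collections import Counter
from itertools import combinations


def rbf_affinity(points, gamma):
    """Pairwise RBF similarities: exp(-gamma * squared Euclidean distance)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs that both labelings treat the same way
    (both put the pair together, or both put it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_a, labels_b):
    """Mutual information divided by the mean of the two entropies.
    The arithmetic-mean normalization is an assumption of this sketch."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    mean_h = (entropy(labels_a) + entropy(labels_b)) / 2
    return mi / mean_h if mean_h > 0 else 1.0


if __name__ == "__main__":
    # Hypothetical toy data: two measurements per individual, two taxa.
    measurements = [(1.0, 2.0), (1.1, 2.1), (5.0, 6.0), (5.2, 6.1)]
    taxa = ["A", "A", "B", "B"]
    clusters = [0, 0, 1, 1]  # stand-in for labels from a clustering run

    affinity = rbf_affinity(measurements, gamma=0.5)
    print("affinity, same taxon:     ", affinity[0][1])
    print("affinity, different taxon:", affinity[0][2])
    print("Rand index:", rand_index(taxa, clusters))
    print("NMI:", normalized_mutual_info(taxa, clusters))
```

A larger gamma makes the affinity fall off faster with distance, which is why the record describes evaluating cluster quality separately for each gamma value. Both scores are invariant to how cluster labels are named, so a clustering that matches the taxon labels up to renaming scores 1.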