On the Unfounded Enthusiasm for Soft Selective Sweeps III: The Supervised Machine Learning Algorithm That Isn’t

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel...

Bibliographic Details
Main Authors: Eran Elhaik, Dan Graur
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Genes
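Subjects: artificial intelligence (AI); supervised machine learning (SML); evolutionary biology; molecular and genome evolution; selective sweeps; population size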
Online Access: https://www.mdpi.com/2073-4425/12/4/527
author Eran Elhaik
Dan Graur
collection DOAJ
description In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.
first_indexed 2024-03-10T12:36:41Z
format Article
id doaj.art-5db7f03a24624622ba70754f919c482d
institution Directory Open Access Journal
issn 2073-4425
language English
last_indexed 2024-03-10T12:36:41Z
publishDate 2021-04-01
publisher MDPI AG
record_format Article
series Genes
spelling doaj.art-5db7f03a24624622ba70754f919c482d 2023-11-21T14:14:22Z eng MDPI AG, Genes, ISSN 2073-4425, 2021-04-01, vol. 12, no. 4, article 527, doi:10.3390/genes12040527
Eran Elhaik (0): Department of Biology, Lund University, Sölvegatan 35, 22362 Lund, Sweden
Dan Graur (1): Department of Biology & Biochemistry, University of Houston, Science & Research Building 2, Suite #342, 3455 Cullen Blvd., Houston, TX 77204-5001, USA
title On the Unfounded Enthusiasm for Soft Selective Sweeps III: The Supervised Machine Learning Algorithm That Isn’t
topic artificial intelligence (AI)
supervised machine learning (SML)
evolutionary biology
molecular and genome evolution
selective sweeps
population size
url https://www.mdpi.com/2073-4425/12/4/527