HaploCatcher: An R package for prediction of haplotypes

Bibliographic Details
Main Authors: Zachary James Winn, Emily Hudson‐Arns, Mikayla Hammers, Noah DeWitt, Jeanette Lyerly, Guihua Bai, Paul St.Amand, Punya Nachappa, Scott Haley, Richard Esten Mason
Format: Article
Language: English
Published: Wiley 2024-03-01
Series: The Plant Genome
Online Access: https://doi.org/10.1002/tpg2.20412
author Zachary James Winn
Emily Hudson‐Arns
Mikayla Hammers
Noah DeWitt
Jeanette Lyerly
Guihua Bai
Paul St.Amand
Punya Nachappa
Scott Haley
Richard Esten Mason
collection DOAJ
description Abstract Wheat (Triticum aestivum L.) is crucial to global food security but is often threatened by diseases, pests, and environmental stresses. Wheat‐stem sawfly (Cephus cinctus Norton) poses a major threat to food security in the United States, and solid‐stem varieties, which carry the stem‐solidness locus (Sst1), are the main source of genetic resistance against sawfly. Marker‐assisted selection uses molecular markers to identify lines possessing beneficial haplotypes, like that of the Sst1 locus. In this study, an R package titled “HaploCatcher” was developed to predict specific haplotypes of interest in genome‐wide genotyped lines. A training population of 1056 lines genotyped for the Sst1 locus, known to confer stem solidness, and genome‐wide markers was curated to make predictions of the Sst1 haplotypes for 292 lines from the Colorado State University wheat breeding program. Predicted Sst1 haplotypes were compared to marker‐derived haplotypes. Our results indicated that the training set was substantially predictive, with kappa scores of 0.83 for k‐nearest neighbors and 0.88 for random forest models. Forward validation on newly developed breeding lines demonstrated that a random forest model, trained on the total available training data, had comparable accuracy between forward and cross‐validation. Estimated group means of lines classified by haplotypes from PCR‐derived markers and predictive modeling did not significantly differ. The HaploCatcher package is freely available and may be utilized by breeding programs, using their own training populations, to predict haplotypes for whole‐genome sequenced early generation material.
first_indexed 2024-04-24T21:40:01Z
format Article
id doaj.art-f7ec1ef267da4ec1a5c5209aa40055a9
institution Directory Open Access Journal
issn 1940-3372
language English
last_indexed 2024-04-24T21:40:01Z
publishDate 2024-03-01
publisher Wiley
record_format Article
series The Plant Genome
spelling doaj.art-f7ec1ef267da4ec1a5c5209aa40055a9
2024-03-21T11:34:18Z
eng
Wiley
The Plant Genome
1940-3372
2024-03-01
17
1
n/a
n/a
10.1002/tpg2.20412
HaploCatcher: An R package for prediction of haplotypes
Zachary James Winn (Department of Soil and Crop Sciences, Colorado State University, Fort Collins, Colorado, USA)
Emily Hudson‐Arns (Department of Soil and Crop Sciences, Colorado State University, Fort Collins, Colorado, USA)
Mikayla Hammers (Department of Soil and Crop Sciences, Colorado State University, Fort Collins, Colorado, USA)
Noah DeWitt (School of Plant, Environmental, and Soil Sciences, Louisiana State University, Baton Rouge, Louisiana, USA)
Jeanette Lyerly (Department of Crop and Soil Sciences, North Carolina State University, Raleigh, North Carolina, USA)
Guihua Bai (USDA Agricultural Research Service, Hard Winter Wheat Genetics Research Unit, Manhattan, Kansas, USA)
Paul St.Amand (USDA Agricultural Research Service, Hard Winter Wheat Genetics Research Unit, Manhattan, Kansas, USA)
Punya Nachappa (Department of Agricultural Biology, Colorado State University, Fort Collins, Colorado, USA)
Scott Haley (Department of Soil and Crop Sciences, Colorado State University, Fort Collins, Colorado, USA)
Richard Esten Mason (Department of Soil and Crop Sciences, Colorado State University, Fort Collins, Colorado, USA)
title HaploCatcher: An R package for prediction of haplotypes
url https://doi.org/10.1002/tpg2.20412