HaploCatcher: An R package for prediction of haplotypes
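Main Authors: | Zachary James Winn, Emily Hudson‐Arns, Mikayla Hammers, Noah DeWitt, Jeanette Lyerly, Guihua Bai, Paul St.Amand, Punya Nachappa, Scott Haley, Richard Esten Mason |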
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2024-03-01 |
Series: | The Plant Genome |
Online Access: | https://doi.org/10.1002/tpg2.20412 |
author | Zachary James Winn, Emily Hudson‐Arns, Mikayla Hammers, Noah DeWitt, Jeanette Lyerly, Guihua Bai, Paul St.Amand, Punya Nachappa, Scott Haley, Richard Esten Mason |
collection | DOAJ |
description | Wheat (Triticum aestivum L.) is crucial to global food security but is often threatened by diseases, pests, and environmental stresses. Wheat‐stem sawfly (Cephus cinctus Norton) poses a major threat to food security in the United States, and solid‐stem varieties, which carry the stem‐solidness locus (Sst1), are the main source of genetic resistance against sawfly. Marker‐assisted selection uses molecular markers to identify lines possessing beneficial haplotypes, such as those at the Sst1 locus. In this study, an R package titled “HaploCatcher” was developed to predict specific haplotypes of interest in lines genotyped with genome‐wide markers. A training population of 1056 lines, genotyped both at the Sst1 locus (known to confer stem solidness) and with genome‐wide markers, was curated to predict Sst1 haplotypes for 292 lines from the Colorado State University wheat breeding program. Predicted Sst1 haplotypes were compared to marker‐derived haplotypes. Our results indicated that the training set was substantially predictive, with kappa scores of 0.83 for k‐nearest neighbors and 0.88 for random forest models. Forward validation on newly developed breeding lines demonstrated that a random forest model trained on all available training data achieved comparable accuracy in forward validation and cross‐validation. Estimated group means did not differ significantly whether lines were classified by haplotypes from PCR‐derived markers or by predictive modeling. The HaploCatcher package is freely available and can be used by breeding programs, with their own training populations, to predict haplotypes for whole‐genome sequenced early‐generation material. |
first_indexed | 2024-04-24T21:40:01Z |
id | doaj.art-f7ec1ef267da4ec1a5c5209aa40055a9 |
institution | Directory Open Access Journal |
issn | 1940-3372 |
last_indexed | 2024-04-24T21:40:01Z |
affiliations | Department of Soil and Crop Sciences, Colorado State University, Fort Collins, Colorado, USA; School of Plant, Environmental, and Soil Sciences, Louisiana State University, Baton Rouge, Louisiana, USA; Department of Crop and Soil Sciences, North Carolina State University, Raleigh, North Carolina, USA; USDA Agricultural Research Service, Hard Winter Wheat Genetics Research Unit, Manhattan, Kansas, USA; Department of Agricultural Biology, Colorado State University, Fort Collins, Colorado, USA |
volume | 17 |
issue | 1 |
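The abstract reports how well predicted Sst1 haplotypes agree with marker‐derived haplotypes, using kappa scores for k‐nearest neighbors and random forest models. The following is a minimal Python sketch of the general idea. It is not HaploCatcher's R implementation, and it covers only the k‐nearest‐neighbors case. A new line is assigned the majority haplotype of its k most similar training lines, and Cohen's kappa then measures agreement beyond chance. The toy genotypes, the haplotype labels (`hap_A`, `hap_B`), the 0/1/2 dosage coding, the Euclidean distance metric, and the function names `knn_predict` and `cohens_kappa` are all hypothetical choices made for illustration.

```python
from collections import Counter
import math

# Hypothetical toy data: rows are lines, columns are genome-wide markers coded as
# allele dosages (0, 1, 2). Labels stand in for haplotypes at a locus such as Sst1.
train_x = [
    [2, 2, 0, 0],
    [2, 2, 0, 1],
    [2, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
]
train_y = ["hap_A", "hap_A", "hap_A", "hap_B", "hap_B", "hap_B"]


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority haplotype among the k training lines nearest to `query`."""
    # Euclidean distance on dosage vectors (an illustrative choice of metric).
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cohens_kappa(observed, predicted):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two equal-length label lists."""
    n = len(observed)
    # p_o: observed agreement, the fraction of lines where both labels match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # p_e: agreement expected by chance from each list's label frequencies.
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    labels = set(observed) | set(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1:
        return 1.0  # one shared class everywhere: kappa is 0/0, treated as 1.0 here
    return (p_o - p_e) / (1 - p_e)


# Hypothetical new lines with marker-derived ("observed") haplotypes for comparison.
test_x = [[2, 2, 0, 0], [0, 0, 2, 2], [2, 2, 1, 0], [0, 1, 2, 1], [0, 1, 2, 2]]
test_y = ["hap_A", "hap_B", "hap_A", "hap_B", "hap_A"]

predictions = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
kappa = cohens_kappa(test_y, predictions)
print(predictions)
print(round(kappa, 3))
```

In this toy set, the fifth line's marker‐derived label disagrees with its genomic neighbors, so it is misclassified. That gives p_o = 0.8, p_e = 0.48, and a kappa of about 0.615. Kappa is used rather than raw accuracy because it discounts the agreement expected by chance when one haplotype is common in the training population.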