A High-Performance Database Management System for Managing and Analyzing Large-Scale SNP Data in Plant Genotyping and Breeding Applications

A DNA fingerprint database is an efficient, stable, and automated tool for plant molecular research that can provide comprehensive technical support for multiple fields of study, such as pan-genome analysis and crop breeding. However, constructing a DNA fingerprint database for plants requires signi...


Bibliographic Details
Main Authors: Yikun Zhao, Bin Jiang, Yongxue Huo, Hongmei Yi, Hongli Tian, Haotian Wu, Rui Wang, Jiuran Zhao, Fengge Wang
Format: Article
Language: English
Published: MDPI AG 2021-10-01
Series: Agriculture
Subjects: SNP, SNP array, KASP, database, DNA fingerprint, algorithms
Online Access: https://www.mdpi.com/2077-0472/11/11/1027
author Yikun Zhao
Bin Jiang
Yongxue Huo
Hongmei Yi
Hongli Tian
Haotian Wu
Rui Wang
Jiuran Zhao
Fengge Wang
collection DOAJ
description A DNA fingerprint database is an efficient, stable, and automated tool for plant molecular research that can provide comprehensive technical support for multiple fields of study, such as pan-genome analysis and crop breeding. However, constructing a DNA fingerprint database for plants requires significant resources for data output, storage, analysis, and quality control. Large amounts of heterogeneous data must be processed efficiently and accurately. Thus, we developed the plant SNP database management system (PSNPdms) using an open-source web server and free software. It is compatible with single nucleotide polymorphism (SNP) and insertion–deletion (InDel) markers, Kompetitive Allele Specific PCR (KASP) and SNP array platforms, and 23 species. It fully integrates with the KASP platform and allows for graphical presentation and modification of KASP data. The system has a simple, efficient, and versatile laboratory personnel management structure that adapts to complex and changing experimental needs with a simple workflow process. PSNPdms provides built-in support for data quality control along multiple dimensions, such as standardized experimental design, standard reference samples, a fingerprint statistical selection algorithm, and raw data correlation queries. In addition, we developed a fingerprint-merging algorithm to solve the problem of merging fingerprints of mixed samples and single samples in plant detection, providing unique standard fingerprints of each plant species for construction of a standard DNA fingerprint database. Different laboratories can use the system to generate fingerprint packages for data interaction and sharing. We also integrated genetic analysis into the system to enable drawing and downloading of dendrograms. PSNPdms has been widely used by 23 institutions and has proven to be a stable and effective system for sharing data and performing genetic analysis. Interested researchers are required to adapt and further develop the system.
first_indexed 2024-03-10T05:47:49Z
format Article
id doaj.art-206423fb4b8a40dbab89be79188b144b
institution Directory Open Access Journal
issn 2077-0472
language English
last_indexed 2024-03-10T05:47:49Z
publishDate 2021-10-01
publisher MDPI AG
record_format Article
series Agriculture
spelling doaj.art-206423fb4b8a40dbab89be79188b144b; 2023-11-22T21:58:08Z; eng; MDPI AG; Agriculture; ISSN 2077-0472; 2021-10-01; volume 11, issue 11, article 1027; DOI 10.3390/agriculture11111027
affiliation (all nine authors) Maize Research Center, Beijing Academy of Agricultural and Forest Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing 100097, China
title A High-Performance Database Management System for Managing and Analyzing Large-Scale SNP Data in Plant Genotyping and Breeding Applications
topic SNP
SNP array
KASP
database
DNA fingerprint
algorithms
url https://www.mdpi.com/2077-0472/11/11/1027