Abstract: | <p>Whole genome sequencing (WGS) is increasingly used to diagnose rare genetic diseases (RGD). WGS data contain approximately 5,000,000 variants per patient, of which often a single variant causes disease. Variant prioritisation algorithms (VPA) help identify disease-causing variants, and functional studies can provide evidence to confirm a variant’s effect on a gene or gene product. In this thesis, I test and develop VPA on simulated and real patient data and conduct a functional study to contribute to the validation of HDLBP as a novel disease gene for Fine-Lubinsky syndrome (FLS).</p>
<p>Two established analysis frameworks, Exomiser and VAAST+Phevor, contain VPA that use only genotypic data (GA) and VPA that use both genotypic and phenotypic data (GPA). GA and GPA performance is benchmarked on eleven real WGS patient cases for which disease-causing variants were previously identified in known and novel genes. The GPA outperform the GA, ranking more benchmark variants first (GA vs. GPA; Exomiser: 4 vs. 5; VAAST+Phevor: 1 vs. 8) whilst reducing the percentage of variants requiring further analysis (Exomiser: from 32.3% to 2.2%; VAAST+Phevor: from 25.4% to 11.2%).</p>
<p>Identifying disease-causing variants in genes that are novel for a specific phenotype is challenging for GPA. A VPA called GPET is developed to identify disease-causing variants in novel genes based on genotypic, phenotypic, and tissue-specific expression data. GPET is benchmarked against Exomiser on a simulated dataset of disease-causing and non-disease-causing variants with imperfect phenotypic annotations, achieving an area under the curve (AUC) of 0.95, compared to 0.91 for Exomiser. On the eleven RGD cases, GPET performs worse than Exomiser for known disease genes but better for all novel disease genes without phenotypic annotations.</p>
<p>The patients in one of the RGD cases are affected by FLS and carry a candidate variant in HDLBP. The variant is suspected to cause FLS by decreasing the RNA-binding activity of vigilin, the protein encoded by HDLBP. A functional study based on oligo(dT) capture is conducted, confirming that the candidate variant reduces the RNA-binding activity of vigilin.</p>
<p>The investigations in this thesis demonstrate that phenotypic and tissue-specific expression data can improve VPA performance for novel disease genes and provide evidence to support HDLBP as a disease gene for FLS.</p>
|