Gene prediction with conditional random fields

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.

Bibliographic Details
Main Author: Doherty, Matthew K
Other Authors: James Galagan and David DeCaprio.
Format: Thesis
Language: English
Published: Massachusetts Institute of Technology 2008
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/41646
Other title: Applications of conditional random fields in bioinformatics
Department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Degree: M.Eng.
Notes: Includes bibliographical references (p. 75-77).
Abstract: The accurate annotation of an organism's protein-coding genes is crucial for subsequent genomic analysis. The rapid advance of sequencing technology has created a gap between genomic sequences and their annotations. Automated annotation methods are needed to bridge this gap, but existing solutions based on hidden Markov models cannot easily incorporate diverse evidence to make more accurate predictions. In this thesis, I built upon the semi-Markov conditional random field framework created by DeCaprio et al. to predict protein-coding genes in DNA sequences. Several novel extensions were designed and implemented, including a 29-state model with both semi-Markov and Markov states, an N-best Viterbi inference algorithm, several classes of discriminative feature functions that incorporate diverse evidence, and parallelization of the training and inference algorithms. The extensions were tested on the genomes of Phytophthora infestans, Culex pipiens, and Homo sapiens. The gene predictions were analyzed and the benefits of discriminative methods were explored.
Statement of responsibility: by Matthew K. Doherty.
Date issued: 2007
Date added to repository: 2008-05-19
Identifier: 219708684
Physical description: 77 p. (application/pdf)
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
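Illustration (not from the thesis): the abstract names an N-best Viterbi inference algorithm without describing it. As a rough sketch only, the code below shows one common way to obtain the N highest-scoring labelings from a plain linear-chain (Markov) model: keep the N best partial paths for each state at every position. It is not the thesis's implementation. It ignores the semi-Markov (segment-length) states and the feature functions, and the function name `n_best_viterbi`, its score tables, and the toy numbers are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

Path = Tuple[float, List[int]]


def n_best_viterbi(
    obs_scores: Sequence[Sequence[float]],
    trans_scores: Sequence[Sequence[float]],
    start_scores: Sequence[float],
    n: int,
) -> List[Path]:
    """Return up to n (score, state_path) pairs, best first.

    Hypothetical linear-chain scoring (log domain, higher is better):
      start_scores[s]     score of starting in state s
      obs_scores[t][s]    score of state s at position t
      trans_scores[p][s]  score of moving from state p to state s
    Assumes at least one position.
    """
    num_states = len(start_scores)
    # beams[s] keeps the n best partial paths that end in state s.
    beams: List[List[Path]] = [
        [(start_scores[s] + obs_scores[0][s], [s])] for s in range(num_states)
    ]
    for t in range(1, len(obs_scores)):
        new_beams: List[List[Path]] = []
        for s in range(num_states):
            # Extend every kept path from every predecessor state p into s.
            candidates = (
                (score + trans_scores[p][s] + obs_scores[t][s], path + [s])
                for p in range(num_states)
                for score, path in beams[p]
            )
            # Scores are additive, so the n best paths into s can only come
            # from the n best paths into each predecessor state.
            new_beams.append(heapq.nlargest(n, candidates, key=lambda c: c[0]))
        beams = new_beams
    # The global n best are among the per-state n best at the last position.
    finals = (c for beam in beams for c in beam)
    return heapq.nlargest(n, finals, key=lambda c: c[0])


if __name__ == "__main__":
    # Toy two-state example with made-up scores (illustration only).
    obs = [[0.0, -1.0], [-0.5, -0.2], [-1.0, 0.0]]
    trans = [[-0.1, -2.0], [-1.5, -0.3]]
    start = [-0.7, -0.7]
    for score, path in n_best_viterbi(obs, trans, start, n=3):
        print(f"{score:.2f} {path}")
```

Copying whole paths keeps the sketch short; a practical decoder would more likely store ranked back-pointers and reconstruct the N paths at the end.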