Approximation to the mean curve in the LCS problem

Detailed specification

Bibliographic description
Main authors: Durringer, C, Hauser, R, Matzinger, H
Format: Report
Published: Unspecified 2006
Description
Abstract: The problem of sequence comparison via optimal alignments occurs naturally in many areas of application. The simplest such technique is based on evaluating a score given by the length of a longest common subsequence divided by the average length of the original sequences. In this paper we investigate the expected value of this score when the input sequences are random and their length tends to infinity. The corresponding limit exists but is not known precisely. We derive a method based on large deviations, convex analysis and Monte Carlo simulation to compute a consistent sequence of upper bounds on the unknown limit. Raphael Hauser was supported through grant NAL/00720/G from the Nuffield Foundation and through grant GR/M30975 from the Engineering and Physical Sciences Research Council of the UK. All three authors also acknowledge the generous support through the Sonderforschungsbereich Grant SFB 701 A3 from the German Research Foundation.
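
The score described in the abstract can be illustrated with a minimal Python sketch. It computes the length of a longest common subsequence with the standard dynamic program, divides by the average length of the two sequences, and averages that score over simulated pairs. This is a plain finite-length Monte Carlo average, not the report's upper-bound method. The function names, the binary alphabet, and the choice of independent, uniformly distributed letters are assumptions made for illustration; the abstract only says the sequences are random.

```python
import random


def lcs_length(x, y):
    """Length of a longest common subsequence (standard row-by-row DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(x, y):
    """LCS length divided by the average length of the two sequences."""
    return lcs_length(x, y) / ((len(x) + len(y)) / 2)


def mean_score_estimate(n, alphabet="01", trials=200, seed=0):
    """Average score over `trials` pairs of length-n sequences.

    Assumption: letters are drawn independently and uniformly from `alphabet`.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = [rng.choice(alphabet) for _ in range(n)]
        y = [rng.choice(alphabet) for _ in range(n)]
        total += lcs_score(x, y)
    return total / trials


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, mean_score_estimate(n))
```

Each estimate is for one fixed finite length and carries sampling error, so it says nothing rigorous about the limit itself. `lcs_score` is undefined when both sequences are empty.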