A study on biomolecular sequence alignment using machine learning techniques

Bibliographic Details
Main Authors: Othman, Muhamad Razib, Salim, Naomie, Abdul Jalil, Rozita, Deris, Safaai, Mat Yatim, Safie, Md. Illias, Rosli
Format: Monograph
Language: English
Published: Faculty of Computer Science and Information System 2004
Subjects:
Online Access: http://eprints.utm.my/4400/3/75079.pdf
Description
Summary: Pairwise sequence alignment compares nucleotide or protein sequences in order to infer structural, functional and evolutionary relationships. The main goal of sequence alignment is to find an optimal alignment. The methods most widely used in research, and shown to produce an optimal alignment, are dynamic programming methods such as Smith-Waterman for local alignment. Previous research indicates that scoring schemes in dynamic programming can be improved by using substitution matrices and by introducing gaps into the alignment with a gap penalty function. The purpose is to optimize alignment results while preserving biological concepts such as evolutionary changes in molecular structure caused by mutation. At present, no general theory guides the selection of substitution matrices and gap penalties for local sequence alignment. For this reason, this project implements the Smith-Waterman dynamic programming method with different substitution matrices and gap penalty functions in its scoring scheme. The substitution matrices used are BLOSUM45, BLOSUM62 and BLOSUM80. The gap penalty is either linear, with the parameter ranging from –d=1 to –d=10, or affine, with the gap opening parameter ranging from –d=1 to –d=12 and the gap extension parameter ranging from –e=1 to –e=5. An intensive comparison will be carried out to test efficiency and to determine the effective substitution matrices and gap penalty parameters for sequence alignment. Twenty-seven sets of protein sequences, categorized by length and percentage similarity (identity), will be used for sequence alignment. The results will provide a guideline for selecting effective substitution matrices and gap penalty parameters for sequence alignment.
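
The scoring scheme described in the summary can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes the common formulation in which a gap of length k costs d + (k - 1)e, so the linear penalty is the special case e = d. The function names `smith_waterman`, `toy_score` and `sweep` are hypothetical, and `toy_score` (+2 for a match, -1 for a mismatch) is only a stand-in for a BLOSUM45, BLOSUM62 or BLOSUM80 lookup.

```python
NEG_INF = float("-inf")


def toy_score(a, b):
    """Placeholder substitution score (assumption): +2 match, -1 mismatch.

    A real run would look the pair up in a BLOSUM matrix instead.
    """
    return 2 if a == b else -1


def smith_waterman(seq1, seq2, gap_open, gap_extend, score=toy_score):
    """Return the best local alignment score of seq1 and seq2.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Setting gap_extend == gap_open gives a linear gap penalty.
    """
    n, m = len(seq1), len(seq2)
    # H: best score of a local alignment ending at (i, j)
    # E: best score ending with a gap in seq1 (horizontal move)
    # F: best score ending with a gap in seq2 (vertical move)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1])
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def sweep(seq1, seq2, score=toy_score):
    """Score one sequence pair over the parameter ranges in the summary."""
    results = {}
    for d in range(1, 11):  # linear penalty, d = 1..10
        results[("linear", d, d)] = smith_waterman(seq1, seq2, d, d, score)
    for d in range(1, 13):  # affine penalty, opening d = 1..12
        for e in range(1, 6):  # extension e = 1..5
            results[("affine", d, e)] = smith_waterman(seq1, seq2, d, e, score)
    return results
```

Calling `sweep("ACGTACGT", "ACGTTACGT")` returns one score per parameter setting. Repeating the sweep for each substitution matrix and each sequence pair would give the kind of comparison the summary describes. The sketch returns scores only and omits the traceback step that recovers the aligned residues.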