The development of machine learning based software for predicting protein-protein interactions and protein function from protein primary structure

Bibliographic Details
Main Authors: Othman, Muhamad Razib, Deris, Safaai, Alashwal, Hany Taher Ahmed, Md. Illias, Rosli, Mat Yatim, Safie
Format: Monograph
Language: English
Published: Faculty of Computer Science and Information System 2007
Online Access: http://eprints.utm.my/4140/1/74288.pdf
Description
Summary: Understanding protein function is a major goal of the post-genomic era. Proteins usually work in the context of other proteins and rarely function alone. Therefore, studying the interaction partners of a protein is highly relevant to understanding its function. For this reason, the main objective of this thesis is to predict protein-protein interactions based only on protein primary structure. Using Support Vector Machines (SVMs), different protein features were studied and examined: protein domain structure, hydrophobicity, and amino acid composition. The results imply that protein domain structure is the most informative feature for predicting protein-protein interactions; it also requires much less running time than the other features. However, a standard binary SVM requires both positive and negative data samples. Although it is easy to obtain a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to serve as negative examples. Previous studies cope with this problem by artificially generating random sets of protein pairs that are not listed in the Database of Interacting Proteins (DIP) and using them as negative examples. This approach can be used for comparing features because the error introduced by the artificial negatives is uniform across the features being compared. In this research, we instead treat the problem as a one-class classification problem and solve it using a one-class SVM. Using only positive examples (interacting protein pairs) in the training phase, the one-class SVM achieves an accuracy of 80%. These results imply that protein-protein interactions can be predicted using a one-class classifier with accuracy comparable to that of binary classifiers that use artificially constructed negative examples. Finally, a Bayesian kernel for SVM was implemented to incorporate probabilistic information about protein-protein interactions compiled from different sources. The probabilistic output from the Bayesian kernel can help biologists prioritize further research on the most confidently predicted interactions.
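
The negative-example construction attributed to earlier studies in the summary can be illustrated with a minimal sketch. It is not the authors' code: the function name `sample_negative_pairs`, the protein identifiers, and the toy interaction list are all hypothetical, and a plain Python collection stands in for DIP. The sketch draws random pairs that are absent from the positive set. Such pairs are only unlisted, not confirmed non-interacting, which is the weakness that motivates the one-class formulation.

```python
import random
from itertools import combinations


def sample_negative_pairs(proteins, positive_pairs, n, seed=0):
    """Draw n random protein pairs that are absent from the positive set."""
    # Treat pairs as unordered so (A, B) and (B, A) count as the same interaction.
    known = {frozenset(pair) for pair in positive_pairs}
    # Enumerate every distinct unordered pair that is not a listed interaction.
    candidates = [
        pair
        for pair in combinations(sorted(set(proteins)), 2)
        if frozenset(pair) not in known
    ]
    if n > len(candidates):
        raise ValueError("not enough unlisted pairs to sample from")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(candidates, n)


# Hypothetical toy data: identifiers and interactions are made up for illustration.
proteins = ["P1", "P2", "P3", "P4"]
positive_pairs = [("P1", "P2"), ("P3", "P4")]
negatives = sample_negative_pairs(proteins, positive_pairs, n=3)
```

Enumerating all candidate pairs is only practical for small protein sets; for a proteome-scale list, rejection sampling of random pairs would be the more usual choice.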