A cluster-based hybrid replica control protocol for high availability in data grid

Data Grids provide a scalable infrastructure for managing and storing large amounts of data files in Grid computing systems. In a Data Grid, data replication is a widely used technique for managing data, in which exact copies of data, or replicas, are created and stored at many distributed sites. This tec...


Bibliographic details
Main author: Mabni, Zulaile
Format: Thesis
Language: English
Published: 2019
Subjects: Computational grids (Computer systems)
Online access: http://psasir.upm.edu.my/id/eprint/84549/1/FSKTM%20%28fsktm%29%202019%2045.pdf
_version_ 1825951946666672128
author Mabni, Zulaile
author_facet Mabni, Zulaile
author_sort Mabni, Zulaile
collection UPM
description Data Grids provide a scalable infrastructure for managing and storing large amounts of data files in Grid computing systems. In a Data Grid, data replication is a widely used technique for managing data, in which exact copies of data, or replicas, are created and stored at many distributed sites. This technique provides high data availability and improves the performance of distributed systems. In recent years, the number of distributed nodes in Grid computing systems has become very large, which has raised several issues in data replication. First, nodes in Grid systems are dynamic: they can join or leave the system at any time. A replica control protocol must therefore account for the dynamic aspects of the Data Grid. The next important issue is replica placement, which determines the suitable nodes on which to place the replicas. Previously, replica placement was not an issue because research focused only on small-scale systems. However, in a larger system such as a Data Grid, existing replica control protocols require a larger number of replicas to construct read and write quorums. As the number of replicas increases, the communication cost also increases and thus degrades the performance of the protocols. Another issue is replica consistency, which must be ensured when copying data in a large-scale system. To maintain replica consistency, if several replicas of the same file are updated concurrently, all other replicas must end up with the same updated contents. Thus, an efficient mechanism is needed to improve system performance while ensuring replica consistency in the Data Grid. Therefore, this thesis proposes a new replica control protocol named the Cluster-Based Hybrid (CBH) protocol for large-scale systems, with the objectives of reducing the communication cost, increasing data availability, and maintaining replica consistency.
CBH employs a hybrid replication strategy that combines the advantages of two common replica control protocols to improve on the performance of existing protocols. A clustering algorithm is proposed to group the large number of nodes into clusters and organize these clusters into a tree structure. A second proposed algorithm, the replica placement algorithm, selects and places only one replica in each cluster. The performance of the CBH protocol is evaluated both theoretically and through simulation. The proposed protocol is simulated using GridSim, a discrete-event simulator, and the Java programming language. Two performance metrics, communication cost and data availability, are evaluated and compared with those of two recent quorum-based protocols, Dynamic Hybrid (DH) and Duplication on Grid (DDG). The results show that grouping the nodes into clusters and keeping only one replica in each cluster minimizes the number of replicas involved in constructing read and write quorums. This research contributes a dynamic cluster-based hybrid replica control protocol comprising a clustering algorithm that determines the number of clusters, a mechanism for dynamic participation of nodes in the network, and a replica placement algorithm that yields lower communication cost and higher data availability than the DH and DDG protocols. Replica consistency in CBH is shown to be maintained by satisfying the Quorum Intersection Properties.
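The quorum intersection requirement mentioned at the end of the description can be illustrated with a small Python sketch. This is not the CBH protocol itself: the abstract does not give its quorum construction, so the sketch assumes the simplest case, in which every quorum is a fixed-size subset of the replicas and there is one replica per cluster. The names `quorums`, `satisfies_intersection`, and the cluster labels `c1` to `c5` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def quorums(replicas, size):
    """Return every subset of `replicas` that has exactly `size` members."""
    return [frozenset(c) for c in combinations(replicas, size)]


def satisfies_intersection(read_quorums, write_quorums):
    """Check the two quorum intersection properties.

    1. Every write quorum overlaps every write quorum, so two
       concurrent writes always share at least one replica.
    2. Every read quorum overlaps every write quorum, so a read
       always reaches at least one replica holding the latest write.
    """
    write_write = all(a & b for a in write_quorums for b in write_quorums)
    read_write = all(r & w for r in read_quorums for w in write_quorums)
    return write_write and read_write


# Assumed setup: five clusters, each holding exactly one replica.
replicas = ["c1", "c2", "c3", "c4", "c5"]

# Read size 2 with write size 4: any read and any write must overlap.
ok = satisfies_intersection(quorums(replicas, 2), quorums(replicas, 4))

# Read size 2 with write size 3: {c1, c2} and {c3, c4, c5} are disjoint.
bad = satisfies_intersection(quorums(replicas, 2), quorums(replicas, 3))
```

Under this fixed-size assumption, the check passes exactly when the read and write sizes together exceed the number of replicas and twice the write size exceeds it as well. Because each operation contacts one quorum, fewer replicas (for example, one per cluster) allow smaller quorums, which is the source of the communication-cost reduction the abstract describes.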
first_indexed 2024-03-06T10:37:38Z
format Thesis
id upm.eprints-84549
institution Universiti Putra Malaysia
language English
last_indexed 2024-03-06T10:37:38Z
publishDate 2019
record_format dspace
spelling upm.eprints-84549 2021-12-31T08:24:30Z http://psasir.upm.edu.my/id/eprint/84549/ A cluster-based hybrid replica control protocol for high availability in data grid Mabni, Zulaile 2019-02 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/84549/1/FSKTM%20%28fsktm%29%202019%2045.pdf Mabni, Zulaile (2019) A cluster-based hybrid replica control protocol for high availability in data grid. Doctoral thesis, Universiti Putra Malaysia.
Computational grids (Computer systems)
spellingShingle Computational grids (Computer systems)
Mabni, Zulaile
A cluster-based hybrid replica control protocol for high availability in data grid
title A cluster-based hybrid replica control protocol for high availability in data grid
title_sort cluster based hybrid replica control protocol for high availability in data grid
topic Computational grids (Computer systems)
url http://psasir.upm.edu.my/id/eprint/84549/1/FSKTM%20%28fsktm%29%202019%2045.pdf
work_keys_str_mv AT mabnizulaile aclusterbasedhybridreplicacontrolprotocolforhighavailabilityindatagrid
AT mabnizulaile clusterbasedhybridreplicacontrolprotocolforhighavailabilityindatagrid