Performance analysis of data replication and scheduling in data grid

Bibliographic Details
Main Author: Zhang, Junwei
Other Authors: Lee Bu Sung, Francis
Format: Thesis
Language: English
Published: 2010
Online Access: https://hdl.handle.net/10356/38584
Description
Summary: The Grid is an infrastructure that enables dynamic sharing of and coordinated access to resources among different organizations. As a specialization and extension of the Grid, the Data Grid emphasizes the sharing of large-scale data sets and data storage resources. It has evolved into the solution for data-intensive applications such as global climate change, High Energy Physics (HEP), astrophysics, and computational genomics. In these research domains, the size of scientific data is measured in terabytes (1,024 gigabytes) or even petabytes (1,024 terabytes). Such scientific data are stored as large files and replicated across the Data Grid. Scientists located all over the world can download these datasets and analyze them for various purposes. The Hierarchical Data Grid is a class of Data Grid that has been adopted by the European Organization for Nuclear Research (CERN) to support the distribution of large experimental datasets across the globe. There has been a substantial amount of research on replication algorithms for the Hierarchical Data Grid. I have developed a probabilistic model of data replication in a Hierarchical Data Grid environment. The model enables us to evaluate the optimality of a replication algorithm in terms of average response time and average bandwidth cost. The accuracy of the model is verified through simulation.
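
To illustrate the kind of probabilistic evaluation the summary describes, the following Python sketch models a toy three-tier Hierarchical Data Grid in which a request is served by the lowest tier holding a replica. The tier names, hit probabilities, latencies, and bandwidth costs are illustrative assumptions for this sketch, not the model or parameters developed in the thesis.

# Hypothetical sketch of a probabilistic replication model for a three-tier
# Hierarchical Data Grid (leaf site -> regional centre -> root archive).
# All numbers below are assumed values, not taken from the thesis.

# Probability that the requested file has a replica at each tier,
# checked bottom-up; the root is assumed to hold every file.
hit_prob = {"leaf": 0.3, "regional": 0.5, "root": 1.0}

# Assumed transfer latency (seconds) and bandwidth cost (arbitrary units)
# for delivering a file from each tier down to the requesting site.
latency = {"leaf": 0.1, "regional": 1.0, "root": 5.0}
bw_cost = {"leaf": 1.0, "regional": 10.0, "root": 50.0}


def expected_metrics():
    """Return (average response time, average bandwidth cost) per request."""
    avg_time, avg_bw = 0.0, 0.0
    prob_not_served_yet = 1.0  # probability lower tiers missed the request
    for tier in ("leaf", "regional", "root"):
        p_served_here = prob_not_served_yet * hit_prob[tier]
        avg_time += p_served_here * latency[tier]
        avg_bw += p_served_here * bw_cost[tier]
        prob_not_served_yet *= 1.0 - hit_prob[tier]
    return avg_time, avg_bw


if __name__ == "__main__":
    t, b = expected_metrics()
    print(f"average response time: {t:.3f} s, average bandwidth cost: {b:.1f}")

Under these assumed probabilities the request is served at the leaf with probability 0.3, at the regional centre with probability 0.35, and at the root with probability 0.35; a replication algorithm that raises the lower-tier hit probabilities lowers both averages, which is the trade-off such a model lets one quantify.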