Indexing strategies of MapReduce for Information Retrieval in Big Data

Bibliographic Details
Main Authors:	Farid, Mazen; Latip, Rohaya; Hussin, Masnida; Al-Mekhlafi, Mohammed Abdulkarem
Format:	Article
Language:	English
Published:	The World Academy of Research in Science and Engineering, 2016
Subjects:	Hadoop; Indexing; MapReduce; Sensei; Terrier
Online Access:	http://psasir.upm.edu.my/id/eprint/54548/1/Indexing%20strategies%20of%20MapReduce%20for%20Information%20Retrieval%20in%20Big%20Data.pdf

description	In Information Retrieval (IR), efficiently indexing large, terabyte-scale datasets remains an open problem because of information overload, driven by the growth of knowledge, the increasing number of media types and platforms, and the growing interoperability between platforms. MapReduce has been proposed as a suitable platform for distributing data-intensive operations across multiple processing machines. This project will analyze two efficient MapReduce indexing strategies: Sensei and per-posting-list indexing (Terrier). Both strategies will be implemented in an existing IR framework, and an experiment will be run on Hadoop MapReduce using the same large dataset for each. In particular, the paper will study the effectiveness of the two strategies and aim to determine which of them is more efficient. The experiment will measure retrieval performance as the dataset size and the available processing power increase, examining how each strategy scales with larger datasets and with more distributed machines. Throughput will be measured in MB/s (megabytes per second), and the experimental results will be used to compare the performance and efficiency of Sensei and per-posting-list indexing (Terrier).
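The abstract names per-posting-list indexing (Terrier) as one of the two strategies under study and reports throughput in MB/s. The Python sketch below is a hedged, single-process simulation of that idea. It is not Terrier's or Hadoop's code, and it does not model Sensei. It assumes the per-posting-list scheme works roughly as follows: each map task builds posting lists in memory for its own split of documents and emits one partial posting list per term (rather than one pair per token), and the reduce phase merges the partial lists for each term in split order. The function names `map_split`, `reduce_term`, `build_index`, and `throughput_mb_per_s` are hypothetical, the whitespace tokenizer is a simplification, and the MB/s helper assumes 1 MB = 10^6 bytes because the abstract does not define the unit.

```python
import time
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# A posting is (doc_id, term_frequency); a posting list is a list of postings.
Posting = tuple[int, int]


def map_split(split_id: int, docs: list[tuple[int, str]]) -> list[tuple[str, tuple[int, list[Posting]]]]:
    """Hypothetical map task (per-posting-list style, assumed): build partial
    posting lists in memory for this split, then emit ONE pair per distinct
    term instead of one pair per token occurrence."""
    partial: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs:
        for token in text.lower().split():  # simplistic whitespace tokenizer
            partial[token][doc_id] += 1
    # Tag each partial list with its split id so the reducer can restore doc order.
    return [(term, (split_id, sorted(tf_by_doc.items()))) for term, tf_by_doc in partial.items()]


def reduce_term(term: str, partials: list[tuple[int, list[Posting]]]) -> tuple[str, list[Posting]]:
    """Hypothetical reduce task: concatenate one term's partial posting lists
    in split order (assumes doc ids increase from split to split)."""
    merged: list[Posting] = []
    for _split_id, postings in sorted(partials, key=itemgetter(0)):
        merged.extend(postings)
    return term, merged


def build_index(splits: list[list[tuple[int, str]]]) -> dict[str, list[Posting]]:
    """Run every map task, simulate the shuffle by sorting on the term key,
    then reduce each term group into a final posting list."""
    emitted = [pair for i, split in enumerate(splits) for pair in map_split(i, split)]
    emitted.sort(key=itemgetter(0))
    index: dict[str, list[Posting]] = {}
    for term, group in groupby(emitted, key=itemgetter(0)):
        _, postings = reduce_term(term, [value for _, value in group])
        index[term] = postings
    return index


def throughput_mb_per_s(num_bytes: int, seconds: float) -> float:
    """Indexing throughput in MB/s, assuming 1 MB = 10**6 bytes."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / 1e6 / seconds


if __name__ == "__main__":
    splits = [
        [(0, "map reduce index"), (1, "index terms")],
        [(2, "reduce index size")],
    ]
    raw_bytes = sum(len(text.encode("utf-8")) for split in splits for _, text in split)
    start = time.perf_counter()
    index = build_index(splits)
    elapsed = time.perf_counter() - start
    print(index["index"])  # [(0, 1), (1, 1), (2, 1)]
    print(f"{throughput_mb_per_s(raw_bytes, elapsed):.3f} MB/s")
```

In this toy input the first split contains five tokens, but its map task emits only four pairs, one per distinct term. Shrinking the intermediate map output in this way is the motivation commonly given for per-posting-list indexing over per-token emission; on a real Hadoop cluster the MB/s figure would be computed over the whole collection and job wall-clock time.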
institution	Universiti Putra Malaysia
citation	Farid, Mazen and Latip, Rohaya and Hussin, Masnida and Al-Mekhlafi, Mohammed Abdulkarem (2016) Indexing strategies of MapReduce for Information Retrieval in Big Data. International Journal of Advances in Computer Science and Technology, 5 (3). pp. 1-6. ISSN 2320-2602
status	Non-peer-reviewed

Similar Items