An Approach to Page Ranking Based on Discourse Structures

The World Wide Web (WWW), which is the predominant source for Information Retrieval (IR) today, is essentially a set of hyperlinked documents. A web page containing a larger number of related hyperlinks can satisfy the user's needs in a single page. IR systems should give high priority to such web pages. While assi...

Bibliographic Details
Main Authors: Subalalitha Chinnaudayar Navaneethakrishnan, Anita Ramalingam
Format: Article
Language: English
Published: Croatian Communications and Information Society (CCIS) 2016-12-01
Series: Journal of Communications Software and Systems
Subjects: Discourse structure; Link Analysis and Rhetorical Structure Theory
Online Access: https://jcomss.fesb.unist.hr/index.php/jcomss/article/view/78
collection DOAJ
description The World Wide Web (WWW), which is the predominant source for Information Retrieval (IR) today, is essentially a set of hyperlinked documents. A web page containing a larger number of related hyperlinks can satisfy the user's needs in a single page. IR systems should give high priority to such web pages. When assigning a rank to a web page, existing web mining techniques such as Hypertext Induced Topic Selection (HITS) and Page Ranking algorithms focus on the number of in-links and out-links present in the web page. Rather than relying only on the number of links in the web page, discovering the semantic relations between the web page and its hyperlinks can improve the quality of IR systems. Rhetorical Structure Theory (RST) is widely used to find the semantic relations between text fragments by analysing the discourse structure of a text. In this paper, we propose a novel approach that uses RST-based discourse relations to find the semantic relation between a web page and the hyperlinks present in it. We have implemented and evaluated our approach on an IR system using 500 Tamil-language and 50 English tourism domain-specific web pages. The proposed approach is also compared with an existing page ranking algorithm.
id doaj.art-b53fdf693beb431cae424b4159905348
institution Directory of Open Access Journals
issn 1845-6421
1846-6079
topic Discourse structure
Link Analysis and Rhetorical Structure Theory