Summary: | Web content credibility concerns finding credible and correct information on the web. Recent studies have shown an increasing trend of users turning to the web to search for information on a variety of topics, including health, stocks, education, and politics, to name a few. Information credibility is a critical factor for decision makers in these domains, yet there is no restriction on who may author web articles and content. One criterion for evaluating credibility is to check the authority or source of the information. However, there are situations in which wrong information flows from credible sources. Approaches to credibility assessment are broadly categorized into human-based and computational approaches. Computational approaches that rely on machine learning techniques are computationally expensive. Reputation-based approaches overcome this; however, the latest work fails to take into account the issue of negative referrals and uses simple summation as the calculation structure, making it less resilient to attacks. This paper puts forth and verifies the hypothesis that credibility is directly related to the expertise of an entity. The authors propose a Bayesian approach that uses feedback, in the form of interactions among entities, to compute their expertise level, and it shows improved results in terms of Precision, Correlation and Mean Average Error. The experiments are performed on two different datasets, one of which was developed from a survey as part of the research study. The results from the two experiments show that the reputation ranks are independent of the pattern of ratings and the density of data, unlike previous techniques, whose results were limited by these factors. The proposed technique gives 27% and 18% more precise results for the two experiments, respectively, compared to the baseline.
The correlation results are also significant in both experiments, with values of 0.39 and 0.87 for the proposed technique, indicating a linear relationship between predicted and original data. The paper also discusses reputation attacks and proposes countermeasures against them, supported by simulation results.
|
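
As a rough illustration of the contrast the summary draws between simple summation and a Bayesian calculation structure, the sketch below compares a summation score with the mean of a Beta posterior over binary feedback. It is a minimal sketch under stated assumptions, not the paper's model: the Beta-distribution formulation, the uniform prior, and the names `BetaReputation`, `summation_score`, and `beta_score` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BetaReputation:
    """Hypothetical Beta-posterior reputation tracker (assumed, not from the paper)."""
    positive: float = 0.0  # accumulated weight of positive feedback
    negative: float = 0.0  # accumulated weight of negative feedback

    def update(self, helpful: bool, weight: float = 1.0) -> None:
        # Negative referrals are counted explicitly instead of being ignored.
        if helpful:
            self.positive += weight
        else:
            self.negative += weight

    def expected(self) -> float:
        # Mean of Beta(positive + 1, negative + 1), i.e. a uniform prior (assumption).
        return (self.positive + 1.0) / (self.positive + self.negative + 2.0)


def summation_score(feedback: Iterable[bool]) -> int:
    # Simple summation: +1 per positive and -1 per negative interaction.
    # The score is unbounded, so a flood of ratings shifts it without limit.
    return sum(1 if f else -1 for f in feedback)


def beta_score(feedback: Iterable[bool]) -> float:
    # Bayesian alternative: the score stays in (0, 1) and each additional
    # rating moves it less as evidence accumulates.
    rep = BetaReputation()
    for f in feedback:
        rep.update(f)
    return rep.expected()
```

With no feedback, `beta_score([])` returns the prior mean of 0.5, whereas the summation score is 0 and grows linearly with every injected rating. That bounded, evidence-weighted behavior is one plausible reason a Bayesian structure would be harder to manipulate than plain summation.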