Extracting integrate and search healthcare knowledge from the web (III)

Bibliographic Details
Main Author: Lim, Lionel Guan Chuan.
Other Authors: School of Computer Engineering
Format: Final Year Project (FYP)
Language: English
Published: 2013
Subjects:
Online Access: http://hdl.handle.net/10356/51991
Description
Summary: There is a growing trend of users posting and editing questions on websites known as Community Question Answering (CQA) sites. CQA sites benefit web users because of the valuable knowledge they accumulate from contributors around the world. However, this raises the problem of how to extract only the information that is relevant to a given user. This project aims to consolidate healthcare information and allow web users to extract the information that is useful to them. To do so, web crawlers written in Java retrieve the URL, category, question and answer of each entry in the health category of CQA sites. The crawled question-answer pairs are then saved in XML format. Lucene, a Java information retrieval (IR) library, is used for fast indexing of the XML documents. Another goal is to design a centralised search engine that can retrieve relevant healthcare information from CQA data. As this project continues the work of a senior student, Lee Qian Hui, I am tasked with applying information retrieval models and crawling data from more CQA sites that resemble WikiAnswers, the site already covered by Lee's implementation.
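
The step of saving crawled question-answer pairs as XML might look roughly like the following minimal Java sketch. It is not the project's code: the class names QaPair and QaXmlWriter, the XML element names, and the example values are assumptions for illustration. It uses only the JDK's built-in XML APIs and leaves out both the crawler and the Lucene indexing.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical record for one crawled entry; the four fields mirror the
// URL, category, question and answer named in the summary.
class QaPair {
    final String url;
    final String category;
    final String question;
    final String answer;

    QaPair(String url, String category, String question, String answer) {
        this.url = url;
        this.category = category;
        this.question = question;
        this.answer = answer;
    }
}

// Hypothetical helper that serialises crawled pairs to an XML string.
class QaXmlWriter {

    // Builds a DOM tree with one <qaPair> element per pair and returns it as text.
    // Element names are assumptions, not the project's actual schema.
    static String toXml(List<QaPair> pairs) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("qaPairs");
            doc.appendChild(root);

            for (QaPair pair : pairs) {
                Element entry = doc.createElement("qaPair");
                root.appendChild(entry);
                addChild(doc, entry, "url", pair.url);
                addChild(doc, entry, "category", pair.category);
                addChild(doc, entry, "question", pair.question);
                addChild(doc, entry, "answer", pair.answer);
            }

            // The default (identity) transformer writes the DOM out as XML text
            // and escapes characters such as '<' and '&' in element content.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not serialise QA pairs", e);
        }
    }

    // Appends <name>text</name> under the given parent element.
    private static void addChild(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }
}
```

Calling QaXmlWriter.toXml with a list of QaPair objects returns a single XML string that could be written to a file. Building the document through the DOM API rather than by string concatenation keeps the output well-formed even when a crawled question or answer contains characters such as < or &.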