Measuring the Extent of the Synonym Problem in Full-Text Searching

Objective – This article measures the extent of the synonym problem in full-text searching. The synonym problem occurs when a search misses documents because the search was based on a synonym and not on a more familiar term. Methods – We considered a sample of 90 single-word synonym pairs and...

Bibliographic Details
Main Authors: Jeffrey Beall, Karen Kafadar
Format: Article
Language: English
Published: University of Alberta 2008-12-01
Series: Evidence Based Library and Information Practice
Subjects: Full-text searching; synonyms; search precision; information retrieval
Online Access: https://journals.library.ualberta.ca/eblip/index.php/EBLIP/article/view/4081
_version_ 1818146744638111744
author Jeffrey Beall
Karen Kafadar
author_facet Jeffrey Beall
Karen Kafadar
author_sort Jeffrey Beall
collection DOAJ
description Objective – This article measures the extent of the synonym problem in full-text searching. The synonym problem occurs when a search misses documents because the search was based on a synonym and not on a more familiar term. Methods – We considered a sample of 90 single-word synonym pairs and searched for each word in the pair, both singly and jointly, in the Yahoo! database. We determined the number of web sites that were missed when only one term, but not the other, was included in the search field. Results – Depending on how common the synonym is, the percentage of missed web sites can vary from almost 0% to almost 100%. When the search uses a very uncommon synonym ("diaconate"), a very high percentage of web pages can be missed (95%); when it uses the more common term ("deacons"), only 9% are missed. If both terms in a word pair were nearly equal in usage ("cooks" and "chefs"), then a search on one term but not the other missed almost half the relevant web pages. Conclusion – Our results indicate great value in having search engines incorporate automatic synonym searching, not only for user-specified terms but also for high-usage synonyms. Moreover, the results demonstrate the value of information retrieval systems that use controlled vocabularies and cross references to generate search results.
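The abstract does not give the exact formula behind the "percentage missed" figures. The following is a minimal sketch of one plausible reading, assuming that "jointly" means pages containing both terms and that the relevant set is every page containing either term. The function name `missed_share` and all hit counts used with it are hypothetical and are not taken from the study.

```python
def missed_share(hits_searched: int, hits_other: int, hits_both: int) -> float:
    """Share of relevant pages missed when only one term of a pair is searched.

    Assumption (not stated in the abstract): the relevant set is the union of
    pages containing either term, and hits_both counts pages containing both.
    """
    if min(hits_searched, hits_other, hits_both) < 0:
        raise ValueError("hit counts must be non-negative")
    if hits_both > min(hits_searched, hits_other):
        raise ValueError("hits_both cannot exceed either single-term count")

    # Inclusion-exclusion: pages containing at least one of the two terms.
    relevant = hits_searched + hits_other - hits_both
    if relevant == 0:
        return 0.0

    # Pages that contain only the other term are invisible to this search.
    missed = hits_other - hits_both
    return missed / relevant


if __name__ == "__main__":
    # Hypothetical counts: two equally common terms that never co-occur.
    print(f"{missed_share(100, 100, 0):.0%}")
```

Under these assumptions, two equally common terms with little overlap give a missed share near one half, which matches the qualitative pattern the abstract reports for terms of nearly equal usage.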
first_indexed 2024-12-11T12:24:13Z
format Article
id doaj.art-4a73727f42a7411886bf632bb1996871
institution Directory of Open Access Journals
issn 1715-720X
language English
last_indexed 2024-12-11T12:24:13Z
publishDate 2008-12-01
publisher University of Alberta
record_format Article
series Evidence Based Library and Information Practice
spelling doaj.art-4a73727f42a7411886bf632bb1996871 | 2022-12-22T01:07:26Z | eng | University of Alberta | Evidence Based Library and Information Practice | 1715-720X | 2008-12-01 | 3 | 4 | 10.18438/B8MC85 | Measuring the Extent of the Synonym Problem in Full-Text Searching | Jeffrey Beall | 0 | Karen Kafadar | University of Colorado Denver | https://journals.library.ualberta.ca/eblip/index.php/EBLIP/article/view/4081 | Full-text searching | synonyms | search precision | information retrieval
title Measuring the Extent of the Synonym Problem in Full-Text Searching
title_sort measuring the extent of the synonym problem in full text searching
topic Full-text searching
synonyms
search precision
information retrieval
url https://journals.library.ualberta.ca/eblip/index.php/EBLIP/article/view/4081
work_keys_str_mv AT jeffreybeall measuringtheextentofthesynonymprobleminfulltextsearching
AT karenkafadar measuringtheextentofthesynonymprobleminfulltextsearching