Introducing a novel method for Automatic facet extraction in the faceted search (Case Study: gynecology and obstetrics domain)

This research develops and introduces a new algorithm for facet extraction that makes it possible to identify facets experimentally on the basis of literary warrant. A review of prior work on automatic facet extraction suggested two main ideas. The first ide...

Bibliographic Details
Main Authors: Abdolhossein Farajpahlou, Farideh Osareh, Seyed Mostafa Fakhrahmad, Leila Dehghani
Format: Article
Language: Persian (fas)
Published: Iranian Research Institute for Information and Technology 2022-03-01
Series: Iranian Journal of Information Processing & Management
Subjects: data retrieval; facet; faceted search; automatic facet extraction
Online Access: http://jipm.irandoc.ac.ir/article-1-4690-en.html
author Abdolhossein Farajpahlou
Farideh Osareh
Seyed Mostafa Fakhrahmad
Leila Dehghani
collection DOAJ
description This research develops and introduces a new algorithm for facet extraction that makes it possible to identify facets experimentally on the basis of literary warrant. A review of prior work on automatic facet extraction suggested two main ideas. The first is that a facet appears in context, so identifying a facet in a corpus requires examining its context. The second is that a facet is a focal point in a lexical tree that is neither very general nor very specific. Based on these two ideas, a corpus was first prepared for the obstetrics and gynaecology domain of medicine. The research team built three corpora from the literary warrant. The contextual corpus was created from the titles and abstracts of articles in the top 20 journals of the field and contained 167071 documents. The origin corpus was created from 2000 randomly selected articles. The third is the lexical corpus. The appropriate words of the corpus were extracted using a web-based service, which returned 514 words; after duplicates were removed, 480 important words remained. The words were then expanded in the contextual corpus with the help of the supervisor (MeSH), and candidate facets were extracted under two conditions: frequency-based shifting and rank-based shifting. Finally, the identified facets were modified and named using three rules: specificity, substitution, and generality. In total, 26 facets were identified in the domain of gynaecology and obstetrics. A comparison with other algorithms showed that combining a statistical approach with tree pruning can give better results than a purely statistical approach or tree pruning alone.
A comparison of the algorithm's output facets with the traditional facets of the obstetrics and gynaecology domain showed that the algorithm's output is smaller and more useful for browsing in information retrieval tools. The study also found that the facets of a specialized domain differ from general facets and can be redefined independently. The results cannot, however, be generalized to all medical domains, and further research is needed in other fields.
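The abstract's second idea (a facet is a node in a lexical tree that is neither very general nor very specific) and its conclusion (a statistical filter combined with tree pruning) can be illustrated with a small Python sketch. This is not the authors' algorithm: the toy hierarchy, the counts, the thresholds `min_share`, `max_share`, and `min_depth`, and all function names are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent); this is NOT real MeSH data.
PARENT = {
    "disease": None,
    "pregnancy complications": "disease",
    "pre-eclampsia": "pregnancy complications",
    "gestational diabetes": "pregnancy complications",
    "procedure": None,
    "delivery": "procedure",
    "cesarean section": "delivery",
    "vaginal birth": "delivery",
}

# Hypothetical term frequencies in a contextual corpus (made-up numbers).
COUNTS = {
    "pre-eclampsia": 30,
    "gestational diabetes": 20,
    "cesarean section": 35,
    "vaginal birth": 15,
}


def subtree_counts(parent, term_counts):
    """Add each term's count to the term itself and to all of its ancestors."""
    totals = Counter()
    for term, n in term_counts.items():
        node = term
        while node is not None:
            totals[node] += n
            node = parent[node]
    return totals


def depth(parent, node):
    """Number of edges between a node and the root of its tree."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def candidate_facets(parent, term_counts,
                     min_share=0.2, max_share=0.8, min_depth=1):
    """Pick nodes that are neither too general nor too specific.

    Statistical filter: the node's subtree must cover a share of all term
    occurrences between min_share and max_share (assumed thresholds).
    Tree pruning: root-level nodes are dropped as too general, and a node
    is skipped when one of its ancestors has already been picked.
    """
    totals = subtree_counts(parent, term_counts)
    grand_total = sum(term_counts.values())
    picked = set()
    # Visit shallow nodes first so ancestors are considered before descendants.
    for node in sorted(parent, key=lambda n: (depth(parent, n), n)):
        if depth(parent, node) < min_depth:
            continue  # too general
        share = totals[node] / grand_total
        if not (min_share <= share <= max_share):
            continue  # too rare or too dominant
        ancestor = parent[node]
        while ancestor is not None and ancestor not in picked:
            ancestor = parent[ancestor]
        if ancestor is None:  # no picked ancestor covers this node
            picked.add(node)
    return sorted(picked)


print(candidate_facets(PARENT, COUNTS))
# ['delivery', 'pregnancy complications']
```

In this toy run the root nodes are rejected as too general, the leaves are either too rare or already covered by a picked ancestor, and the two mid-level nodes remain as candidate facets.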
first_indexed 2024-12-11T17:37:07Z
format Article
id doaj.art-e17bbafc45a24a58ac40727249821607
institution Directory Open Access Journal
issn 2251-8223
2251-8231
language fas
last_indexed 2024-12-11T17:37:07Z
publishDate 2022-03-01
publisher Iranian Research Institute for Information and Technology
record_format Article
series Iranian Journal of Information Processing & Management
spelling doaj.art-e17bbafc45a24a58ac40727249821607
2022-12-22T00:56:39Z
fas
Iranian Research Institute for Information and Technology
Iranian Journal of Information Processing & Management
2251-8223
2251-8231
2022-03-01
Volume 37, Issue 3, pp. 807-838
Introducing a novel method for Automatic facet extraction in the faceted search (Case Study: gynecology and obstetrics domain)
Abdolhossein Farajpahlou (School of Education & Psychology; Shahid Chamran University; Ahvaz, Iran)
Farideh Osareh (School of Education & Psychology; Shahid Chamran University; Ahvaz, Iran)
Seyed Mostafa Fakhrahmad (Department of Computer Science and Engineering & IT; Shiraz University; Shiraz, Iran)
Leila Dehghani (Medical librarianship group, Department of Paramedical Medicine; Bushehr University of Medical Sciences; Bushehr, Iran)
http://jipm.irandoc.ac.ir/article-1-4690-en.html
data retrieval
facet
faceted search
automatic facet extraction
title Introducing a novel method for Automatic facet extraction in the faceted search (Case Study: gynecology and obstetrics domain)
topic data retrieval
facet
faceted search
automatic facet extraction
url http://jipm.irandoc.ac.ir/article-1-4690-en.html