Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation


Bibliographic Details
Main Authors: Maryam Qamar, Suleman Qamar, Muhammad Muneeb, Sung-Ho Bae, Anis Rahman
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Subjects: Saliency; audiovisual; uncategorized videos; spatio–temporal
Online Access: https://ieeexplore.ieee.org/document/10042304/
Description: Substantial research has been devoted to saliency modeling, with the goal of building intelligent machines that can perceive and interpret their surroundings and focus only on the salient regions of a visual scene. However, existing spatio–temporal saliency models either treat videos as mere image sequences, ignoring any audio information, or cannot cope with inherently varying content. Based on the hypothesis that an audiovisual saliency model will outperform traditional spatio–temporal saliency models, this work provides a generic preliminary audiovisual saliency model. It augments the visual saliency map with an audio saliency map computed by synchronizing low-level audio and visual features. The proposed model was evaluated against eye-fixation data, using several criteria, on the publicly available DIEM video dataset. The results show that the model outperforms two state-of-the-art visual spatio–temporal saliency models, supporting the hypothesis that an audiovisual model performs better than a purely visual model on natural, uncategorized videos.
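The core idea above, augmenting a visual saliency map with an audio saliency map, can be sketched as a simple fusion step. This is a minimal illustrative sketch only, assuming min–max normalization and a weighted sum with a hypothetical balance parameter `alpha`; the paper's actual fusion rule and its synchronization of low-level audio and visual features are not reproduced here.

```python
import numpy as np

def fuse_saliency(visual_map, audio_map, alpha=0.5):
    """Fuse a per-frame visual saliency map with an audio saliency map.

    Both maps are min-max normalized to [0, 1], then combined with a
    weighted sum; `alpha` balances the two modalities. Illustrative
    sketch only, not the authors' published formulation.
    """
    def normalize(m):
        m = m.astype(np.float64)
        rng = m.max() - m.min()
        # Flat maps carry no saliency signal; map them to all zeros.
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

    v = normalize(visual_map)
    a = normalize(audio_map)
    return alpha * v + (1.0 - alpha) * a

# Toy frame-sized maps standing in for real per-frame saliency outputs.
rng = np.random.default_rng(0)
visual = rng.random((4, 4))
audio = rng.random((4, 4))
fused = fuse_saliency(visual, audio, alpha=0.6)
```

Since both inputs are normalized to [0, 1] and the weights sum to one, the fused map stays in [0, 1], which keeps it directly comparable against fixation-density maps during evaluation.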
Citation: IEEE Access, vol. 11, pp. 15460–15470, 2023. ISSN: 2169-3536. DOI: 10.1109/ACCESS.2023.3244191.

Author Affiliations:
Maryam Qamar (ORCID: 0000-0003-0774-5411), Department of Computing, National University of Science and Technology, Islamabad, Pakistan
Suleman Qamar (ORCID: 0000-0001-6528-1681), Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan
Muhammad Muneeb (ORCID: 0000-0002-6506-4430), Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Sung-Ho Bae (ORCID: 0000-0003-2677-3186), Department of Computer Science and Engineering, Kyung Hee University, Seoul, South Korea
Anis Rahman (ORCID: 0000-0002-8306-475X), Department of Computing, National University of Science and Technology, Islamabad, Pakistan