Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation
Substantial research has been done in saliency modeling to make intelligent machines that can perceive and interpret their surroundings and focus only on the salient regions in a visual scene. But existing spatio–temporal saliency models either treat videos as merely image sequences, excluding any audio information, or are unable to cope with inherently varying content…
Main Authors: | Maryam Qamar, Suleman Qamar, Muhammad Muneeb, Sung-Ho Bae, Anis Rahman |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Saliency, audiovisual, uncategorized videos, spatio–temporal |
Online Access: | https://ieeexplore.ieee.org/document/10042304/ |
_version_ | 1797900217538838528 |
---|---|
author | Maryam Qamar; Suleman Qamar; Muhammad Muneeb; Sung-Ho Bae; Anis Rahman |
author_facet | Maryam Qamar; Suleman Qamar; Muhammad Muneeb; Sung-Ho Bae; Anis Rahman |
author_sort | Maryam Qamar |
collection | DOAJ |
description | Substantial research has been done in saliency modeling to make intelligent machines that can perceive and interpret their surroundings and focus only on the salient regions in a visual scene. But existing spatio–temporal saliency models either treat videos as merely image sequences, excluding any audio information, or are unable to cope with inherently varying content. Based on the hypothesis that an audiovisual saliency model will perform better than traditional spatio–temporal saliency models, this work aims to provide a generic, preliminary audiovisual saliency model. This is achieved by augmenting the visual saliency map with an audio saliency map computed by synchronizing low-level audio and visual features. The proposed model was evaluated using different criteria against eye-fixation data for the publicly available DIEM video dataset. The evaluation results show that the model outperforms two state-of-the-art visual spatio–temporal saliency models, supporting our hypothesis that an audiovisual model performs better than a purely visual model for natural, uncategorized videos. |
first_indexed | 2024-04-10T08:42:30Z |
format | Article |
id | doaj.art-53886144739a4fd883aa1c45ea4bb703 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-10T08:42:30Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-53886144739a4fd883aa1c45ea4bb703; 2023-02-23T00:00:33Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; vol. 11, pp. 15460–15470; 10.1109/ACCESS.2023.3244191; 10042304; Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation; Maryam Qamar (0) https://orcid.org/0000-0003-0774-5411; Suleman Qamar (1) https://orcid.org/0000-0001-6528-1681; Muhammad Muneeb (2) https://orcid.org/0000-0002-6506-4430; Sung-Ho Bae (3) https://orcid.org/0000-0003-2677-3186; Anis Rahman (4) https://orcid.org/0000-0002-8306-475X; Department of Computing, National University of Science and Technology, Islamabad, Pakistan; Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan; Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Department of Computer Science and Engineering, Kyung Hee University, Seoul, South Korea; Department of Computing, National University of Science and Technology, Islamabad, Pakistan; https://ieeexplore.ieee.org/document/10042304/; Saliency; audiovisual; uncategorized videos; spatio–temporal |
spellingShingle | Maryam Qamar; Suleman Qamar; Muhammad Muneeb; Sung-Ho Bae; Anis Rahman; Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation; IEEE Access; Saliency; audiovisual; uncategorized videos; spatio–temporal |
title | Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation |
title_full | Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation |
title_fullStr | Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation |
title_full_unstemmed | Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation |
title_short | Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation |
title_sort | saliency prediction in uncategorized videos based on audio visual correlation |
topic | Saliency; audiovisual; uncategorized videos; spatio–temporal |
url | https://ieeexplore.ieee.org/document/10042304/ |
work_keys_str_mv | AT maryamqamar saliencypredictioninuncategorizedvideosbasedonaudiovisualcorrelation AT sulemanqamar saliencypredictioninuncategorizedvideosbasedonaudiovisualcorrelation AT muhammadmuneeb saliencypredictioninuncategorizedvideosbasedonaudiovisualcorrelation AT sunghobae saliencypredictioninuncategorizedvideosbasedonaudiovisualcorrelation AT anisrahman saliencypredictioninuncategorizedvideosbasedonaudiovisualcorrelation |