Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection
TV commercial detection is a challenging problem due to the wide variety of programs and TV channels. Deep learning methods have shown good results on this problem, but they require long training times with many epochs to reach high accuracy. This research uses transfer learning techniques to reduce training time and limits the number of training epochs to 20.
Main Authors: | Muhammad Zha'farudin Pudya Wardana, Moh. Edi Wibowo |
---|---|
Format: | Article |
Language: | English |
Published: | Universitas Gadjah Mada, 2023-07-01 |
Series: | IJCCS (Indonesian Journal of Computing and Cybernetics Systems) |
Subjects: | commercial, tv, cnn, transfer learning, inceptionv3, mobilenetv2, densenet169, video |
Online Access: | https://jurnal.ugm.ac.id/ijccs/article/view/76058 |
_version_ | 1797681585355489280 |
---|---|
author | Muhammad Zha'farudin Pudya Wardana; Moh. Edi Wibowo
author_facet | Muhammad Zha'farudin Pudya Wardana; Moh. Edi Wibowo
author_sort | Muhammad Zha'farudin Pudya Wardana |
collection | DOAJ |
description | TV commercial detection is a challenging problem due to the wide variety of programs and TV channels. Deep learning methods have shown good results on this problem, but they require long training times with many epochs to reach high accuracy.
This research uses transfer learning techniques to reduce training time and limits the number of training epochs to 20. From the video data, audio features are extracted as Mel-spectrogram representations and visual features are taken from the video frames. The datasets were gathered by recording programs from various TV channels in Indonesia. Pre-trained CNN models, namely MobileNetV2, InceptionV3, and DenseNet169, are re-trained and used to detect commercials at the shot level. Post-processing then clusters the shots into commercial and non-commercial segments.
The best result is achieved by the Audio-Visual CNN with transfer learning: 93.26% accuracy after only 20 training epochs. This is both faster and more accurate than the same CNN trained without transfer learning, which reaches 88.17% accuracy after 77 training epochs. Adding post-processing further raises the accuracy of the Audio-Visual CNN with transfer learning to 96.42%. |
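The record itself contains no code, but the description above outlines a concrete pipeline: Mel-spectrogram audio features, pre-trained CNN backbones re-trained at the shot level, and a 20-epoch training budget. The minimal sketch below illustrates that general technique, not the authors' actual implementation. It assumes librosa for the Mel-spectrogram, a MobileNetV2 backbone with frozen ImageNet weights, 224x224 frame inputs, and a binary commercial/non-commercial head; `train_ds`, `val_ds`, and the audio path are hypothetical placeholders.

```python
# Sketch of the pipeline described in the abstract: Mel-spectrogram audio features
# plus a pre-trained CNN (here MobileNetV2) fine-tuned for at most 20 epochs.
# Library choices (librosa, TensorFlow/Keras) and all parameters are assumptions,
# not details taken from the paper.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models


def mel_spectrogram(audio_path, sr=22050, n_mels=128):
    """Load an audio clip and return its log-scaled Mel-spectrogram (dB)."""
    y, _ = librosa.load(audio_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)


def build_visual_branch(input_shape=(224, 224, 3)):
    """MobileNetV2 pre-trained on ImageNet with its convolutional base frozen,
    so only the new classification head is trained (transfer learning)."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    base.trainable = False  # freeze the pre-trained weights
    inputs = layers.Input(shape=input_shape)
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    # Binary head: commercial vs. non-commercial shot (an assumption for this sketch).
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inputs, outputs)


model = build_visual_branch()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=20)  # 20-epoch budget from the abstract
```

An audio branch could process the Mel-spectrogram image with a similar frozen backbone, with the two branches' shot-level predictions combined afterwards; how the fusion and the segment-level post-processing are performed is not specified in this record.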
first_indexed | 2024-03-11T23:47:58Z |
format | Article |
id | doaj.art-662fe5728a684da7867c3e7314b1559d |
institution | Directory Open Access Journal |
issn | 1978-1520; 2460-7258
language | English |
last_indexed | 2024-03-11T23:47:58Z |
publishDate | 2023-07-01 |
publisher | Universitas Gadjah Mada |
record_format | Article |
series | IJCCS (Indonesian Journal of Computing and Cybernetics Systems) |
spelling | doaj.art-662fe5728a684da7867c3e7314b1559d; 2023-09-19T08:56:00Z; eng; Universitas Gadjah Mada; IJCCS (Indonesian Journal of Computing and Cybernetics Systems); ISSN 1978-1520, 2460-7258; 2023-07-01; Vol. 17, No. 3, pp. 291-300; doi:10.22146/ijccs.76058; 34197; Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection; Muhammad Zha'farudin Pudya Wardana (Master Program in Computer Science, FMIPA UGM, Yogyakarta); Moh. Edi Wibowo (Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta); TV commercial detection is a challenging problem due to the wide variety of programs and TV channels. Deep learning methods have shown good results on this problem, but they require long training times with many epochs to reach high accuracy. This research uses transfer learning techniques to reduce training time and limits the number of training epochs to 20. From the video data, audio features are extracted as Mel-spectrogram representations and visual features are taken from the video frames. The datasets were gathered by recording programs from various TV channels in Indonesia. Pre-trained CNN models, namely MobileNetV2, InceptionV3, and DenseNet169, are re-trained and used to detect commercials at the shot level. Post-processing then clusters the shots into commercial and non-commercial segments. The best result is achieved by the Audio-Visual CNN with transfer learning: 93.26% accuracy after only 20 training epochs. This is both faster and more accurate than the same CNN trained without transfer learning, which reaches 88.17% accuracy after 77 training epochs. Adding post-processing further raises the accuracy of the Audio-Visual CNN with transfer learning to 96.42%.; https://jurnal.ugm.ac.id/ijccs/article/view/76058; commercial, tv, cnn, transfer learning, inceptionv3, mobilenetv2, densenet169, video |
spellingShingle | Muhammad Zha'farudin Pudya Wardana; Moh. Edi Wibowo; Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection; IJCCS (Indonesian Journal of Computing and Cybernetics Systems); commercial, tv, cnn, transfer learning, inceptionv3, mobilenetv2, densenet169, video |
title | Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection |
title_full | Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection |
title_fullStr | Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection |
title_full_unstemmed | Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection |
title_short | Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection |
title_sort | audio visual cnn using transfer learning for tv commercial break detection |
topic | commercial, tv, cnn, transfer learning, inceptionv3, mobilenetv2, densenet169, video |
url | https://jurnal.ugm.ac.id/ijccs/article/view/76058 |
work_keys_str_mv | AT muhammadzhafarudinpudyawardana audiovisualcnnusingtransferlearningfortvcommercialbreakdetection AT mohediwibowo audiovisualcnnusingtransferlearningfortvcommercialbreakdetection |