Transformer-based cascade networks with spatial and channel reconstruction convolution for deepfake detection

The threat posed by forged-video technology has gradually grown to affect individuals, society, and the nation. The technology behind fake videos is becoming ever more advanced, and fake videos are appearing everywhere on the internet. Consequently, it is imperative to address the challenges posed by the frequent updates required by deepfake detection models and by the substantial volume of data needed to train them. For the deepfake detection problem, we propose a cascade network based on spatial and channel reconstruction convolution (SCConv) and the vision transformer. The front portion of the network combines SCConv with regular convolution and works in conjunction with the vision transformer to detect fake videos. We also enhance the feed-forward layer of the vision transformer, which increases detection accuracy while lowering the model's computational burden. We processed the datasets by splitting videos into frames and extracting faces, yielding many images of real and fake faces. Experiments on the DFDC, FaceForensics++, and Celeb-DF datasets achieved accuracies of 87.92%, 99.23%, and 99.98%, respectively. Finally, videos were tested for authenticity with good results, including clear visualization results. Extensive experiments further confirm the efficacy of the model presented in this study.
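The abstract only outlines the architecture, so the following is a structural sketch rather than the authors' implementation: a convolutional front end (with a plain convolution standing in for SCConv, whose internals are not described here) feeding a standard transformer encoder for real/fake classification. All module names, layer counts, and sizes (SCConvStandIn, d_model=256, four encoder layers, 224x224 input) are illustrative assumptions.

# Structural sketch only; hyper-parameters and the SCConv stand-in are
# assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn


class SCConvStandIn(nn.Module):
    """Placeholder for the SCConv block; here just a plain strided conv stage."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class CascadeDeepfakeNet(nn.Module):
    """Convolutional front end cascaded into a transformer encoder."""
    def __init__(self, num_classes=2, d_model=256, img_size=224):
        super().__init__()
        # Front end: regular convolutions interleaved with the SCConv stand-in,
        # downsampling a 224x224 RGB face crop to a 14x14 feature map.
        self.front = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            SCConvStandIn(64, 128),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            SCConvStandIn(256, d_model),
        )
        num_tokens = (img_size // 16) ** 2           # 14 * 14 = 196 tokens
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, dim_feedforward=512, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                            # x: (B, 3, 224, 224)
        feats = self.front(x)                        # (B, d_model, 14, 14)
        tokens = feats.flatten(2).transpose(1, 2)    # (B, 196, d_model)
        tokens = self.encoder(tokens + self.pos_embed)
        return self.head(tokens.mean(dim=1))         # mean-pool instead of a CLS token


if __name__ == "__main__":
    logits = CascadeDeepfakeNet()(torch.randn(2, 3, 224, 224))
    print(logits.shape)                              # torch.Size([2, 2])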

Full details

Bibliographic details
Main authors: Xue Li, Huibo Zhou, Ming Zhao
Affiliation: School of Mathematical Sciences, Harbin Normal University, Harbin 150025, China
Material type: Article
Language: English
Published: AIMS Press, 2024-02-01
Series: Mathematical Biosciences and Engineering, Vol. 21, No. 3, pp. 4142-4164
ISSN: 1551-0018
DOI: 10.3934/mbe.2024183
Subjects: deepfake detection; SCConv; transformer; redundant; visualization
Links: https://www.aimspress.com/article/doi/10.3934/mbe.2024183?viewType=HTML
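As an illustration of the frame-splitting and face-extraction preprocessing mentioned in the abstract, a minimal Python sketch follows. The face detector (OpenCV's bundled Haar cascade), the sampling interval, and the 224x224 crop size are assumptions for illustration; the record does not specify which detector or frame rate the authors used.

# Minimal preprocessing sketch; detector and sampling choices are assumptions.
import os
import cv2


def extract_faces(video_path, out_dir, every_n_frames=10, size=224):
    """Sample frames from a video, detect faces, and save resized face crops."""
    os.makedirs(out_dir, exist_ok=True)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(video_path)
    frame_idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
                face = cv2.resize(frame[y:y + h, x:x + w], (size, size))
                cv2.imwrite(os.path.join(out_dir, f"face_{saved:05d}.jpg"), face)
                saved += 1
        frame_idx += 1
    cap.release()
    return saved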