The Same Name Is Not Always the Same: Correlating and Tracing Forgery Methods across Various Deepfake Datasets

Bibliographic Details
Main Authors: Yi Sun, Jun Zheng, Lingjuan Lyn, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yuanzhang Li
Format: Article
Language: English
Published: MDPI AG, 2023-05-01
Series: Electronics
Subjects: deepfake, datasets, correlation, traceability, clustering, Calinski Harabasz
Online Access:https://www.mdpi.com/2079-9292/12/11/2353
collection DOAJ
description Deepfakes are becoming increasingly ubiquitous, particularly in facial manipulation. Numerous researchers and companies have released datasets of face deepfakes whose labels indicate different forgery methods. However, these labels are often named arbitrarily and inconsistently, so most researchers now use only one of the datasets in their work. Yet practical applications and traceability research require researchers to work with these datasets. In this study, we use several models to extract forgery features from various deepfake datasets and apply K-means clustering to identify datasets with similar feature values. We analyze the feature values using the Calinski-Harabasz index. Our findings reveal that forgery subsets with the same or similar labels in different deepfake datasets exhibit different forgery features. To address this problem, we propose the KCE system, which combines multiple deepfake datasets according to feature similarity. Across four groups of test datasets, when facing unknown data types, the model trained on KCE-combined data achieved a Calinski-Harabasz score 42.3% higher than a model trained on data combined by forgery name, and 2.5% higher than a model trained on all the data, even though the latter had more training data. This indicates that the method improves the model's generalization ability. This paper offers a new perspective on effectively evaluating and using diverse deepfake datasets and on conducting deepfake traceability research.
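The abstract names two concrete tools, K-means clustering and the Calinski-Harabasz (CH) index, and contrasts grouping data by forgery name with grouping it by feature similarity. The sketch below is a minimal, dependency-free illustration of that contrast on synthetic data; it is not the authors' KCE system. CH is computed as [B / (k - 1)] / [W / (n - k)], where B is the between-group dispersion, W the within-group dispersion, k the number of groups and n the number of samples; higher values mean tighter, better-separated groups. The dataset names ("DatasetA", "DatasetB"), the labels ("swap", "reenact"), the 2-D feature values and the majority-vote merge rule are all hypothetical. A real pipeline would more likely use scikit-learn's `KMeans` and `calinski_harabasz_score` on features extracted by a trained detector.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> List[float]:
    """Component-wise mean of a non-empty collection of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new == labels:
            break  # converged: assignments stopped changing
        labels = new
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean(members)
    return labels


def calinski_harabasz(points: Sequence[Point], labels: Sequence[int]) -> float:
    """CH = [B / (k - 1)] / [W / (n - k)]; higher is better."""
    n = len(points)
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of groups <= n - 1")
    overall = mean(points)
    between = within = 0.0
    for members in groups.values():
        centre = mean(members)
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in members)
    if within == 0.0:
        return float("inf")  # perfectly tight groups: the ratio is unbounded
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy features: the same label name hides different features.
subsets: Dict[Tuple[str, str], List[Tuple[float, float]]] = {
    ("DatasetA", "swap"): [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1)],
    ("DatasetB", "swap"): [(5.0, 5.1), (5.1, 5.0), (4.9, 5.0)],
    ("DatasetA", "reenact"): [(5.2, 4.9), (5.0, 4.8), (4.8, 5.2)],
    ("DatasetB", "reenact"): [(0.1, 0.2), (0.0, 0.0), (0.2, 0.2)],
}

points = [p for pts in subsets.values() for p in pts]
keys = [key for key, pts in subsets.items() for _ in pts]
names = sorted({name for _, name in subsets})
by_name = [names.index(name) for _, name in keys]  # group by label name
by_cluster = kmeans(points, k=2)                   # group by feature similarity

ch_name = calinski_harabasz(points, by_name)
ch_cluster = calinski_harabasz(points, by_cluster)

# Hypothetical merge rule: each subset joins the cluster most of its samples fall in.
merged: Dict[int, List[Tuple[str, str]]] = {}
for key in subsets:
    votes = Counter(c for sample_key, c in zip(keys, by_cluster) if sample_key == key)
    merged.setdefault(votes.most_common(1)[0][0], []).append(key)

print(f"CH grouped by label name: {ch_name:.2f}")
print(f"CH grouped by K-means:    {ch_cluster:.2f}")
for cluster, members in sorted(merged.items()):
    print(cluster, members)
```

With these toy values, the K-means grouping scores far higher than the name-based grouping, and the merge step pairs DatasetA's "swap" subset with DatasetB's "reenact" subset, which mirrors the title's point that the same name is not always the same forgery. Farthest-first initialisation is used only to keep the example deterministic; it is one of several common K-means seeding choices.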
id doaj.art-3bcce54c660245e79c772d24b180b332
institution Directory of Open Access Journals
issn 2079-9292
doi 10.3390/electronics12112353
citation Electronics 2023, 12(11), 2353
affiliation Beijing Institute of Technology, No. 5, South Street, Zhongguancun, Haidian District, Beijing 100811, China (Yi Sun, Jun Zheng, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yuanzhang Li)
Sony AI Inc., 1-7-1 Konan Minato-ku, Tokyo 108-0075, Japan (Lingjuan Lyn)
topic deepfake
datasets
correlation
traceability
clustering
Calinski Harabasz