The Same Name Is Not Always the Same: Correlating and Tracing Forgery Methods across Various Deepfake Datasets

Deepfakes are becoming increasingly ubiquitous, particularly in facial manipulation. Numerous researchers and companies have released multiple datasets of face deepfakes labeled to indicate different methods of forgery. However, naming these labels is often arbitrary and inconsistent, leading to the...


Bibliographic Details
Main Authors:	Yi Sun, Jun Zheng, Lingjuan Lyn, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yuanzhang Li
Format:	Article
Language:	English
Published:	MDPI AG 2023-05-01
Series:	Electronics
Subjects:	deepfake datasets correlation traceability clustering Calinski Harabasz
Online Access:	https://www.mdpi.com/2079-9292/12/11/2353

_version_	1797597714067750912
author	Yi Sun Jun Zheng Lingjuan Lyn Hanyu Zhao Jiaxing Li Yunteng Tan Xinyu Liu Yuanzhang Li
author_sort	Yi Sun
collection	DOAJ
description	Deepfakes are becoming increasingly ubiquitous, particularly in facial manipulation. Researchers and companies have released numerous face deepfake datasets whose labels indicate the forgery method used. However, these labels are often named arbitrarily and inconsistently, so most researchers now use only one of the datasets in their work. Yet practical applications and traceability research require working with these datasets together. In this study, we employ a set of models to extract forgery features from various deepfake datasets and use the K-means clustering method to identify datasets with similar feature values. We analyze the feature values using the Calinski-Harabasz index. Our findings reveal that data with the same or similar labels in different deepfake datasets exhibit different forgery features. To solve this problem, we propose the KCE system, which combines multiple deepfake datasets according to feature similarity. We analyzed four groups of test datasets and found that, when faced with unknown data types, the model trained on KCE-combined data achieved a Calinski-Harabasz score 42.3% higher than the model trained on data combined by forgery name. Furthermore, its score is 2.5% higher than that of the model trained on all data, although the latter has more training data. This shows that the method improves the generalization ability of the model. This paper introduces a fresh perspective for effectively evaluating and utilizing diverse deepfake datasets and for conducting deepfake traceability research.
first_indexed	2024-03-11T03:09:23Z
format	Article
id	doaj.art-3bcce54c660245e79c772d24b180b332
institution	Directory Open Access Journal
issn	2079-9292
language	English
last_indexed	2024-03-11T03:09:23Z
publishDate	2023-05-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-3bcce54c660245e79c772d24b180b332; 2023-11-18T07:43:49Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2023-05-01; vol. 12, no. 11, art. 2353; DOI 10.3390/electronics12112353; The Same Name Is Not Always the Same: Correlating and Tracing Forgery Methods across Various Deepfake Datasets; Authors: Yi Sun, Jun Zheng, Lingjuan Lyn, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yuanzhang Li; Affiliations: Beijing Institute of Technology, No. 5, South Street, Zhongguancun, Haidian District, Beijing 100811, China (Yi Sun, Jun Zheng, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yuanzhang Li); Sony AI Inc., 1-7-1 Konan Minato-ku, Tokyo 108-0075, Japan (Lingjuan Lyn); https://www.mdpi.com/2079-9292/12/11/2353
title	The Same Name Is Not Always the Same: Correlating and Tracing Forgery Methods across Various Deepfake Datasets
topic	deepfake datasets correlation traceability clustering Calinski Harabasz
url	https://www.mdpi.com/2079-9292/12/11/2353
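
The description above outlines a pipeline of feature extraction, K-means clustering, and scoring with the Calinski-Harabasz index. The following is a minimal sketch of the last two steps only, written in plain Python. It is not the authors' KCE system: the dataset names ("A/FaceSwap" and so on), the two-dimensional feature vectors, and all function names are hypothetical, chosen only to illustrate how a grouping by feature similarity can score higher than a grouping by label name.

```python
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points: Sequence[Vector], k: int, iters: int = 50) -> List[int]:
    """Plain Lloyd's K-means. Centroids start at the first k points,
    a simplification made for this sketch (real code would use a better init)."""
    centroids = [list(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels or []


def calinski_harabasz(points: Sequence[Vector], labels: Sequence[int]) -> float:
    """Ratio of between-cluster to within-cluster dispersion,
    each divided by its degrees of freedom (k - 1 and n - k)."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = mean(points)
    between = 0.0
    within = 0.0
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        centroid = mean(members)
        between += len(members) * sq_dist(centroid, overall)
        within += sum(sq_dist(p, centroid) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical per-label feature vectors (toy values, not taken from the paper).
features: Dict[str, List[float]] = {
    "A/FaceSwap": [0.0, 0.1],
    "B/FaceSwap": [5.0, 5.1],
    "A/Deepfakes": [5.1, 4.9],
    "B/Deepfakes": [0.2, 0.0],
}
names = list(features)
points = [features[n] for n in names]

# Grouping by feature similarity versus grouping by the label name alone.
feature_labels = kmeans(points, k=2)
name_labels = [0 if n.endswith("FaceSwap") else 1 for n in names]

print("by features:", calinski_harabasz(points, feature_labels))
print("by name:    ", calinski_harabasz(points, name_labels))
```

In this toy setup the two "FaceSwap" entries sit far apart in feature space, so grouping by name yields a much lower Calinski-Harabasz score than grouping by K-means assignment. The sketch only illustrates the scoring idea; the figures reported in the description come from the authors' own experiments.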


Similar Items