An experimental study of animating-based facial image manipulation in online class environments

Abstract: Recent advances in artificial intelligence have significantly improved facial image manipulation, commonly known as Deepfake. Facial image manipulation synthesizes or replaces a region of a face in an image with that of another face. Techniques for facial image manipulation fall into four categories: (1) entire face synthesis, (2) identity swap, (3) attribute manipulation, and (4) expression swap. Of these, we focus on expression swap because it manipulates only the facial expression in an image or video without creating or replacing the entire face, which makes it well suited to real-time applications. In this study, we propose an evaluation framework for expression swap models targeting real-time online class environments. We define three scenarios according to the portion of the image occupied by the face, reflecting actual online class situations: (1) attendance check (Scenario 1), (2) presentation (Scenario 2), and (3) examination (Scenario 3). To model manipulation in online class environments, the framework receives a single source image and a target video and generates a video in which the face in the target video is manipulated into the face in the source image. To this end, we select two models that satisfy the conditions required by the framework: (1) the first order motion model and (2) GANimation. We implement both models in the framework and evaluate their performance under the defined scenarios. Through quantitative and qualitative evaluation, we observe distinguishing properties of the two models. Both models show acceptable results in Scenario 1, where the face occupies a large portion of the image, but their performance degrades significantly in Scenarios 2 and 3, where the face occupies a smaller portion; in the quantitative evaluation, the first order motion model causes less loss of image quality than GANimation, whereas GANimation represents changes in facial expression better than the first order motion model. Finally, we devise an architecture for applying the expression swap model to online video conferencing applications in real time. By applying the model to widely used meeting platforms such as Zoom, Google Meet, and Microsoft Teams, we demonstrate its feasibility for real-time online classes.
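The abstract specifies the framework's input/output contract: a single source image plus a target video in, a manipulated video out. The sketch below illustrates that contract in Python with OpenCV; `swap_expression` is a hypothetical stand-in for a trained per-frame model such as the first order motion model or GANimation, not the authors' code.

```python
# Minimal sketch of the framework's contract: one source image and a target
# video in, a manipulated video out. The model call is a hypothetical stub.
import cv2
import numpy as np

def swap_expression(source_img: np.ndarray, target_frame: np.ndarray) -> np.ndarray:
    """Hypothetical per-frame expression swap; a real model (e.g., the first
    order motion model or GANimation) would animate source_img with the
    target frame's motion."""
    return target_frame  # identity pass-through placeholder

def manipulate_video(source_path: str, target_path: str, output_path: str) -> None:
    source_img = cv2.imread(source_path)
    cap = cv2.VideoCapture(target_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(swap_expression(source_img, frame))
    cap.release()
    writer.release()
```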

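The record does not name the metrics behind the quantitative image-quality comparison, so the following is an assumed evaluation step using two standard full-reference metrics, PSNR and SSIM (via scikit-image), not necessarily the paper's protocol.

```python
# Assumed per-frame quality measurement between an original and a manipulated
# frame; PSNR and SSIM are standard choices, not confirmed as the paper's metrics.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(reference: np.ndarray, generated: np.ndarray) -> tuple[float, float]:
    psnr = peak_signal_noise_ratio(reference, generated)
    ssim = structural_similarity(reference, generated, channel_axis=-1)  # color frames
    return psnr, ssim
```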
Bibliographic Details
Main Authors: Jeong-Ha Park (Graduate School of Data Science, Seoul National University of Science and Technology); Chae-Yun Lim (Department of Industrial Engineering, Seoul National University of Science and Technology); Hyuk-Yoon Kwon (Department of Industrial Engineering / Graduate School of Data Science / Research Center for Electrical and Information Science, Seoul National University of Science and Technology)
Format: Article
Language: English
Published: Nature Portfolio, 2023-03-01
Series: Scientific Reports, Vol. 13, Iss. 1, Pp. 1-12 (2023)
ISSN: 2045-2322
Online Access: https://doi.org/10.1038/s41598-023-31408-y
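The abstract's final contribution is a real-time architecture that feeds the manipulated stream into Zoom, Google Meet, or Microsoft Teams. A common way to realize such an architecture is to publish the generated frames as a virtual webcam that the meeting client selects as its video source; the sketch below assumes the pyvirtualcam package and reuses the hypothetical swap_expression stub, since the record does not state the authors' actual integration mechanism.

```python
# Assumed real-time path: physical webcam -> per-frame expression swap ->
# virtual camera visible to Zoom / Google Meet / Microsoft Teams.
import cv2
import numpy as np
import pyvirtualcam  # assumed integration mechanism, not confirmed by the record

def swap_expression(source_img: np.ndarray, frame: np.ndarray) -> np.ndarray:
    return frame  # hypothetical model stub (see the offline sketch above)

def stream(source_img_path: str, width: int = 640, height: int = 480, fps: int = 20) -> None:
    source_img = cv2.imread(source_img_path)
    cap = cv2.VideoCapture(0)  # physical webcam
    with pyvirtualcam.Camera(width=width, height=height, fps=fps) as cam:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            out = swap_expression(source_img, cv2.resize(frame, (width, height)))
            cam.send(cv2.cvtColor(out, cv2.COLOR_BGR2RGB))  # pyvirtualcam expects RGB
            cam.sleep_until_next_frame()  # pace output to the declared fps
```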