Detecting Deepfakes with Human Help to Help Humans Detect Deepfakes

Bibliographic Details
Main Author: Fosco, Camilo L.
Other Authors: Oliva, Aude
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/154206
Description
Summary: Fake or manipulated video media (“deepfakes”) pose a clear threat to the integrity of online spaces that rely on video, from social media to news media to video conferencing platforms. To the human eye, these computer-generated fake videos are increasingly indistinguishable from genuine videos [45, 20]. Computer vision models, however, can achieve impressive success at deepfake detection. Thus, the future of deepfake detection for humans may become a problem of AI-assisted decision-making, where humans must incorporate the output of a machine learning model into their judgment process. Previous work on AI-assisted decision-making indicates that the design and format of a decision aid strongly determine whether it will impact human behavior [66, 60, 14, 26, 4]. In the domain of deepfake signaling, traditional methods of flagging manipulated video have relied on text-based prompts. However, recent studies indicate relatively low rates of compliance when the model’s prediction is conveyed as text: in one study, participants shown model predictions via text updated their response only 24% of the time, and switched their response (from “real” to “fake”, or vice versa) only 12% of the time [20]. More innovative approaches have been proposed, such as showing users a heatmap of regions predicted to be manipulated [8], but this did not increase acceptance rates relative to text-based indicators. Overall, to make an impact, the development of deepfake detection models must proceed alongside the exploration of innovative and effective ways to alert human users to a video’s authenticity.

In this thesis, we present an analysis of current solutions to this issue, and examine methodologies both for improving automated deepfake detection and for generating better indicators of doctored media to help humans spot deepfakes. To work towards this goal, we first collect human annotations that highlight parts of videos that humans find unnatural or indicative of doctoring. We use this data as additional supervision to train an artifact attention module that generates “heat volumes” highlighting areas of a deepfake video that evidence its fake nature. This module is in turn leveraged both to improve classifier performance and to generate our novel visual indicators (described below). This construction is integral to our exploration of how human annotations can augment attention-based deepfake detection techniques, and we investigate for the first time the feasibility of exacerbating artifacts in deepfake videos to facilitate early detection from a human perspective.

As the quality of doctored videos becomes more impressive, many generated fakes are indistinguishable from genuine videos to the human eye. We believe it is crucial for humans to be able to detect, at first glance, whether a video is doctored; this limits the spread of misinformation by stopping it at the source. We achieve this by proposing a new visual indicator of doctoring that we call deepfake caricatures: a targeted distortion that reveals the fake nature of deepfakes, while leaving real videos virtually untouched (see Figure 1-1). This targeted distortion takes the form of an amplification of unnatural areas in a fake video, dubbed artifacts in this manuscript. This thesis introduces a novel framework that provides strong classical deepfake detection, but crucially also creates this compelling visual indicator for fake videos by amplifying artifacts, making them more detectable to human observers.
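The following is a minimal sketch, in PyTorch, of how an artifact attention module of this kind could be supervised with human annotation heat volumes while also weighting a deepfake classifier. The toy 3D-convolutional backbone, module names, tensor shapes, and loss weighting are illustrative assumptions, not the thesis implementation.

# Minimal sketch: an attention module predicts a "heat volume" over a clip;
# the heat volume is supervised with human annotations and used to weight
# the features fed to a fake/real classifier. All names and shapes are
# illustrative assumptions, not the thesis code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArtifactAttention(nn.Module):
    """Predicts a per-voxel attention ("heat") volume over a video clip."""
    def __init__(self, in_channels=3, hidden=32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv3d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_heat = nn.Conv3d(hidden, 1, kernel_size=1)

    def forward(self, video):                        # video: (B, C, T, H, W)
        feats = self.encode(video)
        heat = torch.sigmoid(self.to_heat(feats))    # (B, 1, T, H, W) in [0, 1]
        return feats, heat

class DeepfakeClassifier(nn.Module):
    """Toy classifier whose pooled features are weighted by the heat volume."""
    def __init__(self, hidden=32):
        super().__init__()
        self.attention = ArtifactAttention(hidden=hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, video):
        feats, heat = self.attention(video)
        pooled = (feats * heat).mean(dim=(2, 3, 4))  # attention-weighted pooling
        logit = self.head(pooled).squeeze(-1)
        return logit, heat

def training_loss(logit, heat, label, human_heat, alpha=0.5):
    """Classification loss plus a term pulling the predicted heat volume
    toward human artifact annotations."""
    cls = F.binary_cross_entropy_with_logits(logit, label.float())
    attn = F.binary_cross_entropy(heat, human_heat)
    return cls + alpha * attn

if __name__ == "__main__":
    # Random tensors stand in for a batch of annotated clips.
    video = torch.rand(2, 3, 8, 64, 64)        # two 8-frame 64x64 clips
    label = torch.tensor([1, 0])               # 1 = fake, 0 = real
    human_heat = torch.rand(2, 1, 8, 64, 64)   # human annotation heat volumes
    model = DeepfakeClassifier()
    logit, heat = model(video)
    loss = training_loss(logit, heat, label, human_heat)
    loss.backward()
    print(float(loss))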
Because humans tend to be highly sensitive to distortions in faces, we hypothesize that amplifying artifacts is likely to yield a highly detectable and compelling visual indicator. We introduce a new model, “CariNet”, that identifies key artifacts in deepfakes using our novel Artifact Attention Module. This module leverages both human and machine supervision to learn which distortions are most relevant to humans. CariNet then generates deepfake caricatures using a Caricature Generation Module that magnifies unnatural areas in fake videos, making them more visible to human users. We make three primary contributions:

• We develop two annotation tools to (A) filter deepfakes according to their ease of detection, and (B) collect human annotations of fake and unnatural areas (artifacts) in doctored videos. This process yields a dataset of over 11K annotations across 1000 videos.

• We develop a framework for identifying video artifacts that are relevant to humans. Allowing our deepfake detector to leverage this information boosts its accuracy by more than 5%, showing that human supervision can improve deepfake detection models.

• We generate deepfake caricatures, and show in a user study that they increase human deepfake detection accuracy by up to 40% compared to non-signalled deepfakes.
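As a complementary illustration of the caricature idea described above, the sketch below exaggerates frame-to-frame changes in regions flagged by an attention heat volume, so that fake videos are visibly distorted while videos with near-zero heat are left essentially untouched. The difference-amplification strategy and the magnification factor are assumptions for illustration, not the thesis’s Caricature Generation Module.

# Minimal sketch of caricature-style amplification: exaggerate temporal
# changes wherever the attention heat volume flags likely artifacts.
# The strategy and magnification factor are illustrative assumptions.
import torch

def amplify_artifacts(video, heat, magnification=3.0):
    """
    video: (B, C, T, H, W) float tensor in [0, 1]
    heat:  (B, 1, T, H, W) attention volume in [0, 1]
           (e.g., the output of an artifact attention module)
    Returns a "caricatured" video in which frame-to-frame changes are
    exaggerated in proportion to the heat volume.
    """
    # Temporal difference between consecutive frames (zero for the first frame).
    prev = torch.cat([video[:, :, :1], video[:, :, :-1]], dim=2)
    diff = video - prev
    # Amplify differences only where the model flags artifacts.
    caricature = video + magnification * heat * diff
    return caricature.clamp(0.0, 1.0)

if __name__ == "__main__":
    video = torch.rand(1, 3, 8, 64, 64)
    heat = torch.zeros(1, 1, 8, 64, 64)
    heat[..., 20:40, 20:40] = 1.0     # pretend the model flagged this region
    out = amplify_artifacts(video, heat)
    print(out.shape)                  # torch.Size([1, 3, 8, 64, 64])

In this sketch, the clamp keeps the output a valid video, and a larger magnification makes flagged regions more conspicuous at the cost of fidelity inside those regions.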