Learning visual concepts with fewer human annotations


Bibliographic Details
Main Author: Ehrhardt, S
Other Authors: Vedaldi, A
Format: Thesis
Language: English
Published: 2020
Description
Summary: <p>This thesis explores the use of modern deep neural networks to learn visual concepts with fewer human annotations. While data is abundant and increasingly easy to collect, most deep learning methods require extensive human labelling for training, which is often costly and may require expert-level knowledge. In this thesis we explore alternatives to human labelling by considering synthetic data, as well as partially and completely unlabelled data. We study these alternatives within two visual concepts related to human-level intelligence: intuitive physics and object recognition. For the former, we focus on synthetic and unlabelled real sequences, while for the latter we focus on collections of images of natural categories that are scarcely annotated.</p> <p>The first part of this thesis explores the ability of recent convolutional neural networks to learn physics through long-term prediction. We first show that, from limited visual data, it is possible to make accurate future predictions of systems obeying Newton's laws of physics. We propose to work on more complex synthetic and real data than standard benchmarks. For real data, we also develop an automatic self-supervised labelling method. Second, we propose a meta-learning pipeline that infers environment properties from past experiences without any need for annotations; in particular, it captures causal concepts such as solidity. To conclude our work on physics, we explore physical plausibility. Inspired by recent research on neural physics, we propose a generative architecture with a structured latent space that works on unlabelled data. Our model is not only quantitatively superior to prior art but also demonstrates a deeper understanding of the relations between objects in a scene.</p> <p>The second part of this thesis is dedicated to object recognition. We consider the realistic scenario where data annotation is incomplete, i.e. many images are unlabelled.
We make contributions in three scenarios with decreasing levels of annotation. For each of them, we strongly advocate the systematic use of self-supervision as a pre-training step, as well as self-training in the final stage of training. We first introduce a novel semi-supervised learning method that alternates optimisation between labelled and unlabelled data. Albeit simple, our method can work with as few as ten labelled instances per class. Next, we develop a clustering method that is initialised from supervision on other datasets. Clustering is then performed using a new way of extracting pairwise similarity between two images: comparing the ranks of the most activated neurons in their network embeddings. Finally, we extend the use of pairwise similarity to the nearest neighbours in the network embedding space and propose a clustering method that works without any supervised signal on different types of data.</p>
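The rank-based pairwise similarity described above can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the function name `ranking_similarity`, the use of a simple top-k overlap fraction, and the choice of k are all assumptions made for clarity.

```python
import numpy as np

def ranking_similarity(z1, z2, k=5):
    """Similarity between two network embeddings, judged by comparing
    the indices of their k most activated neurons.

    Hypothetical sketch: the thesis compares neuron ranks; the exact
    matching rule and value of k may differ.
    """
    # Indices of the k largest activations in each embedding.
    top1 = set(np.argsort(-z1)[:k])
    top2 = set(np.argsort(-z2)[:k])
    # Fraction of shared top-k neurons: 1.0 means identical top-k sets.
    return len(top1 & top2) / k
```

A score like this can serve as a pseudo-label for clustering unlabelled images: a pair whose embeddings share their most activated neurons is treated as belonging to the same (unknown) class, without any ground-truth annotation.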