Unsupervised learning of clutter-resistant visual representations from natural videos

Bibliographic Details
Main Authors: Liao, Qianli, Leibo, Joel Z, Poggio, Tomaso
Format: Technical Report
Language: en_US
Published: Center for Brains, Minds and Machines (CBMM), arXiv 2015
Subjects: Object Recognition; Computer vision; Machine Learning; Artificial Intelligence
Online Access: http://hdl.handle.net/1721.1/100187
_version_ 1826208915405144064
author Liao, Qianli
Leibo, Joel Z
Poggio, Tomaso
author_facet Liao, Qianli
Leibo, Joel Z
Poggio, Tomaso
author_sort Liao, Qianli
collection MIT
description Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance, e.g., position, scale, and viewing angle [1, 2, 3]. Though the learning rules are not known, recent results [4, 5, 6] suggest the operation of an unsupervised temporal-association-based method, e.g., Földiák's trace rule [7]. Such methods exploit the temporal continuity of the visual world by assuming that visual experience over short timescales will tend to have invariant identity content. Thus, by associating representations of frames from nearby times, a representation that tolerates whatever transformations occurred in the video may be achieved. Many previous studies verified that such rules can work in simple situations without background clutter, but the presence of visual clutter has remained problematic for this approach. Here we show that temporal association based on large class-specific filters (templates) avoids the problem of clutter. Our system learns in an unsupervised way from natural videos gathered from the internet, and is able to perform a difficult unconstrained face-recognition task on natural images (Labeled Faces in the Wild [8]).
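The temporal-association idea the abstract refers to (Földiák's trace rule) can be sketched in a few lines: each unit's weights are nudged toward the current input in proportion to a temporally low-pass-filtered activity trace, so frames that occur close together in time come to drive the same units. This is a generic, minimal illustration of the trace rule only, not the paper's actual system (which relies on large class-specific templates to handle clutter); all function names, parameter values, and the toy "video" below are illustrative assumptions.

```python
import numpy as np

def trace_rule_learning(frames, n_units=4, alpha=0.05, delta=0.2, seed=0):
    """Minimal sketch of a trace rule: Hebbian updates gated by a
    temporally smoothed activity trace rather than the instantaneous
    response, associating temporally adjacent frames."""
    rng = np.random.default_rng(seed)
    dim = frames.shape[1]
    W = rng.normal(scale=0.1, size=(n_units, dim))
    trace = np.zeros(n_units)
    for x in frames:
        y = W @ x                                   # instantaneous responses
        trace = (1 - delta) * trace + delta * y     # low-pass activity trace
        W += alpha * np.outer(trace, x)             # trace-gated Hebbian update
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # keep weights bounded
    return W

# Toy "video": a two-pixel pattern drifting across a 1-D retina,
# so successive frames share identity but differ in position.
frames = np.stack(
    [np.roll(np.array([1.0, 1, 0, 0, 0, 0, 0, 0]), t) for t in range(6)]
)
W = trace_rule_learning(frames)
print(W.shape)  # (4, 8)
```

Because the trace carries activity forward in time, the weight update for frame t+1 is partly driven by the response to frame t, which is what ties the shifted copies of the pattern to the same units.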
first_indexed 2024-09-23T14:14:38Z
format Technical Report
id mit-1721.1/100187
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T14:14:38Z
publishDate 2015
publisher Center for Brains, Minds and Machines (CBMM), arXiv
record_format dspace
spelling mit-1721.1/100187 2019-04-10T19:04:58Z
Unsupervised learning of clutter-resistant visual representations from natural videos
Liao, Qianli; Leibo, Joel Z; Poggio, Tomaso
Object Recognition; Computer vision; Machine Learning; Artificial Intelligence
This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
2015-12-10T23:55:38Z 2015-12-10T23:55:38Z 2015-04-27
Technical Report; Working Paper; Other
http://hdl.handle.net/1721.1/100187
arXiv:1409.3879v2
en_US
CBMM Memo Series;023
Attribution-NonCommercial 3.0 United States
http://creativecommons.org/licenses/by-nc/3.0/us/
application/pdf
Center for Brains, Minds and Machines (CBMM), arXiv
spellingShingle Object Recognition
Computer vision
Machine Learning
Artificial Intelligence
Liao, Qianli
Leibo, Joel Z
Poggio, Tomaso
Unsupervised learning of clutter-resistant visual representations from natural videos
title Unsupervised learning of clutter-resistant visual representations from natural videos
title_full Unsupervised learning of clutter-resistant visual representations from natural videos
title_fullStr Unsupervised learning of clutter-resistant visual representations from natural videos
title_full_unstemmed Unsupervised learning of clutter-resistant visual representations from natural videos
title_short Unsupervised learning of clutter-resistant visual representations from natural videos
title_sort unsupervised learning of clutter resistant visual representations from natural videos
topic Object Recognition
Computer vision
Machine Learning
Artificial Intelligence
url http://hdl.handle.net/1721.1/100187
work_keys_str_mv AT liaoqianli unsupervisedlearningofclutterresistantvisualrepresentationsfromnaturalvideos
AT leibojoelz unsupervisedlearningofclutterresistantvisualrepresentationsfromnaturalvideos
AT poggiotomaso unsupervisedlearningofclutterresistantvisualrepresentationsfromnaturalvideos