Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures

A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning,...


Bibliographic Details
Main Authors: Rifkin, Ryan; Bouvrie, Jake; Schutte, Ken; Chikkerur, Sharat; Kouh, Minjoon; Ezzat, Tony; Poggio, Tomaso
Other Authors: Tomaso Poggio
Language: en_US
Published: 2007
Subjects: phonetic classification; hierarchical models; regularized least-squares; spectrotemporal patches
Online Access: http://hdl.handle.net/1721.1/35835
author Rifkin, Ryan
Bouvrie, Jake
Schutte, Ken
Chikkerur, Sharat
Kouh, Minjoon
Ezzat, Tony
Poggio, Tomaso
author2 Tomaso Poggio
collection MIT
description A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features with the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis.
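The pipeline outlined in the description (store patches during learning, compare them against novel spectrograms during testing, classify with regularized least squares) can be illustrated with a minimal sketch. This is not the authors' system: it collapses the hierarchy into a single layer, uses a Gaussian match with a maximum over positions as an assumed similarity measure, and fits a one-vs-all linear regularized least squares classifier. All function names and parameters (`sample_patches`, `patch_features`, `rls_fit`, `rls_predict`, `sigma`, `lam`) are hypothetical and chosen only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sample_patches(spectrogram, patch_shape, n_patches, rng):
    """Learning step (assumed): draw random 2-D patches from a spectrogram."""
    h, w = patch_shape
    rows = rng.integers(0, spectrogram.shape[0] - h + 1, size=n_patches)
    cols = rng.integers(0, spectrogram.shape[1] - w + 1, size=n_patches)
    return np.stack([spectrogram[r:r + h, c:c + w] for r, c in zip(rows, cols)])


def patch_features(spectrogram, patches, sigma=1.0):
    """Testing step (assumed): one feature per stored patch, equal to the
    best Gaussian match between that patch and any window of the input."""
    h, w = patches.shape[1:]
    # Every h-by-w window of the spectrogram, flattened to a row vector.
    windows = sliding_window_view(spectrogram, (h, w)).reshape(-1, h * w)
    flat = patches.reshape(len(patches), -1)
    # Squared Euclidean distance between each window and each stored patch.
    d2 = ((windows[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    # The smallest distance gives the largest Gaussian response.
    return np.exp(-d2.min(axis=0) / (2.0 * sigma ** 2))


def rls_fit(X, labels, n_classes, lam=1e-2):
    """One-vs-all linear regularized least squares with +1/-1 targets."""
    Y = -np.ones((len(labels), n_classes))
    Y[np.arange(len(labels)), labels] = 1.0
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


def rls_predict(X, W):
    """Assign each row of X to the class with the largest output."""
    return np.argmax(X @ W, axis=1)
```

In use, each training and test spectrogram would be mapped to a feature vector with `patch_features`, the vectors stacked into a matrix `X`, and `rls_fit` and `rls_predict` applied to that matrix. The regularization weight `lam` would normally be chosen by validation rather than fixed.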
first_indexed 2024-09-23T15:08:16Z
id mit-1721.1/35835
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T15:08:16Z
publishDate 2007
record_format dspace
spelling mit-1721.1/35835; 2019-04-10T09:58:53Z; Center for Biological and Computational Learning (CBCL); 2007-02-01T18:26:47Z; 2007-02-01; MIT-CSAIL-TR-2007-007; CBCL-266; http://hdl.handle.net/1721.1/35835; en_US; Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory; http://hdl.handle.net/1721.1/36865; 16 p.; 2265616 bytes; 383591 bytes; application/postscript; application/pdf
title Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
topic phonetic classification
hierarchical models
regularized least-squares
spectrotemporal patches
url http://hdl.handle.net/1721.1/35835