Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
Main Authors: | Rifkin, Ryan; Bouvrie, Jake; Schutte, Ken; Chikkerur, Sharat; Kouh, Minjoon; Ezzat, Tony; Poggio, Tomaso |
---|---|
Other Authors: | Poggio, Tomaso |
Language: | en_US |
Published: | 2007 |
Subjects: | phonetic classification; hierarchical models; regularized least-squares; spectrotemporal patches |
Online Access: | http://hdl.handle.net/1721.1/35835 |
author | Rifkin, Ryan; Bouvrie, Jake; Schutte, Ken; Chikkerur, Sharat; Kouh, Minjoon; Ezzat, Tony; Poggio, Tomaso |
author2 | Tomaso Poggio |
collection | MIT |
description | A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006), designed for visual object recognition, was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, and scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error for state-of-the-art MFCC-based features with the same classifier. These preliminary results suggest that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis. |
institution | Massachusetts Institute of Technology |
report number | MIT-CSAIL-TR-2007-007; CBCL-266 |
date issued | 2007-02-01 |
research center | Center for Biological and Computational Learning (CBCL) |
laboratory | Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory |
related handle | http://hdl.handle.net/1721.1/36865 |
extent | 16 p. |
files | application/postscript, application/pdf (2265616 bytes; 383591 bytes) |
topic | phonetic classification; hierarchical models; regularized least-squares; spectrotemporal patches |
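The abstract outlines three stages: building patch dictionaries from training spectrograms, computing features by comparing stored patches with patches from new spectrograms, and classifying those features with regularized least squares (RLS). The pure-Python sketch below illustrates that flow under simplifying assumptions and is not the authors' implementation. It skips the intermediate filtering and pooling layers of the Serre et al. model and matches raw spectrogram patches directly. The patch comparison (Gaussian similarity with width `sigma`, then a maximum over all positions), the linear kernel, the one-vs-all coding, and every function and parameter name (`sample_patches`, `patch_response`, `train_rls`, `sigma`, `lam`) are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical sketch of the pipeline described in the abstract; not the authors' code.
# A spectrogram is a list of rows (frequency bands), each a list of magnitudes over time frames.


def sample_patches(spectrograms, h, w, k, rng):
    """Learning (assumed scheme): build a dictionary of k patches, each h bands x w frames,
    cut at random positions from randomly chosen training spectrograms."""
    patches = []
    for _ in range(k):
        s = rng.choice(spectrograms)
        i = rng.randrange(len(s) - h + 1)
        j = rng.randrange(len(s[0]) - w + 1)
        patches.append([row[j:j + w] for row in s[i:i + h]])
    return patches


def patch_response(spectrogram, prototype, sigma=1.0):
    """Testing: Gaussian similarity between a stored patch and every same-sized
    patch of a new spectrogram, keeping the maximum over all positions."""
    h, w = len(prototype), len(prototype[0])
    best = 0.0
    for i in range(len(spectrogram) - h + 1):
        for j in range(len(spectrogram[0]) - w + 1):
            d2 = sum((spectrogram[i + a][j + b] - prototype[a][b]) ** 2
                     for a in range(h) for b in range(w))
            best = max(best, math.exp(-d2 / (2 * sigma ** 2)))
    return best


def features(spectrogram, dictionary, sigma=1.0):
    """One feature per stored patch."""
    return [patch_response(spectrogram, p, sigma) for p in dictionary]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def train_rls(X, labels, lam=1e-3):
    """One-vs-all RLS with a linear kernel: for each class solve (K + lam*n*I) c = y,
    where K[i][j] = <x_i, x_j> and y is +1 for the class and -1 otherwise."""
    n = len(X)
    A = [[dot(X[i], X[j]) + (lam * n if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    coefs = {c: solve(A, [1.0 if y == c else -1.0 for y in labels])
             for c in sorted(set(labels))}
    return X, coefs


def predict(model, x):
    """Return the class whose RLS function f(x) = sum_i c_i <x_i, x> is largest."""
    X, coefs = model
    k = [dot(xi, x) for xi in X]
    return max(coefs, key=lambda c: dot(coefs[c], k))
```

The abstract's dictionaries span several positions, orientations, scales, and levels of complexity; this sketch uses a single patch size on the raw spectrogram, and solving one dense linear system per class is only practical for modest training sets.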