Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures

A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning,...


Bibliographic Details
Main Authors:	Rifkin, Ryan; Bouvrie, Jake; Schutte, Ken; Chikkerur, Sharat; Kouh, Minjoon; Ezzat, Tony; Poggio, Tomaso
Other Authors:	Tomaso Poggio
Language:	en_US
Published:	2007
Subjects:	phonetic classification; hierarchical models; regularized least-squares; spectrotemporal patches
Online Access:	http://hdl.handle.net/1721.1/35835

_version_	1826211583457492992
author	Rifkin, Ryan; Bouvrie, Jake; Schutte, Ken; Chikkerur, Sharat; Kouh, Minjoon; Ezzat, Tony; Poggio, Tomaso
author2	Tomaso Poggio
author_facet	Tomaso Poggio; Rifkin, Ryan; Bouvrie, Jake; Schutte, Ken; Chikkerur, Sharat; Kouh, Minjoon; Ezzat, Tony; Poggio, Tomaso
author_sort	Rifkin, Ryan
collection	MIT
description	A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectrotemporal patch dictionaries at different spectro-temporal positions, orientations, scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis.
first_indexed	2024-09-23T15:08:16Z
id	mit-1721.1/35835
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T15:08:16Z
publishDate	2007
record_format	dspace
spelling	mit-1721.1/35835 2019-04-10T09:58:53Z Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures Rifkin, Ryan Bouvrie, Jake Schutte, Ken Chikkerur, Sharat Kouh, Minjoon Ezzat, Tony Poggio, Tomaso Tomaso Poggio Center for Biological and Computational Learning (CBCL) phonetic classification hierarchical models regularized least-squares spectrotemporal patches A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectrotemporal patch dictionaries at different spectro-temporal positions, orientations, scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis. 2007-02-01T18:26:47Z 2007-02-01T18:26:47Z 2007-02-01 MIT-CSAIL-TR-2007-007 CBCL-266 http://hdl.handle.net/1721.1/35835 en_US Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory http://hdl.handle.net/1721.1/36865 http://hdl.handle.net/1721.1/36865 16 p. 2265616 bytes 383591 bytes application/postscript application/pdf application/postscript application/pdf
spellingShingle	phonetic classification hierarchical models regularized least-squares spectrotemporal patches Rifkin, Ryan Bouvrie, Jake Schutte, Ken Chikkerur, Sharat Kouh, Minjoon Ezzat, Tony Poggio, Tomaso Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
title	Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
title_full	Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
title_fullStr	Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
title_full_unstemmed	Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
title_short	Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
title_sort	phonetic classification using hierarchical feed forward spectro temporal patch based architectures
topic	phonetic classification; hierarchical models; regularized least-squares; spectrotemporal patches
url	http://hdl.handle.net/1721.1/35835
work_keys_str_mv	AT rifkinryan phoneticclassificationusinghierarchicalfeedforwardspectrotemporalpatchbasedarchitectures AT bouvriejake phoneticclassificationusinghierarchicalfeedforwardspectrotemporalpatchbasedarchitectures AT schutteken phoneticclassificationusinghierarchicalfeedforwardspectrotemporalpatchbasedarchitectures AT chikkerursharat phoneticclassificationusinghierarchicalfeedforwardspectrotemporalpatchbasedarchitectures AT kouhminjoon phoneticclassificationusinghierarchicalfeedforwardspectrotemporalpatchbasedarchitectures AT ezzattony phoneticclassificationusinghierarchicalfeedforwardspectrotemporalpatchbasedarchitectures AT poggiotomaso phoneticclassificationusinghierarchicalfeedforwardspectrotemporalpatchbasedarchitectures
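
The pipeline summarized in the description field (stored spectro-temporal patches compared against patches from a new spectrogram, followed by a regularized least-squares classifier) can be sketched roughly as below. This is a minimal illustration only, not the authors' implementation: the function names, the Gaussian similarity, the max over positions, the linear one-vs-all formulation, and the default value of `lam` are all assumptions that do not appear in this record.

```python
import numpy as np


def patch_features(spec, patches):
    """Return one feature per stored patch: its best match anywhere in `spec`.

    ASSUMPTION: similarity is exp(-squared distance) and the feature is the
    maximum over all positions; the record does not specify either choice.
    """
    feats = np.empty(len(patches))
    for k, p in enumerate(patches):
        h, w = p.shape
        best = -np.inf
        # Slide the stored patch over every valid position of the spectrogram.
        for i in range(spec.shape[0] - h + 1):
            for j in range(spec.shape[1] - w + 1):
                d2 = np.sum((spec[i:i + h, j:j + w] - p) ** 2)
                best = max(best, np.exp(-d2))
        feats[k] = best
    return feats


def rls_train(X, y, n_classes, lam=1e-3):
    """Linear regularized least squares, one-vs-all with +1/-1 targets.

    Solves (X^T X + lam * I) W = X^T Y for the weight matrix W.
    ASSUMPTION: linear kernel, no bias term, hypothetical default `lam`.
    """
    Y = -np.ones((X.shape[0], n_classes))
    Y[np.arange(X.shape[0]), y] = 1.0
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


def rls_predict(W, X):
    """Predict the class whose one-vs-all output is largest."""
    return np.argmax(X @ W, axis=1)
```

In this sketch, each spectrogram would be mapped to a feature vector with `patch_features`, the vectors stacked into a matrix `X`, and `rls_train` and `rls_predict` applied to that matrix. A kernel variant of regularized least squares would follow the same pattern with a kernel matrix in place of `X.T @ X`.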


Similar Items