Training audio transformers for cover song identification
Abstract In the past decades, convolutional neural networks (CNNs) have been commonly adopted in audio perception tasks, which aim to learn latent representations. However, for audio analysis, CNNs may exhibit limitations in effectively modeling temporal contextual information. Analogous to the succ...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
SpringerOpen
2023-08-01
|
Series: | EURASIP Journal on Audio, Speech, and Music Processing |
Subjects: | |
Online Access: | https://doi.org/10.1186/s13636-023-00297-4 |