Describe: Self-Supervised Transfer Learning from Natural Images for Sound Classification