Multimodal learning with transformers: a survey