Seol mar théacs é seo: Video understanding using multimodal deep learning