Seol mar théacs é seo: Visual object segmentation based on temporal and linguistic cues