Object level grouping for video shots