Video description method based on multidimensional and multimodal information