Multi-Modal Deep Learning for Computer Vision and Its Application