Text this: Latent 3D Volume for Joint Depth Estimation and Semantic Segmentation from a Single Image