Auto‐encoder‐based shared mid‐level visual dictionary learning for scene classification using very high resolution remote sensing images