Text this: Weakly supervised scale-invariant learning of models for visual recognition