Human-like Attention-Driven Saliency Object Estimation in Dynamic Driving Scenes