Extended Global–Local Representation Learning for Video Person Re-Identification