Audiovisual Tracking of Multiple Speakers in Smart Spaces