End-to-end learning, and audio-visual human-centric video understanding