Summary: | <p>Humans have long fantasised about a machine that sees and interprets a scene the way the human visual system does. As effortless as it appears to us, however, making sense of a scene is highly challenging for computers, due to the unbounded complexity and unlimited variations of real-world scenes. This thesis focuses on instance-aware, pixel-level scene understanding with deep neural networks, a task with potential applications in autonomous navigation, security systems, object counting, and medical image analysis, amongst many others.</p>
<p>This thesis first proposes an instance-level human parsing network, which is capable of producing category-, instance-, and part-level segmentation of humans in a single forward pass. The proposed network is trained end-to-end given detections, and exploits a differentiable conditional random field (CRF) defined over a dynamic number of part instances for every image. It is the first work to perform human parsing at an instance level and can be trivially adapted to parse other objects. At the time of publication, it achieved state-of-the-art performance on public leaderboards.</p>
<p>Prompted by the large amount of labour and cost required to annotate a pixel-level dataset, this thesis then seeks to reduce the reliance of instance-level scene parsing methods on fully labelled examples, by proposing a weakly-supervised training strategy that uses image-level tags and bounding boxes as supervision. The proposed weak supervision scheme greatly reduces the annotation time required per image, while attaining a high percentage of the performance achieved by fully supervised oracles. This thesis also presents the first weakly-supervised panoptic segmentation model.</p>
<p>Finally, realising that a CRF-based instance segmentation pipeline has several limitations in terms of optimisation and capacity, this thesis presents a new panoptic segmentation pipeline which exploits a fully connected -- and yet lightweight -- instance affinity term. Unlike prior art, this approach can directly supervise panoptic segmentation outputs and be trained end-to-end, thanks to a differentiable mechanism for propagating panoptic logits according to predicted instance affinities. At the time of publication, this method also obtained state-of-the-art results on public leaderboards.</p>
|