Summary: | We rethink the trade-off between accuracy and efficiency in video pose estimation. Previous methods typically employ large networks to pursue superior pose estimation results; however, such methods can hardly meet the low-latency requirements of real-time applications because they are computationally expensive. We present a novel architecture, PosePropagation-Net (PPN), to estimate poses across video frames accurately and efficiently. Instead of extracting temporal cues to enforce geometric consistency, as most previous methods do, we explicitly propagate the well-estimated pose from the preceding frame to the current frame through a pose propagation mechanism, endowing lightweight networks with the capability to perform accurate pose estimation in videos. Experiments on two large-scale video pose estimation benchmarks show that our method significantly outperforms previous state-of-the-art methods in both accuracy and efficiency. Compared with the previous best method, our two representative configurations, PPN-Stable and PPN-Swift, reduce FLOPs by 2.5× and 6× respectively, while also significantly improving accuracy.
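The abstract does not specify how the pose propagation mechanism is realized, so the following is only a minimal PyTorch sketch of the general idea it describes, not the authors' PPN architecture: the preceding frame's pose (encoded as per-joint heatmaps) is fed alongside the current frame, so a lightweight network only needs to refine a good initial estimate rather than localize joints from scratch. The class name, layer sizes, and heatmap resolution below are all illustrative assumptions.

```python
# Hypothetical sketch of frame-to-frame pose propagation; NOT the paper's PPN.
import torch
import torch.nn as nn

class PosePropagationSketch(nn.Module):
    def __init__(self, num_joints: int = 17, width: int = 32):
        super().__init__()
        # The encoder sees the current RGB frame concatenated with the
        # previous frame's per-joint heatmaps along the channel axis.
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + num_joints, width, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # A small decoder upsamples back to heatmap resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(width, width, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, num_joints, 4, stride=2, padding=1),
        )

    def forward(self, frame: torch.Tensor, prev_heatmaps: torch.Tensor) -> torch.Tensor:
        # Condition the current prediction on the previous frame's pose.
        x = torch.cat([frame, prev_heatmaps], dim=1)
        return self.decoder(self.encoder(x))

# Usage: propagate poses through a clip; the frame-0 heatmaps would come
# from any single-image pose estimator (here a zero tensor stands in).
net = PosePropagationSketch()
frames = torch.randn(8, 3, 256, 192)     # T x C x H x W clip
heatmaps = torch.zeros(1, 17, 256, 192)  # stand-in for frame-0 pose
for t in range(frames.shape[0]):
    heatmaps = net(frames[t:t + 1], heatmaps)
```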