Massively parallel video networks

We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles. Leveraging operation pipelining and multi-rate clocks, these models perform a minimal amount of computation (e.g. as few as four convolutional layers) for each frame per timestep to produce an output. The models are still very deep, with dozens of such operations being performed, but in a pipelined fashion that enables depth-parallel computation. We illustrate the proposed principles by applying them to existing image architectures and analyse their behaviour on two video tasks: action recognition and human keypoint localisation. The results show that a significant degree of parallelism, and implicitly speedup, can be achieved with little loss in performance.
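The depth-parallel pipelining the abstract describes can be sketched as follows: at each timestep, every layer consumes the activation its predecessor emitted at the previous timestep, so all layers can fire concurrently on different frames, and a frame's result emerges after a latency equal to the network depth. This is a minimal illustrative sketch, not the paper's implementation; all function and variable names here are hypothetical.

```python
# Sketch of depth-parallel pipelining: at timestep t, layer k processes
# the output layer k-1 produced at timestep t-1, so every layer can run
# in parallel on a different frame. Names are illustrative only.

def make_layer(k):
    # Stand-in for a convolutional layer: tags the input with the layer id.
    return lambda x: f"layer{k}({x})"

def run_pipeline(layers, frames):
    depth = len(layers)
    state = [None] * depth            # activation each layer emitted last step
    outputs = []
    # Feed `depth` empty steps at the end to flush frames still in flight.
    for frame in list(frames) + [None] * depth:
        new_state = [None] * depth
        for k, layer in enumerate(layers):
            inp = frame if k == 0 else state[k - 1]
            if inp is not None:
                new_state[k] = layer(inp)   # all layers fire in the same step
        state = new_state
        if state[-1] is not None:
            outputs.append(state[-1])       # one output per step after warm-up
    return outputs

outs = run_pipeline([make_layer(k) for k in range(3)], ["f0", "f1"])
# Each frame traverses the full depth, delayed by `depth` timesteps.
```

The key property is that per-timestep work is bounded by a single layer's cost per pipeline stage, while the composed function applied to each frame still spans the full depth.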


Bibliographic Details
Main Authors: Carreira, J; Patraucean, V; Mazare, L; Zisserman, A; Osindero, S
Format: Conference item
Language: English
Published: Springer, Cham, 2018
Record ID: oxford-uuid:09a63482-0b82-4764-a999-38158a8be875
Institution: University of Oxford