Multicore Processing and Efficient On-Chip Caching for H.264 and Future Video Decoders

Performance requirements for video decoding will continue to rise in the future due to the adoption of higher resolutions and faster frame rates. Multicore processing is an effective way to handle the resulting increase in computation. For power-constrained applications such as mobile devices, extra...

Bibliographic Details
Main Authors: Finchelstein, Daniel Frederic, Sze, Vivienne, Chandrakasan, Anantha P.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Institute of Electrical and Electronics Engineers 2010
Subjects: video decoders; parallelism; multicore; low-power; H.264
Online Access: http://hdl.handle.net/1721.1/52412
https://orcid.org/0000-0002-5977-2748
https://orcid.org/0000-0003-4841-3990
collection MIT
description Performance requirements for video decoding will continue to rise in the future due to the adoption of higher resolutions and faster frame rates. Multicore processing is an effective way to handle the resulting increase in computation. For power-constrained applications such as mobile devices, extra performance can be traded off for lower power consumption via voltage scaling. As memory power is a significant part of system power, it is also important to reduce unnecessary on-chip and off-chip memory accesses. This paper proposes several techniques that enable multiple parallel decoders to process a single video sequence; the paper also demonstrates several on-chip caching schemes. First, we describe techniques that can be applied to the existing H.264 standard, such as multiframe processing. Second, with an eye toward future video standards, we propose replacing the traditional raster-scan processing with an interleaved macroblock ordering; this can increase parallelism with minimal impact on coding efficiency and latency. The proposed architectures allow N parallel hardware decoders to achieve a speedup of up to a factor of N. For example, if N = 3, the proposed multiple frame and interleaved entropy slice multicore processing techniques can achieve performance improvements of 2.64× and 2.91×, respectively. This extra hardware performance can be used to decode higher-definition videos. Alternatively, it can be traded off for dynamic power savings of 60% relative to a single nominal-voltage decoder. Finally, on-chip caching methods are presented that significantly reduce off-chip memory bandwidth, leading to a further increase in performance and energy efficiency. Data-forwarding caches can reduce off-chip memory reads by 53%, while using a last-frame cache can eliminate 80% of the off-chip reads.
The proposed techniques were validated and benchmarked using full-system Verilog hardware simulations based on an existing decoder; they should also be applicable to most other decoder architectures. The metrics used to evaluate the ideas in this paper are performance, power, area, memory efficiency, coding efficiency, and input latency.
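As a rough illustration of the speedup figures quoted in the description, the sketch below divides each reported speedup by the number of decoders to get a parallel efficiency (the fraction of the ideal factor-of-N speedup that is reached). The function name `parallel_efficiency` and the dictionary labels are illustrative assumptions, not taken from the paper; only the values N = 3, 2.64 and 2.91 come from the abstract.

```python
# Parallel efficiency = measured speedup / number of parallel decoders.
# The speedup values and N are quoted from the abstract; everything else
# (names, structure) is a hypothetical sketch for illustration.

def parallel_efficiency(speedup: float, n_decoders: int) -> float:
    """Return the fraction of the ideal N-fold speedup that was achieved."""
    if n_decoders <= 0:
        raise ValueError("n_decoders must be positive")
    return speedup / n_decoders

N = 3
reported_speedups = {
    "multiple frame": 2.64,
    "interleaved entropy slice": 2.91,
}

efficiency = {
    name: parallel_efficiency(speedup, N)
    for name, speedup in reported_speedups.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.0%} of the ideal {N}x speedup")
```

Under these numbers, multiframe processing reaches about 88% of the ideal speedup and interleaved entropy slices about 97%, which matches the abstract's statement that the speedup is "up to" a factor of N.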
first_indexed 2024-09-23T15:11:46Z
format Article
id mit-1721.1/52412
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T15:11:46Z
publishDate 2010
publisher Institute of Electrical and Electronics Engineers
record_format dspace
spelling mit-1721.1/52412 2022-09-29T13:17:46Z
Massachusetts Institute of Technology. Microsystems Technology Laboratories
Texas Instruments Incorporated; Nokia Corporation; IEEE Circuits and Systems Society
2010-03-09T14:57:26Z 2010-03-09T14:57:26Z 2009-10 2009-05
Article http://purl.org/eprint/type/JournalArticle
Finchelstein, D.F., V. Sze, and A.P. Chandrakasan. “Multicore Processing and Efficient On-Chip Caching for H.264 and Future Video Decoders.” Circuits and Systems for Video Technology, IEEE Transactions on 19.11 (2009): 1704-1713. © 2009 IEEE
http://dx.doi.org/10.1109/tcsvt.2009.2031459
IEEE Transactions on Circuits and Systems for Video Technology
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
application/pdf
topic video decoders
parallelism
multicore
low-power
H.264