A polyphase filter for many-core architectures

In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. The polyphase filter is a standard tool in digital signal processing and, as such, a well-established algorithm. We describe in detail our implementation of the polyphase filter algori...

Bibliographic Details
Main Authors:	Adámek, K, Novotný, J, Armour, W
Format:	Journal article
Published:	Elsevier 2016

collection	OXFORD
description	In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. The polyphase filter is a standard tool in digital signal processing and, as such, a well-established algorithm. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards (Fermi, Kepler, Maxwell), on the Intel Xeon CPU and on the Xeon Phi (Knights Corner) platform. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this: the first makes use of the L1/Texture cache, and the second uses shared memory. We discuss the usability of each of our implementations along with their behaviours. We measure performance in execution time, which is a critical factor for real-time systems; we also present results in terms of bandwidth (GB/s), compute (GFLOP/s) and type conversions (GTc/s). We also present our results in terms of the sample rate that a chosen platform can process in real time, which more intuitively describes the expected performance in a signal processing setting. Our findings show that, for the GPUs considered, the performance of our polyphase filter when using lower-precision input data is limited by type conversions rather than device bandwidth. We compare these results to an implementation on the Xeon Phi. We show that our Xeon Phi implementation performs better than our CPU implementation, but not well enough to compete with the performance of GPUs. We conclude with a comparison of our best-performing code to two other implementations of the polyphase filter, showing that our implementation is faster in nearly all cases. This work forms part of the Astro-Accelerate project, a many-core accelerated real-time data processing library for digital signal processing of time-domain radio astronomy data.
first_indexed	2024-03-07T01:30:44Z
format	Journal article
id	oxford-uuid:93828a71-ac3a-46b2-b85f-6d18b8bccbb6
institution	University of Oxford
last_indexed	2024-03-07T01:30:44Z
publishDate	2016
publisher	Elsevier
record_format	dspace


Similar Items