High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm

The fast Fourier transform (FFT) algorithm is considered as an important element of the processor’s basic software ecology, and it is widely applied in the field of engineering, science, physics and mathematics. Meanwhile, the requirements for the performance of FFT in these applications...

Full description

Bibliographic Details
Main Author: GUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng
Format: Article
Language:zho
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2022-06-01
Series:Jisuanji kexue yu tansuo
Subjects:
Online Access:http://fcst.ceaj.org/fileup/1673-9418/PDF/2011092.pdf
_version_ 1811241799448002560
author GUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng
author_facet GUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng
author_sort GUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng
collection DOAJ
description The fast Fourier transform (FFT) algorithm is considered as an important element of the processor’s basic software ecology, and it is widely applied in the field of engineering, science, physics and mathematics. Meanwhile, the requirements for the performance of FFT in these applications are also continuously rising. Therefore, it is of definite significance to study the high-performance implementation of FFT algorithm, especially the high-performance implementation of large radices of FFT in ARMv8 and X86-64, and to improve the calculation performance of FFT algorithm. In view of the architectural features of the ARMv8 and X86-64 computing platforms, this paper studies the high-performance implementation and optimization methods of the FFT algorithm. Through the application of butterfly network optimization, large radices network stages decrease, large radices butterfly computation optimization, SIMD (single instruction multiple data) assembly optimization, and register usage optimization methods, this paper effectively improves the performance of the FFT algorithm, considerably improves the calculation performance of the large radices of FFT, and solves the performance bottlenecks of insufficiency of register resources. Lastly, the summary of a set of Cooley-Tukey FFT algorithm high-performance implementation strategies and optimization solutions is made. The experimental results indicate that for the ARM and X86-64 processors, the FFT algorithm implemented can achieve a significant improvement in performance compared with ARMPL (ARM performance library), Intel MKL (math kernel library) and FFTW (fastest Fourier transform in the West) and can achieve a significant improvement in performance compared with small and medium radices.
first_indexed 2024-04-12T13:41:40Z
format Article
id doaj.art-dedbd87ff16746849d61f47674267132
institution Directory Open Access Journal
issn 1673-9418
language zho
last_indexed 2024-04-12T13:41:40Z
publishDate 2022-06-01
publisher Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
record_format Article
series Jisuanji kexue yu tansuo
spelling doaj.art-dedbd87ff16746849d61f476742671322022-12-22T03:30:50ZzhoJournal of Computer Engineering and Applications Beijing Co., Ltd., Science PressJisuanji kexue yu tansuo1673-94182022-06-011661304131510.3778/j.issn.1673-9418.2011092High-Performance Implementation and Optimization of Cooley-Tukey FFT AlgorithmGUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng01. College of Data Science, Taiyuan University of Technology, Taiyuan 030024, China;2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, ChinaThe fast Fourier transform (FFT) algorithm is considered as an important element of the processor’s basic software ecology, and it is widely applied in the field of engineering, science, physics and mathematics. Meanwhile, the requirements for the performance of FFT in these applications are also continuously rising. Therefore, it is of definite significance to study the high-performance implementation of FFT algorithm, especially the high-performance implementation of large radices of FFT in ARMv8 and X86-64, and to improve the calculation performance of FFT algorithm. In view of the architectural features of the ARMv8 and X86-64 computing platforms, this paper studies the high-performance implementation and optimization methods of the FFT algorithm. Through the application of butterfly network optimization, large radices network stages decrease, large radices butterfly computation optimization, SIMD (single instruction multiple data) assembly optimization, and register usage optimization methods, this paper effectively improves the performance of the FFT algorithm, considerably improves the calculation performance of the large radices of FFT, and solves the performance bottlenecks of insufficiency of register resources. Lastly, the summary of a set of Cooley-Tukey FFT algorithm high-performance implementation strategies and optimization solutions is made. The experimental results indicate that for the ARM and X86-64 processors, the FFT algorithm implemented can achieve a significant improvement in performance compared with ARMPL (ARM performance library), Intel MKL (math kernel library) and FFTW (fastest Fourier transform in the West) and can achieve a significant improvement in performance compared with small and medium radices.http://fcst.ceaj.org/fileup/1673-9418/PDF/2011092.pdf|fast fourier transform (fft)|armv8|x86-64|fftw|simd optimization
spellingShingle GUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng
High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm
Jisuanji kexue yu tansuo
|fast fourier transform (fft)|armv8|x86-64|fftw|simd optimization
title High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm
title_full High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm
title_fullStr High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm
title_full_unstemmed High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm
title_short High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm
title_sort high performance implementation and optimization of cooley tukey fft algorithm
topic |fast fourier transform (fft)|armv8|x86-64|fftw|simd optimization
url http://fcst.ceaj.org/fileup/1673-9418/PDF/2011092.pdf
work_keys_str_mv AT guojinxinzhangguangtingzhangyunquanchenzehuajiahaipeng highperformanceimplementationandoptimizationofcooleytukeyfftalgorithm